# Question

I have a set of sets. Each set is unbounded.

I would like to find a methodology to encode (vectorize) each subset.

I am specifically interested in memory-efficient solutions.

# Example

Let `X` be the superset and `A` and `B` be subsets.

$$X = \{A, B\}$$
$$A = \{1, 2, 3\}$$
$$B = \{2, 3, 4\}$$

A simple methodology to encode would be to use one-hot encoding:

$$\vec{A} = [1, 1, 1, 0]$$
$$\vec{B} = [0, 1, 1, 1]$$
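
For concreteness, the encoding above can be sketched in a few lines of Python. This is only an illustration of the example, not a proposed solution. The helper name `multi_hot` and the choice to order positions by sorted element value are assumptions. Strictly speaking, a vector with several ones is often called multi-hot rather than one-hot.

```python
def multi_hot(subset, vocabulary):
    """Return a 0/1 list with one position per vocabulary element."""
    # Map each element of the vocabulary to its position in the vector.
    index = {value: i for i, value in enumerate(vocabulary)}
    vec = [0] * len(vocabulary)
    for value in subset:
        vec[index[value]] = 1
    return vec


A = {1, 2, 3}
B = {2, 3, 4}

# Assumption: the vocabulary is the sorted union of all subsets.
vocabulary = sorted(A | B)

vec_A = multi_hot(A, vocabulary)  # [1, 1, 1, 0]
vec_B = multi_hot(B, vocabulary)  # [0, 1, 1, 1]
```

The vector length equals the number of distinct elements across all subsets, which is what makes this approach grow with the vocabulary.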

# Issue

Now my issue is that when the subsets are large,
one-hot encoding becomes unrealistic:
each vector is sparse and spans 10-30 thousand unique values.
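
To put rough numbers on this, here is a small sketch comparing the storage for one subset as a dense 0/1 vector against storing only the member indices. The vocabulary size of 30,000 is the upper end of the range mentioned above. The three-element subset, the one-byte-per-slot layout, and the use of element values directly as indices are all assumptions made for illustration.

```python
from array import array

vocab_size = 30_000   # upper end of the range quoted above
subset = [2, 3, 4]    # hypothetical subset; values double as indices here

# Dense encoding: one unsigned byte per vocabulary slot.
dense = array("B", [0]) * vocab_size
for i in subset:
    dense[i] = 1

# Sparse alternative: store only the indices of the members.
sparse = array("I", subset)

dense_bytes = dense.itemsize * len(dense)
sparse_bytes = sparse.itemsize * len(sparse)
print(dense_bytes, sparse_bytes)
```

The dense vector costs the same regardless of how many elements the subset holds, whereas the index list grows only with the subset itself.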

Any suggestions on encoding the inputs into a more dense vector would be appreciated.
