Every modality starts with simple codes. The text modality starts with one-character word codes, and the visual modality starts with position codes. Let's look at the graph representation of the most primitive codes.

When a modality is empty, we see the **root** (1) and the starting clusters of **zero** (2, 3) and **one** (5, 6).

Suppose we need to encode 0 as the vertical position of a dot in an image. The graph will look like this.

**Vertex 8** was added to the cluster representing zero. This vertex means position 0 and can be used in other layers of a multimodal graph.

Now let's compare the last graph with a graph that contains position 1 in an image.

The two graphs are symmetric to each other.

Now consider a number that has two digits in binary format. The graph shown is for 3. The vertex that actually means 3 has ID 11 here.

Now a number with three digits in binary format: 4. Its vertex is 14 below.
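
To make the digit counts concrete, here is a minimal Python sketch that lists the binary digits of the numbers used in this section. The helper name `binary_digits` is hypothetical, and the sketch covers only the numeric encoding. It does not attempt to reproduce the vertex IDs or the cluster structure of the graphs.

```python
def binary_digits(n: int) -> list[int]:
    """Return the binary digits of a non-negative integer, most significant first."""
    if n < 0:
        # Assumption: position codes are non-negative integers.
        raise ValueError("position codes are assumed to be non-negative")
    # format(n, "b") gives the binary string without the "0b" prefix.
    return [int(ch) for ch in format(n, "b")]


# The positions discussed in this section: 0, 1, 3 and 4.
for value in (0, 1, 3, 4):
    digits = binary_digits(value)
    print(value, digits, len(digits))
```

Positions 0 and 1 each need a single digit, 3 needs two digits (1, 1), and 4 needs three digits (1, 0, 0), which matches the order in which the examples are introduced.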