The training process has nothing to do with vector arithmetic, but the arrays it produces turn out to have remarkably nice properties, so one can reasonably think of a "linear space of words".
For example, which words have embeddings closest to a given word in this space?
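As a concrete illustration, here is a minimal sketch of such a nearest-neighbour query with gensim, assuming you have a pretrained word2vec model on disk (the file name is a placeholder):

```python
# Minimal sketch: nearest neighbours in embedding space.
# Assumes a pretrained word2vec model; "vectors.txt" is a placeholder path.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("vectors.txt")

# The five words whose embeddings are closest (by cosine similarity) to "frog":
for word, similarity in vectors.most_similar("frog", topn=5):
    print(f"{word}\t{similarity:.3f}")
```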
To put it differently, words with similar meanings form a cloud. Here's a 2-D t-SNE representation:
Another example: the difference between the "woman" and "man" vectors is very close to the difference between the "aunt" and "uncle" vectors:

As a result, you get quite sensible vector arithmetic:
W("woman") ? W("man") ? W("aunt") ? W("uncle")
W("woman") ? W("man") ? W("queen") ? W("king")
So it's not far-fetched to call them vectors. All pictures are from this wonderful post, which I very much recommend reading.