This article describes what word embeddings are and how they are created.
What Is a Word Embedding
Word embedding is one technique for vectorizing text, that is, converting text into numbers that can be fed into models. The idea is to vectorize text in such a way that the vectors capture similarities between words. For instance, the vector representations of “queen” and “woman” will be similar, as will those of “king” and “man”. The goal is to make the vectors meaningful, rather than based only on word identity or occurrence counts as in one-hot encoding or TF-IDF.
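To make the idea of "similar vectors" concrete, here is a minimal sketch that compares words with cosine similarity. The embedding values below are made up by hand purely for illustration (they are not taken from any trained model), and `cosine_similarity` is a helper defined here, not a library function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; values near 1.0 mean similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# One-hot vectors: each word gets its own axis, so distinct words never overlap.
one_hot = {
    "queen": [1, 0, 0, 0],
    "woman": [0, 1, 0, 0],
    "king": [0, 0, 1, 0],
    "man": [0, 0, 0, 1],
}

# Hypothetical 3-dimensional embeddings, chosen by hand for illustration only.
embedding = {
    "queen": [0.90, 0.80, 0.10],
    "woman": [0.85, 0.75, 0.20],
    "king": [0.10, 0.20, 0.90],
    "man": [0.15, 0.25, 0.85],
}

print("one-hot   queen vs woman:", cosine_similarity(one_hot["queen"], one_hot["woman"]))
print("embedding queen vs woman:", cosine_similarity(embedding["queen"], embedding["woman"]))
print("embedding queen vs man:  ", cosine_similarity(embedding["queen"], embedding["man"]))
```

With one-hot vectors, every pair of different words has a similarity of zero, so no relationship between words can be expressed. With dense vectors, related words can point in nearly the same direction.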
How Word Embeddings Are Created
Word embeddings are simply the weights of a neural network trained on a body of text. We can explore the steps in detail.
Consider a customer review dataset from an e-commerce website, where each customer marks a review as Positive or Negative. The reviews are the features, and Positive or Negative are the labels.
1. Identify the number of unique words (V) in the dataset, which we call the vocabulary. Let's say we identified 50 unique words.
2. In word embeddings, every word has the same dimension, so we need to decide on the dimension (D) of each word vector. Let's say 3.
3. Now create a matrix E of random numbers with shape D × V. In our case this is a 3 by 50 matrix with 3 rows and 50 columns, where each column represents one unique word in 3 dimensions.
4. Create a one-hot encoded matrix (H) for the vocabulary. This is a V × V (50 × 50) matrix in which each word's vector has “1” at that word's own position and zero everywhere else.
5. The setup is now complete. The next step is to feed the data into the model:
a) Take the first data point of the customer review dataset.
b) Loop over the one-hot encoded words in the first review and multiply each one with the matrix E. This selects the column of E that belongs to that word, which at this point holds random values. Suppose the current word is “the”; the output of the multiplication might be [0.21 0.62 0.32]. After iterating over all the words in the first review, flatten the resulting vectors into one.
For example, if the first review has only two words, “the Horse”, the resulting vector will be [0.21 0.62 0.32 0.12 0.87 0.53]. Use this as the feature vector to train the model, with the label 1 or 0 depending on whether the review is positive or negative.
6. As a result of backpropagation during training, the weights of matrix E are updated to optimize the model. The randomly initialized values for each word are replaced with learned values, and we can now use matrix E wherever necessary instead of one-hot vectors or TF-IDF vectors. The matrix E is now called the word embedding matrix.
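The six steps above can be sketched in plain Python. This is a minimal illustration under several assumptions that are not from the article: a hypothetical four-review toy dataset (so V is 4 rather than 50), every review having exactly two words so the flattened vector has a fixed length, and a single logistic output layer as the classifier. The names `lookup`, `featurize`, and `predict` are helpers invented for this sketch.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical toy dataset: two-word reviews, label 1 = positive, 0 = negative.
dataset = [
    (["great", "product"], 1),
    (["bad", "product"], 0),
    (["great", "service"], 1),
    (["bad", "service"], 0),
]

# Step 1: vocabulary of unique words (V = 4 here rather than 50).
vocab = sorted({word for review, _ in dataset for word in review})
word_index = {word: i for i, word in enumerate(vocab)}
V = len(vocab)

# Step 2: embedding dimension.
D = 3

# Step 3: D x V matrix E of random numbers; column j is the vector for word j.
E = [[random.uniform(-0.5, 0.5) for _ in range(V)] for _ in range(D)]

# Step 4: V x V one-hot matrix H; row i is the one-hot vector for word i.
H = [[1.0 if i == j else 0.0 for j in range(V)] for i in range(V)]


def lookup(word):
    """Multiply E by the word's one-hot vector, which selects one column of E."""
    h = H[word_index[word]]
    return [sum(E[d][j] * h[j] for j in range(V)) for d in range(D)]


def featurize(review):
    """Step 5: look up every word and flatten the vectors into one list."""
    return [value for word in review for value in lookup(word)]


# Assumed classifier: one logistic output on top of the flattened embeddings.
# The size 2 * D relies on every review having exactly two words.
weights = [random.uniform(-0.5, 0.5) for _ in range(2 * D)]
bias = 0.0


def predict(review):
    z = sum(w * x for w, x in zip(weights, featurize(review))) + bias
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss():
    """Average binary cross-entropy over the toy dataset."""
    total = 0.0
    for review, label in dataset:
        p = predict(review)
        total -= label * math.log(p) + (1 - label) * math.log(1 - p)
    return total / len(dataset)


initial_loss = mean_loss()
learning_rate = 0.5

# Step 6: gradient descent; backpropagation updates the classifier and E.
for _ in range(200):
    for review, label in dataset:
        x = featurize(review)
        error = predict(review) - label  # dLoss/dz for cross-entropy
        grad_x = [error * w for w in weights]  # gradient w.r.t. the features
        for k in range(2 * D):
            weights[k] -= learning_rate * error * x[k]
        bias -= learning_rate * error
        # Route each feature gradient back to the column of E it came from.
        for position, word in enumerate(review):
            for d in range(D):
                E[d][word_index[word]] -= learning_rate * grad_x[position * D + d]

final_loss = mean_loss()
print("loss before:", round(initial_loss, 4), "after:", round(final_loss, 4))
print("embedding for 'great':", [round(v, 2) for v in lookup("great")])
```

Because multiplying by a one-hot vector only picks out one column, practical implementations usually skip the multiplication and index the column directly. Reviews of varying length would also need padding or averaging before being fed to the classifier, which this sketch leaves out.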
Banner image and next image from: https://blogs.mathworks.com/loren/2017/09/21/math-with-words-word-embeddings-with-matlab-and-text-analytics-toolbox/