in Artificial Intelligence(AI) & Machine Learning by Goeduhub's Expert (2.1k points)
In this article, we will learn what word2vec is, why it is important in text pre-processing for converting text into vectors for machine learning and deep learning applications, and finally how to implement word2vec in Python.


1 Answer

Best answer

To start with word2vec, we first need to understand what is most important in text pre-processing, whether for language translation, text generation, or other natural language tasks.

The first thing that comes to mind for anything related to text or natural language is text pre-processing: converting text into numerical form, that is, into a vector, so it can be used further.

We humans can think in a language directly, but a machine understands only numbers, not characters. How well a machine handles natural language therefore depends on how we convert characters into vectors, and researchers keep trying to develop better techniques and to improve the existing ones.

We have already discussed word embedding, the numerical representation of words, and several embedding techniques:

Word Embedding: converting text into a vector (a numerical representation of text)

CountVectorizer, TF-IDF, one-hot encoding, BOW (bag of words) ...

What is the problem with these techniques?

In TF-IDF (term frequency-inverse document frequency) and BOW, we convert text into a vector in which all values are zero except at the index of a particular word, or we count the frequency of each word at its respective index.

The result is a sparse matrix full of zeros. It carries no semantic information about the text, and the many sparse vectors produced over a huge amount of data can also cause overfitting.

Semantic information is the logical structure of sentences that identifies the most relevant elements of a text and helps us understand it, for example, grammar and word order in English.
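The sparsity problem is easy to see by building a count-based bag-of-words by hand. This is a minimal sketch (the two sentences and the helper `bow_vector` are made up for illustration); with a realistic vocabulary of tens of thousands of words, almost every entry of each vector would be zero:

```python
# Build a count-based bag-of-words (BOW) representation by hand
# to show the kind of vectors TF-IDF/BOW techniques produce.
sentences = [
    "the kid plays cricket in the street",
    "the child plays cricket in the street",
]

# Vocabulary: every unique word across the corpus, in sorted order.
vocab = sorted({word for s in sentences for word in s.split()})

def bow_vector(sentence, vocab):
    """Count how often each vocabulary word occurs in the sentence."""
    words = sentence.split()
    return [words.count(word) for word in vocab]

vectors = [bow_vector(s, vocab) for s in sentences]
print(vocab)       # ['child', 'cricket', 'in', 'kid', 'plays', 'street', 'the']
print(vectors[0])  # counts per vocabulary word; mostly zeros in real corpora
```

Note that the vector length equals the vocabulary size, so it grows with the data, and nothing in the counts says that "kid" and "child" are related.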


In the word2vec model, each word in a text is instead represented as a vector of a fixed dimension, rather than a dimension that grows with the amount of data.

Word2vec is not a single algorithm but a combination of model architectures and optimizations that can be used to learn word embeddings from large datasets. These embeddings preserve semantic information and the relations between different words in a text.

For example, king - man + woman = queen. This relation between words, which word2vec learns from a huge amount of data, is truly remarkable.
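The analogy arithmetic can be mimicked with hand-crafted toy vectors. The values below are illustrative, not learned embeddings: dimension 0 loosely encodes "royalty" and dimension 1 "gender", so king - man + woman lands nearest to queen under cosine similarity:

```python
import numpy as np

# Toy 2-d embeddings (hand-picked for illustration, not learned).
embeddings = {
    "king":  np.array([0.9,  0.8]),
    "queen": np.array([0.9, -0.8]),
    "man":   np.array([0.1,  0.8]),
    "woman": np.array([0.1, -0.8]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# king - man + woman should land closest to queen.
target = embeddings["king"] - embeddings["man"] + embeddings["woman"]
best = max((w for w in embeddings if w != "king"),
           key=lambda w: cosine(target, embeddings[w]))
print(best)  # -> queen
```

Real word2vec libraries do exactly this search (excluding the query words) over learned vectors of a few hundred dimensions.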

How does this happen? Word2vec uses a neural network to generate word embeddings such that words appearing in similar contexts end up with similar embeddings.

For example:

The kid plays cricket in the street

The child plays cricket in the street

Here, "child" and "kid" get similar vectors because they appear in similar contexts.

Word2vec comprises two techniques (algorithms): CBOW (continuous bag of words) and the skip-gram model.

CBOW (Continuous Bag of Words)

The CBOW algorithm predicts the middle word from the surrounding context words.

For example:

The quick brown fox ___ over the lazy dog --predict--> jumps

The order of the context words is not that important here; what matters is the words themselves.

Working of CBOW

CBOW starts by one-hot encoding each word in a sentence.

For example, take the sentence "Python training Goeduhub Technologies".
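The one-hot encoding step for this example sentence can be sketched in a few lines of Python (treating the four words of the sentence as the whole vocabulary, purely for illustration):

```python
# One-hot encode each word of the example sentence.
sentence = "Python training Goeduhub Technologies".split()

# Each word gets an index; its vector is all zeros
# except for a 1 at that index.
one_hot = {}
for i, word in enumerate(sentence):
    vec = [0] * len(sentence)
    vec[i] = 1
    one_hot[word] = vec

print(one_hot["training"])  # [0, 1, 0, 0]
```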


First, we create a one-hot vector for each word in the sentence.

Next, we take the first three words, "Python training Goeduhub", and try to predict the middle word, "training", from its context, "Python" and "Goeduhub".

We compare the predicted output with the actual output (the one-hot vector of "training") and update the network's weights to improve the match.

Repeating this process over a sliding window produces a continuous sequence of bags of words, hence the name continuous bag of words (CBOW). For the example above, the first bag is "Python training Goeduhub", the second is "training Goeduhub Technologies", and so on.
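The sliding-window procedure above can be sketched as plain Python that generates CBOW training pairs, with the surrounding words as input and the middle word as the target (window size 1 is an illustrative choice):

```python
# Slide a window over the sentence to produce CBOW training pairs:
# (context words) -> middle word.
words = "Python training Goeduhub Technologies".split()
window = 1  # one word of context on each side

pairs = []
for i in range(window, len(words) - window):
    context = words[i - window:i] + words[i + 1:i + window + 1]
    pairs.append((context, words[i]))

for context, target in pairs:
    print(context, "->", target)
# ['Python', 'Goeduhub'] -> training
# ['training', 'Technologies'] -> Goeduhub
```

A real CBOW model would feed each context (as one-hot or averaged embeddings) into a small neural network and train it to output the target word.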


Skip-gram

The skip-gram algorithm does the opposite of CBOW: it predicts the surrounding context words from the middle (center) word.

For example,

jumps --predict--> The quick brown fox ___ over the lazy dog


In this case, we predict the context from the center word: "Python" and "Goeduhub" are predicted from the center word "training".
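Skip-gram training pairs for the same example sentence can be generated with the same sliding-window idea, just reversed: each center word is paired with every word in its window (window size 1 again, for illustration):

```python
# Generate skip-gram training pairs: center word -> each context word.
words = "Python training Goeduhub Technologies".split()
window = 1  # one word of context on each side

pairs = []
for i, center in enumerate(words):
    for j in range(max(0, i - window), min(len(words), i + window + 1)):
        if j != i:  # skip the center word itself
            pairs.append((center, words[j]))

print(pairs)
# each pair is (input center word, context word to predict)
```

Note that one CBOW window yields one training example, while the same skip-gram window yields one example per context word, which is part of why skip-gram works well on rarer words.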

Finally, the softmax function is used to maximize the probability of predicting the context. For a sequence of words w1, w2, ..., wT, the objective can be written as the average log probability:

(1/T) * Σ_{t=1..T} Σ_{-c ≤ j ≤ c, j ≠ 0} log p(w_{t+j} | w_t)

where c is the size of the training context. The basic skip-gram formulation defines this probability p(w_{t+j} | w_t) using the softmax function:


p(w_O | w_I) = exp(v'_{w_O} · v_{w_I}) / Σ_{w=1..W} exp(v'_w · v_{w_I})

where v and v' are the target and context vector representations of words, w_I is the input (center) word, w_O is the output (context) word, and W is the vocabulary size.
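This softmax can be evaluated with NumPy over a toy vocabulary. All vectors below are random placeholders (a sketch of the formula, not trained embeddings), but the probabilities still sum to 1 over the vocabulary as the formula requires:

```python
import numpy as np

rng = np.random.default_rng(0)
W, dim = 5, 4                         # toy vocabulary size and embedding dimension
v = rng.normal(size=(W, dim))         # target (input) vectors
v_prime = rng.normal(size=(W, dim))   # context (output) vectors

def p_context_given_target(context_idx, target_idx):
    """softmax: exp(v'_context . v_target) / sum_w exp(v'_w . v_target)"""
    scores = v_prime @ v[target_idx]            # one score per vocabulary word
    exp_scores = np.exp(scores - scores.max())  # numerically stabilized softmax
    return float(exp_scores[context_idx] / exp_scores.sum())

probs = [p_context_given_target(c, 0) for c in range(W)]
print(sum(probs))  # ~ 1.0: a proper distribution over the vocabulary
```

In practice this full softmax is too expensive for vocabularies of millions of words, which is why real implementations use approximations such as negative sampling or hierarchical softmax.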

Note: Word2vec combines these two algorithms to produce a fixed-dimension numerical representation of words in context and of the relations between words.
