NLP Pipeline

Sunilkumar Prajapati
3 min read · Mar 29, 2022


This is the second blog in the NLP series, and today we are going to look at the NLP pipeline.

First of all, what is an NLP pipeline?

By definition, an NLP pipeline is the set of steps followed to build end-to-end NLP software. In simple terms, to create complete NLP software we follow a series of steps, and that series of steps is called the NLP pipeline.

Building NLP software consists of the following steps:

(Figure: the NLP pipeline)

1] Data Acquisition: Any machine learning or deep learning pipeline needs data; without data we cannot build an NLP pipeline or NLP software.
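
For example, a minimal data-acquisition sketch in Python, assuming the raw text lives in a hypothetical reviews.csv file with text and label columns:

```python
import pandas as pd

# Hypothetical CSV of labelled documents; file name and column names are assumptions
df = pd.read_csv("reviews.csv")      # columns: "text", "label"

texts = df["text"].tolist()          # the raw documents
labels = df["label"].tolist()        # e.g. positive / negative

print(f"Loaded {len(texts)} documents")
```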

2] Text Preparation: Whatever data we acquire is usually messy or has some problems, so we prepare it for the next steps.

What do we do during text preparation?

a] Text Cleanup: Here we focus on basic mistakes such as spelling errors and remove noise like emojis (a combined sketch of all three sub-steps is shown after this list).

b] Basic Preprocessing: Here we do tokenization (splitting the text word by word), remove stopwords, and remove punctuation.

c] Advanced Preprocessing: In advanced preprocessing we typically do POS (part-of-speech) tagging, chunking, or coreference resolution.
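
Here is a small sketch of all three preparation sub-steps using NLTK; the cleanup rules and the example sentence are only illustrative assumptions:

```python
import re
import string

import nltk
from nltk.corpus import stopwords

# One-time downloads for the tokenizer, stopword list and POS tagger
nltk.download("punkt")
nltk.download("stopwords")
nltk.download("averaged_perceptron_tagger")

text = "NLP is great!! 😀 Visit https://example.com for more."

# a] Text cleanup: lowercase, drop URLs and non-ASCII characters such as emojis
text = text.lower()
text = re.sub(r"https?://\S+", "", text)
text = text.encode("ascii", "ignore").decode()

# b] Basic preprocessing: tokenize, then remove punctuation and stopwords
tokens = nltk.word_tokenize(text)
tokens = [t for t in tokens
          if t not in string.punctuation and t not in stopwords.words("english")]

# c] Advanced preprocessing: POS tagging on the cleaned tokens
print(nltk.pos_tag(tokens))   # e.g. [('nlp', 'NN'), ('great', 'JJ'), ('visit', 'NN')]
```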

3] Feature Engineering: If you have studied machine learning, you will already be familiar with feature engineering. In simple terms, when we want to apply a machine learning algorithm, we need the data in the right format, so we convert the text into numbers because machine learning models work on numbers. Here we use techniques like Bag of Words, TF-IDF (term frequency-inverse document frequency), Word2Vec, etc.
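
For instance, a quick Bag of Words / TF-IDF sketch with scikit-learn (the tiny corpus is made up for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = [
    "the movie was great",
    "the movie was terrible",
    "great acting and a great story",
]

# Bag of Words: raw token counts per document
bow = CountVectorizer()
X_bow = bow.fit_transform(corpus)
print(bow.get_feature_names_out())   # learned vocabulary
print(X_bow.toarray())               # document-term count matrix

# TF-IDF: counts re-weighted by how rare each term is across the corpus
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(corpus)
print(X_tfidf.toarray())
```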

4] Modelling: Modelling is the step where we apply the actual algorithms. It consists of model building and model evaluation.

a] Model Building: Model building is basically the process of applying algorithms to your data (a short sketch covering both sub-steps follows below).

b] Model Evaluation: In model evaluation, we check how our model is performing, compare it with other models, and from that select the best-performing model.
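
A minimal sketch of model building and evaluation for a text classification task; the toy data stands in for whatever corpus was acquired earlier:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy data in place of the real corpus (assumption for illustration)
texts = ["great movie", "terrible movie", "loved it", "hated it"] * 25
labels = ["pos", "neg", "pos", "neg"] * 25

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42
)

# Model building: fit more than one candidate algorithm on the same features
candidates = {
    "naive_bayes": make_pipeline(TfidfVectorizer(), MultinomialNB()),
    "logistic_regression": make_pipeline(TfidfVectorizer(), LogisticRegression()),
}

# Model evaluation: compare the candidates and keep the best-performing one
for name, model in candidates.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {acc:.2f}")
```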

5] Deployment: The deployment step has three stages.

a] Deployment: Basically, here we deploy our software on some cloud service like AWS, GCP, or Azure (a minimal serving sketch is shown after this list).

b] Monitoring: Once the software is running on a cloud service or on our own server, we constantly monitor it for any kind of change in behaviour or performance.

c] Model Update: Sometimes a situation arises where we need to replace the current model with an improved or more advanced one.
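
As a small illustration of the deployment stage, here is a Flask serving sketch; the model.pkl file name and the /predict route are assumptions, and in practice the same app would be hosted on AWS, GCP, Azure, etc., monitored there, and redeployed whenever the model is updated:

```python
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Assumes the trained pipeline from the modelling step was saved as model.pkl
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    text = request.get_json()["text"]           # expects {"text": "some document"}
    prediction = model.predict([text])[0]
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```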

Notes:

1] The five-step pipeline above is not universal. That means these five steps are not used in exactly the same way for every NLP project; we may use the pipeline differently for different projects. It fits projects like text classification and sentiment analysis well, but when we build a chatbot or a text summarization system, the pipeline will probably look a little different.

2] Deep learning pipelines are slightly different from the pipeline above; the NLP pipeline described here is based mostly on the classical machine learning pipeline.

3] The NLP pipeline is non-linear. For example, if after deployment we realize that the performance is not as good as required, we can go back to the feature engineering step or the modelling step and change the algorithm as needed, so that we can build the best product.
