ABOUT NLP:

Before discussing Natural Language Processing (NLP), it is a good idea to understand what exactly a natural language is.

In simple words, a natural language is the day-to-day language we use to communicate with each other. Such languages were not designed according to a plan, as constructed languages (including programming languages such as C, C++ and Java) were; they developed slowly through use and took ages to reach the shape we see today. A natural language can be spoken, signed, or expressed through any other medium that lets us convey our thoughts and views to others, and as a result more than 6,000 languages are in use in the world today. Isn't that amazing?

To understand the importance of natural language, let's go back to the era when humans had not yet developed and lived much like the other animals on the planet. Now pause and try to answer this question: why are humans the only species that developed into what we see today? You may come up with several reasons, but we cannot ignore the role of communication and language. One of the major factors was that humans managed to develop a medium for communicating with each other and sharing their views and ideas, which helped them plan and work in groups. Another factor is the evolution of machines. Don't you think that human progress got wings once people started using machines? Work that used to take days can now be done in a fraction of a second. For example, in earlier times a person had to travel physically to deliver a message from one place to another, but now the same message can be sent in a few seconds using computers and mobile phones, irrespective of the distance.

Beyond this, we can imagine a next step that will take human development to new heights: communicating with machines in the same way we communicate with each other. Imagine a world where machines and humans exchange ideas just as people do with one another today. Wouldn't it be great if we could verbally instruct a machine to perform a task instead of programming it all day? Or picture an ATM that can guide an elderly person who is unfamiliar with it through a withdrawal, in a language that person understands. Wouldn't that be human development at its best? If you believe so, the good news is that Natural Language Processing can make this possible.

Now that we have some idea of what a natural language is, it is a good time to take a glimpse at Natural Language Processing.

Natural Language Processing (NLP) is a subfield of artificial intelligence, linguistics and computer science that deals with the interaction between computers and human language. In simple terms, it gives computers the ability to understand, process and generate human language.

So, as the definition says, NLP is all about giving computers the ability to listen and talk like humans.

Real World Applications:

Today there are quite a few applications where NLP is helping businesses become more profitable. Below are some interesting examples:

1) Contextual Advertisement: In earlier times, when we watched videos on an online platform, a predefined set of advertisements was shown to everyone. The strategy was to show the advertisement to everybody and assume that some part of the audience really needed the product and would buy it. Such a strategy is not very effective, because there is no guarantee of reaching the right audience. The field of advertising has since changed drastically, and businesses can now target the audience for whom the product is most likely to be useful. How is that possible? How does an online platform like YouTube or Facebook know who the best customers for a particular business are? The answer lies in the data we produce daily. Whenever we browse the internet, we generate data about our likes, behavior and interests, from which companies build profiles, and based on these profiles they decide the right audience for a product. NLP plays an important part here, because developers use NLP techniques to build the models that predict the right audience for a product.

2) Spam Filtering and Smart Reply: Almost all mail applications, such as Gmail, are equipped with spam filters that keep irrelevant or promotional mail (generally sent for marketing purposes) out of your inbox and let through only the mail that is likely to be useful to you. They also offer smart reply features, which suggest replies based on the content of the email and save you typing time.

3) Social Media (Removing Inappropriate Content, Opinion Mining): When we talk about social media, the first name that comes to mind these days is Facebook, with roughly 2.89 billion monthly active users. One of the biggest challenges for Facebook is filtering out posted content that is not suitable for all audiences, such as adult content and hate speech. Content of this type can do serious harm to the platform. Another interesting application, one of many, is analyzing posts on social media to extract public opinion and sentiment about a topic, such as who will win the next election or which player will perform best in an upcoming tournament. All of this is possible today using NLP.

4) Search Engines: Search engines like Google have advanced so much that when you ask a question, the answer is often shown directly at the top of the results page, so you don't have to look for it in the list of results. You can try it yourself by asking Google any question. Most of the state-of-the-art functionality of today's search engines is powered by NLP.

5) Chatbots: For big companies like Zomato or Swiggy, it is very difficult to address the queries of millions of customers at the same time. So they generally use chatbots, which try to understand your problem and resolve it at the first level; if the chatbot cannot help, it passes the issue on to a customer service executive. Imagine how much workload chatbots take off the support team every day. These chatbots are also developed and trained using NLP techniques.

Common NLP Tasks:

There are many examples in the real world where NLP is helping businesses grow like never before. Now that we have an idea of some real-world applications, let's discuss some of the common tasks that can be done using NLP:

1) Text / Document Classification: Suppose you have a news article and you are not sure whether it belongs to sports, politics or movies. Reading the complete article to find its category can be tedious, and if you have many such articles it could take days. Here NLP can help by classifying the text automatically, without any human effort spent on reading and manually labeling each article.

2) Sentiment Analysis: This is one of the most widely used techniques today for gauging public sentiment about a product or an agenda. Suppose you have launched a product and it has thousands of reviews. It would be very difficult for your team to read every review one by one and work out how people feel about the product. In this scenario NLP can analyze the reviews and give you the overall sentiment.

3) Information Retrieval and Extraction: NLP can help you extract entities such as dates, prices, product names or any other kind of information present in a text. Search engines use this technique heavily to pull information from the web and show it in their results.

4) Parts of Speech Tagging: Part-of-speech tagging is very helpful in applications like chatbots, where it is important to understand the exact meaning of a query before replying. This is hard unless the chatbot can identify the part of speech (noun, verb, adjective etc.) of each word in the query. NLP can help by tagging each word in the query with its part of speech.

5) Language Detection and Machine Translation: These days it is very easy to translate text from one language to another using tools like Google Translate. To do this, the tool first detects the language in which the text is written, then checks which language it must be translated into, and then translates it. NLP is used in every phase of this task.

6) Conversational Agents: The chatbot example we saw above is a subset of this task. Conversational agents come in two types: text-based and speech-based. Examples of speech-based agents are Siri and Cortana, which again use NLP.

7) Knowledge Graph and QA System: If you have a big database of information, you can build a knowledge graph from it by extracting entities from one piece of information and linking them to related entities in other pieces. The graph can then be used to build a question-answering system.

8) Text Summarization: NLP can help you summarize a long text in a few words or sentences. The system processes the complete text, extracts the main information from it, and creates a summary.

9) Topic Modelling: Topic modelling is used when we want to find out what a text is about. For example, given a text about cricket, NLP can help you find out whether it is about the IPL, the T20 World Cup, and so on.

10) Text Generation: Earlier we saw that mail applications can suggest quick replies by understanding the context of your mail. This is part of the text generation task, which NLP makes possible.

11) Spell Checking and Grammar Correction: Most word-processing tools today can check your sentences for spelling and grammar mistakes and suggest quick fixes. This is another very useful feature made possible by NLP techniques.

12) Text Parsing: In this task we break a sentence into its parts to help the machine understand its meaning, which is again quite useful in building chatbot-like applications.

13) Speech to Text: This is a very interesting task in which what you speak is converted to text. Once the text is available, a reply can be formed and converted back to speech.

The scope of NLP is not limited to these tasks; there are many others that NLP can handle as well.
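
To make one of the tasks above concrete, here is a minimal sketch of pulling dates and prices out of a sentence, as described in item 3. The two patterns, the function name `extract_entities` and the sample sentence are assumptions made up for this illustration; production systems usually rely on trained entity recognizers rather than two hand-written patterns.

```python
import re

# Hypothetical patterns for illustration only; real text needs broader coverage.
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")     # e.g. 12/05/2021
PRICE_PATTERN = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?")   # e.g. $1,299.50


def extract_entities(text):
    """Return the date-like and price-like strings found in the text."""
    return {
        "dates": DATE_PATTERN.findall(text),
        "prices": PRICE_PATTERN.findall(text),
    }


sample = "The phone launched on 12/05/2021 at $499.99 and dropped to $449 on 01/11/2021."
entities = extract_entities(sample)
print(entities)
```

The sketch only recognizes one date format and dollar prices, which hints at why hand-written extraction rules become hard to maintain as the text gets more varied.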

Approaches to NLP:

Now let's discuss the approaches followed in NLP so far:

1) Heuristic Methods (from 1950): Before looking at this method, we must understand the word "heuristic". A heuristic is any technique or shortcut we use to solve a problem quickly and efficiently (in short, "jugaad"). Such methods are not always the best way to solve a problem, but they give a result that is good enough.

In this approach, NLP practitioners created hand-written, rule-based solutions.

For example, suppose we are building a sentiment analysis application for reviews. Using a heuristic method, we can count the positive and negative words in each review and, based on which count is higher, conclude whether the review is positive or negative.
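
The word-counting idea can be sketched in a few lines. The word lists below are tiny and invented for this illustration, and the "neutral" label for ties is an assumption of the sketch, not part of the description above.

```python
import re

# Tiny hand-written word lists, assumed for illustration only.
POSITIVE_WORDS = {"good", "great", "excellent", "love", "amazing", "happy"}
NEGATIVE_WORDS = {"bad", "poor", "terrible", "hate", "awful", "disappointed"}


def heuristic_sentiment(review):
    """Label a review by comparing counts of positive and negative words."""
    words = re.findall(r"[a-z']+", review.lower())
    positive = sum(word in POSITIVE_WORDS for word in words)
    negative = sum(word in NEGATIVE_WORDS for word in words)
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"  # tie or no known words (an assumption of this sketch)


print(heuristic_sentiment("Great camera and I love the battery life"))
print(heuristic_sentiment("Terrible screen, poor sound, but good price"))
print(heuristic_sentiment("It is not good"))
```

The last call is labeled positive even though the sentence is negative, because the rule knows nothing about negation. Each such case would need another hand-written rule.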

Many heuristic approaches were introduced between 1950 and 1990. Below are a few of the best-known examples:

i) Regular Expressions: If you come from a programming background, you have probably heard of or worked with regular expressions, and they are still used very frequently today. The idea behind a regular expression is that we write a pattern and then search a document for all the text that matches it.

For example, when we do web scraping, the text comes with many HTML tags that may not be useful. Using regular expressions, we can remove those HTML tags from the text very easily.
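
A minimal sketch of this tag removal with Python's built-in `re` module follows. The pattern and the scraped string are assumptions chosen for the illustration; for messy real-world pages a proper HTML parser is generally more robust than a single pattern.

```python
import re

# Matches anything that looks like an HTML tag: "<", some non-">" text, ">".
TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_html_tags(html):
    """Replace tag-like spans with spaces, then collapse repeated whitespace."""
    text = TAG_PATTERN.sub(" ", html)
    return re.sub(r"\s+", " ", text).strip()


scraped = "<div><p>Regular expressions are <b>handy</b> for cleaning text</p></div>"
print(strip_html_tags(scraped))
```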

ii) WordNet: WordNet is a kind of lexical dictionary that records, in an organized way, the relationships between words.

For example, consider the words RUN and JOG: the relationship between them is that both represent a similar kind of action. Another example could be SHOES and RUN, where the relationship is "while running we use shoes".

In this way, mappings of this kind are created across the words in WordNet.

The resulting WordNet resource is available as a library and can be used in complex NLP applications as well.

iii) Open Mind Common Sense: This is a community project in which contributors collect common-sense facts expressed in everyday language. It is an open platform to which developers and practitioners from all around the world can contribute. The main idea was to build a big database of such facts that can then be used when developing NLP applications.

The main advantage of heuristic methods is that they are quick to build and, within the cases their rules cover, tend to be accurate, because every rule is written by a human who understands the problem.

Another question that may come to mind: are heuristic approaches still followed today? The answer is yes. We still use many of them.

2) Machine Learning Approach (from 1990): In the 1990s a revolution took place in the field of machine learning. It had a big impact on NLP, where machine learning came to be used very widely. But what was the main advantage of the machine learning approach over traditional heuristic approaches, and why did it come to be preferred? The problem with heuristic methods was that the rules had to be created manually, and in the real world there are many open-ended problems for which that is very difficult: far too many rules would be needed. In the machine learning approach the rules are not written by hand; instead, algorithms learn them automatically from the data provided, i.e. from example inputs and outputs. This advantage led developers to prefer machine learning over heuristics for open-ended problems, and it slowly became the go-to approach for most NLP problems.

In the machine learning approach, one of the most widely used algorithms is Naive Bayes (a probability-based algorithm), which performs quite well on text-related problems. We also use Logistic Regression, SVM, LDA and the Hidden Markov Model (used mainly in part-of-speech tagging).
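
To show what "learning rules from inputs and outputs" can look like, here is a small from-scratch sketch of a multinomial Naive Bayes text classifier with add-one smoothing. The four training sentences, the labels and the function names are invented for this illustration; in practice one would normally use a tested library implementation and far more data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def train_naive_bayes(examples):
    """Count words per label; these counts play the role of learned rules."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def predict(model, text):
    """Pick the label with the highest log-probability under the model."""
    label_counts, word_counts, vocabulary = model
    total_examples = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total_examples)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training data: inputs (sentences) and outputs (labels).
training_data = [
    ("the team won the match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the minister won the election", "politics"),
    ("parliament passed the new bill", "politics"),
]
model = train_naive_bayes(training_data)
print(predict(model, "the striker scored in the match"))
print(predict(model, "parliament debated the election bill"))
```

Nobody wrote a rule saying that "striker" points to sports; the association comes entirely from the counts in the training examples.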

3) Deep Learning Approach (from 2010): After machine learning, deep learning-based approaches arrived and became very popular. The question remains the same: what are the advantages of deep learning over machine learning approaches? The main problem with classical machine learning approaches is that the text must first be converted into numbers, typically as fixed-length vectors such as word counts, and in this conversion the sequential information is lost. As a result, these algorithms did not work well on problems where the order of the words matters a lot. Deep learning architectures can retain the sequence, and so they started giving more accurate results than classical machine learning. The second big advantage is that deep learning algorithms can learn features automatically, whereas in the classical approach we need to create features manually and give them to the model for training.
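
The loss of sequence information can be seen with a bag-of-words representation, assumed here as a typical way of turning text into numbers for classical algorithms. The two sentences are made up for the illustration.

```python
from collections import Counter


def bag_of_words(sentence):
    """Represent a sentence only by how often each word occurs."""
    return Counter(sentence.lower().split())


first = "the dog bit the man"
second = "the man bit the dog"

# Same word counts, so a count-based model cannot tell the sentences apart.
print(bag_of_words(first) == bag_of_words(second))

# The word sequences themselves are different.
print(first.split() == second.split())
```

The two sentences mean opposite things, yet their word counts are identical. A model that reads the words in order does not have this blind spot.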

In the deep learning approach, the main architectures are:

i) Recurrent Neural Network (RNN): RNNs give very good results on sequential data such as text or time series, and they were once the most widely used architecture in deep learning for such data. The problem is that when an RNN is given a very long sentence, it cannot retain the context from the earlier parts and does not perform as well.

ii) Long Short-Term Memory (LSTM): This architecture evolved mainly to overcome the limitation of the RNN. An LSTM is much better at retaining context over long inputs, and for this reason LSTMs have been used in a large number of NLP applications.

iii) Gated Recurrent Unit (GRU): This architecture is a simpler gated variant of the LSTM and is widely used for problems where text generation is required.

iv) Convolutional Neural Network (CNN): CNNs are mainly used for image classification, but there are a few NLP problems, such as text classification, where they are also used and give good results.

v) Transformers: This is the architecture that changed the whole NLP field. The main benefit of a Transformer is its attention mechanism, which lets the model focus on the most relevant parts of a sentence when processing each word. These days most state-of-the-art models, such as BERT (Bidirectional Encoder Representations from Transformers), are Transformer-based.

vi) Autoencoders: An autoencoder is mainly a set of two neural networks, an encoder and a decoder, which are often LSTM-based. The closely related encoder-decoder (sequence-to-sequence) design is very widely used for machine translation.
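
Returning to the attention idea mentioned under Transformers, the following is a rough sketch of scaled dot-product attention for a single query, written in plain Python. The token names and the 2-dimensional vectors are made up for the illustration; in a real Transformer the queries, keys and values are learned, high-dimensional, and computed for every token at once.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # The output is a weighted average of the value vectors.
    output = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, output


# Made-up 2-dimensional vectors for three tokens; real models learn these.
tokens = ["bank", "river", "money"]
keys = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [2.0, 0.0]

weights, output = attention(query, keys, values)
for token, weight in zip(tokens, weights):
    print(f"{token}: {weight:.2f}")
```

Tokens whose key vectors point in the same direction as the query receive larger weights, which is the sense in which the model "pays attention" to some parts of the sentence more than others.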

Challenges in NLP:

There are many challenges standing in the way of using NLP successfully everywhere. Let's discuss some of them:

1) Ambiguity: In day-to-day life we often say a sentence that can have more than one meaning. As human beings we are mature enough to understand the context of each sentence and work out its exact meaning, but it is very difficult for a machine to understand it the way we do. Let's see some examples:

i) "I saw the boy on the beach with my binoculars"

The above sentence could mean:

a) On the beach, I saw a boy who was holding my binoculars.

b) Using my binoculars, I saw a boy who was on the beach.

ii) "I have never tasted a cake quite like that one before"

Read in the context of its paragraph, this line could mean that the cake tasted either very good or very bad. A human can easily tell which from the context, but it is very difficult for an algorithm to do the same.

2) Contextual Words: The same word can have different meanings depending on the context. For example:

"I ran to the store because we ran out of the milk"

A human reading this line can easily tell the two different meanings of the word "ran" apart, but it is quite difficult to make a machine understand the difference.

3) Colloquialisms and Slang: Our day-to-day communication carries a lot of hidden knowledge that we as humans understand easily. We often say things whose literal meaning differs from the real meaning, which we recover from our background knowledge. For a machine this is very difficult to understand. For example:

i) "Piece of cake": For a human, depending on the context, it can mean "a very easy task", but for a machine it is quite difficult to understand.

ii) "Pulling someone's leg": For a human it means "teasing or joking with someone", but a machine may take it to mean literally pulling someone's leg.

4) Synonyms: Synonyms are different words that have the same meaning. They are easy for humans because we use them in day-to-day life, but it is a burden for a machine to keep a record of all the synonyms.

5) Irony, Sarcasm and Tonal Difference: Sometimes a sentence takes on a different meaning depending on the tone in which it is said, and in other situations we say something that means the opposite of our words.

6) Spelling Errors: Our brain is trained well enough to recover the correct word when it is misspelled, but a machine finds it very difficult to analyze a misspelled word, because the misspelled form does not match anything it knows.

7) Creativity: It is very difficult for a machine to understand poems, dialogues and scripts, because their intended meaning is often quite different from their literal meaning.

8) Diversity: There are so many languages in the world. Collecting data for every language and its grammar is quite difficult, yet every NLP model needs data to be trained on. Currently, NLP practitioners can work well only with the major languages, but we can expect a time when there will be enough data for all languages to train the algorithms.
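
As a small illustration of the spelling-errors challenge from item 6, one simple (and far from complete) way to handle a misspelled word is to map it to the closest word in a known vocabulary. The sketch below uses `difflib.get_close_matches` from Python's standard library; the five-word vocabulary and the 0.7 similarity cutoff are assumptions chosen for the example.

```python
import difflib

# A tiny assumed vocabulary; a real system would use a full dictionary.
VOCABULARY = ["language", "processing", "natural", "machine", "translation"]


def suggest_correction(word):
    """Return the closest known word, or the word itself if nothing is close."""
    matches = difflib.get_close_matches(word.lower(), VOCABULARY, n=1, cutoff=0.7)
    return matches[0] if matches else word


print(suggest_correction("langauge"))
print(suggest_correction("procesing"))
print(suggest_correction("xyz"))
```

This works only when the intended word is in the vocabulary and the typo is small, and it ignores context entirely, which is part of why spelling remains a real challenge.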

This list covers only a few of the challenges currently faced by NLP developers and practitioners. There could be many more.

Because of challenges like these, we can assume that we are using only about 5% of the real potential of NLP today, and that the remaining 95% is still untapped.

But we humans have never stopped because of the challenges and difficulties in our path. The day will definitely come when we are able to use the true potential of NLP, and the world will never be the same again.