NLP Model, Deployment with Flask, Deployment on Web (Heroku)

You can check the working deployed model here: http://www.nltkbot.com/


Before going into the article you will need Python properly installed on your system, and you need a basic understanding of ML models.

This article covers only the model deployment part with Flask, not in-depth model building. If you already have a working model, you can skip straight to the deployment part and deploy that model; otherwise, you can use the example model below.

PART 1) — The NLP Model

As model building is beyond the scope of this article, I will just plainly explain the model and what it does. If anyone needs datasets to further improve the model, GitHub links are provided at the end of the article in the Links section.

We have a very basic model trained on tweets from customers about various tech firms that manufacture and sell mobiles, computers, laptops, etc. The task is to identify whether a tweet expresses negative sentiment towards such companies or products.

The data cleaning process involved removing all URLs and punctuation, and the NLTK library was used for tokenization, lemmatization and stop-word removal. Make sure you have the NLTK library already installed or else the code will throw errors.

import pandas as pd
import numpy as np
import nltk
import string
import re
import pickle

nltk.download('stopwords')
nltk.download('wordnet')

train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('test.csv')

data = pd.concat([train_data, test_data], axis=0)

df = data.copy()

df['label'].value_counts()


# Removing URLs
def remove_url(text):
    return re.sub(r"http\S+", "", text)

#Removing Punctuations
def remove_punct(text):
    new_text = []
    for t in text:
        if t not in string.punctuation:
            new_text.append(t)
    return ''.join(new_text)


#Tokenizer
from nltk.tokenize import RegexpTokenizer
tokenizer = RegexpTokenizer(r'\w+')



#Removing Stop words
from nltk.corpus import stopwords
def remove_sw(text):
    new_text = []
    for t in text:
        if t not in stopwords.words('english'):
            new_text.append(t)
    return new_text

#Lemmatizaion
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()

def word_lemmatizer(text):
    new_text = []
    for t in text:
        lem_text = lemmatizer.lemmatize(t)
        new_text.append(lem_text)
    return new_text

df['tweet'] = df['tweet'].apply(lambda t: remove_url(t))

df['tweet'] = df['tweet'].apply(lambda t: remove_punct(t))

df['tweet'] = df['tweet'].apply(lambda t: tokenizer.tokenize(t.lower()))

df['tweet'] = df['tweet'].apply(lambda t: remove_sw(t))

df['tweet'] = df['tweet'].apply(lambda t: word_lemmatizer(t))

Later the text was vectorized with TfidfVectorizer before being sent to the model for prediction. A LightGBM model was used for prediction without any hyperparameter tuning since, as mentioned earlier, model building is beyond the scope of this article. Finally, the model and vectorizer should be saved using pickle so we can later load them in our app to predict results for user inputs.

features_set = df.copy()

train_set = features_set.iloc[:len(train_data), :]

test_set = features_set.iloc[len(train_data):, :]

X = train_set['tweet']


# Join each token list back into a single string for the vectorizer
X = X.apply(' '.join)


Y = train_set['label']


from sklearn.feature_extraction.text import TfidfVectorizer

TfidV = TfidfVectorizer()

X = TfidV.fit_transform(X)



from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size = 0.1, random_state = 1234)


from lightgbm import LGBMClassifier

lgb = LGBMClassifier(scale_pos_weight=3)

# Fit on the training split so the held-out metrics below are meaningful
lgb.fit(x_train, y_train)

y_predict_lgb = lgb.predict(x_test)

from sklearn.metrics import confusion_matrix, f1_score

cm_lgb = confusion_matrix(y_test, y_predict_lgb)

f1_lgb = f1_score(y_test, y_predict_lgb)

score_lgb = lgb.score(x_test, y_test)

# Refit on the full training data before saving the final model
lgb.fit(X, Y)


with open('twitter_predictions.pkl', 'wb') as file:
    pickle.dump(lgb, file)
    
with open('vectorizer.pkl', 'wb') as file:
    pickle.dump(TfidV, file)
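As a sanity check, the save/reload round-trip can be exercised end to end. The sketch below uses toy data and scikit-learn's LogisticRegression standing in for LightGBM so it runs without the article's CSV files; the texts, labels and variable names are all illustrative:

```python
import pickle
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy corpus standing in for the cleaned tweets (1 = negative sentiment)
texts = ["battery died after one day", "love this phone",
         "screen cracked very easily", "great camera and display"]
labels = [1, 0, 1, 0]

vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(texts), labels)

# Serialize both objects, then reload them the same way the Flask app will
model_bytes, vec_bytes = pickle.dumps(clf), pickle.dumps(vec)
clf2, vec2 = pickle.loads(model_bytes), pickle.loads(vec_bytes)

pred = clf2.predict(vec2.transform(["battery cracked after one day"]))
print(pred)  # a single predicted label
```

The key point is that the vectorizer must be pickled alongside the model: `transform` at prediction time only works with the vocabulary learned during `fit_transform`.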

PART 2) — Deployment using Flask

I hope you have a working model file and vectorizer file saved using pickle before proceeding to the deployment.

Flask is a microframework for Python; it is fun and easy to set up. Before installing Flask, create a directory (I named it flask_app) where you are going to have the Flask app, then create a virtual environment (I named it nlp_env) in this directory and activate it. It should look something like below.

C:\Users\Surya\Documents\flask_app>py -m venv nlp_env
C:\Users\Surya\Documents\flask_app>nlp_env\Scripts\activate
(nlp_env) C:\Users\Surya\Documents\flask_app>

Now we can install Flask using the Python package installer, pip.

(nlp_env) C:\Users\Surya\Documents\flask_app>pip install flask

Once the installation is done, we can create our Flask app. The first step is to create a file in the flask_app directory and name it (I named it app.py). You can open this file in your favourite text editor. Since this is your main app file, we need to set an environment variable that tells Flask which file to look for when it runs.

(nlp_env) C:\Users\Surya\Documents\flask_app>set FLASK_APP=app.py
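The set command above is for the Windows cmd shell; if you are on a different shell (an assumption about your environment), the equivalents would be:

```shell
# macOS / Linux (bash, zsh)
export FLASK_APP=app.py

# Windows PowerShell
# $env:FLASK_APP = "app.py"
```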

Now we will get the famous 'Hello World' working with our Flask app. Open your text editor and write the below code.

import flask

app = flask.Flask(__name__)

@app.route('/')
def index():
    return "<h1>Hello World</h1>"

The code is pretty straightforward: we import the flask library and instantiate the Flask class, where __name__ refers to the name of the current module (app.py here). In the next lines we create a route followed by a function whose return value is sent directly to the browser.

Time to see our Flask app up and running in the browser. Just type flask run and you should see the app running on a localhost address (something like Running on http://127.0.0.1:5000/).

(nlp_env) C:\Users\Surya\Documents\flask_app>flask run

Writing HTML directly in app.py is not encouraged and it also doesn't look good, so we will create a separate templates folder in our directory and reference it in app.py. We should also create a separate model folder to keep our saved models, which we will later use to predict. After creating the folders, your folder structure should look like below.

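Based on the names used in this article (templates for the HTML, model for the pickled files, nlp_env for the virtual environment), the layout would be roughly:

```
flask_app/
├── app.py
├── model/
│   ├── twitter_predictions.pkl
│   └── vectorizer.pkl
├── nlp_env/
└── templates/
    └── index.html
```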

We will update app.py to reflect these changes. First we load our model files, then we point the Flask class at the templates folder. In addition, we update our route to accept GET and POST requests.

import flask
import pickle

#Use pickle to load in the pre-trained model and vectorizer
with open('model/twitter_predictions.pkl', 'rb') as f:
    model = pickle.load(f)

with open('model/vectorizer.pkl', 'rb') as f:
    vectorizer = pickle.load(f)

app = flask.Flask(__name__, template_folder='templates')

@app.route('/', methods=['GET', 'POST'])
def index():
    return "<h1>Hello World</h1>"

Our app still returns 'Hello World' for a GET request. We need the app to return proper HTML, so we will create an HTML file (index.html) in our templates folder. I am going to add a simple HTML form in index.html which allows the user to enter some text and submit it.

<!doctype html>
<html>

<head>
    <title>Enter Text</title>
</head>
<form action="{{ url_for('index') }}" method="POST">
    <fieldset>
        <legend>Enter Review:</legend>
        <textarea name="tweet" rows="10" cols="80" required></textarea>
        <br>
        <br>
        <input type="submit">
    </fieldset>
</form>
</html>

When the user enters text and submits, our app receives a POST request with the text attached to 'tweet'. We need to update app.py to handle this POST request and render the result accordingly.

import flask
import pickle

#Use pickle to load in the pre-trained model.
with open('model/twitter_predictions.pkl', 'rb') as f:
    model = pickle.load(f)

with open('model/vectorizer.pkl', 'rb') as f:
    vectorizer = pickle.load(f)

app = flask.Flask(__name__, template_folder='templates')

@app.route('/', methods=['GET', 'POST'])
def index():
    
    if flask.request.method == 'GET':
        return(flask.render_template('index.html'))
    
    if flask.request.method == 'POST':
        
        tweet = flask.request.form['tweet']

From the POST request we got the user input we need to predict on, so we will use the previously loaded model to predict the result. Since the user input is still in its raw form, we need to convert it to a machine-readable form the same way we did while building the model in the first part. First we convert it to a dataframe with the same column name our saved model understands, and then all the preprocessing steps from model building follow in the same order. So our final app.py should look like below.

import flask
import pickle
import pandas as pd
import numpy as np
import nltk
import string
import re


# Use pickle to load in the pre-trained model.
with open('model/twitter_predictions.pkl', 'rb') as f:
    model = pickle.load(f)


with open('model/vectorizer.pkl', 'rb') as f:
    vectorizer = pickle.load(f)



# Removing URLs
def remove_url(text):
    return re.sub(r"http\S+", "", text)

#Removing Punctuations
def remove_punct(text):
    new_text = []
    for t in text:
        if t not in string.punctuation:
            new_text.append(t)
    return ''.join(new_text)


#Tokenizer
from nltk.tokenize import RegexpTokenizer
tokenizer = RegexpTokenizer(r'\w+')



#Removing Stop words
from nltk.corpus import stopwords

def remove_sw(text):
    new_text = []
    for t in text:
        if t not in stopwords.words('english'):
            new_text.append(t)
    return new_text

#Lemmatizaion
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

def word_lemmatizer(text):
    new_text = []
    for t in text:
        lem_text = lemmatizer.lemmatize(t)
        new_text.append(lem_text)
    return new_text



app = flask.Flask(__name__, template_folder='templates')

@app.route('/', methods=['GET', 'POST'])
def index():
    
    if flask.request.method == 'GET':
        return(flask.render_template('index.html'))
    
    if flask.request.method == 'POST':
        
        tweet = flask.request.form['tweet']

        df = pd.DataFrame([tweet], columns=['tweet'])



        df['tweet'] = df['tweet'].apply(lambda t: remove_url(t))

        df['tweet'] = df['tweet'].apply(lambda t: remove_punct(t))

        df['tweet'] = df['tweet'].apply(lambda t: tokenizer.tokenize(t.lower()))

        df['tweet'] = df['tweet'].apply(lambda t: remove_sw(t))

        df['tweet'] = df['tweet'].apply(lambda t: word_lemmatizer(t))




        final_text = df['tweet']

        final_text.iloc[0] = ' '.join(final_text.iloc[0])

        final_text = vectorizer.transform(final_text)



        # predict() returns an array; take the single element so the
        # template can compare result == 0 or result == 1
        prediction = model.predict(final_text)[0]

        return flask.render_template('index.html', result=prediction, original_input={'Mobile Review': tweet})




if __name__ == '__main__':
    app.run()

The predicted result in app.py is stored in a result variable which index.html will display. We will create a new div element to display the result along with the user input when the POST request renders index.html. The div element has a simple if-else statement to check the result value and colour the result accordingly. I have added some styling to the HTML page to make it look nice. So our final HTML file will look like below.

<!doctype html>
<html>
<head>
    <title>Enter Text</title>
    <style>
    form {
        margin: auto;
        width: 35%;
    }
    .result {
        margin: auto;
        width: 35%;
        border: 1px solid #ccc;
    }
    </style>
</head>
<form action="{{ url_for('index') }}" method="POST">
    <fieldset>
        <legend>Enter Mobile Review:</legend>
        <textarea name="tweet" rows="10" cols="80" required></textarea>
        <br>
        <br>
        <input type="submit">
    </fieldset>
</form>
<br>
<div class="result" align="center">
    {% if result == 0 %}
        {% for variable, value in original_input.items() %}
            <b>{{ variable }}</b> : {{ value }}
        {% endfor %}
        <br>
        <br> 
           <p style="font-size:30px">The Review is:</p>
           <p style="font-size:40px; color: green;">{{ result }}</p>
    {% elif result == 1 %}
        {% for variable, value in original_input.items() %}
            <b>{{ variable }}</b> : {{ value }}
        {% endfor %}
        <br>
        <br> 
           <p style="font-size:30px">The Review is:</p>
           <p style="font-size:40px; color: red;">{{ result }}</p>
    {% endif %}
</div>
</html>

Now type flask run and see the ML model up and running in the browser, ready to predict some tweets. Yay!! You have successfully deployed an NLP model with Flask on your local machine. You can additionally add some styling and background to make the app look more vibrant, or you can check my app for the full template: https://github.com/vijjeswarapusuryateja/mobile-review-rating
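If you prefer to sanity-check the routes without opening a browser, Flask's built-in test client can drive GET and POST requests directly. A minimal sketch using a stripped-down app (not the full app.py above, so it runs standalone):

```python
import flask

app = flask.Flask(__name__)

@app.route('/', methods=['GET', 'POST'])
def index():
    if flask.request.method == 'POST':
        # Echo the submitted form field, just as our app reads request.form['tweet']
        return "got: " + flask.request.form['tweet']
    return "<h1>Hello World</h1>"

client = app.test_client()
get_resp = client.get('/')
post_resp = client.post('/', data={'tweet': 'bad battery'})
print(get_resp.status_code, post_resp.get_data(as_text=True))
```

The same pattern works against the real app by importing its `app` object, which is also handy for writing automated tests before pushing to Heroku.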

PART 3) — Deployment on Web (Heroku)

Here I am only covering the part on deploying the NLP model on Heroku, assuming you already have a Heroku account. If you are not familiar with Heroku, there are a lot of rich information sources available online; explaining the Heroku signup process is not in the scope of this article. You need to have git and the Heroku CLI installed before proceeding to the next step.

The first step is to create a git repository for our web app.

(nlp_env) C:\Users\Surya\Documents\flask_app>git init

Authenticate with Heroku and create a new Heroku app.

(nlp_env) C:\Users\Surya\Documents\flask_app>heroku login
(nlp_env) C:\Users\Surya\Documents\flask_app>heroku create flask-app-nlp

For our app to run successfully on Heroku, we need to add three additional files.

Create a file requirements.txt in your flask_app directory and it should have the following (scikit-learn is required to unpickle the saved TfidfVectorizer):

flask
pandas
numpy
gunicorn
lightgbm
scikit-learn
nltk

Similarly, create an nltk.txt file and add the following (Heroku uses this file to download the listed NLTK corpora during the build).

stopwords
wordnet
pros_cons
reuters
omw-1.4

Finally, create a Procfile in your flask_app directory and add the following line, which tells Heroku to serve the app with gunicorn (app:app means the app object inside app.py).

web: gunicorn app:app

Your final flask_app folder should now look like this:

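With the three new files added, the flask_app folder would look roughly like this:

```
flask_app/
├── app.py
├── Procfile
├── requirements.txt
├── nltk.txt
├── model/
│   ├── twitter_predictions.pkl
│   └── vectorizer.pkl
├── nlp_env/
└── templates/
    └── index.html
```

(You would typically keep nlp_env out of the git repository, e.g. via a .gitignore, since Heroku installs its own packages from requirements.txt.)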

Add all these files to the repository using the git commands below.

(nlp_env) C:\Users\Surya\Documents\flask_app>git add .
(nlp_env) C:\Users\Surya\Documents\flask_app>git commit -m "First commit!"

Set the remote destination so git can push to Heroku.

(nlp_env) C:\Users\Surya\Documents\flask_app>heroku git:remote -a flask-app-nlp

After you run the below command, your app will be online. Heroku will upload your app files, install the packages it needs and start the app.

(nlp_env) C:\Users\Surya\Documents\flask_app>git push heroku master

Once everything is pushed to Heroku, the build output in your terminal will confirm the deployment.

Once the upload is done, you can type heroku open to see the app running in a web browser. Alternatively, you can open your app at the URL matching the name you created, e.g. http://flask-app-nlp.herokuapp.com. I have already created an app with this name in Heroku, so use a different name instead.

(nlp_env) C:\Users\Surya\Documents\flask_app>heroku open

Possible Errors!

If you run into problems with the Heroku deployment, it is always advisable to check the Heroku logs (heroku logs --tail) to troubleshoot. One of the most common problems in NLP deployments is missing NLTK data; it can be fixed by adding the respective corpus, named in the Heroku logs, to the nltk.txt file.

HELPFUL LINKS (Dataset, Github repo, Deployed Model)

Links for the datasets which the model was trained on: https://github.com/vijjeswarapusuryateja/Datasets

Code link for deployed app: https://github.com/vijjeswarapusuryateja/mobile-review-rating

Working Deployed Model: http://www.nltkbot.com/

Portfolio: https://vijjeswarapusuryateja.github.io/