Context
In this article, I'm going to walk through the process of using Python and the TensorFlow framework (via the Keras module) to sort user reviews of video games and gaming products into positive or negative categories based on the language of each review. These are common tools and methods of Natural Language Processing (NLP) and sentiment analysis. Text-based problems are considered well suited to recurrent neural networks (RNNs), whose internal memory allows them to model sequential data; models of this type power many modern applications such as Google Translate. This article uses Long Short-Term Memory (LSTM) modeling, an extension of the RNN that can retain input memorized over a long range, such as the context of a text paragraph.¹
Along the way, it will be necessary to employ a method of word embedding: the system by which words are given a numeric representation and their similarity quantified. For this I am using GloVe, or Global Vectors for Word Representation, a fantastic project from Stanford University that derives quantifiable relationships between English words based on the statistics of their co-occurrence². More on this implementation in the corresponding section!
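As a quick taste of what GloVe provides, here is a minimal sketch (assuming the glove.6B.100d.txt file this article loads later sits in the working directory) that pulls a few word vectors out of the file and scores their similarity as the cosine of the angle between them:

import numpy as np

#pull three word vectors out of the GloVe file (file location is an assumption)
vectors = {}
with open('glove.6B.100d.txt', encoding='utf8') as f:
    for line in f:
        parts = line.split()
        if parts[0] in ('game', 'fun', 'broken'):
            vectors[parts[0]] = np.asarray(parts[1:], dtype='float32')

#cosine similarity: values closer to 1 mean the words appear in similar contexts
def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(vectors['game'], vectors['fun']))     #related words score higher
print(cosine(vectors['game'], vectors['broken']))  #less related words score lower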
Below are the modules used for the modeling itself as well as all the supporting processing and visualization tasks.
#pandas for dataframe management
import pandas as pd
#additional manipulation
import numpy as np
from numpy import zeros
from numpy import array
from numpy import asarray
#basic statistics
from statistics import median, mean
#regex for strings
import re
#plotting and wordclouds
import seaborn as sns
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
from PIL import Image
#keras tokenizer and modeling
from keras.preprocessing.text import Tokenizer
from keras.models import Sequential
from keras.layers.core import Activation, Dropout, Dense
from keras.layers import Flatten
from tensorflow.keras.layers import Embedding, LSTM
from keras.callbacks import EarlyStopping
#keras string padding
from keras_preprocessing.sequence import pad_sequences
#training and testing
from sklearn.model_selection import train_test_split
#model metrics
from sklearn import metrics
Data Retrieval
The dataset for this tutorial comes from Jianmo Ni's collection of Amazon product reviews³. It comes in the .json format, which we will convert to a pandas dataframe before previewing.
#read json file into pandas dataframe
df = pd.read_json("/Volumes/EXT128/WGU/Advanced Analytics/nlp/Video_Games_5.json", lines=True)
#investigate head
df.head()
#report dimensions
dims = df.shape
print('The dimensions of the original review set are',dims,'.')
The dimensions of the original review set are (497577, 12) .
Well there you have it! This dataframe gives us an idea of the extent of the original data and its character. There are almost 500,000 reviews along with miscellaneous metadata, as well as Amazon's attempt to summarize each review using key words and a star rating.
Data Processing and Exploration
There are a number of things to accomplish before we can begin modeling. The first thing I notice about the original data is that there are more variables than we need: for this project, I only care about the review text and the star rating, which live under the column names reviewText and overall, respectively. The first task in the cleaning process will be to remove the columns that aren't of interest.
#remove undesired columns df.drop(columns=['verified','reviewTime','reviewerID','asin','reviewerName','summary','unixReviewTime','vote','style','image'],inplace=True)
Now it's time to make the kinds of decisions analysts must make; remember that in a business situation, the needs of the stakeholders must be carefully understood before taking these steps. It has been established that this is a binary classification problem, and it seems prudent to treat 4- and 5-star reviews as positive sentiment and 1- and 2-star reviews as negative. That leaves us with the 3-star reviews: are they positive or negative? Any decision seems completely subjective. In my opinion, the presence of 3-star reviews adds noise to the model, so the decision for this project is to remove those observations. Once that's done, I'm going to assign a value of 0 to the negative reviews and 1 to the positive ones.
#remove 3-star reviews
df = df[df['overall'] != 3]
#assign binary outcome
df['overall'] = np.where(df['overall'] > 3, 1, 0)
#reset indexes for skips introduced by removing rows
df = df.reset_index()
df.drop(columns=['index'],inplace=True)
The next step also involves some decision making. Some degree of preprocessing is always required in NLP tasks, even with tokenizers like the one in Keras and embedding methods like GloVe, both of which are equipped to some degree for handling raw text. Removing punctuation and numbers is pretty standard, along with making the text lowercase. Where I have seen differing opinions is on the removal of stopwords, since the embedding process is typically equipped to handle them by assigning them appropriate weights⁴. As usual we have to make a decision, and the one I am going with here is to remove them as part of the cleaning procedure. I believe this step reduces noise in the model, reduces computational expense during training, and allows more meaningful patterns to show in visualizations like the wordcloud. We'll instantiate a list of stopwords to filter against and also define a function to perform the basic preprocessing steps.
#define list of stopwords
stopwords = ["a", "about", "above", "after", "again", "against", "all", "am", "an", "and", "any", "are",
    "as", "at", "be", "because", "been", "before", "being", "below", "between", "both", "but", "by",
    "could", "did", "do", "does", "doing", "down", "during", "each", "few", "for", "from", "further",
    "had", "has", "have", "having", "he", "he'd", "he'll", "he's", "her", "here", "here's", "hers",
    "herself", "him", "himself", "his", "how", "how's", "i", "i'd", "i'll", "i'm", "i've", "if", "in",
    "into", "is", "it", "it's", "its", "itself", "let's", "me", "more", "most", "my", "myself", "nor",
    "of", "on", "once", "only", "or", "other", "ought", "our", "ours", "ourselves", "out", "over",
    "own", "same", "she", "she'd", "she'll", "she's", "should", "so", "some", "such", "than", "that",
    "that's", "the", "their", "theirs", "them", "themselves", "then", "there", "there's", "these",
    "they", "they'd", "they'll", "they're", "they've", "this", "those", "through", "to", "too",
    "under", "until", "up", "very", "was", "we", "we'd", "we'll", "we're", "we've", "were", "what",
    "what's", "when", "when's", "where", "where's", "which", "while", "who", "who's", "whom", "why",
    "why's", "with", "would", "you", "you'd", "you'll", "you're", "you've", "your", "yours",
    "yourself", "yourselves"]
#define standardizer function
def standardize(text):
    #make text lowercase
    txt = text.lower()
    #remove stopwords
    txt = ' '.join([word for word in txt.split() if word not in stopwords])
    #remove punctuation and numbers
    txt = re.sub('[^a-z]', ' ', txt)
    #remove single characters
    txt = re.sub(r'(?:^| )\w(?:$| )', ' ', txt)
    #collapse multiple spaces
    txt = re.sub(r'\s+', ' ', txt)
    return txt
#cast reviews as string
df['reviewText'] = df['reviewText'].astype(str)
#list for review storage
lst = []
#standardize each review
for review in df['reviewText']:
    lst.append(standardize(review))
#place list into dataframe
df['reviewText'] = lst
#allow display of full text
pd.options.display.max_colwidth = 5000
#review processed df contents
df.head(5)
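To see what the standardizer actually does, here's a quick demonstration on an invented review fragment (the sample sentence is mine, not from the dataset):

#quick demonstration of the standardizer on a made-up review fragment
print(standardize("It's GREAT - 10/10, but the d-pad was a bit stiff!"))
#output is roughly: great pad bit stiff

Stopwords and contractions like "it's" drop out, the score and punctuation vanish, and the stranded single character from "d-pad" is swept away.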
Alright! That's much cleaner than before. The reviews have lost their syntactic structure but retained the key words a machine learning algorithm is interested in. The positive reviews, as well as the single visible negative one, make sense given the words they feature. Let's generate the image featured for this article, which is the wordcloud representing the positive review keywords. For comparison, I'm also going to generate the negative version.
#import image for styling
mask = np.array(Image.open('/Volumes/EXT128/WGU/Advanced Analytics/nlp/Vector-Game-Controller-PNG-Clipart.png'))
#generate positive wordcloud
pos = WordCloud(background_color ='white',mask=mask,colormap='crest').generate(" ".join(i for i in df.reviewText[df.overall==1]))
#show
plt.imshow(pos)
plt.axis("off")
plt.show()
#generate negative wordcloud
neg = WordCloud(background_color ='white',mask=mask,colormap='rocket').generate(" ".join(i for i in df.reviewText[df.overall==0]))
#show
plt.imshow(neg)
plt.axis("off")
plt.show()
Note that by default the wordcloud module uses collocations: especially in the positive version, you're seeing two-word combinations you'd expect, such as 'great game' and 'video game', rather than a giant 'game' dominating the cloud. This is nice, since from the head call I'd imagine the word 'game' appears in just about every review. That said, while the positive cloud is about what I would expect, the negative one is not as loaded with typical negative language as I thought it would be. If you look carefully, there are words like 'bad' but also words like 'good': the language of the negative reviews is more nuanced overall than I expected, and a benefit of LSTM is that it can often pick up on those nuances. A side note from the perspective of someone who has played video games for much of his life: it's interesting to see words like 'still' so visible; it reminds me of common complaints against major game franchises that release similar content year after year.
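If you'd rather see single words, the wordcloud module lets you switch this behavior off through its collocations parameter; a minimal variant of the earlier call:

#regenerate the positive cloud with collocations disabled (single words only)
pos_single = WordCloud(background_color='white', mask=mask, colormap='crest',
                       collocations=False).generate(" ".join(i for i in df.reviewText[df.overall==1]))
plt.imshow(pos_single)
plt.axis("off")
plt.show()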
It's now time to start processing the text in ways specific to the embedding and modeling process. First, embedding. Text doesn't mean anything to computers, so we express our textual data in a numeric form: a vector, specifically. When we eventually assign numbers to words, the vectors they populate must all have the same dimension in order to be used as inputs for a neural network. Since there will be very short and very long reviews, we have to decide on a maximum review length, which will also fix the vector dimension. I'm going to devise a statistical justification for the max length. Consider the distribution of review lengths:
#import basic statistics
from statistics import median, mean
#initialize list of sequence lengths
lengths = []
#loop through reviews, counting words and storing the counts
for review in df.reviewText:
length = len(review.strip().split(" "))
lengths.append(length)
#average review length
meanlen = mean(lengths)
print('The average review is', round(meanlen), 'words.')
#median review length
medlen = median(lengths)
print('The median review length is', medlen,'words.')
The average review is 67 words.
The median review length is 21 words.
#histogram of review length
pd.Series(lengths).hist(bins=100,range=[0,1000])
plt.title('Review Lengths Histogram')
plt.xlabel('Length')
plt.ylabel('Frequency')
plt.show()
#length quantiles
np.set_printoptions(suppress=True)
np.quantile(lengths,[0,0.25,0.5,0.75,1])
array([ 1., 6., 20., 65., 3232.])
It's clear that reviews under 100 words dominate; in fact, a length of 65 words marks the third quartile. Let's remove longer reviews from the analysis. We will keep the very short reviews, because they tend to contain single-word sentiments that are relatively simple for the model to pick up and learn. This is the final paring-down of the raw data set, and the following code reports the final number of reviews.
#fix maximum length
maxlen = 65
#subset raw reviews for maximum length
reduced_lengths = list(np.where(np.array(lengths) <= maxlen)[0])
rev_in = [df.reviewText[i] for i in reduced_lengths]
#create vector of sentiment labels
labels = [df['overall'][i] for i in reduced_lengths]
#concatenate into dataframe
DF = pd.DataFrame(list(zip(rev_in, labels)),
columns =['review_text', 'sentiment'])
#verify final number of reviews
rev_tot = len(DF)
print("There are a total of",rev_tot,'reviews under study.')
There are a total of 336419 reviews under study.
This has been a pretty long preprocessing sequence; I'd say we're over halfway done! Now that the length of reviews has been set, we're going to tokenize the texts with Keras. This is basically the penultimate NLP processing step: here the individual words in each review are treated as the units that carry semantic meaning. Tokenizing splits the reviews into those units and builds a structured vocabulary index that an ML algorithm can model on. Here goes!
#initialize tokenization
tokenizer = Tokenizer()
#tokenize and build vocabulary set
tokenizer.fit_on_texts(rev_in)
#find dictionary length
vocab_len = len(tokenizer.word_index)
print('The final vocabulary size is', vocab_len, 'words.')
The final vocabulary size is 66776 words.
There you have it: 66,776 unique words across the whole body of 336,419 reviews. It's a pretty small vocabulary, but it's fair to conclude that this context is pretty restrictive concerning what you can expect people to say. Now that a vocabulary index is established, I want to assign a sequence of indexes to each review and compare the output to the original text.
#create word-index sequences
seq = tokenizer.texts_to_sequences(DF.review_text)
#sample the first review vector
seq[0]
[1, 99, 70, 14, 1404, 1126, 2]
#examine standardized review index 0
DF.review_text[0]
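Since the fitted tokenizer keeps an index_word dictionary, we can also decode a sequence back into words to confirm the mapping; a one-line sketch:

#decode the first index sequence back into its words
print(' '.join(tokenizer.index_word[i] for i in seq[0]))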
#pad out word sequences length 65
pad = pad_sequences(seq, padding='post', maxlen=maxlen)
#compare output
pad[0]
array([ 1, 99, 70, 14, 1404, 1126, 2, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
dtype=int32)
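One caveat worth flagging: with post-padding, short reviews reach the LSTM followed by dozens of zeros. Keras can be told to skip those positions if the embedding layer is built with mask_zero=True; a sketch of that alternative (an option this article does not take, shown without the pretrained weights added later):

#alternative embedding layer that masks the zero padding so downstream layers ignore it
masked_embedding = Embedding(vocab_len + 1, 100, mask_zero=True)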
#import train/test split function
from sklearn.model_selection import train_test_split
#retrieve explainers and targets
X = pad
y = DF.sentiment
#call split function at 0.1 test size, set random seed
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.10, random_state= 43022)
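Given how lopsided the two classes turn out to be, a stratified split that preserves the positive/negative ratio in both partitions is also worth considering; a one-line variant (not what was run here):

#stratified variant of the split, keeping the class ratio equal in train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.10,
                                                    random_state=43022, stratify=y)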
Model Building
#initialize embedding dictionary
embeddings_dictionary = dict()
#access GloVe matrix
glove_file = open('/Volumes/EXT128/WGU/Advanced Analytics/nlp/glove/glove.6B.100d.txt', encoding="utf8")
#loop through GloVe database vocabulary
for line in glove_file:
#split line into tokens
records = line.split()
#capture word entry
word = records[0]
#capture the 100 word scores as an array
vector_dimensions = asarray(records[1:], dtype='float32')
#create dictionary entry with word:score pairs
embeddings_dictionary[word] = vector_dimensions
#close file connection after loop completion
glove_file.close()
#initialize a matrix with the size of our vocabulary and 100 columns
embedding_matrix = zeros((vocab_len + 1, 100))
#loop through tokens
for word, index in tokenizer.word_index.items():
#find each word in the vocabulary
embedding_vector = embeddings_dictionary.get(word)
#check to see if vocabulary space has scores
if embedding_vector is not None:
#assign scores to words in the project vocabulary
embedding_matrix[index] = embedding_vector
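Before wiring the matrix into the network, it can be reassuring to check how much of the project vocabulary GloVe actually covers; a small sketch:

#count how many vocabulary words received a pretrained vector
covered = sum(1 for word in tokenizer.word_index if word in embeddings_dictionary)
print(covered, 'of', vocab_len, 'vocabulary words have GloVe vectors;')
print('the remainder stay as zero rows in the embedding matrix.')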
#initialize sequential model
model = Sequential()
#create embedding layer with the vocabulary size, embedding dimension, and input length
embedding_layer = Embedding(vocab_len + 1, 100, weights=[embedding_matrix], input_length=maxlen , trainable=False)
#implement embedding layer
model.add(embedding_layer)
#add LSTM layer with 128 neurons
model.add(LSTM(128,dropout = 0.5))
#add final classification layer
model.add(Dense(1, activation='sigmoid'))
#compile model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
#summarize model
print(model.summary())
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding (Embedding) (None, 65, 100) 6677700
lstm (LSTM) (None, 128) 117248
dense (Dense) (None, 1) 129
=================================================================
Total params: 6,795,077
Trainable params: 117,377
Non-trainable params: 6,677,700
_________________________________________________________________
None
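The parameter counts in the summary can be verified by hand, which is a nice sanity check on the architecture:

#embedding: one 100-dimension vector per vocabulary entry, plus the padding index
print((66776 + 1) * 100)          #6677700
#LSTM: 4 gates x (input dim + recurrent dim + bias) x units
print(4 * (100 + 128 + 1) * 128)  #117248
#dense: one weight per LSTM unit plus a bias
print(128 + 1)                    #129

Only the LSTM and dense parameters are trainable, since the embedding layer was frozen with trainable=False.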
#import early stopping module
from keras.callbacks import EarlyStopping
#add early stop for successive increases in validation loss across epochs
es = EarlyStopping(monitor="val_loss",verbose=2,mode='min',patience=2)
#fit model with a maximum 20 epochs and early stopping criteria
history = model.fit(X_train, y_train, batch_size=128, epochs=20, verbose=1, validation_split=0.2, callbacks=[es])
Epoch 1/20
1893/1893 [==============================] - 258s 135ms/step - loss: 0.2429 - acc: 0.9072 - val_loss: 0.1846 - val_acc: 0.9310
Epoch 2/20
1893/1893 [==============================] - 268s 142ms/step - loss: 0.1920 - acc: 0.9256 - val_loss: 0.1706 - val_acc: 0.9356
Epoch 3/20
1893/1893 [==============================] - 270s 142ms/step - loss: 0.1775 - acc: 0.9316 - val_loss: 0.1810 - val_acc: 0.9405
Epoch 4/20
1893/1893 [==============================] - 268s 142ms/step - loss: 0.1679 - acc: 0.9353 - val_loss: 0.1424 - val_acc: 0.9474
Epoch 5/20
1893/1893 [==============================] - 279s 147ms/step - loss: 0.1608 - acc: 0.9382 - val_loss: 0.1521 - val_acc: 0.9481
Epoch 6/20
1893/1893 [==============================] - 263s 139ms/step - loss: 0.1559 - acc: 0.9402 - val_loss: 0.1398 - val_acc: 0.9488
Epoch 7/20
1893/1893 [==============================] - 263s 139ms/step - loss: 0.1516 - acc: 0.9419 - val_loss: 0.1319 - val_acc: 0.9509
Epoch 8/20
1893/1893 [==============================] - 261s 138ms/step - loss: 0.1480 - acc: 0.9431 - val_loss: 0.1298 - val_acc: 0.9505
Epoch 9/20
1893/1893 [==============================] - 270s 142ms/step - loss: 0.1447 - acc: 0.9444 - val_loss: 0.1273 - val_acc: 0.9524
Epoch 10/20
1893/1893 [==============================] - 274s 145ms/step - loss: 0.1420 - acc: 0.9462 - val_loss: 0.1277 - val_acc: 0.9522
Epoch 11/20
1893/1893 [==============================] - 264s 139ms/step - loss: 0.1397 - acc: 0.9467 - val_loss: 0.1249 - val_acc: 0.9536
Epoch 12/20
1893/1893 [==============================] - 262s 138ms/step - loss: 0.1366 - acc: 0.9476 - val_loss: 0.1252 - val_acc: 0.9533
Epoch 13/20
1893/1893 [==============================] - 263s 139ms/step - loss: 0.1356 - acc: 0.9475 - val_loss: 0.1228 - val_acc: 0.9533
Epoch 14/20
1893/1893 [==============================] - 226s 120ms/step - loss: 0.1333 - acc: 0.9490 - val_loss: 0.1221 - val_acc: 0.9541
Epoch 15/20
1893/1893 [==============================] - 161s 85ms/step - loss: 0.1332 - acc: 0.9493 - val_loss: 0.1234 - val_acc: 0.9550
Epoch 16/20
1893/1893 [==============================] - 150s 79ms/step - loss: 0.1321 - acc: 0.9496 - val_loss: 0.1280 - val_acc: 0.9548
Epoch 16: early stopping
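Note that the run stopped at epoch 16, two epochs after the best validation loss at epoch 14; with the settings above, the model keeps the final weights rather than the best ones. The callback can roll back to the best epoch automatically via its restore_best_weights flag; a minimal variant:

#early stopping that reverts to the weights from the best validation-loss epoch
es = EarlyStopping(monitor='val_loss', verbose=2, mode='min', patience=2,
                   restore_best_weights=True)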
Model Validation
#plot loss for training and validation
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Train and Validation Model Loss over Epochs')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train','validation'], loc='upper left')
plt.show()
#plot accuracy for training and validation
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('Train and Validation Model Accuracy over Epochs')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train','validation'], loc='upper left')
plt.show()
#import confusion plotting module
from sklearn import metrics
#create prediction and truth vectors
y_pred = model.predict(X_test)
y_pred = np.where(y_pred > 0.5, 1,0)
y_true = y_test
#create confusion matrix and visualization
confusion_matrix = metrics.confusion_matrix(y_true, y_pred)
cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix = confusion_matrix, display_labels = ['Unfavorable', 'Favorable'])
#plot matrix
cm_display.plot(colorbar=False,cmap = 'PuBuGn')
plt.show()
#create binary classification metrics
Accuracy = metrics.accuracy_score(y_true, y_pred)
Precision = metrics.precision_score(y_true, y_pred)
Sensitivity_recall = metrics.recall_score(y_true, y_pred)
Specificity = metrics.recall_score(y_true, y_pred, pos_label=0)
F1Score = metrics.f1_score(y_true, y_pred)
#display all binary classification metrics
print("--- Test Set Metrics ---")
print("Accuracy:", Accuracy)
print("Precision:", Precision)
print("Sensitivity:", Sensitivity_recall)
print("Specificity:", Specificity)
print("F1 Score:", F1Score)
--- Test Set Metrics ---
Accuracy: 0.9560073717377088
Precision: 0.964507225198896
Sensitivity: 0.9871381568014889
Specificity: 0.6923726428370391
F1 Score: 0.9756914788778661
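The gap between sensitivity (0.99) and specificity (0.69) reflects the imbalance of the data: positive reviews vastly outnumber negative ones, so the model errs toward the majority class. One common mitigation, offered here as an untested sketch rather than something run in this article, is to weight the classes inversely to their frequency during training:

from sklearn.utils.class_weight import compute_class_weight

#weight classes inversely to their frequency (hypothetical mitigation, not run here)
weights = compute_class_weight(class_weight='balanced', classes=np.array([0, 1]), y=y_train)
#then pass e.g. class_weight={0: weights[0], 1: weights[1]} to model.fit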
References
¹Srivastava, P. Essentials of Deep Learning : Introduction to Long Short Term Memory. Analytics Vidhya. 05/2020. https://www.analyticsvidhya.com/blog/2017/12/fundamentals-of-deep-learning-introduction-to-lstm/
²Pennington, J. et al. GloVe: Global Vectors for Word Representation. Stanford. 08/2014. https://nlp.stanford.edu/projects/glove/
³McAuley, J. Amazon Product Data. UCSD. 05/2021. https://nijianmo.github.io/amazon/index.html