
Fake News Classification with LSTM or Logistic Regression

TensorFlow, Keras, DataSpell, Scikit-Learn, PyCharm

The Data

The dataset consists of 44,919 news articles, split almost evenly between the true and fake categories.


There are a number of ways to download DataSpell. If you plan to use only DataSpell and none of the other tools in the JetBrains toolbox, the simplest way is to download it directly from its website.

Deep Learning

This section presents an overview of the preprocessing techniques and a description of the deep learning model used for classification.


The following steps show part of the data preprocessing.

  • Load Fake.csv and True.csv.
  • Remove the columns we do not need; we keep only title and text.
  • Label fake news as 0 and real news as 1.
  • Concatenate the two data frames into one.
  • Combine title and text into one column.
  • Apply standard text cleaning: convert to lower case and remove extra spaces and URLs.
  • Split the data into training and test sets in the same way for the deep learning model and for Logistic Regression, so that the two can be compared fairly.
  • Put the parameters at the top of the code so they are easy to change.
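The steps above can be sketched roughly as follows. This is a minimal sketch, not the original code: the helper names clean_text and build_dataset, the values of TEST_SIZE and RANDOM_STATE, and the toy rows are all assumptions, and the two small data frames stand in for what pd.read_csv("Fake.csv") and pd.read_csv("True.csv") would return.

```python
import re

import pandas as pd
from sklearn.model_selection import train_test_split

# Parameters kept at the top so they are easy to change (values are assumptions).
TEST_SIZE = 0.2
RANDOM_STATE = 42


def clean_text(text):
    """Lower-case the text, drop URLs, and collapse extra whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def build_dataset(fake, true):
    """Keep title and text, label the rows, concatenate, and clean."""
    fake = fake[["title", "text"]].assign(label=0)  # fake news -> 0
    true = true[["title", "text"]].assign(label=1)  # real news -> 1
    df = pd.concat([fake, true], ignore_index=True)
    df["title_text"] = (df["title"] + " " + df["text"]).map(clean_text)
    return df[["title_text", "label"]]


# Toy stand-ins for pd.read_csv("Fake.csv") and pd.read_csv("True.csv").
fake_df = pd.DataFrame({
    "title": ["Aliens  Land", "Moon Is Cheese", "Secret Cure"],
    "text": ["Read more at http://example.com/a NOW", "Experts   shocked", "Doctors hate it"],
    "subject": ["News", "News", "News"],
})
true_df = pd.DataFrame({
    "title": ["Senate Passes Bill", "Rates Held", "Storm Warning"],
    "text": ["The vote was 60 to 40", "Central bank holds rates", "Heavy rain expected"],
    "subject": ["politicsNews", "worldnews", "worldnews"],
})

df = build_dataset(fake_df, true_df)

# Fixing random_state makes the split reproducible, so both models
# can be trained and tested on exactly the same rows.
X_train, X_test, y_train, y_test = train_test_split(
    df["title_text"], df["label"], test_size=TEST_SIZE, random_state=RANDOM_STATE
)
```

Reusing the same random_state for both models is one simple way to guarantee the identical split; the original code may have done this differently.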


  • Tokenizer does all the heavy lifting for us. From the articles (title + text) that it tokenizes, it keeps the 10,000 most common words. oov_tok sets a special value to use whenever an unseen word is encountered; here we want the token “OOV” in brackets to stand in for words that are not in the word index. fit_on_texts goes through all the text and creates the dictionary.
  • After tokenization, the next step is to turn those tokens into lists of sequences.
  • When we train neural networks for NLP, the sequences need to be the same size, which is why we use padding. Our max_length is 256, so we use pad_sequences to make all of our articles (title + text) the same length, 256.
  • In addition, there are a padding type and a truncating type; we set both to “post”. For example, an article of length 200 is padded to 256 by adding 56 zeros at the end.
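To make the tokenizing and padding steps concrete, here is a simplified pure-Python sketch of the idea. It is not the Keras implementation: the function names fit_word_index, to_sequence, and pad_post are made up for this illustration, and the sketch only splits on whitespace, whereas the Keras Tokenizer also lower-cases and filters punctuation by default.

```python
from collections import Counter

VOCAB_SIZE = 10_000  # keep the 10,000 most common words
MAX_LENGTH = 256     # every sequence is padded or truncated to this length
PAD_INDEX = 0        # value used for padding
OOV_INDEX = 1        # value used for out-of-vocabulary words


def fit_word_index(texts, vocab_size=VOCAB_SIZE):
    """Map the most frequent words to integer indices, starting at 2."""
    counts = Counter(word for text in texts for word in text.split())
    most_common = counts.most_common(vocab_size - 2)  # 0 and 1 are reserved
    return {word: i for i, (word, _) in enumerate(most_common, start=2)}


def to_sequence(text, word_index):
    """Replace each word by its index, or by OOV_INDEX if it was never seen."""
    return [word_index.get(word, OOV_INDEX) for word in text.split()]


def pad_post(seq, max_length=MAX_LENGTH):
    """Truncate at the end, then pad with zeros at the end ("post")."""
    seq = seq[:max_length]
    return seq + [PAD_INDEX] * (max_length - len(seq))


texts = ["the senate passed the bill", "the bill was fake"]
word_index = fit_word_index(texts)

# "president" and "signed" were never seen, so both map to OOV_INDEX.
sequence = to_sequence("the president signed the bill", word_index)
padded = pad_post(sequence)

# A 200-token article gains 56 trailing zeros, as in the example above.
zeros_added = pad_post([7] * 200).count(PAD_INDEX)
```

In the real pipeline the same roles are played by the Tokenizer (with its OOV token) and pad_sequences with padding and truncating both set to “post”.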

Building the Model

Now we can implement the LSTM. I build a tf.keras.Sequential model that starts with an embedding layer.

  • An embedding layer stores one vector per word. When called, it converts the sequences of word indices into sequences of vectors. After training, words with similar meanings often have similar vectors.
  • Next comes the LSTM itself. The Bidirectional wrapper is used with an LSTM layer; it propagates the input forwards and backwards through the LSTM layer and then concatenates the outputs. This helps the LSTM learn long-term dependencies. The result is then fed into a dense network that does the classification.
  • The model summary shows the embedding layer, the Bidirectional layers containing the LSTMs, and two dense layers. The output of the Bidirectional layer has size 128, twice the number of units we gave the LSTM, because the forward and backward outputs are concatenated. I also stacked a second LSTM layer to improve the results.
  • We use early stopping, which halts training when the validation loss no longer improves.
  • We visualize training over time; the results were good.
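A model of the kind described above might look roughly like this. It is a sketch under assumptions rather than the original code: vocab_size and max_length come from the text, and 64 LSTM units follow from the 128-wide Bidirectional output, but embedding_dim, the width of the second LSTM and of the first dense layer, the optimizer, and the patience value are guesses made for illustration.

```python
import tensorflow as tf

# vocab_size and max_length come from the text; embedding_dim is an assumption.
vocab_size = 10_000
max_length = 256
embedding_dim = 64

model = tf.keras.Sequential([
    tf.keras.Input(shape=(max_length,)),
    # One trainable vector per word index.
    tf.keras.layers.Embedding(vocab_size, embedding_dim),
    # 64 units per direction -> 128 features after concatenation.
    # return_sequences=True passes the full sequence to the stacked LSTM.
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    # Two dense layers; the sigmoid output is the probability of label 1 (real).
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# Stop when the validation loss stops improving and keep the best weights.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=2, restore_best_weights=True
)

# Training would then look like this (train_padded and y_train are not defined here):
# history = model.fit(train_padded, y_train, epochs=10,
#                     validation_split=0.1, callbacks=[early_stop])
```

The history object returned by model.fit holds the per-epoch loss and accuracy, which is what gets plotted when visualizing training over time.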

Logistic Regression

This time, we are going to create a simple logistic regression model to classify news as either real or fake, using the same data set, the same text cleaning methods, and the same train_test_split.


  • In the following preprocessing step, we strip off any HTML tags and punctuation, and convert the text to lower case.
  • The following code combines tokenization and stemming, and then applies them to “title_text”.
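These two steps could be sketched as below. The function names preprocessor and tokenizer_porter and the regular expressions are assumptions, and NLTK's PorterStemmer is used here as one common choice of stemmer; the original code may have used a different one.

```python
import re

from nltk.stem.porter import PorterStemmer

porter = PorterStemmer()


def preprocessor(text):
    """Strip HTML tags and punctuation, and convert to lower case."""
    text = re.sub(r"<[^>]*>", " ", text)          # remove HTML tags
    text = re.sub(r"[^\w\s]", " ", text.lower())  # remove punctuation
    return re.sub(r"\s+", " ", text).strip()      # collapse whitespace


def tokenizer_porter(text):
    """Split on whitespace and reduce each token to its Porter stem."""
    return [porter.stem(word) for word in text.split()]


cleaned = preprocessor("<p>Breaking: He keeps RUNNING!</p>")
tokens = tokenizer_porter(cleaned)
```

On a pandas data frame, the same functions would be applied with something like df["title_text"].apply(preprocessor).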


Here we transform the “title_text” feature into TF-IDF vectors.
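As an illustration, scikit-learn's TfidfVectorizer can do this transformation. The three toy documents are made up, and the settings shown are assumptions rather than the ones used in the original code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-ins for the cleaned "title_text" column.
docs = [
    "senate passes budget bill",
    "shocking secret about the budget",
    "senate vote on bill delayed",
]

# lowercase=False because the text is assumed to be cleaned already.
tfidf = TfidfVectorizer(lowercase=False)
X = tfidf.fit_transform(docs)  # sparse matrix: one row per document

n_docs, n_terms = X.shape
```

A custom tokenizer, such as one that stems each word, can be passed through the tokenizer parameter of TfidfVectorizer.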

  • Instead of tuning the C parameter manually, we can use the LogisticRegressionCV estimator, which selects it by cross-validation.
  • We set the number of cross-validation folds to cv=5 to tune this hyperparameter.
  • The model is scored on classification accuracy.
  • Setting n_jobs=-1 uses all available CPU cores.
  • We raise max_iter, the maximum number of iterations of the optimization algorithm, so that the solver has room to converge.
  • Finally, we evaluate the performance on the test set.
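The settings listed above fit together roughly as follows. This is a sketch on synthetic data from make_classification rather than the TF-IDF features, and the values of max_iter and random_state are assumptions, since the text does not give them.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the TF-IDF features and the 0/1 labels.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

clf = LogisticRegressionCV(
    cv=5,                # 5-fold cross-validation to choose C
    scoring="accuracy",  # pick the C with the best mean accuracy
    n_jobs=-1,           # use all available CPU cores
    max_iter=300,        # assumed value; raised so the solver can converge
    random_state=0,
)
clf.fit(X_train, y_train)

predictions = clf.predict(X_test)
test_accuracy = accuracy_score(y_test, predictions)
print("chosen C:", clf.C_)
print("test accuracy:", round(test_accuracy, 3))
```

By default LogisticRegressionCV tries a grid of ten C values; a different grid can be supplied through the Cs parameter.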


We found that the deep learning model and Logistic Regression produced similar results. The main difference is that Logistic Regression took half as long to train as the deep learning model.



Changing the world, one post at a time. Sr Data Scientist, Toronto Canada.
