photo credit: Pixabay

Fake News Classification with LSTM or Logistic Regression

Tensorflow, Keras, DataSpell, Scikit-Learn, PyCharm

The Data


Deep Learning


  • We load Fake.csv and True.csv.
  • Drop the columns we don't need; we only keep title & text.
  • Label fake news as 0 and real news as 1.
  • Concatenate the two data frames into one.
  • Combine title & text into a single column.
  • Apply standard text cleaning, such as lower-casing and removing extra spaces and URL links.
  • The train/test split must be identical for the deep learning model and Logistic Regression, so the two models are evaluated on the same data.
  • We put the parameters at the top like this to make them easier to change and edit.
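The data-preparation steps above might be sketched like this. The tiny in-memory frames are stand-ins so the snippet is self-contained; in practice you would read the real Fake.csv and True.csv, and the split ratio and random_state shown here are illustrative assumptions:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# In practice: fake = pd.read_csv('Fake.csv'); true = pd.read_csv('True.csv')
fake = pd.DataFrame({'title': ['Shocking claim'],
                     'text': ['Visit https://example.com now']})
true = pd.DataFrame({'title': ['Senate passes bill'],
                     'text': ['The vote was held  Tuesday']})

fake['label'] = 0   # fake news -> 0
true['label'] = 1   # real news -> 1

# Concatenate the two frames and keep only the columns we need
df = pd.concat([fake, true], ignore_index=True)[['title', 'text', 'label']]

# Combine title & text into one column, then apply standard cleaning
df['title_text'] = (df['title'] + ' ' + df['text']).str.lower()
df['title_text'] = (df['title_text']
                    .str.replace(r'https?://\S+', '', regex=True)  # drop URL links
                    .str.replace(r'\s+', ' ', regex=True)          # collapse extra spaces
                    .str.strip())

# One fixed split, reused by both the LSTM and Logistic Regression
X_train, X_test, y_train, y_test = train_test_split(
    df['title_text'], df['label'], test_size=0.5, random_state=42)
```

Fixing random_state is what guarantees the deep learning model and Logistic Regression see the same train/test partition.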


  • The Tokenizer does all the heavy lifting for us. When tokenizing our articles (i.e., title + text), it keeps the 10,000 most common words. oov_tok supplies a special value wherever an unseen word is encountered; here “OOV” in brackets is used for words that are not in the word index. fit_on_texts goes through all the text and builds the word-index dictionary.
  • After tokenization, the next step is to turn those tokens into lists of sequences.
  • When we train neural networks for NLP, the sequences need to be the same size, which is why we use padding. Our max_length is 256, so we use pad_sequences to make all of our articles (title + text) the same length of 256.
  • There are also a padding type and a truncating type; we set both to “post”. For example, if an article is 200 tokens long, we pad it to 256 by appending 56 zeros at the end.
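A minimal sketch of the tokenization and padding steps above, with two stand-in training texts in place of the real cleaned articles:

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

vocab_size = 10000   # keep the 10,000 most common words
oov_tok = '<OOV>'    # placeholder for words not in the word index
max_length = 256

# Stand-in for the cleaned title+text training articles
train_texts = ['fake news spreads fast', 'real news takes time']

tokenizer = Tokenizer(num_words=vocab_size, oov_token=oov_tok)
tokenizer.fit_on_texts(train_texts)   # builds the word -> index dictionary

# Turn the tokens into lists of sequences, then pad/truncate to max_length
sequences = tokenizer.texts_to_sequences(train_texts)
padded = pad_sequences(sequences, maxlen=max_length,
                       padding='post', truncating='post')  # zeros appended at the end
```

Note that the same fitted tokenizer must also be applied to the test articles, so unseen test words map to the OOV index.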

Building the Model

  • An embedding layer stores one vector per word. When called, it converts sequences of word indices into sequences of vectors. After training, words with similar meanings often have similar vectors.
  • Next is how to implement the LSTM in code. The Bidirectional wrapper is used with an LSTM layer; it propagates the input forwards and backwards through the LSTM and then concatenates the outputs. This helps the LSTM learn long-term dependencies. We then feed the result into dense layers to do the classification.
  • In our model summary we have the embedding layer and the Bidirectional-wrapped LSTMs, followed by two dense layers. The output of the Bidirectional layer is 128 because it doubles the units we put into the LSTM. I also stacked LSTM layers to improve the results.
  • We use early stopping, which halts training when the validation loss no longer improves.
  • Visualizing training over time shows that the results were good.
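One way the architecture described above could look in Keras. The embedding dimension, unit counts, dense width, and early-stopping patience are assumed values for illustration, not the article's exact settings; the first Bidirectional LSTM uses 64 units so its output width is the doubled 128 mentioned above:

```python
import tensorflow as tf

vocab_size, embedding_dim, max_length = 10000, 64, 256  # assumed hyperparameters

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim),  # one vector per word
    # Stacked Bidirectional LSTMs: forward + backward outputs are
    # concatenated, so 64 units produce a width of 128
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(24, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),  # fake (0) vs real (1)
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Stop when validation loss no longer improves
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=2,
                                              restore_best_weights=True)
# model.fit(train_padded, y_train, validation_data=(test_padded, y_test),
#           epochs=10, callbacks=[early_stop])
```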

Logistic Regression


  • In the following pre-processing, we strip off any HTML tags and punctuation and convert everything to lower case.
  • The following code combines tokenization and stemming, then applies both to “title_text”.
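A sketch of combined tokenization and stemming. The regex-based HTML/punctuation stripping and NLTK's PorterStemmer are assumptions standing in for the article's exact choice of techniques:

```python
import re
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()

def tokenize_and_stem(text):
    """Strip HTML tags and punctuation, lower-case, then stem each token."""
    text = re.sub(r'<[^>]+>', ' ', text).lower()  # remove html tags
    tokens = re.findall(r'[a-z]+', text)          # keep letters only (drops punctuation)
    return [stemmer.stem(tok) for tok in tokens]

# Applied to the combined column, e.g.:
# df['title_text'] = df['title_text'].apply(lambda t: ' '.join(tokenize_and_stem(t)))
```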


  • Instead of tuning the C parameter manually, we can use an estimator, LogisticRegressionCV, that does it for us.
  • We specify cv=5 cross-validation folds to tune this hyperparameter.
  • The model is measured by classification accuracy.
  • By setting n_jobs=-1, we dedicate all the CPU cores to the problem.
  • We raise the maximum number of iterations so the optimization algorithm can converge.
  • Finally, we evaluate the performance on the held-out test set.
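The LogisticRegressionCV setup described above might look like this. The TF-IDF feature step, the max_iter value, and the random_state are assumptions, since the source does not specify how the stemmed text is vectorized:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline

# Assumed feature step: TF-IDF over the stemmed title_text
clf = make_pipeline(
    TfidfVectorizer(),
    LogisticRegressionCV(cv=5,               # 5-fold CV to tune C automatically
                         scoring='accuracy', # measure by classification accuracy
                         n_jobs=-1,          # use all CPU cores
                         max_iter=1000,      # raised iteration cap for convergence
                         random_state=42))

# clf.fit(X_train, y_train)
# accuracy = clf.score(X_test, y_test)   # evaluate on the held-out split
```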



