The most commonly used method for estimating state-level opinion is called disaggregation. The process is simple and easy to implement: after combining a set of national polls, you calculate the opinion percentages disaggregated by state.
The problem with disaggregation is that it requires a large number of national surveys, collected over 10 years or more, to create a sufficient sample size within each state. In addition, disaggregation does not correct for sampling issues and may obscure temporal dynamics in state opinion.
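As a rough illustration of the idea, here is a minimal sketch of disaggregation in plain Python. The function name `disaggregate`, the `(state, supports)` record layout and the two tiny polls are hypothetical and made up for this example; real polls would have many more respondents and variables.

```python
from collections import defaultdict

def disaggregate(polls):
    """Pool respondents from several polls and compute, per state,
    the percentage who support the item and the pooled sample size."""
    counts = defaultdict(lambda: [0, 0])  # state -> [supporters, respondents]
    for poll in polls:
        for state, supports in poll:
            counts[state][0] += int(supports)
            counts[state][1] += 1
    return {state: (100.0 * yes / n, n) for state, (yes, n) in counts.items()}

# Hypothetical toy polls: each respondent is (state, supports_policy).
poll_a = [("CA", True), ("CA", False), ("TX", False)]
poll_b = [("CA", True), ("TX", True), ("TX", False), ("TX", False)]

state_opinion = disaggregate([poll_a, poll_b])
for state, (pct, n) in sorted(state_opinion.items()):
    print(f"{state}: {pct:.1f}% support (n={n})")
```

Returning the pooled sample size next to each percentage makes it easy to see which states still have too few respondents for the estimate to be trusted.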
To overcome these drawbacks, Multilevel Regression with Post-stratification (MRP) was developed to estimate American state-level opinions from…
The Granger causality test is a statistical hypothesis test for determining whether one time series offers useful information for forecasting another time series.
For example, consider the question: could we use today’s Apple stock price to predict tomorrow’s Tesla stock price? If so, we say that Apple’s stock price Granger-causes Tesla’s stock price. If not, we say that Apple’s stock price does not Granger-cause Tesla’s stock price.
So, let’s go to Yahoo Finance to fetch the adjusted close stock price data for Apple, Walmart and Tesla, starting from 2010-06-30…
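To make the idea concrete, here is a minimal, dependency-free sketch of the F-statistic behind a Granger test. It compares a model that predicts `y` from its own lags with one that also uses lags of `x`. The helper names `ols_rss` and `granger_f` and the synthetic series are assumptions for illustration, not the article's code; in practice one would typically use a library routine such as `grangercausalitytests` from statsmodels, which also reports p-values.

```python
import random

def ols_rss(rows, targets):
    """Fit least squares via the normal equations; return the residual sum of squares."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(aug[idx][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, k):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[row][c] -= factor * aug[col][c]
    # Back substitution.
    beta = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(aug[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (aug[i][k] - tail) / aug[i][i]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))

def granger_f(x, y, lag=1):
    """F-statistic for the hypothesis that lags of x help predict y."""
    steps = range(lag, len(y))
    targets = [y[t] for t in steps]
    own = [[1.0] + [y[t - i] for i in range(1, lag + 1)] for t in steps]
    full = [row + [x[t - i] for i in range(1, lag + 1)]
            for row, t in zip(own, steps)]
    rss_restricted = ols_rss(own, targets)
    rss_unrestricted = ols_rss(full, targets)
    n, k = len(targets), len(full[0])
    return ((rss_restricted - rss_unrestricted) / lag) / (rss_unrestricted / (n - k))

# Synthetic data (not stock prices): y follows yesterday's x plus small noise.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + rng.gauss(0, 0.1) for t in range(1, 200)]

f_xy = granger_f(x, y)  # should be large: x helps forecast y
f_yx = granger_f(y, x)  # should be small: y does not help forecast x
print(f"x -> y: F = {f_xy:.1f}; y -> x: F = {f_yx:.1f}")
```

A large F-statistic in one direction and a small one in the other is the pattern we would describe as "x Granger-causes y". Note that the test assumes stationary series, so with stock prices it is usually applied to returns or differenced prices.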
Because the trials are still ongoing, researchers caution against making head-to-head comparisons of vaccines based on incomplete data. But for the sake of learning, we will do it anyway, without drawing any meaningful conclusions.
Recent announcements put the potential effectiveness of the SARS-CoV-2 vaccine candidates developed by Pfizer-BioNTech, Moderna, AstraZeneca (regimen 1) and AstraZeneca (regimen 2) at 95%, 94.5%, 90% and 62%, respectively. We know that some of the data are incomplete. The following analysis will be based on whatever information we currently have.
Based on the information from the announcements and the other scientist’s…
Autoencoders are an unsupervised learning technique, although they are trained using supervised learning methods. The goal is to minimize reconstruction error based on a loss function, such as the mean squared error.
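To show how reconstruction error can be turned into an anomaly signal, here is a minimal sketch that assumes the autoencoder has already produced a reconstruction for each input window. The function names, the toy numbers and the threshold value are all hypothetical; the excerpt does not say how the threshold is chosen.

```python
def mse(original, reconstructed):
    """Mean squared error between one window and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)

def flag_anomalies(windows, reconstructions, threshold):
    """Return (index, error) for every window whose error exceeds the threshold."""
    errors = [mse(w, r) for w, r in zip(windows, reconstructions)]
    return [(i, e) for i, e in enumerate(errors) if e > threshold]

# Hypothetical scaled price windows and the reconstructions a trained model might return.
windows = [[1.0, 1.1, 1.2], [1.2, 1.3, 1.4], [1.4, 3.0, 1.5]]
reconstructions = [[1.0, 1.1, 1.2], [1.2, 1.2, 1.4], [1.4, 1.5, 1.5]]

flagged = flag_anomalies(windows, reconstructions, threshold=0.1)  # threshold is an assumption
print(flagged)
```

The third window contains a spike the model cannot reproduce, so its error stands out from the others. A common choice is to set the threshold from the distribution of errors on the training data.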
In this post, we will try to detect anomalies in Johnson & Johnson’s historical stock price time series data with an LSTM autoencoder.
The data can be downloaded from Yahoo Finance. The time period I selected was from 1985-09-04 to 2020-09-03.
The steps we will follow to detect anomalies in Johnson & Johnson stock price data using an LSTM autoencoder:
Most researchers submit their papers to academic conferences because it is a faster way of making results available. Finding and selecting a suitable conference has always been challenging, especially for young researchers.
However, by drawing on data from previous conference proceedings, researchers can increase their chances of paper acceptance and publication. We will try to solve this text classification problem with deep learning using BERT.
Almost all of the code was taken from this tutorial; the only difference is the data.
The dataset contains 2,507 research paper titles, which have been manually classified into 5 categories (i.e. …
This time, we are going to create a simple logistic regression model to classify COVID news as either true or fake, using the data I collected a while ago.
The process is surprisingly simple. We will clean and pre-process the text data, perform feature extraction using the NLTK library, build and deploy a logistic regression classifier using the Scikit-Learn library, and evaluate the model’s accuracy at the end.
The data set contains 586 true news items and 578 fake news items, an almost 50/50 split. …
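The steps above can be sketched end to end in a few lines. To stay dependency-free, this sketch swaps in a regex tokenizer, bag-of-words counts and a hand-written gradient-descent logistic regression in place of NLTK and Scikit-Learn. The four example texts are made up for illustration and are not from the real data set.

```python
import math
import re

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (a crude cleaning step)."""
    return re.findall(r"[a-z]+", text.lower())

def vectorize(text, vocab):
    """Bag-of-words counts over a fixed vocabulary; unknown words are ignored."""
    counts = [0.0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            counts[vocab[tok]] += 1.0
    return counts

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(features, labels, lr=0.5, epochs=200):
    """Batch gradient descent on the logistic (log-loss) objective."""
    w, b, n = [0.0] * len(features[0]), 0.0, len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(features, labels):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(text, vocab, w, b):
    """Return 1 (fake) or 0 (true) for a piece of text."""
    x = vectorize(text, vocab)
    return 1 if sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) >= 0.5 else 0

# Hypothetical toy examples: label 1 = fake, 0 = true.
texts = [
    "Miracle cure kills virus instantly",
    "Secret cure hidden by government",
    "Health officials report new cases",
    "Officials confirm vaccine trial results",
]
labels = [1, 1, 0, 0]

vocab = {tok: i for i, tok in enumerate(sorted({t for s in texts for t in tokenize(s)}))}
w, b = train([vectorize(s, vocab) for s in texts], labels)
accuracy = sum(predict(s, vocab, w, b) == y for s, y in zip(texts, labels)) / len(texts)
print(f"training accuracy: {accuracy:.2f}")
```

Accuracy here is measured on the training texts only, which overstates real performance; a proper evaluation would hold out a test split.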
It’s not easy for ordinary citizens to identify fake news. And fake coronavirus news is no exception.
As part of an effort to combat misinformation about coronavirus, I collected training data and trained an ML model to detect fake news on coronavirus.
My training data is not perfect, but I hope it will be useful to help us understand whether fake news differs systematically from real news in style and language use. So, let’s find out.
As mentioned in the previous article, I collected over 1,100 news articles and social network posts on COVID-19 from a variety of…
It is heartbreaking to learn that half of Canadians have been fooled by COVID-19 conspiracy theories.
According to the WHO, the COVID-19 related infodemic is just as dangerous as the virus itself. Similarly, conspiracy theories, myths and exaggerated facts could have consequences that go way beyond public health.
To explore the content of COVID-19 fake news, I use strict definitions of what true and fake news stories are. Specifically, true news articles are articles that are known to…
I was intrigued by one of Thomas L. Friedman’s op-eds in The New York Times last week: “A Plan to Get America Back to Work”. It advocated a data-driven approach to the COVID-19 pandemic: that is, limiting the number of infections and deaths from the coronavirus while at the same time maximizing the speed at which we can safely fold workers back into the workplace, based on the best data and expert advice.
In another op-ed, he offered a detailed plan on how to accomplish this step by step.
In the travel and tourism industry, segmentation is an important strategy for developing itineraries and marketing materials targeted towards different groups with varying travel intents and motivations. It helps businesses understand the subgroups that make up their audience so that they can better tailor products and messages.
One caveat in the travel industry is that, unlike online shopping, leisure travel is an infrequent purchase. Most leisure travelers only travel once or twice a year, so active customers are either uncommon or very slow in making their bookings.
But the good thing is that people love to travel and…