Computer algorithm finds fake news and April Fools’ articles have linguistic similarities
Siobhan Treacy | April 04, 2019
Natural language experts from Lancaster University used machine learning to compare April Fools’, fake and genuine news articles. The study found that authors of fake news and April Fools’ articles use similar linguistic techniques to attempt to trick the reader into believing the article’s content.
The team chose April Fools’ articles because they are written to convince the reader that their content is true. These articles gave the researchers insight into the linguistic techniques an author uses when writing a deceptive article.
The researchers created a dataset of 500 April Fools’ articles, written over the last 14 years, from 370 websites. The team compared these articles to fake and genuine news articles. The genuine news articles were written within the same time period but were not published on April 1 (April Fools’ Day). During the study, the team looked at the amount of detail, the use of vague words and phrases, the formality of the writing style and the complexity of the language.
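To make the kinds of measurements described above more concrete, here is a minimal Python sketch of how such stylistic features might be computed for one article. It is not the researchers' feature set: the function name `stylistic_features`, the two word lists and the proper-noun shortcut are all assumptions made for illustration.

```python
import re

# Hypothetical word lists, chosen only for illustration (not from the study).
VAGUE_WORDS = {"some", "many", "several", "soon", "reportedly", "sources"}
FIRST_PERSON = {"i", "me", "my", "we", "us", "our"}

WORD_RE = r"[A-Za-z']+"


def stylistic_features(text):
    """Return a few crude style measurements for one article."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    words = re.findall(WORD_RE, text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")

    lowered = [w.lower() for w in words]

    # Rough proper-noun proxy: capitalised words that do not open a sentence.
    # The pronoun "I" is excluded; a real system would use a tagger instead.
    proper = 0
    for sentence in sentences:
        tokens = re.findall(WORD_RE, sentence)
        proper += sum(
            1 for t in tokens[1:]
            if t[0].isupper() and t.lower() not in FIRST_PERSON
        )

    total = len(words)
    return {
        "avg_sentence_len": total / len(sentences),          # words per sentence
        "avg_word_len": sum(len(w) for w in words) / total,  # complexity proxy
        "vague_ratio": sum(w in VAGUE_WORDS for w in lowered) / total,
        "first_person_ratio": sum(w in FIRST_PERSON for w in lowered) / total,
        "proper_noun_ratio": proper / total,
    }
```

Calling `stylistic_features(article_text)` returns a dictionary of ratios that can be compared across April Fools’, fake and genuine articles. Each value is only a proxy for the property it is named after.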
A machine learning classifier was used to analyze the articles. The classifier was 75% accurate when identifying April Fools' articles and 72% accurate when identifying fake news articles. When the team trained the classifier on April Fools’ stories and tasked it with identifying fake news, it was 65% accurate.
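The last experiment, training on one kind of deceptive article and testing on another, can be sketched in a few lines of Python. The article does not say which classifier the team used, so this example substitutes a simple nearest-centroid classifier. The function names and the idea of representing each article as a list of numeric features are assumptions.

```python
from math import dist


def train_centroids(vectors, labels):
    """Average the feature vectors of each class into one centroid per label."""
    sums, counts = {}, {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest to vec (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], vec))


def accuracy(centroids, vectors, labels):
    """Fraction of vectors whose predicted label matches the true label."""
    correct = sum(predict(centroids, v) == y for v, y in zip(vectors, labels))
    return correct / len(labels)


def cross_domain_accuracy(train_x, train_y, test_x, test_y):
    """Fit on one corpus (e.g. April Fools' vs. genuine) and score on another
    (e.g. fake news vs. genuine), so no test article is seen during training."""
    return accuracy(train_centroids(train_x, train_y), test_x, test_y)
```

Here `train_x` would hold feature vectors for April Fools’ and genuine articles, and `test_x` those for fake and genuine news articles, with both label lists using the same two labels. A score well above chance suggests the two kinds of deceptive article share stylistic signals.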
The results showed that April Fools’ articles and fake news articles have many characteristics in common. Both kinds of articles used less complex language, were easier to read and had longer sentences than genuine news articles. Both also used important story details, such as names, dates and places, less often than genuine news stories. However, the two kinds of articles did differ in some ways. Fake news articles used proper nouns, like the names of politicians, more than April Fools’ or real news articles.
The team also compared fake news and April Fools’ articles to genuine news articles. The fake news and April Fools’ articles were shorter and easier to read, and used simpler language and more first-person pronouns than real news articles. The April Fools’ articles referred to vague future events, made more references to present events, and used more unique words and fewer proper nouns than real news articles. The fake news articles used less punctuation, more proper nouns and simpler language, contained more spelling mistakes and were generally less formal. Fake news articles also used first names and profanity more than the other articles.
The results suggest that all kinds of deceptive articles share similarities, whether their intent is to spread misinformation or to play an April Fools’ joke. The team said that their study could open new doors for future articles on fake news.
The paper on this study will be presented at the 20th International Conference on Computational Linguistics and Intelligent Text Processing.