Machine learning can hinder foreign influence campaigns in social media, study shows
Amy J. Born | July 29, 2020

Researchers are using machine learning to identify trolls — malicious internet accounts — based on their past behavior, in order to diminish outside interference in the 2020 American election. Software engineers could use this study to create a real-time monitoring system that exposes foreign influence in U.S. politics.
The model, developed by a research team led by Princeton University, investigated misinformation campaigns from China, Russia and Venezuela that targeted the U.S. before and after the 2016 election. By analyzing Twitter and Reddit posts, along with the URLs and hyperlinks they included, the team found patterns that the campaigns followed. The model was able to effectively identify posts and accounts belonging to foreign influence campaigns, including accounts that had never been used before.
Jacob N. Shapiro, professor of politics and international affairs at the Princeton School of Public and International Affairs, said this research allows someone to estimate how much misinformation is online at a given time, and what these foreign actors are posting about. "You can only imagine how much better this could be if someone puts in the engineering efforts to optimize it," he said.
The team posed the question, "Using only content-based features and examples of known influence campaign activity, could you look at other content and tell whether a given post was part of an influence campaign?" Knowing that coordinated efforts to spread information require a large amount of human- and bot-driven information sharing, they theorized that similar posts would appear frequently on multiple platforms over time. They used data on troll campaigns from Twitter and Reddit that had been collected over many years by NYU's Center for Social Media and Politics. That included publicly available data from 8,000 accounts of Chinese, Russian and Venezuelan trolls, as well as 7.2 million posts from these accounts spanning late 2015 through 2019.
The team's analysis considered a number of characteristics of each post, such as its timing, its word count, and whether the URL domain it mentioned belonged to a news website. They also looked at how the messaging related to other information shared at the same time, known as metacontent.
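To make the idea of content-based features concrete, the sketch below extracts the kinds of per-post signals the article mentions: timing, word count, and whether a linked domain is a news site. The function name, field names, and the tiny news-domain list are all hypothetical; the study's actual feature set is far richer.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse

# Hypothetical stand-in for a curated list of news domains.
NEWS_DOMAINS = {"nytimes.com", "reuters.com", "apnews.com"}

def extract_features(post_text: str, timestamp: float, urls: list) -> dict:
    """Sketch of content-based features for a single social media post:
    when it was posted, how long it is, and whether it links to news."""
    dt = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    domains = {urlparse(u).netloc.removeprefix("www.") for u in urls}
    return {
        "hour_of_day": dt.hour,                 # timing feature
        "day_of_week": dt.weekday(),            # timing feature
        "word_count": len(post_text.split()),   # length feature
        "links_news_site": any(d in NEWS_DOMAINS for d in domains),
    }
```

Feature dictionaries like this one could then be vectorized and fed to any standard classifier.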
Through extensive testing, the team showed it could distinguish posts that were part of an influence operation from those that were not, demonstrating that content-based features can contribute to finding these coordinated campaigns on social media.
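The classification step can be sketched with a minimal logistic regression trained on content-based feature vectors. This is an illustrative toy, not the team's model: the feature vectors, labels, and hyperparameters below are invented for the example.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, lr=0.5, epochs=2000):
    """Fit a logistic regression by stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of log-loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, x) -> int:
    """1 = flagged as influence-campaign content, 0 = ordinary post."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Toy feature vectors: [normalized word count, links to a news site].
# Labels are fabricated purely for illustration.
X = [[0.1, 1.0], [0.2, 1.0], [0.9, 0.0], [0.8, 0.0]]
y = [1, 1, 0, 0]
w, b = train(X, y)
```

In practice one would use an off-the-shelf library and far more features, but the core idea is the same: known campaign posts supply the labels, and content-based features supply the signal.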
The paper, "Content-Based Features Predict Social Media Influence Operations," appears in Science Advances.