OSCAR at George Mason University: URSP Student Leela Yaddanapudi Develops a Hate Speech Detection Algorithm for Political Hate Incidents on Twitter

Tuesday, May 26, 2020

This semester, I worked on a project analyzing Twitter data on politically motivated, race-related events. These included the trial of Patrick Crusius for the shooting at a Walmart in El Paso, Texas, incidents of hate on college campuses, and other crimes committed by political extremists in 2020. I gathered tweets on each of these incidents and sorted them into three categories: non-offensive, offensive, and hate speech. The goal of the project was to design a machine-learning algorithm that could effectively identify hate speech in tweets related to politically motivated events. The jargon of political hate speech is constantly changing, which is why I decided to focus specifically on current politics to design the most effective algorithm. Additionally, many studies suggest that a strong correlation exists between online hate speech and offline violence. Researchers are constantly trying to improve machine-learning hate speech algorithms, and I attempted to design one of my own that could pick up on even the most concealed variations of political hate speech.

For most of this project, I used Python to gather and analyze my tweets, the Hatebase API for testing, and Octave to compare the accuracy of my algorithm's results with those of existing hate speech detectors. I trained the machine-learning algorithm on tweets from four of the five hate incidents, which made up 80% of the tweets gathered, and used the rest for testing. The algorithm had an accuracy of about 81%, while the Hatebase API was unable to categorize even the most obvious political hate speech. I also performed some qualitative analysis by examining how the sentiment and popularity of tweets changed in the first few weeks after an incident occurred.
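To make the train/test setup above concrete, here is a minimal sketch in Python of holding out one incident for testing and measuring accuracy on it. It is only an illustration, not my actual algorithm: the small Naive Bayes classifier, the field names (`incident`, `text`, `label`), and the toy placeholder tweets are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

# Toy placeholder data (hypothetical): each tweet has an incident ID,
# text, and one of the three labels used in the project.
TWEETS = [
    {"incident": "A", "text": "calm news update", "label": "non-offensive"},
    {"incident": "A", "text": "rude insult reply", "label": "offensive"},
    {"incident": "A", "text": "threat against group", "label": "hate"},
    {"incident": "B", "text": "calm vigil update", "label": "non-offensive"},
    {"incident": "B", "text": "rude reply again", "label": "offensive"},
    {"incident": "B", "text": "threat toward group", "label": "hate"},
    {"incident": "C", "text": "calm news report", "label": "non-offensive"},
    {"incident": "C", "text": "rude insult", "label": "offensive"},
    {"incident": "C", "text": "threat group", "label": "hate"},
]


def split_by_incident(tweets, held_out):
    """Train on every incident except `held_out`; test on `held_out`."""
    train = [t for t in tweets if t["incident"] != held_out]
    test = [t for t in tweets if t["incident"] == held_out]
    return train, test


def train_naive_bayes(train):
    """Count words per label (a stand-in classifier, assumed for this sketch)."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for tweet in train:
        words = tweet["text"].lower().split()
        label_counts[tweet["label"]] += 1
        word_counts[tweet["label"]].update(words)
        vocab.update(words)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability for `text`."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        # Laplace (add-one) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, test):
    """Fraction of test tweets whose predicted label matches the true label."""
    correct = sum(predict(model, t["text"]) == t["label"] for t in test)
    return correct / len(test)


train, test = split_by_incident(TWEETS, held_out="C")
model = train_naive_bayes(train)
acc = accuracy(model, test)
print(f"Accuracy on held-out incident: {acc:.0%}")
```

Splitting by incident rather than shuffling all tweets together means the test tweets come from an event the classifier has never seen, which is a closer match to how it would be used on a new hate incident.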

My project was originally geared towards qualitative analysis of the tweets using the Hatebase API. However, after discovering that it was not able to detect political hate speech at all, I decided to design my own hate speech algorithm, which took up most of my project. I had also planned to perform more testing on political hate incidents in 2020. However, due to COVID-19 and quarantining, I couldn't find as many political hate incidents that went viral on Twitter, so I was unable to gather as many tweets or perform all the testing I wanted to. Instead, I worked on further qualitative analysis of the tweets and on comparing my hate speech detection algorithm against the most widely used hate speech detection platform, Hatebase. As for next steps, I plan to make my hate speech algorithm broader, because it is currently only able to recognize political hate speech. Additionally, I hope to use the viral tweets generated by future hate incidents on social media to further train my machine-learning algorithm and increase its accuracy.