This semester, I worked on a project analyzing Twitter data about politically
motivated, race-related events. These include the trial of Patrick Crusius for
the shooting at a Walmart in El Paso, Texas, incidents of hate on college
campuses, and other crimes committed by political extremists in 2020. Tweets
about each of these incidents were gathered and labeled as non-offensive,
offensive, or hate speech. The goal of this project was to design a
machine-learning algorithm that could reliably identify hate speech in tweets
related to politically motivated events. Because the jargon of political hate
speech is constantly changing, I chose to focus specifically on current
politics in order to make the algorithm as effective as possible. In addition,
many studies suggest that a strong correlation exists between online hate speech
and offline violence. Researchers are continually working to improve
machine-learning hate speech detectors, and I attempted to design an algorithm
of my own that could pick up on even the most concealed variations of political
hate speech.
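
The report does not say which model the algorithm uses, so the following is
only a minimal sketch of one common baseline for three-way tweet
classification: a bag-of-words multinomial Naive Bayes classifier with add-one
smoothing, written in plain Python. The class name `NaiveBayesTweetClassifier`,
the `tokenize` helper, and the label strings are hypothetical illustrations,
not the project's actual code.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split it into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesTweetClassifier:
    """Hypothetical multinomial Naive Bayes classifier with add-one (Laplace) smoothing.

    Labels are free-form strings, e.g. "non-offensive", "offensive", "hate".
    """

    def __init__(self):
        self.doc_counts = Counter()              # training tweets per label
        self.word_counts = defaultdict(Counter)  # token frequencies per label
        self.vocab = set()                       # every token seen in training

    def fit(self, tweets, labels):
        """Accumulate counts; calling fit again adds more training data."""
        for text, label in zip(tweets, labels):
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        """Return the label with the highest log posterior for `text`."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            label_total = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # unseen tokens carry no evidence
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (label_total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Because tokens never seen during training are skipped, newly coined jargon
contributes nothing to a prediction until the model is refit on tweets that
contain it, which is one reason a detector like this has to be retrained as the
vocabulary of political hate speech shifts.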

For most of this project, I used Python to gather and analyze the tweets, the
Hatebase API for testing, and Octave to compare the accuracy of my algorithm's
results with those of existing hate speech detectors. I trained the algorithm
on tweets from four of the five incidents, which made up 80% of the tweets
gathered, and used the tweets from the remaining incident for testing. The
algorithm achieved an accuracy of about 81%, while the Hatebase API was unable
to categorize even the most obvious political hate speech. I also performed
some qualitative analysis of the tweets, examining how their sentiment and
popularity changed over the first few weeks after each incident occurred.
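
The evaluation described above holds out an entire incident rather than a
random sample of tweets, so the test set represents an event the model has
never seen. A minimal sketch of that split and of the accuracy computation
follows, assuming each tweet is stored as an `(incident_id, text, gold_label)`
tuple; the record layout and the function names `split_by_incident` and
`accuracy` are hypothetical, and any classifier (the project's model, or a
wrapper around another detector) can be passed in as a function from text to
label.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical record layout: (incident_id, tweet_text, gold_label).
Record = Tuple[str, str, str]


def split_by_incident(records: Iterable[Record], held_out: str) -> Tuple[List[Record], List[Record]]:
    """Put every tweet from the `held_out` incident in the test set, all others in training."""
    train, test = [], []
    for record in records:
        (test if record[0] == held_out else train).append(record)
    return train, test


def accuracy(classify: Callable[[str], str], test: List[Record]) -> float:
    """Fraction of test tweets whose predicted label matches the gold label."""
    if not test:
        raise ValueError("test set is empty")
    correct = sum(1 for _, text, gold in test if classify(text) == gold)
    return correct / len(test)
```

Calling `split_by_incident` once with each incident as `held_out` and averaging
the resulting accuracies would give a leave-one-incident-out estimate, which is
less sensitive to which incident happens to be held out; the report describes a
single held-out incident.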

My project was originally geared toward qualitative analysis of the tweets using
the Hatebase API. However, after discovering that the API could not detect
political hate speech at all, I decided to design my own hate speech detection
algorithm, which took up most of my project. I had also planned to perform more
testing on political hate incidents in 2020. However, because of COVID-19 and
quarantine, I could not find as many political hate incidents that went viral
on Twitter, so I was unable to gather enough tweets to perform the testing I had
intended. Instead, I focused on further qualitative analysis of the tweets and
on comparing my hate speech detection algorithm against Hatebase, one of the
most widely used hate speech detection platforms. As a next step, I plan to
broaden my algorithm, since it currently recognizes only political hate speech.
I also hope to use the viral tweets generated by future hate incidents on social
media to further train the algorithm and increase its accuracy.
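
The qualitative analysis of how sentiment and popularity evolved after each
incident could be organized as in the following sketch, which buckets tweets by
week since the incident and averages a sentiment score and a retweet count per
week. It assumes each tweet has already been given a sentiment score by some
separate sentiment tool (the report does not name one) and uses retweets as a
stand-in for popularity; the function name `weekly_trends` and the tuple layout
are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def weekly_trends(tweets, incident_date: datetime, max_weeks: int = 4):
    """Group tweets by week since the incident and average sentiment and popularity.

    `tweets` is an iterable of hypothetical (created_at, sentiment, retweets) tuples,
    where `sentiment` is assumed to come from a separate sentiment model.
    """
    buckets = defaultdict(list)
    for created_at, sentiment, retweets in tweets:
        week = (created_at - incident_date).days // 7
        if 0 <= week < max_weeks:  # drop tweets before the incident or past the window
            buckets[week].append((sentiment, retweets))
    return {
        week: {
            "tweets": len(rows),
            "mean_sentiment": mean(s for s, _ in rows),
            "mean_retweets": mean(r for _, r in rows),
        }
        for week, rows in sorted(buckets.items())
    }
```

Tweets posted before the incident or after `max_weeks` weeks are dropped, so
each week's averages reflect only the window being studied.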