File: Tokyo-Olympics-Tweets.ipynb
Name: Corinne Medeiros
Date: 8/8/21
Desc: Analyzing Tweets from Tokyo Olympics
Usage: Program imports and cleans data, generates charts, and calculates sentiment.

Sentiment analysis of tweets about debut sports during the Tokyo 2020 Olympics

Data Source:

Tokyo Olympics 2020 Tweets
https://www.kaggle.com/gpreda/tokyo-olympics-2020-tweets

This dataset from Kaggle contains one CSV file with over 150,000 tweets pulled from Twitter using the topic #Tokyo2020. Additional data about each tweet include username, user location, user description, hashtags, date, source, and more (see the link above for a full list of available attributes). The data were collected using the Twitter API and the Tweepy Python library. The most recent pull is from July 28, 2021, which is the version used for this project.
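As a rough illustration of the loading step, here is a minimal sketch that uses only the standard library. The column names (`user_name`, `date`, `text`, `hashtags`), the sample rows, and the file name mentioned in the comment are assumptions made for the example, not details confirmed by the dataset page or the notebook's own code.

```python
import csv
import io

def load_tweets(handle):
    """Read tweet rows from an open CSV file handle into a list of dicts."""
    reader = csv.DictReader(handle)
    return [row for row in reader]

# Tiny in-memory stand-in for the Kaggle file (made-up rows, assumed columns).
sample_csv = io.StringIO(
    "user_name,date,text,hashtags\n"
    "alice,2021-07-25,Surfing makes its debut! #Tokyo2020,\"['Tokyo2020']\"\n"
    "bob,2021-07-26,Watching skateboarding finals #Tokyo2020,\"['Tokyo2020']\"\n"
)

tweets = load_tweets(sample_csv)
print(len(tweets), "tweets loaded")
print(tweets[0]["text"])

# With the real download, the same function could be called on a file handle,
# e.g. open("tokyo_2020_tweets.csv", newline="", encoding="utf-8"),
# where the file name is hypothetical.
```

Keeping each row as a dictionary keyed by column name makes the later filtering and cleanup steps easy to follow; a dataframe library would work equally well here.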

Loading Data

Text Cleanup
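One plausible way to clean tweet text before analysis is sketched below. The specific steps (lowercasing, then removing URLs, @mentions, and punctuation) and the helper name `clean_text` are assumptions chosen for illustration; the notebook's actual cleanup rules may differ.

```python
import re

def clean_text(text):
    """Return a lowercased tweet with URLs, mentions, and punctuation removed."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"@\w+", " ", text)           # drop @mentions
    text = re.sub(r"[^a-z0-9\s]", " ", text)    # drop punctuation, '#', emoji
    return re.sub(r"\s+", " ", text).strip()    # collapse repeated whitespace

example = "Gold for @Team! #Surfing https://t.co/abc"
print(clean_text(example))
```

Note that this sketch keeps the word from a hashtag (`#Surfing` becomes `surfing`) so that it still counts toward word frequencies.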

Filtering data by sport
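Filtering by sport can be sketched as a case-insensitive keyword match on the tweet text. The keywords `"surf"` and `"skate"`, the `text` field name, and the sample tweets are assumptions for this example; they are not taken from the notebook's code.

```python
def filter_by_keyword(tweets, keyword):
    """Keep tweets whose text contains the keyword, ignoring case."""
    keyword = keyword.lower()
    return [t for t in tweets if keyword in t["text"].lower()]

# Made-up tweets for illustration only.
tweets = [
    {"text": "Surfing debut at #Tokyo2020 was incredible"},
    {"text": "So proud of our skateboarding team"},
    {"text": "What a swim final!"},
    {"text": "Surfers and skaters are stealing the show"},
]

surfing_tweets = filter_by_keyword(tweets, "surf")
skateboarding_tweets = filter_by_keyword(tweets, "skate")
print(len(surfing_tweets), len(skateboarding_tweets))
```

Matching on a word stem such as `"surf"` also catches "surfer" and "surfers", at the cost of occasionally matching unrelated uses of the word.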

Surfing

Skateboarding

Text Mining

Surfing

Skateboarding

Word Frequencies - Surfing Tweets
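A word-frequency count over cleaned tweets can be sketched with `collections.Counter`. The small stop-word set and the sample texts below are placeholders made up for the example; a fuller stop-word list from an NLP library would normally be used instead.

```python
from collections import Counter

# Deliberately tiny stop-word set, for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "at"}

def word_frequencies(texts, top_n=10):
    """Count non-stop-words across all texts and return the top_n most common."""
    counts = Counter()
    for text in texts:
        counts.update(w for w in text.split() if w not in STOP_WORDS)
    return counts.most_common(top_n)

# Made-up, already-cleaned tweet texts.
surfing_texts = [
    "surfing debut at the olympics",
    "gold medal for surfing",
    "olympics surfing waves",
]

top_words = word_frequencies(surfing_texts, top_n=2)
print(top_words)
```

The resulting list of `(word, count)` pairs is in a convenient shape for a bar chart of the most common words.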

Plotting Common Words - Surfing

Word Frequencies - Skateboarding Tweets

Plotting Common Words - Skateboarding

Now that we have a better understanding of the content within surfing and skateboarding tweets, let's find out the general sentiment towards these two sports.

Sentiment Analysis of Surfing Tweets

To make this easier to visualize and interpret, we'll remove observations with a polarity of 0 and place a bin break at 0, so that negative and positive scores fall into separate bins.
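The idea of dropping neutral scores and binning around 0 can be sketched as follows. This assumes polarity scores in the range [-1, 1]; the scores, the bin edges, and the helper `bin_counts` are hypothetical and chosen only to show the technique.

```python
# Made-up polarity scores, assumed to lie in [-1, 1].
polarities = [0.0, 0.5, -0.25, 0.0, 0.8, 0.1, -0.6, 0.0]

# Remove neutral observations (polarity exactly 0).
nonzero = [p for p in polarities if p != 0]

# Bin edges with a break at 0, so no bin mixes negative and positive scores.
bin_edges = [-1.0, -0.5, 0.0, 0.5, 1.0]

def bin_counts(values, edges):
    """Count values per bin; bins are [lo, hi), except the last, which includes hi."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return counts

counts = bin_counts(nonzero, bin_edges)
negative_total = sum(counts[:2])   # bins below 0
positive_total = sum(counts[2:])   # bins above 0
print(counts, negative_total, positive_total)
```

Because 0 is itself a bin edge, the counts on either side of it can be compared directly to see which way sentiment leans.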

The polarity values are now much easier to read. Among the tweets with nonzero polarity, the majority of those about surfing fall on the positive side of the scale.

Sentiment Analysis of Skateboarding Tweets

The majority of the tweets about skateboarding are also positive.

With my filtered and analyzed datasets saved, I'll use Tableau to explore the data further and create final visualizations.