File: Dodgers-Marketing_EDA.ipynb
Name: Corinne Medeiros
Date: 9/20/20
Desc: Improving a Dodgers Marketing Promotion (p.1 Python)
Usage: Program previews and summarizes the Dodgers data and generates exploratory visualizations.

Improving a Dodgers Marketing Promotion - Exploratory Data Analysis in Python

Objective: Determine which night would be best for running a marketing promotion to increase attendance.

Data source:
Dodgers Major League Baseball data from 2012
dodgers.csv

Importing and Previewing Data
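
A minimal sketch of the import and preview step, assuming the data is read with pandas from `dodgers.csv` in the working directory. The helper name `load_and_preview` is hypothetical and not taken from the original notebook; it only prints standard pandas summaries.

```python
import pandas as pd


def load_and_preview(path="dodgers.csv", n=5):
    """Read the CSV at `path` and print a quick overview (hypothetical helper)."""
    df = pd.read_csv(path)
    print(df.head(n))        # first n rows
    print(df.dtypes)         # data type of each column
    print(df.describe())     # summary statistics
    print(df.isna().sum())   # count of missing values per column
    return df
```

Calling `load_and_preview("dodgers.csv")` returns the DataFrame, so the same object can be reused for the visualizations that follow.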

Exploratory Visualizations

The boxplot above shows that June has the highest median monthly attendance, and its attendance is consistently higher than in the other months. October, on the other hand, has the lowest median and the most consistently low attendance. April and July have similar medians and wide ranges of attendance, but July attendance is greater overall.

This boxplot shows that Tuesdays have the highest median attendance of any day of the week, with consistently higher attendance than the other days. Mondays have the lowest median and consistently lower attendance. Now that I have an idea of typical attendance by month and by day of the week, it will help to visualize which other factors could affect attendance.
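
The two boxplots described above could be reproduced, and their medians checked numerically, along the following lines. This is a sketch rather than the notebook's original code: the column names `month`, `day_of_week`, and `attend`, the label orderings, and both helper functions are assumptions and may need adjusting to match the actual file.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumed label orderings; they must match the values stored in the data.
MONTH_ORDER = ["APR", "MAY", "JUN", "JUL", "AUG", "SEP", "OCT"]
DAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def median_attendance(df, group_col, value_col="attend", order=None):
    """Return the median of value_col for each level of group_col."""
    medians = df.groupby(group_col)[value_col].median()
    if order is not None:
        # Keep only labels that occur in the data, in the requested order.
        medians = medians.reindex([g for g in order if g in medians.index])
    return medians


def attendance_boxplot(df, group_col, value_col="attend", order=None):
    """Draw one box per level of group_col and return the Axes."""
    groups = order if order is not None else sorted(df[group_col].unique())
    groups = [g for g in groups if (df[group_col] == g).any()]
    data = [df.loc[df[group_col] == g, value_col].to_numpy() for g in groups]
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.boxplot(data)
    # Boxes are drawn at positions 1..len(groups); label them by group.
    ax.set_xticks(range(1, len(groups) + 1))
    ax.set_xticklabels(groups)
    ax.set_xlabel(group_col)
    ax.set_ylabel(value_col)
    return ax
```

With the data loaded into a DataFrame named `dodgers` (an assumed name), `median_attendance(dodgers, "day_of_week", order=DAY_ORDER)` lists the medians that the boxes visualize, and `attendance_boxplot(dodgers, "month", order=MONTH_ORDER)` draws the monthly plot.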

In the scatterplots of attendance by day of the week and by month, with a few additional factors encoded as colors, timing looks like the most important factor. Neither temperature nor fireworks appears to have a large effect on attendance. Based on this initial exploration, I'll now switch over to RStudio to run a few machine learning calculations in R.
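
One way to draw a scatterplot of this kind is sketched below with plain matplotlib, coloring the points by a categorical factor. The column names `day_of_week`, `attend`, and `fireworks` are assumptions about the file, and the helper `scatter_by_factor` is hypothetical. A numeric factor such as temperature would need a colormap instead of one color per level.

```python
import pandas as pd
import matplotlib.pyplot as plt


def scatter_by_factor(df, x_col, y_col="attend", color_col="fireworks"):
    """Scatter y_col against a categorical x_col, one color per level of color_col."""
    # Map each category to an integer x position, in order of first appearance.
    categories = list(pd.unique(df[x_col]))
    positions = {c: i for i, c in enumerate(categories)}

    fig, ax = plt.subplots(figsize=(8, 5))
    # Each level of color_col gets its own scatter call, hence its own color.
    for level, part in df.groupby(color_col):
        ax.scatter(part[x_col].map(positions), part[y_col],
                   label=str(level), alpha=0.7)

    ax.set_xticks(range(len(categories)))
    ax.set_xticklabels(categories)
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    ax.legend(title=color_col)
    return ax
```

If the colors are mixed evenly across the attendance range within each day or month, that supports the reading that the colored factor has little effect compared with timing.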