Over the course of this project, I accomplished the following:
Took on the task of finding out how casual riders and member riders differ
Prepared the dataset named divvy2024_tripdata
Combined each month in the dataset into one table named combined_tripdata
Calculated ride duration in minutes
Removed all null values that would interfere with my analysis
Randomly sampled the data to get a sample of 100,000 rides
Filtered out tourist hubs as outliers
Discovered that casual riders preferred longer rides and electric bikes
Found that casual rides were mostly concentrated around university campuses and nearby apartment complexes
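As a rough illustration of the duration and null-removal steps listed above, here is a minimal Python sketch. It is not the code used in the project. The column names `started_at`, `ended_at` and `member_casual` are assumed to match the public Divvy trip files, the `ride_minutes` field name is made up for this example, and the two rows are invented.

```python
from datetime import datetime


def ride_minutes(started_at: str, ended_at: str) -> float:
    """Return the ride duration in minutes from two timestamp strings."""
    start = datetime.fromisoformat(started_at)
    end = datetime.fromisoformat(ended_at)
    return (end - start).total_seconds() / 60.0


def clean_rides(rows):
    """Drop rows with any missing value, then attach a ride_minutes field."""
    cleaned = []
    for row in rows:
        # Treat None and empty strings as nulls (an assumption for this sketch).
        if any(value is None or value == "" for value in row.values()):
            continue
        minutes = ride_minutes(row["started_at"], row["ended_at"])
        cleaned.append({**row, "ride_minutes": minutes})
    return cleaned


# Invented example rows, not real Divvy records.
rides = [
    {"started_at": "2024-06-01 08:00:00", "ended_at": "2024-06-01 08:12:30", "member_casual": "member"},
    {"started_at": "2024-06-01 09:00:00", "ended_at": None, "member_casual": "casual"},
]
cleaned = clean_rides(rides)
```

Here the second row is dropped because its end time is missing, and the first row gains a `ride_minutes` value.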
Above is the interactive dashboard I made from the sampled data. NOTE: The dashboard uses the raw sample, which still includes rides from tourist hubs, so some results may differ from my analysis.
To make sure I was working with the most relevant data, I needed to remove hubs used mostly by tourists from the dataset. To help with this, I made an overlay from a map of Chicago's tourist areas and placed it on top of my map of casual riders.
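Once the overlay has identified which stations sit inside tourist areas, removing them comes down to a simple filter. The sketch below is one possible way to do that in Python, not the method used here. The station names are hypothetical placeholders, the column names `start_station_name` and `end_station_name` are assumed to match the Divvy trip files, and dropping a ride when either end touches a hub is also an assumption.

```python
# Hypothetical station names standing in for hubs flagged by the overlay.
TOURIST_HUBS = {"Example Pier Station", "Example Museum Station"}


def drop_tourist_hub_rides(rows, hubs=TOURIST_HUBS):
    """Keep only rides that neither start nor end at a flagged hub."""
    return [
        row
        for row in rows
        if row["start_station_name"] not in hubs
        and row["end_station_name"] not in hubs
    ]


# Invented example rows, not real Divvy records.
rides = [
    {"start_station_name": "Example Pier Station", "end_station_name": "Example Campus Station"},
    {"start_station_name": "Example Campus Station", "end_station_name": "Example Apartment Station"},
]
kept = drop_tourist_hub_rides(rides)
```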
Observe that casual riders take much longer trips than paid members, while taking fewer trips.
Casual riders also prefer electric bikes over pedal bikes.
Members make more stops, but casual riders go longer distances, such as to and from apartments and universities.
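Comparisons like these can be reproduced by grouping rides by rider type. The following Python sketch shows one way to compute trip counts, mean duration and the share of electric bike rides per group. The field names `member_casual` and `rideable_type` and the value `electric_bike` are assumed to match the Divvy trip files, `ride_minutes` is a hypothetical precomputed duration field, and the toy rows are invented, so the numbers they produce are not project results.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_rider_type(rows):
    """Return trip count, mean duration and electric share per rider type."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["member_casual"]].append(row)

    summary = {}
    for rider_type, rides in groups.items():
        electric = sum(1 for r in rides if r["rideable_type"] == "electric_bike")
        summary[rider_type] = {
            "trips": len(rides),
            "mean_minutes": mean(r["ride_minutes"] for r in rides),
            "electric_share": electric / len(rides),
        }
    return summary


# Toy rows for illustration only.
toy_rides = [
    {"member_casual": "casual", "rideable_type": "electric_bike", "ride_minutes": 30.0},
    {"member_casual": "casual", "rideable_type": "classic_bike", "ride_minutes": 20.0},
    {"member_casual": "member", "rideable_type": "classic_bike", "ride_minutes": 10.0},
]
summary = summarize_by_rider_type(toy_rides)
```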
I first had to assemble the data in a Cloud Console bucket. This dataset can be found on Kaggle, here.
I then had to sample it as randomly as possible without introducing bias. With the help of Gemini Advanced, I kept each month's share of the total the same in the sample as in the full data, and sampled 100k records out of the 4.5 million.
A view of the query used to sample the data.
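Separately from that query, the idea of month-proportional sampling can be sketched in Python as follows. This is an illustration under assumptions, not the query used in the project: the `started_at` column name is assumed to match the Divvy trip files, the helper names and the seed are made up, and the toy rows are invented. Each month contributes rows in proportion to its share of the full data. Because each month's quota is rounded, the final sample can be off from the target by a few rows.

```python
import random
from collections import defaultdict


def month_of(row):
    """Extract the two-digit month from a 'YYYY-MM-DD HH:MM:SS' timestamp."""
    return row["started_at"][5:7]


def stratified_sample(rows, key, sample_size, seed=42):
    """Sample rows so each group keeps roughly its share of the total."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)

    total = len(rows)
    sample = []
    for name in sorted(groups):
        members = groups[name]
        # Quota proportional to the group's share of all rows.
        quota = round(sample_size * len(members) / total)
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample


# Toy data: 600 January rides and 400 February rides.
rows = [{"started_at": "2024-01-15 08:00:00"}] * 600
rows += [{"started_at": "2024-02-15 08:00:00"}] * 400
sample = stratified_sample(rows, month_of, 100)
```

With this toy input, January holds 60% of the rows, so it supplies 60 of the 100 sampled rides and February supplies the other 40.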
Sources used:
Downloaded raw data from Divvy Bikes.
The data used was collected from Divvy Bikes, and provided by Motivate International Inc. under this license.
Chicago Printable Tourist Map © Tripomatic, 2012. Built using data from Sygic Travel, CloudMade and OpenStreetMap.org. Licensed under CC BY-SA.