In this project, I analyze historical bike trip data from a Chicao bike-share company, using R and create powerful visualizations using Tableau.

A more in-depth breakdown of the scenario is included below, followed by my full report.

Scenario

Cyclistic is a bike-share company in Chicago with over 5,824 bikes that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime. There are 2 types of customers: 1) casual riders who purchase single-ride or full-day memberships and 2) annual members who purchase annual memberships.

I am a junior data analyst working in the marketing analyst team and the director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore,your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.


ASK: Defining the Business Task

Three questions will guide the future marketing program:

  • How do annual members and casual riders use Cyclistic bikes differently?
  • Why would casual riders buy Cyclistic annual memberships?
  • How can Cyclistic use digital media to inuence casual riders to become members?

The dierctor of marketing and my manager has assigned me the first question to answer. Therefore the business task can be stated as follows:


PREPARE: Data sources

We used historical bike trip data from the last full year (12 months): January 2022 – December 2022. This helps us get a full picture of the annual calendar year and the potential effect from seasons factors. The data is from Divvy Bike Trip Data and has been made publicly available by Motivate International Inc. under this license . We downloaed the following CSV files:

1)  2022-01_divvy_trip-data.csv
2)  2022-02_divvy_trip-data.csv  
3)  2022-03_divvy_trip-data.csv  
4)  2022-04_divvy_trip-data.csv  
5)  2022-05_divvy_trip-data.csv  
6)  2022-06_divvy_trip-data.csv  
7)  2022-07_divvy_trip-data.csv  
8)  2022-08_divvy_trip-data.csv  
9)  2022-09_divvy_trip-data.csv //renamed to match other filenames.
10) 2022-10_divvy_trip-data.csv  
11) 2022-11_divvy_trip-data.csv  
12) 2022-12_divvy_trip-data.csv  

The data is organized with each row (record) corresponding to a single trip identified by ride_id and incldues the following columns (fields):

* ride_id               #Ride id - unique
* rideable_type         #Bike type - Classic, Docked, Electric
* started_at            #Trip start day and time
* ended_at              #Trip end day and time
* start_station_name    #Trip start station
* start_station_id      #Trip start station id
* end_station_name      #Trip end station
* end_station_id        #Trip end station id
* start_lat             #Trip start latitude  
* start_lng             #Trip start longitute   
* end_lat               #Trip end latitude  
* end_lat               #Trip end longitute   
* member_casual         #Rider type - Member or Casual  

There are no issues with bias or credibility as all personal identifiable information (PII) has been removed. The data is also credible as it is primary data from the company itself. In other words, it ROCCC’s:

  • Reliable and Original: the data is both reliable and original as it is primary source data.
  • Comprehensive: the data has all of the relevant fields necessary for our historical analysis.
  • Current: the data is current as it is from the specific time frame we need 2022 and is updated monthly.
  • Cited: the data is cited as it is verified as a primary source. I have saved a folder of the original data and made copies to manipulate for my analysis.

PROCESS: Data Cleaning & Manipulation

R (Programming Language): Initial Data Cleaning and Manipulation

Due to the large size and number of files, I performed my processing and analysis using R and exported it for visualization using Tableau.

Step 0: Initialize Workspace

Step 1: Collect Data

I downloaded the files and renamed 2022-09_divvy_trip-data.csv to match the other filenames.

Step 2: Wrangle Data and Combine into a Single File

Note, we will need to compare the column names to see if an rbind is possible. We use the helpful compare_df_cols_same fcn from our janitor library. This saves space as the alternative would be to call the column name of each file and manually compare them (e.g. colnames(jan22) ).

Then we inspect the dataframes to look for incongruencies.

Lastly, we combine the files into a single dataframe.

Step 3: Clean Up and Add Data

First, we inspect the new dataframe that has been created:

We note that there are a few problems we need to fix:

  • The data can only be aggregated at the ride-level, which is too granular. We will want to add some additional columns of data (e.g. day, month, year) that provide additional opportunities to aggregate the data.
  • We will want to add a calculated field for length of ride as ride_length for helpful analysis and inspect it.

Note, we need to convert ride_length to numeric and convert it to minutes as it’s easier to understand:

Before we continue, we decide to remove “bad” data that includes missing start/end station id and/or names or has negative ride_length values. Then, we will create a trip_type variable to inspect if there are any insights in how casual riders and annual members use the service differently based on this:

Descriptive Analysis

We will begin with a descriptive analysis on our cleaned up data set all_trips_v2. We will compare the mean, median, max, and min between casual riders and annual members.

Ride Length Summary

We also analyze by DAY, MONTH, and RIDE_TYPE by WEEKDAY

There are a few things to note:

  • At first glance, casual riders’ trips are almost 2x as long as annual members’ on average.
  • However, we note that the mean for the total trips is 10.60 min while the mean is 17.10 min, indicating a presence of outliers skewing the average.
  • This is confirmed as the max for a casual rider is 34,354 minutes, which would indicate a ride length of about 23 days! * Furthermore, the max ride for an annual member is 1,493 minutes(or 24 hours!)
  • Upon glancing at the data, we recongize using the median and also the total amount of rides (versus length) may be more helfpul.

Visual Analysis

Next, we will conduct a visual analysis. First, we will note how annual members and casual riders differ on the total amount of rides given the day of the week.

Here is the resulting visualization: Day of Week vs. Ride_Type

We will break down this visualization further:

Here is the resulting visualization: Rider Type

INSIGHT: Annual members prefer the weekdays whereas casual riders prefer the weekends.

Next, we will note how annual members and casual riders differ on the average duration of rides given the day of the week.

Here is the resulting visualization: Average Duration

Last, we will note how annual members and casual riders differ on the average median of rides given the day of the week (due to the skewing mentioned earlier).

Here is the resulting visualization: Median Duration

As we see, these totals are closer together, and the maximum values are around the 1000 minutes range compared to the 1750 minutes range from the mean graph above. Also, we recognize that in both the mean and median, casual riders have longer ride lengths than annual members

INSIGHT: Casual riders have longer ride lengths than annual members


SHARE: Supporting Visualizations and Key Findings

Lastly, we export the aggregate data for visualizations in Tableau:


Conclusion

Stakeholder presentation and dashboard

I’ve provided links below for my dashboard and shareholder presentation, which includes the following:

  • A summary of my analysis
  • Supporting visualizations and key findings
  • Three recommendations based on my analysis

Tableau Dashboard: Bikeshare in Chicago

GitHub: Bike Share in Chicago Repository

Slides: Navigating Speedy Success