by Jeff Smith
This looks pretty neat: https://new.mta.info/article/introducin ... ip-dataset
https://data.ny.gov/Transportation/MTA- ... about_data
https://data.ny.gov/Transportation/MTA- ... about_data
If you’ve ever used our subway ridership datasets, you know that the MTA has a very good sense of how many people are riding the subway and where they’re boarding. But this suggests another obvious question: where are those people going? This is one of the most requested datasets by policymakers and MTA Open Data users—and the answer is a little complicated. Our ridership data comes from MetroCard swipes and OMNY taps at entry turnstiles, but our subway system has no similar swipe or tap requirement on exit. However, even though we don’t have perfect exit data, what we do have are pretty good estimates of how our riders move throughout the system.
This week on Open Data, we are excited to share those “pretty good estimates” with the MTA’s new Subway Origin-Destination Ridership Estimate open datasets. These datasets estimate the number of riders who traveled between each origin-destination pair for each hour of the day and day of the week, averaged over a calendar month, like so:
Year | Month | Day of Week | Hour | Origin Station | Destination Station | Estimated Trips
2024 | 5 | Monday | 8 | Times Sq-42 St | Court Sq | 123.45
2024 | 5 | Monday | 8 | Times Sq-42 St | Grand Central-42 St | 456.78
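For readers who want to work with rows in this shape, here is a minimal sketch of parsing them and looking up one origin-destination pair. The CSV header names (`year`, `origin`, `estimated_trips`, and so on) and the helper functions are assumptions made for illustration; the published datasets may name their columns differently.

```python
import csv
import io

# Hypothetical CSV text mirroring the two sample rows above.
# The header names are assumptions, not the dataset's actual column names.
SAMPLE_CSV = """year,month,day_of_week,hour,origin,destination,estimated_trips
2024,5,Monday,8,Times Sq-42 St,Court Sq,123.45
2024,5,Monday,8,Times Sq-42 St,Grand Central-42 St,456.78
"""


def load_rows(text):
    """Parse CSV text into dicts, converting the numeric fields."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append({
            "year": int(raw["year"]),
            "month": int(raw["month"]),
            "day_of_week": raw["day_of_week"],
            "hour": int(raw["hour"]),
            "origin": raw["origin"],
            "destination": raw["destination"],
            # Estimates are fractional, so keep them as floats.
            "estimated_trips": float(raw["estimated_trips"]),
        })
    return rows


def trips_between(rows, origin, destination, day_of_week, hour):
    """Sum estimated trips for one OD pair in one day-of-week/hour slot."""
    return sum(
        r["estimated_trips"]
        for r in rows
        if r["origin"] == origin
        and r["destination"] == destination
        and r["day_of_week"] == day_of_week
        and r["hour"] == hour
    )


rows = load_rows(SAMPLE_CSV)
print(trips_between(rows, "Times Sq-42 St", "Court Sq", "Monday", 8))
```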
This data shows us how the entire city travels, and how the approximately 4 million trips that take place on the subway on a typical weekday move the city. To get a high-level sense of how this works, take a look at the Kepler.gl animation below, which shows how riders moved across the system during an average week (Monday-Friday) in June 2024.
...
In this animation, each arc represents a journey for a number of users (identified by arc thickness), from an origin station (colored in light blue) to a destination station (colored in orange). Hundreds of thousands of riders travel to Manhattan’s central business district in the morning from across the city, followed by their return in the evening. Travel visibly varies across the city—with the many permutations that arise from millions of riders traveling freely across the subway’s 425 stations and station complexes.
About the data
Let’s talk a bit more about the data. This dataset is based on the ‘Destination Inference’ step of our ridership model, which we detailed in a previous blog post. As that post outlines, the basis of this model is the assumption that a subway trip’s destination is the station where the rider next swipes or taps. If a MetroCard swipes into Bowling Green at 9:15 a.m., and then that same MetroCard swipes into the 103 St stop in East Harlem later that afternoon, we make the imperfect (but pretty good) inference that this 9:15 a.m. trip traveled from Bowling Green to 103 St. These “linked trips” are what form the basis of our understanding of how riders travel across the system (Note 1).
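The core "next swipe" assumption can be illustrated with a short sketch. This is not the MTA's actual model, which the post describes only at a high level; the function name, the input tuple layout, and the choice to leave a card's final entry unlinked are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def infer_linked_trips(entries):
    """Link each entry to the same card's next entry station.

    entries: iterable of (card_id, entry_time, station) tuples.
    Returns a list of (card_id, entry_time, origin, destination).
    Simplification: a card's last entry has no later swipe, so this
    sketch leaves it unlinked.
    """
    by_card = defaultdict(list)
    for card_id, entry_time, station in entries:
        by_card[card_id].append((entry_time, station))

    trips = []
    for card_id, events in by_card.items():
        events.sort()  # chronological order per card
        # Pair every entry with the one that follows it.
        for (entry_time, origin), (_, next_station) in zip(events, events[1:]):
            trips.append((card_id, entry_time, origin, next_station))
    return trips


# The example from the text: Bowling Green in the morning,
# 103 St later that afternoon (the afternoon time is made up).
sample_entries = [
    ("card-A", datetime(2024, 5, 6, 9, 15), "Bowling Green"),
    ("card-A", datetime(2024, 5, 6, 16, 40), "103 St"),
]
linked = infer_linked_trips(sample_entries)
print(linked)
```

In this sketch the 9:15 a.m. entry is linked to 103 St, while the afternoon entry stays unlinked because no later swipe exists for that card.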
In this Subway Origin-Destination (OD) dataset, we’ve taken these assigned destinations generated by our destination inference process and aggregated them by origin-destination station complex pair and hour of day. These totals are then further aggregated by averaging over a calendar month. We remove personally identifying information, like MetroCard ID numbers, and aggregate ridership data over a calendar month to protect the privacy of MTA riders by preventing the association of a single MetroCard swipe or subway trip with a specific person or hour. The format of this aggregated dataset lets users see roughly how many people traveled between two subway complexes during, for example, “an average 9 a.m. hour during the month of May.”
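To see how a monthly average produces fractional trip counts, consider the rough sketch below. It assumes each linked trip counts as one rider and that the average is taken over every occurrence of that weekday in the month; the post does not spell out the exact denominators, so treat both as assumptions. The function names are hypothetical.

```python
import calendar
from collections import Counter
from datetime import date


def count_weekday_in_month(year, month, weekday):
    """How many times a weekday (0 = Monday) occurs in a month."""
    weeks = calendar.monthcalendar(year, month)
    return sum(1 for week in weeks if week[weekday] != 0)


def monthly_average(trips, year, month):
    """Average linked trips per (day of week, hour, origin, destination).

    trips: iterable of (trip_date, hour, origin, destination), one tuple
    per linked trip. No card IDs are needed at this stage.
    """
    totals = Counter()
    for trip_date, hour, origin, destination in trips:
        if trip_date.year == year and trip_date.month == month:
            totals[(trip_date.weekday(), hour, origin, destination)] += 1

    return {
        (calendar.day_name[wd], hour, origin, destination):
            count / count_weekday_in_month(year, month, wd)
        for (wd, hour, origin, destination), count in totals.items()
    }


# Three made-up 9 a.m. trips on Mondays in May 2024, which has four Mondays.
sample_trips = [
    (date(2024, 5, 6), 9, "Bowling Green", "103 St"),
    (date(2024, 5, 6), 9, "Bowling Green", "103 St"),
    (date(2024, 5, 13), 9, "Bowling Green", "103 St"),
]
averages = monthly_average(sample_trips, 2024, 5)
print(averages)
```

Three trips spread over four Mondays average to 0.75 trips for that slot, which is why a value below one rider can appear in the data.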
It’s important to keep a few things in mind when using this data:
Because this data is the result of a modeling process, the ridership numbers for each origin-destination pair are estimates, not exact values. This modeling process, as well as the monthly aggregation, results in fractional ridership values—we’ve intentionally left ridership estimates as decimals to reflect the uncertainty inherent in this dataset.
Because this data represents a monthly average, users should be mindful that holidays, construction, or other important events that take place during a given month might impact ridership estimates.
Since the modeling process only looks at subway station entries, we can’t quantify how many of these trips truly started and ended at these subway station complexes and how many may have included a transfer from or to another mode of transit (e.g. a bus) at either or both ends.
When using the data to look at arrivals to a subway station, users should note that the timestamp for each OD pair is rounded down to the nearest hour of the entry swipe (or tap) and does not account for the travel time between the entry swipe and arrival at the destination (Note 2).
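The last caveat is easy to work through with a little arithmetic. Because the hour is the entry hour rounded down, a rider in the "8" bucket entered somewhere between 8:00 and 8:59, and the arrival comes some travel time later. The sketch below lists the clock hours in which that arrival could fall; the travel time is a value you would have to supply yourself, since it is not part of the dataset, and the function name is hypothetical.

```python
def possible_arrival_hours(entry_hour, travel_minutes):
    """Clock hours in which a rider could arrive at the destination.

    entry_hour: the dataset's hour bucket (0-23), i.e. the entry time
        rounded down to the hour.
    travel_minutes: an assumed whole number of minutes from entry swipe
        to arrival (not provided by the dataset).
    """
    earliest = entry_hour * 60 + travel_minutes       # entered at hh:00
    latest = entry_hour * 60 + 59 + travel_minutes    # entered at hh:59
    first_hour = earliest // 60
    last_hour = latest // 60
    # Wrap past midnight so the result stays in 0-23.
    return [h % 24 for h in range(first_hour, last_hour + 1)]


# An 8 a.m. bucket with an assumed 35-minute ride arrives
# between 8:35 and 9:34, so in hour 8 or hour 9.
print(possible_arrival_hours(8, 35))
```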
How can we dig deeper with this data?
Where are people going from a station?
One of the clearest use cases for this data is better understanding the journeys that riders are taking from a given subway station. Take the Court Sq station as an example. The following map and graph represent the various destinations for trips originating at this station for Saturdays in June 2024. We can see that riders entering at this station largely travel to destinations in Queens or Midtown Manhattan.
...
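A ranking like this can be reproduced with a few lines of code. The sketch below sums estimated trips across all hours for one origin and day of week, then ranks the destinations. The row keys and the sample numbers are made up for illustration and are not taken from the real dataset.

```python
from collections import defaultdict


def top_destinations(rows, origin, days, n=5):
    """Rank destinations for trips leaving one origin on the given days.

    rows: dicts with "origin", "destination", "day_of_week" and
        "estimated_trips" keys (hypothetical names).
    Returns a list of (destination, trips, share_of_total) tuples.
    """
    totals = defaultdict(float)
    for r in rows:
        if r["origin"] == origin and r["day_of_week"] in days:
            totals[r["destination"]] += r["estimated_trips"]

    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
    return [(dest, trips, trips / grand_total) for dest, trips in ranked]


# Made-up Saturday rows for illustration only.
saturday_rows = [
    {"origin": "Court Sq", "destination": "Queensboro Plaza",
     "day_of_week": "Saturday", "hour": 10, "estimated_trips": 12.5},
    {"origin": "Court Sq", "destination": "Queensboro Plaza",
     "day_of_week": "Saturday", "hour": 11, "estimated_trips": 7.5},
    {"origin": "Court Sq", "destination": "Grand Central-42 St",
     "day_of_week": "Saturday", "hour": 10, "estimated_trips": 10.0},
]
ranking = top_destinations(saturday_rows, "Court Sq", {"Saturday"})
for dest, trips, share in ranking:
    print(f"{dest}: {trips:.1f} trips ({share:.0%})")
```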
Estimating destination foot traffic
We can also use this data to zoom in on ridership patterns at specific subway station complexes to understand foot traffic in a certain area. For example, someone opening a new business and deciding on optimal operating hours and labor needs could use this dataset to investigate the approximate volume of subway trips arriving at nearby subway station(s) by hour of day for weekdays compared to weekends, or summer months compared to winter months. This data could also support an analysis of where these riders are coming from, to better tailor marketing to them.
Let's continue with our example of Court Sq. We can see that weekend ridership to this station doesn’t really pick up until 10 a.m. but stays high until 10 p.m., so a business owner might want to shift their opening hours later on the weekends. We can also see that on weekdays the morning and evening peaks are very similar in scale, suggesting that this station is in an area where many full-time workers both live (evening arrivals) and work (morning arrivals).
...
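An hourly profile like the one described here could be built roughly as follows. The sketch assumes each row is already a per-day average for its day of week, so weekday values are averaged over five days and weekend values over two. The row keys and sample numbers are hypothetical. Keep in mind that the hour is the entry hour at the origin, not the arrival time at the destination.

```python
WEEKEND = {"Saturday", "Sunday"}


def arrivals_by_hour(rows, destination):
    """Average trips per day ending at a station, by entry hour.

    rows: dicts with "destination", "day_of_week", "hour" and
        "estimated_trips" keys (hypothetical names).
    Returns {"weekday": [24 floats], "weekend": [24 floats]}.
    """
    profile = {"weekday": [0.0] * 24, "weekend": [0.0] * 24}
    for r in rows:
        if r["destination"] != destination:
            continue
        kind = "weekend" if r["day_of_week"] in WEEKEND else "weekday"
        profile[kind][r["hour"]] += r["estimated_trips"]

    # Turn sums across days into a per-day average for each day type.
    # Days with no rows count as zero trips.
    profile["weekday"] = [v / 5 for v in profile["weekday"]]
    profile["weekend"] = [v / 2 for v in profile["weekend"]]
    return profile


# Made-up rows for illustration only.
sample_rows = [
    {"destination": "Court Sq", "day_of_week": "Monday",
     "hour": 8, "estimated_trips": 50.0},
    {"destination": "Court Sq", "day_of_week": "Tuesday",
     "hour": 8, "estimated_trips": 100.0},
    {"destination": "Court Sq", "day_of_week": "Saturday",
     "hour": 10, "estimated_trips": 40.0},
]
profile = arrivals_by_hour(sample_rows, "Court Sq")
print(profile["weekday"][8], profile["weekend"][10])
```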
Next stop, Willoughby
~el Jefe :: RAILROAD.NET Site Administrator/Co-Owner; Carman at Naugatuck Railroad
YouTube Instagram Facebook