Discussion relating to the past and present operations of the NYC Subway, PATH, and Staten Island Railway (SIRT).

Moderator: GirlOnTheTrain

  by Jeff Smith
 
This looks pretty neat: https://new.mta.info/article/introducin ... ip-dataset

https://data.ny.gov/Transportation/MTA- ... about_data
If you’ve ever used our subway ridership datasets, you know that the MTA has a very good sense of how many people are riding the subway and where they’re boarding. But this suggests another obvious question: where are those people going? This is one of the most requested datasets by policymakers and MTA Open Data users—and the answer is a little complicated. Our ridership data comes from MetroCard swipes and OMNY taps at entry turnstiles, but our subway system has no similar swipe or tap requirement on exit. However, even though we don’t have perfect exit data, what we do have are pretty good estimates of how our riders move throughout the system.

This week on Open Data, we are excited to share those “pretty good estimates” with the MTA’s new Subway Origin-Destination Ridership Estimate open datasets. These datasets provide an estimation of the number of riders that traveled between a given origin-destination pair for each hour of day and day of the week, averaged over a calendar month, like so:

Year  Month  Day of Week  Hour  Origin Station  Destination Station  Estimated Trips
2024  5      Monday       8     Times Sq-42 St  Court Sq             123.45
2024  5      Monday       8     Times Sq-42 St  Grand Central-42 St  456.78

This data shows us how the entire city travels, and how the approximately 4 million trips that take place on the subway on a typical weekday move the city. To get a high-level sense of how this works, take a look at the Kepler.gl animation below, which shows how riders moved across the system during an average week (Monday-Friday) in June 2024.
...
In this animation, each arc represents a journey for a number of users (identified by arc thickness), from an origin station (colored in light blue) to a destination station (colored in orange). Hundreds of thousands of riders travel to Manhattan’s central business district in the morning from across the city, followed by their return in the evening. Travel visibly varies across the city—with the many permutations that arise from millions of riders traveling freely across the subway’s 425 stations and station complexes.

About the data
Let’s talk a bit more about the data. This dataset is based off of the ‘Destination Inference’ step of our ridership model, which we detailed in a previous blog post. As that post outlines, the basis of this model is the assumption that a subway trip’s destination is the station the rider next swipes/taps at. If a MetroCard swipes into Bowling Green at 9:15 a.m., and then that same MetroCard swipes into the 103 St stop in East Harlem later that afternoon, we make the imperfect (but pretty good) inference that this 9:15 a.m. trip traveled from Bowling Green to 103 St. These “linked trips” are what form the basis of our understanding of how riders travel across the system (Note 1).
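
As a rough illustration of the "next swipe" inference described above, here is a minimal Python sketch. The record layout (card ID, entry time, station) and the sample values are assumptions made for the example; the MTA's actual model and schema are not shown in the post and are certainly more involved.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical entry records: (card_id, entry_time, station).
# Illustrative values only, not the MTA's actual schema or data.
swipes = [
    ("card-A", datetime(2024, 5, 6, 9, 15), "Bowling Green"),
    ("card-A", datetime(2024, 5, 6, 16, 40), "103 St"),
    ("card-B", datetime(2024, 5, 6, 8, 5), "Court Sq"),
]

def infer_linked_trips(swipes):
    """Pair each entry with the same card's next entry and treat the
    next entry station as the inferred destination."""
    by_card = defaultdict(list)
    for card_id, when, station in swipes:
        by_card[card_id].append((when, station))

    trips = []
    for card_id, entries in by_card.items():
        entries.sort()  # chronological order per card
        # zip(entries, entries[1:]) yields consecutive (this, next) pairs.
        for (t0, origin), (_t1, next_station) in zip(entries, entries[1:]):
            trips.append((card_id, t0, origin, next_station))
    return trips

linked_trips = infer_linked_trips(swipes)
```

In this sketch a card with only one entry (card-B) produces no linked trip, and the last entry of any card has no inferred destination.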

In this Subway Origin-Destination (OD) dataset, we’ve taken these assigned destinations generated by our destination inference process and aggregated them by origin-destination station complex pair and hour of day. These totals are then further aggregated by averaging over a calendar month. Removing personally identifying information, like MetroCard ID numbers, and aggregating ridership data over a calendar month is done to protect the privacy of MTA riders by preventing the association of a single MetroCard swipe or subway trip to a specific person or hour. The format of this aggregated dataset allows users to understand for “an average 9 a.m. hour during the month of May,” roughly how many people travelled between two subway complexes.
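
The aggregation described above can be sketched roughly as follows. This is a hypothetical Python example: the input layout is invented, and dividing by the number of times each weekday occurs in the month is an assumption about how the monthly average is taken, since the post does not spell out the denominator.

```python
import calendar
from collections import defaultdict
from datetime import date, datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical linked trips with the card ID already dropped:
# (entry_time, origin, destination).
trips = [
    (datetime(2024, 5, 6, 8, 10), "Times Sq-42 St", "Court Sq"),
    (datetime(2024, 5, 6, 8, 50), "Times Sq-42 St", "Court Sq"),
    (datetime(2024, 5, 13, 8, 30), "Times Sq-42 St", "Court Sq"),
]

def monthly_average(trips, year, month):
    # Count how many times each weekday occurs in the month.
    days_in_month = calendar.monthrange(year, month)[1]
    weekday_occurrences = defaultdict(int)
    for day in range(1, days_in_month + 1):
        weekday_occurrences[date(year, month, day).weekday()] += 1

    # Total trips per (weekday, entry hour, origin, destination).
    totals = defaultdict(int)
    for when, origin, dest in trips:
        if (when.year, when.month) == (year, month):
            totals[(when.weekday(), when.hour, origin, dest)] += 1

    # Assumed averaging rule: divide by the weekday's occurrences in the month.
    return {
        (DAY_NAMES[wd], hour, origin, dest): count / weekday_occurrences[wd]
        for (wd, hour, origin, dest), count in totals.items()
    }

averages = monthly_average(trips, 2024, 5)
```

With three Monday 8 a.m. trips spread over the four Mondays of May 2024, the sketch yields a fractional estimate, which is one way decimals like those in the sample table can arise.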

It’s important to keep a few things in mind when using this data:

Because this data is the result of a modeling process, the ridership numbers for each origin-destination pair are estimates, not exact values. This modeling process, as well as the monthly aggregation, results in fractional ridership values—we’ve intentionally left ridership estimates as decimals to reflect the uncertainty inherent in this dataset.
Because this data represents a monthly average, users should be mindful that holidays, construction, or other important events that take place during a given month might impact ridership estimates.
Since the modeling process only looks at subway station entries, we can’t quantify how many of these trips truly started and ended at these subway station complexes and how many may have included a transfer from or to another mode of transit (e.g. a bus) at either or both ends.
When using the data to look at arrivals to a subway station, users should note that the timestamp for each OD pair is rounded down to the nearest hour of the entry swipe (or tap) and does not account for the travel time between the entry swipe and arrival at the destination (Note 2).
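
To make the last caveat concrete, here is a small Python sketch. The 20-minute travel time is a made-up value for illustration; the dataset itself carries no travel time.

```python
from datetime import datetime, timedelta

def od_hour(entry_time):
    """Hour bucket for an OD record: the entry time rounded down to the hour."""
    return entry_time.replace(minute=0, second=0, microsecond=0)

entry = datetime(2024, 5, 6, 8, 55)
assumed_travel = timedelta(minutes=20)  # hypothetical, not from the dataset
arrival = entry + assumed_travel

recorded_hour = od_hour(entry).hour  # hour the trip is filed under
arrival_hour = arrival.hour          # hour the rider would actually arrive
```

Here the trip is filed under the 8 a.m. hour even though the rider would reach the destination during the 9 a.m. hour.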
How can we dig deeper with this data?
Where are people going from a station?
One of the clearest use cases for this data is better understanding the journeys that riders are taking from a given subway station. Take the Court Sq station as an example. The following map and graph represent the various destinations for trips originating at this station for Saturdays in June 2024. We can see that riders entering at this station largely travel to destinations in Queens or Midtown Manhattan.

Image
...
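
A "where do riders go from this station" query like the one above could be sketched in Python as follows. The field names mirror the column headings in the sample table from the post and may not match the published dataset's actual column names; the trip values are invented.

```python
from collections import defaultdict

# Hypothetical rows in the shape of the sample table; values are made up.
rows = [
    {"Day of Week": "Saturday", "Hour": 10, "Origin Station": "Court Sq",
     "Destination Station": "Grand Central-42 St", "Estimated Trips": 12.5},
    {"Day of Week": "Saturday", "Hour": 11, "Origin Station": "Court Sq",
     "Destination Station": "Grand Central-42 St", "Estimated Trips": 7.5},
    {"Day of Week": "Saturday", "Hour": 10, "Origin Station": "Court Sq",
     "Destination Station": "Times Sq-42 St", "Estimated Trips": 3.25},
    {"Day of Week": "Monday", "Hour": 8, "Origin Station": "Court Sq",
     "Destination Station": "Times Sq-42 St", "Estimated Trips": 50.0},
]

def top_destinations(rows, origin, day_of_week, n=5):
    """Sum estimated trips per destination for one origin and day of week,
    then return the n largest as (destination, trips) pairs."""
    totals = defaultdict(float)
    for r in rows:
        if r["Origin Station"] == origin and r["Day of Week"] == day_of_week:
            totals[r["Destination Station"]] += r["Estimated Trips"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

saturday_top = top_destinations(rows, "Court Sq", "Saturday")
```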
Estimating destination foot traffic
We can also use this data to zoom in on ridership patterns at specific subway station complexes to understand foot traffic in a certain area. For example, someone opening a new business and deciding on optimal operating hours and labor needs could use this dataset to investigate the approximate volume of subway trips arriving to nearby subway station(s) by hour of day for weekdays compared to weekends, or summer months compared to winter months. This data could also support an analysis of where these riders are coming from to better tailor marketing towards these riders.

Let's continue with our example of Court Sq. We can see that weekend ridership to this station doesn’t really pick up until 10 a.m. but stays high until 10 p.m., so a business owner might want to shift their opening hours later on the weekends. We can also see that on weekdays the morning and evening peaks are very similar in scale, suggesting that this station is in an area where many full-time workers both live (evening arrivals) and work (morning arrivals).
...
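
The weekday versus weekend comparison described above might be computed along these lines. This is a hedged Python sketch: the field names follow the sample table's headings, the values are invented, and averaging over five weekdays and two weekend days is an assumption for the example. Note that the hour is the entry hour, not the arrival hour.

```python
from collections import defaultdict

WEEKEND = {"Saturday", "Sunday"}
DAYS_IN_GROUP = {"weekday": 5, "weekend": 2}

# Hypothetical rows in the shape of the sample table; values are made up.
rows = [
    {"Day of Week": "Monday", "Hour": 8, "Origin Station": "Times Sq-42 St",
     "Destination Station": "Court Sq", "Estimated Trips": 10.0},
    {"Day of Week": "Tuesday", "Hour": 8, "Origin Station": "Times Sq-42 St",
     "Destination Station": "Court Sq", "Estimated Trips": 20.0},
    {"Day of Week": "Saturday", "Hour": 10, "Origin Station": "Times Sq-42 St",
     "Destination Station": "Court Sq", "Estimated Trips": 8.0},
    {"Day of Week": "Sunday", "Hour": 10, "Origin Station": "Times Sq-42 St",
     "Destination Station": "Court Sq", "Estimated Trips": 4.0},
]

def arrivals_by_hour(rows, destination):
    """Average estimated trips bound for `destination` per entry hour,
    split into weekday and weekend groups."""
    sums = {"weekday": defaultdict(float), "weekend": defaultdict(float)}
    for r in rows:
        if r["Destination Station"] != destination:
            continue
        group = "weekend" if r["Day of Week"] in WEEKEND else "weekday"
        sums[group][r["Hour"]] += r["Estimated Trips"]
    # Divide each hourly total by the number of days in its group.
    return {
        group: {hour: total / DAYS_IN_GROUP[group]
                for hour, total in sorted(by_hour.items())}
        for group, by_hour in sums.items()
    }

profile = arrivals_by_hour(rows, "Court Sq")
```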

  by Allan
 
CAUTION ADVISED if connecting directly to https://data.ny.gov/Transportation/MTA- ... about_data

It may just be my connection but AOL won't connect to the direct website link that Jeff provided. For AOL not to allow a connection is very strange.

The message I got: " The owner of data.ny.gov has configured their website improperly. To protect your information from being stolen, AOL Shield has not connected to this website. "

For all the years I have used AOL (many many years), I have never gotten this type of message especially when a government site is involved.

Others may have better luck in connecting. I'll try again tomorrow from a computer in my local library.
  by Kilgore Trout
 
Links to the actual origin-destination datasets for 2023 and 2024 to date. Export seems to be broken at the moment, but you can peruse the data online under Actions->Query.
Allan wrote: CAUTION ADVISED if connecting directly to https://data.ny.gov/Transportation/MTA- ... about_data

It may just be my connection but AOL won't connect to the direct website link that Jeff provided. For AOL not to allow a connection is very strange.

The message I got: " The owner of data.ny.gov has configured their website improperly. To protect your information from being stolen, AOL Shield has not connected to this website. "

For all the years I have used AOL (many many years), I have never gotten this type of message especially when a government site is involved.

Others may have better luck in connecting. I'll try again tomorrow from a computer in my local library.
Seems like a false positive - I don't see anything obviously wrong with data.ny.gov (my day job somewhat involves web security, nothing stands out to me).
  by STrRedWolf
 
Allan wrote: Wed Jul 31, 2024 6:39 pm CAUTION ADVISED if connecting directly to https://data.ny.gov/Transportation/MTA- ... about_data

It may just be my connection but AOL won't connect to the direct website link that Jeff provided. For AOL not to allow a connection is very strange.

The message I got: " The owner of data.ny.gov has configured their website improperly. To protect your information from being stolen, AOL Shield has not connected to this website. "

For all the years I have used AOL (many many years), I have never gotten this type of message especially when a government site is involved.

Others may have better luck in connecting. I'll try again tomorrow from a computer in my local library.
Not to get too deep, but it sounds like the browser AOL is using, or the plugin being used, does not like the SSL/TLS certificate that the server is using. Firefox and Chrome are fine with it on my end, so this is an AOL issue, not a site issue.
  by Allan
 
I just tried the direct link using my cell phone (using public wifi).

The result is 404 Page not found.

That tells me the problem with the link address may be at the ny.gov end.

I will try the library computer later today.

UPDATE at 1:34 pm: I am on a library computer right now and it allowed me to go to the webpage (my cell phone still returns a 404, using the wifi at the library). BUT after I closed that tab and then went back to reopen that webpage, I got a 404. It would appear that there is an issue at the website.
  by andrewjw
 
It's not an issue at the website. You are entering the URL improperly, perhaps by copying and pasting the text. You need to click the link Jeff Smith posted, since it is shortened for display (part of it is replaced by ... ).
  by Allan
 
andrewjw wrote: Fri Aug 02, 2024 9:16 pm It's not an issue at the website. You are entering the URL improperly, perhaps by copy and pasting the text. You need to click the link Jeff Smith posted, since it is shortened for display (replaced by ... )
andrew - that is exactly what I did in each case - clicked on the link that Jeff posted. At no time did I copy/paste it.
  by andrewjw
 
Well, it loads fine for me ;) here's the full link without abbreviation:

data.ny.gov/Transportation/MTA-Subway-Hourly-Ridership-Beginning-February-202/wujg-7c2s/about_data