Lies, Damned Lies...and Transportation Data?
We live in an era of data evangelism. From medicine to beer to professional sports, data drives our assessments of effectiveness and determines our priorities for scarce resources across any number of areas. In many ways, data-driven decision-making -- propelled by the growing fields of data science and machine learning -- has revolutionized the way we interact with the world around us, particularly through innovations in transportation. We often think of data as fundamentally neutral -- which implies that decisions derived from that data are not only unbiased, but optimized to produce the best outcome. But is this always the case? In transportation planning, for example, does the increasing use of data improve the quality of the overall system?
It’s not an easy question to answer. However, as this piece by Joe Cortright on CityLab explains, the first thing we should consider is not whether we’re using our data correctly, but whether we’re using the correct data for the problems we’re trying to solve. National-level trends are of little use to local or regional planners, since local characteristics (including density, geography, and demographic overlays, among others) can vary wildly from a national average. On a more granular level, conventional traffic counts, level-of-service indicators, and speed assessments (all of which were designed for motor vehicles, and are some of the most commonly collected indicators) say little or nothing about the multimodal conditions of a particular roadway or region, including whether or not people view non-motor-vehicle options as “safe, convenient, or desirable,” as Cortright puts it. Typical engineering standards -- and the data they use to design and assess a road network -- prioritize throughput and efficiency of motor vehicles. Even mode-share data -- which can provide a broad view of the local transportation landscape, and is often used to set long-term goals -- is similarly oriented around cars. The data simply shows that the vast majority of people drive personal vehicles, but it doesn’t say whether people would continue to drive if they had accessible and safe alternatives.
In short, much of our transportation data is, in fact, biased. This is because we disproportionately choose to collect data that measures or otherwise relates to motor vehicles, which then, unsurprisingly, reinforces the system we have created to optimize for those vehicles. Most locations don’t collect regular counts for cyclists and pedestrians, or design robust standards to measure the level of stress for a non-motorist on any given street. Even if we did have city- or regional-level data for cyclist traffic on every street, for example, it would probably confirm what we already suspect -- that in many areas, cyclists are a small percentage of overall traffic. But it wouldn’t tell us why that’s the case, or how a particular street design may discourage potential riders who want to try an active form of transportation, but have concerns about safety or convenience. A wider variety of qualitative metrics would yield more useful data in those cases, supporting forward-looking analysis rather than a snapshot of behavior that doesn’t reveal motivations or constraints.
As environmental and quality-of-life concerns have pushed local governments to de-prioritize the motor vehicle, the rhetoric around planning and budgeting has changed. But in too many cases, the data has not, and we often use these outmoded forms of data to address new, complex challenges. In many ways, the data we have is not the data we need. If we want to meet the urgency of the moment and design transportation solutions for our most pressing problems -- zeroing out carbon emissions, eliminating car-related roadway fatalities, and improving accessibility for all residents -- perhaps we should begin by asking a couple of simple questions: what information do we need in order to understand the problem, and how do we collect that information in a rigorous, systematic fashion? Data-driven decision-making isn’t going anywhere, nor should it. But as researchers often say: “Garbage in, garbage out.” We need better data to help us design a safer, more accessible, less car-reliant transportation system.
- Josh Linden (edited by Sachi Arakawa)
Very cool post and one that's needed in today's climate - I've seen the data evangelism first-hand and it's a troublesome sight. The next big 'thing' in transportation data seems to be mobile-phone-derived trip data. However, this data is still mode-agnostic to a large extent - it can't tell planners definitively if the phone's owner was in a car, bus, light rail, or other mode. It can guess, of course, but more often than not the end result is that the data assumes SOV travel, and thus continues to reinforce a car-centric mentality towards transportation planning.
Yeah, really good point -- mobile phone data is a wide open field that can be used/abused. I've seen some talk about using mode-specific mobile data, through apps like Strava -- but that still brings up big equity issues since Strava users are a small minority, and their trip routes and preferences are often quite different from those of most users. Fortunately, it looks like many cities are requiring bikeshare systems (including new dockless companies) to share anonymized trip data through publicly accessible APIs, which is great. Having that data (not only start/end points, but for many dockless companies, specific routing data as well) can provide a level of consistent detail that hasn't yet existed (to my limited knowledge at least).
That said -- even if/when routing data through mobile apps can zoom in on mode, it seems like it still can't answer questions about why people chose those trips, their motivations, etc. I guess that can come from user surveys. But I worry that mobile-derived data will lead planners to start making assumptions about people's behavior and preferences, using the same machine-learning tools that social media companies often use. We're all just a series of 1s and 0s!