Frequently Asked Questions

Review commonly asked questions about Replica’s data, methodology, and interface functionality.

Written by Lauren Massey
Updated over a week ago

This document contains trade secrets, commercial and financial information provided by Replica Inc. that is privileged and confidential. Do not distribute without explicit permission from Replica, Inc.

Replica Overview

What is Replica?

Replica provides data about the built environment and how people interact with it. Our mission is to organize that information to make it accessible, valuable, and actionable.

We know that making decisions about the built environment involves confronting complex challenges that don’t have simple solutions. Too often, you’re expected to rely on data that’s out of date, incomplete, or both. Afterward, you don’t have the information you need to monitor the impacts of those decisions.

Replica exists to change that. We transform vast, disparate datasets into a holistic, up-to-date picture of what’s happening across mobility, land use, people, and economic activity. Our customers get unprecedented insights into a world that’s constantly changing.

What’s included in Replica’s platform?

Replica’s platform has three core datasets: Places, Trends, and Scenario.

Replica Places simulates the complete activities and movements of residents, visitors, and commercial vehicles in a region and season on a typical day. Places are delivered as megaregions, each covering between 10 and 50 million residents and multiple states. Our customers use Places to obtain an accurate, shared baseline understanding of the world as it exists today.

Click here to learn more about Places data.

Replica Trends is a companion to our Places dataset. Trends is updated each week with fresh data nationwide covering mobility, economic activity, and land use. Trends has census-tract-level fidelity with mobility data including origins and destinations, trip mode, residential VMT, and consumer spending data across a number of categories. Customers use Trends to understand the current state of the world and monitor how it’s changing in near-real time.

Click here to learn more about Trends data.

Scenario forecasts the impact of population and employment changes on travel demand and the transportation network nationwide. Scenario harnesses the Replica Places modeling pipeline to produce baseline conditions (Base Year) and forecast conditions (Forecast Year) for your selected study area. Our first version accepts changes to population, employment, and work-from-home rates. Future iterations will allow other inputs such as changes to the transportation network, land use, or behavioral preferences.

Click here to learn more about Scenario.

In addition to our core datasets, Replica has complementary datasets that provide a holistic view of the built environment. Learn more below:

  • Nationwide parcel-level land use data, including both parcel area and building area data (link)

  • Nationwide Annual Average Daily Traffic (AADT) data (link)

  • Nationwide Annual Free-Flow Speeds data (link)

  • Nationwide Quarter-Hourly Speed Profiles (link)

  • Nationwide Turning Movement Count (TMC) data (link)

What is Replica's definition of a trip? How do you define the end of a trip?

A trip is a movement by a person between places. A trip begins when a person leaves a place and ends when a person stops to do a non-travel activity in a place. For example: If a person walks from home to a cafe, sits down to drink a coffee, and then walks to work, two trips have occurred.

A person can use multiple modes within a single trip. For example: If a person walks to the bus stop and then takes the bus to work, this is a single trip with two trip segments. When a trip involves multiple modes, the trip is assigned a primary mode by using the following ranking: 1) Public transit, 2) Driving (private auto)/Auto passenger/Taxi/TNC, 3) Biking, 4) Walking.
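The ranking above can be sketched in a few lines of Python. This is only an illustration of the rule as described in the answer: the mode labels are placeholder strings rather than Replica's actual field values, and the handling of ties between the auto-based modes (which share rank 2) is an assumption.

```python
# Ranking from the answer above: a lower number means a higher priority.
# The mode labels are illustrative placeholders, not Replica field values.
MODE_RANK = {
    "public_transit": 1,
    "private_auto": 2,
    "auto_passenger": 2,
    "taxi_tnc": 2,
    "biking": 3,
    "walking": 4,
}


def primary_mode(segment_modes):
    """Return the highest-ranked mode among a trip's segments."""
    if not segment_modes:
        raise ValueError("a trip needs at least one segment")
    # min() keeps the first item with the lowest rank, so ties inside
    # rank 2 resolve to whichever auto mode appears first (an assumption).
    return min(segment_modes, key=lambda mode: MODE_RANK[mode])


# Walk to the bus stop, ride the bus, walk to the office.
print(primary_mode(["walking", "public_transit", "walking"]))
```

In this sketch, the walk-bus-walk trip from the example is labeled as a public transit trip, because transit outranks walking.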

How long of a dwell time needs to occur in order for Replica to consider a stop as a separate trip?

As noted in the question above, Replica defines a trip as a movement by a person between places. A trip begins when a person leaves a place and ends when a person stops to do a non-travel activity in a place. As a preliminary step in our Travel Activity Core Data Product development, we analyze stay points from our composite data sources, such as mobile location data feeds. With a typical mobile location data feed that has continuous temporal sampling (1 location sample per minute), stays of over 5 minutes can be detected with over 85% accuracy, and most stays of 15 minutes are identified at 90% accuracy. Very short activities, or local activities that happen within the range of localization accuracy, cannot be reliably detected. For example, it is often not possible to identify events such as buying a coffee at a drive-through location, particularly when there is no significant stationary period (such as a long wait in line).

Read more about stay point detection in our Methodology document here.


Data Outputs

What output data is available in Places?

Each Places simulation results in a complete trip, population, and routing table.

Population Attributes: Each trip is associated with a specific person in the simulation, for whom the following characteristics are available:

  • Age

  • Sex

  • Race and ethnicity

  • Primary language

  • Employment status

  • Industry of employment

  • Home location

  • Work location

  • Individual and household income

  • Work-from-home status

  • Vehicle ownership status

  • Resident or visitor status

See full list of attributes available here.

Trip Attributes: Each trip is assigned the following attributes:

  • Origin and destination points

  • Origin and destination points by land use category

  • Trip distance

  • Trip duration

  • Start and end time

  • Complete routing information for each trip (network links and transit routes)

  • Trip mode, including private auto driver, private auto passenger, public transit, walking, biking, freight, and transportation network companies (TNCs)

  • Trip purpose, including home, work, errands, eat, social, shop, recreation, commercial, and school

See full list of attributes available here.

Location Detail: Replica models to specific real-world locations and points of interest (e.g., a specific office building, the Starbucks at a certain address) — trips are modeled from individual building footprint to individual building footprint, rather than zone to zone.

What output data is available in Trends?

Replica Trends data provides near-real-time insights into mobility, spending, and land use. You have access to census-tract level origin-destination tables that represent average weekdays and average weekend hourly breakdowns, and which categorize trips by mode and purpose. You also receive a number of consumer spending metrics, which represent the total amount of consumer spend in each census tract in the country — both in aggregate and across a number of sub-categories including retail, grocery stores, restaurants, and travel. Learn more about the metrics included below:

  • Mobility Data

    • Complete hourly nationwide OD table, representing an average day in the prior week (link)

    • Mode Split (including auto, transit, walking, and biking) (link)

    • Residential Vehicle Miles Traveled (link)

  • Economic Data

    • Total dollars of Consumer Spend by all residents of each census tract, and at all merchants in each census tract, across the below categories. Learn more about weekly spend by home location and weekly spend by merchant location.

      • Retail

      • Grocery Stores

      • Gas Stations, Parking, Taxis, Tolls

      • Restaurants & Bars

      • Airline, Hospitality, & Car Rental

      • Entertainment & Recreation

    • Breakdown of on- and off-line spend for certain categories

    • Spend county-to-county flows by merchant location and resident location, broken down by sector (link)

How frequently is Replica's data updated?

Replica updates Places megaregions annually.

In Trends, all mobility and spending data is refreshed each week. Land use data is refreshed quarterly.

What temporal coverage does Replica include?

All Places megaregions have Fall 2019, Spring 2021, Fall 2021, Fall 2022, and Spring 2023 seasons.

In Trends, we have the following temporal coverage:

  • Mobility data: January 2019 to present

  • Spend data: January 2019 to present

  • Land use data: Land use data is available for a single time period and will be updated seasonally. The most current version we have available is from Spring 2023.

Am I seeing data for a single day in Replica?

Places output data is available for a typical mid-week day (Thursday) and a typical weekend day (Saturday) in a given season. When you interact with the data in Places Studies, you can toggle between seeing the output data for Thursday or Saturday.

All data displayed in Trends dashboards (unless otherwise noted) are for an average weekday of a given week. Full-week averages and weekend averages are available for download by clicking the "Download Data" button at the top of each Trends dashboard or visiting the Data Downloads page.

How do you define each mode included in Replica?

Click here for a complete list of primary mode definitions.

How do you define each trip purpose included in Replica?

Click here for a complete list of trip purpose definitions.

How do you define the seasons in Places?

Places data is modeled in seasons, with data available from Fall 2019, Spring 2021, Fall 2021, Fall 2022, and Spring 2023 seasons. See below the definitions of Spring and Fall:

Spring: A typical day in March, April, May

Fall: A typical day in September, October, November
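If you need to line up your own dated observations with these seasons, the month ranges above translate into a small lookup. The function name is a hypothetical helper, not part of Replica's platform, and months outside the two defined seasons simply return None.

```python
def places_season(month):
    """Map a calendar month (1-12) to the Places season it falls in.

    Returns None for months outside the Spring and Fall definitions.
    """
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    if month in (3, 4, 5):
        return "Spring"
    if month in (9, 10, 11):
        return "Fall"
    return None


print(places_season(4))   # April falls in Spring
print(places_season(10))  # October falls in Fall
```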

Okay, now that I have a general understanding of the data outputs, I have more questions...

More about Replica's demographic data

What version of ACS data does Replica use in its datasets?

Trends datasets use 2019 ACS/PUMS data.

Details about which data sources were used for specific Places seasons can be found by clicking the megaregion name in the filter bar of your Study, then clicking the "view data source detail" button. Click here for step-by-step instructions on how to view these details.

Do you have information on the industry of employment of workers in a given area?

Yes, the industry of employment is an attribute that is available in our seasonal (Places) dataset. In Studies, you can filter by specific industries of employment to better understand the movements of the workers as well as their demographic characteristics. "Industry" is also a field within the People table in the dataset tab of your study.

Replica uses 2017 NAICS codes in the Industry field. The mapping between codes and their descriptions can be found here.

Level of granularity of NAICS codes

Our goal is to share the most detailed industry information possible. However, there are some instances where the level of granularity varies. The primary source that helps us infer more detailed industry of employment data is point-of-interest and parcel-level land use data. In some instances, multiple NAICS codes are inferred from a single parcel. When this happens, we calculate the proportion of NAICS codes at different levels. For instance, if three unique NAICS codes are found in a person’s workplace, "231023", "231024", and "231025", we are 100% sure the 5-digit NAICS code is “23102”, while we are only 33% sure that the 6-digit NAICS code is any particular one of the three options. In this case, we set the detailed industry as the 5-digit NAICS code “23102”. There are also instances where industries have only up to 4 digits instead of 6 digits.
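The worked example above can be reproduced with a short Python sketch. It is a simplified illustration rather than Replica's actual pipeline: the function names are hypothetical, and it assumes each code found at the workplace carries equal weight.

```python
from collections import Counter


def naics_level_shares(codes, max_digits=6):
    """For each digit level, return the most common prefix and its share."""
    shares = {}
    for n in range(2, max_digits + 1):
        prefixes = Counter(code[:n] for code in codes if len(code) >= n)
        if not prefixes:
            break  # no code is detailed enough for this level
        prefix, count = prefixes.most_common(1)[0]
        shares[n] = (prefix, count / len(codes))
    return shares


def detailed_industry(codes, max_digits=6):
    """Return the longest prefix shared by every code, or None if there is none."""
    best = None
    for _, (prefix, share) in sorted(naics_level_shares(codes, max_digits).items()):
        if share == 1.0:
            best = prefix
    return best


codes = ["231023", "231024", "231025"]
print(naics_level_shares(codes)[6])  # one 6-digit code, held by a third of the entries
print(detailed_industry(codes))      # the shared 5-digit prefix
```

For the three codes in the example, every level up to 5 digits has a share of 100%, while each 6-digit code has a share of one third, so the sketch settles on the 5-digit prefix.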

Do you have work-from-home (WFH) or telecommuting data?

Yes, we have WFH data in our Places dataset. Click here for step-by-step instructions on how to view this data.

Learn more about our WFH module and our WFH methodology here.

Do you have information on visitors?

Yes, Places simulations reflect the complete activities and movements of residents, visitors, and commercial vehicle fleets in a region and season on a typical day. You can use the "Resident/Visitor Status" module within the Places Studies Summary Panel (see screenshot below) to filter movements made by visitors.

In Replica, visitors are people who do not normally live or work in the Places megaregion, and either stayed overnight in the megaregion, or entered and exited via a “port of entry” (usually an airport) the same day.

Please note that we do not include the demographic characteristics (such as age and income) of visitors. When you filter to view visitors only, you will not see any data in the demographic modules, and if you filter using a demographic characteristic you will not see any visitors.

What is the best way to analyze tourism with Replica data?

Does Replica model cross-border trips to Mexico or Canada?

We don't currently model cross-border trips in Replica. For more background, we have no points of interest in Mexico or Canada, nor do we have any individuals in our simulated population that have home or work locations assigned in these countries. As a result, our simulation does not model trips to and from them. In some border-adjacent areas, you may see some origin or destination points that are "outside of region", meaning they did not start or end within the boundaries of the megaregions that border these countries. These are instances where our router (the component of our model that determines the roadways used by trips on surface modes) has assigned parts of Mexico's or Canada's road network to the route from point A to point B for trips traveling near border-adjacent areas. In some instances, trips may make an intermediary stop along the Mexico or Canada road networks (presumably at border facilities), resulting in an "outside of region" origin or destination.

More about Replica's modeled trip modes

Do "private auto" trips include rental cars?

Yes, trips made by private auto vehicles can include those taken by rental cars.

What type of transit is included in the "public transit" mode?

Replica models all transit modes available in a given region, provided we have GTFS data for them. This includes bus, rail, light rail, subway, ferry, and even gondola. Click here for a list of valid options that are included in public transit trips.

Does Replica surface time spent waiting at a transit stop during a trip?

In Replica, the trip start time is “optimized” so that there's no wait time before the first transit leg of a transit trip. Subsequent transfer time duration can be obtained by looking at the end and start times of consecutive legs of a transit trip (this information is accessible through direct database access offered via BigQuery).
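As a rough illustration of the calculation described above, the sketch below derives transfer times from the start and end times of consecutive legs. The input shape (a time-ordered list of start/end pairs for one trip) is an assumption for the example and does not reflect the actual BigQuery schema. It also assumes the list contains every leg of the trip, so that any gap between legs is time spent waiting.

```python
from datetime import datetime, timedelta


def transfer_waits(legs):
    """Return the gap between each pair of consecutive legs of one trip.

    `legs` is a time-ordered list of (start, end) datetime pairs.
    """
    waits = []
    for (_, prev_end), (next_start, _) in zip(legs, legs[1:]):
        waits.append(next_start - prev_end)
    return waits


# Hypothetical two-leg transit trip: a bus ride followed by a rail ride.
legs = [
    (datetime(2023, 4, 6, 8, 0), datetime(2023, 4, 6, 8, 20)),
    (datetime(2023, 4, 6, 8, 27), datetime(2023, 4, 6, 8, 50)),
]
print(transfer_waits(legs))  # a single 7-minute transfer
```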

Does Replica have data for trips taken by e-bikes?

Replica includes data patterns for biking trips. However, we do not currently separate out e-bikes. Click here to learn how to view biking patterns in Replica.

Are modes such as private shuttles, paratransit, and scooters included in Replica?

No, these modes are not currently modeled by Replica.

Does Replica model trips made by school buses? How would they be categorized?

No, Replica does not explicitly model trips made by school buses. These trips would fall under other mode categories, like auto passengers of private auto vehicles, private auto (for those old enough to drive), or walking.

How do you model Taxi/TNC data?

As noted above, Taxi/TNC data is available in Places only. Mobile location data alone does not allow reliable identification of trips made by an on-demand auto (vs., for example, a carpool). It is preferable to use auxiliary ground truth data sources to represent travel by taxi and TNCs more accurately.

Aggregate Taxi/TNC ridership is estimated with a regression model (an approach often called a "direct demand model") that takes into account land use, day-of-the-week, and time-of-day activity at points of interest observed via mobile location data, walkability, and transit accessibility scores for origins and destinations of Taxi/TNC trips.

Do trip durations for Taxi/TNC trips include waiting time?

No, they do not.

How can I view walking (pedestrian) and biking data in Replica?

Replica models purposeful walking and biking trips that have distinct origin and destination points. Click here for a step-by-step guide on how to view biking and walking data in Replica.

While we do model biking and walking trips in Replica, one caveat to note is that we don't currently model looping trips without a specific destination (for example, walking a dog or jogging around a park trail). As part of our technical approach, we assign trips to trails when there is an explicit place of interest associated with it and/or if the path is identified as the most time-efficient route to get from a person's origin to their destination.

How does Replica determine which network link direction bike/pedestrian trips take?

Pedestrians take whichever side is faster based on their start/end points. The direction can also be determined by origin and destination.

Why do I see high percentages of trips with a primary mode of "Other" when looking at movements near airports?

"Other" trips most likely represent people arriving on flights. We don't currently model airline trips directly in Replica, and we also intentionally remove most movements we suspect are airline flights. That said, there are many flights in our mobility datasets and sometimes they show up in our model outputs, likely as "Other".

Since we don't explicitly model air travel, we don't recommend analyzing the "Other" trips themselves.

Can I view motorcycle trips in Replica?

We do not currently break down vehicle trips made by motorcycles.

I have filtered to households with zero available vehicles, why are there still private auto trips listed?

Zero-vehicle households do still report private autos as their commute mode in ACS data, and Replica applies similar logic to account for cars that are borrowed, rented, or provided by an employer.

More about Replica's mobility data

Is it possible to create a Study to see movements between two megaregions? (Ex: I want to see movements between GA [South Atlantic] to TN [South Central])

Right now we don't support the ability to look at travel between two megaregions, though see the notes below:

If you are looking at movements coming to or from Atlanta, for example, in the South Atlantic region, you may notice that some trips are labeled as "out of region" in the origins and destination modules or within the origin and destination fields in the trip table. Some of the "out of region" trips include those coming from what we refer to internally as "the donut region", which is approximately the bordering counties of the states that are included in each megaregion. The types of trips we would capture coming to or from the donut are those made by residents of these counties who either work or go to school in the states modeled within the megaregion and make a trip on a given day, or external-to-external trips that use a road within the megaregion. If you are looking at an area of TN (included in our South Central megaregion) that is right along the border (e.g., Chattanooga), it's possible we could provide you with some data about trip flows between Chattanooga and Atlanta that fall into the categories above.

We also have aggregate nationwide O/D pair data available in Trends, where you could download OD data for trips starting in a particular city in Georgia, for example, and see how many are ending in a particular city in TN (or vice versa).

Does Replica have VMT data?

Yes, VMT data is available in Replica's seasonal (Places) and weekly (Trends) datasets. View How to View VMT Data in Replica to learn more.

Is it accurate to assume that you can calculate the average speed for private auto trips in the Places trip table by taking the total distance of private auto trips and dividing it by the total duration of all private auto trips?

Yes, for private auto trips. Transit speeds are defined by the schedule (buses do not get stuck in traffic), and speeds are fixed for cycling and walking.
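The calculation described in the question can be sketched as follows. The field names and units (miles and minutes) are assumptions for illustration and may not match the columns in your trip table.

```python
def average_speed_mph(trips):
    """Total distance divided by total duration across a set of trips."""
    total_miles = sum(trip["distance_miles"] for trip in trips)
    total_hours = sum(trip["duration_minutes"] for trip in trips) / 60
    if total_hours == 0:
        raise ValueError("total duration is zero")
    return total_miles / total_hours


# Hypothetical private auto trips pulled from a trip table export.
auto_trips = [
    {"distance_miles": 10.0, "duration_minutes": 20.0},
    {"distance_miles": 5.0, "duration_minutes": 10.0},
]
print(average_speed_mph(auto_trips))  # 15 miles in 0.5 hours
```

Dividing the totals weights longer trips more heavily than averaging each trip's individual speed would, which is usually what you want for an aggregate speed.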

My transit numbers in Trends seem off. What's going on?

The Trends Mode Split module is based on trip origin and should be evaluated for a core-based statistical area (versus a city) to capture all commuters.

Does Replica model food delivery trips?

We do not explicitly model food delivery trips in Replica. Since these types of trips are detected in our underlying data sources, it's likely that they appear in our modeled data with a trip purpose of either SOCIAL or ERRANDS (MAINTENANCE) or SHOP.

What are the differences between the network link volumes available in Places compared to Replica's AADT data?

Replica’s detailed activity-based model, Places, includes a complete trip and population table for a typical weekday and weekend day in a given season. Data outputs are available down to the route level, enabling customers to look at the routes of individual trips and query individual network links to understand the volumes and characteristics of trips that utilize those specific links.

Replica offers nationwide AADT data with coverage across both urban and rural areas including >98% of Functional Roadway Classifications (FRC) Class 1-4 roads, and >80% of FRC Class 5-6 roads. Volumes are broken down by vehicle class, including single-unit and combination-unit trucks, on 98% of all FRC Class 1 and 2 roads.

While both datasets leverage a composite of location data collected from personal mobile devices and in-dashboard telematics, there are several differences between the network link volumes available in Places and Replica’s AADT data product. The first is the technical approach that we take to deliver these datasets. The network link volume data available in Replica Places is an output of a data pipeline that simulates activity for a typical weekday and weekend day over a 13-week modeling season. We calibrate Places output data to reflect the movements that are happening for that typical day of the week in that given season. AADT counts are annual average daily counts and do not take into account the same types of day-specific or seasonal factors that we account for in Places. Instead, AADT data has its own set of scaling factors that account for the location and time of year at which the underlying data was collected. For AADT, we annualize the input source data that we receive and scale it to match observed AADT ground truth data, which includes data from the Federal Highway Administration (FHWA) and from various DOTs across the country.

The second difference between the two datasets is that the network link volume data available in Places can be broken down by individual surface modes (private auto, auto passenger, walking, biking, commercial, TNC, transit). Each trip modeled in Places is assigned a primary mode and a route containing a set of network links taken to its destination. This enables customers to filter by a specific mode to see individual route characteristics. Currently, Replica’s AADT product includes AADT of any surface mode for roadways.

When do you recommend using network link volumes vs. AADT data?

  • If you are looking for auto volumes with the highest level of accuracy as compared to Federal Highway Administration (FHWA) data, we recommend Replica’s AADT data product. As noted above, we scale our AADT counts to match FHWA data.

  • If you want to understand network link volumes in addition to other mobility patterns, like origins and destinations, or demographics of the travelers passing through a route, we recommend using Places data, as these are detailed, activity-based travel demand models that contain detailed information about the trips tied to individual routes and the trip-takers.

Where can I learn more about these datasets?

  • Click here to learn more about Replica's AADT data methodology, quality metrics, and data schema.

  • Click here for more information on how to access Replica's AADT data in our platform.

  • Click here for more information on Places data, including network link volume data.

  • Click here for information on how to access Places network link volume data in our platform.

More about Replica's commercial vehicle (freight) data

How can I see commercial vehicle (freight) trips in Replica?

Click here for a step-by-step guide on how to view commercial vehicle data in Replica.

What freight vehicle classifications does Replica model?

As noted previously, freight data is included in Places only. The FHWA vehicle classifications we apply to models, unless otherwise specified by customer criteria, are listed below.

  • FHWA classes 4-6 are medium (14,000 - 26,000 lbs)

  • FHWA classes 7-8 are heavy (>26,000 lbs)
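For reference, the two groupings above can be expressed as a simple lookup. The function name is hypothetical, and classes outside 4-8 return None because the list above does not assign them to a freight category.

```python
def freight_weight_category(fhwa_class):
    """Map an FHWA class number to the freight category listed above."""
    if 4 <= fhwa_class <= 6:
        return "medium"  # 14,000 - 26,000 lbs
    if 7 <= fhwa_class <= 8:
        return "heavy"   # over 26,000 lbs
    return None          # not covered by the list above


print(freight_weight_category(5))
print(freight_weight_category(8))
```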

What type of data sources does Replica use to model freight movements?

Replica uses GPS data as the raw source to model freight movements. This particular dataset comes from transponders installed on vehicles for fleet management purposes under specific privacy guidelines. As vehicles go through the route network, their time and speed when they enter and exit certain portions of the route network are recorded. We are currently evaluating additional vendors to further refine our freight models.

What are the sample sizes of Replica's freight data sources?

Replica currently sources data from three different vendors:

  • Vendor 1 details approximately 80 million trips driven by over 75 million vehicles daily nationwide

  • Vendor 2 provides data from about 125k vehicles daily nationwide for evaluation

  • Vendor 3 has provided a small geographic area for evaluation

Do Replica's freight data sources include freight data from major retailers, like Costco or Walmart?

Providers do not specify if trucks belong to a particular retailer. However, POI information along trips is provided, highlighting trucks going to major retailers such as Costco or Walmart. Freight trips to and from refineries, major airports, ports, and factories are also included.

How are freight movements calibrated?

Replica builds a tour-based model, meaning we identify tours from truck data and scale them using ground truth counts. The scaling coefficients vary by geography and time.

Are you currently capturing the data around the goods included in freight movements?

Not currently. However, this is a functionality we are exploring.

Do you have dwell times/idle times for commercial vehicles in Replica?

Not currently.

More about Replica's Land Use data

What type of land use data is available in Replica?

You have access to aggregate land use data in both Places and Trends to visualize and download land use data for all census-based geographies (down to block group level in Places and tract-level in Trends) and custom geographies. This data allows you to quantify the current use of land and buildings for your study areas.

Land use data includes land area (i.e., parcel or lot use) and building area (built area used for each specific use), both in square feet, as well as dwelling units. Replica land uses are categorized per the table below.

Land Use Categories

Click here to learn more about our land use data.

More about Replica's geographic data

What version of census-based geographies/boundaries are you using in both Trends and Places?

The census-based boundaries shown in Replica are from the 2010 decennial census. We recently uploaded custom geographies for 2020 census boundaries that you can use in your Studies. You can find them in the Geo Breakdown under "Org Geographies".

More about Replica's spend data

Note: Spend data is available in Trends only.

Is Replica's spend data adjusted for inflation?

Replica's spend data shows nominal dollars spent each year, meaning they represent the actual value of currency at the time it was spent. Totals are not adjusted for inflation.

Consumer spend includes all transactions, including credit card, debit card, and cash transactions, that take place at a point of sale, such as at retail stores, supermarkets, restaurants, taxis, and bars. It also includes e-commerce transactions for a select number of these categories.

The data does not include all household expenditures; for example, rent, car payments, and healthcare spending are excluded. This most closely aligns Replica's consumer spend metric to the Census Bureau's Monthly Retail Trade Estimates. Transactions are categorized by the merchant’s NAICS code.

Here is a detailed overview of how Trends spend data is categorized:

Replica Category | NAICS Code(s) | Notes
Retail | | Excludes codes 445 and 447 specified below
Grocery Stores | 445 - Food and beverage stores |
Gas Stations, Parking, Taxis, Tolls | 447 - Gasoline stations; 4853 - Taxi and limousine service; 81293 - Parking lots and garages |
Restaurants & Bars | 722 - Food services and drinking places |
Airline, Hospitality, Car Rental | 72 - Accommodation and food services; 481 - Air transportation; 5321 - Automotive equipment rental and leasing | Excludes code 722 specified above
Entertainment & Recreation | 71 - Arts, entertainment, and recreation |
Other | All other NAICS codes |
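
The category mapping above behaves like a longest-prefix match on the merchant's NAICS code, which is how the "excludes" notes fall out naturally (for example, 722 is matched before 72). The sketch below illustrates that idea only; it is not Replica's implementation. The function name is hypothetical, and the Retail prefixes 44 and 45 are an assumption based on the standard NAICS retail trade sector, since the mapping above does not list Retail codes explicitly.

```python
PREFIX_TO_CATEGORY = {
    "44": "Retail",  # assumption: NAICS retail trade sector (44-45)
    "45": "Retail",  # assumption: NAICS retail trade sector (44-45)
    "445": "Grocery Stores",
    "447": "Gas Stations, Parking, Taxis, Tolls",
    "4853": "Gas Stations, Parking, Taxis, Tolls",
    "81293": "Gas Stations, Parking, Taxis, Tolls",
    "722": "Restaurants & Bars",
    "72": "Airline, Hospitality, Car Rental",
    "481": "Airline, Hospitality, Car Rental",
    "5321": "Airline, Hospitality, Car Rental",
    "71": "Entertainment & Recreation",
}


def spend_category(naics_code):
    """Return the spend category for a NAICS code via longest-prefix match."""
    code = str(naics_code)
    # Try the full code first, then progressively shorter prefixes, so that
    # more specific entries (722) win over broader ones (72).
    for length in range(len(code), 0, -1):
        category = PREFIX_TO_CATEGORY.get(code[:length])
        if category is not None:
            return category
    return "Other"


print(spend_category("722511"))  # restaurant code, matched by prefix 722
print(spend_category("721110"))  # hotel code, matched by prefix 72
```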

Data only includes transactions made at a point of sale (or online in those categories); it does not include all household expenditures such as rent, car payments, and healthcare spending.

What data sources does Replica use for consumer spend?

There are three main parties in consumer transactions: issuers/banks, merchants, and credit card companies. Replica currently has data from two of these parties (issuers and merchants) and is evaluating adding the third.

Does Replica use ground truth data in consumer spend?

Yes, Replica uses the Census Bureau's Monthly Retail Trade Estimates as historical calibration for our models so that we can provide accurate scaled estimates of relevant spend categories.

Does Replica separate out e-commerce (online) from in-person (offline) spend?

Yes, you have the ability to see a breakdown of online-offline consumer spend by home location for three major categories: Restaurants and Bars, Grocery Stores, and Retail. This spend is counted in the location where the purchase would be received and taxed, i.e., the purchaser's home address.

Why are online and offline breakdowns not available for certain spend categories?

The share of online and offline spend was not distinguished for certain spend categories due to the high variability of online and offline spend in these categories. For example, in our “Gas Stations, Parking, Taxis, and Tolls” spend category, gas stations essentially have no online transactions whereas taxis and TNCs have a significant number of online transactions. In the future, we hope to provide spend data further broken down into subcategories of spend.

Why is spend by merchant location not broken down by online and offline transactions?

Currently, spend data by merchant location captures offline transaction records only. This is primarily because online transactions are difficult to pinpoint to a singular merchant location. For example, online purchases made at major online retailers, such as Amazon, cannot be mapped to a singular merchant location.

How would you categorize online food delivery services, like DoorDash or Uber Eats?

These transactions would be classified as online spend for restaurants.


Replica's Data Sources

What data sources does Replica use to deliver its insights?

Replica uses a diverse set of third-party data from public and private-sector sources. These sources include five categories of data that are outlined below.

Each of Replica’s data processing pipelines leverages a composite of these diverse data sets. This process minimizes the risk of sampling bias that exists in any single source on its own. For example, a product that relies more heavily on data from personal mobile devices risks failing to adequately simulate the portions of the population that do not have mobile devices or those who opt out of device-tracking technologies. Our composite approach also creates resiliency against data quality issues and protects against disruptions of individual data sources.

  1. Mobile location data: To create a representative sample of daily movement patterns within a place, Replica uses multiple types of location data collected from personal mobile devices and in-dashboard telematics. Replica only acquires de-identified mobile location data. See more detail on the multiple types of location data we source below. Previous versions of Replica’s model also included cellular network data as another source of mobile location data.

    • Location-based services (LBS) data: As people move around with their phones in the real world they use mobile apps that rely on their location. Users opt-in to sharing their location when using the apps.

    • Vehicle in-dash GPS data: The data on vehicles' speeds and locations geo-matched to a particular road segment (telematics data) collected by GPS systems is transferred to the centralized data storage and processing systems that monitor real-time congestion. End-user license agreements (EULAs) regulate the use of these telematics data by third parties. This is currently used in Places and will be used in Trends in the future.

    • Point-of-interest (POI) data: Aggregate data on the number of mobile devices present in a given venue (e.g., a park or a shopping mall). Aggregators of this information provide a total count of devices in their sample, providing a signal to estimate the relative occupancy weights at different points and areas of interest.

  2. Consumer/resident data: Demographic data from public and private sources provide the basis for determining where people live and work, and the characteristics of the population, such as age, race, income, and employment status.

  3. Built environment data: Land use data (such as zoning regulations), building data (such as total square footage and use types), and transportation network data (such as road and transit networks) are used to determine where people live, work, and shop, and by what means it is possible to travel to each activity.

  4. Economic activity data: Includes all transactions, including credit card, debit card, and cash transactions, that take place at a point of sale. With this input, Replica depicts the level and types of spending that occurred at a particular time and place.

  5. Ground truth data: Ground truth data is used to calibrate and improve the overall accuracy of Replica outputs. The types of ground truth collected by Replica include auto and freight volumes, and transit ridership. Ground truth is both acquired directly by Replica and provided by customers like yourselves.

Okay great, I have more questions about these data sources...

What type of data is used to create your synthetic population?

Replica Places uses samples of Census demographic data, such as Public Use Microdata Samples (PUMS), American Community Survey (ACS), Census Transportation Planning Products Program (CTPP), and Longitudinal Employer-Household Dynamics (LEHD) data, to create a “synthetic population” that is statistically representative of the actual population. Details about which version of Census data we use for a megaregion can be found by clicking on the [Megaregion name, Season, Day] filter button in the filter bar, and then selecting "View Data Sources Details" from the dropdown.

What are the sources of Replica's battery electric vehicle (BEV) data?

Replica uses third-party consumer marketing data to understand the geospatial distribution of BEV owners and the relationship between BEV ownership and sociodemographic attributes. We also use BEV registration and sales data from sources like Atlas EV Hub to calibrate state-level totals.


Replica's Methodology

A detailed overview of Replica's methodology can be found here.

What are Replica Places?

Replica Places simulations reflect the complete activities and movements of residents, visitors, and commercial vehicle fleets in a region and season on a typical day. Places are delivered as megaregions, each covering between 10 and 50 million residents and multiple states, enabling the entire contiguous United States and Hawaii to be produced in 11 megaregions.

The output of each simulation is a complete, disaggregate trip and population table for an average weekday and average weekend day in the subject season (e.g., Fall 2021). The model represents a 24-hour period with second-by-second temporal resolution and point-of-interest-level spatial resolution. Each row of data in the simulation output reflects a single trip, with characteristics about both the trip (e.g., origin, destination, mode, purpose, routing, duration) and trip taker (e.g., age, race/ethnicity, income, home location, work location).

Each completed model also includes an associated Quality Report, which compares the outputs of the simulation to ground truth data, enabling you to compare Replica’s modeled outputs with observed counts.

Replica updates Places models annually.

How do you create Places data?

Click here to read our Places methodology.

What are Replica Trends?

Replica Trends is a nationwide activity-based model updated each week with near-real-time data on mobility, consumer spending, and land use. Trends has census-tract-level fidelity with mobility data including origins and destinations, mode split, and residential VMT, and total consumer spending data across a number of sub-categories, including retail, grocery stores, restaurants, and travel. Replica generates its data by running large-scale, computationally intensive simulations. These simulations allow us to deliver granular data outputs that match behavior in aggregate, but don’t surface the actual movements (or compromise the privacy) of any one individual.

How do you create Trends data?

Click here to read our Trends Mobility data methodology

Click here to read our Trends Spend data methodology

How do you validate Replica's data?

Places modeled outputs are compared to aggregate control group data (i.e., observed counts, or "ground truth"), like transit counts and traffic counts, for the purposes of calibration and to ensure quality. If the comparison yields an unacceptable gap between outputs and observed data, the model parameters can be iteratively adjusted, increasing the quality of the overall Places model.

Each completed Places model includes an associated Quality Report which compares the outputs of the model to ground truth data, enabling users to compare model outputs to observed counts. Replica’s goal in providing this level of transparency regarding the measurability of each modeled season is to ensure you have the statistical confidence you need to use Replica outputs for key policy and transportation decisions.

We base all Trends metrics on ground truth data, both internally and externally sourced. For example, we use data from the Bureau of Transportation Statistics for mobility ground truth and from the Census Bureau for economic activity ground truth. We do not generate a quality report as we do in Places because, for the vast majority of our data, there is no week-to-week or day-to-day ground truth to compare it against directly; we're the first to ever generate those numbers. Instead, we calibrate our metrics based on historical regression to ground truth. For certain metrics labeled 'Quantified' Trends Estimates (including total trips), we have a quantified representation of the margin of error for each day. We may roll this out to other metrics over time where possible, although for many metrics this will never be possible because no real ground truth exists.

What type of ground truth data do you use to calibrate and validate Places models?

Details about which data sources were used for specific models and seasons can be found by clicking on the [Megaregion name, Season, Day] filter button in the filter bar, and then selecting "View Data Sources Details" from the dropdown.

How are movement activities determined in Replica?

In Places, Replica creates travel behavior models ("personas") to determine movement activity in the model. Personas extract behavioral patterns from de-identified mobile location data collected from mobile device(s) of real people. They are composed of three main underlying behavioral choice models: activity scheduling, destination location, and travel mode. Each synthetic person, with its assigned persona and its travel behavior models, is “motivated to travel.”

Learn more about personas here.

How does Replica determine the mode of travel?

For Places, Replica’s mode choice model consists of two distinct components: a mode inference model and a mode choice model. Click here to learn more about our mode choice model.

For Trends, Replica builds a mode choice model based on geographical location of origin and destination, the distance between the two, and observed aggregate transit usage to assign modes to trips.

How does Replica handle multi-modal trips?

Many trips use multiple modes, such as walking to a bus stop and then riding the bus. Replica faithfully models these trips. For example, if someone takes a 2-mile bus ride to connect to a 10-mile rail trip this is a single trip with two segments (bus to rail stop, rail stop to destination) and Replica will model both of these.

When a trip involves multiple modes, the trip is assigned a primary mode using the following ranking: 1) Public transit, 2) Driving (private auto)/Auto passenger/Taxi/TNC, 3) Biking, 4) Walking.
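As a rough illustration of the ranking above, the sketch below picks a primary mode from the modes of a trip's segments. The mode labels and the `primary_mode` function are hypothetical names chosen for this example; they are not Replica's schema or implementation.

```python
# Hypothetical mode labels; a lower rank number means higher priority.
# The ranks mirror the ordering described above.
MODE_RANK = {
    "public_transit": 1,
    "private_auto": 2,
    "auto_passenger": 2,
    "taxi_tnc": 2,
    "biking": 3,
    "walking": 4,
}


def primary_mode(segment_modes):
    """Return the highest-priority mode among a trip's segments."""
    if not segment_modes:
        raise ValueError("a trip needs at least one segment")
    # min() keeps the first mode seen when several share the same rank.
    return min(segment_modes, key=lambda mode: MODE_RANK[mode])


# Walk to a bus stop, ride the bus, then walk to the destination.
print(primary_mode(["walking", "public_transit", "walking"]))  # public_transit
```

Under this sketch, a trip that combines biking and walking would be labeled as biking, since biking ranks above walking.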

How does Replica count transit transfers and accessing transit?

A transit-to-transit transfer is part of the same trip in Replica, as a trip begins when a person leaves a place and ends when a person stops to do a non-travel activity in a place. Therefore, a single trip can include multiple modes as well as multiple transit legs (e.g., transferring from one transit service to another transit service). Replica's trip counts generally reflect "linked" trips; however, "unlinked" transit legs will be counted/described in Places if a user clicks on an individual transit station.

Currently, in our model, all access to transit is made via walking. However, the entire trip would have a primary mode of transit and it would count as a single trip.
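To make the difference between "linked" trips and "unlinked" transit legs concrete, here is a small sketch using made-up trips. The data layout, the service names, and the helper `is_transit_trip` are assumptions for illustration only, not how Replica stores or counts trips.

```python
# Hypothetical trips: each trip is a list of (mode, service) legs.
trips = [
    # Walk, then bus, then transfer to rail: one linked trip, two transit legs.
    [("walking", None), ("public_transit", "Bus 12"), ("public_transit", "Red Line")],
    # Walk, then bus: one linked trip, one transit leg.
    [("walking", None), ("public_transit", "Bus 12")],
    # Drive only: not a transit trip.
    [("private_auto", None)],
]


def is_transit_trip(trip):
    """A trip counts as a transit trip if any of its legs uses transit."""
    return any(mode == "public_transit" for mode, _ in trip)


# Linked count: each trip is counted once, however many transfers it has.
linked_transit_trips = sum(1 for trip in trips if is_transit_trip(trip))

# Unlinked count: every individual transit leg (boarding) is counted.
unlinked_transit_legs = sum(
    1 for trip in trips for mode, _ in trip if mode == "public_transit"
)

print(linked_transit_trips, unlinked_transit_legs)  # 2 3
```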

Does Replica include long-distance buses in its models?

Yes, Replica includes some intercity buses in its models. When we look at ridership for intercity bus routes, we consider whether a route is a prominent commuter route and/or whether the route exists mostly within a single megaregion. If a route doesn't meet these criteria, we don't include it in our model at this time. Trips we observe on these excluded routes are likely assigned a primary mode of "private auto", "carpool" or "other" in Replica.

How does Replica calculate auto passenger ("carpool") trips?

Auto passenger is calculated by looking at the number of trips destined to an area based on our multiple datasets and assigning appropriate attributes and behaviors to the synthetic population making those trips. From there, the model applies a mode choice option. As an example, 'auto passenger' mode would be a choice applied to children and teenagers (under driving age), to people who register as carpoolers or passengers in their Census commute data, etc. Once the mode choices are applied, we then calibrate with ground truth to arrive at the correct apportionment for mode split.

Does Replica provide data on vehicle occupancy?

We do not currently offer data on vehicle occupancy.

How does Replica determine trip purpose?

In Places, Replica uses a fine-grained location choice model (LCM) to determine location choices for discretionary activities (i.e. not home/work/school) made by the device owner of the de-identified mobile location data. The model selects individual venues (businesses, shops, services) and Points of Interest (parks, places of historic interest, tourist attractions) as potential destinations.

How does Replica determine trip purpose in a dense urban area?

Replica relies on having an independent and accurate source for the total number of visits for every venue / point-of-interest. Destinations of individual trips are always randomized within proximity of real observed locations. If a persona trained from a cellular device went to a shopping plaza, the synthetic person guided by that persona would go to any of the businesses at that plaza, proportionally to how popular they are in aggregate on that day of the week and hour of the day. For example, if a real person went to a hairdresser and not Target, it is very likely that a synthetic person guided by that persona will go to Target (because of its relative popularity) and some other synthetic person would go to the hairdresser.
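The idea of choosing a destination in proportion to aggregate popularity can be sketched as a weighted random draw. The venue names, the visit counts, and the `choose_venue` function below are invented for illustration; this is a minimal sketch of proportional sampling, not Replica's location choice model.

```python
import random


def choose_venue(visit_counts, rng):
    """Pick one venue with probability proportional to its visit count."""
    venues = list(visit_counts)
    weights = [visit_counts[venue] for venue in venues]
    return rng.choices(venues, weights=weights, k=1)[0]


# Hypothetical aggregate visits for one plaza, day of week, and hour of day.
plaza_visits = {"Target": 900, "Hairdresser": 60, "Coffee shop": 40}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
draws = [choose_venue(plaza_visits, rng) for _ in range(10_000)]

# Roughly 90% of synthetic visits should land at the most popular venue.
share_target = draws.count("Target") / len(draws)
print(round(share_target, 2))
```

Most synthetic people in this sketch go to Target, while a small share still goes to the hairdresser, so the aggregate split matches the observed popularity without reproducing any individual's real destination.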

How does Replica capture trips made to airports?

Replica models trips made by visitors of a megaregion who are traveling to the airport to depart the region by plane. These trips have a trip purpose category of "Region Departure".

We also capture trips to airports made by residents of a megaregion who work at the airport. These would be categorized as "work" trips.

We do not currently model trips made by residents of a megaregion who are traveling to the airport to depart the region by plane, or traveling from the airport after returning to the region by plane.

How does Replica model movements of minors?

Residents below the age of 5 do not receive a travel persona (learn more about personas in the "How do you create Places models?" question above). All their travel is assumed to be represented by the travel of an accompanying adult from the household. Adult personas are likely to include trips to daycare en route to work. For example, an accompanying adult's persona might start at home, drop the kids off at daycare, go to work, run an errand after work, pick the kids up, and go home.

How does Replica take data privacy into consideration?

Click here to learn more about our approach to privacy.

How is your data coverage in rural or sparsely populated areas?

We use a composite of the data sources described above, with national coverage from public and private sources, including mobile location data, vehicle counts, and demographics. Although the data for rural areas will reflect their lower population density, we do have coverage for these areas.

One category of data where we have less data coverage for these types of areas is transit data. This is because we currently model transit agencies with a minimum ridership threshold of 500 daily boardings per route, which means smaller agencies in smaller cities often do not have transit coverage today.

How do you account for underrepresented populations, like those without access to mobile phones, or those who don't participate in the Census?

Replica’s population is calibrated against recent Census ACS estimates. It is of course possible and sometimes the case that not all people are represented in the Census, which is why our calibration to Census does include small margins of error. Replica does not require seeing all mobile devices in a given geographic location, so access to mobile devices is not necessary for scaling Replica samples to represent movements of all people.

Does Replica capture homeless populations?

The Census does attempt to capture homeless populations but there are known issues. Some people who are experiencing homelessness do have phones so we likely are seeing some of their movements.

How should I interpret the "certainty indicator" in a Places Study?

The certainty indicator in a Places Study is a simplified abstraction of Replica's quality reports (available here). We generally recommend that users with high sensitivity to quality consult the actual quality report for the respective megaregion. However, we recognize that many people simply need a directional sense of quality. The certainty indicator provides insight into confidence based primarily on the number of trips and the number of filters. As a general rule, more trips and fewer filters yield higher confidence. Conversely, fewer trips and more filters yield lower confidence. However, in some cases the conditions of the trips or filters (meaning the presence of network-level ground truth in a given area) produce a "low confidence" indicator in the interface even though the results are in fact very high confidence, given the available ground truth. In short, the certainty indicator should be used as a quick guide for whether or not consulting the detailed quality report is necessary. High confidence indicators generally mean "proceed with low caution," whereas low confidence indicators mean "proceed with caution, consult quality report."


General Capability Questions

Can I access Replica data through an API?

Not currently. However, this is a functionality we are exploring. Click here to tell us how you would leverage an API.

What reference coordinate system is used by Replica?

Replica uses the WGS 84 or CRS 84 coordinate system.

Can I use Replica to forecast?

The granularity and flexibility of Places data makes it easy to use for forecasting. For instance, our public sector users work with Replica's preferred partners to run impact assessments on top of Places data.

Additionally, Replica’s Scenario product can be used to model the impact of population and employment changes on travel demand and the transportation network nationwide. Scenario harnesses the Replica Places modeling pipeline to produce baseline conditions (Base Year) and forecast conditions (Forecast Year) for your selected study area. Our first version accepts changes to population, employment, and work from home rates. Future iterations will allow other inputs such as changes to the transportation network, land use, or behavioral preferences. For more information about Scenario, click here.

What is the 'gz' extension on the files and how can I open it?

A gzip file is similar to a zip file. Unfortunately, the default zip utility packaged with Windows will not recognize gzip files. We recommend using 7-Zip or WinZip to extract their contents. 7-Zip is free, WinZip offers a free trial, and both are easy to use.
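If you prefer to work with the file in code rather than extracting it first, Python's standard `gzip` module can read it directly. The sketch below assumes the download is a gzip-compressed CSV; the file name, column names, and the `read_gz_csv` helper are placeholders for illustration and do not describe an actual Replica export.

```python
import csv
import gzip
import os
import tempfile


def read_gz_csv(path):
    """Read a gzip-compressed CSV file into a list of dict rows."""
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Build a tiny stand-in file so the example runs on its own.
# In practice, point `path` at the .gz file you downloaded.
path = os.path.join(tempfile.mkdtemp(), "example_export.csv.gz")
with gzip.open(path, mode="wt", newline="", encoding="utf-8") as handle:
    writer = csv.writer(handle)
    writer.writerow(["trip_id", "primary_mode"])  # placeholder columns
    writer.writerow(["1", "walking"])

rows = read_gz_csv(path)
print(rows)
```

Opening the file in text mode (`"rt"`) decompresses it on the fly, so no separate extraction step is needed.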

What is the "Data Uploads" page? How does Replica use custom data that we upload to this page?

You can create or upload custom study areas to use in Replica through the Data Uploads page. These are called custom geographies. They are designed to make it easier to view data for the study areas most relevant to you, like city neighborhoods or state districts.

Replica accepts zipped shapefiles, KML, or GeoJSON file formats.

Once custom geographies are successfully uploaded and processed by Replica, you can view them in Trends Dashboards or Places Studies, just as you previously have with census-defined geographies.

In Trends, mobility data is aggregated based on the centroid of H3 resolution 9 cells, and spend data is aggregated based on the centroid of census tracts. If the centroid of an H3 resolution 9 cell falls within your custom geography, all mobility data from that cell will be counted in your custom geography. If the centroid of a census tract falls within your custom geography, all spend data from that tract will be counted in your custom geography.
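The centroid rule for Trends can be illustrated with a simple point-in-polygon check. The sketch below uses a standard ray-casting test on made-up coordinates and values; the function names, the square study area, and the cell values are assumptions for this example and do not reflect Replica's actual processing.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: is (x, y) inside the polygon (a list of (x, y) vertices)?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal ray extending right from the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def total_in_geography(cells, polygon):
    """Sum the values of all cells whose centroid falls inside the polygon."""
    return sum(value for centroid, value in cells if point_in_polygon(centroid, polygon))


# Hypothetical custom geography: a square given as (x, y) vertices.
custom_geography = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]

# Hypothetical cells: (centroid, trip count). The last centroid lies outside.
cells = [((1.0, 1.0), 120), ((3.5, 2.0), 80), ((5.0, 2.0), 200)]

print(total_in_geography(cells, custom_geography))  # 200
```

A cell is counted in full or not at all, depending only on where its centroid falls. This means a cell that straddles the boundary of a custom geography contributes either all of its data or none of it.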

In Places, where Replica uses disaggregate data, trips are assigned to custom geographies based on their startpoints or endpoints. For example, if you select a custom geography as the trip origin point, Places will generate a total of all trips that started within the selected area.

Click here to read more about how to create or upload custom geographies in Replica.

What are Trends events?

Events allow you to visually indicate moments in time in order to understand their impact on the built environment. Use events to contextualize sudden or gradual changes in Trends, or to analyze the micro-level causes and effects of short-term or near-term policy and planning decisions. To learn more about Events, watch a short video here.

How can Replica visualizations and data be integrated into third-party platforms?

Click here for a compilation of examples of how Replica visualizations and data can be integrated into third-party platforms.

Why are there trips with the purpose work for unemployed people?

We do expect to see a small percentage of unemployed people take work trips. Replica generates its synthetic population based on ACS data, in which employment status is binary and relevant to an entire year. However, an "unemployed" person may have been employed occasionally, part-time for some duration, or at a temporary job. Some additional reasons include:

  • Unemployed workers going to interviews

  • Day laborers

  • Students with part-time jobs may be categorized as not in the labor force, but still take work trips
