Leverage Amazon SageMaker Geospatial Capabilities to Gain Insights from Mobility Data

Geospatial data is data about specific locations on the earth’s surface. It can represent a geographical area as a whole or an event associated with a geographical area. Analysis of geospatial data is sought after in a few industries. It involves understanding where the data exists from a spatial perspective and why it exists there.

There are two types of geospatial data: vector data and raster data. Raster data is a matrix of cells arranged as a grid, mostly used for photographs and satellite imagery. In this post, we focus on vector data, which is represented as geographical coordinates of latitude and longitude as well as lines and polygons (areas) connecting or encompassing them. Vector data has a multitude of use cases in deriving mobility insights. User mobile data is one such kind of vector data. It’s derived mostly from the geographical position of mobile devices, collected using GPS or by app publishers using SDKs or similar integrations. For the purpose of this post, we refer to this data as mobility data.
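
To make the vector representation concrete, the following is a minimal sketch that expresses a point, a line, and a polygon as GeoJSON-style Python dictionaries and computes a bounding box for each. The coordinates and the names `ping`, `route`, `store_area`, `positions`, and `bounding_box` are hypothetical and chosen only for illustration; GeoJSON itself is an assumption here, because the post does not prescribe a format.

```python
# GeoJSON orders each position as [longitude, latitude].
# All coordinates below are made up for illustration.
ping = {"type": "Point", "coordinates": [-122.3321, 47.6062]}

route = {
    "type": "LineString",
    "coordinates": [[-122.3321, 47.6062], [-122.3300, 47.6080], [-122.3275, 47.6101]],
}

# A polygon is a list of rings; each ring is closed (first position == last).
store_area = {
    "type": "Polygon",
    "coordinates": [[
        [-122.3330, 47.6055],
        [-122.3310, 47.6055],
        [-122.3310, 47.6070],
        [-122.3330, 47.6070],
        [-122.3330, 47.6055],
    ]],
}


def positions(geometry):
    """Yield every [lng, lat] position in a Point, LineString, or Polygon."""
    kind, coords = geometry["type"], geometry["coordinates"]
    if kind == "Point":
        yield coords
    elif kind == "LineString":
        yield from coords
    elif kind == "Polygon":
        for ring in coords:
            yield from ring
    else:
        raise ValueError(f"unsupported geometry type: {kind}")


def bounding_box(geometry):
    """Return (min_lng, min_lat, max_lng, max_lat) for a geometry."""
    lngs, lats = zip(*positions(geometry))
    return (min(lngs), min(lats), max(lngs), max(lats))


print(bounding_box(ping))
print(bounding_box(route))
print(bounding_box(store_area))
```

A single mobility ping corresponds to the point, a sequence of pings from one device to the line, and a place of interest such as a store footprint to the polygon.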

This is a two-part series. In this first post, we introduce mobility data, its sources, and a typical schema of this data. We then discuss the various use cases and explore how you can use AWS services to clean the data, how machine learning (ML) can aid in this effort, and how you can make ethical use of the data in generating visuals and insights. The second post will be more technical in nature and cover these steps in detail alongside sample code. This post does not have a sample dataset or sample code; rather, it covers how to use the data after it’s purchased from a data aggregator.

You can use Amazon SageMaker geospatial capabilities to overlay mobility data on a base map and provide layered visualization to make collaboration easier. The GPU-powered interactive visualizer and Python notebooks provide a seamless way to explore millions of data points in a single window and share insights and results.

Sources and schema

There are a few sources of mobility data. Apart from GPS pings and app publishers, other sources are used to augment the dataset, such as Wi-Fi access points, bid stream data obtained via serving ads on mobile devices, and specific hardware transmitters placed by businesses (for example, in physical stores). It’s often difficult for businesses to collect this data themselves, so they may purchase it from data aggregators. Data aggregators collect mobility data from various sources, clean it, add noise, and make the data available on a daily basis for specific geographic regions. Due to the nature of the data itself and because it’s difficult to obtain, the accuracy and quality of this data can vary considerably, and it’s up to the businesses to appraise and verify this by using metrics such as daily active users, total daily pings, and average daily pings per device. The following table shows what a typical schema of a daily data feed sent by data aggregators may look like.

Attribute	Description
Id or MAID	Mobile Advertising ID (MAID) of the device (hashed)
lat	Latitude of the device
lng	Longitude of the device
geohash	Geohash location of the device
device_type	Advertising ID type of the device, which indicates its operating system: IDFA (iOS) or GAID (Android)
horizontal_accuracy	Accuracy of horizontal GPS coordinates (in meters)
timestamp	Timestamp of the event
ip	IP address
alt	Altitude of the device (in meters)
speed	Speed of the device (in meters/second)
country	ISO two-letter code for the country of origin
state	Codes representing state
city	Codes representing city
zipcode	Zip code of the location where the device ID is seen
carrier	Carrier of the device
device_manufacturer	Manufacturer of the device
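
The quality metrics mentioned before the table (daily active users, total daily pings, and average daily pings per device) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an aggregator's actual method: it assumes `timestamp` holds Unix epoch seconds in UTC, treats each distinct `id` as one active user, and uses hypothetical sample rows. The function name `daily_quality_metrics`, the helper `ts`, and the `max_accuracy_m` parameter are invented for this example.

```python
from collections import defaultdict
from datetime import datetime, timezone


def ts(year, month, day, hour):
    """Build a Unix timestamp (UTC seconds) for the hypothetical sample rows."""
    return int(datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp())


# Hypothetical sample feed; a real feed carries the full schema from the table.
pings = [
    {"id": "device-a", "timestamp": ts(2023, 11, 14, 8), "horizontal_accuracy": 10.0},
    {"id": "device-a", "timestamp": ts(2023, 11, 14, 12), "horizontal_accuracy": 15.0},
    {"id": "device-a", "timestamp": ts(2023, 11, 14, 18), "horizontal_accuracy": 8.0},
    {"id": "device-b", "timestamp": ts(2023, 11, 14, 9), "horizontal_accuracy": 250.0},
    {"id": "device-a", "timestamp": ts(2023, 11, 15, 7), "horizontal_accuracy": 12.0},
    {"id": "device-c", "timestamp": ts(2023, 11, 15, 20), "horizontal_accuracy": 30.0},
]


def daily_quality_metrics(rows, max_accuracy_m=None):
    """Summarize a feed per UTC day.

    If max_accuracy_m is given, pings whose horizontal_accuracy (meters)
    exceeds it are dropped before counting.
    """
    devices_by_day = defaultdict(set)
    pings_by_day = defaultdict(int)
    for row in rows:
        if max_accuracy_m is not None and row["horizontal_accuracy"] > max_accuracy_m:
            continue
        day = datetime.fromtimestamp(row["timestamp"], tz=timezone.utc).date().isoformat()
        devices_by_day[day].add(row["id"])
        pings_by_day[day] += 1
    return {
        day: {
            "daily_active_users": len(devices_by_day[day]),
            "total_daily_pings": pings_by_day[day],
            "avg_daily_pings_per_device": pings_by_day[day] / len(devices_by_day[day]),
        }
        for day in sorted(pings_by_day)
    }


metrics = daily_quality_metrics(pings)
filtered = daily_quality_metrics(pings, max_accuracy_m=100)

for day, summary in metrics.items():
    print(day, summary)
```

Comparing `metrics` with `filtered` shows how much of a day's volume depends on low-accuracy pings, which is one simple way to appraise a feed before building on it.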

Use cases

…
