Getting to the Heart of On-Demand Processing
In this series of engineering-focused blog posts, members of Arturo’s engineering team will dissect the infrastructure backing Arturo’s on-demand property analytics API.
For this first post, I’ll discuss the three types of services that compose our service graph: Machine Learning (ML) Models, 3rd-Party Data Providers, and Feature Aggregators.
Layers of a solution
Simplified service dependency graph. Some feature aggregators are dependent on other feature aggregators. For example, predicting fire risk requires the roof outline from another feature aggregator in order to localize the analysis on a specific roof.
How does Arturo’s API translate a simple text address to detailed information on the vast array of property attributes we support today?
Feature Aggregators are responsible for the highest-level orchestration logic. They are the first layer of services: each expects an address as input and returns some subset of features, such as roof material and condition or the location of pools on the property. These services gather all the necessary input and postprocessing context to deliver the feature set they’re responsible for in an easily parsable format.
- The first step is a combination of address normalization and geocoding, which takes an address like `1747 grey avenue, 60201` and yields a normalized address (e.g. `1747 Grey Ave, Evanston, IL 60201, USA`) and coordinates (e.g. `lat: 42.0501356, lng: -87.702193`).
- Translating address coordinates to a geo-referenced parcel is also part of the process, providing the full extent of a property.
- Now we know the area on the globe we are looking for. We then fetch imagery from various satellite and aerial 3rd-Party Data Providers and pass it directly to our ML Models, which indicate, for example, where there are buildings in the images.
- For some customers, it’s enough to return a simple building count or highlight the buildings in an image. Others require georeferenced objects or property characteristics in some standard format (e.g. GeoJSON), which we derive from image metadata.
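The steps above can be sketched as a single orchestration function. This is only an illustrative outline, not Arturo's actual code: `count_buildings`, `GeocodeResult`, and the four injected callables (`geocode`, `lookup_parcel`, `fetch_image`, `detect_buildings`) are hypothetical names standing in for the geocoder, parcel lookup, imagery provider, and ML Model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class GeocodeResult:
    normalized_address: str
    lat: float
    lng: float


# Hypothetical representation: a polygon as a list of (lng, lat) vertices.
Polygon = List[Tuple[float, float]]


def count_buildings(
    address: str,
    geocode: Callable[[str], GeocodeResult],
    lookup_parcel: Callable[[float, float], Polygon],
    fetch_image: Callable[[Polygon], bytes],
    detect_buildings: Callable[[bytes], List[Polygon]],
) -> dict:
    """Hypothetical feature aggregator: text address in, building count out."""
    # Step 1: normalize the address and resolve it to coordinates.
    location = geocode(address)
    # Step 2: translate the coordinates to the parcel covering the property.
    parcel = lookup_parcel(location.lat, location.lng)
    # Step 3: fetch imagery for the parcel from a data provider.
    image = fetch_image(parcel)
    # Step 4: ask the ML Model where the buildings are.
    buildings = detect_buildings(image)
    return {
        "address": location.normalized_address,
        "lat": location.lat,
        "lng": location.lng,
        "building_count": len(buildings),
    }
```

Passing the dependencies in as callables is a design choice made for this sketch; it keeps the orchestration logic testable with stubs in place of real providers and models.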
Gathering 3rd-party data
We use 3rd-Party Data Providers to access imagery and its derivatives (i.e., Digital Surface Models / Point Clouds) to support end-user requirements at the property level. In order to support our strict on-demand SLA of less than five seconds, we use the following key performance indicators (KPIs) to assess each provider:
- Coverage: what percentage of our locations of interest are covered?
- Latency: what’s the p50, p75, p99?
- Volume: how many requests per time period are supported? How volatile can our request volume be?
- Uptime: how many 9s do they guarantee?
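As an illustration of the latency KPI, the percentiles could be computed from a sample of observed response times roughly as follows. This is a minimal sketch: the function names are hypothetical, and the nearest-rank method is just one common way to define a percentile, not necessarily the one we use.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Summarize provider latency at the percentiles listed above."""
    return {f"p{pct}": percentile(samples_ms, pct) for pct in (50, 75, 99)}
```

A high p99 matters as much as a good p50 here, because a single slow provider call can consume most of a five-second budget.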
Meeting this KPI standard is the first step in including a 3rd-Party Data Provider in our on-demand processing pipeline. However, we won’t necessarily call every provider on every request. Aerial imagery is typically higher resolution than satellite imagery (aircraft fly closer to the ground), but satellite providers typically have significantly higher coverage. So even for an address that has aerial coverage, we might not need the higher-resolution image if we only want to analyze property-level features like “building count”. If we want to analyze features of a specific building (e.g. “roof material”), we do need a higher-resolution image, so we’d call our aerial providers first and fall back to lower-resolution satellite providers if there isn’t aerial coverage.
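The selection logic described above might look something like the sketch below. All names (`Provider`, `choose_provider`, `BUILDING_LEVEL_FEATURES`) are hypothetical, and trying satellite providers first for property-level features is an assumption made for illustration; the only ordering stated above is aerial first, then satellite, for building-level features.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Provider:
    name: str
    kind: str  # "aerial" or "satellite"
    has_coverage: Callable[[float, float], bool]


# Hypothetical set of features that need a higher-resolution image.
BUILDING_LEVEL_FEATURES = {"roof_material"}


def choose_provider(
    feature: str, lat: float, lng: float, providers: List[Provider]
) -> Optional[Provider]:
    """Return the first provider with coverage, in order of preference."""
    if feature in BUILDING_LEVEL_FEATURES:
        # Building-level analysis: prefer aerial, fall back to satellite.
        preferred_kinds = ["aerial", "satellite"]
    else:
        # Assumption: property-level features can start with satellite.
        preferred_kinds = ["satellite", "aerial"]
    for kind in preferred_kinds:
        for provider in providers:
            if provider.kind == kind and provider.has_coverage(lat, lng):
                return provider
    return None  # no provider covers this location
```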
To predict “roof presence” as an example, we would first pull a high-zoom image and transform it to be as close as possible to the shape the ML Model was trained on. We then pass the image to the ML Model, which returns a segmentation mask, i.e., an image that says whether each individual pixel is a roof pixel. For customers who require geo-referenced objects, we project pixel coordinates to a geographic coordinate system (e.g., WGS 84).
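To make the projection step concrete, here is a simplified sketch that maps pixel coordinates to longitude and latitude and wraps a polygon as a GeoJSON feature. It assumes a north-up image whose WGS 84 bounding box is known and whose pixels are evenly spaced in degrees; real imagery often uses another projection (for example Web Mercator) or carries an affine geotransform, so treat the linear interpolation, the function names, and the `label` property as assumptions.

```python
from typing import Dict, List, Tuple

# (west, south, east, north) in degrees, WGS 84.
Bounds = Tuple[float, float, float, float]


def pixel_to_lnglat(
    col: float, row: float, width: int, height: int, bounds: Bounds
) -> Tuple[float, float]:
    """Linearly interpolate a pixel position within the image bounds."""
    west, south, east, north = bounds
    lng = west + (col / width) * (east - west)
    # Row 0 is the top of the image, which is the northern edge.
    lat = north - (row / height) * (north - south)
    return lng, lat


def ring_to_geojson(
    pixel_ring: List[Tuple[float, float]],
    width: int,
    height: int,
    bounds: Bounds,
    label: str,
) -> Dict:
    """Convert a polygon outline in (col, row) pixels to a GeoJSON Feature."""
    coords = [
        list(pixel_to_lnglat(col, row, width, height, bounds))
        for col, row in pixel_ring
    ]
    # GeoJSON polygon rings must be closed: first and last positions equal.
    if coords and coords[0] != coords[-1]:
        coords.append(list(coords[0]))
    return {
        "type": "Feature",
        "properties": {"label": label},
        "geometry": {"type": "Polygon", "coordinates": [coords]},
    }
```

Note that GeoJSON positions are ordered longitude first, then latitude, which is the reverse of the `lat, lng` order used in the geocoding example earlier.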
By combining these efficient processing pipelines, we’re able to deliver up to 50 highly accurate property characteristics and predictive indicators to our customers in under five seconds, all on demand.
For Arturo customers that rely on the latest property analytics, our on-demand API saves both cost and time, as they no longer need to perform in-person property inspections.
Our engineering and applied machine learning teams continue to iterate on all three types of services (Feature Aggregators, 3rd-Party Data Providers, and ML Models) to improve our overall response time and make our predictions and their confidence more accurate.
In a future engineering blog post, we’ll discuss the underlying infrastructure for all Arturo services, including hardware and common software frameworks we use.
Josh Trotter | Engineering Lead