How to Incorporate Live Traffic Data into Your Routes

A tutorial on how to set up a dynamic, real-time routing solution that finds the best path when traffic on each road segment is constantly changing.

May 10th, 2023 7:00am by Hari Subhash

Featued image for: How to Incorporate Live Traffic Data into Your Routes

Road networks can be modeled as graphs where each edge in the graph represents a specific road segment. And each edge (road segment) can have an associated “weight” that indicates things like traffic, the time for travel, distance, etc.

Graph solvers like the shortest path algorithm can then be applied to these weighted networks to find the best route between two points. A “graph solver” is an algorithm that is used to solve common graph network problems. I am using Kinetica’s library of graph solvers for this demonstration. We can query Kinetica’s graph API using SQL which makes it really easy to use. The graph outputs also pair well with relational tables making it easy to plug graph solvers into existing analytical pipelines.

The challenge when it comes to applying this to the real world is that traffic on each road segment is constantly changing. A solver, therefore, needs to account for these real-time traffic weights to find the best route between a source and a destination point.

This article shows how to set up a dynamic, real-time routing solution that finds the best path from a randomly selected source point in Washington, D.C., to Union Station (a railway, bus and Metro station). Each proposed route will incorporate the latest traffic information that is streaming in from a Kafka topic.

We also will compare each of them with the shortest path generated without the updated traffic weights. Both of these routes are executed every 10 seconds with a new randomly selected source point.

Figure 1: The green route in the map below uses updated real-time weights while the blue one does not.

This demo is set up using Kinetica. Scroll down to the last section to see how to try this out on your own.

The Data

I have set up the following data for this demonstration.

D.C. road network data: This is a CSV file on AWS S3 that contains spatial data (WKT line strings) that describes roads in Washington, D.C., along with additional information on the direction (one-way vs. two-way), distance and average time for traversing each segment.
Traffic weights data: This is a stream of fake traffic data associated with each edge in the D.C. road network that I have set up using Kafka. Each message in the Kafka topic consists of an edge ID that corresponds to the ID in the D.C. road network data and an associated traffic weight. Note that this data is fake. I have generated it to show the effects of applying real-time weights when solving a routing problem.
A list of source points: This CSV file on AWS S3 contains a set of starting points for each trip to Union Station. We select one point from this list at random when executing the solver.

Create the Graph

Our first task is to create the graph using the road network data. The query below specifies the edges using the ID and the WKTLINE.

A WKTLINE is shorthand for a MULTILINE WKT (Well Known Text) string. WKT is a markup language that is used to represent spatial geometries. The figure above shows an example. (Image source: Wikipedia)

The weights for the graph are specified as the average of the total time for a particular line string divided by the number of points in it. This is because the edges in the graph are created based on the number of points in each edge WKTLINE.

CREATE OR REPLACE DIRECTED GRAPH dc_graph
(
EDGES => INPUT_TABLE 
(
SELECT  
id  AS ID,
wkt AS WKTLINE,                                            
dir AS DIRECTION,
time/(ST_numPoints(wkt)-1) AS WEIGHT_VALUESPECIFIED
FROM dc_roads
),
OPTIONS => KV_PAIRS
(
graph_table = 'dc_graph_table'
)
);

CREATE OR REPLACE DIRECTED GRAPH dc_graph

(

EDGES => INPUT_TABLE

(

SELECT

id AS ID,

wkt AS WKTLINE,

dir AS DIRECTION,

time/(ST_numPoints(wkt)-1) AS WEIGHT_VALUESPECIFIED

FROM dc_roads

OPTIONS => KV_PAIRS

(

graph_table = 'dc_graph_table'

)

);

Get the Most Recent Weights Using a Materialized View

The Kafka topic receives one message at a time that represents the most recent traffic cost associated with a particular edge. A sample message is shown below.

{
"timestamp": 1681938040473,
"id": "727339712",
"traffic_cost": 9.58
}

{

"timestamp": 1681938040473,

"id": "727339712",

"traffic_cost": 9.58

}

These messages are added as rows in a table in Kinetica. There are no restrictions on the number of times that traffic information could be recorded for an edge. For instance, we could have the traffic cost for an edge from one1 hour ago, 10 minutes back and the current time. However, we only want the most recent weight for each edge when we find the best route.

Additionally, the weights are reported for each edge. We, therefore, need to divide the weights for a particular edge by the number of points associated with that edge (just as we did when we created the graph). However, the message on Kafka does not include the spatial representation of the road segments — WKTLINE. To get this, we need to join the most recent weights with the D.C. road network data that contains the edge WKTLINE.

And all of this needs to be executed really fast so that the solver always has up-to-date weights. We can address this by using two materialized views (MV) in Kinetica. Read this article to learn more about real-time materialized views in Kinetica.

The first MV finds the most recent data for each edge ID.

CREATE OR REPLACE MATERIALIZED VIEW recent_weights 
REFRESH EVERY 1 SECOND AS 
SELECT 
    id,
    ARG_MAX(timestamp, traffic_cost) as traffic_cost,
    MAX(timestamp) AS timestamp
FROM traffic_weights
GROUP BY id

CREATE OR REPLACE MATERIALIZED VIEW recent_weights

REFRESH EVERY 1 SECOND AS

SELECT

id,

ARG_MAX(timestamp, traffic_cost) as traffic_cost,

MAX(timestamp) AS timestamp

FROM traffic_weights

GROUP BY id

The second materialized view joins the recent weights view above with the D.C. road network data that contains information on the number of points in each edge in the data. This gives us the most recent weights after taking into account the number of points in each edge WKTLINE. These weights can now be used for finding optimal routes on the D.C. road graph network.

CREATE OR REPLACE MATERIALIZED VIEW weights_mapped 
REFRESH ON CHANGE AS 
SELECT 
    rw.id,
    wkt,
    time,
    rw.traffic_cost/(ST_numPoints(wkt)-1) as traffic_cost
FROM recent_weights rw, dc_roads d 
WHERE rw.id = d.id

CREATE OR REPLACE MATERIALIZED VIEW weights_mapped

REFRESH ON CHANGE AS

SELECT

rw.id,

wkt,

time,

rw.traffic_cost/(ST_numPoints(wkt)-1) as traffic_cost

FROM recent_weights rw, dc_roads d

WHERE rw.id = d.id

Note that both the materialized views are set to refresh, so this means that every second these queries are updated to give us the most recent weights for each edge in the D.C. graph.

Real-Time Solver

Now that we have the weights and graph network setup we can go ahead and run the solvers. In the workbook (shared in the last section) I use a SQL procedure that is executed once every 10 seconds to run three separate queries.

The first performs a simple task: pick a random starting point for the trip to Union Station in Washington, D.C. from the list of source points.

CREATE OR REPLACE TABLE source_point AS 
SELECT source AS WKTPOINT 
FROM source_points 
ORDER BY RAND() 
LIMIT 1;

CREATE OR REPLACE TABLE source_point AS

SELECT source AS WKTPOINT

FROM source_points

ORDER BY RAND()

LIMIT 1;

The second query runs the shortest path solver on the most recent weights. Notice the WEIGHTS_ON_EDGES parameter, here we are applying the most recent edge weights from the materialized view.

CREATE OR REPLACE TABLE path_with_weight AS
SELECT * FROM TABLE 
(
    SOLVE_GRAPH
    (
        GRAPH => 'dc_graph',
        SOLVER_TYPE => 'SHORTEST_PATH',
        SOURCE_NODES => INPUT_TABLE(SELECT WKTPOINT FROM source_point),
        DESTINATION_NODES => INPUT_TABLE(SELECT ST_GEOMFROMTEXT('POINT (-77.0074811 38.8976992)') AS WKTPOINT),
        WEIGHTS_ON_EDGES => INPUT_TABLE
        (
           SELECT 
                id AS EDGE_ID,                                 
                traffic_cost AS VALUESPECIFIED
            FROM hsubhash_kinetica.weights_mapped
        )
    )
);

CREATE OR REPLACE TABLE path_with_weight AS

SELECT * FROM TABLE

(

SOLVE_GRAPH

(

GRAPH => 'dc_graph',

SOLVER_TYPE => 'SHORTEST_PATH',

SOURCE_NODES => INPUT_TABLE(SELECT WKTPOINT FROM source_point),

DESTINATION_NODES => INPUT_TABLE(SELECT ST_GEOMFROMTEXT('POINT (-77.0074811 38.8976992)') AS WKTPOINT),

WEIGHTS_ON_EDGES => INPUT_TABLE

(

SELECT

id AS EDGE_ID,

traffic_cost AS VALUESPECIFIED

FROM hsubhash_kinetica.weights_mapped

)

);

The last query repeats the same solver, but this time we don’t apply the new weights. Instead, the graph is solved using the default weights that were used when it was created.

CREATE OR REPLACE TABLE path_without_weights AS
SELECT * FROM TABLE 
(
    SOLVE_GRAPH
    (
        GRAPH => 'dc_graph',
        SOLVER_TYPE => 'SHORTEST_PATH',
        SOURCE_NODES => INPUT_TABLE(SELECT WKTPOINT FROM source_point),
        DESTINATION_NODES => INPUT_TABLE(SELECT ST_GEOMFROMTEXT('POINT (-77.0074811 38.8976992)') AS WKTPOINT)      
    )
);

CREATE OR REPLACE TABLE path_without_weights AS

SELECT * FROM TABLE

(

SOLVE_GRAPH

(

GRAPH => 'dc_graph',

SOLVER_TYPE => 'SHORTEST_PATH',

SOURCE_NODES => INPUT_TABLE(SELECT WKTPOINT FROM source_point),

DESTINATION_NODES => INPUT_TABLE(SELECT ST_GEOMFROMTEXT('POINT (-77.0074811 38.8976992)') AS WKTPOINT)

)

);

Both the routes, with and without updated traffic weights, are stored in tables in Kinetica. We can compare them below.

Route Comparisons

The images below show the route comparisons between the same starting point to Union Station. The green routes incorporate the latest weights while the blue uses the default weights. These routes are generated from random starting points once every 10 seconds.

Try This Out on Your Own

You can try all of this on your own for free by downloading the workbook for this article here and importing it into Kinetica Cloud or Kinetica’s Developer Edition. I have pre-configured all the data so you don’t have to do any additional setup to run this workbook.

Both Kinetica Cloud and Developer Edition are free to use. The cloud version takes only a few minutes to set up and it is a great option for quickly touring the capabilities of Kinetica. The Developer Edition is also easy to set up and takes about 10 to 15 minutes and an installation of Docker. The Developer Edition is a free forever personal copy that runs on your computer and can handle real-time computations on high-volume data feeds.

Download this academic paper to learn how Kinetica-Graph is implemented.

Hari Subhash is director of community content strategy and enablement for Kinetica. He has more than 15 years of experience in the data analytics space across a wide range of industry sectors such as financial management, international development and technology....