Modal Title
Data / Edge Computing

Creating an IoT Data Pipeline Using InfluxDB and AWS

Reliable data pipelines are critical to industry, providing key information about the state of machinery and to train predictive models to improve operations.
Jun 5th, 2023 10:25am by
Featued image for: Creating an IoT Data Pipeline Using InfluxDB and AWS

The Internet of Things (IoT) and operations technology (OT) spaces are awash in time series data. Sensors churn out time-stamped data at a steady and voluminous pace, and wrangling all that data can be a challenge. But you don’t want to collect a bunch of data and let it just sit there, taking up storage space without providing any value.

So, in the spirit of gaining value from data for IoT/OT, let’s look at how to configure a sensor using InfluxDB and Amazon Web Services (AWS) to collect the type of time-stamped data generated by industrial equipment. You can adapt, expand and scale the concepts here to meet the needs of full industrial production needs. Here’s what we’ll use in the example and why.

  • InfluxDB: This time series database just released an updated version that outperforms previous versions in several key areas, such as query performance against high cardinality data, compression and data ingest. Critically for this example, InfluxDB’s cloud products are available in AWS. Having everything in the same place reduces the number of data transfers and data-latency issues in many instances.
  • Telegraf: Telegraf is an open source, plugin-based data-collection agent. Lightweight and written in Go, you can deploy Telegraf instances anywhere. With more than 300 available plugins, Telegraf can collect data from any source. You can also write custom plugins to collect data from any source that doesn’t already have a plugin.
  • AWS: Amazon Web Services has a whole range of tools and services geared toward IoT. We can tap into some of these services to streamline data processing and analysis.
  • M5stickC+: This is a simple IoT device that can detect a range of measurements, including position, acceleration metrics, gyroscope metrics, humidity, temperature and pressure. This device provides multiple data streams, which is similar to what industrial operators face with manufacturing equipment.

InfluxDB and AWS IoT in Action

The following example illustrates one of many possible data pipelines that you can scale to meet the needs of a range of industrial equipment. That may include machines on the factory floor or distributed devices in the field. Let’s start with a quick overview; then we’ll dive into the details.

The basic data flow is:

Device → AWS IoT Core → MQTT (rules/routing) → Kinesis → Telegraf → InfluxDB → Visualization

The M5stickC+ device is set up to authenticate directly with AWS. The data comes into AWS IoT Core, which is an AWS service that uses MQTT to add the data to a topic that’s specific to the device. Then there’s a rule that selects any data that goes through the topic and redirects it to the monitoring tool, Amazon Kinesis.

A virtual machine running in AWS also runs two Docker containers. One contains an instance of Telegraf and the other an instance of InfluxDB. Telegraf collects the data stream from Kinesis and writes it to InfluxDB. Telegraf also uses a DynamoDB table as a checkpoint in this setup so if the container or the instance goes down, it will restart at the correct point in the data stream when the application is up again.

Once the data is in InfluxDB, we can use it to create visualizations.

OK, so that’s the basic data pipeline. Now, how do we make it work?

Device Firmware

The first step is creating a data connection from the M5 device to AWS. To accomplish this we use UI Flow, a drag-and-drop editor that’s part of the M5 stack. Check out the blocks in the image below to get a sense for what the device is collecting and how these things map to our eventual output.

We can see here that this data publishes to the MQTT topic m5sticks/MyThingGG/sensors.

AWS IoT Core Rules

With the device data publishing to an MQTT broker, next we need to subscribe to that topic in AWS IoT Core. In the topic filter field, enter m5sticks/+/sensors to ensure that all the data from the device ends up in the MQTT topic.

Next, we need to create another rule to ensure that the data in the MQTT topic goes to Kinesis. In IoT Core, you can use a SQL query to accomplish this:

SELECT *, topic(2) as thing, 'sensors' as measurement, timestamp() as timestamp FROM 'm5sticks/+/sensors'

In an industrial setting, each device should have a unique name. So, to scale this data pipeline to accommodate multiple devices, we use the +` wildcard in the MQTT topic to ensure that all the data from all devices end up in the correct place.

This query adds a timestamp to the data so that it adheres to line protocol, InfluxDB’s data model.

Telegraf Policy

Now that data is flowing from the device to Kinesis, we need to get it into InfluxDB. To do that, we use Telegraf. The code below is the policy that determines how Telegraf interacts with Kinesis. It enables Telegraf to read from the Kinesis stream and enables read and write access to DynamoDB for checkpointing.

Telegraf Config

The following Telegraf config uses Docker container networking, the Telegraf Kinesis Consumer plugin to read the data from Kinesis and the InfluxDB v2 output plugin to write that data to InfluxDB. Note that the string fields match the values from the device firmware UI.

Docker Compose

This Docker compose file gets uploaded to an EC2 instance using SSH.

Visualization

Once your data is in InfluxDB, users running InfluxDB can create visualizations and dashboards using your tool of choice. InfluxDB offers a native integration with Grafana and supports SQL queries using Flight SQL-compatible tools.

Conclusion

Building reliable data pipelines is a critical aspect of industrial operations. This data provides key information about the current state of machinery and equipment, and can train predictive models used to improve machinery operations and effectiveness. Combining leading technologies like InfluxDB and AWS provides the tools necessary to capture, store, analyze and visualize this critical process data.

The use case described here is just one way to build a data pipeline. You can update, scale and amend it to accommodate a wide range of industrial IoT/OT operations or build something completely custom using the range of IoT solutions available for InfluxDB and AWS.

Group Created with Sketch.
TNS owner Insight Partners is an investor in: Pragma, Docker.
THE NEW STACK UPDATE A newsletter digest of the week’s most important stories & analyses.