Towards a Modular IoT Stack

After ten sometimes bumpy years in IoT, I would like to describe the pitfalls I encountered and propose a way to build a more modular, resilient IoT stack.

The Organically Grown IoT Stack

About ten years ago, there was a lot of hype, and nobody wanted to miss the IoT train. No one knew where the IoT train was headed, but there was no way you could afford to miss it. With great enthusiasm, teams were formed and sent on the IoT journey. Since the destination of the journey was unclear, the focus was on what was already visible: there were sensors, gateways, and, of course, the cloud. The decision was easy, and the embedded developers were divided into a sensor team and a gateway team. Anyone who had already worked with the cloud was placed in the cloud team. And the managers? They didn’t trust the idea and purchased a “turnkey IoT solution”.

It quickly became clear that, due to the network topology, the cloud could not connect directly to the IoT gateway — and so an agent was installed on the gateway to establish an outgoing connection to the cloud.
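
To make the idea of an outgoing connection concrete, here is a minimal Python sketch of how such an agent might dial out and keep retrying with exponential backoff and jitter. It is an illustration under assumptions, not the agent described in this post: the function names `backoff_delays` and `connect_with_retry`, the delay values, and the retry limit are all hypothetical.

```python
import random
import time
from typing import Callable, Iterator


def backoff_delays(base: float = 1.0, cap: float = 60.0) -> Iterator[float]:
    """Yield delay ceilings that double each time: base, 2*base, ... up to cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * 2, cap)


def connect_with_retry(
    connect: Callable[[], object],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> object:
    """Call connect() until it succeeds, sleeping a jittered delay between tries."""
    delays = backoff_delays()
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            # The gateway dials out, so no inbound port has to be opened.
            return connect()
        except OSError as err:
            last_error = err
            if attempt == max_attempts:
                break
            # Random jitter keeps a whole fleet from reconnecting in lockstep.
            sleep(random.uniform(0, next(delays)))
    raise ConnectionError(f"gave up after {max_attempts} attempts") from last_error
```

On a real gateway, `connect` might wrap `socket.create_connection((host, port))` or a TLS or MQTT client. The retry loop stays the same either way.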

From then on, this connection was the linchpin for all communication between the cloud and the edge. The agent was expanded to include more and more functionality, and its complexity increased steadily:

[Figure: Organically grown IoT stack]

Whenever a new feature had to be developed, it was a challenge to find the right people from the different teams. In addition, the approaches and mindsets of the various teams were extremely different and sometimes incompatible. Then, in the middle of the project, the “turnkey IoT solution” that had originally been selected was discontinued. Needless to say, replacing it shook the foundations of the overall architecture.

It was time to rethink the approach…

The Multi-Layered IoT Stack

Over the years, it has become clear which useful and necessary use cases IoT can actually address. Based on these findings, I would like to propose an architecture that is organized around use cases rather than around the obvious building blocks:

[Figure: Multi-layered IoT stack]

The teams could be divided according to use cases:

Edge Hardware and Board Support Packages

This hardware-oriented team is responsible for providing suitable hardware and ensuring that it can run appropriate software. It also maintains the base software over the long term so that improvements and security updates can be delivered promptly throughout the service life of the devices. Nowadays, nobody needs to invent an operating system, yet full control over this fundamental building block remains indispensable. Choosing and maintaining systems based on Yocto, Zephyr, or Debian gives you this freedom.

IoT Connectivity

The connectivity team is responsible for the strategy governing how devices connect to the cloud, both initially and throughout their service life. Depending on the application, communication between edge devices must be secured as well, which also falls to this team.

Device Provisioning and Management

By the time an IoT device goes online for the first time, its software could already be horribly outdated. The device provisioning team ensures that these devices can be updated to the latest version. The update functionality must be designed to be reliable and durable. A faulty update mechanism can result in entire shipments of devices having to be updated manually, which is a laborious and time-consuming task. Of course, the team can come up with its own solution, but by now there are proven solutions such as Mender, RAUC, or SWUpdate that deserve consideration.
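
One common way to make updates reliable is a dual-slot (A/B) scheme: the new image is written to the slot that is not running, and the device only switches over once a health check has passed. The Python sketch below models that idea in a few lines. It is a simplified illustration, not the API of Mender, RAUC, or SWUpdate; the class `DualSlotDevice`, the slot names, and the version strings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class DualSlotDevice:
    """Hypothetical device with two root filesystem slots, 'A' and 'B'."""

    active: str = "A"
    slots: Dict[str, str] = field(
        default_factory=lambda: {"A": "1.0.0", "B": "1.0.0"}
    )

    def inactive(self) -> str:
        """Return the name of the slot that is currently not booted."""
        return "B" if self.active == "A" else "A"

    def update(self, version: str, health_check: Callable[[str], bool]) -> bool:
        """Install into the inactive slot and commit only if the check passes."""
        target = self.inactive()
        self.slots[target] = version  # write the image to the slot not in use
        if health_check(version):  # stands in for a trial boot plus self-test
            self.active = target  # commit: the new slot becomes active
            return True
        return False  # rollback: the running slot was never modified
```

Because the running slot is never overwritten, a failed update leaves the device on its last known good version instead of requiring a manual recovery.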

Configuration Management and Orchestration

The configuration management and orchestration team ensures that devices across the entire fleet are configured correctly according to their use cases. Team members take care of the configuration details of individual devices on the one hand, and are responsible for the operational aspects of the entire fleet on the other. In a previous blog post, I explained how GitOps could be used to orchestrate the fleet in a highly reproducible manner.

Monitoring and Alerting

The monitoring and alerting team has various customers. On the one hand, monitoring and alerting can be the actual purpose of the IoT solution and thus be directly relevant to the end customer. On the other hand, all developers of the platform are likely to be interested in how well their solutions work on a large scale. Instead of developing everything themselves, the team can assemble a good solution from existing tools such as Fluent Bit, InfluxDB, and Grafana.
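
To give an impression of what a device metric looks like on the wire, the Python sketch below formats a single data point in InfluxDB's line protocol (measurement, tags, fields, and a nanosecond timestamp). In practice a client library or a Fluent Bit output would produce this; the helper `to_line` and the measurement, tag, and field names in the example are hypothetical.

```python
from typing import Dict, Union

FieldValue = Union[bool, int, float, str]


def _escape(text: str, chars: str) -> str:
    """Backslash-escape each of the given characters."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value: FieldValue) -> str:
    """Render a field value: booleans, integers (i suffix), floats, strings."""
    if isinstance(value, bool):  # check bool first, since bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(
    measurement: str,
    tags: Dict[str, str],
    fields: Dict[str, FieldValue],
    timestamp_ns: int,
) -> str:
    """Build one line-protocol record; at least one field is required."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    head = _escape(measurement, ", ")
    for key in sorted(tags):  # tags sorted by key
        head += "," + _escape(key, ",= ") + "=" + _escape(tags[key], ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + _format_field(value)
        for key, value in fields.items()
    )
    return f"{head} {body} {timestamp_ns}"
```

Tags such as the device or site identifier are what later allow dashboards to slice the same measurement per device or across the whole fleet.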

Command and Control

IoT solutions are often not only responsible for monitoring, but are also used to control networked devices. Depending on the application, short response times and high availability are expected for control purposes. The command and control team takes care of these applications. I don’t have any specific recommendations in this area, but I would definitely take a look at NATS.
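
One NATS feature that maps well onto fleet control is its subject hierarchy: subjects are dot-separated tokens, `*` matches exactly one token, and `>` matches one or more trailing tokens. The Python sketch below reimplements just that matching rule to show how commands could be addressed to one device or to many. It does not use the NATS client library, and the `fleet.<device>.cmd.<action>` naming scheme is a hypothetical convention.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Check a subject against a pattern using NATS-style wildcards."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, token in enumerate(p_tokens):
        if token == ">":
            # '>' is only valid as the last token and needs at least one
            # remaining subject token to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if token != "*" and token != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)
```

With such a scheme, a single device would subscribe to `fleet.gw-01.>` for everything addressed to it, while a fleet-wide auditor could subscribe to `fleet.*.cmd.reboot` to observe every reboot command.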

Conclusion

The aim of the above division is to enable individual teams to handle an entire use case. I believe that this will allow use cases to be implemented much more efficiently and satisfactorily, as the entire team will be working toward a common goal. Of course, it is of paramount importance that the teams complement each other and do not compete with each other. There is still a great deal of overlap between the teams, and only close cooperation will lead to the desired result.
