The Hidden War in IoT: The Special Case of the Engineer and the Data Scientist
Successful IIoT implementation projects require two major actors: the engineer and the data scientist. Yet how do you bridge the gap between these two mindsets?
At its core, the Internet of Things (IoT) is based on the premise that a sufficient amount of data can lead to new insights into processes and systems. These insights can support decision-making, feed new products and services, generate internal savings, and open new external revenue streams. And successful IoT implementation projects require two major actors: the engineer and the data scientist.
Processing and analyzing machine-generated data quickly yields positive results in predictive maintenance and production process optimization. It also enhances customer satisfaction, because it helps us better understand user behavior. A comprehensive McKinsey study estimated that connecting and monitoring machines can increase the productivity and lifespan of machine tools, lower maintenance costs by 10% to 40%, and reduce energy consumption by up to 20%.
A streamlined cycle leading to conclusive action is at the heart of the underlying strategy surrounding IoT. The challenge in this context is to bring the worlds of engineering and data science together. Within IoT contexts, these two specialty domains have to function in the most efficient way with the least possible friction.
How and at what stages of the data journey companies close the gap between engineering (the world of hardware, microcontrollers, chips, electronics) and data science (the world of data warehousing, algorithm development, data analytics) is a strategic decision with far-reaching consequences. In what follows, we propose a way for the engineer and data scientist to work together across IoT edge scenarios.
What motivates the shift towards edge computing, and why should we move intelligence to the edge of the IoT network? A major driver behind this trend is also a common challenge in big data systems: the need to capture massive amounts of data from various heterogeneous sources.
Further, the data has to be prepared for analysis across multiple stages, including data validation, data cleaning, transformation, indexing, aggregation, and storage. Depending on the nature of the data and the business goal in mind, companies also need a suitable processing technique, which can range from batch processing to real-time processing.
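To make these stages concrete, here is a minimal Python sketch of such a preparation step, using pandas. The column names, value ranges, and one-minute aggregation window are assumptions for illustration, not prescriptions:

```python
import pandas as pd

def prepare_sensor_batch(raw: pd.DataFrame) -> pd.DataFrame:
    """Illustrative preparation pipeline: validate, clean, transform, aggregate.

    Assumes columns 'device_id', 'timestamp', 'temperature'; adapt to your schema.
    """
    df = raw.copy()

    # Validation: drop rows with missing identifiers or timestamps.
    df = df.dropna(subset=["device_id", "timestamp"])

    # Cleaning: discard physically implausible readings (thresholds are assumptions).
    df = df[df["temperature"].between(-40, 150)]

    # Transformation: parse timestamps and index by time for downstream queries.
    df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)
    df = df.set_index("timestamp").sort_index()

    # Aggregation: one-minute averages per device, ready for storage or analytics.
    return (
        df.groupby("device_id")["temperature"]
        .resample("1min")
        .mean()
        .reset_index()
    )
```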
For those processing massive amounts of data, it makes sense to process the data close to where it is generated. This applies when you leverage IoT in end-to-end scenarios or in highly sensor-intensive, and thus data-intensive, environments, because of inevitable challenges in the data journey such as bandwidth, network latency, and overall speed. Edge computing becomes particularly relevant in IoT applications with a mission-critical or remote component, where it minimizes the risk of data loss. More significantly, it also offers acceleration in scenarios where speed is a key differentiator in IoT efforts.
The speed of data analysis has become indispensable in many industrial IoT applications. Speed is a key element of industrial transformation as companies shift towards autonomous and semi-autonomous decision-making by systems, actuators, and controls. You need to accelerate the generation of aggregated and analyzed data that can serve as actionable intelligence. And you need a fast decision-making path.
Whereas storing and analyzing data in the cloud allows deeper, more comprehensive processing, edge computing offers speed and immediacy. In cloud systems you can combine data from different sources and generate insights not immediately available at the edge. But when it comes to speed of processing, accelerated decision-making, and greater autonomy, and hence higher levels of automation, processing at the edge has established itself as the faster and smarter approach.
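As an illustration of that split, the following hedged Python sketch keeps the fast decision path local and forwards only a compact aggregate to the cloud. The functions read_sensor, trigger_actuator, and publish_to_cloud, as well as the threshold and sampling rate, are placeholders for whatever your stack actually provides:

```python
import statistics
import time

VIBRATION_LIMIT = 7.0  # assumed threshold for immediate local action

def edge_loop(read_sensor, trigger_actuator, publish_to_cloud, window_size=60):
    """Local fast path: decide at the edge, forward only aggregates to the cloud."""
    window = []
    while True:
        value = read_sensor()  # e.g. vibration amplitude from a machine tool
        window.append(value)

        # Fast path: act locally, without a round trip to the cloud.
        if value > VIBRATION_LIMIT:
            trigger_actuator("slow_down_spindle")

        # Slow path: ship a compact summary for deeper, cross-source analysis.
        if len(window) >= window_size:
            publish_to_cloud({
                "mean": statistics.fmean(window),
                "max": max(window),
                "samples": len(window),
            })
            window.clear()

        time.sleep(1.0)  # assumed 1 Hz sampling rate
```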
Further, processing data at the edge of the IoT network enables organizations to remain fully in control of highly sensitive or proprietary information. Decision-making can take place at the edge, so all data remains within the company and only non-sensitive information makes its way to the cloud. Anonymization of company data can also take place at the edge, allowing companies to protect critical data assets from possible security breaches.
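One way such edge-side anonymization might look is sketched below, under the assumption that device identifiers should remain correlatable for cloud analytics while real names and order numbers stay on-premises. The field names and the key-handling scheme are assumptions for illustration:

```python
import hashlib
import hmac

# Secret kept on the edge device only; assumed never to leave the plant network.
PSEUDONYMIZATION_KEY = b"replace-with-a-locally-managed-secret"

SENSITIVE_FIELDS = {"operator_name", "order_id"}  # assumed sensitive fields

def anonymize_record(record: dict) -> dict:
    """Pseudonymize identifiers and drop sensitive fields before cloud upload."""
    cleaned = {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}

    # Replace the raw device ID with a keyed hash so cloud-side analytics can
    # still correlate records per device without learning the real identifier.
    if "device_id" in cleaned:
        cleaned["device_id"] = hmac.new(
            PSEUDONYMIZATION_KEY,
            str(cleaned["device_id"]).encode(),
            hashlib.sha256,
        ).hexdigest()
    return cleaned
```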
A comprehensive IoT solution manages devices at the edge layer, connects them, and collects data from them, but it also addresses big data techniques for data management, data transformation, and advanced analytics. IoT and big data can be brought together on a unified platform that manages IoT devices and deploys apps in real time while also leveraging big data frameworks and applications to store, process, analyze, and visualize the industrial big data acquired by IoT. How do we achieve this confluence?
Edge computing is typically owned and managed by engineering departments. This is the world of hardware, microcontrollers, chips, and electronics. Engineers usually operate in slow development cycles that involve accessing machines without affecting the functionality of the whole, collecting data out of physical devices, and wrestling with high volumes of incoming raw data.
Cloud services, data processing, and analytics are typically owned by IT departments and data scientists. This is the world of mathematics, information technologies, abstract knowledge, and theoretical models. On this end of the scale, we have agile development, fast trial and error scenarios, the development of algorithms, and data science, broadly conceived. In the classic setup, the engineering department provides data extracted from industrial devices and sends it over to the data scientists so that they can work on insights generation and come back with an actionable decision.
What does today’s scenario look like? It often involves miscommunication between the two actors: the engineer and the data scientist. Engineers gather massive amounts of data from their machines and want data science performed on it to streamline certain aspects of their process, so they hire a data science team to work on that data. Once confronted with the need to get to the machine data of the devices in the machine hall, however, the data scientists discover that getting data out of devices is a long, laborious process. Given this scenario, the bottlenecks for IoT projects are usually not the ideas or the algorithms but the data pipeline and data quality.
The data from the engineering department is sometimes in the wrong format, or simply not what is needed at the moment. For example, it may turn out that the data has to be tracked at a higher rate. Engineers may have to go back to the edge computer to find a new way of extracting the data, or even purchase new edge devices to extract it. And the resulting data can still be unusable: it might emerge that a higher sampling rate or a larger recording window is needed, so the edge devices have to be reconfigured completely to collect the data in a different way. This back and forth may involve multiple iterations that take weeks.
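One way to shorten those iterations is to treat acquisition parameters as explicit, versioned configuration that can be pushed to edge devices remotely rather than hard-coded on site. A minimal sketch, with assumed parameter and signal names:

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class AcquisitionConfig:
    """Acquisition parameters that tend to change between data-science iterations."""
    sample_rate_hz: int = 10        # how often each signal is read
    window_seconds: int = 600       # size of the recording window
    signals: tuple = ("spindle_current", "vibration_x")  # assumed signal names
    output_format: str = "parquet"  # format expected by the data-science side

# A serialized config that a platform could push to a fleet instead of a site visit.
higher_rate = AcquisitionConfig(sample_rate_hz=100, window_seconds=1800)
print(json.dumps(asdict(higher_rate), indent=2))
```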
A practicable solution, in this case, is a unified platform that covers the entire IoT development cycle, from working on the machine and extracting its data all the way up to the data science outcomes. Engineers can log into the platform to access the computing device at the IoT edge, set values via the platform tools, or deploy the same code and configurations to a fleet of remotely located IoT devices. Once the data starts coming in, it is stored in a location on the platform where data scientists can view historization and versioning at a glance, perform data cleaning, merge it with other data, model it, and make it directly available for analytics and visualization tasks.
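What working against such a platform could look like from the engineer’s side is sketched below against a hypothetical client API. The class, methods, device tag, and artifact name are assumptions for illustration, not a real SDK:

```python
class PlatformClient:
    """Stand-in for a unified IoT platform API (hypothetical)."""

    def list_devices(self, tag: str) -> list[str]:
        # In a real platform this would query the device registry.
        return ["edge-gateway-01", "edge-gateway-02"]

    def deploy(self, device_id: str, artifact: str, config: dict) -> None:
        # In a real platform this would push code and configuration over the air.
        print(f"deploying {artifact} with {config} to {device_id}")

def roll_out(client: PlatformClient, artifact: str, config: dict) -> None:
    """Push identical code and configuration to every device in a fleet."""
    for device_id in client.list_devices(tag="milling-line-a"):
        client.deploy(device_id, artifact=artifact, config=config)

roll_out(PlatformClient(), "data_extractor_v2.tar.gz", {"sample_rate_hz": 100})
```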
One such IoT platform closes the communication gap between engineering and data science by bringing these worlds together via a unified enabling interface. Data scientists can directly access the edge computer that generates the data and is responsible for data handling, while the core machine computer remains completely separate: it is not maintained or managed by the platform. And the data engineering department no longer faces the challenge of accessing the actual machine in real time. Via the platform, both IoT engineers and the data engineering team have the benefits of remote access. They can deploy code over the air and use the device management functionality to see where the data is coming from and which IoT device is deployed at a certain location.
One such IoT platform becomes the “digital backbone” of the industrial operations of an organization. This is where two actors — the engineer and the data scientist — connect software and hardware to extract value from business operations.
An IoT platform that is ready for smart manufacturing incorporates artificial intelligence (machine learning and deep learning), big data technologies, and established automation technologies for data mining from different sources, data modeling, statistical analysis, and data visualization, as well as the ability to work with any programming language. Such a platform not only serves as the enabler for the IoT ecosystem but also leverages its own infrastructure as a digital hub where device management and app development meet an advanced data science toolchain to generate actionable insights at a greater pace.
An IoT platform equipped to face critical challenges at the IoT edge offers a combination of capabilities: the management of IoT endpoints and connectivity, plus IoT application development and integration tools. Further, a platform that brings engineers and data scientists together supplements these capabilities with the access, ingestion, and processing of IoT data, together with IoT data analysis and visualization. One level of this solution is a fully fledged IoT studio for application development and the remote management of IoT devices. Another level is a fully integrable data warehouse infrastructure, where the data streams harvested from devices are received and an insight-enabling analytical environment is built for the data analyst.
A platform that brings together the engineer and the data scientist offers a comprehensive end-to-end IoT solution. It starts with data acquisition from IoT devices, followed by collecting, pre-processing, and aggregating the data at the IoT gateway. Data scientists can then transmit the data to a cloud data science platform, where more advanced analytics are performed and machine learning models are trained. Once data scientists have trained their models in the cloud, they can bring that logic to the IoT edge: using the platform, the trained models can be rolled out to a variety of IoT devices. In this way, we have an iterable cycle of data acquisition, transformation, analytics, and deployment back at the IoT edge.
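A single turn of that cycle might look like the following sketch, which trains a predictive-maintenance model in the cloud with scikit-learn, serializes it with joblib, and scores fresh readings at the edge. The dataset, feature names, and file-based hand-off are assumptions standing in for the platform’s actual roll-out mechanism:

```python
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# --- Cloud side: train on historized, aggregated gateway data ---
history = pd.read_parquet("aggregated_machine_data.parquet")    # assumed dataset
features = ["temperature_mean", "vibration_max", "cycle_time"]  # assumed features
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(history[features], history["failure_within_24h"])
joblib.dump(model, "predictive_maintenance_model.joblib")

# --- Edge side: the rolled-out model scores fresh readings locally ---
edge_model = joblib.load("predictive_maintenance_model.joblib")

def needs_maintenance(latest: dict) -> bool:
    """Score one aggregated reading at the edge, without a cloud round trip."""
    row = pd.DataFrame([latest], columns=features)
    return bool(edge_model.predict(row)[0])
```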