Implementing an IoT platform in a smart factory – Three approaches


Transforming a legacy manufacturing unit into a smart factory is a growing trend in this age of digital disruption. A smart factory creates an ecosystem of connected systems, automation, Internet of Things (IoT), big data, and cloud, transforming almost all areas of the business, including manufacturing, marketing, supply chain planning, and energy management.

It also has great potential to drive innovation, improve productivity, reduce costs, grow market share, and create competitive advantage. It does so through a single IoT platform on which multiple business functions, such as procurement, planning, manufacturing, sales and distribution, finance, and accounting, work together to meet overall corporate objectives.

An IoT platform that transforms a factory into a smart factory consists of several layers and components. Together they must be able to extract data from equipment, sensors, and devices; use edge analytics to analyze large volumes of data close to where it is generated; store large amounts of data at a cost that stays low as the data grows; and perform real-time analysis and control.
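To make the edge analytics idea concrete, here is a minimal sketch of one possible edge-side step: reducing a window of raw sensor readings to a compact summary before anything is sent to central storage. The function name `summarize_window`, the temperature values, and the alert limit are all hypothetical and chosen only for illustration; a real platform would use its own data model and thresholds.

```python
from statistics import mean


def summarize_window(readings, limit):
    """Reduce one window of raw readings to a small summary record.

    `limit` is an assumed alert threshold; readings above it are kept
    verbatim so they can be forwarded for closer inspection.
    """
    return {
        "count": len(readings),
        "mean": mean(readings),
        "max": max(readings),
        "alerts": [r for r in readings if r > limit],
    }


# Hypothetical temperature readings (degrees Celsius) from one machine.
window = [71.2, 70.8, 72.1, 95.4, 71.0]
summary = summarize_window(window, limit=90.0)
print(summary)
```

Only the summary record, rather than every raw reading, would then need to travel to the storage layer, which is one way the edge tier can keep data volumes manageable.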

Implementing a reliable, secure, and single IoT platform in a smart factory is possible through three approaches. This post will discuss the pros and cons of each implementation approach.

1. In-house, open-source IoT platform

This means building an organization-specific, in-house IoT platform from a range of open-source software (OSS) products and technologies that together provide all the necessary components of a smart factory. Typically, the open-source Apache Hadoop framework, along with its plethora of modules, forms the base of the platform. Hadoop Distributed File System (HDFS) and NoSQL databases such as HBase, Cassandra, or MongoDB are the usual choices for data storage.

Common choices for data computation are Apache Spark and MapReduce. Apache Flume and Sqoop are used for data movement and connectivity from different sources such as machine logs, enterprise databases, and IT applications. Apache Kafka and Storm are used for real-time data streaming from sensors, while Apache Mahout and Spark MLlib are used for applying machine learning algorithms. Apart from these products, additional components are needed for governance, scheduling, security, metadata management, and deployment.
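As an illustration of the MapReduce computation model mentioned above, the following is a minimal single-process sketch in plain Python. It does not use Hadoop or Spark; it only mimics the map, shuffle, and reduce phases to count error entries per machine. The log format (`<machine_id> <level> <message>`), the machine names, and the helper functions are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Emit (machine_id, 1) for every ERROR line.

    Assumes the hypothetical format "<machine_id> <level> <message>".
    """
    machine_id, level, _message = line.split(" ", 2)
    if level == "ERROR":
        yield machine_id, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def count_errors(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical machine log lines.
logs = [
    "press-01 INFO cycle complete",
    "press-01 ERROR hydraulic pressure low",
    "lathe-07 ERROR spindle overheat",
    "press-01 ERROR hydraulic pressure low",
]
error_counts = count_errors(logs)
print(error_counts)
```

In a real cluster the map and reduce functions would run in parallel across many nodes, and the framework would handle the shuffle; the sketch only shows the shape of the computation.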


Pros

  • Can achieve solutions specific to the organization’s requirements
  • Offers a high level of flexibility in terms of achieving project goals, change, and customizations
  • Complete ownership of data, product stack, and processes
  • Completely driven by the organization’s vision, expertise, and execution


Cons

  • Requires strong expertise in a range of big data products and programming languages such as Java, Scala, Python, and R
  • TTM (Time-to-Market) is slower unless the organization has high capability maturity, involves large development teams, and executes the project in parallel sprints
  • Involves a high level of complexity and requires a huge effort, as the solution is entirely home-grown
  • Faces the challenges of being an early adopter, including discovering issues as the products mature
  • Requires significant effort just to build the platform
  • Exposes the team to issues such as version incompatibilities among the multitude of products and limited product support
  • Requires the ability and willingness to experiment and take calculated risks
  • Is realistic mainly for large organizations, since it requires significant effort, resources, time, and budget
  • May not suit SMEs (Small and Medium Enterprises) with relatively small IT teams

2. Commercial distributions or SaaS

Instead of building from scratch, an organization can embrace appropriate commercial distributions and cloud technologies that will greatly mitigate the big challenges highlighted in the previous approach.

Subscription-based cloud solutions and deployments are very flexible and scalable. The organization doesn’t need to worry about platform compatibility issues, since several enterprise-ready commercial distributions are available that simplify integration among the multitude of products in the Hadoop ecosystem and provide big data as a packaged solution, ensuring ease, flexibility, and security. As a result, the organization can onboard the platform at a much faster pace and focus more on specific business needs.


Pros

  • Business goals are met faster, as platform and infrastructure challenges are outsourced to software distribution providers
  • Partial ownership of data and process, using cloud deployments
  • Minimizes or avoids platform version incompatibilities and software technical debt
  • Still offers a high level of flexibility in development and deployment, with faster TTM (Time-to-Market)


Cons

  • Increases the risk of vendor lock-in, with less flexibility than open-source software offers
  • These solutions can be generic, offering a big data suite of products in the cloud and leaving industry-specific IoT solutions to be built on top
  • No single software provider offers an end-to-end IoT solution
  • Still requires broad expertise in multiple technologies, highly skilled resources, larger development teams, and the ability to execute big projects

3. Platform-as-a-service (PaaS)

The first two approaches demand significant effort and time in establishing an IoT platform. In PaaS, the complexities of the platform and the multitude of products are outsourced to the software provider companies, which leverage IoT, cloud computing, machine learning, and big data analytics. In addition, the organizations can also benefit from domain-specific or industry-specific accelerators that can be readily adopted.


Pros

  • IoT solutions are deployed at a much faster pace, and organizations can benefit from readily implemented features
  • Most of the IT complexity is outsourced: not only the platform and infrastructure complexity but also the implementation complexity
  • Product companies can provide the necessary skilled resources for project implementation
  • Business goals are met much faster, as the organizations can benefit from domain-specific templates or accelerators, resulting in faster TTM


Cons

  • High risk of vendor lock-in
  • Requires highly specialized skills related to the software product, and that skillset is not widely available even among System Integrator companies
  • Heavy reliance on product vendors for implementation, which is usually offered at a premium price
  • Less flexibility for customization; organizations often must wait for the product vendor to release the required features