Machine Vision – Benefits, components, techniques and history


A robot capable of moving around a factory, construction site, or street must avoid obstacles and people with high precision. Machine vision helps robots and machines adapt to changing environments, operate safely around people, and switch tasks based on visual input.

Beyond obstacle avoidance, machine vision covers all industrial and non-industrial applications in which hardware, such as cameras and sensors (single-beam laser, LiDAR, or sonar), works together with software to acquire, process, analyze, and measure the characteristics of objects. The result is fast, real-time decision making with higher productivity, speed, accuracy, and repeatability.

On a production line, machine vision systems built around the right camera resolution and optics can inspect hundreds or even thousands of parts per minute. They can quickly examine object details too small to be seen by the human eye. They also bring safety and operational benefits by reducing human involvement in the manufacturing process.

Key benefits of machine vision applications

  • Higher performance and quality in inspection, measurement, gauging, and assembly verification
  • Increased productivity in repetitive tasks
  • Production flexibility in measurement and gauging
  • Less machine downtime and reduced setup time
  • Complete information and tighter process control
  • Lower capital equipment costs
  • Lower production costs, detection of flaws early in the process and scrap rate reduction
  • Inventory control and reduced floor space

How it works

Most machine vision approaches are inspired by the human visual system, which extracts conceptual information from two-dimensional images. They combine 2D image capture systems with computer vision algorithms that mimic aspects of human visual perception. Yet humans perceive the surrounding world in 3D.

Our ability to navigate and accomplish specific tasks depends on reconstructing 3D information from these 2D images, which lets us locate ourselves relative to surrounding objects. This information is then combined with prior knowledge to detect and identify the objects around us and understand how they interact.


A machine vision system’s major components are the lighting, lens, image sensor, vision processing, and communications. Lighting illuminates the part to be inspected, making its features stand out so the camera can see them. The lens captures the image and presents it to the sensor as light.

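The sensor converts this light into a digital image, which is sent to the vision processor for analysis. Vision processing consists of algorithms that examine the image and extract the information required for inspection and decision making. Finally, the communications component passes the result, such as a pass/fail decision, to other equipment or control systems on the line.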
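The sensing, processing, and communication steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production inspection routine: it assumes the sensor has already delivered a small grayscale image as nested lists of values from 0 to 255, treats any pixel darker than a fixed threshold as a defect, and fails the part when the defect area exceeds a limit. The function names, the threshold of 100, and the two-pixel limit are hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# A grayscale image as rows of 0-255 intensities (stand-in for the sensor output).
Image = list[list[int]]


@dataclass
class InspectionResult:
    defect_pixels: int
    passed: bool


def segment_dark_pixels(image: Image, threshold: int) -> list[list[bool]]:
    """Mark pixels darker than `threshold` as potential defects (contrast-based)."""
    return [[value < threshold for value in row] for row in image]


def inspect(image: Image, threshold: int = 100, max_defect_pixels: int = 2) -> InspectionResult:
    """Vision processing step: measure the defect area and decide pass/fail.

    `threshold` and `max_defect_pixels` are hypothetical tuning parameters.
    """
    mask = segment_dark_pixels(image, threshold)
    defect_pixels = sum(sum(row) for row in mask)  # True counts as 1
    return InspectionResult(defect_pixels, defect_pixels <= max_defect_pixels)


def report(result: InspectionResult) -> str:
    """Communications step: format the decision for downstream equipment."""
    return "PASS" if result.passed else f"FAIL ({result.defect_pixels} defect pixels)"


# A bright part with a small dark scratch in the middle.
part = [
    [200, 205, 210, 200],
    [198, 40, 35, 202],
    [201, 45, 199, 204],
    [203, 200, 201, 199],
]
print(report(inspect(part)))  # → FAIL (3 defect pixels)
```

Real systems typically replace the fixed threshold with calibrated segmentation, blob analysis, or learned models, and send the decision over an industrial interface rather than returning a string.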

Types of machine vision systems

Regardless of the imaging sensors used, the most common approaches to reconstructing 3D information are based on time-of-flight techniques. Time-of-flight techniques use laser scanners to estimate the distance between the light source and the object from the time the light takes to reach the object and return. They can measure distances of up to kilometers with millimeter-scale accuracy, and that accuracy is limited by how precisely the round-trip time can be measured.
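The time-of-flight principle reduces to one relation: a pulse that returns after a round-trip time t has travelled to the object and back, so the distance is d = c·t/2, where c is the speed of light. The sketch below, with hypothetical function names, applies this relation and also inverts it to show why accuracy is bounded by timing: resolving 1 mm of depth requires measuring the round trip to within about 6.7 picoseconds. It deliberately ignores real-world effects such as pulse shape, detector jitter, and the refractive index of air.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # speed of light in vacuum


def distance_from_round_trip(time_s: float) -> float:
    """Distance to the target: the light travels out and back, so halve the path."""
    return SPEED_OF_LIGHT_M_PER_S * time_s / 2.0


def timing_resolution_for(depth_resolution_m: float) -> float:
    """Round-trip timing precision needed to resolve a given depth difference."""
    return 2.0 * depth_resolution_m / SPEED_OF_LIGHT_M_PER_S


# A return after about 6.671 microseconds corresponds to roughly 1 km.
print(f"{distance_from_round_trip(6.671e-6):.1f} m")  # → 1000.0 m
# Millimeter accuracy needs picosecond-level timing.
print(f"{timing_resolution_for(0.001) * 1e12:.2f} ps")  # → 6.67 ps
```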

Broadly, there are three categories of machine vision systems: 1D, 2D, and 3D.

  • 1D vision – Instead of looking at a whole picture at once, 1D vision systems analyze a signal one line at a time. They commonly detect and classify defects on products manufactured in a continuous process, such as metals, plastics, paper, nonwoven sheets, or roll goods.
  • 2D vision – 2D vision systems capture an image of an object as a two-dimensional map of reflected intensity, indexed by X and Y coordinates. Once captured, the image is typically processed by analyzing variations in contrast.
  • 3D vision – Typically, 3D machine vision systems include multiple cameras or one or more laser displacement sensors. In robotic guidance applications, multi-camera 3D vision provides the robot with information about part orientation. The cameras are mounted at different locations, and the position of the part in 3D space is determined by triangulation.
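Triangulation with two cameras can be sketched for the simplest geometry: a calibrated, rectified stereo pair, meaning two identical pinhole cameras with parallel optical axes separated by a baseline B. A point seen at horizontal pixel positions x_L and x_R has disparity d = x_L − x_R, and its depth is Z = f·B/d, where f is the focal length in pixels. The code assumes the point correspondence between the two views has already been found and that the principal point (cx, cy) is known from calibration; the class, function names, and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StereoRig:
    focal_length_px: float  # focal length expressed in pixels (assumed identical for both cameras)
    baseline_m: float       # distance between the two camera centers


def triangulate(rig: StereoRig, x_left: float, x_right: float, y: float,
                cx: float, cy: float) -> tuple[float, float, float]:
    """Recover (X, Y, Z) in the left-camera frame from one matched point.

    Assumes rectified images, so the match lies on the same row `y` in both views.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("a point in front of the rig must have positive disparity")
    z = rig.focal_length_px * rig.baseline_m / disparity      # depth: Z = f * B / d
    x = (x_left - cx) * z / rig.focal_length_px               # back-project the pixel column
    y_world = (y - cy) * z / rig.focal_length_px              # back-project the pixel row
    return x, y_world, z


rig = StereoRig(focal_length_px=800.0, baseline_m=0.1)
print(triangulate(rig, x_left=420.0, x_right=380.0, y=240.0, cx=320.0, cy=240.0))
# → (0.25, 0.0, 2.0)
```

The relation also exposes a design trade-off: a longer baseline or focal length improves depth resolution, but a wider baseline makes it harder to match the same point between the two views.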

History of machine vision – Timeline

1914 – Optical Character Reader (OCR): Emanuel Goldberg invented a machine that could read characters and convert them into standard telegraph code.

1963 – Complementary Metal-Oxide Semiconductor (CMOS): Frank Wanlass, an American electrical engineer, patented CMOS used in digital logic circuits, analog circuits, and image sensors.

1969 – Charge-Coupled Device (CCD): The CCD was invented at Bell Laboratories by Willard Boyle and George E. Smith. CCDs became a major technology for digital imaging because they convert incoming photons into electron charges that can be read out as an image.

1974 – Bayer filter camera: Bryce Bayer, an American scientist working for Kodak, invented a color filter array that lets a single image sensor capture color information for a digital image.

1980 – Photometric Stereo: Woodham presented a method to recover surface normals from multiple images taken under different lighting directions, using the constraints imposed by the illumination model.
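The classic Lambertian case of photometric stereo is small enough to write out. Assuming a matte (Lambertian) surface, no shadows, and three known, non-coplanar unit light directions l_k, each measured intensity at a pixel is I_k = ρ (l_k · n), where n is the unit surface normal and ρ the albedo. Stacking the three equations gives a 3×3 linear system for g = ρn; the length of g is the albedo and its direction is the normal. The helper names, light directions, and intensities below are illustrative assumptions.

```python
import math

Vec3 = tuple[float, float, float]


def det3(m: list[list[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a: list[list[float]], b: list[float]) -> Vec3:
    """Solve a x = b with Cramer's rule (adequate for a single 3x3 system)."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("light directions must not be coplanar")
    # Replace column `col` of `a` with `b` and divide the determinants.
    x = [det3([row[:col] + [b[i]] + row[col + 1:] for i, row in enumerate(a)]) / d
         for col in range(3)]
    return (x[0], x[1], x[2])


def photometric_stereo(lights: list[Vec3], intensities: list[float]) -> tuple[Vec3, float]:
    """Recover (unit normal, albedo) at one pixel from three Lambertian measurements."""
    g = solve3([list(light) for light in lights], list(intensities))  # g = albedo * normal
    albedo = math.sqrt(sum(c * c for c in g))
    if albedo == 0.0:
        raise ValueError("all intensities are zero; the normal is undefined")
    normal = (g[0] / albedo, g[1] / albedo, g[2] / albedo)
    return normal, albedo


# Three unit light directions, assumed known from calibration.
lights = [(0.0, 0.0, 1.0), (0.6, 0.0, 0.8), (0.0, 0.6, 0.8)]
# Intensities a surface with normal (0.6, 0, 0.8) and albedo 0.5 would produce.
intensities = [0.4, 0.5, 0.32]
normal, albedo = photometric_stereo(lights, intensities)
print(normal, albedo)
```

Applied at every pixel, this yields a per-pixel normal map without assuming surface smoothness; more than three lights are commonly used in practice and solved by least squares.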

1981 – Computational Stereo: Grimson presented a biologically plausible theory of computational stereo vision.

The early 1990s – Simultaneous Localization and Mapping (SLAM): Leonard and Durrant-Whyte pioneered a probabilistic method for handling the uncertainty of noisy sensor readings, allowing autonomous vehicles to localize themselves while mapping their surroundings.

1997 – Nomad robot: An autonomous robot, built on advanced perception and navigation technologies developed at Carnegie Mellon University, was used to search for meteorites in Antarctica.

1999 – Scale-Invariant Feature Transform (SIFT): David Lowe introduced an algorithm to detect and describe local features in images. SIFT features are invariant to uniform scaling and orientation, and partially invariant to illumination changes.

2001 – Hawk-Eye: A real-time vision system that uses multiple high-performance cameras to reconstruct the 3D trajectory of a ball by triangulation.

2001 – Bag of words in computer vision: Local visual features are represented as “visual words,” so that information retrieval techniques from natural language processing can be applied to object recognition and image classification.
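A bag-of-visual-words representation can be sketched as follows, assuming local descriptors (for example, SIFT vectors) have already been extracted and a vocabulary of “visual word” centers is given; in practice the vocabulary is usually learned by clustering training descriptors, for instance with k-means. Each descriptor is assigned to its nearest word, the image becomes a histogram of word counts, and two images can be compared with cosine similarity, as term-count vectors are in text retrieval. The 2D descriptors, three-word vocabulary, and function names are toy assumptions.

```python
import math

Vector = tuple[float, ...]


def nearest_word(descriptor: Vector, vocabulary: list[Vector]) -> int:
    """Index of the vocabulary entry closest to the descriptor (Euclidean distance)."""
    return min(range(len(vocabulary)), key=lambda i: math.dist(descriptor, vocabulary[i]))


def bag_of_words(descriptors: list[Vector], vocabulary: list[Vector]) -> list[int]:
    """Histogram counting how often each visual word occurs in one image."""
    histogram = [0] * len(vocabulary)
    for descriptor in descriptors:
        histogram[nearest_word(descriptor, vocabulary)] += 1
    return histogram


def cosine_similarity(a: list[int], b: list[int]) -> float:
    """Compare two histograms the way text retrieval compares term-count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
image_a = [(0.1, 0.2), (0.9, 1.1), (1.2, 0.8), (4.8, 5.1)]
image_b = [(4.9, 5.0), (5.1, 4.8), (1.0, 1.2)]
hist_a = bag_of_words(image_a, vocabulary)
hist_b = bag_of_words(image_b, vocabulary)
print(hist_a, hist_b, round(cosine_similarity(hist_a, hist_b), 3))
# → [1, 2, 1] [0, 1, 2] 0.73
```

Because the histogram discards where each feature appeared, this representation trades spatial layout for robustness and compactness, which is the same trade-off the text-retrieval “bag of words” makes with word order.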

2002 – Active stereo with structured light: Zhang introduced the idea of using projected light patterns to establish robust correspondences between a pair of images.

2004 – Real-time face detection: Viola and Jones introduced a machine learning approach to sliding-window object detection that enabled robust, real-time face detection.

2009 – Kinect: Microsoft announced a device that used structured-light computational stereo technology to track the body’s posture. Within 60 days, it sold 8 million units and claimed the Guinness World Record of the ‘fastest-selling consumer electronics device.’

2012 – Deep Neural Networks in Image Classification: DNNs trained on large image datasets such as ImageNet have since matched or exceeded human performance on some object and face recognition benchmarks.

2017 – BWIBots: Vision-based robots that learn human preferences and cooperate by working side by side with people.