All about Sony’s Vision Sensors embedded with AI processing

The industrial world is familiar with many kinds of vision sensors, which are used for error-proofing and inspection throughout the manufacturing process. Using high-performance AI algorithms, these vision sensors detect defects in products of various shapes and sizes and help manufacturers maintain high quality-control standards.

Unlike manual inspection, which is slow, prone to error, and impeded by product size, space constraints, lighting conditions, and fast production line speeds, vision sensors are quicker, more objective, and can work continuously. They can inspect thousands of parts per minute, providing more consistent and reliable inspection results.

The use of vision-based sensors in fields ranging from robot navigation to visual inspection in industrial environments has increased considerably over the past few decades.

But not all sensors are created equal. Low-cost photoelectric sensors can perform only a few everyday tasks, such as position verification and basic counting. They cannot distinguish between patterns or colors.

In addition, the growing volume of information obtained from sensors and processed in the cloud poses several problems: security concerns, higher data transmission latency that hinders real-time processing, and increased power consumption and cost.

What is the solution?

Sony, the world’s leading manufacturer of image sensors and image processors for cameras, tablet computers, and smartphones, has come up with the world’s first image sensors equipped with AI processing functionality within the sensor itself.

The signal acquired by the pixel chip is processed via AI on the logic chip, eliminating the need for high-performance processors or external memory. The sensor can output metadata (semantic information belonging to the image data) instead of image information, which reduces data volume and minimizes privacy concerns.

Data output format selectable to meet various needs

The sensors enable high-speed edge AI processing and extraction of only the necessary data, which, when using cloud services, reduces data transmission latency, minimizes any privacy concerns, and reduces power consumption and communication costs.

Sony’s two models of intelligent vision sensors expand the opportunities to develop AI-equipped cameras, enabling a diverse range of applications in the commercial and industrial equipment industries and contributing to building optimal systems that link with the cloud.

Example of real-time tracking with product and task at a register

Moreover, AI capability makes it possible to deliver various functionality for versatile applications, such as real-time object tracking with high-speed AI processing. Different AI models can also be chosen by rewriting internal memory in line with user requirements or the conditions of the location where the system is being used.

Example of camera usages in a facility

Key features

First image sensor equipped with AI processing

The pixel chip is back-illuminated and has approximately 12.3 effective megapixels for capturing information across a wide angle of view. In addition to the conventional image sensor operation circuit, the logic chip is equipped with Sony’s original DSP (Digital Signal Processor) dedicated to AI signal processing and memory for the AI model. This configuration eliminates the need for high-performance processors or external memory, making it ideal for edge AI systems.

Metadata output

Signals acquired by the pixel chip are run through an ISP (Image Signal Processor), and AI processing is done in the processing stage on the logic chip. The extracted information is output as metadata, reducing the amount of data handled. Ensuring that image information is not output helps to reduce security risks and to minimize privacy concerns. In addition to the image recorded by the conventional image sensor, users can select the data output format according to their needs and uses, including ISP-format output images (YUV/RGB) and ROI (Region of Interest) images extracted from a specific area.
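To give a feel for how much metadata output can shrink the data handled, here is a rough back-of-the-envelope sketch. The pixel count and the 30 fps rate come from this article; the 10-bit raw depth and the 16-byte detection record are assumptions made purely for illustration and are not taken from Sony's documentation.

```python
# Rough comparison of per-frame output size: full image vs. metadata only.
# ASSUMPTIONS (not from Sony's documentation): 10-bit raw pixels and a
# hypothetical 16-byte record per detected object.

EFFECTIVE_PIXELS = 12_300_000  # approx. 12.3 effective megapixels (from the article)
BITS_PER_PIXEL = 10            # assumed raw bit depth
BYTES_PER_DETECTION = 16       # assumed: class id, score, bounding box
FPS = 30                       # metadata output rate (from the specifications)


def raw_frame_bytes(pixels: int = EFFECTIVE_PIXELS, bits: int = BITS_PER_PIXEL) -> int:
    """Size of one uncompressed frame in bytes."""
    return pixels * bits // 8


def metadata_bytes(detections: int, per_detection: int = BYTES_PER_DETECTION) -> int:
    """Size of one metadata-only frame in bytes."""
    return detections * per_detection


def reduction_factor(detections: int) -> float:
    """How many times smaller the metadata frame is than the raw frame."""
    return raw_frame_bytes() / metadata_bytes(detections)


if __name__ == "__main__":
    detections = 20  # hypothetical number of objects in the scene
    print(f"raw frame:      {raw_frame_bytes():,} bytes")
    print(f"metadata frame: {metadata_bytes(detections):,} bytes")
    print(f"raw stream:     {raw_frame_bytes() * FPS / 1e6:.2f} MB/s at {FPS} fps")
    print(f"reduction:      about {reduction_factor(detections):,.0f}x")
```

Under these assumptions a raw frame is roughly 15 MB while a metadata frame for 20 objects is a few hundred bytes. The exact ratio depends entirely on the real bit depth and metadata layout, so treat the numbers as an order-of-magnitude illustration only.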

High-speed AI processing

When video is recorded using a conventional image sensor, data for every output frame must be sent off-sensor for AI processing, which increases data transmission and makes real-time performance hard to deliver. The new sensor products from Sony perform ISP processing and high-speed AI processing (3.1 milliseconds for MobileNet V1) on the logic chip, completing the entire process within a single video frame. This design makes it possible to deliver high-precision, real-time tracking of objects while recording video.
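The "single video frame" claim can be checked with simple arithmetic. The sketch below compares the quoted 3.1 ms MobileNet V1 figure with the time available per frame. It considers only the AI processing time quoted in this article; ISP processing and sensor readout time are not included, since the article gives no figures for them.

```python
# Does the quoted AI processing time fit inside one frame period?
# Only the 3.1 ms figure from the article is used; ISP and readout
# time are NOT modelled here.

AI_LATENCY_MS = 3.1  # MobileNet V1 processing time quoted in the article


def frame_period_ms(fps: float) -> float:
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps


def budget_share(fps: float, latency_ms: float = AI_LATENCY_MS) -> float:
    """Fraction of the frame period consumed by AI processing."""
    return latency_ms / frame_period_ms(fps)


def fits_in_one_frame(fps: float, latency_ms: float = AI_LATENCY_MS) -> bool:
    """True if AI processing finishes within a single frame period."""
    return latency_ms <= frame_period_ms(fps)


if __name__ == "__main__":
    for fps in (30, 60):
        print(
            f"{fps} fps: period {frame_period_ms(fps):.1f} ms, "
            f"AI uses {budget_share(fps):.1%}, fits: {fits_in_one_frame(fps)}"
        )
```

At the 30 fps rate listed for AI processing, the frame period is about 33.3 ms, so the quoted inference time takes up under a tenth of the frame.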

Selectable AI model

Users can write AI models of their choice to the embedded memory and can rewrite and update them according to their requirements or the conditions of the location where the system is being used. For example, when multiple cameras employing this product are installed in a retail location, a single type of camera can be used flexibly across different locations, circumstances, times, or purposes. It can count the number of visitors entering a facility, detect stock shortages, and be used for heat mapping store visitors (detecting locations where many people gather). Furthermore, the AI model in a given camera can be rewritten from one used to generate heat maps to one for identifying consumer behavior, and so on.
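As an illustration of the heat-mapping use case, the sketch below shows how a host application might turn per-frame detection metadata into a coarse grid of visitor counts. The `Detection` record and its normalized coordinates are hypothetical and do not represent the sensor's actual metadata format, which this article does not describe.

```python
# Building a coarse heat map from detection metadata on the host side.
# The Detection record is HYPOTHETICAL; it is not the sensor's real
# metadata layout.
from typing import Iterable, List, NamedTuple


class Detection(NamedTuple):
    label: str  # object class reported by the AI model
    x: float    # box centre, normalized to 0.0-1.0 across the image width
    y: float    # box centre, normalized to 0.0-1.0 across the image height


def heat_map(
    frames: Iterable[Iterable[Detection]],
    cols: int,
    rows: int,
    label: str = "person",
) -> List[List[int]]:
    """Count detections of `label` per grid cell across all frames."""
    grid = [[0] * cols for _ in range(rows)]
    for frame in frames:
        for det in frame:
            if det.label != label:
                continue
            # Clamp so that coordinates on or beyond the edges stay in the grid.
            col = max(0, min(int(det.x * cols), cols - 1))
            row = max(0, min(int(det.y * rows), rows - 1))
            grid[row][col] += 1
    return grid


if __name__ == "__main__":
    frames = [
        [Detection("person", 0.1, 0.1), Detection("shelf", 0.9, 0.9)],
        [Detection("person", 0.1, 0.2), Detection("person", 1.0, 1.0)],
    ]
    for row in heat_map(frames, cols=2, rows=2):
        print(row)
```

Because only small records like these leave the camera, the aggregation can run on a modest host, and swapping the on-sensor model for a different task would change only which labels arrive.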

You can check the key specifications below:

Model names:

  • IMX500 1/2.3-type (7.857 mm diagonal) approx. 12.3 effective megapixel intelligent vision sensor (bare chip product)
  • IMX501 1/2.3-type (7.857 mm diagonal) approx. 12.3 effective megapixel intelligent vision sensor (package product)

| Item | Detail | Specification |
|---|---|---|
| Model name | | IMX500 (bare chip product), IMX501 (package product) |
| Image size | | Diagonal 7.857 mm (1/2.3 type) |
| Unit cell size | | 1.55 μm (H) × 1.55 μm (V) |
| Frame rate | Full pixel | 60 fps |
| Frame rate | Video, 4K (4056 × 2288) | 60 fps |
| Frame rate | Video, 1080p | 240 fps |
| Frame rate | Full/video + AI processing | 30 fps |
| Frame rate | Metadata output | 30 fps |
| Sensitivity (F5.6 standard value) | | Approx. 250 LSB |
| Sensor saturation signal level (minimum value) | | Approx. 9610 e- |
| Power supply | Analog | 2.7 V |
| Power supply | Digital | 0.84 V |
| Power supply | Interface | 1.8 V |
| Main functions | | AI processing function, ISP, HDR shooting |
| Output | | MIPI D-PHY 1.2 (4 lane) / SPI |
| Color filter array | | Bayer array |
| Output format | | Image (Bayer RAW), ISP output (YUV/RGB), ROI, metadata |
| Package | | Ceramic LGA, 12.5 mm (H) × 15.0 mm (V) |
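
The listed unit cell size and image diagonal can be cross-checked against each other. The short sketch below does so, assuming a full-pixel resolution of 4056 × 3040. The 4056-pixel width matches the 4K mode in the specifications, but the 3040-pixel height is an assumption that this article does not state.

```python
# Cross-check: pixel pitch x resolution should reproduce the listed diagonal.
# ASSUMPTION: a full resolution of 4056 x 3040 pixels; only the 4056 width
# appears in this article (as the 4K mode width).
import math

PIXEL_PITCH_UM = 1.55  # unit cell size from the specifications
WIDTH_PX = 4056        # matches the 4K mode width in the specifications
HEIGHT_PX = 3040       # assumed full-pixel height


def diagonal_mm(width_px: int, height_px: int, pitch_um: float) -> float:
    """Diagonal of the active pixel area in millimetres."""
    width_um = width_px * pitch_um
    height_um = height_px * pitch_um
    return math.hypot(width_um, height_um) / 1000.0


def megapixels(width_px: int, height_px: int) -> float:
    """Pixel count in millions."""
    return width_px * height_px / 1e6


if __name__ == "__main__":
    print(f"diagonal:   {diagonal_mm(WIDTH_PX, HEIGHT_PX, PIXEL_PITCH_UM):.3f} mm")
    print(f"resolution: {megapixels(WIDTH_PX, HEIGHT_PX):.2f} megapixels")
```

With the assumed resolution, the result agrees with the 7.857 mm diagonal and the approximately 12.3 effective megapixels quoted for both models.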