Computer Vision: here’s how a machine can help you recognize objects.
Computer vision is an interdisciplinary field concerned with the algorithms and applications that allow an ordinary computer to obtain a high-level understanding of digital images or videos.
The first studies date back to the 1970s, but only recently have giant leaps been made in this sector, thanks to the rapid improvement in hardware performance, the development of increasingly advanced algorithms, and the use of techniques based on artificial intelligence (in particular deep learning).
Today, computer vision is applied in many areas, from industry to medicine to everyday life. Just think of the algorithms built into our smartphones for facial recognition and panoramic photos, or of license plate recognition systems.
In most applications today, images are still reviewed by people. However, image analysis can be automated in ways that were not possible before, and in some tasks it can give better results than analysis by the best human experts.
Principle of operation
Computer vision tries, to some extent, to imitate the human visual process.
Through the eye, images are transferred to the brain, where the visual cortex analyzes each image, compares it with everything it already knows, classifies objects and their dimensions, and finally decides what to do. All in a small fraction of a second!
For the computer, an image is a matrix of numbers called pixels. Each pixel is the elementary component of the image and is displayed at the coordinates assigned to it by a graphics program.
Together, the pixels are the set of numbers a computer needs in order to “see” the image. They are organized in a 2D or 3D matrix of numbers between 0 and 255, depending on whether the image is grayscale or color. Each number gives the intensity of a given color, so a pixel that is darker or lighter than its neighbor will have a different number.
For example, a grayscale image is a single 2D matrix in which the number 0 represents black and the number 255 represents white for a given pixel, with shades of gray in between. Each pixel represents the intensity of a single color. In other words, the matrix is said to have one channel.
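As a minimal sketch of this idea, a grayscale image can be written as a small NumPy array. The 4x4 values here are invented purely for illustration, and NumPy is an assumption of this example rather than something the article prescribes.

```python
import numpy as np

# A hypothetical 4x4 grayscale image: 0 is black, 255 is white,
# and the values in between are shades of gray.
gray = np.array(
    [
        [0, 0, 255, 255],
        [0, 64, 192, 255],
        [0, 64, 192, 255],
        [0, 0, 255, 255],
    ],
    dtype=np.uint8,
)

print(gray.shape)              # (4, 4): rows x columns, one channel
print(gray[1, 2])              # 192: intensity at row 1, column 2
print(gray.min(), gray.max())  # 0 255
```

Each pixel is addressed by its row and column, which is what "coordinates" means in practice.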
The 3D matrix of a color image, on the other hand, includes 3 channels: red, green and blue (RGB). Each channel is a 2D matrix, similar to that of a grayscale image, so there is one matrix for the red channel, one for the green channel and one for the blue channel. These are stacked on top of one another to obtain the 3D matrix that represents the color image.
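The stacking of channels can be sketched in a few lines of NumPy. The image size and pixel values are made up for the example, and placing the channel axis last (height, width, channel) is one common convention, not the only one.

```python
import numpy as np

height, width = 2, 3

# One 2D matrix per channel (values chosen only for illustration).
red = np.full((height, width), 255, dtype=np.uint8)
green = np.zeros((height, width), dtype=np.uint8)
blue = np.zeros((height, width), dtype=np.uint8)
blue[:, -1] = 255  # the last column also gets full blue

# Stack the three matrices along a new last axis.
rgb = np.stack([red, green, blue], axis=-1)

print(rgb.shape)            # (2, 3, 3): height, width, channels
print(rgb[0, 0].tolist())   # [255, 0, 0]: pure red
print(rgb[0, 2].tolist())   # [255, 0, 255]: red plus blue, i.e. magenta
```

Slicing `rgb[..., 0]` gives back the red matrix, which shows that the 3D matrix is nothing more than the three 2D matrices placed together.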
So does the computer see the same things we see?
It actually depends on the type of sensor used. In some areas, for example, it is important to look outside the visible spectrum, so with special “multispectral” or even “hyperspectral” sensors the computer is able to “see” details that the human eye cannot perceive, because they lie outside the range of frequencies visible to us.
And what about recognition?
A computer can identify the contents of an image and recognize what it sees using a whole series of methods, such as feature detection and corner detection. Without going into detail, computer vision algorithms use these methods in the initial phase of image analysis to search for lines that meet at an angle and to characterize specific parts of the image by their color gradient. These corners and features are the building blocks that help find more detailed information contained in the image.
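To make corner detection a little more concrete, here is a minimal sketch of one classic approach, the Harris corner measure, written with NumPy only. The article does not say which detector is used, so the choice of Harris, the 3x3 window, the constant `k = 0.05` and the test image are all assumptions of this example.

```python
import numpy as np

def box_sum(values, radius=1):
    """Sum each pixel's (2*radius+1)^2 neighbourhood, zero-padded at the borders."""
    padded = np.pad(values, radius)
    rows, cols = values.shape
    total = np.zeros((rows, cols), dtype=float)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            total += padded[dy:dy + rows, dx:dx + cols]
    return total

def harris_response(gray, k=0.05):
    """Positive at corners, negative along straight edges, zero in flat areas."""
    gy, gx = np.gradient(gray.astype(float))  # intensity gradient along rows, columns
    sxx = box_sum(gx * gx)
    syy = box_sum(gy * gy)
    sxy = box_sum(gx * gy)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2

# Hypothetical test image: a white square on a black background.
image = np.zeros((20, 20), dtype=np.uint8)
image[6:14, 6:14] = 255

response = harris_response(image)
corners = np.argwhere(response > 0.9 * response.max())
print(corners.tolist())  # the four corner pixels of the square
```

The gradient captures how quickly intensity changes, and a corner is a place where it changes strongly in two directions at once. Production systems typically rely on a library implementation rather than a hand-written loop like this one.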
Furthermore, to facilitate recognition, the algorithm performs structural analysis and image segmentation to understand where the regions of interest are located, which provides information on the spatial arrangement of colors or intensities in the image.
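A very simple form of segmentation is thresholding: pixels brighter than a chosen value are treated as the object and the rest as background. The sketch below assumes that approach, with a hypothetical threshold of 128 and an invented image. Real systems generally use more sophisticated segmentation methods.

```python
import numpy as np

def segment_by_threshold(gray, threshold=128):
    """Return a boolean mask of the pixels brighter than the threshold."""
    return gray > threshold

def bounding_box(mask):
    """Return (top, left, bottom, right) of the True pixels, inclusive, or None."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return None
    return int(rows[0]), int(cols[0]), int(rows[-1]), int(cols[-1])

# Hypothetical image: a bright object on a dark background.
image = np.zeros((8, 10), dtype=np.uint8)
image[2:5, 3:8] = 200

mask = segment_by_threshold(image)
print(bounding_box(mask))  # (2, 3, 4, 7): the region of interest
print(int(mask.sum()))     # 15: number of pixels in the region
```

The bounding box is one way to express "where the region of interest is located" so that later steps can work on that part of the image only.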
Fundamental aspect: Training!
In order for the computer to be able to recognize and compare similar objects, the computer vision algorithm must be “trained”, that is, taught how to operate and which objects to recognize. Training proceeds by iterating through recognition cycles, either with an operator supervising the results or by feeding the algorithm a very large amount of labeled data (supervised learning). For this reason, computer vision is closely linked to machine learning.
Object recognition is probably the most important area of computer vision, with many practical applications in different fields. Given an image, suitable algorithms can automatically recognize all the relevant objects, or concentrate on a single object by extracting its main features.
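Training on labeled data can be illustrated with a deliberately tiny example. The sketch below uses a nearest-centroid classifier on synthetic 5x5 images of vertical and horizontal bars. The classifier, the data and the helper names (`make_sample`, `train`, `predict`) are all hypothetical and stand in for the far larger deep learning models used in practice.

```python
import numpy as np

def make_sample(kind, rng):
    """Create a noisy 5x5 image containing a vertical or a horizontal bar."""
    img = rng.integers(0, 40, size=(5, 5)).astype(float)  # dark, noisy background
    if kind == "vertical":
        img[:, 2] = 255
    else:
        img[2, :] = 255
    return img

def train(images, labels):
    """'Training' here just averages the labeled examples of each class."""
    centroids = {}
    for label in set(labels):
        members = [img for img, lab in zip(images, labels) if lab == label]
        centroids[label] = np.mean(members, axis=0)
    return centroids

def predict(centroids, img):
    """Pick the class whose average image is closest to the input."""
    return min(centroids, key=lambda label: np.linalg.norm(img - centroids[label]))

rng = np.random.default_rng(0)
labels = ["vertical", "horizontal"] * 20   # the labels supplied with the data
images = [make_sample(lab, rng) for lab in labels]
model = train(images, labels)

new_image = make_sample("vertical", rng)   # an image the model has not seen
print(predict(model, new_image))           # vertical
```

The point is the workflow rather than the method: labeled examples go in, a model comes out, and the model is then applied to images it has not seen before.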
Defect analysis in the industrial environment
Automatic defect recognition systems can be applied for statistical purposes or within quality control systems.
Moving objects can also be counted. Whether it is car traffic, the flow of people or products on a conveyor belt, an artificial vision system can provide useful information for numerous applications.
The application of these technologies is much more practical than older methods (for example, using special hardware or a person who counts vehicle traffic) and can:
- recognize objects of interest;
- keep track of them as they move;
- determine if they enter or leave a specific region of interest;
- store the video (if necessary).
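The steps in the list above can be sketched in plain Python for the "enter or leave a region" part. The example assumes that detection and tracking have already produced, for each object, a list of (x, y) positions over time. The `Region` class, the track names and the coordinates are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Region:
    """A rectangular region of interest in image coordinates."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, point):
        x, y = point
        return self.left <= x <= self.right and self.top <= y <= self.bottom

def count_crossings(track, region):
    """Count how often a tracked object enters and leaves the region."""
    entered = exited = 0
    for previous, current in zip(track, track[1:]):
        was_inside = region.contains(previous)
        is_inside = region.contains(current)
        if is_inside and not was_inside:
            entered += 1
        elif was_inside and not is_inside:
            exited += 1
    return entered, exited

roi = Region(left=10, top=0, right=20, bottom=10)
tracks = {
    "car-1": [(2, 5), (8, 5), (12, 5), (18, 5), (25, 5)],  # drives through
    "car-2": [(2, 5), (6, 5), (12, 5), (15, 5)],           # enters and stays
    "car-3": [(2, 15), (12, 15), (25, 15)],                # passes outside
}
totals = {name: count_crossings(track, roi) for name, track in tracks.items()}
print(totals)  # {'car-1': (1, 1), 'car-2': (1, 0), 'car-3': (0, 0)}
```

Comparing each position with the previous one is what turns a sequence of detections into events such as "entered" and "left", which can then be counted.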