Computer Vision as a field, in its primitive form, emerged in the early 1960s, shortly after the term Artificial Intelligence (AI) was coined at the Dartmouth conference. Image classification of cats was one of the first challenging problems researchers tackled in AI. With the advent of greater computational power and broader access to data, AI gained prominence after the 1980s. So why, presently, is Computer Vision one of the most crucial domains of AI? Because it has a tremendous number of applications relevant to day-to-day life across many diverse industries. Consider saving someone’s life in time after they have been diagnosed with a fatal disease, or getting perfectly fitting clothes without actually trying them on; sounds interesting, right? Well, all of this is possible with the use of Computer Vision. Some more interesting examples are listed below.
Computer Vision is also one of the most in-demand domains in AI. According to a report by ‘Research and Markets’, the value of the AI in Computer Vision market is estimated to grow by USD 35.4 billion over the next five years, from USD 15.9 billion in 2021. Given these figures, there will be a promising rise in the need for Computer Vision engineers. So if you want to become one, don’t worry; this computer vision tutorial will provide a perfect outline for you to get started.
Introduction to Computer Vision Machine Learning
Computer vision deals with analysing image/video data and providing visual inference capabilities to a machine using different machine learning algorithms. Abstractly, it involves imparting the skill of human-like visual inference to a machine. Knowledge of various aspects of computer science is required to get started with computer vision. This computer vision tutorial will cover your questions related to:
- Which mathematical concepts are needed?
- Which coding languages are popular?
- Which libraries and modules can come to your rescue?
- What frameworks and aspects of deep learning are essential?
- What are the applications of computer vision?
The chart provided below gives a brief overview of the required skillset to become a Computer Vision engineer.
Mathematical Concepts Needed to Learn Computer Vision
You might wonder why we need to know all the mathematics when we already have libraries and built-in functions doing the magic for us. You are partly right, but a mathematical understanding of the underlying concepts is needed to dive deeper into the architectures and optimise them for improved performance. Moreover, the core fundamentals of all machine learning, deep learning and computer vision algorithms are pure mathematics; hence it is essential to know them.
Calculus
To understand the working of a neural network, one of the essential concepts is the backpropagation algorithm. The primary strategy is updating the weights using gradients of the loss function. This concept requires a basic understanding of differential calculus, partial derivatives and the divergence/convergence of a function.
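As a hedged illustration (not part of the original tutorial), the gradient-update idea behind backpropagation can be sketched with a single parameter and a hand-written derivative:

```python
# Minimal gradient-descent sketch for f(w) = (w - 3)^2.
# The derivative f'(w) = 2(w - 3) plays the role that
# backpropagated gradients play for a network's weights.

def grad_f(w):
    return 2.0 * (w - 3.0)

w = 0.0      # initial parameter value
lr = 0.1     # learning rate
for _ in range(100):
    w -= lr * grad_f(w)   # update step: move against the gradient

print(round(w, 4))   # converges towards the minimiser w = 3
```

Each step shrinks the distance to the minimum by a constant factor, which is exactly the convergence behaviour that calculus lets you reason about.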
Blurring
Image blurring, also referred to as smoothing, is an essential step in many image processing applications. Blurring is usually a preprocessing step for edge detection, as it helps reduce the noise around the edges of objects.
Four types of blurring techniques are commonly used – Simple (Averaging) Blurring, Gaussian Blurring, Median Blurring, and Bilateral Filtering.
1. Simple Blurring
Basic blurring simply involves averaging the pixel values of the image. For example, you can construct a custom 3×3 kernel consisting of ones and divide it by 9. Convolving this kernel with the image produces the blurring effect, since each output pixel becomes the mean of its 3×3 neighbourhood.
Code: Custom Blurring
2. Gaussian Blurring
This blurring technique uses a Gaussian filter. Instead of directly averaging all the values with a constant number, a weighted average will take place here. The image pixels’ distance from the kernel’s centre will determine their corresponding weight involved in the weighted average. Pixels nearby the centre have more weight as opposed to the farther pixels.
OpenCV provides the function ‘GaussianBlur()’, which takes four main arguments – src, ksize, sigmaX and sigmaY.
- src is the input image
- ksize is the size of the Gaussian kernel
- sigmaX is the standard deviation of the Gaussian kernel in the horizontal direction; if it is 0, it is computed from ksize
- sigmaY is the standard deviation of the Gaussian kernel in the vertical direction; the default value is 0, in which case it is set equal to sigmaX
Code: OpenCV’s Gaussian Blur
3. Median Blurring
This kind of blurring replaces each pixel value in the original image with the median of the pixel values in the area covered by the blurring kernel.
The function used is ‘medianBlur()’. It requires only the input image and the kernel size of the median filter (an odd integer greater than 1) as its arguments.
4. Bilateral Filtering
Blurring an entire image is not a good choice when information related to sharp edges needs to be preserved. In that case, bilateral filtering comes in handy. It selectively blurs the image based on the similarity of pixel values in a neighbourhood. This filter combines the property of the Gaussian filter, i.e. weighting based on distance from the kernel centre, with weighting based on the pixel intensities present in a neighbourhood of the image. Hence, it helps to maintain the edge structure of an image.
OpenCV provides the ‘bilateralFilter()’ function. It primarily has four arguments – src, d, sigmaColor, sigmaSpace.
- src is the input image
- d is the value of diameter to be considered for pixel neighbourhood while filtering
- sigmaColor determines the threshold for the difference between pixel intensities that is allowed to mix
- sigmaSpace determines the spatial distribution of the kernel (similar to the Gaussian filter)
Applications of Object Detection
Object Detection has a vast range of applications. Relevant to current times, you can use it to create a social-distancing application that detects masks and checks the distance between people walking. Moreover, object tracking is crucial to surveillance systems. Gesture recognition, face recognition and vehicle identification are some other real-life use cases of object detection.