Computer Vision: A Step Closer to Skynet

Akash James
4 min read · Feb 25, 2019


Don’t be intimidated by the title of this post; I don’t intend to take over the world with Artificial Intelligence (debatable, actually). I’m sure most of you remember Skynet, the self-aware artificially intelligent agent from the Terminator series. Skynet was portrayed in the films as having a sinister plan to eradicate humans from the earth by creating T-800s. Contrary to such fictitious beliefs, AI will not conquer humanity; rather, it will enhance our capabilities. AI is a by-product of our intelligence and will place human civilization higher on the Kardashev scale.

To mimic human intelligence, we need a machine with memory, processing power, godly algorithms, synthetic senses and cognitive capabilities. Let’s focus on the sense of vision that we are bestowed with by Apollo, the God of Light. (Yes, our vision is one hundred per cent helpless in the absence of the visible spectrum of light.) We build most of our cognition through our eyesight; reading, writing, memorizing, recognizing objects, deciphering our environment and so on are done primarily by the use of our vision.

Cameras are all around us. They are just a synthetic version of our eyes. Our brain is responsible for making sense of the electrical impulses sent along the optic nerve; the region of the brain that does this is called the visual cortex. So, machines have a camera but no ‘visual cortex’. This is where the subject of Computer Vision drops by. Simply put, it is the process of acquiring, processing and analyzing digital images. Our journey to achieving Computer Vision has seen several iterations. Early algorithms focused on locating edges, finding colours and recognizing shapes. As the tasks grew more complex, the methodology improved. Skip forward to the present, and we have techniques such as convolutional neural networks (CNNs) that have vastly improved what a machine can understand from a camera feed. From handcrafted algorithms, Computer Vision has moved on to Deep Learning techniques to get more out of digital images. CNNs were inspired by studies of the visual cortex of cats; they consist of neurons connected in layers. Hurray for another nature-based synthetic spoof! CNNs have enabled two key capabilities in particular: Image Classification and Object Detection.

Image Classification

Image classification revolves around labelling an entire image. Let me take the most cliché example of all time: the cat-and-dog classification problem. The task is simple: given a data set of cat and dog images, the machine should accurately label each image according to the animal in it. This is achieved using convolutional neural networks such as Inception (developed at Google), where a training dataset of cats and dogs is fed to the model and the model learns from these images.
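To make this concrete, here is a minimal sketch in plain NumPy of the final step of such a classifier: the network produces one score (a “logit”) per class, softmax turns those scores into probabilities, and the highest-probability class becomes the label. The logit values below are hypothetical, not the outputs of a real trained model.

```python
import numpy as np

LABELS = ["cat", "dog"]

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

def classify(logits):
    # Pick the label with the highest predicted probability
    probs = softmax(np.asarray(logits, dtype=float))
    return LABELS[int(np.argmax(probs))], float(np.max(probs))

# Hypothetical logits from a model that is fairly sure it sees a dog
label, confidence = classify([0.3, 2.1])
print(label)  # dog
```

In a real pipeline the logits come from the CNN’s final layer; everything before that is the feature-extraction machinery described next.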

How does it learn, you may ask? The layers in the CNN are responsible for feature extraction and feature combination (collectively called feature engineering), where each neuron fires much as neurons fire in our brain. Cool, isn’t it?
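The feature-extraction part is easiest to see with a single convolution. The toy sketch below (plain NumPy, purely illustrative) slides a small vertical-edge kernel over a tiny image; the output responds strongly where the dark-to-bright edge sits, which is exactly the kind of low-level feature an early CNN layer learns to pick up.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation, the core operation of a CNN layer."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Weighted sum of the kernel over one image patch
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A tiny image with a dark-to-bright vertical edge in the middle
image = np.array([
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
])
# A vertical-edge kernel (negative left column, positive right column)
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
])
print(conv2d(image, kernel))  # prints [[3. 3. 0.]]: strong response at the edge, zero on the flat region
```

A trained CNN learns many such kernels automatically, then combines their responses in deeper layers, which is the “feature combination” step.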

Object Detection

What if there is both a cat and a dog, or maybe multiple cats and dogs, in a single image? Image classification will struggle with such a problem. The new objective is to localize the position of each cat and dog in the image. This can be done with object detection algorithms such as Single Shot Detectors (SSD), Region-based Convolutional Neural Networks (R-CNN), RetinaNet and YOLO (You Only Look Once). These networks work by dividing an image into regions (or proposing candidate regions) and performing predictions on each of them.
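A core primitive shared by all of these detectors is Intersection over Union (IoU), the overlap measure used to match predicted boxes against objects and to suppress duplicate detections. A minimal sketch, with hypothetical box coordinates:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap at all
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Two overlapping detections of, say, the same dog
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # prints 0.14285714285714285
```

An IoU near 1 means two boxes describe the same object; detectors typically discard the lower-confidence box above some IoU threshold.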

We, at Integration Wizards, leverage Computer Vision techniques to solve a myriad of problems faced by our customers. Some of them are as follows:

Intrusion Detection -

There is often an existing CCTV network deployed in a facility, but only a single security officer to monitor it all, which is an error-prone and obsolete methodology in my opinion. Here, using object detection models, we detect an intruder in the CCTV’s field of view and raise alarms.
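A hedged sketch of the alarm logic such a system might use (the detection format and zone coordinates below are illustrative assumptions, not our actual pipeline): a detector emits labelled boxes, and an alarm fires when a “person” box lands inside a restricted zone.

```python
def box_center(box):
    """Center point of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def is_intrusion(detection, restricted_zone):
    """Alarm if a detected person's box center falls inside the zone."""
    label, box = detection
    if label != "person":
        return False  # cars, animals, etc. do not trigger the alarm
    cx, cy = box_center(box)
    zx1, zy1, zx2, zy2 = restricted_zone
    return zx1 <= cx <= zx2 and zy1 <= cy <= zy2

zone = (0, 0, 200, 200)  # hypothetical restricted area in pixel coordinates
print(is_intrusion(("person", (50, 50, 150, 150)), zone))  # True
print(is_intrusion(("car", (50, 50, 150, 150)), zone))     # False
```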

PPE Compliance -

Compliance is of utmost importance in industrial facilities. Accidents caused by non-compliance can be catastrophic for an organization. Here, we developed a model that uses the existing CCTV network to detect, in real time, whether a person has their safety equipment on.

Fire Detection -

As the name suggests, we’ve developed a model to detect an instance of fire or fire-related-smoke.

Pose Estimation -

Detecting humans is easy, so we are taking it up a notch and estimating the pose of a human. Combining this with RNNs and LSTMs, we can predict the action performed by a person, be it walking, running, leaning, bending, et cetera.
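As a simplified illustration of what pose keypoints make possible (our actual system feeds keypoint sequences to recurrent networks; the keypoint coordinates and the 30-degree threshold here are hypothetical), even a single-frame heuristic on two keypoints can separate an upright torso from a bent one:

```python
import math

def torso_angle(shoulder, hip):
    """Angle in degrees of the shoulder-to-hip segment away from vertical.

    Keypoints are (x, y) in image coordinates, where y grows downward,
    so an upright person has the shoulder directly above the hip.
    """
    dx = shoulder[0] - hip[0]
    dy = hip[1] - shoulder[1]
    return math.degrees(math.atan2(abs(dx), dy))

def classify_pose(shoulder, hip, threshold_deg=30.0):
    # Small deviation from vertical: standing; large: leaning or bending
    return "standing" if torso_angle(shoulder, hip) < threshold_deg else "bending"

print(classify_pose(shoulder=(100, 50), hip=(100, 150)))   # upright torso: standing
print(classify_pose(shoulder=(180, 120), hip=(100, 150)))  # tilted torso: bending
```

Sequences of such keypoints over time are what the RNN/LSTM consumes to recognize richer actions like walking or running.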

Inventory Management using Drones -

Ah, how could we not include drones in our arsenal? We’ve devised an inventory management system with an autonomous indoor drone that uses Ultra-Wideband (UWB) indoor tracking technology. The drone can batch-read QR codes or barcodes to perform inventory checks.

We live in an age where technology is racing forward, and rather than merely trying to keep up, we are trying to be at the forefront. Computer Vision is just one of the cognitive skills a machine can possess. We are closer than ever to achieving Skynet, yet there’s a long way to go. I’ll keep churning out posts as often as I can. Until then, in classic Terminator style: I’ll be back :)



Akash James

Artificial Intelligence Architect at Integration Wizards | NVIDIA Jetson AI Ambassador & Specialist | Speaker | Hackathon Enthusiast