What it takes to be an Artificial Intelligence Architect

Tracing my journey from a Deep Learning Engineer to an AI Architect

High-Performance Machine build | Image by author

Well well, we are back at it again. Last time, I narrated my journey as a Deep Learning Engineer, drawing parallels to the fictional landscape that envelops my consciousness. As it manifests in this universe, destiny had a bigger mantle for me to take on: an AI Architect! Just designing and training neural networks isn’t enough to be a superhero in AI.🦸🏽 You need to harness high-performance computation, lower latency, improve throughput, use complex data structures and scale on hardware. Haggling for time, midnight code runs and a will made of steel are what define a wizard of AI.

The when?

In March of 2020, the pandemic forced us to stay at home. At my organisation, we came up with the idea of creating a product to help authorities enforce mask-wearing and social distancing at shopping malls, manufacturing facilities, factories and so on. Adapting to working from home while building a new, never-before-done application was a boss-level challenge. What we needed was a high-performance intelligent video analytics pipeline, with the ability to onboard several input cameras and run mask and social-distancing validation on embedded devices or GPUs.

An Intelligent Video Analytics pipeline usually refers to applications that utilise computer vision and deep learning, usually hardware-accelerated, to gather insights and analytics from input video sources like CCTVs, cameras, videos, etc.

Building this required several modules: computer vision models to detect people and masks, advanced object tracking mechanisms, algorithms to process this data, and a pipeline that could run several input camera sources in parallel. The application had to do all that while keeping scalability and ease of deployment in mind.
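The structure described above can be sketched as a tiny multi-camera pipeline in plain Python. This is a conceptual sketch only: the frame generator, the detector and the camera names are hypothetical stand-ins for the real streams and models.

```python
import queue
import threading

def fake_frames(camera_id, n=5):
    """Stand-in for a camera stream; yields dummy frame records."""
    for i in range(n):
        yield {"camera": camera_id, "frame": i}

def detect(frame):
    """Stand-in for person/mask detection on one frame."""
    return {**frame, "people": 2, "masks": 2}

def camera_worker(camera_id, out_q):
    # Each camera source runs in its own thread, feeding a shared queue,
    # so several inputs are processed in parallel.
    for frame in fake_frames(camera_id):
        out_q.put(detect(frame))

def run_pipeline(camera_ids):
    results = queue.Queue()
    workers = [threading.Thread(target=camera_worker, args=(c, results))
               for c in camera_ids]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return [results.get() for _ in range(results.qsize())]

detections = run_pipeline(["cam0", "cam1"])
print(len(detections))  # 10: 5 frames from each of 2 cameras
```

In a real deployment each worker would wrap a hardware-decoded stream and the queue would feed trackers and analytics, but the fan-in shape stays the same.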

Avenger now | Marvel Cinematic Universe

The moment had come when the folks at my organisation thought I was ready to be an architect, marking the beginning of my journey into designing one of the best computer vision AI platforms in the country!

The expedition of gaining skills

Iron Man Model Prime | Marvel Comics

If you read my articles (if you don’t, please do!), you already know how I draw inspiration from a lot of fictional characters. Iron Man, one of the most popular and iconic superheroes, is one of my main sources of inspiration. Who doesn’t want to be a narcissistic genius billionaire who constantly keeps building weaponised advanced prosthetics? Keeping his eccentric personality aside, the two things that intrigue me most about Stark are his narcissism and his desire to constantly keep improving technology. The Mark III was already a technological masterpiece! That was no reason to stop, and we see our favourite character creating at least fifty different iterations, constantly improving the technology powering his suit.

My expertise and focus lie mainly in Computer Vision with Deep Learning. Choosing this as a career path allows me to pick up bleeding-edge techniques and tinker with technology that few have mastered.

Computer Vision is a branch of computer science that involves enabling machines to gain abilities similar to those of human visual cognition.

To architect high-performance solutions, keeping myself updated with the latest frameworks and developments in deep learning is pivotal.


PyTorch and TensorFlow are among the most widely used deep learning frameworks, favoured by researchers and creators of state-of-the-art methods alike. Being biased towards one framework is not allowed in my books. The open-source community is divided when it comes to choosing between them, but for rapidly developing prototypes and taking things into production, knowing only one of them can end up being a barrier.


CUDA has to be one of my favourite toolkits for parallel computation. NVIDIA is arguably the best when it comes to accelerating deep learning workloads. With the ability to write CUDA kernels, it becomes easier to utilise NVIDIA GPUs to run heavy workloads that need serious horsepower to finish in milliseconds.
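To give a flavour of what writing a kernel involves without needing a GPU, here is the canonical vector-add thread-indexing pattern emulated in plain Python: each loop iteration plays the role of one CUDA thread computing its global index. The function and variable names are illustrative, not any real CUDA API.

```python
import math

def vector_add_kernel(block_idx, thread_idx, block_dim, a, b, out):
    """Body of a CUDA-style kernel: one 'thread' handles one element."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < len(a):                          # guard against running past the array
        out[i] = a[i] + b[i]

def launch(grid_dim, block_dim, a, b, out):
    # Emulate the GPU launching grid_dim blocks of block_dim threads each;
    # on real hardware these all run in parallel.
    for bx in range(grid_dim):
        for tx in range(block_dim):
            vector_add_kernel(bx, tx, block_dim, a, b, out)

n = 10
a = list(range(n))
b = [x * 2 for x in a]
out = [0] * n
launch(math.ceil(n / 4), 4, a, b, out)  # 3 blocks of 4 threads cover 10 elements
print(out)  # [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
```

The guard on `i` is the detail beginners most often miss: the grid usually launches slightly more threads than there are elements.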


Neural networks play an important role in computer vision; most CV techniques now use them because they deliver state-of-the-art performance. However, training neural networks is not the only job that needs GPUs; running inference at scale can benefit from some acceleration too! With my work primarily focused on running inference on multiple cameras, knowing TensorRT to run inference at fp16/int8 precision and fuse layers for better performance is vital.
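The precision trade-off that int8 inference exploits can be illustrated with a toy symmetric quantiser in pure Python (this is the underlying idea, not the TensorRT API): weights are mapped to 8-bit integers with a per-tensor scale, then dequantised with a bounded error.

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantisation: x ≈ q * scale, q in [-127, 127]."""
    scale = max(abs(v) for v in values) / 127.0
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the int8 codes."""
    return [qi * scale for qi in q]

weights = [0.42, -1.27, 0.05, 0.9]      # made-up example weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding error is at most half a quantisation step per value.
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(q)
print(max_err)
```

Storing and multiplying 8-bit integers instead of 32-bit floats is what lets the GPU move and crunch four times as many values per cycle, at the cost of this small error.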


DeepStream is, hands down, the best Intelligent Video Analytics SDK available. It harnesses GStreamer to run the entire inference pipeline on an NVIDIA GPU, right from decoding the image data to displaying it on-screen. I have taken this up a notch by adding multi-model support and writing custom parsers/custom engine generators for popular networks like RetinaFace, OpenPose and YOLOv5.

Deepstream in action | Image by author


Building your AI application is one side of the coin; serving it as an API is a different ballgame altogether. Being adept with libraries like Flask, Celery, Redis, RabbitMQ and Nginx helps in quickly writing REST APIs that serve your application for customers to utilise.
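A minimal sketch of the serving side, assuming Flask is installed: a single JSON endpoint stands in for the inference engine, and the route name and response fields are invented for illustration.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # In a real service this would hand the payload to the inference engine
    # (possibly via Celery/RabbitMQ for async work); here we return a dummy result.
    payload = request.get_json(force=True)
    return jsonify({"camera": payload.get("camera"), "people": 2, "masks": 2})

# Flask's built-in test client exercises the endpoint without running a server.
client = app.test_client()
resp = client.post("/predict", json={"camera": "cam0"})
print(resp.get_json())  # dummy detection counts for the requested camera
```

In production this app would sit behind Nginx with a WSGI server, with heavy inference requests pushed onto a queue instead of blocking the request thread.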


The most debated topic in the AI community is the best programming language. In my opinion, both Python and C++ have their pros and cons. Python has a simpler syntax and a wide range of libraries to choose from, whereas C++ gives you raw performance, especially on embedded devices. The ability to work with different data structures like dictionaries, maps and vectors is important; these choices make a difference when working on applications that might be processing millions of data points per second. As an architect, mastery of both languages lets you quickly prototype and build production-ready applications at the same time.
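A concrete instance of the data-structure point: looking up a record by key in a dict is O(1) on average, while scanning a list is O(n), a difference that compounds at millions of events per second. The track records below are made up for the demonstration.

```python
import timeit

# Ten thousand hypothetical object tracks, keyed by track ID.
tracks_list = [(i, {"label": "person"}) for i in range(10_000)]
tracks_dict = dict(tracks_list)

def lookup_list(tid):
    for key, value in tracks_list:   # O(n): linear scan over every pair
        if key == tid:
            return value

def lookup_dict(tid):
    return tracks_dict[tid]          # O(1) average: single hash lookup

slow = timeit.timeit(lambda: lookup_list(9_999), number=200)
fast = timeit.timeit(lambda: lookup_dict(9_999), number=200)
print(f"list scan: {slow:.4f}s, dict lookup: {fast:.4f}s")
```

The same trade-off appears in C++ between `std::vector` scans and `std::unordered_map` lookups, which is why knowing the containers of both languages matters.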

This is not an exhaustive list, but it gives an insight into the vast number of SDKs and toolkits I had to add to my skill set to construct commercially viable applications. Some of these were added to my expertise overnight, just like Stark and thermonuclear astrophysics. 😈

Becoming an NVIDIA Jetson AI Specialist

If you haven’t noticed already, I have chosen to work with NVIDIA’s software suite, so it is only fitting to earn a certification they provide. Jetson AI Specialist is awarded to anyone who builds an open-source application that utilises the Jetson embedded platform. For this, I built an application that uses computer vision and deep learning to detect wildfires using reconnaissance drones. Hermes is how I christened my application. I fancy Greek mythology and like to associate my creations with some Greek god goodness! 🔥

NVIDIA Jetson Xavier NX | Image by author

Hermes consists of a YOLOv3 model trained on thousands of images of wildfires. I used a Ryze Tello drone as a reconnaissance device, utilising its camera stream as the input to my AI engine. If you would like to know more, here’s a link to my blog!

This helped me get certified and also earned me a spot on the Jetson Community Projects page.

The constant development of Open-Source projects

The open-source community is the reason why we all grow, so it’s only fitting to give back whenever I can. These projects help me build my skills; nothing beats a relaxing session of coding where you build something new to get a rush! The weekend, some badass trap music, a couple of cans of energy drinks and I am all set! The feeling of listening to narcissism-inducing music while coding is addictive, better than the hyperactivity brought on by caffeine.

Here are a couple of examples.

Video Classification using 3D ResNets

Video classification involves constructing a 4D tensor, a stack of consecutive frames, and running inference on it to predict what broad-spectrum action is occurring in the video. I used a model pre-trained on the Kinetics-400 dataset and accelerated it using TensorRT. Here’s a video with some banger music for you to enjoy!
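The 4D clip tensor mentioned above can be sketched with NumPy: a stack of T RGB frames becomes a (channels, frames, height, width) array, the layout 3D CNNs such as 3D ResNets commonly expect. The frame contents and clip dimensions here are stand-ins, not values taken from my project.

```python
import numpy as np

T, H, W = 16, 112, 112  # clip length and frame size often used by 3D CNNs

# Stand-in for decoded video frames: T frames of shape (H, W, 3).
frames = [np.random.rand(H, W, 3).astype(np.float32) for _ in range(T)]

# Stack to (T, H, W, C), then move channels first to get (C, T, H, W).
clip = np.stack(frames)                  # shape (16, 112, 112, 3)
clip = np.transpose(clip, (3, 0, 1, 2))  # shape (3, 16, 112, 112)

# Models usually take a batch dimension as well: (N, C, T, H, W).
batch = clip[np.newaxis]
print(batch.shape)  # (1, 3, 16, 112, 112)
```

Getting this axis order right is most of the battle when feeding video into an accelerated engine, since the runtime will happily consume a mis-ordered tensor and return nonsense.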

YOLOv4 with OpenCV and CUDA

Several newbies find it overwhelming to run computer vision applications with GPU acceleration. I aimed to bridge that gap by using OpenCV’s CUDA-enabled dnn module.

What’s next?

There’s a long way left in this journey I have embarked on. My methods of motivation are deemed unconventional, but hey, as long as they work, they should be acceptable. 😎 I’ve trained the fastest computer vision models and built accurate data ingestion algorithms, all with finesse. However, I am far from my endgame; just one step closer to Jarvis. I envision a world where technology prevents wars, averts environmental degradation and makes cities safer! Until then, my only competition is me, and iterating to get better is what I will be doing.

Why, you may ask? Because I am an Architect! *snap*