At DeepX we often face the problem of building resource-optimized, horizontally scalable machine learning pipelines.
Why the need for real-time processing
From the point of view of a business or a consumer, it is a simple requirement. Cameras, be they CCTV cameras, webcams, specialized tailor-built imaging hardware or smartphone cameras, are the main source of visual data nowadays. More often than not these cameras can run continuously, 24/7 if needed.
Shouldn’t it be the job of Machine Vision software ‘agents’ to be able to monitor and process those video streams on the fly, so that humans don’t have to?
Yes, but with caveats. Computer Vision typically processes each frame separately and needs considerable processing power to do so. The human brain is an analog 'machine' rather than a digital one. This means the brain can afford to make mistakes or leave some calculations unfinished while it keeps up with a constant flow of new information. A machine learning based computer vision system, on the other hand, has to finish processing each image without compromise: all the available data has to be processed.
Chasing the FPS
In cinema, 24 frames per second (FPS) has long been (and still is) the standard, commonly regarded as the minimum comfortable frame rate at which a viewer perceives a film as a real moving scene rather than a series of separate images. Technology has improved since then, and many TV shows, games and sports events are now streamed at 30, 60 or even 120 FPS.
This poses a problem for machine learning engineers, who constantly have to work around hardware limitations while keeping their ML / Computer Vision models running at maximum capacity.
Importance of high resolution and lossless formats in Computer Vision
Machine Learning engineers want to have as much data as possible at the disposal of their models, and certainly not less. What looks like a great quality video to the human eye may be unsatisfactory for ML processing, because video codec compression introduces artefacts and other quality issues that become noticeable once a video is broken down into individual frames. Where possible, we set up the systems, or ask our clients to set them up, to produce video with as little compression loss as possible. One example is the MJPEG codec, which stores a video as a sequence of independently compressed, high quality JPEG images. Each frame is still JPEG-compressed, so MJPEG is not strictly lossless, but it avoids the inter-frame artefacts of more aggressive codecs. This of course increases the load on the network, storage and the machine learning pipeline.
Image resolution is another factor. Human viewers like a good quality, high resolution picture, but a human can usually interpret a low-res image almost as well as a high-res one. A machine learning model cannot. At the same time, the more pixels an image contains, the more processing power (or processing time) an ML model needs per image. And Computer Vision problems need all the pixels they can get, as every pixel is a useful bit (or shall I say 3 bytes) of information.
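To put rough numbers on this, here is a minimal sketch of how much raw data a single stream produces, assuming uncompressed 8-bit RGB frames (3 bytes per pixel). The function names and the chosen resolutions are illustrative only and are not part of any DeepX product.

```python
def raw_frame_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Size of one uncompressed frame in bytes (3 bytes per pixel for 8-bit RGB)."""
    return width * height * bytes_per_pixel


def raw_stream_rate(width: int, height: int, fps: float, bytes_per_pixel: int = 3) -> float:
    """Uncompressed data rate of one video stream in bytes per second."""
    return raw_frame_bytes(width, height, bytes_per_pixel) * fps


if __name__ == "__main__":
    # Illustrative resolutions; real cameras and pixel formats will differ.
    resolutions = {"VGA": (640, 480), "Full HD": (1920, 1080), "4K UHD": (3840, 2160)}
    for name, (w, h) in resolutions.items():
        mb_per_frame = raw_frame_bytes(w, h) / 1e6
        mb_per_second = raw_stream_rate(w, h, fps=30) / 1e6
        print(f"{name}: {mb_per_frame:.1f} MB per frame, {mb_per_second:.0f} MB/s at 30 FPS")
```

Going from Full HD to 4K quadruples the pixel count, so the raw data the model has to look at per frame quadruples as well.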
Horizontal scalability challenge
Consider a business that has 100 CCTV cameras and needs all their data to be processed continuously, in real time, by a Machine Learning model, for example for security purposes. A Machine Learning engineering team provided by a consultant has achieved a satisfactory FPS rate on one of the video streams in a lab demo. It is now time to apply that technology to all 100 video streams. Intuitively, it should be possible to achieve the goal at the cost of multiplying the computing power by 100, right?
At this point, experienced IT infrastructure and software engineers will start shaking their heads, because the answer is not so simple. There are a number of issues here. Maturing the solution from a lab demo into a production-ready system that can run unattended for months takes time and effort. And once that is done, remember, the problem is only solved for 1 video source. You need horizontal scalability to start processing 10, 20 or 100 sources in real time. You may need it even earlier, for a single video source, if your ML model can only process X frames per second and X is less than the Y required by the project. This can be solved by running multiple copies of your ML model in parallel, each taking a fixed number of frames per second and jointly achieving the desired throughput. This is what horizontal scalability means: the ability to run identical copies of certain processing modules of a system, so that throughput can be increased by adding more computational units.
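As a back-of-the-envelope illustration of the X versus Y argument, the sketch below estimates how many identical model copies are needed and hands frames out to them in round-robin order. It is a simplified sketch under stated assumptions: `replicas_needed` and `assign_round_robin` are hypothetical helper names, and the estimate is a lower bound that ignores decoding, network transfer, queueing and result ordering.

```python
import math
from typing import List


def replicas_needed(required_fps: float, model_fps: float, sources: int = 1) -> int:
    """Minimum number of identical model copies needed to keep up with all sources.

    required_fps is Y (per source), model_fps is X (per model copy).
    """
    if model_fps <= 0:
        raise ValueError("model_fps must be positive")
    return math.ceil(sources * required_fps / model_fps)


def assign_round_robin(frame_ids: List[int], replicas: int) -> List[List[int]]:
    """Distribute frames across replicas in round-robin order."""
    buckets: List[List[int]] = [[] for _ in range(replicas)]
    for position, frame_id in enumerate(frame_ids):
        buckets[position % replicas].append(frame_id)
    return buckets


if __name__ == "__main__":
    # A model that manages 8 FPS, fed by one 30 FPS camera.
    print(replicas_needed(required_fps=30, model_fps=8))
    # The same model fed by 100 such cameras.
    print(replicas_needed(required_fps=30, model_fps=8, sources=100))
    print(assign_round_robin(list(range(6)), replicas=3))
```

In practice the number of copies is only the starting point: the frame distribution, result collection and failure handling around them are where most of the engineering effort goes.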
Horizontal scalability in Machine Learning (as in any other IT system) is a problem that requires significant planning and research investment to get right.
By default, a new ML system coming out of a lab / R&D effort will have a number of bottlenecks and busy intersections in its design. Achieving operational horizontal scalability is often a separate project or phase, a major effort that may require going back to the drawing board and rebuilding the system from scratch, sometimes more than once.
Most Machine Learning consulting companies focus on solving ML specific R&D problems and don’t have enough resources and experience to address horizontal scalability. This is a shame, as it causes many great advances in Machine Learning to fall short of productization, where they could have created significant added value and improved the life and work of numerous users of the technology.
ML models optimization and data reduction
Thus, while businesses and consumers may want their 24-120 FPS multiplied by the number of cameras or sources they have, machine learning engineers may often be struggling to get even 1 FPS for one source during the R&D phase.
A typical scenario for the R&D / solution-seeking phase is to work with the maximum possible amount of data, without worrying about the size or complexity of the ML models, and to accept long processing times for each frame. The main goal of the ML team at this stage is to achieve the desired accuracy level. It is often considered ‘normal’ at this stage that one frame takes minutes to process, meaning FPS is much lower than 1. This may seem far away from the desired outcome.
Once the ML solution has achieved the desired accuracy level, however, there are a number of tools in an ML engineer’s arsenal for reducing processing time, such as:
- weight pruning
- model compression
- data reduction (convert to greyscale, reduce resolution, take fewer frames per second etc)
- client-side / edge processing
- software implementation enhancements (for example, rewriting Python code in C, or using better processing algorithms)
- better hardware (more powerful GPU cluster, more memory) or even specialized purpose-built hardware
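To illustrate the data reduction item from the list above, here is a minimal sketch that estimates how much less raw data reaches the model and shows a simple way to drop frames. The names `take_every_nth` and `reduction_factor` are hypothetical, and the factors assume 8-bit RGB input. Less data does not guarantee a proportional speed-up, and accuracy has to be re-validated after every reduction.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

Frame = TypeVar("Frame")


def take_every_nth(frames: Iterable[Frame], n: int) -> Iterator[Frame]:
    """Yield every n-th frame (frames 0, n, 2n, ...), dropping the rest."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return islice(frames, 0, None, n)


def reduction_factor(greyscale: bool = False, downscale: int = 1, frame_step: int = 1) -> int:
    """How many times less raw data reaches the model after the chosen reductions.

    greyscale:  3 colour channels become 1 (factor 3)
    downscale:  width and height are each divided by this value (factor downscale**2)
    frame_step: only every frame_step-th frame is kept (factor frame_step)
    """
    channels = 3 if greyscale else 1
    return channels * downscale ** 2 * frame_step


if __name__ == "__main__":
    # Keep every 3rd frame of a 10-frame clip (frames are stand-in integers here).
    print(list(take_every_nth(range(10), 3)))
    # Greyscale, half resolution in each dimension, every 3rd frame.
    print(reduction_factor(greyscale=True, downscale=2, frame_step=3))
```

Each of these reductions throws information away, which is exactly what the R&D phase tries to avoid, so they are best applied one at a time while watching the accuracy metric.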
In our practice we have often been able to take an ML model running significantly under 1 FPS to high-FPS real-time processing, or to create a lightweight version of a model that runs on a CPU or a smartphone with throughput similar to that of its early version, which could only run properly on a powerful server-side GPU cluster. That said, there is a limit to how much optimization you can do, and it all comes at a cost. As a rule of thumb in the spirit of the Pareto principle, around 80% of the achievable optimization comes from picking the low-hanging fruit, while the remaining 20% becomes disproportionately more difficult and expensive to achieve.
The importance of a tested, horizontally scalable Machine Learning pipeline
All this brings us to the simple fact that a battle-tested, polished, mature Machine Learning pipeline is incredibly valuable to a business.
This is why we have spent so much time working on the core part of our DeepXHub product: the horizontally scalable machine learning pipeline, capable of ingesting hundreds of video sources if required, with all aspects taken care of.
At DeepX we know this out-of-the-box solution is of great value for businesses that simply expect results without needing to appreciate the challenges involved, as well as for Machine Learning consulting companies who can remain focused on innovative R&D and problem solving, knowing that batch processing, horizontal scaling and reliable production ready uptime are taken care of by DeepXHub.