ARIN UPADHYAY | Embedded Systems & Low-Level C Developer
07-09-2025
QMTIK
When most people think of neural networks, they imagine massive models trained on racks of GPUs, running in the cloud.
But what if you could train a neural network entirely on a CPU and run it on a microcontroller?
That's exactly what QMTIK (the Quantized Model Training and Inference Kit) sets out to do.
It is a minimal, dependency-free implementation of a quantized neural network designed for embedded systems and resource-constrained environments.
By using 8-bit integers for both weights and activations, and by avoiding heap allocation entirely, it delivers the efficiency needed to deploy machine learning on devices with just a few kilobytes of memory.
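To make that concrete, here is a minimal sketch (in C, matching the kit's target environment) of what an INT8 dense layer can look like. The names, sizes, and scaling scheme below are illustrative assumptions rather than QMTIK's actual API: weights live in statically sized int8_t arrays, products accumulate in int32_t to avoid overflow, and a single scale factor maps the accumulator back into int8 range.

```c
#include <math.h>
#include <stdint.h>

#define IN_DIM  16   /* illustrative sizes, fixed at compile time */
#define OUT_DIM 8

/* Hypothetical layer: everything statically sized, no heap. */
typedef struct {
    int8_t  weights[OUT_DIM][IN_DIM]; /* quantized weights          */
    int32_t bias[OUT_DIM];            /* bias in accumulator scale  */
    float   scale;                    /* maps int32 accum -> int8   */
} DenseLayer;

static int8_t clamp_i8(long v)
{
    if (v >  127) return  127;
    if (v < -128) return -128;
    return (int8_t)v;
}

/* out = clamp(round(scale * (W*in + b))): int8 in, int8 out. */
static void dense_forward(const DenseLayer *l,
                          const int8_t in[IN_DIM],
                          int8_t out[OUT_DIM])
{
    for (int o = 0; o < OUT_DIM; o++) {
        int32_t acc = l->bias[o];             /* wide accumulator */
        for (int i = 0; i < IN_DIM; i++)
            acc += (int32_t)l->weights[o][i] * (int32_t)in[i];
        out[o] = clamp_i8(lroundf((float)acc * l->scale));
    }
}
```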
Why Quantization
Normally, neural networks are trained and run with 32-bit floating point weights and activations.
But this makes running the models on embedded hardware extremely difficult: the hardware is slower and memory is scarce.
By quantizing everything to 8 bits, we get:
- 4x smaller models which are easier to store
- 2-4x faster inference
- Minimal accuracy loss (often <1%) if training is quantization aware
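The mapping itself is simple. As an illustration, here is symmetric per-tensor quantization, one common scheme (QMTIK's exact scheme and scaling factors may differ): pick a scale s from the largest absolute value in a tensor, then round each float onto the int8 grid.

```c
#include <math.h>
#include <stdint.h>

/* Symmetric per-tensor quantization: x ~= q * s, q in [-127, 127].
   One common scheme, shown for illustration only. */
static float choose_scale(const float *x, int n)
{
    float max_abs = 0.0f;
    for (int i = 0; i < n; i++) {
        float a = fabsf(x[i]);
        if (a > max_abs) max_abs = a;
    }
    return max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
}

static int8_t quantize(float x, float s)
{
    long q = lroundf(x / s);
    if (q >  127) q =  127;
    if (q < -127) q = -127;
    return (int8_t)q;
}

static float dequantize(int8_t q, float s)
{
    return (float)q * s;
}
```

With s = 0.01, for example, a weight of 0.42 quantizes to q = 42 and dequantizes back to exactly 0.42; for in-range values the worst-case rounding error is s/2 per element, which is why the accuracy loss can stay so small.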
For example, on the MNIST digit recognition task, this kit achieves ~95% test accuracy with just 327 KB model size (vs ~1.2 MB for float32) and <1 ms inference time on a modern CPU, which is a ~14x speedup.
That's small and fast enough to run in real-time on many microcontrollers.
What is it good for
This kit is not meant to compete with cloud-scale AI. Instead, it shines in environments where:
- Memory is limited and/or hardware is slow (IoT devices, wearables, industrial sensors)
- Power is constrained (edge devices running on batteries or solar)
- Fast inference is needed (real-time applications)
- Determinism is required (real-time systems where malloc/free isn't allowed; see the sketch after this list)
- Learning is the goal (understanding how quantized neural nets really work)
- Rapid prototyping (quickly test a small neural network idea)
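On the determinism point: when every buffer size is fixed at compile time, all memory can be declared statically and the worst-case footprint is visible in the linker map. A tiny sketch of the idea (the bound and names are made up for illustration):

```c
#include <stdint.h>

#define MAX_LAYER_DIM 64  /* illustrative upper bound on layer width */

/* Two scratch activation buffers reused by alternating (ping-pong)
   between layers: no malloc/free, and total activation RAM is a
   fixed 2 * MAX_LAYER_DIM bytes known at link time. */
static int8_t act_a[MAX_LAYER_DIM];
static int8_t act_b[MAX_LAYER_DIM];
```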
Features
- INT8 weights and activations for low memory usage, small models, and fast inference
- Adam optimization with batching
- Quantization-Aware Training to minimize accuracy loss (see the sketch after this list)
- Extremely configurable: custom network architectures; a choice of activation, output, cost, and learning-rate decay functions; and adjustable scaling factors
- No dynamic memory allocation
- No dependencies
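Quantization-aware training usually works by "fake-quantizing" the weights on the forward pass: round them onto the int8 grid, then dequantize, so the network learns with the same rounding error it will see at inference time (gradients flow through the rounding unchanged, the straight-through estimator). A minimal sketch of that step, reusing the hypothetical quantize/dequantize helpers from earlier and assuming full-precision shadow weights:

```c
/* Fake-quantize weights for the forward pass. Training updates the
   full-precision "shadow" weights; only the forward computation
   sees the int8-rounded values. */
static void fake_quantize(const float *shadow, float *fwd,
                          int n, float s)
{
    for (int i = 0; i < n; i++)
        fwd[i] = dequantize(quantize(shadow[i], s), s);
}
```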
This project is a step towards making AI practical on tiny machines.
Whether you are building an IoT project, experimenting with edge AI, or just curious about how quantized neural networks work, this project provides a clean, hackable foundation.