ARIN UPADHYAY | Embedded Systems & Low-Level C Developer
07-09-2025
QMTIK
When most people think of neural networks, they imagine massive models trained on racks of GPUs, running in the cloud.
But what if you could train a neural network entirely on a CPU and run it on a microcontroller?
That's exactly what QMTIK (the Quantized Model Training and Inference Kit) sets out to do.
It is a minimal, dependency-free implementation of a quantized neural network designed for embedded systems and resource-constrained environments.
By using 8-bit integers for both weights and activations, and by avoiding heap allocation entirely, it delivers the efficiency needed to deploy machine learning on devices with just a few kilobytes of memory.
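To make that concrete, here is a minimal sketch (in C, matching the kit's target environment) of what an INT8 dense layer can look like. The names, sizes, and scaling scheme below are illustrative assumptions rather than QMTIK's actual API: weights live in statically sized int8_t arrays, products accumulate in int32_t to avoid overflow, and a single scale factor maps the accumulator back into int8 range.

```c
#include <math.h>
#include <stdint.h>

#define IN_DIM  16   /* illustrative sizes, fixed at compile time */
#define OUT_DIM 8

/* Hypothetical layer: everything statically sized, no heap. */
typedef struct {
    int8_t  weights[OUT_DIM][IN_DIM]; /* quantized weights          */
    int32_t bias[OUT_DIM];            /* bias in accumulator scale  */
    float   scale;                    /* maps int32 accum -> int8   */
} DenseLayer;

static int8_t clamp_i8(long v)
{
    if (v >  127) return  127;
    if (v < -128) return -128;
    return (int8_t)v;
}

/* out = clamp(round(scale * (W*in + b))): int8 in, int8 out. */
static void dense_forward(const DenseLayer *l,
                          const int8_t in[IN_DIM],
                          int8_t out[OUT_DIM])
{
    for (int o = 0; o < OUT_DIM; o++) {
        int32_t acc = l->bias[o];             /* wide accumulator */
        for (int i = 0; i < IN_DIM; i++)
            acc += (int32_t)l->weights[o][i] * (int32_t)in[i];
        out[o] = clamp_i8(lroundf((float)acc * l->scale));
    }
}
```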
Why Quantization
Normally, neural networks are trained and run with 32-bit floating point weights and activations.
But this makes running the models on embedded hardware extremely difficult: the hardware is slower and memory is scarce.
By quantizing everything to 8 bits, we get:
- 4x smaller models which are easier to store
- 2-4x faster inference
- Minimal accuracy loss (often <1%) if training is quantization aware
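The mapping itself is simple. As an illustration, here is symmetric per-tensor quantization, one common scheme (QMTIK's exact scheme and scaling factors may differ): pick a scale s from the largest absolute value in a tensor, then round each float onto the int8 grid.

```c
#include <math.h>
#include <stdint.h>

/* Symmetric per-tensor quantization: x ~= q * s, q in [-127, 127].
   One common scheme, shown for illustration only. */
static float choose_scale(const float *x, int n)
{
    float max_abs = 0.0f;
    for (int i = 0; i < n; i++) {
        float a = fabsf(x[i]);
        if (a > max_abs) max_abs = a;
    }
    return max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
}

static int8_t quantize(float x, float s)
{
    long q = lroundf(x / s);
    if (q >  127) q =  127;
    if (q < -127) q = -127;
    return (int8_t)q;
}

static float dequantize(int8_t q, float s)
{
    return (float)q * s;
}
```

With s = 0.01, for example, a weight of 0.42 quantizes to q = 42 and dequantizes back to exactly 0.42; for in-range values the worst-case rounding error is s/2 per element, which is why the accuracy loss can stay so small.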
For example, on the MNIST digit recognition task, this kit achieves ~95% test accuracy with just 327 KB model size (vs ~1.2 MB for float32) and <1 ms inference time on a modern CPU, which is a ~14x speedup.
That's small and fast enough to run in real-time on many microcontrollers.
What is it good for
This kit is not meant to compete with cloud-scale AI. Instead, it shines in environments where:
- Memory is limited and/or hardware is slow (IoT devices, wearables, industrial sensors)
- Power is constrained (edge devices running on batteries or solar)
- Fast inference is needed (real-time applications)
- Determinism is required (real-time systems where malloc/free isn't allowed; see the sketch after this list)
- Learning is the goal (understanding how quantized neural nets really work)
- Rapid prototyping (quickly test a small neural network idea)
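On the determinism point: when every buffer size is fixed at compile time, all memory can be declared statically and the worst-case footprint is visible in the linker map. A tiny sketch of the idea (the bound and names are made up for illustration):

```c
#include <stdint.h>

#define MAX_LAYER_DIM 64  /* illustrative upper bound on layer width */

/* Two scratch activation buffers reused by alternating (ping-pong)
   between layers: no malloc/free, and total activation RAM is a
   fixed 2 * MAX_LAYER_DIM bytes known at link time. */
static int8_t act_a[MAX_LAYER_DIM];
static int8_t act_b[MAX_LAYER_DIM];
```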
Features
- INT8 weights and activations for low memory usage, small models, and fast inference
- Adam optimization with batching
- Quantization-Aware Training to minimize accuracy loss (see the sketch after this list)
- Extremely configurable: custom network architectures; a choice of activation, output, cost, and learning-rate decay functions; and adjustable scaling factors
- No dynamic memory allocation
- No dependencies
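Quantization-aware training usually works by "fake-quantizing" the weights on the forward pass: round them onto the int8 grid, then dequantize, so the network learns with the same rounding error it will see at inference time (gradients flow through the rounding unchanged, the straight-through estimator). A minimal sketch of that step, reusing the hypothetical quantize/dequantize helpers from earlier and assuming full-precision shadow weights:

```c
/* Fake-quantize weights for the forward pass. Training updates the
   full-precision "shadow" weights; only the forward computation
   sees the int8-rounded values. */
static void fake_quantize(const float *shadow, float *fwd,
                          int n, float s)
{
    for (int i = 0; i < n; i++)
        fwd[i] = dequantize(quantize(shadow[i], s), s);
}
```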
This project is a step towards making AI practical on tiny machines.
Whether you are building an IoT project, experimenting with edge AI, or just curious about how quantized neural networks work, this project provides a clean, hackable foundation.