# Edge AI: Running TensorFlow Lite on Microcontrollers
The future of AI isn't just in the cloud — it's at the edge. Running ML models directly on microcontrollers enables real-time inference without network latency, bandwidth costs, or privacy concerns.
## Why Edge AI?
| Cloud AI | Edge AI |
|----------|---------|
| High latency (100 ms+) | Ultra-low latency (<10 ms) |
| Requires connectivity | Works offline |
| Ongoing cloud costs | No per-inference cost |
| Privacy concerns | Data stays on device |
## The Edge AI Pipeline

```
[Data Collection] → [Model Training] → [Quantization] → [Deployment] → [Inference]
     (Python)          (TensorFlow)       (TF Lite)        (C/C++)        (MCU)
```
## Training a Simple Model
Start with a keyword detection model in Python:
```python
import tensorflow as tf

# Small 1-D CNN for 4-way keyword detection on 1000-sample audio windows.
model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(8, 3, activation='relu', input_shape=(1000, 1)),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(4, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_data, train_labels, epochs=50)  # expects (N, 1000, 1) float inputs
```
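The snippet assumes `train_data` and `train_labels` already exist. For a quick end-to-end smoke test you can substitute random arrays of the right shape (placeholder values, not a real keyword dataset):

```python
import numpy as np

# Hypothetical stand-in data: 200 windows of 1000 audio samples, 4 classes.
train_data = np.random.randn(200, 1000, 1).astype("float32")
train_labels = np.random.randint(0, 4, size=200)
```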
## Quantization: Shrinking the Model

A full-precision model is too large for most MCUs, so convert it to int8. Full-integer quantization also needs a small representative dataset so the converter can calibrate activation ranges:
```python
def representative_dataset():  # calibration data for activation ranges
    for sample in train_data[:100]:
        yield [sample[None, ...].astype("float32")]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = converter.inference_output_type = tf.int8
tflite_model = converter.convert()  # model size: 2.1 MB → 89 KB
```
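The MCU has no filesystem to load a `.tflite` file from, so the model gets compiled into the firmware as a C array. The standard tool is `xxd -i model.tflite > model_data.h`; the sketch below is a Python equivalent (the `model_data.h` file name and `model_data` symbol are our choice, matching the C++ example in the next section):

```python
from pathlib import Path

# Persist the model, then embed it as a C array (same idea as `xxd -i`).
Path("model.tflite").write_bytes(tflite_model)

hex_bytes = [f"0x{b:02x}" for b in tflite_model]
rows = [", ".join(hex_bytes[i:i + 12]) for i in range(0, len(hex_bytes), 12)]
Path("model_data.h").write_text(
    "alignas(16) const unsigned char model_data[] = {\n  "  # TFLM wants alignment
    + ",\n  ".join(rows)
    + "\n};\n"
    + f"const unsigned int model_data_len = {len(tflite_model)};\n"
)
```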
## Running on ESP32

On the device, TensorFlow Lite Micro runs the model out of a statically allocated tensor arena. A minimal sketch, wrapping inference in an illustrative `Predict()` helper:

```cpp
#include <cstdint>
#include <cstring>
#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "model_data.h"  // int8 model array generated above

// Static arena for input, output, and intermediate tensors;
// size it by trial: AllocateTensors() fails if it is too small.
constexpr int kTensorArenaSize = 10 * 1024;
alignas(16) uint8_t tensor_arena[kTensorArenaSize];

int Predict(const int8_t* sensor_data) {
  const tflite::Model* model = tflite::GetModel(model_data);
  tflite::AllOpsResolver resolver;  // MicroMutableOpResolver saves flash
  tflite::MicroInterpreter interpreter(model, resolver,
                                       tensor_arena, kTensorArenaSize);
  interpreter.AllocateTensors();

  // Copy one quantized input window into the model and run inference.
  TfLiteTensor* input = interpreter.input(0);
  memcpy(input->data.int8, sensor_data, input->bytes);
  interpreter.Invoke();

  // Return the index of the highest-scoring class.
  TfLiteTensor* output = interpreter.output(0);
  int prediction = 0;
  for (int i = 1; i < output->dims->data[1]; ++i)
    if (output->data.int8[i] > output->data.int8[prediction]) prediction = i;
  return prediction;
}
```
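Before trusting on-device output, you can run the same int8 model with the desktop TF Lite interpreter and measure how much accuracy quantization cost. A sketch, assuming held-out `test_data` and `test_labels` shaped like the training arrays:

```python
import numpy as np
import tensorflow as tf

# Run the quantized model with the desktop TF Lite interpreter.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

scale, zero_point = inp["quantization"]  # float → int8 mapping from the converter
correct = 0
for sample, label in zip(test_data, test_labels):
    q = np.clip(np.round(sample / scale + zero_point), -128, 127).astype(np.int8)
    interpreter.set_tensor(inp["index"], q.reshape(inp["shape"]))
    interpreter.invoke()
    correct += int(np.argmax(interpreter.get_tensor(out["index"])) == label)
print(f"int8 accuracy: {correct / len(test_labels):.2%}")
```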
## Real-World Applications
- Predictive maintenance — Vibration anomaly detection on motors
- Voice commands — Keyword spotting without cloud dependency
- Visual inspection — Defect detection on assembly lines
- Environmental monitoring — Species identification from audio
## Performance Tips
- Use int8 quantization — 4x smaller, 2-3x faster than float32
- Minimize model complexity — Fewer layers = faster inference
- Profile on target — Don't rely on desktop benchmarks
- Use hardware acceleration — ESP32-S3 has vector instructions for ML
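The "4x smaller" figure in the first tip is easy to verify on the desktop by converting the same Keras model with and without quantization (reusing `model` and `tflite_model` from the earlier sections):

```python
# Convert without optimizations to get a float32 baseline for comparison.
float_model = tf.lite.TFLiteConverter.from_keras_model(model).convert()
print(f"float32: {len(float_model) / 1024:.0f} KB, "
      f"int8: {len(tflite_model) / 1024:.0f} KB")
```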
Build your first Edge AI project in our Industrial IoT Architecture track.