Convert ONNX Models to Hailo8L: Step-by-Step Guide Using Hailo SDK
- Adrian Araya
- Jul 29
- 8 min read

The Hailo-8L is a 13-TOPS AI accelerator designed for edge devices like the Raspberry Pi 5 AI Kit. If you have an ONNX model ready, the next step is to convert your ONNX model to Hailo8L format using the official Dataflow Compiler. In this guide, we’ll show you how to do that step-by-step, based on Hailo’s docs and a working YOLOv5 example for vehicle detection.
Along the way, we’ll cover key concepts like the difference between Hailo-8 and Hailo-8L, why conversion must happen on an x86 machine, and how model calibration works.
Hailo8L vs Hailo8: Key Differences
Before converting, it’s crucial to specify the target Hailo device. Hailo offers multiple chips; among them are the Hailo-8 and the Hailo-8L. The Hailo-8L (used in the Raspberry Pi AI Kit and the Raspberry Pi AI HAT+) is a lower-power, 13-TOPS accelerator, while the Hailo-8 is a more powerful 26-TOPS variant. Why does this matter? Models compiled for one won’t necessarily run on the other, or may run with different performance. In fact, models compiled for the Hailo-8L will run on the more powerful Hailo-8, but the reverse isn’t necessarily true. This is because the Hailo-8L has tighter resource constraints; a model compiled to fully utilize a Hailo-8 might not fit on the Hailo-8L.
Setting Up Hailo SDK on x86 for Model Conversion
Before starting the conversion process, it’s important to know that you cannot run Hailo’s Dataflow Compiler on a Raspberry Pi or other ARM-based boards. The full model conversion workflow (parsing, quantizing, and compiling) is only supported on x86_64 Linux machines.
Why? Two reasons:
The complete Hailo SDK is not available for ARM platforms, so tools like the Dataflow Compiler aren’t even installable on the Pi.
Model conversion is resource intensive (especially during calibration) and Hailo recommends having at least 32 GB of RAM to avoid memory errors during quantization.
Many developers hit a wall trying to compile on under-powered machines (if you don't have enough RAM, you might run into out-of-memory issues, especially with larger models). Always compile on a capable x86 machine, then deploy the resulting .hef file to your Hailo-8L target (e.g., Raspberry Pi).
Installing the Hailo SDK on Your x86 Machine
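The Dataflow Compiler itself is distributed as a Python wheel through the Hailo Developer Zone (registration is required to download it). A minimal setup sketch, assuming you’ve already downloaded the wheel for your Python version, looks like this (the filename below is illustrative; use the one matching your release):

```bash
# Create and activate a dedicated virtual environment
python3 -m venv hailo_dfc_env
source hailo_dfc_env/bin/activate

# Install the Dataflow Compiler wheel downloaded from the Hailo Developer Zone
# (replace the placeholder with the actual wheel filename you downloaded)
pip install hailo_dataflow_compiler-<version>-py3-none-linux_x86_64.whl

# Verify the CLI is available
hailo --help
```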
Before getting started, make sure you have your ONNX model file ready. In this guide, we’ll use yolov5m_vehicles.onnx as an example (a medium-sized YOLOv5 model trained specifically to detect vehicles, and available through Hailo’s Model Zoo). The conversion process is quite similar for other ONNX models, though if your architecture is highly customized, you might need to make a few additional adjustments. Once your model is in place, you’re all set to begin the conversion to Hailo8L format.
Converting an ONNX Model to Hailo8L Format (Step-by-Step)
If you prefer to jump straight into the code, we’ve prepared a ready-to-use Jupyter notebook that walks through the entire conversion pipeline using the YOLOv5m vehicle detection model. You can find it here.
Feel free to follow along in the notebook or continue reading below for an explanation of each step and what’s happening under the hood.
Now let’s go through the conversion pipeline. At a high level, converting an ONNX model to a Hailo8L-compatible HEF involves three main stages:
Parsing: Translate the ONNX model into a Hailo HAR (Hailo Archive) format (an intermediate representation of the network).
Optimizing & Quantizing (Calibration): Perform 8-bit quantization on the model using sample data, producing a quantized HAR (with int8 weights and activations).
Compiling: Compile the quantized model for the Hailo8L hardware, generating a .hef executable file.
Hailo’s tools can perform these steps either via a single command-line interface or through the Python API. We’ll outline the process and provide examples for both approaches.
Parsing ONNX Models with Hailo Dataflow Compiler
The first step is to load your ONNX file and let Hailo’s compiler parse it into its own format.
Using the Python API
Using the Python API, this is done with a method like translate_onnx_model which produces a HailoNets object and associated data. The parsing process looks like:
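```python
# A minimal parsing sketch based on Hailo's DFC tutorials; model and file
# names are illustrative, and the exact API may vary between SDK versions.
from hailo_sdk_client import ClientRunner

onnx_path = "yolov5m_vehicles.onnx"
model_name = "yolov5m_vehicles"

# Create a runner targeting the Hailo-8L architecture
runner = ClientRunner(hw_arch="hailo8l")

# Parse the ONNX model into Hailo's internal representation.
# Start/end nodes and input shapes can usually be inferred automatically,
# but they can be passed explicitly if parsing fails (see the note below).
hn, npz = runner.translate_onnx_model(onnx_path, model_name)

# Save the parsed network as a Hailo Archive (.har) for the next steps
runner.save_har("yolov5m_vehicles.har")
```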
Note: The only required parameter is the model path; the rest can be inferred by the SDK. However, sometimes the SDK fails to find the input and output nodes. If this happens, you should analyze the neural network with tools such as Netron and pass the start/end node names explicitly.
This code generates a .har file (Hailo Archive) from your ONNX model. The HAR is an intermediate format required for quantization and compilation. Saving it allows you to separate the parsing step from the rest of the workflow.
Using the CLI alternative
If you prefer not to write Python code for parsing, you can achieve the same via the command line. For example:
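```bash
hailo parser onnx yolov5m_vehicles.onnx --hw-arch hailo8l --har-path yolov5m_vehicles.har
```

Flag names can vary slightly between SDK releases; run hailo parser onnx --help to see the options available in your installation.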
This command produces a yolov5m_vehicles.har targeting the Hailo8L.
Quantizing with Calibration Data for Hailo8L
Once your model has been parsed into a .har file, the next step is model optimization, which is where quantization happens. The goal is to convert the model to 8-bit integer precision so it can run efficiently on the Hailo8L hardware.
This process requires a calibration dataset: a collection of example images representative of the model’s expected inputs. These images help the compiler understand the dynamic range of the activations so it can quantize them correctly. No annotations are required for this dataset, only the images.
Tip: Hailo recommends using at least 1,000 calibration images. If possible, perform this step on a machine with a GPU and at least 32GB RAM, as it can be memory-intensive.
Preprocessing the Calibration Dataset using Python
Before running optimization, the calibration images must be resized to the model's expected input shape (normalization will be handled by the model script defined in the next step). Here's a Python example:
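```python
# A minimal sketch of a calibration preprocessing step; directory and file
# names are illustrative. Normalization is handled by the model script below,
# so pixel values are kept in the raw 0-255 range here.
import os
import numpy as np
from PIL import Image

def preproc(image_dir, output_path="calib_set.npy", target_size=(640, 640)):
    """Resize calibration images and stack them into a single (N, H, W, 3) array."""
    images = []
    for name in sorted(os.listdir(image_dir)):
        if not name.lower().endswith((".jpg", ".jpeg", ".png")):
            continue
        img = Image.open(os.path.join(image_dir, name)).convert("RGB")
        img = img.resize(target_size, Image.BILINEAR)
        images.append(np.asarray(img, dtype=np.uint8))
    calib_set = np.stack(images, axis=0)
    np.save(output_path, calib_set)
    return calib_set

calib_dataset = preproc("calibration_images/")
```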
Defining the Model Script
To ensure proper normalization and postprocessing during inference, you should use a model script (the Hailo Model Optimization Tutorial defines a minimal one). For YOLOv5 models, it might look like this:
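```
normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])
resize_input1 = resize(resize_shapes=[640,640])
nms_postprocess(meta_arch=yolov5, engine=cpu, nms_scores_th=0.2, nms_iou_th=0.4)
```

For the CLI workflow shown later, these commands are typically saved to a plain-text .alls file (e.g. default_model_script.alls).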
This script tells the compiler how to normalize input values (dividing by 255), resize images, and apply non-maximum suppression (NMS) during inference using YOLOv5 post-processing. The Hailo Model Zoo repository contains several example scripts for different models.
Running the Optimization Using the Python API
With the .har file loaded, the calibration dataset prepared, and the model script set, optimization is as simple as:
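```python
# A minimal optimization sketch; file names follow the earlier steps and the
# exact API may vary slightly between Dataflow Compiler versions.
import numpy as np
from hailo_sdk_client import ClientRunner

# Load the parsed model (the .har produced during parsing)
runner = ClientRunner(har="yolov5m_vehicles.har")

# Model script: normalization, input resize, and YOLOv5 NMS post-processing
alls = """
normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])
resize_input1 = resize(resize_shapes=[640,640])
nms_postprocess(meta_arch=yolov5, engine=cpu, nms_scores_th=0.2, nms_iou_th=0.4)
"""
runner.load_model_script(alls)

# Run 8-bit quantization using the calibration dataset prepared above
calib_dataset = np.load("calib_set.npy")
runner.optimize(calib_dataset)

# Save the quantized model for compilation
runner.save_har("yolov5m_vehicles_quantized.har")
```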
Running the Optimization Using the CLI alternative
If you prefer not to write Python code to optimize the model, you can achieve the same via the command line. For example:
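```bash
# Option names may vary slightly between SDK releases; see `hailo optimize --help`
hailo optimize yolov5m_vehicles.har \
    --hw-arch hailo8l \
    --model-script default_model_script.alls \
    --calib-set-path calib_set.npy
```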
Where:
--model-script: specifies a file containing the model script (e.g. default_model_script.alls), formatted as described below:

normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])
resize_input1 = resize(resize_shapes=[640,640])
nms_postprocess(meta_arch=yolov5, engine=cpu, nms_scores_th=0.2, nms_iou_th=0.4)

--calib-set-path: path to the calib_set.npy file generated by the preproc function shown above. Alternatively, you can use the --use-random-calib-set option to use a random calibration dataset instead.
After this step, the model is quantized and ready for compilation into a .hef.
Compiling ONNX to HEF Format
The final step: turn the quantized model into a HEF file that can be deployed on the Hailo-8L device. Compilation takes the optimized (8-bit) model and maps it onto the Hailo hardware resources.
Running the Compilation Using the Python API
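Using the Python API, compilation is a single call on the quantized HAR. A minimal sketch, with file names following the earlier steps:

```python
from hailo_sdk_client import ClientRunner

# Load the quantized model produced by the optimization step
runner = ClientRunner(har="yolov5m_vehicles_quantized.har")

# Compile for the Hailo-8L and write the resulting HEF to disk
hef = runner.compile()
with open("yolov5m_vehicles.hef", "wb") as f:
    f.write(hef)
```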
Running the Compilation Using the CLI alternative
If you prefer not to write Python code to compile the model, you can achieve the same via the command line. For example:
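```bash
# Pass the quantized .har produced by the optimize step (adjust the name if
# your optimize run wrote a different output file); the target architecture
# is taken from the HAR itself
hailo compiler yolov5m_vehicles_quantized.har
```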
This will produce yolov5m_vehicles.hef in your working directory. Hailo’s compiler will log its progress in the console. Don’t be surprised if this step is quite intensive; the compiler might iterate many times to find the best allocation on the chip, printing lines like “Iteration #X – Contexts: Y” as it tries different mappings. For a model like YOLOv5m, dozens of iterations are normal. When it finishes, you should see a success message.
Congratulations, you now have a HEF model file compiled for Hailo-8L!
A quick recap of where we are:
We took yolov5m_vehicles.onnx and converted it into yolov5m_vehicles.hef targeting Hailo8L.
We used a set of calibration images to quantize the model.
All conversion steps were done on an x86 machine (not on the Pi).
Troubleshooting & Support
The above steps are straightforward for supported models, but real-world models can introduce challenges. You might encounter errors if your ONNX has unsupported operations or if the model architecture isn’t directly covered by Hailo’s Model Zoo. For example, you may need to manually specify certain layer names as the end nodes (especially for networks with custom output processing), or remove operations that the compiler can’t handle. If you get stuck, remember that you’re not alone; this process can get complex depending on the model. Double-check the Hailo documentation (the Dataflow Compiler user guide) for any hints, and don’t hesitate to ask for help. You can reach us at support@ridgerun.ai or engage with the friendly folks on the Hailo community forum for guidance.
Testing the Converted Model on a Hailo8L Device
Once you’ve compiled your model and have the .hef file ready, you can test it directly on your Hailo8L device using our example repository. We've prepared several ready-to-run GStreamer-based scripts that you can use to verify the model's behavior with different video sources.
Follow these simple steps:
1. Clone the RidgeRun.ai Example Repository
Start by cloning our repository:
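```bash
# Replace the placeholders with the repository URL linked above and its folder name
git clone <ridgerun-ai-examples-repo-url>
cd <repository-folder>
```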
This folder contains example scripts to run inference using the yolov5m_vehicles.hef model we’ve already converted for you.
2. Choose the Script That Fits Your Use Case
We’ve included several Bash scripts that launch different GStreamer inference pipelines. Here’s what each one does:

Camera-to-display script:
- Takes input from a camera connected to /dev/video0 and displays the result on screen.
- Includes preprocessing to adapt the resolution to the model input resolution, postprocessing to match the original size, and object tracking.
- Modify the device path if you're using a different camera.

File-to-display script:
- Takes a video file as input and displays the inference results in real time.
- Same preprocessing, postprocessing, and tracking as above.
- Replace the video file path if you want to use your own footage.

File-to-file script:
- Processes a video file and writes the output (with detections) to a new video file in .ts format.
- Includes preprocessing, postprocessing, and tracking.
- Use VLC to play the .ts file, or convert it to .mp4 if needed.

file2display_simple.sh:
- Displays a video file without any resolution adjustments or tracking.
- Output resolution will match the model’s (e.g. 640x640), not the input’s.
- Good for quick testing or debugging.

file2file_simple.sh:
- Same as above, but writes the result to a file instead of displaying it.
3. Update the Script with Your Paths
Once you've chosen a script:
Open it in your preferred editor.
Locate the path to the .hef model and either:
Leave it as is to use the provided yolov5m_vehicles.hef, or
Replace it with the path to your own .hef file.
Also update the path to the postprocessing libraries (e.g. libyolo_hailortpp_postprocess.so). If you installed the Hailo SDK correctly, these libraries should be somewhere in your system. You can find them using:
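```bash
# Search the usual install locations first; fall back to searching from / if nothing turns up
find /usr -name "libyolo_hailortpp_post*" 2>/dev/null
```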
Copy the path you find and paste it into the script where indicated.
4. Run the Script
After saving your changes, simply run the script from the terminal:
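```bash
# Make the script executable if needed, then run it
chmod +x file2display_simple.sh
./file2display_simple.sh
```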
Or whichever script you chose.
Note: Take a few minutes to review the script logic, especially if you’re using a different model. You might need to tweak some parameters like:
Resolution
Thresholds
Batch size
Tracker params
Making these small adjustments can help ensure your model runs optimally.
🚀 Need Help Converting or Deploying Models on Hailo8L? We're Here for You
If you're working on AI at the edge and need support converting custom ONNX models, tuning your pipeline, or getting the most out of your system, RidgeRun.ai is ready to help.
At RidgeRun, we specialize in AI consulting services, helping teams bring high-efficiency inference to production with custom integrations, hardware tuning, and robust GStreamer pipelines tailored for your use case.
Reach out to us at support@ridgerun.ai — let’s unlock the full potential of your edge AI project.
