YOLOv4-tiny has been released! You can use YOLOv4-tiny for much faster training and much faster detection. In this article, we will walk through how to train YOLOv4-tiny on your own data to detect your own custom objects.

YOLOv4-tiny is especially useful if you have limited compute resources in research or deployment, and are willing to trade off some detection performance for speed.

How Does YOLOv4-tiny Compare to vanilla YOLOv4?

Comparing Evaluation Metrics

(YOLOv4-tiny performance metrics)

Performance metrics show that YOLOv4-tiny is roughly 8x as fast at inference time as YOLOv4, and roughly 2/3 as performant on MS COCO (a very hard dataset). On small custom detection tasks that are more tractable, you will see even less performance degradation. On the custom example in this tutorial, we see almost no degradation in performance as a result of the decrease in model size.

Comparing Model Architectures

The primary difference between YOLOv4-tiny and YOLOv4 is that the network size is dramatically reduced. The number of convolutional layers in the CSP backbone is compressed, there are two YOLO layers instead of three, and there are fewer anchor boxes for prediction. You can see the differences between the two networks for yourself in their config files.

If you are trying to detect small objects, you should keep the third YOLO layer, as in yolov3-tiny_3l.cfg.

Installing Darknet Dependencies and Framework for YOLOv4-tiny

We recommend working through this post side by side with the YOLOv4-tiny Colab notebook.

Many of the details in this post also appear in the general How to Train YOLOv4 tutorial, so that is a useful resource if you are searching for more in-depth detail.

In order to set up our Darknet environment we need these dependencies:

  • OpenCV
  • CUDA Toolkit
  • GPU resources
  • cuDNN

Thankfully, Google Colab takes care of the first three for us so we need only worry about cuDNN.

To acquire cuDNN, we head over to the NVIDIA cuDNN download page and download the Linux cuDNN build that matches our CUDA version, in this case 10.1. Then we import the file into Colab - I do this from Google Drive, but you could do it from anywhere. We would host this file for general download, but I'm sure NVIDIA would disapprove.

Once cuDNN has successfully installed, you will see this printout:


Colab Free Tier K80 GPU Note: the Makefile in this tutorial was built for the P100 GPU accelerator that is typically provisioned on Colab Pro. If you are on the Colab free tier, you might receive a K80 GPU instead, which you can confirm with nvidia-smi. In that case, you will likely need to change the architecture specified in the Makefile. For the K80:

ARCH= -gencode arch=compute_30,code=sm_30

And if you're building locally on a GPU other than a P100, you'll need to change your ARCH definition to match your GPU's compute capability as well.
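As a reference, these are ARCH lines for a few other common GPUs, based on their CUDA compute capabilities (if your GPU is not listed, look up its compute capability in NVIDIA's tables and substitute it into the same pattern):

```makefile
# Tesla P100 (compute capability 6.0) - the accelerator assumed in this tutorial
ARCH= -gencode arch=compute_60,code=sm_60

# Tesla V100 (compute capability 7.0)
ARCH= -gencode arch=compute_70,code=sm_70

# Tesla T4 (compute capability 7.5)
ARCH= -gencode arch=compute_75,code=sm_75
```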

Then we clone the Darknet repository (we made some minor tweaks to configuration and print statements) and !make the Darknet program. If successful, you will see a lot of compiler printouts, including:

g++ -std=c++11 -std=c++11 -Iinclude/ -I3rdparty/stb/include -DOPENCV `pkg-config --cflags opencv4 2> /dev/null

Then, lastly, we download yolov4-tiny.conv.29 - the first 29 layers of YOLOv4-tiny pretrained on COCO - so that we can start our training from those pretrained weights:

yolov4-tiny.conv.29 100%[===================>]  18.87M  16.0MB/s    in 1.2s    


Download Custom Dataset for YOLOv4-tiny

For our custom dataset in this tutorial, we are using the public blood cell detection dataset (BCCD) from Roboflow. If you would like to follow along directly, fork that dataset. Otherwise, you can upload your own custom objects in any annotation format. To upload your data to Roboflow, create a free Roboflow account.

Need to label your data with bounding boxes? Open source solutions such as CVAT may be of use.

Once uploaded, we can choose preprocessing and augmentation steps. In this example we use auto-orient and resize to 416x416.

Roboflow screenshot: BCCD Dataset.
The settings I've chosen for my example dataset, BCCD.

In order to generate a dataset version we click Generate and then Download, choosing YOLO Darknet. This gives us a curl link that we can port into the Colab notebook for download.

Roboflow Screenshot: YOLO v3 Darknet Download.
Export as YOLO Darknet, and "Show Download Code."

Once our export has been zipped, we paste the curl link into the notebook and run it!

Downloading data from Roboflow - if you already have your data in the Darknet format, you can skip this step.

Then we write a little bit of code to create our obj.data file, which points Darknet towards our data for training.
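As a sketch, obj.data is a simple key = value file. The snippet below writes one using the standard Darknet paths; the class count and file locations are example assumptions, so adjust them to wherever your train/valid lists and names file actually live:

```python
# Write an obj.data file pointing Darknet at our dataset.
# The paths below follow Darknet conventions; adjust to your setup.
num_classes = 3  # e.g. RBC, WBC, Platelets for the BCCD dataset

obj_data = "\n".join([
    f"classes = {num_classes}",
    "train = data/train.txt",    # list of training image paths
    "valid = data/valid.txt",    # list of validation image paths
    "names = data/obj.names",    # one class name per line
    "backup = backup/",          # where weight checkpoints are saved
])

with open("obj.data", "w") as f:
    f.write(obj_data + "\n")

print(obj_data)
```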

✅ All set!

Write Custom YOLOv4-tiny Training Configuration

Next we write a custom YOLOv4-tiny training configuration.

The important takeaway here is that the YOLO models slightly adjust network architecture based on the number of classes in your custom dataset. And the length of training should also be adjusted based on the number of classes.

Thus, we create the following custom variables based on our dataset:

  • num_classes
  • max_batches (how long to train for)
  • iteration steps
  • layer filters

And write them into the configuration file as directed by the YOLOv4 repo.
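Per the training instructions in the YOLOv4 repo's README, these values can be derived from the class count alone: roughly 2000 iterations per class (with a 6000 minimum), learning-rate steps at 80% and 90% of max_batches, and (classes + 5) * 3 filters in the convolutional layer before each YOLO layer. A minimal sketch of that arithmetic (the 3-class count is just the BCCD example):

```python
def yolo_config_values(num_classes):
    """Derive Darknet cfg values from the class count, following the
    YOLOv4 repo's custom-training instructions."""
    # Train ~2000 iterations per class, but never fewer than 6000 total.
    max_batches = max(6000, num_classes * 2000)
    # Learning-rate decay steps at 80% and 90% of training.
    steps = (int(max_batches * 0.8), int(max_batches * 0.9))
    # Each YOLO layer predicts 3 anchors; each prediction carries
    # (x, y, w, h, objectness) plus one score per class.
    filters = (num_classes + 5) * 3
    return max_batches, steps, filters

# For the 3-class BCCD example:
print(yolo_config_values(3))  # (6000, (4800, 5400), 24)
```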

Train Custom YOLOv4-tiny Detector

Once we have our environment, data, and training configuration secured, we can move on to training the custom YOLOv4-tiny detector with the following command:

!./darknet detector train data/obj.data cfg/custom-yolov4-tiny-detector.cfg yolov4-tiny.conv.29 -dont_show -map

Kicking off training:

YOLOv4-tiny training fast!

Approx. 1 hour training time for 350 images on a Tesla P100.

We witnessed 10-20x faster training with YOLOv4-tiny as opposed to YOLOv4. This is truly phenomenal. YOLOv4-tiny is a very efficient model to begin trials with and to get a feel for your data.

As your model trains, watch the mAP (mean average precision) calculation. If it is steadily rising, that is a good sign; if it begins to deteriorate, your model has likely overfit to the training data.
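If you save the training printout to a file, you can also pull the mAP values out afterwards to see the trend at a glance. A hedged sketch: the regex below assumes log lines like `mean average precision (mAP@0.50) = 0.852342`, and the exact wording can vary between Darknet versions, so adjust the pattern to your own log:

```python
import re

# Extract mAP values from a Darknet training log to check the trend.
MAP_PATTERN = re.compile(r"mean average precision \(mAP@0\.50\) = ([0-9.]+)")

def map_history(log_text):
    """Return every logged mAP value, in order of appearance."""
    return [float(m) for m in MAP_PATTERN.findall(log_text)]

# Example log excerpt (illustrative values, not real training output):
sample_log = """
mean average precision (mAP@0.50) = 0.712305
mean average precision (mAP@0.50) = 0.801122
mean average precision (mAP@0.50) = 0.843910
"""

history = map_history(sample_log)
rising = all(a < b for a, b in zip(history, history[1:]))
print(history, "- still improving" if rising else "- check for overfitting")
```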

Detect Custom Objects With YOLOv4-tiny from Saved Weights

When training has completed, the Darknet framework will save backup/custom-yolov4-tiny-detector_best.weights, the checkpoint where your model achieved the highest mAP on your validation set.

We can invoke these saved weights to run detection on a test image:

!./darknet detect cfg/custom-yolov4-tiny-detector.cfg backup/custom-yolov4-tiny-detector_best.weights {img_path} -dont_show


YOLOv4-tiny inference on a test image - pretty good!

And the inference runs fast, blazingly fast:

test/BloodImage_00113_jpg.rf.a6d6a75c0ebfc703ecff95e2938be34d.jpg: Predicted in 3.131000 milli-seconds.

3ms, batch size 1, on a Tesla P100!

From there, you can port the weights out of Colab for usage in your application, without having to retrain the next time.


Congratulations! Now you know how to train YOLOv4-tiny on a custom dataset. It trains very quickly and infers faster than pretty much any model out there.

Stay tuned for comparisons of YOLOv4-tiny to YOLOv5s.
