Data augmentation in computer vision is not new, but it has recently moved to the forefront of state-of-the-art modeling. YOLOv4, a new state-of-the-art object detection model, uses a variety of data augmentation techniques to boost its performance on COCO, a popular object detection benchmark that contains a wide range of images and object classes. We wrote a great post specifically about data augmentation in YOLOv4.

In this post, we discuss why you should use data augmentation on your own dataset and how to get started. We provide a hands-on example of how we used data augmentation to boost our model's performance. The reality is that data augmentation is extremely important for sparse computer vision datasets, and we believe that augmentation is currently underutilized.

Chart: Performance Increase from Augmentation (Test Performance mAP vs Vanilla Training (75) and Augmented Training (87))
The performance increase from augmentation we will witness in this tutorial

The What and the Why of Data Augmentation in Vision

A few months ago, we put together a great post explaining what preprocessing and data augmentation are in computer vision and why you should consider using them on your own dataset. I will briefly revisit that ground here.

Preprocessing in Vision - Image preprocessing involves reformatting images before they are used in training and inference. You do this on all of the dataset splits - training, validation, and testing. These include resizing, orienting, color corrections, etc.

What is Data Augmentation in Vision - Image augmentation involves altering the training images to generate a synthetic dataset that is larger than your original dataset and will hopefully improve your model's downstream performance. Data augmentations include flipping, rotating, cropping, adding noise, occluding portions of the image, and more. In the past, we have written blog posts on a number of these, including how to create a synthetic dataset for computer vision, how and why to use flip augmentation, the importance of blur augmentation, and why and how to use random crop augmentation. In our post on data augmentation strategies in YOLOv4, we introduce a number of occlusion techniques that are designed to improve training through regularization.
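To make a few of these concrete, here is a minimal sketch of common augmentations using Pillow and NumPy (illustrative only; this is not the pipeline used later in the tutorial):

```python
# A few common image augmentations, sketched with Pillow and NumPy.
import numpy as np
from PIL import Image, ImageFilter

def augment(img):
    """Return several augmented copies of one source image."""
    flipped = img.transpose(Image.FLIP_LEFT_RIGHT)     # horizontal flip
    rotated = img.rotate(90, expand=True)              # 90-degree rotation
    blurred = img.filter(ImageFilter.GaussianBlur(3))  # Gaussian blur, StDev 3
    arr = np.asarray(img).astype(np.float32)           # additive Gaussian noise
    noisy = Image.fromarray(
        np.clip(arr + np.random.normal(0, 10, arr.shape), 0, 255).astype(np.uint8))
    return [flipped, rotated, blurred, noisy]

demo = Image.new("RGB", (64, 48), color=(120, 80, 200))
copies = augment(demo)
print(len(copies))  # 4 synthetic variants from a single source image
```

Note that for object detection, geometric augmentations like flips and rotations must also be applied to the bounding box annotations, which is part of what a dedicated tool handles for you.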

Examples of a photo of a hot air balloon being blurred to StDev 3 and StDev 10
Blur amounts via Wikipedia

Why Use Data Augmentation in Vision - Augmenting your images creates a larger dataset that will better generalize to the situations the model may encounter in production. Different augmentation techniques are more or less useful in different settings. For example, in a self-driving car dataset, adding random noise may make the model more resilient to camera artifacts, but flipping vertically might not be useful since the road is never upside down.

Simple augmentation strategies can help fill in the gaps for a model. For example, models often learn that object classes only appear in the locations where they appear in the training set. In our tutorial example, we have one underrepresented class (platelets) that appears with the following heatmap.

Roboflow Screenshot: Annotation Heatmap of Platelets (361 examples)
Annotation heatmap of the underrepresented class in this tutorial

With a few flip and rotate augmentations, we should be able to make that distribution look a lot more even across the frame.
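To see why flips redistribute annotations, consider how a single bounding box moves under horizontal and vertical flips. This is a hypothetical sketch with an assumed (x_min, y_min, x_max, y_max) box format in a W x H image:

```python
# How flip augmentations move an annotation around the frame, which
# evens out the location heatmap of a rare class.
W, H = 640, 480  # assumed image dimensions

def flip_h(box):
    """Mirror a box across the vertical axis (horizontal flip)."""
    x0, y0, x1, y1 = box
    return (W - x1, y0, W - x0, y1)

def flip_v(box):
    """Mirror a box across the horizontal axis (vertical flip)."""
    x0, y0, x1, y1 = box
    return (x0, H - y1, x1, H - y0)

box = (500, 50, 560, 110)   # a platelet stuck near the top-right corner
print(flip_h(box))          # → (80, 50, 140, 110): now top-left
print(flip_v(box))          # → (500, 370, 560, 430): now bottom-right
```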

Roboflow Screenshot: Annotation Heatmap of all BCCD classes (evenly distributed except the corners)
Annotation heatmap of all classes in this tutorial

More complicated augmentations like occlusion strategies are implemented to force the model to learn more about the whole object being detected, not just the low-hanging fruit. For example, in detecting a dog, the model may converge on recognizing the dog's head as a useful characteristic. This convergence can be demonstrated with Class Activation Mapping (CAM). But what happens if that dog is in a picture with its head behind a bush? Below, in the "Cutout" column, you can see the dog is now recognized by its back portion as well.
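As a rough illustration of the idea behind Cutout (a sketch, not the reference implementation), masking a random patch during training prevents the model from leaning on any single region of the object:

```python
# Minimal Cutout-style occlusion: zero out a random square patch so
# the model cannot rely on any one region (e.g. the dog's head).
import numpy as np

def cutout(image, size, rng=None):
    """Return a copy of `image` with a random size x size patch zeroed."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = image.shape[:2]
    cy, cx = int(rng.integers(0, h)), int(rng.integers(0, w))  # patch center
    y0, y1 = max(0, cy - size // 2), min(h, cy + size // 2)
    x0, x1 = max(0, cx - size // 2), min(w, cx + size // 2)
    out = image.copy()
    out[y0:y1, x0:x1] = 0
    return out

img = np.full((64, 64, 3), 255, dtype=np.uint8)  # a white dummy image
occluded = cutout(img, size=16)
print((occluded == 0).any())  # → True: some pixels were masked out
```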

Image comparing neural network attention on images of a St Bernard and Poodle dog augmented using Mixup, Cutout, and CutMix.

Training and Evaluating an Object Detection Model

***Skip this section if you already have a training/evaluation pipeline

In order to get our hands dirty with image augmentation, we will need to train an object detection model. Then we will evaluate our model on our test set to determine whether the augmentations we have made improved our model's performance.

For this tutorial, we will use EfficientDet, a state-of-the-art object detector published by Google Brain. For an in-depth guide on training, see my post on how to train EfficientDet for object detection.

Now, I will provide a high-level view of the training process. If you are eager to get right into the code, here is a Colab Notebook for using data augmentation in computer vision.

First, we will set up our training environment in Google Colab.

! git clone https://github.com/Tessellate-Imaging/Monk_Object_Detection.git
! cd Monk_Object_Detection/4_efficientdet/installation && cat requirements_colab.txt | xargs -n 1 -L 1 pip install

Then, we will download the data that we have augmented via an easy-to-use user interface. The download link will be generated later in the tutorial.


!curl -L [Your Data Link HERE] > roboflow.zip; unzip roboflow.zip; rm roboflow.zip

We will do some folder restructuring in the notebook and then we will kick off training.

import sys
sys.path.append("Monk_Object_Detection/4_efficientdet/lib/")
from train_detector import Detector

# some free compute from Colab, nice
gtf = Detector()

gtf.Train_Dataset(root_dir, coco_dir, img_dir, set_dir, batch_size=8, image_size=512, use_gpu=True)

gtf.Set_Hyperparams(lr=0.0001, val_interval=1, es_min_delta=0.0, es_patience=0)

gtf.Train(num_epochs=30, model_output_dir="trained/")

Then you will see a training output like so:

Terminal Screenshot: training epochs and loss values (Epoch 27/30, Total Loss 0.28062)
Training output in the tutorial

For evaluation, we will make inferences with our model and take the mAP of our model's predictions on the test set. See my post here for a deep dive on what mAP is. Here is the output!

52.62% = Platelets AP 
77.24% = RBC AP 
92.00% = WBC AP 
mAP = 73.96%
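The headline number is simply the mean of the per-class average precisions above. (Recomputing from the rounded per-class values lands a hundredth off; the reported 73.96% presumably comes from the unrounded APs.)

```python
# mAP here is the mean of the per-class average precisions printed above.
ap = {"Platelets": 52.62, "RBC": 77.24, "WBC": 92.00}
map_score = sum(ap.values()) / len(ap)
print(f"mAP = {map_score:.2f}%")  # → mAP = 73.95%
```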

Not too bad for state-of-the-art detection!

If you are like me, you can set up multiple Colab notebooks to run multiple experiments at once :D

How to Generate Data Augmentations for Vision

For our data augmentations, we will head over to Roboflow where we will find an easy user interface for uploading images, making augmentations, and then downloading our augmented images in any annotation format.

To get started, you can upload your dataset or you can fork one of the public datasets. Once your images and annotations are uploaded, the dataset view will show you the images along with bounding boxes like so.

Roboflow Screenshot: BCCD dataset example images and overview statistics (last updated, dataset size, annotation info)
Dataset Visualization on Roboflow

Then you will find a number of options that we can use to create an augmented dataset.

Roboflow Screenshot: Augmentation Options (Augmentations Per Image, Flip, 90-Degree Rotations, Random Crop, Random Rotation, Random Shear, Brightness, Exposure, Blur)
Augmentation options on Roboflow

For my experiment with blood cell detection, I can rule out a few augmentations by reasoning about the dataset and the production environment my model will find itself in. I don't need to vary "Exposure" because these images will be taken under consistent settings in the bloodstream. I also probably don't need to introduce blur because the images will not be taken with a camera that has focus issues. But some useful augmentations will definitely be flipping and rotating, since a cell could manifest itself in the image in any given direction.

Through a process of running experiments and thinking about my dataset, I came up with the following experiments to run.

1) vanilla - The original 364 images without augmentation

2) flip-rotate - flip vertically and horizontally and rotate 90 degrees. 3 augmentations per image.

3) flip-random-rotate - flip vertically and horizontally and randomly rotate up to 30 degrees. 3 augmentations

4) flip-shear - flip the image and shear it a little bit. 3 augmentations per image.

5) flip-rotate-shear - flip, rotate, and shear according to the above. 6 augmentations per image.

6) flip-rotate-big - just flip and rotate. 6 augmentations
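For a back-of-the-envelope sense of scale, here is roughly how large each version's training set becomes, assuming each source image yields the stated number of augmented outputs (actual counts generated by the tool may differ slightly):

```python
# Approximate dataset sizes per experiment (364 source images).
base = 364
versions = {
    "vanilla": 1, "flip-rotate": 3, "flip-random-rotate": 3,
    "flip-shear": 3, "flip-rotate-shear": 6, "flip-rotate-big": 6,
}
for name, factor in versions.items():
    print(f"{name:18s} ~{base * factor} images")
```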

Ultimately I ended up with the following dataset versions.

Roboflow Screenshot: Several versions of the BCCD dataset generated with different preprocessing and augmentation settings.
Data versions in Roboflow

Results of Data Augmentations

Now on to experimentation!

In order to experiment with the model training process we have set up in this Colab notebook, we need to export our data in COCO JSON format.
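For reference, a COCO JSON annotation file has roughly this shape (a trimmed, hypothetical example; real exports carry more metadata, and boxes are [x, y, width, height]):

```python
# Minimal illustration of the COCO annotation format our notebook expects.
import json

coco = {
    "images": [{"id": 1, "file_name": "cells_001.jpg",
                "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "Platelets"},
                   {"id": 2, "name": "RBC"},
                   {"id": 3, "name": "WBC"}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 2,
                     "bbox": [120, 85, 40, 40],  # [x, y, width, height]
                     "area": 1600, "iscrowd": 0}],
}
print(json.dumps(coco)[:40])  # serialized like any other JSON file
```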

***If your model requires another format, never fear, all formats are available to export.

Roboflow Screenshot: Download Dataset dialog (COCO JSON format, show download code)
Downloading data from Roboflow

Then paste that link into the notebook where the dataset link is missing and hit "run all cells"!

The notebooks train for 30 epochs (about 10 minutes) and then run inference (about 30 ms per image) on the test set. The inferences are passed into an evaluation script, yielding the following mAP performances after augmentation.

Chart of Augmentation Results: mAP vs vanilla, flip-rotate, flip-rand-rotate, flip-shear, flip-rotate-shear, flip-rotate-big)

Now that is a big improvement from our baseline dataset! And we didn't have to collect more data or wait for the new state of the art image detection model to be released.

Inference screenshot detecting blood cells and platelets in the BCCD dataset.
Inference with our shiny new model in the tutorial notebook


Data augmentation is extremely powerful for computer vision tasks, especially if you have a sparse dataset that needs to be scaled without collecting new data.

In this post, we have provided a tutorial on how to use data augmentation on your own computer vision task.

Happy augmenting! 🙌

Drop us a line!

If you are attempting to use augmentation in your computer vision project and found this tutorial useful, we would love to hear what you are working on and help you get to the finish line.