How to label your own computer vision dataset in CVAT.

CVAT screen recording: creating a bounding box for object detection.
Labeling docks, boats, and jet skis in CVAT for our aerial maritime drone dataset

To use modern computer vision technologies, we need to supervise deep learning models with annotated data. In particular, if we want to apply a technique like object detection to a new dataset to detect our own custom objects, we need to gather images that include examples of these objects and then label them.

This post walks through how to label your own custom computer vision dataset yourself. At the end of the post, we point to next steps for getting your computer vision model off the ground. You may be surprised by how quickly it comes together.

I will show the steps I used to annotate the public aerial maritime object detection dataset, which was captured by a drone. Although I use a specific dataset, this post is meant as a general guide to labeling an object detection dataset and using labeling tools for object detection.

What is CVAT - DIY labeling

CVAT (Computer Vision Annotation Tool) is an OpenCV project that makes it easy to label computer vision datasets. It is a free, open source image annotation tool with an easy-to-use interface for making annotations efficiently.

In this post, we focus on CVAT's ability to make object detection annotations on images, although it has many more capabilities, including video annotation, semantic segmentation, and polygon annotation.

CVAT is one of a group of similar DIY labeling tools that also includes LabelImg. If your labeling project will require more labeling than you can do yourself, you will want to look to an automated solution and leverage the power of the crowd through labeling services such as Labelbox, Scale, Hive, and many others in the data collection space.

That said, even if you have a large labeling task, we recommend labeling a batch of images yourself (50+) and training a state-of-the-art model like YOLOv4 to see whether your computer vision task is already solved by current technologies.
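A random sample keeps that first batch representative instead of cherry-picked. The sketch below is a hypothetical helper (`pilot_sample` is our own name, not part of CVAT) that assumes your raw images sit in one folder as .jpg, .jpeg, or .png files and prints a random subset of their paths that you could then drag into a CVAT task.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print N randomly chosen image paths from a folder,
# e.g. to pick a pilot batch of 50 images to label by hand.
# Usage: pilot_sample IMAGE_DIR [N]
pilot_sample() {
  local dir="$1" n="${2:-50}"
  # 1. list image files, 2. prefix each path with a random key,
  # 3. sort on that key (a shuffle), 4. keep the first N, 5. drop the key.
  find "$dir" -type f \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' \) |
    awk 'BEGIN { srand() } { printf "%f\t%s\n", rand(), $0 }' |
    sort -n |
    head -n "$n" |
    cut -f2-
}
```

For example, `pilot_sample ./raw_images 50 > pilot_batch.txt` writes the chosen paths to a file. Sorting on a random key is used instead of `shuf` because `shuf` is not installed by default on every system.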

CVAT Quickstart

If this is your first time using CVAT, start with the hosted CVAT website, which is the quickest way to start labeling your data.

Once you are on the CVAT website, you will see a page like this:

CVAT Screenshot: Default Project Tasks
CVAT Master Task Page https://cvat.org/

Launch New CVAT Task

From there, you can launch a new task in CVAT and drag your images in for labeling. You are also prompted to specify the class labels of the objects that you would like to detect. Specify these carefully.

Once your data is uploaded, navigate back to tasks. From there, you will see a task page.

CVAT Screenshot: Task details
CVAT Task Page https://cvat.org/

Enter CVAT Labeling Job

CVAT automatically set up a labeling job when you created the task, and you can create additional jobs to annotate the dataset. Note the semantic hierarchy: jobs live inside a task.

Now you can click into your labeling task and get to work!

You will see the hyperlinks "Job" and "Old UI". Both link to the labeling screen. I prefer "Old UI", but maybe I am just an old dog who hasn't learned the new tricks yet.

When you're in the labeling screen you will see the following.

CVAT Screenshot: Labeling screen.
Screenshot of an image in my labeling task at https://cvat.org/

Drawing Annotations in CVAT

Click "Create Shape" and draw a box around the object you want your detector to detect. Then, on the right-hand side, you will see the color of the box you just drew, and you can choose among the class labels that you provided when setting up the task.

Exporting Annotations From CVAT

When you click "Open Menu" in CVAT you will see the following options:

CVAT Screenshot: Menu (Label, Boxes, Polygons, Polylines, Points, Cuboids, Manually, Interpolated, Total)
Menu from CVAT on my labeling task at https://cvat.org/

First click "Save Work", because CVAT does not save your work automatically. Then click "Dump Annotation" and choose among different formats: VOC XML, COCO JSON, YOLO, and others.
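The formats differ mainly in how they store a box. A YOLO label file holds one `class x_center y_center width height` line per box, with the coordinates normalized to the 0-1 range by the image size, while VOC XML stores absolute pixel corners (xmin, ymin, xmax, ymax). The sketch below converts one YOLO label file into pixel corners with awk; `yolo_to_corners` is a hypothetical name, and you have to supply the image width and height yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a YOLO label file ("class xc yc w h" per line,
# normalized to 0-1) into VOC-style pixel corners "class xmin ymin xmax ymax".
# Usage: yolo_to_corners LABEL_FILE IMAGE_WIDTH IMAGE_HEIGHT
yolo_to_corners() {
  local label_file="$1" img_w="$2" img_h="$3"
  awk -v W="$img_w" -v H="$img_h" '
    NF == 5 {
      xc = $2 * W; yc = $3 * H   # box center in pixels
      bw = $4 * W; bh = $5 * H   # box width and height in pixels
      # Corners are center minus/plus half the size, rounded to whole pixels.
      printf "%d %.0f %.0f %.0f %.0f\n", $1, xc - bw / 2, yc - bh / 2, xc + bw / 2, yc + bh / 2
    }' "$label_file"
}
```

For instance, a line `0 0.5 0.5 0.2 0.4` on a 1000 x 500 image becomes `0 400 150 600 350`.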

Congrats! Now you have a labeled dataset.
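Before moving on, it is worth checking that no image was skipped. The sketch below is a hypothetical check (`images_without_labels` is our own name) that assumes a YOLO-style export where each image, such as img_001.jpg, sits next to a label file with the same base name, img_001.txt; adjust it if your export lays files out differently. It lists images with no label file at all. Empty label files are not flagged, since an empty file can legitimately mean the image contains no objects.

```shell
#!/usr/bin/env bash
# Hypothetical check: list images in a folder that have no same-named .txt
# label file next to them (assumes a YOLO-style layout).
# Usage: images_without_labels FOLDER
images_without_labels() {
  local img
  for img in "$1"/*.jpg "$1"/*.jpeg "$1"/*.png; do
    [ -e "$img" ] || continue              # skip patterns that matched nothing
    [ -e "${img%.*}.txt" ] || echo "$img"  # strip extension, look for .txt
  done
}
```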

Running CVAT Locally for Serious Labeling

If you are serious about CVAT, you can run it locally. The CVAT website has these limitations:

  • No more than 10 tasks per user
  • Uploaded data is limited to 500 MB

Locally, you are not subject to these limitations because your own machine does the heavy lifting.
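To tell in advance whether a folder of images fits under the hosted upload cap, you can measure it from the terminal. This is a rough sketch with hypothetical function names: `du -sk` reports disk usage in kilobytes, which can differ slightly from the byte count the website checks, and the default of 500 simply mirrors the limit listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: approximate folder size and a check against an upload cap.
dir_size_mb() {
  # du -sk prints "<KiB>\t<path>"; integer-divide KiB by 1024 to get MiB.
  du -sk "$1" | awk '{ print int($1 / 1024) }'
}

# Usage: fits_upload_limit FOLDER [LIMIT_MB]  (exit status 0 means it fits)
fits_upload_limit() {
  local limit_mb="${2:-500}"
  [ "$(dir_size_mb "$1")" -le "$limit_mb" ]
}
```

For example, `fits_upload_limit ./pilot_images || echo "too big for the hosted site"`.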

To launch CVAT locally, first clone the CVAT repository in your terminal window:

git clone https://github.com/opencv/cvat.git
cd cvat

Then, if you don't have Docker, install it (the steps below also use docker-compose). Check that Docker is installed successfully:

docker version
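Because the setup also calls git and docker-compose, checking that every required command is on your PATH up front can save a failed build. A minimal sketch built on the POSIX `command -v` builtin; `require_cmds` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every listed command missing from PATH;
# returns non-zero if any is missing.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example for this post's setup (assumes the standalone docker-compose binary):
# require_cmds git docker docker-compose
```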

Now build CVAT locally and launch it:

docker-compose build
docker-compose up -d

This will take a while to run, since it builds CVAT's dependencies on your local machine.

Next, create a superuser account for your local CVAT service by executing a command inside the running container:

docker exec -it cvat bash -ic 'python3 ~/manage.py createsuperuser'

Now open your browser and go to:

http://localhost:8080/

This opens your local CVAT instance!
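Right after `docker-compose up -d`, the containers may still be starting, so the page can refuse connections for a little while. If you script your setup, a small polling loop like the hypothetical `wait_for_url` below can wait until the server answers before you open the browser. It assumes `curl` is installed and treats any HTTP response, even an error page, as "up".

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a URL once per second until it returns any HTTP
# response, or give up after TIMEOUT seconds (default 120).
# Usage: wait_for_url URL [TIMEOUT]
wait_for_url() {
  local url="$1" timeout="${2:-120}" waited=0
  # curl -s exits 0 on any HTTP response and non-zero if the connection fails.
  until curl -s -o /dev/null --max-time 5 "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "gave up on $url after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# wait_for_url http://localhost:8080/ 300 && echo "CVAT is up"
```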

You can come back later and restart the service. If you have trouble logging in to CVAT, rebuild without the cache:

docker-compose build --no-cache 
docker-compose up -d

CVAT Labeling Tips, Tricks, and Best Practices

When you're working in CVAT, annotate carefully with your downstream model in mind. Follow these labeling best practices as you work through your dataset:

1) Label entirely around the object

2) For occluded objects - label the entire object, as if it were fully visible

3) Generally, label objects that are partially out of frame

4) Beware of labeling many boxes that overlap or are entirely contained within each other. This can really confuse your model.
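To spot the overlapping or nested boxes mentioned in tip 4, you can compare pairs of boxes numerically. The sketch below is a hypothetical helper, not a CVAT feature: it takes two boxes as pixel corners and prints their intersection over union (IoU) and the fraction of the smaller box that the intersection covers. A pair with high IoU, or with coverage near 1.00 (one box inside the other), is worth a second look; what counts as "high" is your call.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare two boxes given as "xmin ymin xmax ymax" pixels.
# Prints "IoU cover", where IoU = intersection / union and
# cover = intersection / area of the smaller box (1.00 => fully contained).
box_overlap() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, A, " "); split(b, B, " ")
    for (i = 1; i <= 4; i++) { A[i] += 0; B[i] += 0 }   # force numeric values
    # Width and height of the intersection rectangle (negative => no overlap).
    ix = (A[3] < B[3] ? A[3] : B[3]) - (A[1] > B[1] ? A[1] : B[1])
    iy = (A[4] < B[4] ? A[4] : B[4]) - (A[2] > B[2] ? A[2] : B[2])
    inter = (ix > 0 && iy > 0) ? ix * iy : 0
    areaA = (A[3] - A[1]) * (A[4] - A[2])
    areaB = (B[3] - B[1]) * (B[4] - B[2])
    uni = areaA + areaB - inter
    small = (areaA < areaB) ? areaA : areaB
    iou = (uni > 0) ? inter / uni : 0
    cover = (small > 0) ? inter / small : 0
    printf "%.2f %.2f\n", iou, cover
  }'
}
```

For example, `box_overlap "0 0 10 10" "2 2 4 4"` prints `0.04 1.00`: the IoU is low, yet the small box sits entirely inside the large one, which is why the second number is useful.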


CVAT shortcuts:

  • Start your labels list with the most represented class - it will be the default when you draw a box
  • Label all objects of one class at a time - you can focus on them and change all of their labels at once
  • Press "N" to draw a new box
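To find out which class is most represented, for example after exporting a first batch in YOLO format, you can tally boxes per class id. This sketch assumes YOLO label files, where the first field of each line is the class id (an index into your label list); `class_counts` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count boxes per class id across YOLO label files.
# Prints "class_id count" lines, most frequent class first
# (ties broken by class id, ascending).
# Usage: class_counts LABEL_FILE...
class_counts() {
  awk 'NF == 5 { n[$1]++ } END { for (c in n) print c, n[c] }' "$@" |
    sort -k2,2nr -k1,1n
}
```

For example, `class_counts labels/*.txt`, assuming your exported label files live in a `labels` folder.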

Next Steps after Labeling Your Computer Vision Dataset in CVAT

Once your dataset is labeled in CVAT, it is time to move to the creation of your computer vision model!

Roboflow makes it easy to load in your data: just drag and drop your images and your annotation file from CVAT. You can generate even more data with augmentations such as flipping, random cropping, and synthetic data generation. If you are interested in using data augmentation to increase the number of your training images (and spend less time in CVAT), this is a good guide on using data augmentation in computer vision.
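Flipping shows why augmentations must update the labels as well as the pixels. When an image is mirrored left to right, each box's normalized x center becomes 1 - x_center, while its y center, width, and height stay the same. The sketch below applies that rule to a YOLO label file; `flip_yolo_labels_h` is a hypothetical name, and it rewrites only the labels, so the image itself still has to be flipped with an image tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite YOLO labels ("class xc yc w h", normalized)
# for a horizontally flipped image. Only x_center changes: xc -> 1 - xc.
# Usage: flip_yolo_labels_h LABEL_FILE > FLIPPED_LABEL_FILE
flip_yolo_labels_h() {
  awk 'NF == 5 { printf "%s %.6f %s %s %s\n", $1, 1 - $2, $3, $4, $5 }' "$1"
}
```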

When you are ready, export your data from Roboflow to any format and start training your computer vision model. Our posts on How to Train YOLOv4 and How to Train EfficientDet are good starting points for training; afterward, use model evaluation to gauge how much more data you may need to collect and annotate.

Output of model inference finding bounding boxes for docks and lifts in aerial photo taken from a drone.
Inference after training - our model is doing a pretty good job with only 74 aerial drone images!