The post below is a lightly edited guest post from Result! Data, a Netherlands-based consultancy providing leading digital services.
The Roboflow team thanks Gerard Mol (Managing Partner) and Brand Hop (Chief Data Science) for their contributions. View their original post on detecting road signs with computer vision.
Result! Data has developed Spobber, an app for checking assets in the field. Based on your location, it shows the data collected about objects in that area, so you can check the information stored in the database. You can also add new information to the database by taking pictures or by entering text and numbers in specified fields. The latest version also offers mobile object detection, which automatically checks whether the situation in the field still matches what is stored in the database. Objects may be missing or polluted, or they may still be present in the data while no longer existing in reality. The app can detect this and send warnings automatically.
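The consistency check described above can be thought of as a matching step between the database records near the user and the objects detected in the camera image. The sketch below is only an illustration of that idea, not how Spobber is implemented: the `Asset` record, the `check_assets` function, the 15 m `match_radius_m` default and the equirectangular distance approximation are all our assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Asset:
    """Hypothetical record: an identifier and a WGS84 position.
    Detections reuse the same type for simplicity."""
    asset_id: str
    lat: float
    lon: float


def distance_m(lat1, lon1, lat2, lon2):
    """Approximate ground distance in metres (equirectangular approximation,
    adequate for the short distances involved here)."""
    r = 6_371_000.0  # mean Earth radius in metres
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return r * math.hypot(x, y)


def check_assets(records, detections, match_radius_m=15.0):
    """Return (records with no detection nearby, detections with no record nearby).

    Both lists are candidates for an automatic warning: the first may be
    objects that no longer exist, the second objects missing from the database.
    """
    def near(a, b):
        return distance_m(a.lat, a.lon, b.lat, b.lon) <= match_radius_m

    unmatched_records = [r for r in records if not any(near(r, d) for d in detections)]
    unmatched_detections = [d for d in detections if not any(near(r, d) for r in records)]
    return unmatched_records, unmatched_detections
```

A real system would also compare object type and condition, not only position.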
We have built an application that detects the hectometre signs placed along Dutch highways and national roads. A hectometre sign is placed every 100 metres along a highway and every 500 metres along a national road. The text on the sign indicates its position, measured as the distance from the beginning of the road, as well as the name of the road and the side of the road you are on (left or right). Hectometre signs are often used when reporting incidents and accidents, so that rescue workers, maintenance contractors or someone coming to fix your car can easily find the location of the incident.
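The information on a hectometre sign maps naturally onto a small record. The sketch below is one possible representation, assuming hypothetical field names and an `"L"`/`"R"` encoding for the side; only the spacing values (100 m on highways, 500 m on national roads) come from the text.

```python
from dataclasses import dataclass

# Sign spacing in metres, as described in the text.
SIGN_SPACING_M = {"highway": 100, "national_road": 500}


@dataclass(frozen=True)
class HectometreSign:
    road: str           # road name, e.g. "A2" (illustrative)
    position_km: float  # distance from the beginning of the road
    side: str           # "L" or "R" (hypothetical encoding)


def distance_between_m(a: HectometreSign, b: HectometreSign) -> float:
    """Distance along the road between two signs on the same road, in metres."""
    if a.road != b.road:
        raise ValueError("signs are on different roads")
    return abs(b.position_km - a.position_km) * 1000


def intervals_between(a: HectometreSign, b: HectometreSign, road_type: str) -> int:
    """Number of sign spacings between two signs for the given road type."""
    return round(distance_between_m(a, b) / SIGN_SPACING_M[road_type])
```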
Hectometre signs are very distinctive and easy to recognize. The downside is that the signs are quite small. Our goal is to recognize a hectometre sign from a car driving at 100 kilometres per hour.
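A quick calculation shows what this goal implies. At 100 km/h a car covers about 27.8 metres per second, so on a highway a new sign passes roughly every 3.6 seconds. The camera frame rate (30 fps) and the distance over which a sign is large enough to detect (20 m) below are assumptions chosen for illustration only.

```python
SPEED_KMH = 100           # target driving speed from the text
HIGHWAY_SPACING_M = 100   # highway sign spacing from the text
FPS = 30                  # assumed camera frame rate
VISIBLE_WINDOW_M = 20     # assumed distance over which a sign is detectable

speed_ms = SPEED_KMH * 1000 / 3600                 # metres per second
seconds_between_signs = HIGHWAY_SPACING_M / speed_ms
frames_with_sign = VISIBLE_WINDOW_M / speed_ms * FPS
max_latency_ms = 1000 / FPS                        # per-frame budget to keep up

print(f"{speed_ms:.1f} m/s, a sign every {seconds_between_signs:.1f} s")
print(f"about {frames_with_sign:.0f} frames per sign, {max_latency_ms:.1f} ms per frame")
```

Under these assumptions a sign is in view for only a couple of dozen frames, which is why detection speed matters as much as accuracy.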
In this blog post we describe the process of object detection using four existing state-of-the-art models: YOLOv3, EfficientDet, YOLOv4 and YOLOv5.
We started with data collection and data preparation. Part of data preparation is data augmentation, which helps build a larger and more diverse dataset. After augmentation, the data is prepared for use as input to the models. The next step is training and testing the models. When a model performs well, it can be applied to a new dataset. In the last step, the models are applied to webcam or video input.
To collect images containing hectometre signs, we made a single trip along the highway. A smartphone (iPhone X) was mounted behind the windscreen of the car, with the camera pointing towards the roadside and zoomed in slightly. During this run 400 images were taken; not all of them contained a hectometre sign, and those were removed.
We built an annotated dataset by drawing bounding boxes around the hectometre signs with LabelImg, a free labeling tool that can be downloaded from GitHub. Most images contained only one hectometre sign, and the training set used only images with one hectometre sign each. Some images did contain multiple signs, but for the sake of simplicity we did not use them.
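By default LabelImg saves one Pascal VOC XML file per image. A minimal sketch for reading those files and keeping only the images with exactly one annotated sign could look like this; the directory layout and the class name `hectometre` are assumptions.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def read_voc_boxes(xml_text):
    """Parse one Pascal VOC annotation into (filename, [(label, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        coords = (int(float(bb.findtext(k))) for k in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((obj.findtext("name"), *coords))
    return filename, boxes


def single_sign_images(annotation_dir):
    """Return the filenames of images annotated with exactly one sign.

    Images with zero boxes (no sign) or several boxes are skipped.
    """
    keep = []
    for path in sorted(Path(annotation_dir).glob("*.xml")):
        filename, boxes = read_voc_boxes(path.read_text())
        if len(boxes) == 1:
            keep.append(filename)
    return keep
```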
Images usually differ in size and orientation. To handle this, all images were resized to a small square size, which also reduces training time.
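Resizing an image to a square changes its aspect ratio, so the bounding boxes have to be scaled along with it. The sketch below assumes a plain stretch (no padding) and a target size of 416 pixels, a common YOLO input size; the post does not state which size was actually used.

```python
def resize_box(box, src_w, src_h, target=416):
    """Scale an (xmin, ymin, xmax, ymax) box from a src_w x src_h image to a
    target x target square, assuming the image is stretched rather than padded."""
    sx = target / src_w
    sy = target / src_h
    xmin, ymin, xmax, ymax = box
    return (round(xmin * sx), round(ymin * sy), round(xmax * sx), round(ymax * sy))


# Example: a box in a 4032 x 3024 photo, mapped to a 416 x 416 image.
print(resize_box((2016, 1512, 2116, 1612), 4032, 3024))  # (208, 208, 218, 222)
```

Stretching distorts the sign's shape slightly; letterboxing (padding to a square) preserves the aspect ratio at the cost of unused pixels.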
In this single run the conditions were quite stable. When training a model, however, it is useful to apply different operations to the images to simulate other circumstances. We applied cropping, rotation, and adjustments to brightness, blur and noise. In total, 3 augmented versions were created for every image.
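The pixel-level augmentations mentioned above can be sketched on a grayscale image stored as a list of rows of 0 to 255 values. This is a simplified illustration: the brightness range (0.7 to 1.3) and noise level are assumptions, and cropping and rotation are left out because they also require transforming the bounding boxes.

```python
import random


def clip(v):
    """Round and clamp a pixel value to the 0..255 range."""
    return max(0, min(255, int(round(v))))


def adjust_brightness(img, factor):
    """Multiply every pixel by `factor` (below 1 darkens, above 1 brightens)."""
    return [[clip(p * factor) for p in row] for row in img]


def add_noise(img, rng, sigma=10.0):
    """Add Gaussian pixel noise with standard deviation `sigma`."""
    return [[clip(p + rng.gauss(0, sigma)) for p in row] for row in img]


def box_blur(img):
    """3x3 mean filter; border pixels average over their in-image neighbours."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(clip(sum(vals) / len(vals)))
        out.append(row)
    return out


def augment(img, n=3, seed=0):
    """Create `n` augmented copies: a random brightness change followed by
    either a blur or added noise."""
    rng = random.Random(seed)
    copies = []
    for _ in range(n):
        out = adjust_brightness(img, rng.uniform(0.7, 1.3))
        out = box_blur(out) if rng.random() < 0.5 else add_noise(out, rng)
        copies.append(out)
    return copies
```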
Data augmentation in computer vision is key to getting the most out of your dataset, and state-of-the-art research continues to confirm this.
Image augmentation creates new training examples out of existing training data. It is impossible to capture an image of every real-world scenario the model may encounter. Adjusting existing training data to resemble other situations therefore allows the model to learn from a wider range of situations.
When training a model on a dataset, it is possible to fit the training data perfectly simply by increasing the number of parameters. The model then performs very well on the training data but much worse on new data: it has overfitted to the training set. To prevent overfitting, we use a training set and a validation set during training, plus a test set containing images that were kept out of the training stage entirely. The result on the test set is a good indicator of the strength of the model.
We therefore split our data into a training, a validation and a test set. Our training set consists of 765 images.
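A split like this can be sketched in a few lines. The 70/20/10 ratio and the fixed seed below are assumptions; the post gives only the size of the training set.

```python
import random


def split_dataset(items, train=0.7, valid=0.2, seed=42):
    """Shuffle and split into (train, valid, test); the remainder goes to test."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed makes the split reproducible
    n_train = int(len(items) * train)
    n_valid = int(len(items) * valid)
    return (items[:n_train],
            items[n_train:n_train + n_valid],
            items[n_train + n_valid:])
```

Splitting before augmenting, and augmenting only the training part, keeps augmented copies of a test image from leaking into training and inflating the test score.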
Object detection models are improving quickly, in both accuracy and speed. Until recently, YOLOv3 (released April 8, 2018) was a popular choice. YOLO (You Only Look Once) applies a single neural network to the full image. The network divides the image into regions and predicts bounding boxes and probabilities for each region; the bounding boxes are weighted by the predicted probabilities. Compared to earlier models YOLO is fast, faster than Faster R-CNN. YOLOv3, however, has difficulty detecting small objects. Faster R-CNN does detect small objects, but it cannot run in real time, and real-time detection is what we need.
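Because a YOLO-style network predicts many overlapping boxes, its raw output is usually post-processed: boxes below a confidence threshold are dropped, and non-maximum suppression (NMS) keeps only the most confident box among heavily overlapping ones. The sketch below shows plain greedy, class-agnostic NMS (enough for a single sign class); the thresholds are illustrative values in the range often used by YOLO implementations, not the settings used in this project.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def postprocess(detections, conf_threshold=0.25, iou_threshold=0.45):
    """Filter (box, confidence) pairs by confidence, then apply greedy NMS."""
    candidates = sorted((d for d in detections if d[1] >= conf_threshold),
                        key=lambda d: d[1], reverse=True)
    kept = []
    for box, conf in candidates:
        # Keep a box only if it does not overlap too much with a stronger one.
        if all(iou(box, k[0]) < iou_threshold for k in kept):
            kept.append((box, conf))
    return kept
```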
This year the Google Brain team presented EfficientDet (released April 3rd, 2020). EfficientDet achieves good accuracy relative to its model size, and the models are relatively small. On the COCO dataset it outperforms other ConvNet-based architectures.
Shortly after that a new release of YOLO became available, YOLOv4 (released April 23rd, 2020), which has been shown to be the new object detection champion by standard metrics on COCO.
More recently still (June 9, 2020), YOLOv5 was released. Compared to YOLOv4, YOLOv5 is faster and more lightweight. We trained our model with YOLOv5, and training took 20 minutes instead of 2 hours.
We have tested the four models on our hectometre signs dataset.
Training the models requires significant computing power, and GPUs usually make training much faster. To make use of GPUs, we trained the models in Google Colab.
The real strength of a model shows in real-life situations: how well and how fast does it recognize hectometre signs in an image? We applied our models to a video recorded while driving along the highway. The result of this is shown below for the YOLOv4 model:
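To put a number on "how fast", one can time the detector frame by frame on recorded video. The harness below accepts any detection function (for example a wrapper around a trained YOLOv4 or YOLOv5 model, which is not shown); reading frames, e.g. with OpenCV's `cv2.VideoCapture`, is left out so the timing covers detection only. The `measure_fps` name and the warm-up count are assumptions.

```python
import time


def measure_fps(detect, frames, warmup=5):
    """Run `detect` on each frame; return (frames per second, mean latency in ms).

    The first `warmup` frames are processed but not timed, because the first
    calls to a model are often slower (memory allocation, kernel setup).
    """
    frames = list(frames)
    for frame in frames[:warmup]:
        detect(frame)
    timed = frames[warmup:]
    if not timed:
        raise ValueError("need more frames than warmup")
    start = time.perf_counter()
    for frame in timed:
        detect(frame)
    elapsed = time.perf_counter() - start
    return len(timed) / elapsed, 1000 * elapsed / len(timed)
```

For live use, the mean latency should stay below the camera's frame interval (about 33 ms for a 30 fps camera).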
The world of object detection moves fast. Until recently YOLOv3 was state-of-the-art; then EfficientDet improved on it for a short period, but it was quickly overtaken by YOLOv4 and shortly after that by YOLOv5. For us, YOLOv5 is the winner at the moment. Still, it is useful to have several models available, depending on the circumstances and the application.
For custom objects it is relatively easy to build a new object detection model. A good training set is key to building a good model. It is important to build a diverse training set, with different cameras, weather conditions and angles, so that the model later generalizes to a wide range of images.
There is a wide range of models available for object detection. The detection results depend on the input and on how the model is used in practice. Hectometre signs are small relative to other objects in an image, but the newer models handle them relatively easily.
We will be following the developments in object detection very closely.
We will make object detection available in a mobile environment, so that objects can be detected live. Using object detection in the field makes it possible to act immediately on deviations or to update the status of objects.