Introducing an Improved Hard Hat Dataset for Computer Vision in Workplace Safety

In a given year, approximately 65,000 workers wearing hard hats incur head injuries in the workplace, of whom over one thousand ultimately die. Workplace safety regulations exist to protect and promote wellbeing in the workplace, yet ensuring awareness and compliance can be a challenge. Even in cases of accidental negligence, an employee may simply have been unaware that a hard hat was required in a specific zone.

More broadly, computer vision has found a natural fit in capital-intensive industries, where monitoring equipment and conducting regular inspections can be quite costly. Extending these capabilities to look after individuals as well as equipment is a reasonable next step.

Introducing the Hard Hat Dataset

Originally collected and open sourced by researchers at Northeastern University in China (Xie, Liangbin, 2019), the hard hat dataset contains 7,035 images of workers in a wide array of situations that require protective hard hats for day-to-day work. The dataset includes 27,039 annotations (approximately 3.8 per image) across three classes: helmet (19,747), head (6,677), and person (615). The researchers released the dataset with 5,269 of the images and their labels in the training set and the remaining 1,766 in a validation set.
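
To make the class imbalance concrete, here is a minimal sketch that derives the totals and per-class shares from the counts quoted above. The variable names are illustrative and not part of any dataset tooling; only the numbers come from the dataset description.

```python
# Annotation counts per class, as quoted in the dataset description above.
class_counts = {"helmet": 19747, "head": 6677, "person": 615}
num_images = 7035

# Total annotations and the average number of annotations per image.
total_annotations = sum(class_counts.values())
annotations_per_image = total_annotations / num_images

# Share of all annotations that belongs to each class.
class_shares = {name: count / total_annotations for name, count in class_counts.items()}

# Ratio between the most and least frequent classes.
imbalance_ratio = max(class_counts.values()) / min(class_counts.values())

print(f"total annotations: {total_annotations}")
print(f"annotations per image: {annotations_per_image:.2f}")
for name, share in class_shares.items():
    print(f"{name}: {share:.1%}")
print(f"most/least frequent ratio: {imbalance_ratio:.1f}")
```

By this arithmetic, helmet accounts for roughly 73% of all annotations and person for a little over 2%, which is the imbalance to keep in mind when reading per-class metrics.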

Notably, the Roboflow team identified potential errors and corrected them before re-release. In image 000495.jpg, there is a single occurrence of an annotation called “others”, and three individuals (who appear to be leaning over) are labeled as such. We removed this annotation (but not the image, as other labels were present) in our release of the dataset, as it appears to be a mistake that results in a dramatically underrepresented class and decreases the training quality of the overall data.

Roboflow found a severely underrepresented class. We removed this class, but not this image.
This is the image where suspect annotations were found.
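
As an illustration of this kind of cleanup, the sketch below drops every object with a given class name from a single annotation file while leaving the other labels untouched. It assumes the annotations are stored as PASCAL VOC XML, which this post does not confirm for the original release; the sample XML, its coordinates, and the `drop_class` helper are hypothetical and are not the actual contents of 000495.jpg or the procedure Roboflow used.

```python
import xml.etree.ElementTree as ET

# Hypothetical VOC-style annotation with made-up boxes (assumption, not real data).
SAMPLE_XML = """<annotation>
  <filename>example.jpg</filename>
  <object>
    <name>helmet</name>
    <bndbox><xmin>10</xmin><ymin>12</ymin><xmax>48</xmax><ymax>50</ymax></bndbox>
  </object>
  <object>
    <name>others</name>
    <bndbox><xmin>60</xmin><ymin>20</ymin><xmax>90</xmax><ymax>55</ymax></bndbox>
  </object>
</annotation>"""


def drop_class(xml_text, class_name):
    """Return (cleaned XML string, number of removed objects)."""
    root = ET.fromstring(xml_text)
    removed = 0
    # findall returns a list, so removing children while looping is safe.
    for obj in root.findall("object"):
        if obj.findtext("name") == class_name:
            root.remove(obj)
            removed += 1
    return ET.tostring(root, encoding="unicode"), removed


cleaned_xml, removed_count = drop_class(SAMPLE_XML, "others")
print(f"removed {removed_count} object(s)")
print(cleaned_xml)
```

The image entry itself is kept, so the remaining helmet label is still usable for training.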

Beyond its practical applications, the dataset provides prime territory to assess and compare model performance. The images are of various shapes and sizes, there is significant class imbalance, and many of the annotations are particularly small (heads in broader scenery). These characteristics may make for a worthwhile test of various model strengths and weaknesses, e.g. how poorly YOLOv3 performs on small bounding boxes.
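
One way to quantify "particularly small" is to measure what share of boxes falls under a pixel-area cutoff. The sketch below uses 32 × 32 pixels, the cutoff the COCO evaluation uses for its "small" object bucket; applying it here is an assumption for illustration, and the example boxes are made up rather than drawn from the dataset.

```python
# Area cutoff for a "small" box, borrowed from the COCO evaluation convention.
SMALL_AREA_PX = 32 * 32


def box_area(box):
    """Pixel area of a box given as (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    return max(0, xmax - xmin) * max(0, ymax - ymin)


def small_box_share(boxes, max_area=SMALL_AREA_PX):
    """Fraction of boxes whose area is strictly below max_area."""
    if not boxes:
        return 0.0
    small = sum(1 for box in boxes if box_area(box) < max_area)
    return small / len(boxes)


# Hypothetical boxes: two small heads in the background, one close-up helmet.
example_boxes = [(100, 40, 118, 60), (300, 55, 325, 80), (20, 10, 220, 230)]
print(f"share of small boxes: {small_box_share(example_boxes):.2f}")
```

Reporting a detector's accuracy separately for boxes above and below such a cutoff makes it easier to see whether small heads are where a model struggles.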

We’re excited to make this research more open. Check out the full dataset on Roboflow Public here.

Roboflow accelerates your computer vision workflow through automated annotation quality assurance, universal annotation format conversion (like PASCAL VOC XML to COCO JSON and creating TFRecords), team sharing and versioning, and easy integration with popular open source computer vision models. Getting started with your first 1GB of data is completely free.
