How to Create a TFRecord File for Computer Vision and Object Detection

TensorFlow expedites the machine learning process markedly. From abstracting complex linear algebra to including pre-trained models and weights, getting the most out of TensorFlow is a full-time job.

However, when it comes to loading data in the way TensorFlow expects in order to perform as efficiently as it does, every developer will inevitably come across the troublesome TFRecord file format. As a serialized data format specific to the TensorFlow framework, TFRecords are both enigmatic and indispensable.

So, what are TFRecords, and how does one work with them effectively? How does one create a TFRecord from PASCAL VOC XML labels? Or what about COCO JSON to TFRecord for object detection? Does Keras require TFRecords too?

We’ll break it down in this post: the what, why, and how of TFRecords for computer vision.

Looking for How to Convert COCO JSON or VOC XML to a TFRecord?

Roboflow does it for you in three clicks. Jump to the bottom of this post to see.

What is a TFRecord?

The TensorFlow documentation describes TFRecords succinctly:

The TFRecord format is a simple format for storing a sequence of binary records.

But why create a new file format, especially one as seemingly complex as a TFRecord? TFRecords are opaque: as serialized files, one cannot easily open them in a text editor to inspect their contents. What’s the method to the madness?

Because they are serialized files, .tfrecord files aren’t recognized by common applications.
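To make the serialization concrete, here is a minimal sketch of writing and reading a single record using the TensorFlow 2.x tf.io and tf.data APIs (the object detection script later in this post uses the older TF 1.x names); the 'label' and 'name' keys and the demo.tfrecord filename are placeholders for illustration:

import tensorflow as tf

# Serialize one toy record to disk.
example = tf.train.Example(features=tf.train.Features(feature={
    'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[1])),
    'name': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'cat'])),
}))

with tf.io.TFRecordWriter('demo.tfrecord') as writer:
    writer.write(example.SerializeToString())

# Read it back: the file holds opaque byte strings until they are parsed.
for raw_record in tf.data.TFRecordDataset('demo.tfrecord'):
    parsed = tf.io.parse_single_example(raw_record, {
        'label': tf.io.FixedLenFeature([], tf.int64),
        'name': tf.io.FixedLenFeature([], tf.string),
    })
    print(parsed['label'].numpy(), parsed['name'].numpy())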

In introducing TFRecords, Google is playing a game of tradeoffs. While TFRecords are “incompatible” with other file readers, they come with a few core benefits:

  1. As a binary file format, a TFRecord file takes up less disk space, which means every read/write operation that needs to be performed is faster.
  2. TFRecord files are optimized to handle component parts of a larger dataset. So, for example, if a given dataset exceeds the size of a given machine’s memory, streaming a subset of the dataset is easily done. This is exactly what happens when training on a single batch of data: the machine is using a subset of the overall data.
  3. TFRecord files are also optimized for storing sequence data. This is particularly important for word or time-series sequences, which, much like the batches above, can easily be broken into component pieces and streamed.

All said, the tradeoffs of serialization do provide significant advantages in the form of faster training, though working with TFRecord files does require a small learning curve.
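To make the streaming benefit (point 2 above) concrete, here is a minimal tf.data sketch, again using the TensorFlow 2.x API, that pulls batches of serialized records off disk without ever loading the full file into memory; train.record is a placeholder filename:

import tensorflow as tf

# Stream serialized records off disk in batches rather than loading everything at once.
dataset = (
    tf.data.TFRecordDataset(['train.record'])
    .shuffle(buffer_size=256)
    .batch(32)
    .prefetch(tf.data.experimental.AUTOTUNE)
)

for batch in dataset.take(1):
    # Each element is a batch of up to 32 still-serialized tf.train.Example byte strings.
    print(batch.shape)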

How Do We Create a TFRecord File?

Intrinsic in creating a TFRecord file is knowing what is contained within one. If it’s a serialized representation of a dataset, how do we create that serialized representation?

While TFRecord files exist for any type of data -- tabular, text, time series -- this particular post focuses on using TFRecords in the context of computer vision, and especially classification and object detection problems.

A TFRecord file contains our training data. In the context of deep learning, that often includes having both an annotation and an image. Images are encoded to integer representations. Annotations are encoded to describe where in an image a given bounding box is, and an integer representation of that bounding box’s class. Critically, because we are converting our annotations into integer representations as well, we need a dictionary that maps our integers back to our string value object names. This is what is called a label_map.pbtxt.
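For example, a label_map.pbtxt for a hypothetical three-class dataset (the names mirror the placeholder class names in the script below) looks like this:

item {
  id: 1
  name: 'class_one_name'
}
item {
  id: 2
  name: 'class_two_name'
}
item {
  id: 3
  name: 'class_three_name'
}

Note that IDs start at 1; the TensorFlow Object Detection API reserves 0 as a background/placeholder index.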

Creating TFRecord Files with Code

The TensorFlow documentation walks through a few image encoding examples. In all their examples, however, the images are not being converted from one format to another.

Most often we have labeled data in PASCAL VOC XML or COCO JSON. Creating a TFRecord file from this data requires following a multistep process: (1) creating a TensorFlow Object Detection CSV and (2) using that TensorFlow Object Detection CSV to create TFRecord files.
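For reference, the TensorFlow Object Detection CSV is simply one row per bounding box, with the columns the conversion script below reads (filename, width, height, class, xmin, ymin, xmax, ymax); the filenames, class names, and pixel values here are made up for illustration:

filename,width,height,class,xmin,ymin,xmax,ymax
image_001.jpg,640,480,class_one_name,12,30,128,270
image_001.jpg,640,480,class_two_name,200,85,340,310
image_002.jpg,640,480,class_one_name,55,10,400,450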

Dat Tran has published a great code example on converting VOC XML to a TensorFlow Object Detection CSV and into TFRecord files.

However, his generate_tfrecord.py file presumes the user is only working with a single class. A slight modification is required: the section labeled # TO-DO requires that the user match their label map against the class names, replacing “class_one_name” (and the other placeholders) with their own class name strings in the script below.

"""
Usage:
  # From tensorflow/models/
  # Create train data:
  python generate_tfrecord.py --csv_input=data/train_labels.csv  --output_path=train.record
  # Create test data:
  python generate_tfrecord.py --csv_input=data/test_labels.csv  --output_path=test.record
"""
from __future__ import division
from __future__ import print_function
from __future__ import absolute_import

import os
import io
import sys
import pandas as pd
import tensorflow as tf

from PIL import Image
from object_detection.utils import dataset_util
from collections import namedtuple, OrderedDict

flags = tf.app.flags
flags.DEFINE_string('csv_input', '', 'Path to the CSV input')
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
flags.DEFINE_string('image_dir', '', 'Path to images')
FLAGS = flags.FLAGS


# TO-DO replace this with label map
def class_text_to_int(row_label):
    if row_label == 'class_one_name':
        return 1
    if row_label == 'class_two_name':
        return 2
    if row_label == 'class_three_name':
        return 3
    else:
        return None

def split(df, group):
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]


def create_tf_example(group, path):
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)

    width, height = image.size

    filename = group.filename.encode('utf8')
    print(filename)
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []

    for index, row in group.object.iterrows():
        xmins.append(row['xmin'] / width)
        xmaxs.append(row['xmax'] / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append(row['ymax'] / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example


def main(_):
    writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
    path = os.path.join(FLAGS.image_dir)
    examples = pd.read_csv(FLAGS.csv_input)
    grouped = split(examples, 'filename')

    # added
    file_errors = 0

    for group in grouped:
        try:
            tf_example = create_tf_example(group, path)
            writer.write(tf_example.SerializeToString())
        except Exception:
            # added
            file_errors += 1

    writer.close()

    # added
    print("FINISHED. There were %d errors" %file_errors)

    output_path = os.path.join(os.getcwd(), FLAGS.output_path)
    print('Successfully created the TFRecords: {}'.format(output_path))


if __name__ == '__main__':
    tf.app.run()

Similarly, converting COCO JSON to TFRecord files has an open source code solution, one even officially maintained by TensorFlow, available here.

Generating TFRecord Files With Three Clicks

Both of these methods are fairly tedious: they have developers writing the same boilerplate code every other developer writes when handling computer vision data, rather than focusing on the code unique to their application.

With Roboflow, you can generate TFRecord files from COCO JSON, VOC XML, LabelBox, and more with a few simple clicks.

Here’s how:

First, create a free Roboflow account. Free accounts are limited to smaller dataset sizes, and a credit card is not required to get started: https://app.roboflow.ai

A brand new Roboflow account.

Second, create a dataset by clicking in the upper right hand corner. Name your dataset whatever is apt, and describe the annotation class. (For example, a self-driving car dataset might say "obstacles" for annotation class.)

Third, drop your images and their annotations directly onto the blank upload space. Your annotations can be VOC XML, COCO JSON, LabelBox, or even TensorFlow CSV. (And if we're not identifying your data, open the live chat for support!)

Drop your images and annotations here.

Next, click "Start Uploading" in the upper right. You're given the option to create training, valid, and testing splits. These are option, and you can select what is most appropriate for your problem. If you do create splits, we will create one set of TFRecord files for each split (train, valid, and test).

Now, make your way to Modify Dataset, the "mission control" page for your dataset. You may apply one-click preprocessing and/or augmentation steps here. For our purposes, we can turn these options all off.

Then, click "Generate" in the upper right hand corner. Either give your generated dataset a name (perhaps "raw") or default to the timestamp.

Note: all options are off.

Once the dataset is generated, a version is created on the left-hand side. We are automatically directed to download this version of the dataset and select the format in which we want it downloaded.

From the dropdown menu, note that you can create any data format you may need: TensorFlow Object Detection CSV, COCO JSON, VOC XML, YOLOv3 Keras, YOLOv3 Darknet (text files), and, of course, TFRecord.

Export as TFRecord.

Select TFRecord. You may download the data locally to your computer as a zip, or show the download code. Showing the download code enables you to easily drop the data into a Jupyter or Colab notebook as you need.

And that's it! If you downloaded your files locally, note that Roboflow includes your TFRecord file, label_map.pbtxt file, and a README about the version of the dataset you downloaded.
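If you want to sanity-check the downloaded files inside a notebook, a short sketch like the following will print the filename and box count of the first record; it assumes the standard TensorFlow Object Detection feature keys used in the script above and a file named train.record:

import tensorflow as tf

features = {
    'image/filename': tf.io.FixedLenFeature([], tf.string),
    'image/object/bbox/xmin': tf.io.VarLenFeature(tf.float32),
    'image/object/class/text': tf.io.VarLenFeature(tf.string),
}

for raw_record in tf.data.TFRecordDataset('train.record').take(1):
    parsed = tf.io.parse_single_example(raw_record, features)
    xmins = tf.sparse.to_dense(parsed['image/object/bbox/xmin']).numpy()
    classes = tf.sparse.to_dense(parsed['image/object/class/text'], default_value=b'').numpy()
    print(parsed['image/filename'].numpy(), len(xmins), 'boxes:', classes)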

It's that easy!


Want to be the first to know about new computer vision tutorials and content like our synthetic dataset creation guide? Subscribe to our updates 📬.

Roboflow accelerates your computer vision workflow through automated annotation quality assurance, universal annotation format conversion (like PASCAL VOC XML to COCO JSON and creating TFRecords), team sharing and versioning, and easy integration with popular open source computer vision models. Getting started with your first 1GB of data is completely free.
