This post is the inaugural post on a series of images on preprocessing and augmentation.
“Garbage in, garbage out.”
This old machine learning adage conveys a salient machine learning point: unless input data is of high quality, model accuracy — even with the best computer vision architectures — will suffer.
But what’s forgotten is how much control data scientists, developers, and computer vision engineers have over input data, even if not the principal agent collecting the data. What’s more: steps taken in the image input pipeline can turn what was once high quality data into lower signal-producing inputs.
This is not to say data quality is not an a priori concern. Striving to collect high quality data for the task at hand is always important. But there are instances where deep learning engineers may blindly apply preprocessing and augmentation steps that reduce model performance on the same data. And, even when having high quality data, preprocessing allows the best possible results to be obtained.
What Is Preprocessing? Augmentation?
Image augmentation are manipulations applied to images to create different versions of similar content in order to expose the model to a wider array of training examples. For example, randomly altering rotation, brightness, or scale of an input image requires that a model consider what an image subject looks like in a variety of situations.
Image augmentation manipulations are forms of image preprocessing, but there is a critical difference: while image preprocessing steps are applied to training and test sets, image augmentation is only applied to the training data. Thus, a transformation that could be an augmentation in some situations may best be a preprocessing step in others.
Consider altering image contrast. A given dataset could contain images that are generally low contrast. If the model will be used in production on only low contrast in all situations, requiring that every image undergo a constant amount of contrast adjustment may improve model performance. This preprocessing step would be applied to images in training and in testing. However, if the collected training data is not representative of the levels of contrast the model may see in production, there is less certainty that a constant contrast adjustment is appropriate. Instead, randomly altering image contrast during training may generalize better. This would be augmentation.
Knowing the context for data collecting and model inference is required to make informed preprocessing and augmentation decisions.
Why Preprocess and Augment Data?
Preprocessing is required to clean image data for model input. For example, fully connected layers in convolutional neural networks required that all images are the same sized arrays.
Image preprocessing may also decrease model training time and increase model inference speed. If input images are particularly large, reducing the size of these images will dramatically improve model training time without significantly reducing model performance. For example, the standard size of images on iPhone 11 are 3024 × 4032. The machine learning model Apple uses to create masks and apply Portrait Mode performs on images half this size before its output is rescaled back to full size.
Image augmentation creates new training examples out of existing training data. It’s impossible to truly capture an image that accounts for every real world scenario a model may encompass. Adjusting existing training data to generalize to other situations allows the model to learn from a wider array of situations.
This is particularly important when collected datasets may be small. A deep learning model will (over)fit to the examples shown in training, so creating variation in the input images enables generation of new, useful training examples.
What Preprocessing and Augmentation Steps Should Be Used?
Identifying the correct preprocessing and augmentation steps most useful for increasing model performance requires a firm understanding of the problem, data collected, and production environment. What may work well in one situation is not appropriate in all others.
Thus, considering techniques and why each may be valuable enables informed decisions. In this post, we’ll surface considerations and provide recommendations that are generally best. Again, there is no free lunch, so even “generally best” tips can be disproven.
Changing the size of an image sounds trivial, but there are considerations to take into account.
Many model architectures call for square input images, but few devices capture perfectly square images. Altering an image to be a square calls for either stretching its dimensions to fit to be a square or keeping its aspect ratio constant and filling in newly created “dead space” with new pixels. Moreover, input images may be various sizes, and some may be smaller than the desired input size.
Best tips: preserving scale is not always required, filling in dead pixels with reflected image content is often best, and downsampling large images to smaller images is often safest.
When an image is captured, it contains metadata that tells our machines the orientation by which to display that input image relative to how it is stored on disk. That metadata is called its EXIF orientation, and inconsistent handling of EXIF data has long been a bane of developers everywhere.
This applies to models, too: if we’ve created annotated bound boxes on how we perceived an image to be oriented but our model is “seeing” that image in a different orientation, we’re training the model completely wrong!
Best tips: strip EXIF orientation from images and ensure pixels are all ordered the same way.
Color changes are an example of image transformations that may be applied to all images (train and test) or randomly altered in training only as augmentations. Generally, grayscaling is a color change applied to all images. While we may think “more signal is always better; we should show the model color,” we may see more timely model performance when images are grayscaled. Color images are stored as red, green, and blue values, whereas grayscale images are only stored as a range of black to white. This means for CNNs, our model only needs to work with one matrix per image, not three.
Best tips: grayscale is fairly intuitive. If the problem at hand explicitly requires color (like delineating a white line from yellow line on roads), it’s not appropriate. If we’re, say, deciphering the face of a rolled set of dice, grayscale may be a great option.
Randomly mirroring an image about its x- or y-axis forces our model to recognize that an object need not always be read from left to right or up to down. Flipping may be illogical for order-dependent contexts, like interpreting text.
Best tips: for most real world objects, flipping is a strong way to improve performance.
Rotating an image is particularly important when a model may be used in non-fixed position, like a mobile app. Rotating can be tricky as it, too, generates “dead pixels” on the edges of our images and, for bounding boxes, requires trigonometry to update any bounding boxes.
Best tips: if an object may be a variety of different orientations relative to the captured images, rotation is a good option. This would not be true for, say, screenshots, where the image content is always in a fixed position.
Adjusting image brightness to be randomly brighter and darker is most applicable if a model may be required to perform in a variety of lighting settings. It’s important to consider the maximum and minimum of brightness in the room.
Best tips: fortunately, brightness is fairly intuitive as well. Adjust brightness to match conditions the model will see in production relative to the images available for training.
Adding noise to images can take a variety of forms. A common technique is “salt and pepper noise,” wherein image pixels are randomly converted to be completely black or completely white. While deliberately adding noise to an image may reduce training performance, this can be the goal if a model is overfitting on the wrong elements.
Best tips: if a model is severely overfitting on image artifacts, salt and pepper noise can effectively reduce this.
How Do I Apply Preprocessing and Augmentations?
Roboflow also supports one-click preprocessing and augmentation options as well as handling all annotation corrections required to keep bounding boxes accurate. It’s free to get started and you can use it with your models whether they're written in Tensorflow, PyTorch, Keras, or some other tool.
This post serves as a high level conceptual introduction to preprocessing and augmentation. In future posts, we’ll introduce code to perform various transformations. Sign up to be the first to know about new content.
Roboflow accelerates your computer vision workflow through automated annotation
quality assurance, universal annotation format conversion (like
PASCAL VOC XML to COCO JSON and
), team sharing and versioning, and easy integration with popular
open source computer vision models.
Get started with your first 1000 images for free.