Resizing images is a critical preprocessing step in computer vision. Principally, our machine learning models train faster on smaller images. An input image that is twice as wide and twice as tall requires our network to learn from four times as many pixels, and that time adds up. Moreover, many deep learning model architectures require that our images are the same size, and our raw collected images may vary in size.
But not all resize strategies are created equal.
How small of an image is too small? If you have some images that are significantly larger than others, how should you handle the differences? If you need images to be square but have rectangles, should you stretch the images to a new shape? If you maintain the aspect ratio of the raw image when resizing, how do you fill the newly created padding?
In fairness, there is no objectively correct answer to each of these questions in every situation. But there are tips that help, depending on our context.
How Small is Too Small?
There is not a magic set of dimensions. The good news, however, is that starting smaller is generally easier than starting bigger. A good strategy to employ is progressive resizing. Our first set of models will generally be experimental, and we can save time by starting with smaller image inputs. Even better, we can use the learned weights from our small models to initialize training on our larger input models.
Progressive resizing is straightforward: train an initial model with very small input images and gauge performance. Use those weights as the starting point for the next model with larger input images.
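As a rough sketch of that loop, the Python below steps through a schedule of input sizes and hands each stage's weights to the next. The names `size_schedule` and `progressive_resize`, and the `train_stage` callback, are hypothetical placeholders for whatever training code you already have, and the sizes are only illustrative. It also assumes your architecture can reuse the same weights at different input sizes, which is true of many convolutional networks but not all.

```python
def size_schedule(start, final, stages):
    """Return `stages` evenly spaced square input sizes from start to final."""
    if stages < 2:
        return [final]
    step = (final - start) / (stages - 1)
    return [round(start + i * step) for i in range(stages)]


def progressive_resize(train_stage, sizes, weights=None):
    """Train once per input size, seeding each stage with the previous weights.

    `train_stage(size, weights)` is a hypothetical callback that trains a
    model on size x size inputs and returns (new_weights, validation_score).
    """
    history = []
    for size in sizes:
        weights, score = train_stage(size, weights)
        history.append((size, score))  # gauge performance at each size
    return weights, history
```

For instance, `size_schedule(80, 416, 3)` gives the sizes 80, 248, and 416. If the score stops improving between two stages, the smaller size may already be enough.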
Now, how small to start is a function of your specific problem. If you’re detecting objects or classifying images where the distinguishing attributes take up the majority of each captured image, downsizing is less likely to hinder performance. Consider attempting a model as small as, arbitrarily, 80 x 80 before increasing input sizes.
What If My Images Vary in Size?
In circumstances where we do not control the specific camera being used to capture images for inference and deployment, we may find ourselves with images of various input sizes.
How much the image sizes vary plays a crucial role. If a small handful (say, fewer than 5 percent) of overall images are dramatically different in shape from the rest and those images do not overwhelmingly represent a single class or other attribute, there may be a case for removing them from our dataset.
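One way to run that check is sketched below. It is a minimal illustration, not a prescribed method: it assumes "dramatically different" means an aspect ratio more than `tolerance` times away from the dataset median, and both the function name `find_shape_outliers` and the default of 2.0 are made up for this example.

```python
from collections import Counter
from statistics import median


def find_shape_outliers(records, tolerance=2.0):
    """Flag images whose aspect ratio is far from the dataset median.

    `records` is a list of (width, height, label) tuples. Returns the
    flagged records, the fraction of the dataset they make up, and a
    count of flagged images per label.
    """
    ratios = [w / h for w, h, _ in records]
    mid = median(ratios)
    outliers = [
        rec for rec, r in zip(records, ratios)
        if max(r / mid, mid / r) > tolerance  # assumed threshold
    ]
    fraction = len(outliers) / len(records)
    by_label = Counter(label for _, _, label in outliers)
    return outliers, fraction, by_label
```

If the fraction is small and the per-label counts are spread out rather than piled onto one class, dropping the flagged images is less likely to bias the dataset.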
If we have varied image sizes but all within comparable aspect ratios (say, no image is more than 50 percent larger on one dimension than any other image in the dataset), we should consider resizing to the smallest input size.
Downsizing larger images to match the size of smaller images is often a better bet than enlarging small images to match larger ones.
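A small sketch of that rule of thumb follows. The helper name `smallest_common_size` is hypothetical, and taking the smallest width and smallest height separately is an assumption made here so that no image ever needs to be enlarged; the resulting target may not match any single image exactly.

```python
def smallest_common_size(sizes, max_spread=1.5):
    """Pick a downsizing target for a list of (width, height) pairs.

    Returns ((target_w, target_h), comparable). `comparable` is True when
    no image is more than `max_spread` times larger than another on either
    dimension (1.5 mirrors the 50 percent rule of thumb).
    """
    widths = [w for w, _ in sizes]
    heights = [h for _, h in sizes]
    comparable = (
        max(widths) <= max_spread * min(widths)
        and max(heights) <= max_spread * min(heights)
    )
    # Using the smallest width and height means no image is ever upscaled.
    return (min(widths), min(heights)), comparable
```

When `comparable` comes back False, the sizes vary too much for this shortcut, and removing outliers or padding may be a better fit.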
If we increase small images to be larger, we stretch small image pixels. This can obscure our model’s ability to learn key features like object boundaries. There is active research on using generative techniques to intelligently create new pixels rather than stretch existing ones.
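To see what stretching pixels means in practice, here is a minimal nearest-neighbor upscaler over a 2D list of pixel values. The function name `upscale_nearest` is illustrative. Other interpolation methods blend neighboring pixels instead of copying them, but they do not add real detail either.

```python
def upscale_nearest(image, factor):
    """Enlarge a 2D list of pixels by repeating each pixel `factor` times per axis."""
    return [
        [px for px in row for _ in range(factor)]
        for row in image
        for _ in range(factor)
    ]


tiny = [[0, 255],
        [255, 0]]
# Each source pixel becomes a 2 x 2 block; no new information is created.
stretched = upscale_nearest(tiny, 2)
```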
What If Images Need to be Square?
Converting images from rectangles to squares presents two options: either maintain the existing aspect ratio and add padding to the newly resized image or stretch a raw image to fill the desired output dimensions.
Let’s consider stretching an image to fit. If the aspect ratio of the input does not matter, stretching can be an acceptable way to make use of the most pixels fed to the network. However, this also requires that our production model receives comparably stretched images. (Said another way: if we teach our model that a very stretched-out car is what a car looks like, we need to ensure our model always sees very stretched-out cars for identification.)
If we are keeping a consistent aspect ratio, we will need to check which raw image dimension is greater, scale that dimension to match the max dimension of our output, and scale the second dimension proportionally. For example, if we’re rescaling 1000x800 images to be 416x416, the 1000 edge becomes 416, and the 800 edge becomes 332.8 (about 333 once rounded to whole pixels). The space between 332.8 and 416 becomes padding that we fill.
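The arithmetic above can be sketched in a few lines of Python. The function name `letterbox_geometry` and the choice to center the content (splitting padding evenly between both sides) are assumptions for illustration; rounding to whole pixels is added because real images cannot have fractional dimensions.

```python
def letterbox_geometry(src_w, src_h, target=416):
    """Fit a src_w x src_h image inside a target x target square.

    Returns the resized content size and the padding on each side as
    (left, top, right, bottom), keeping the original aspect ratio.
    """
    longest = max(src_w, src_h)
    # Scale the longer edge to `target`; the shorter edge follows proportionally.
    new_w = max(1, round(src_w * target / longest))
    new_h = max(1, round(src_h * target / longest))
    pad_w, pad_h = target - new_w, target - new_h
    left, top = pad_w // 2, pad_h // 2  # center the content
    return {
        "size": (new_w, new_h),
        "padding": (left, top, pad_w - left, pad_h - top),
    }
```

For the 1000x800 example, this yields a 416x333 content area with 41 pixels of padding on top and 42 on the bottom.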
In general, it is safer to maintain the raw image aspect ratio and resize proportionally.
How Should I Fill Padding Pixels?
Padding refers to the pixels between our raw image content and the edge of the image we’re feeding to our network. In our aspect-preserving resize example, we’ve generated new ‘dead pixels’ between the edge of our proportionally resized image and the edge of the square image.
Often, padding is filled with either black or white pixels. A third option exists: filling padding with a reflection of the image content. As is often the case, results vary, but the technique is particularly promising for some classification tasks.
Consider running small batch experiments of different types of padding, including reflection padding.
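To make the three fill options concrete, here is a dependency-free sketch that pads a 2D list of grayscale pixel values. The names `reflect_index` and `pad_image` are hypothetical, and this version of reflection mirrors the content without repeating the edge pixel, which is one of several common conventions. Image libraries provide their own equivalents (NumPy's `numpy.pad` has a `reflect` mode, for example), and those are a better choice for real pipelines.

```python
def reflect_index(i, n):
    """Map any integer index into [0, n) by mirroring at the edges."""
    if n == 1:
        return 0
    period = 2 * (n - 1)
    i %= period
    return period - i if i >= n else i


def pad_image(image, top, bottom, left, right, mode="constant", fill=0):
    """Pad a 2D list of pixel values.

    mode="constant" fills new pixels with `fill` (0 for black, 255 for
    white in 8-bit grayscale); mode="reflect" mirrors the image content.
    """
    h, w = len(image), len(image[0])
    padded = []
    for r in range(-top, h + bottom):
        row = []
        for c in range(-left, w + right):
            if 0 <= r < h and 0 <= c < w:
                row.append(image[r][c])  # original content
            elif mode == "reflect":
                row.append(image[reflect_index(r, h)][reflect_index(c, w)])
            else:
                row.append(fill)
        padded.append(row)
    return padded
```

Running the same small batch through `mode="constant"` with black, then white, then `mode="reflect"` gives three comparable datasets for the experiment suggested above.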
Roboflow supports resizing in any manner in two clicks. Simply toggle “Resize” on, and select which option you prefer. Consult https://docs.roboflow.ai for additional details, or write us at firstname.lastname@example.org