Machine Learning Dataset Augmentation 101 - Part A

Ashwin A Raikar
May 13, 2022

A general guide on augmentation types and tips to produce quality datasets

Introduction:

In this article, we explain each augmentation type, why it matters, and some tips to keep in mind when choosing a particular augmentation technique.

Object Detection Augmentation

Flipping

— — — — — — — — — — — — — — — — — — — — — — — — — — — —

A flipped (or reversed) image is generated by mirroring the original image. Strictly speaking, a flipped image is mirrored across the horizontal axis (top to bottom), while a flopped image is mirrored across the vertical axis (left to right).

Types of flipping

Horizontal Flip: The image is mirrored left to right (across its vertical axis), so the leftmost column becomes the rightmost.

Fig 1. Original image — Horizontal Flip

Vertical Flip: The image is mirrored top to bottom (across its horizontal axis), so the top row becomes the bottom row.

Fig 2. Original image — Vertical Flip

Tips:

Here are a few things to consider while using flipping:

  1. Each flip type can double your original dataset, and combining both (original, H-flip, V-flip, and H+V flip) will quadruple your image count!
  2. It is better not to use flipping for some dataset types → if you are training an OCR model, flipped characters make no sense, and some (such as "b" and "d") even turn into other characters.
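This minimal NumPy sketch makes the flips and the 4x count concrete for object detection, where every flip must also update the bounding boxes. It assumes boxes are stored as (x_min, y_min, x_max, y_max) in continuous pixel coordinates. The helper names hflip, vflip, and flip_variants are hypothetical. Augmentation libraries such as Albumentations ship ready-made flip transforms.

```python
import numpy as np


def hflip(image: np.ndarray, boxes: np.ndarray):
    """Horizontal flip: mirror left to right (across the vertical axis).

    Assumes `image` has shape (H, W) or (H, W, C) and `boxes` is an (N, 4)
    array of (x_min, y_min, x_max, y_max) in continuous pixel coordinates,
    i.e. the image spans x in [0, W] and y in [0, H].
    """
    width = image.shape[1]
    flipped = image[:, ::-1]          # reverse the column order
    new_boxes = boxes.astype(float)   # astype returns a copy
    # A box's right edge becomes its left edge after mirroring, and vice versa.
    new_boxes[:, 0] = width - boxes[:, 2]
    new_boxes[:, 2] = width - boxes[:, 0]
    return flipped, new_boxes


def vflip(image: np.ndarray, boxes: np.ndarray):
    """Vertical flip: mirror top to bottom (across the horizontal axis)."""
    height = image.shape[0]
    flipped = image[::-1]             # reverse the row order
    new_boxes = boxes.astype(float)
    new_boxes[:, 1] = height - boxes[:, 3]
    new_boxes[:, 3] = height - boxes[:, 1]
    return flipped, new_boxes


def flip_variants(image: np.ndarray, boxes: np.ndarray):
    """Original plus H-flip, V-flip and combined H+V flip: 4x the samples."""
    h = hflip(image, boxes)
    v = vflip(image, boxes)
    hv = vflip(*h)                    # H+V flip equals a 180 degree rotation
    return [(image, boxes), h, v, hv]


img = np.arange(12).reshape(3, 4)     # toy 3x4 "image"
boxes = np.array([[0, 0, 1, 2]])      # one box in the top-left corner
variants = flip_variants(img, boxes)
print(len(variants))                  # 4
print(variants[1][1])                 # H-flipped box: [[3. 0. 4. 2.]]
```

The combined H+V flip is identical to a 180° rotation. The rotation advice later in this article (people rarely appear upside down) therefore applies to it as well.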

Affine Transformations

— — — — — — — — — — — — — — — — — — — — — — — — — — — —

An affine transformation is a linear mapping combined with a translation. It preserves points, straight lines, and planes, and sets of parallel lines remain parallel after the transformation. Affine transformations are typically used to correct geometric distortions or deformations that occur with non-ideal camera angles [1]. In augmentation, the same operations (such as scaling and rotation) are applied deliberately to simulate new viewpoints.
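As a worked form of this definition, a 2D affine transformation maps a point (x, y) to A(x, y) + t, where A is a 2x2 matrix and t is a translation vector. In homogeneous coordinates, every affine map becomes a 3x3 matrix, including the scaling and rotation covered below. Transformations then compose by plain matrix multiplication. The symbols here are our own notation, not the article's.

```latex
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}
=
\begin{bmatrix} a_{11} & a_{12} & t_x \\ a_{21} & a_{22} & t_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix},
\qquad
S = \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix},
\qquad
R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}

\mathbf{p} + \lambda \mathbf{d} \;\longmapsto\; (A\mathbf{p} + \mathbf{t}) + \lambda\, A\mathbf{d}
```

The last line shows why parallelism is preserved. A line through p with direction d maps to a line whose direction is A d, and that direction does not depend on p. Two lines that share a direction before the transformation therefore still share one afterward.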

Types of Affine Transformations

Scaling

In computer graphics and digital imaging, image scaling refers to the resizing of a digital image.

Upscaling: the process of stretching a lower-resolution image onto a larger resolution.

Downscaling: the process of shrinking a higher-resolution image down to a lower resolution.

Fig 3. Original image — Downscaled image — Upscaled image

Tips:

Here are a few things to consider while using scaling:

  1. Scaling is useful for resizing images to the specific input dimensions a network expects.
  2. Upscaling cannot recover detail that was never captured. Depending on the interpolation method, it produces blocky (nearest-neighbor) or blurry (bilinear, bicubic) results, which can degrade fine features.
  3. Downscaling packs the same content into fewer pixels, so objects appear smaller and fine details may be lost. Used in moderation, this adds useful variability in object size.
  4. To resize images without stretching them (i.e. while preserving the aspect ratio and features), another technique called padding is used.
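This minimal NumPy sketch shows scaling with nearest-neighbor interpolation, the simplest method. Real pipelines usually call a library resize, such as OpenCV's cv2.resize or Pillow's Image.resize, with bilinear or bicubic interpolation. The helpers resize_nearest and scale_boxes are hypothetical. Boxes are assumed to be (x_min, y_min, x_max, y_max) in pixel coordinates.

```python
import numpy as np


def resize_nearest(image: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbor resize of an (H, W) or (H, W, C) array.

    Each output pixel copies the input pixel under its center, so upscaling
    repeats pixels (blocky edges) and downscaling drops them (lost detail).
    """
    in_h, in_w = image.shape[:2]
    # Map the center of each output row/column back into input coordinates.
    rows = ((np.arange(out_h) + 0.5) * in_h / out_h).astype(int)
    cols = ((np.arange(out_w) + 0.5) * in_w / out_w).astype(int)
    rows = np.clip(rows, 0, in_h - 1)
    cols = np.clip(cols, 0, in_w - 1)
    return image[rows[:, None], cols[None, :]]


def scale_boxes(boxes: np.ndarray, in_hw, out_hw) -> np.ndarray:
    """Scale (x_min, y_min, x_max, y_max) boxes to match a resized image."""
    sy = out_hw[0] / in_hw[0]
    sx = out_hw[1] / in_hw[1]
    return boxes * np.array([sx, sy, sx, sy])


img = np.array([[1, 2],
                [3, 4]])
up = resize_nearest(img, 4, 4)        # each pixel becomes a 2x2 block
down = resize_nearest(up, 2, 2)       # back to the original 2x2
print(up)
print(np.array_equal(down, img))      # True
```

When sx and sy differ, objects get stretched. This happens, for example, when a non-square photo is resized to a square network input. The padding tip above addresses this case.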

Rotation

Fig 4. Original image — Rotation -30° — Rotation +30°

Tips:

Here are a few things to consider while using rotation:

  1. Small rotations of ±10° to 15° are usually safe and rarely change what the image depicts.
  2. If your model doesn't support rotated bounding boxes, verify the labels generated for rotated images. The axis-aligned box drawn around a rotated object is larger than the object and includes extra background.
  3. Choose rotation angles based on the type of object being detected → too much rotation creates unrealistic samples, which can lead the model to produce false positives. If you're detecting people, for example, they don't usually stand upside down (180°) or at 75° angles (Michael Jackson in "Smooth Criminal" is an exception 😆).
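Tip 2 is easier to see with numbers. This minimal NumPy sketch rotates a box's four corners and then takes the axis-aligned box that encloses them, which is what a model without rotated-box support ends up training on. It also samples a small random angle, as in tip 1. The helper names are hypothetical, and so is the sign convention. Rotating the pixels themselves is left to a library, for example OpenCV's cv2.getRotationMatrix2D with cv2.warpAffine.

```python
import math

import numpy as np


def sample_angle(max_deg: float = 15.0, rng=None) -> float:
    """Draw a rotation angle uniformly from [-max_deg, +max_deg]."""
    if rng is None:
        rng = np.random.default_rng()
    return float(rng.uniform(-max_deg, max_deg))


def rotate_box(box, angle_deg: float, cx: float, cy: float):
    """Rotate a box's corners about (cx, cy) and return the enclosing
    axis-aligned box as (x_min, y_min, x_max, y_max).

    Assumes pixel coordinates with y pointing down; with this matrix a
    positive angle turns clockwise on screen (libraries differ on this).
    """
    x_min, y_min, x_max, y_max = box
    corners = np.array([[x_min, y_min], [x_max, y_min],
                        [x_max, y_max], [x_min, y_max]], dtype=float)
    theta = math.radians(angle_deg)
    rot = np.array([[math.cos(theta), -math.sin(theta)],
                    [math.sin(theta), math.cos(theta)]])
    # Shift to the rotation center, rotate, and shift back.
    rotated = (corners - [cx, cy]) @ rot.T + [cx, cy]
    low = rotated.min(axis=0)
    high = rotated.max(axis=0)
    return (float(low[0]), float(low[1]), float(high[0]), float(high[1]))


# A 10x10 box rotated 45 degrees about its own center:
x0, y0, x1, y1 = rotate_box((0, 0, 10, 10), 45, 5, 5)
print(round(x1 - x0, 2))   # 14.14, about 41% wider than the object
print(-15 <= sample_angle(rng=np.random.default_rng(0)) <= 15)  # True
```

At 45° the label box grows by a factor of √2 in each direction. Much of that extra area is background, which is why large rotations need their labels checked.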

Blur

— — — — — — — — — — — — — — — — — — — — — — — — — — — —

Blur is one of the most commonly used techniques in general dataset augmentation. It makes the image less sharp, smoothing lines and edges.

Normal Blur: uniformly blurs the entire image with a filter of a specified size, typically by replacing each pixel with the plain average of its neighbors (a box filter). As seen in Fig. 5, all the pixels are blurred.

Fig 5. Original image — Normal Blur

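Gaussian Blur: weights each neighboring pixel by a Gaussian function of its distance, so nearby pixels count more than distant ones. It is used to reduce image noise and detail, smoothing edges without losing as many features in the image.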

Fig 6. Original image — Gaussian Blur

Tips:

Here are a few things to consider while using blur:

  1. Keep blur strength minimal → strong blur smudges or erases features.
  2. Blurring is especially important when creating an Optical Character Recognition (OCR) dataset. Real-world captures are often smudged by motion, shaky hands, or simply the low resolution of input devices.
  3. This not only increases the size of the dataset but also makes the model more robust to noisy real-world data.
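This minimal NumPy sketch contrasts the two blurs described above. A normal (box) blur gives every neighbor equal weight, while a Gaussian blur weights neighbors by their distance from the center. The helper names, the grayscale input, and the edge-padding choice are assumptions for illustration. In practice, OpenCV's cv2.blur and cv2.GaussianBlur do the same job much faster. Kernel size (and sigma for the Gaussian) controls blur strength, which is the setting tip 1 asks you to keep small.

```python
import numpy as np


def box_kernel(size: int) -> np.ndarray:
    """1D averaging ("normal blur") kernel: every neighbor has equal weight."""
    return np.full(size, 1.0 / size)


def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """1D Gaussian kernel: weights fall off with distance from the center."""
    x = np.arange(size) - (size - 1) / 2
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()  # normalize so overall brightness is preserved


def blur(image: np.ndarray, kernel_1d: np.ndarray) -> np.ndarray:
    """Blur a 2D grayscale image with a separable, odd-length 1D kernel.

    Box and Gaussian blurs are both separable, so filtering every row and
    then every column equals one 2D pass. Borders repeat the edge pixels.
    """
    if len(kernel_1d) % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = len(kernel_1d) // 2
    padded = np.pad(image.astype(float), pad, mode="edge")

    def conv(v):
        return np.convolve(v, kernel_1d, mode="valid")

    rows = np.apply_along_axis(conv, 1, padded)  # shape (H + 2*pad, W)
    return np.apply_along_axis(conv, 0, rows)    # shape (H, W)


impulse = np.zeros((5, 5))
impulse[2, 2] = 1.0                               # a single bright pixel
box = blur(impulse, box_kernel(3))
gauss = blur(impulse, gaussian_kernel(3, sigma=1.0))
print(box[1:4, 1:4])               # uniform 3x3 patch, each value 1/9
print(gauss[2, 2] > gauss[1, 1])   # True: the center is weighted more
```

The box blur spreads the bright pixel evenly over its neighborhood. The Gaussian blur keeps more of the weight at the center, which is why it smooths with less smearing for the same kernel size.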

Summary:

— — — — — — — — — — — — — — — — — — — — — — — — — — — —

We discussed the following augmentation techniques:

  1. Flipping — useful for doubling the dataset
  2. Scaling — useful for resizing images to fit network inputs
  3. Rotation — useful for increasing variability in datasets
  4. Blur — useful for making models robust to blurry real-world images

In Part B, we will discuss more techniques, such as padding and noise.


Ashwin A Raikar

Artificial Intelligence, Computer Vision, Researcher & Developer