Data augmentation on-the-fly

We can define a function that applies data augmentation on the fly. Let’s assume that the images batches that we draw have the shape (sample, channel, height, width) and that we wish to randomly choose 50% of the images in the batch to be horizontally flipped:

# Define our batch augmentation function.
def augment_batch(batch_X, batch_y):
    # Create an array of random 0's and 1's with 50% probability
    flip_flags = np.random.binomial(1, 0.5, size=(len(batch_X),))

    # Convert to `bool` dtype.
    flip_flags = flip_flags.astype(bool)

    # Flip the horizontal dimension in samples identified by
    # `flip_flags`
    batch_X[flip_flags, ...] = flip_flags[flip_flags, :, :, ::-1]

    # Return the batch as a tuple
    return batch_X, batch_y

We can add our augment_batch function to our batch extraction pipeline by invoking the map() method like so:

# Construct an array data source from ``train_X`` and ``train_y``
ds = data_source.ArrayDataSource([train_X, train_y])

# Apply augmentation using `augment_batch`
ds = ds.map(augment_batch)

# Drawing batches of 64 elements in random order
for (batch_X, batch_y) in ds.batch_iterator(
        batch_size=64, shuffle=np.random.RandomState(12345)):
    # Processes batches here...