evoaug_tf.augment#
Library of data augmentations for genomic sequence data.
To contribute a custom augmentation, use the following syntax:
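import tensorflow
from evoaug_tf.augment import AugmentBase

class CustomAugmentation(AugmentBase):
    def __init__(self, param1, param2):
        self.param1 = param1
        self.param2 = param2

    def __call__(self, x: tensorflow.Tensor) -> tensorflow.Tensor:
        # Perform augmentation (placeholder: returns the input unchanged)
        x_aug = x
        return x_aug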
Module Contents#
Classes#
AugmentBase | Base class for EvoAug augmentation for genomic sequences.
RandomTranslocation | Randomly cuts sequence in two pieces and shifts the order for each in a training batch.
RandomMutation | Randomly mutates sequences in a training batch according to a user-defined mutate_frac.
RandomInsertion | Randomly inserts a contiguous stretch of nucleotides into sequences in a training batch.
RandomDeletion | Randomly deletes a contiguous stretch of nucleotides from sequences in a training batch.
RandomRC | Randomly applies a reverse-complement transformation to each sequence in a training batch.
RandomNoise | Randomly adds Gaussian noise to a batch of sequences according to a user-defined noise_mean and noise_std.
RandomInsertionBatch | Randomly inserts a contiguous stretch of nucleotides into sequences in a training batch.
RandomDeletionBatch | Randomly deletes a contiguous stretch of nucleotides from sequences in a training batch.
RandomTranslocationBatch | Randomly cuts sequence in two pieces and shifts the order for each in a training batch.
RandomRCBatch | Randomly applies a reverse-complement transformation to each sequence in a training batch.
- class evoaug_tf.augment.AugmentBase#
Base class for EvoAug augmentation for genomic sequences.
- abstract __call__(x, y=None)#
Return an augmented version of ‘x’.
- Parameters:
x (tf.Tensor) – Batch of one-hot sequences (shape: (N, L, A)).
- Returns:
Batch of one-hot sequences with random augmentation applied.
- Return type:
tf.Tensor
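As a rough illustration of the abstract `__call__` contract, the sketch below uses Python's `abc` module. The names `AugmentBaseSketch` and `Identity` are hypothetical stand-ins, not the library's classes, and the real `AugmentBase` may be defined differently.

```python
from abc import ABC, abstractmethod


class AugmentBaseSketch(ABC):
    """Hypothetical stand-in for an abstract augmentation base class."""

    @abstractmethod
    def __call__(self, x, y=None):
        """Return an augmented version of x."""


class Identity(AugmentBaseSketch):
    """Trivial subclass that satisfies the contract by returning x unchanged."""

    def __call__(self, x, y=None):
        return x


# A subclass that implements __call__ can be instantiated and called;
# the abstract base itself cannot be instantiated.
batch = [[1, 0, 0, 0]]
same = Identity()(batch)
```

A subclass only needs to implement `__call__`; any parameters it takes are stored in its own `__init__`.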
- class evoaug_tf.augment.RandomTranslocation(shift_min=0, shift_max=20)#
Bases:
AugmentBase
Randomly cuts sequence in two pieces and shifts the order for each in a training batch. This is implemented with a roll transformation with a user-defined shift_min and shift_max. A different roll (positive or negative) is applied to each sequence. Each sequence is padded with random DNA to ensure same shapes.
- Parameters:
shift_min (int, optional) – Minimum size for random shift, defaults to 0.
shift_max (int, optional) – Maximum size for random shift, defaults to 20.
- __call__(x)#
Randomly shifts sequences in a batch, x.
- Parameters:
x (tf.Tensor) – Batch of one-hot sequences (shape: (N, L, A)).
- Returns:
Sequences with random translocations.
- Return type:
tf.Tensor
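The roll-based translocation described above can be sketched in NumPy as follows. This is an illustration of the idea, not the library's TensorFlow implementation; the function name `random_translocation`, the `rng` argument, and the choice to treat `shift_max` as inclusive are assumptions.

```python
import numpy as np


def random_translocation(x, shift_min=0, shift_max=20, rng=None):
    """Roll each sequence in x (shape (N, L, A)) by its own random signed shift."""
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]
    # One shift magnitude and one direction per sequence (assumed inclusive bounds).
    shifts = rng.integers(shift_min, shift_max + 1, size=n)
    signs = rng.choice([-1, 1], size=n)
    # Rolling along the length axis moves one end of the sequence to the other.
    rolled = [np.roll(seq, int(s * d), axis=0) for seq, s, d in zip(x, shifts, signs)]
    return np.stack(rolled)


rng = np.random.default_rng(0)
x = np.eye(4)[rng.integers(0, 4, size=(8, 50))]  # random one-hot batch, shape (8, 50, 4)
x_aug = random_translocation(x, shift_min=0, shift_max=20, rng=rng)
```

Because a roll only reorders positions, the output keeps the input shape and the nucleotide composition of every sequence.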
- class evoaug_tf.augment.RandomMutation(mutate_frac=0.05)#
Bases:
AugmentBase
Randomly mutates sequences in a training batch according to a user-defined mutate_frac. A different set of mutations is applied to each sequence.
- Parameters:
mutate_frac (float, optional) – Probability of mutation for each nucleotide, defaults to 0.05.
- __call__(x)#
Randomly introduces mutations to a set of one-hot DNA sequences.
- Parameters:
x (tf.Tensor) – Batch of one-hot sequences (shape: (N, L, A)).
- Returns:
Sequences with randomly mutated DNA.
- Return type:
tf.Tensor
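A minimal NumPy sketch of per-nucleotide random mutation is shown below. It assumes each position is selected independently with probability `mutate_frac` and replaced by a uniformly drawn nucleotide; the library may select positions differently, and `random_mutation` and `rng` are illustrative names.

```python
import numpy as np


def random_mutation(x, mutate_frac=0.05, rng=None):
    """Replace randomly chosen positions of x (shape (N, L, A)) with random nucleotides."""
    rng = np.random.default_rng() if rng is None else rng
    n, length, alphabet = x.shape
    # Independent Bernoulli draw per position and per sequence (assumption).
    mask = rng.random((n, length)) < mutate_frac
    # A uniformly random one-hot nucleotide for every position.
    random_dna = np.eye(alphabet, dtype=x.dtype)[rng.integers(0, alphabet, size=(n, length))]
    # Keep the original base unless the position was selected for mutation.
    return np.where(mask[..., None], random_dna, x)


rng = np.random.default_rng(0)
x = np.eye(4)[rng.integers(0, 4, size=(8, 50))]
x_aug = random_mutation(x, mutate_frac=0.05, rng=rng)
```

With a four-letter alphabet, a uniformly drawn replacement matches the original base about a quarter of the time, so the fraction of positions that actually change is lower than `mutate_frac` in this sketch.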
- class evoaug_tf.augment.RandomInsertion(insert_min=0, insert_max=20)#
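Bases:
AugmentBase
Randomly inserts a contiguous stretch of nucleotides into sequences in a training batch according to a random number between a user-defined insert_min and insert_max. A different insertion is applied to each sequence. Each sequence is padded with random DNA to ensure same shapes.
- Parameters:
insert_min (int, optional) – Minimum size for random insertion, defaults to 0.
insert_max (int, optional) – Maximum size for random insertion, defaults to 20.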
- __call__(x)#
Randomly inserts segments of random DNA into a set of DNA sequences.
- Parameters:
x (tf.Tensor) – Batch of one-hot sequences (shape: (N, L, A)).
- Returns:
Sequences with randomly inserted segments of random DNA. All sequences are padded with random DNA to ensure same shape.
- Return type:
tf.Tensor
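The insertion-plus-padding scheme can be sketched in NumPy as below. The assumptions here are that every output has length L + insert_max, that the unused insertion budget is split between the two flanks as random DNA, and that bounds are inclusive; `random_insertion` and `rng` are illustrative names rather than the library's API.

```python
import numpy as np


def random_insertion(x, insert_min=0, insert_max=20, rng=None):
    """Insert random DNA into each sequence of x (N, L, A); output is (N, L + insert_max, A)."""
    rng = np.random.default_rng() if rng is None else rng
    n, length, alphabet = x.shape
    eye = np.eye(alphabet, dtype=x.dtype)

    def random_dna(k):
        # k random one-hot nucleotides, shape (k, A); k may be 0.
        return eye[rng.integers(0, alphabet, size=k)]

    out = []
    for seq in x:
        insert_len = int(rng.integers(insert_min, insert_max + 1))
        pos = int(rng.integers(0, length + 1))  # insertion point
        # Pad with random DNA so every sequence reaches the same final length.
        pad_total = insert_max - insert_len
        pad_left = pad_total // 2
        pad_right = pad_total - pad_left
        out.append(np.concatenate(
            [random_dna(pad_left), seq[:pos], random_dna(insert_len), seq[pos:], random_dna(pad_right)],
            axis=0,
        ))
    return np.stack(out)


rng = np.random.default_rng(0)
x = np.eye(4)[rng.integers(0, 4, size=(8, 50))]
x_aug = random_insertion(x, insert_min=0, insert_max=20, rng=rng)
```

Under these assumptions the output is longer than the input by `insert_max`, so a downstream model must accept the longer length.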
- class evoaug_tf.augment.RandomDeletion(delete_min=0, delete_max=20)#
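Bases:
AugmentBase
Randomly deletes a contiguous stretch of nucleotides from sequences in a training batch according to a random number between a user-defined delete_min and delete_max. A different deletion is applied to each sequence.
- Parameters:
delete_min (int, optional) – Minimum size for random deletion, defaults to 0.
delete_max (int, optional) – Maximum size for random deletion, defaults to 20.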
- __call__(x)#
Randomly deletes segments in a set of one-hot DNA sequences.
- Parameters:
x (tf.Tensor) – Batch of one-hot sequences (shape: (N, L, A)).
- Returns:
Sequences with randomly deleted segments (padded to the correct shape with random DNA).
- Return type:
tf.Tensor
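A NumPy sketch of random deletion with padding back to the original length follows. It assumes the deleted length is drawn with inclusive bounds, that `delete_max` does not exceed the sequence length, and that the random-DNA padding is split between the two flanks; `random_deletion` and `rng` are illustrative names.

```python
import numpy as np


def random_deletion(x, delete_min=0, delete_max=20, rng=None):
    """Delete a random segment from each sequence of x (N, L, A), then pad back to length L."""
    rng = np.random.default_rng() if rng is None else rng
    n, length, alphabet = x.shape
    eye = np.eye(alphabet, dtype=x.dtype)

    def random_dna(k):
        # k random one-hot nucleotides, shape (k, A); k may be 0.
        return eye[rng.integers(0, alphabet, size=k)]

    out = []
    for seq in x:
        delete_len = int(rng.integers(delete_min, delete_max + 1))
        start = int(rng.integers(0, length - delete_len + 1))
        kept = np.concatenate([seq[:start], seq[start + delete_len:]], axis=0)
        # Restore the original length with random DNA on both flanks (assumption).
        pad_left = delete_len // 2
        pad_right = delete_len - pad_left
        out.append(np.concatenate([random_dna(pad_left), kept, random_dna(pad_right)], axis=0))
    return np.stack(out)


rng = np.random.default_rng(0)
x = np.eye(4)[rng.integers(0, 4, size=(8, 50))]
x_aug = random_deletion(x, delete_min=0, delete_max=20, rng=rng)
```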
- class evoaug_tf.augment.RandomRC(rc_prob=0.5)#
Bases:
AugmentBase
Randomly applies a reverse-complement transformation to each sequence in a training batch according to a user-defined probability, rc_prob. This is applied to each sequence independently.
- Parameters:
rc_prob (float, optional) – Probability to apply a reverse-complement transformation, defaults to 0.5.
- __call__(x)#
Randomly transforms sequences in a batch with a reverse-complement transformation.
- Parameters:
x (tf.Tensor) – Batch of one-hot sequences (shape: (N, L, A)).
- Returns:
Sequences with random reverse-complements applied.
- Return type:
tf.Tensor
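For one-hot input, a reverse-complement can be written as a flip of both the length axis and the alphabet axis, provided the channels are ordered so that complementary bases mirror each other (for example A, C, G, T). The NumPy sketch below assumes that ordering; `random_rc` and `rng` are illustrative names, not the library's code.

```python
import numpy as np


def random_rc(x, rc_prob=0.5, rng=None):
    """Reverse-complement each sequence of x (N, L, A) independently with probability rc_prob."""
    rng = np.random.default_rng() if rng is None else rng
    # One independent decision per sequence.
    flip = rng.random(x.shape[0]) < rc_prob
    # Reversing positions gives the reverse; reversing channels gives the
    # complement under an A, C, G, T channel order (assumption).
    x_rc = x[:, ::-1, ::-1]
    return np.where(flip[:, None, None], x_rc, x)


rng = np.random.default_rng(0)
aac = np.eye(4)[[0, 0, 1]][None]  # the sequence AAC as a batch of one
gtt = np.eye(4)[[2, 3, 3]][None]  # its reverse complement, GTT
x_aug = random_rc(aac, rc_prob=1.0, rng=rng)
```

Applying the transformation twice with `rc_prob=1.0` returns the original batch, which is a convenient sanity check.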
- class evoaug_tf.augment.RandomNoise(noise_mean=0.0, noise_std=0.2)#
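Bases:
AugmentBase
Randomly adds Gaussian noise to a batch of sequences according to a user-defined noise_mean and noise_std. Different noise is applied to each sequence.
- Parameters:
noise_mean (float, optional) – Mean of the Gaussian noise, defaults to 0.0.
noise_std (float, optional) – Standard deviation of the Gaussian noise, defaults to 0.2.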
- __call__(x)#
Randomly adds Gaussian noise to a set of one-hot DNA sequences.
- Parameters:
x (tf.Tensor) – Batch of one-hot sequences (shape: (N, L, A)).
- Returns:
Sequences with random noise.
- Return type:
tf.Tensor
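Adding Gaussian noise amounts to sampling one value per tensor element and summing, as in this NumPy sketch. The names `random_noise` and `rng` are illustrative; the library's TensorFlow version may differ in detail.

```python
import numpy as np


def random_noise(x, noise_mean=0.0, noise_std=0.2, rng=None):
    """Add independent Gaussian noise to every element of x (shape (N, L, A))."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(loc=noise_mean, scale=noise_std, size=x.shape)
    return x + noise


rng = np.random.default_rng(0)
x = np.eye(4)[rng.integers(0, 4, size=(8, 50))]
x_aug = random_noise(x, noise_mean=0.0, noise_std=0.2, rng=rng)
```

The result is no longer strictly one-hot, so it suits models that consume real-valued inputs.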
- class evoaug_tf.augment.RandomInsertionBatch(insert_min=0, insert_max=20)#
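Bases:
AugmentBase
Randomly inserts a contiguous stretch of nucleotides into sequences in a training batch according to a random number between a user-defined insert_min and insert_max. A different insertion is applied to each sequence. Each sequence is padded with random DNA to ensure same shapes.
- Parameters:
insert_min (int, optional) – Minimum size for random insertion, defaults to 0.
insert_max (int, optional) – Maximum size for random insertion, defaults to 20.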
- __call__(x)#
Randomly inserts segments of random DNA into a set of DNA sequences.
- Parameters:
x (tf.Tensor) – Batch of one-hot sequences (shape: (N, L, A)).
- Returns:
Sequences with randomly inserted segments of random DNA. All sequences are padded with random DNA to ensure same shape.
- Return type:
tf.Tensor
- class evoaug_tf.augment.RandomDeletionBatch(delete_min=0, delete_max=20)#
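Bases:
AugmentBase
Randomly deletes a contiguous stretch of nucleotides from sequences in a training batch according to a random number between a user-defined delete_min and delete_max. A different deletion is applied to each sequence.
- Parameters:
delete_min (int, optional) – Minimum size for random deletion, defaults to 0.
delete_max (int, optional) – Maximum size for random deletion, defaults to 20.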
- __call__(x)#
Randomly deletes segments in a set of one-hot DNA sequences.
- Parameters:
x (tf.Tensor) – Batch of one-hot sequences (shape: (N, L, A)).
- Returns:
Sequences with randomly deleted segments (padded to the correct shape with random DNA).
- Return type:
tf.Tensor
- class evoaug_tf.augment.RandomTranslocationBatch(shift_min=0, shift_max=20)#
Bases:
AugmentBase
Randomly cuts sequence in two pieces and shifts the order for each in a training batch. This is implemented with a roll transformation with a user-defined shift_min and shift_max. A different roll (positive or negative) is applied to each sequence. Each sequence is padded with random DNA to ensure same shapes.
- Parameters:
shift_min (int, optional) – Minimum size for random shift, defaults to 0.
shift_max (int, optional) – Maximum size for random shift, defaults to 20.
- __call__(x)#
Randomly shifts sequences in a batch, x.
- Parameters:
x (tf.Tensor) – Batch of one-hot sequences (shape: (N, L, A)).
- Returns:
Sequences with random translocations.
- Return type:
tf.Tensor
- class evoaug_tf.augment.RandomRCBatch(rc_prob=0.5)#
Bases:
AugmentBase
Randomly applies a reverse-complement transformation to each sequence in a training batch according to a user-defined probability, rc_prob. This is applied to each sequence independently.
- Parameters:
rc_prob (float, optional) – Probability to apply a reverse-complement transformation, defaults to 0.5.
- __call__(x)#
Randomly transforms sequences in a batch with a reverse-complement transformation.
- Parameters:
x (tf.Tensor) – Batch of one-hot sequences (shape: (N, L, A)).
- Returns:
Sequences with random reverse-complements applied.
- Return type:
tf.Tensor