Breakbeat Manipulation with GANs

September 20, 2020

Breakbeats are percussion-only passages that are primarily sourced from samples of funk and jazz recordings from the 1960s to the 1980s. Electronic music producers often repurpose breakbeat samples through segmentation, resequencing, and further manipulation with audio effects. This page presents a system for manipulating breakbeats using a generative adversarial network (GAN). A dataset of popular breakbeats was collected and then augmented through time-stretching, pitch modification, and distortion to increase the number of training examples.

The figure above presents an overview of the method for breakbeat manipulation. The system learns the underlying probability distribution of a dataset of segmented breakbeats. Details of the system architecture can be found in [1]. Synthesis is controlled by an input latent vector and a condition label, which together enable continuous exploration and interpolation of generated waveforms.

Twenty popular breakbeats (e.g., Amen, Funky Drummer, Think, Apache) were selected as training data, reduced to a length of one bar, and quantised at 16th-note resolution. The quantised breakbeats were then sliced into 16th-note segments and labelled by their position in the bar, resulting in 16 conditions with 20 examples per condition. The labels condition the system on an integer value, which can be used to generate audio for the desired 16th-note position. To increase the size of the dataset, the individual breakbeat segments were augmented using techniques commonly used by music producers: pitch shifting, resampling, and distortion. Each breakbeat is pitch-shifted to a tempo of 161.5 beats per minute and sampled at 44.1 kHz, so that each 16th-note segment conveniently contains 4096 samples, a power of 4 that satisfies the symmetric structure of the generator and discriminator networks.
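The segment length and labelling described above can be checked with a short calculation. The following is a minimal sketch rather than the system's actual preprocessing code: the function names and the placeholder signal are hypothetical, and it assumes a 4/4 bar so that one bar holds sixteen 16th notes.

```python
SAMPLE_RATE = 44100   # Hz, as stated in the text
TEMPO_BPM = 161.5     # target tempo, as stated in the text
STEPS_PER_BAR = 16    # 16th-note resolution, assuming a 4/4 bar


def samples_per_sixteenth(tempo_bpm, sample_rate):
    """Return the length of one 16th note in samples (a quarter of one beat)."""
    seconds_per_beat = 60.0 / tempo_bpm
    return seconds_per_beat / 4.0 * sample_rate


def slice_bar(bar, segment_length, steps=STEPS_PER_BAR):
    """Split a one-bar signal into (label, segment) pairs.

    The label is the integer 16th-note position (0 to steps - 1).
    """
    if len(bar) < segment_length * steps:
        raise ValueError("bar is shorter than steps * segment_length samples")
    return [
        (step, bar[step * segment_length:(step + 1) * segment_length])
        for step in range(steps)
    ]


exact_length = samples_per_sixteenth(TEMPO_BPM, SAMPLE_RATE)
segment_length = round(exact_length)

# Placeholder "audio": a ramp standing in for one quantised bar (hypothetical data).
total_samples = STEPS_PER_BAR * segment_length
dummy_bar = [i / total_samples for i in range(total_samples)]
labelled_segments = slice_bar(dummy_bar, segment_length)
```

Under these assumptions the exact 16th-note length comes out just under 4096 samples (about 4095.98), which rounds to 4096, and each one-bar breakbeat yields 16 labelled segments.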

The model is trained using the WGAN-GP strategy, which minimises the Wasserstein distance between the training data distribution and the generated data distribution. Training ran for approximately 80,000 iterations with a minibatch size of 64, taking about one day on an NVIDIA 2080 Ti GPU.
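For reference, the standard WGAN-GP objective can be written as follows. This is the general formulation, extended here with a condition label as an assumption; the symbols (D for the discriminator, G for the generator, z for the latent vector, y for the 16th-note label, and the penalty weight \lambda) are notation chosen for this sketch, and the value of \lambda used for this system is not stated on this page.

```latex
% Discriminator (critic) loss with gradient penalty
L_D = \mathbb{E}_{\tilde{x}}\big[D(\tilde{x}, y)\big]
    - \mathbb{E}_{x}\big[D(x, y)\big]
    + \lambda \, \mathbb{E}_{\hat{x}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}, y) \rVert_2 - 1\big)^2\Big]

% Generated sample and the interpolated sample used for the penalty
\tilde{x} = G(z, y), \qquad
\hat{x} = \epsilon x + (1 - \epsilon)\,\tilde{x}, \qquad
\epsilon \sim U[0, 1]

% Generator loss
L_G = -\,\mathbb{E}_{z}\big[D(G(z, y), y)\big]
```

The gradient penalty term pushes the norm of the discriminator's gradient towards 1 at points between real and generated samples, which is how WGAN-GP enforces the Lipschitz constraint needed to estimate the Wasserstein distance.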

Audio Examples

Training Data

A selection of breakbeats from the dataset used in training.

"The Worm" breakbeat
"Cold Sweat" breakbeat
"Think" breakbeat
"Humpty Dumpty" breakbeat

Generations

A selection of breakbeats generated by the system.

Generated breakbeat 1
Generated breakbeat 2
Generated breakbeat 3
Generated breakbeat 4

Breakbeat morphing

A demonstration of the system's ability to morph between generated breakbeats by interpolating in the latent space.

Breakbeat morphing demonstration
Breakbeat morphing in a composition
Interpolation 1
Interpolation 2
Interpolation 3
Interpolation 4
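The morphing demonstrated above relies on moving between two points in the latent space. The sketch below illustrates the idea with linear interpolation, which is an assumption: this page does not specify the interpolation scheme. The latent dimensionality and the generator call mentioned in the final comment are also hypothetical.

```python
import random


def lerp(z_a, z_b, t):
    """Linearly interpolate between two latent vectors for t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


def interpolation_path(z_a, z_b, num_steps):
    """Return num_steps latent vectors evenly spaced from z_a to z_b inclusive."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    return [lerp(z_a, z_b, i / (num_steps - 1)) for i in range(num_steps)]


LATENT_DIM = 100  # hypothetical size; the real value depends on the trained model

rng = random.Random(0)
z_start = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
z_end = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]

path = interpolation_path(z_start, z_end, num_steps=8)

# Each vector in `path` would then be passed to the trained generator together
# with a 16th-note condition label, e.g. generator(z, label) (hypothetical call),
# to render one step of the morph.
```

Rendering every vector in the path for each of the 16 positions and concatenating the results would give a sequence of bars that moves gradually from one generated breakbeat to another.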

References

[1] Drysdale, J., Tomczak, M., and Hockman, J. Adversarial Synthesis of Drum Sounds. Proceedings of the 23rd International Conference on Digital Audio Effects (DAFx), 2020.
@inproceedings{drysdale2020ads,
  title={Adversarial synthesis of drum sounds},
  author={Drysdale, Jake and Tomczak, Maciej and Hockman, Jason},
  booktitle={Proceedings of the International Conference on Digital Audio Effects (DAFx)},
  year={2020}
}

Help

If you have any questions, please feel free to contact me at jake.drysdale@bcu.ac.uk


© 2022 Jake Drysdale