Creative Text-to-Audio Generation Via Synthesizer Programming
ICML(2024)
Abstract
Neural audio synthesis methods now allow specifying ideas in natural language. However, these methods produce results that cannot be easily tweaked, as they are based on large latent spaces and up to billions of uninterpretable parameters. We propose a text-to-audio generation method that leverages a virtual modular sound synthesizer with only 78 parameters. Synthesizers have long been used by skilled sound designers for media like music and film due to their flexibility and intuitive controls. Our method, CTAG, iteratively updates a synthesizer's parameters to produce high-quality audio renderings of text prompts that can be easily inspected and tweaked. Sounds produced this way are also more abstract, capturing essential conceptual features over fine-grained acoustic details, akin to how simple sketches can vividly convey visual concepts. Our results show how CTAG produces sounds that are distinctive, perceived as artistic, and yet similarly identifiable to recent neural audio synthesis models, positioning it as a valuable and complementary tool.