Imagine a professional musician being able to explore new compositions without having to play a single note on an instrument. Or a small business owner adding a soundtrack to their latest video ad on Instagram with ease.
That’s the promise of AudioCraft — Meta’s latest AI tool that generates high-quality, realistic audio and music from text.
AudioCraft consists of three models: MusicGen, AudioGen, and EnCodec. MusicGen, which was trained with Meta-owned and specifically licensed music, generates music from text prompts, while AudioGen, which was trained on public sound effects, generates audio from text prompts. EnCodec is a neural audio codec that compresses audio into discrete tokens, which the generative models predict and then decode back into waveforms.
Meta is releasing an improved version of the EnCodec decoder, which enables higher-quality music generation with fewer artifacts. It is also releasing the pre-trained AudioGen models, which generate environmental sounds and sound effects such as a dog barking, cars honking, or footsteps on a wooden floor. And lastly, Meta is sharing all of the AudioCraft model weights and code.
The AudioCraft family of models is capable of producing high-quality audio with long-term consistency, and the models are easy to use. With AudioCraft, we simplify the overall design of generative models for audio compared to prior work in the field, giving people the full recipe to play with the existing models that Meta has been developing over the past several years while also empowering them to push the limits and develop their own models.
AudioCraft covers music, sound, compression, and generation, all in the same place. Because it is easy to build on and reuse, people who want to build better sound generators, compression algorithms, or music generators can do it all in the same codebase and build on top of what others have done.
Learn more about AudioCraft on the AI blog.