The Dream of Desoloing

Have you ever wanted to remove the vocals or lead instrument from a favorite song? I often wish I could extract and remix a specific element of a song for my own musical daydreams. Needless to say, this has been a lifelong dream for me, and I have thought up all sorts of possible artistic uses.

So imagine my surprise when I found separateLeadStereo, an open-source Python script that does just that! It is by no means a perfect extraction, but I find the algorithmic decisions and the accordingly glitchy sound quite fascinating. I really enjoy how the solo and accompaniment affect each other, ducking in volume against one another while certain shared frequencies subtly leak through. I am in love with sounds that are obviously aleatoric: music that shows its roots in a state of entropy.

This process is known as Automatic Extraction of the Main Melody from Polyphonic Music Signals. Or, by a shorter nickname: desoloing.

Here are a few examples of songs I have processed.

But there is a catch… By my own tests and rough calculations, it takes 24 hours of rendering for every 45 seconds of audio you want desoloed (estimated on a quad-core 3.2 GHz processor). It’s also a fairly RAM-intensive process, with 5 minutes of audio filling up 4 GB of RAM. So you should know what you’re getting into and choose your audio tests wisely.
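To plan a test before committing a day of rendering, the figures above can be turned into a rough calculator. This is only a sketch: it assumes render time and RAM use both scale linearly with clip length, which I have not verified, and the function names are made up for illustration.

```python
# Rough planning numbers taken from the figures quoted above.
# Assumption: render time and RAM both grow linearly with clip length.
RENDER_HOURS_PER_AUDIO_SECOND = 24.0 / 45.0   # 24 hours per 45 seconds
RAM_GB_PER_AUDIO_MINUTE = 4.0 / 5.0           # 4 GB per 5 minutes


def estimate_render_hours(audio_seconds):
    """Estimated rendering time in hours for a clip of the given length."""
    return audio_seconds * RENDER_HOURS_PER_AUDIO_SECOND


def estimate_ram_gb(audio_seconds):
    """Estimated RAM use in GB for a clip of the given length."""
    return (audio_seconds / 60.0) * RAM_GB_PER_AUDIO_MINUTE


clip_seconds = 90  # a hypothetical 90-second excerpt
print("Render time: %.1f hours" % estimate_render_hours(clip_seconds))
print("RAM needed:  %.1f GB" % estimate_ram_gb(clip_seconds))
```

Swap in your own measurements for the two constants once you have timed a short clip on your machine.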

UPDATE (2013-11-25) – Since I wrote this post, I’ve processed around one hundred songs. Now a 4-minute song takes only 3 hours to render! That is a huge speed boost. The bottleneck was fixed by installing the MKL build of the NumPy library (MKL is Intel’s high-performance Math Kernel Library). Also, by installing the 64-bit versions of Python and the required libraries, you can use much more RAM and process longer songs. With 8 GB of RAM, the maximum song length is about 10 minutes.
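If you are not sure which Python and NumPy you ended up with, a short check like the one below can help. It is a sketch rather than an official diagnostic: `python_bitness` is a name I made up, and I am assuming that an MKL build mentions "mkl" somewhere in the output of `numpy.show_config()`.

```python
import struct
import sys


def python_bitness():
    """Return 32 or 64 depending on the running Python interpreter."""
    # The size of a C pointer in bytes, times 8 bits per byte.
    return struct.calcsize("P") * 8


print("Python %s, %d-bit" % (sys.version.split()[0], python_bitness()))

try:
    import numpy
except ImportError:
    print("NumPy is not installed for this interpreter.")
else:
    print("NumPy %s" % numpy.__version__)
    # Prints the math libraries NumPy was built against.
    # Assumption: an MKL build mentions "mkl" somewhere in this output.
    numpy.show_config()
```

Save it as a .py file and run it with the same `python` command you use for rendering, so that it reports on the interpreter that actually does the work.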

You’ll want to choose songs where the lead vocal or instrument stands out from the rest of the music. But some interesting things can happen when the algorithmically followed lead catches pitches from other instruments. So expect many happy accidents from this process, and maybe even try a song where you have no idea what it will extract. Only 44.1 kHz WAV files are allowed.
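Since a render can take hours, it is worth confirming the sample rate before you start. Here is a minimal sketch using Python's built-in `wave` module; the helper names are my own, and it assumes an ordinary uncompressed PCM WAV, which is what `wave` can read.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    reader = wave.open(path, "rb")
    try:
        rate = reader.getframerate()
        channels = reader.getnchannels()
        duration = reader.getnframes() / float(rate)
    finally:
        reader.close()
    return rate, channels, duration


def is_44100_hz(path):
    """True if the file's sample rate is 44.1 kHz."""
    return wav_info(path)[0] == 44100
```

For example, `is_44100_hz("name-of-your-audio-file.wav")` should come back `True` for a file that is ready to desolo. The duration is also handy for guessing how long the render will take.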

Some command-line experience would be very helpful. But if that is foreign to you, I’m going to try my best to outline the required steps below. There are no required settings to set up; you just need to point the Python script to the audio file you want to desolo.

I would like to thank Jean-Louis Durrieu for releasing his work to the open-source community. It is utterly fantastic and fascinating work. Bravo!




1. Install Python

2. Install these Python libraries – (make sure to download the MKL build of NumPy)

3. Now we need to add Python to the Windows Path variable. This lets us run Python from the command-line prompt without typing its full location every time.
Control Panel > System > Advanced > Environment Variables > System Variables > Path
Add this to the end of the line (put a semicolon before it if the line does not already end with one):     C:\Python27;
Click ‘OK’ and close all those windows.
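To double-check that the Path edit in step 3 took hold, open a new command prompt (windows that were already open will not see the change) and run something like the sketch below. `path_has_entry` is a made-up helper, and `C:\Python27` is simply the folder from step 3; change it if you installed Python somewhere else.

```python
import os


def path_has_entry(path_value, entry, sep=";"):
    """Check whether a PATH-style string contains the given folder.

    The comparison ignores case and trailing backslashes.
    """
    wanted = entry.rstrip("\\").lower()
    entries = [p.strip().rstrip("\\").lower() for p in path_value.split(sep)]
    return wanted in entries


# C:\Python27 is the folder named in step 3; adjust if yours differs.
print(path_has_entry(os.environ.get("PATH", ""), "C:\\Python27"))
```

Of course, if you can run this script with a bare `python` command at all, that is already a good sign.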




4. Download and unzip the separateLeadStereo Python scripts directly into a folder on your desktop called ‘desolo’. Make sure the scripts are not nested within another folder. Also drop the music that you wish to process into this folder (WAV at 44.1 kHz).

5. Open the start menu. In the start menu search box, type: ‘cmd’ and hit enter.
This is your command-line prompt, where all the rendering will be triggered from.

6. Now we need to navigate the cmd prompt to the desolo folder where the Python scripts and WAV are waiting for us.
Type: ‘cd desktop\desolo’ and hit enter.

7. Now for the last step: telling it to render the WAV!
Type: ‘python separateLeadStereoParam.py name-of-your-audio-file.wav’ and hit enter. (The first name is the main script from the download; the second is your audio file.)

8. And now it’s rendering! It may not look like it’s doing anything for a while, but have no fear and just wait. You will know it has finished rendering when it says ‘Done!’. There is no percentage meter. Just make sure not to close the cmd prompt until it’s done.
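Once a single render works, you may want to queue up a whole folder and walk away. The sketch below shows one way to do that from Python. Treat it as an untested outline: the script file name in `SCRIPT_NAME` is an assumption (check what the main script in your download is called), the helper names are mine, and the output suffixes are the ones the script prints in its console log.

```python
import glob
import os
import subprocess

# Assumption: the file name of the main script in the download.
# Check your 'desolo' folder and change this if yours differs.
SCRIPT_NAME = "separateLeadStereoParam.py"

# Suffixes of the files the script writes (as listed in its console log),
# so that a second run does not try to desolo its own results.
OUTPUT_SUFFIXES = ("_lead.wav", "_acc.wav", "_lead_VUIMM.wav", "_acc_VUIMM.wav")


def find_inputs(folder):
    """List the WAV files in folder that are not outputs of a previous run."""
    wavs = sorted(glob.glob(os.path.join(folder, "*.wav")))
    return [w for w in wavs if not w.endswith(OUTPUT_SUFFIXES)]


def build_command(wav_path, script=SCRIPT_NAME):
    """Build the same command you would type at the prompt."""
    return ["python", script, os.path.basename(wav_path)]


def desolo_folder(folder):
    """Render every input WAV in folder, one after another."""
    for wav_path in find_inputs(folder):
        print("Rendering %s ..." % wav_path)
        subprocess.call(build_command(wav_path), cwd=folder)
```

Calling `desolo_folder(".")` from a Python session started inside the desolo folder would render each song in turn. Songs are processed one at a time on purpose, since each render already uses a lot of RAM.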



I'm interested in aleatoric music that involves some type of random chance and automated computer processing. I'm just curious to see what a powerful machine will run with when given an algorithm. But I love collecting and listening to all genres of music on some good speakers. Most of my time is spent working as a Science Visualizer at the Charles Hayden Planetarium.


5 thoughts on “The Dream of Desoloing”

  1. Adam says:

    This is great! I’ve tried doing this before with less success. Does anyone know how to pull this off on a Mac? Josiah?

    • Jason says:

      Since I haven’t tried installing all the dependencies on my Mac yet, I could give it a go and write up a tutorial. Let me know how interested you are.

  2. Ben says:

    Python comes already installed on the Mac OS. However, installing new libraries means you’ll need to do some googling and dive into some terminal stuff. I think there is a tool called pip that is used.

    Another way to “desolo” is by using mid/side processing. There are plugins for your DAW that can isolate and pull out the center channel from everything else, which is usually right where the lead vocal is sitting. I’m not sure how similar (or not) this is to the separateLeadStereo algorithms, but it is possibly along the same lines. It can also be hit or miss and give some crazy results, all depending on how the song was mixed and where the instrumentation sits in the stereo field.
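For the curious, the mid/side idea is simple enough to sketch in a few lines. The mid signal is the average of left and right, and the side signal is half their difference, so anything mixed identically into both channels cancels out of the side. This is a toy illustration with made-up sample values, not what any particular plugin or the separateLeadStereo script does.

```python
def mid_side(left, right):
    """Split stereo samples into mid (center) and side (difference) signals."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


# Toy example: a "vocal" panned dead center plus a "guitar" only in the left.
vocal = [0.5, -0.5, 0.25]
guitar = [0.2, 0.1, -0.3]
left = [v + g for v, g in zip(vocal, guitar)]
right = list(vocal)

mid, side = mid_side(left, right)
# The side signal holds half of the guitar and none of the vocal.
print(side)
```

Real mixes rarely separate this cleanly, because reverb and stereo effects spread the vocal away from dead center.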

  3. Murthy says:

    It didn’t work for me, and I’m getting the following error. Could you please help me solve this?
    Writing the different following output files:
    separated lead in taane_taane_lead.wav
    separated accompaniment in taane_taane_acc.wav
    separated lead + unvoc in taane_taane_lead_VUIMM.wav
    separated acc – unvoc in taane_taane_acc_VUIMM.wav
    estimated pitches in taane_taane_pitches.txt
    Some parameter settings:
    Size of analysis windows: 2048
    Hopsize: 256.0
    Size of Fourier transforms: 2048
    Number of iterations to be done: 30
    Number of elements in WM: 40.0
    Traceback (most recent call last):
    File “”, line 1195, in
    File “”, line 633, in main
    window=sinebell(windowSizeInSamples), nfft=NFT)
    File “”, line 123, in stft
    data = np.concatenate((np.zeros(lengthWindow / 2.0),data))
    TypeError: ‘float’ object cannot be interpreted as an index

    • Jason says:

      Make sure you are using Python 2.7.6. I don’t think the latest version of Python will work correctly with the separateLeadStereo script.
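For anyone who cannot match that setup, the traceback hints at another possible workaround. It stops at `np.zeros(lengthWindow / 2.0)`, where a float is passed as an array size, and NumPy versions that insist on an integer there raise this kind of TypeError. Converting the value to an int first might be enough. This is a sketch of the idea only: I have not tested it against the actual script, other lines may need the same treatment, and `half_window` is a hypothetical helper name.

```python
def half_window(length_window):
    """Half the analysis window length, as an int usable as an array size."""
    return int(length_window) // 2


# The traceback's line
#     np.zeros(lengthWindow / 2.0)
# would then read
#     np.zeros(half_window(lengthWindow))
padding = [0.0] * half_window(2048)  # plain-list stand-in for np.zeros
print(len(padding))  # 1024
```

With the window size of 2048 from the log above, this gives 1024 samples of padding, the same amount the original line was asking for.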
