One of the ‘500 new features’ of Mango is Music Search, an option to have your WP7 handset listen to the music you’re hearing on the radio or in a store and tell you what it is. This is essentially a built-in service that will kill off the many third-party apps that do this, such as Shazam. In addition, the results link to the Zune Marketplace, allowing you to instantly buy the song and download it to your handset.
The Windows Phone Blog interviewed Elliot Kirk, who was responsible for testing the new feature, along with program manager Steve Cosman and lead programmer Houston Wong:
Q: Some apps in Marketplace can already do this—the identifying part, at least. How is Music search different?
Steve: Most other apps listen to a song for a fixed amount of time, and then analyze and try to match it. One of the things we do differently is we’re continuously listening and analyzing. As soon as we know what the song is, we return the result to you.
Houston: What this means is that you might actually get near instant results in the extreme case.
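To make the "continuously listening" idea concrete, here is a minimal Python sketch of an early-exit matcher. It is not Microsoft's implementation: the function name `identify_streaming`, the vote thresholds, and the toy index of fingerprint hashes are all assumptions made up for illustration. The point is only that votes are tallied after every chunk of audio, and the loop stops as soon as one track is clearly ahead rather than waiting for a fixed recording length.

```python
from collections import Counter


def identify_streaming(chunks, index, min_votes=3, min_lead=2):
    """Tally votes chunk by chunk and stop as soon as one track clearly leads.

    chunks: iterable of lists of fingerprint hashes, one list per slice of audio.
    index:  dict mapping a hash to the set of track titles that contain it.
    The thresholds are arbitrary values chosen for this sketch.
    """
    votes = Counter()
    heard = 0
    for chunk in chunks:
        heard += 1
        for h in chunk:
            # Every track containing this hash gets one vote.
            votes.update(index.get(h, ()))
        ranked = votes.most_common(2)
        if ranked:
            best, best_votes = ranked[0]
            runner_up = ranked[1][1] if len(ranked) > 1 else 0
            if best_votes >= min_votes and best_votes - runner_up >= min_lead:
                return best, heard  # confident enough: answer early
    return None, heard  # ran out of audio without a clear winner


# Hypothetical data: "h1" occurs in both tracks, so the first chunk is ambiguous.
index = {
    "h1": {"Track A", "Track B"},
    "h2": {"Track A"},
    "h3": {"Track A"},
    "h4": {"Track A"},
    "h5": {"Track B"},
}
chunks = [["h1"], ["h2", "h3"], ["h4"], ["h1"], ["h2"]]

print(identify_streaming(chunks, index))
```

With this toy data the matcher answers after the second of five chunks. A fixed-duration approach would have processed all five before replying.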
Q: That’s cool. How does Music search work?
Steve: We’re using the microphone to record and then doing something called “fingerprinting,” where we look for unique acoustic features of the music. We listen for about 3 seconds, create a fingerprint, and then we send that fingerprint to Bing, which looks for a match in the Zune music catalog.
Q: You don’t transmit the actual audio?
Steve: No, we don’t—which means we’re using less of your data plan. And it’s generally quicker to use fingerprints since we’re matching against a very large data set of music: millions and millions of tracks.
Q: So someone has already scanned all the tunes in the Zune catalog and created a library of digital fingerprints for each song?
Steve: Yep, exactly. At that point it’s pretty much just a straight up search. We look at the fingerprint we’ve created on the phone, compare it to the millions of fingerprints generated from tracks in the Zune music catalog, and see what matches.
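The interview does not describe the actual Bing algorithm, so the following Python sketch only illustrates the general shape of fingerprint lookup under loudly stated assumptions. The "acoustic feature" is simply the loudest of four probe frequencies in each short frame, the catalog is three synthetic tone sequences, and every name and constant (`fingerprint`, `build_index`, `identify`, `BANDS`, `WINDOW`) is hypothetical. What it does show is the split Steve describes: fingerprints for the catalog are computed ahead of time, and a clip from the "microphone" is reduced to the same kind of fingerprint and looked up.

```python
import math
import random
from collections import Counter, defaultdict

RATE = 8000                     # samples per second (assumed)
FRAME = 200                     # samples per analysis frame (assumed)
BANDS = [400, 600, 800, 1000]   # frequencies probed in each frame, in Hz (assumed)
WINDOW = 6                      # consecutive frames combined into one hash (assumed)


def synth(notes, frames_per_note=4):
    """Render a toy 'track': one pure tone per note."""
    samples = []
    for freq in notes:
        for i in range(FRAME * frames_per_note):
            samples.append(math.sin(2 * math.pi * freq * i / RATE))
    return samples


def band_power(frame, freq):
    """Power of one frequency within one frame (a single DFT bin)."""
    re = sum(s * math.cos(2 * math.pi * freq * i / RATE) for i, s in enumerate(frame))
    im = sum(s * math.sin(2 * math.pi * freq * i / RATE) for i, s in enumerate(frame))
    return re * re + im * im


def fingerprint(samples):
    """Hashes made of the loudest band in each of WINDOW consecutive frames."""
    peaks = []
    for start in range(0, len(samples) - FRAME + 1, FRAME):
        frame = samples[start:start + FRAME]
        peaks.append(max(BANDS, key=lambda f: band_power(frame, f)))
    return [tuple(peaks[i:i + WINDOW]) for i in range(len(peaks) - WINDOW + 1)]


def build_index(catalog):
    """Precompute fingerprints for every track: hash -> set of titles."""
    index = defaultdict(set)
    for title, samples in catalog.items():
        for h in fingerprint(samples):
            index[h].add(title)
    return index


def identify(index, samples):
    """Fingerprint the clip and return the title sharing the most hashes."""
    votes = Counter()
    for h in fingerprint(samples):
        votes.update(index.get(h, ()))
    return votes.most_common(1)[0][0] if votes else None


catalog = {
    "Track A": synth([400, 600, 800, 1000, 800, 600, 400, 1000]),
    "Track B": synth([1000, 800, 1000, 600, 400, 400, 800, 600]),
    "Track C": synth([600, 600, 400, 1000, 1000, 800, 400, 800]),
}
index = build_index(catalog)

# Simulated microphone clip: the middle of Track B plus a little noise.
rng = random.Random(0)
clip = [s + rng.gauss(0, 0.05) for s in catalog["Track B"][8 * FRAME:24 * FRAME]]

print(identify(index, clip))
```

Only the short list of hashes would need to leave the phone in a design like this, not the recorded samples, which fits the data-plan point made earlier in the interview. A real system would need far more robust features than a single loudest band per frame.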
Q: How was Bing involved?
Elliot: This started as a Bing research project. They developed the fingerprinting algorithm. The Bing team has been amazing. It’s been a great experience working with them.
Q: Are some kinds of songs more challenging to match than others—say, all those covers of “Louie Louie” or samples of original songs embedded in other songs, like you find in hip hop?
Steve: It’s pretty interesting how we pick the right track. We’re working on that still. One problem is when you get, for example, the German karaoke version of Britney Spears’ “Toxic” instead of the multi-platinum U.S. album version. There’s also the situation where the identical hit song is on 25 different albums. How do you figure out which one to return? That’s another problem we deal with.
Elliot: But the fingerprint is actually getting good enough that we can identify the album version of a song from the live version—as long as that album is in the database.
Steve: No one plays a song the exact same way twice. Even your ear can’t detect the differences we can.
Coming in Mango: Tap the music note icon in Bing to start a new Music search.
Q: Are there other situations Music search finds challenging?
Steve: It’s fine if there are voices talking over the music. But if you sing along with a song, you might screw up the detection process.
Elliot: When you sing over a song you actually alter the timing of it, so the fingerprint we create doesn’t quite match the original.
Q: Sounds like this feature must have really been a challenge to test.
Steve [gesturing to Elliot]: His test stories are the best. There was this point where Elliot for about a week was running around to everybody’s office to find who had the quietest office for our tests.
Elliot: We wanted to know what the lowest and highest sound levels we could detect were. I also came in late on a Saturday evening hoping no one else was around to find out how loud we could go. It turns out to be around 120 decibels, which is almost as loud as a jet engine. I had to upgrade from PC speakers to a full 110-watt receiver with surround sound. I had my earplugs in and my fingers over my ears, and I just cranked it. That was fun. You could hear it all the way across the building.
Q: What song did you use?
Elliot: I think it was Britney Spears’ “Toxic”. It was painful.
Steve: I wouldn’t have admitted that if I were you.