How does Shazam work?


   Have you ever been in a situation where a random track starts playing on the radio, but you have no clue what the song is called? I can’t even imagine how many times that has happened to me, or how many of my favorite songs I wouldn’t have known if the Shazam application didn’t exist. That made me wonder how it is possible that, with one tap, the app can compare a short recording against millions of songs and find the one you were looking for. In 2003, Avery Li-Chun Wang, a cofounder of Shazam, published a research paper in which he explained how it works. In a nutshell, audio fingerprinting is the answer.


   According to Wang’s research, each audio file is fingerprinted. This is a process in which reproducible hash tokens are extracted from the audio. The whole process was analyzed by Trey Cooper, a professional musician and web developer. The fingerprint is derived from a spectrogram, which plots the frequencies present in the audio against time. Cooper describes the rest of the process as follows:

   “For each section of an audio file, the strongest peaks are chosen and the spectrogram is reduced to a scatter plot. Through a process called combinatorial hashing, points on the scatter plot are chosen to be anchors that are linked to other points on the plot that occur after the anchor point during a window of time and frequency known as a target zone. Each anchor-point pair is stored in a table containing the frequency of the anchor, the frequency of the point, and the time between the anchor and the point known as a hash. This data is then linked to a table that contains the time between the anchor and the beginning of the audio file. Files in the database also have unique IDs that are used to retrieve more information about the file such as the song’s title and the artist’s name. Once we have all of the possible matches for the Shazam user’s recording, we need to find the time offset between the beginning of the Shazam user’s recording and the beginning of one of these possible matches from the database. This offset in timing can be calculated by subtracting the time of the anchor-point pair’s occurrence in the Shazam user’s recording from the matching hash’s time of occurrence in the audio file from Shazam’s database. If a significant amount of matching hashes have the same time offset, that song is determined to be a match”.

   In conclusion, the clip you record with your phone is fingerprinted in the same way as the songs in the database. Its hashes are then compared against the hashes of all the files in the database, and if enough of them match with the same time offset, the song’s title and the artist’s name are shown to you.
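   To make the quoted steps more concrete, here is a minimal Python sketch of combinatorial hashing and time-offset matching. It is only an illustration under several assumptions, not Shazam’s actual implementation: it skips the spectrogram and peak-picking stage and starts from ready-made (time, frequency) peaks, and the function names (make_hashes, build_index, best_match), the target-zone parameters (fan_out, max_dt, max_df) and the randomly generated "songs" are all invented for this example.

```python
from collections import Counter, defaultdict
import random


def make_hashes(peaks, fan_out=5, max_dt=20, max_df=200):
    """Pair each anchor peak with a few later peaks inside its target zone.

    peaks: list of (time, frequency) integer tuples (the "scatter plot").
    Yields ((anchor_freq, point_freq, time_delta), anchor_time).
    The zone limits are hypothetical values chosen for this sketch.
    """
    peaks = sorted(peaks)
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > max_dt:
                break  # peaks are time-sorted, so no later peak can qualify
            if dt > 0 and abs(f2 - f1) <= max_df:
                yield (f1, f2, dt), t1
                paired += 1
                if paired == fan_out:
                    break


def build_index(songs):
    """Map every hash to the (song_id, anchor_time) pairs where it occurs."""
    index = defaultdict(list)
    for song_id, peaks in songs.items():
        for h, t_anchor in make_hashes(peaks):
            index[h].append((song_id, t_anchor))
    return index


def best_match(index, sample_peaks):
    """Vote on (song_id, time_offset); many equal offsets indicate a match."""
    votes = Counter()
    for h, t_sample in make_hashes(sample_peaks):
        for song_id, t_song in index.get(h, ()):
            # Offset = time in the database file minus time in the recording.
            votes[(song_id, t_song - t_sample)] += 1
    if not votes:
        return None
    (song_id, offset), count = votes.most_common(1)[0]
    return song_id, offset, count


# Toy data: random peaks stand in for real spectrogram peaks.
rng = random.Random(0)


def random_peaks(n=300, duration=600, max_freq=1000):
    return list({(rng.randrange(duration), rng.randrange(max_freq))
                 for _ in range(n)})


songs = {"song_a": random_peaks(), "song_b": random_peaks()}
index = build_index(songs)

# Simulate a recording: the part of song_a between t=200 and t=300,
# with its clock restarted at zero.
clip = [(t - 200, f) for t, f in songs["song_a"] if 200 <= t < 300]
print(best_match(index, clip))  # (song_id, offset, number_of_votes)
```

   In this toy run the clip’s hashes all agree on one offset for song_a, which is the "same time offset" criterion from the quote. A real system would also have to cope with noise, which removes or adds peaks, so only some of the hashes would line up.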

Sources:

https://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf

https://medium.com/@treycoopermusic/how-shazam-works-d97135fb4582

