Community, Engineering

The ShazamKit API – Potential Uses and Possibilities

Sep 06, 2021

Introduction to the Shazam App

Shazam as an application has been around for a while. Most of us have had the music-recognizing app in our pockets for the better part of the last decade. When Apple purchased Shazam Entertainment in 2018, a lot of people worried that the application would be taken off the App Store, but Apple kept it available to all. They got to work in the background, integrating its functionality into their operating systems.

At WWDC 2021, Apple, among other things, announced the iOS 15 SDK. This SDK contains ShazamKit, a new API to communicate between your apps and Shazam’s vast database of song samples. In this article we’ll go over just how to use this API to implement music recognition functionality in your apps and some potential use cases for this technology.  


The new ShazamKit framework is only available in Xcode 13, which is available only as a beta at the time of writing.

Getting Started 

Open a new Xcode project and select a SwiftUI lifecycle application. Name it anything you wish, then copy the app's bundle ID.

Creating Entitlements 

Head over to the Certificates, Identifiers & Profiles section of the Apple Developer portal and log in with your developer account. Click Identifiers in the side column and search for your app's bundle ID in the list. If your app's bundle ID isn't there, click the plus symbol and register a new App ID. Now, click on App Services and enable ShazamKit for your app.


It is finally time to start coding. Create a new Swift file and name it ContentViewModel.swift.  

This will be our ViewModel (we’ll use an MVVM architecture) and will be responsible for the audio processing and will handle callbacks if necessary.  

Create a class called ContentViewModel and let it conform to NSObject and ObservableObject 

class ContentViewModel: NSObject, ObservableObject { } 

We make it conform to NSObject because we'll later extend this class to conform to a ShazamKit delegate protocol, and framework delegate protocols like this one require the class to inherit from NSObject. Conforming the class to ObservableObject allows our View to subscribe to it and listen for changes in certain class variables, which will then trigger a reload of the view with the new data.

Creating ShazamMedia Object 

Here, we’ll create a struct that will contain the data that will be extracted from the delegate callback (the delegate will be discussed very soon). Create a struct in the same file as the class, but outside the class. Call it ShazamMedia and include these properties:  

struct ShazamMedia: Decodable {
    let title: String?
    let subtitle: String?
    let artistName: String?
    let albumArtURL: URL?
    let genres: [String]
}

These properties will match the data types that will be returned to us by the delegate. 

With the creation of the custom object, we can return to building out the ViewModel. 

Add two @Published properties to the class 

@Published var shazamMedia = ShazamMedia(title: "Title...",
                                         subtitle: "Subtitle...",
                                         artistName: "Artist Name...",
                                         albumArtURL: URL(string: ""),
                                         genres: ["Pop"])

@Published var isRecording = false 

A property marked with the @Published property wrapper, inside a class conforming to ObservableObject, causes any view subscribed to that class to reload itself with new data. Therefore, any time the published variables change value, the view reloads.

Next, create these properties in the ViewModel class: 

private let audioEngine = AVAudioEngine() 

private let session = SHSession() 

private let signatureGenerator = SHSignatureGenerator() 

audioEngine: An AVAudioEngine contains a group of connected AVAudioNodes (“nodes”), each of which performs an audio signal generation, processing, or input/output task. 

session: This is part of the new ShazamKit framework. SHSession is the Shazam session we'll use to send our audio matching requests to the server.

signatureGenerator: SHSignatureGenerator provides a way to convert audio data into instances of SHSignature, the form of data the server requires for processing. Shazam will not process raw audio buffer data.
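As a brief illustration of how these types relate (the buffer variable here is hypothetical; the real tap code comes later), audio buffers are appended to the generator, which then produces a signature the session can match:

```swift
// Illustrative only: `buffer` stands in for an AVAudioPCMBuffer
// captured from a microphone tap.
try signatureGenerator.append(buffer, at: nil)

// Produce an SHSignature from the accumulated audio...
let signature = signatureGenerator.signature()

// ...and ask the session to match it against Shazam's catalog.
session.match(signature)
```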


Since we are conforming ContentViewModel to NSObject we’ll override the default init and call super. Also in this init method, we’ll set the delegate of the session object to self. 

override init() {
    super.init()
    session.delegate = self
}

Conforming to SHSessionDelegate 

Now that we’ve set the delegate of the session object to self, we have to conform the class to the SHSessionDelegate. Below the class, create an extension to the class that conforms the class to SHSessionDelegate. 

extension ContentViewModel: SHSessionDelegate { } 

The SHSessionDelegate protocol declares two optional methods: session(_:didFind:) and session(_:didNotFindMatchFor:error:).

These methods do exactly what their names suggest. For this tutorial, we'll use the didFind method to retrieve the song's metadata once Shazam identifies it. In the extension, add

func session(_ session: SHSession, didFind match: SHMatch) { } 

The function provides us two values: session and match. We’re interested in the match value which contains the actual media items. In this function, first, fetch the mediaItems property of the match item. 

let mediaItems = match.mediaItems 

mediaItems contains an array of values, since Shazam can find multiple matches for the same song signature. For this tutorial, we'll only consider the first item in the list of mediaItems.

if let firstItem = mediaItems.first {
    let _shazamMedia = ShazamMedia(title: firstItem.title,
                                   subtitle: firstItem.subtitle,
                                   artistName: firstItem.artist,
                                   albumArtURL: firstItem.artworkURL,
                                   genres: firstItem.genres)
    DispatchQueue.main.async {
        self.shazamMedia = _shazamMedia
    }
}

Let's quickly go through what's happening here. We access the first element in the list of mediaItems using if let, since fetching the first item of an array returns an optional (the array can be empty). Once we have the first element, we create a new instance of the custom ShazamMedia object we defined earlier and assign it to the @Published property of the class. This has to happen on the main thread: as mentioned, changing a @Published property makes SwiftUI reload any views subscribed to the class, and all UI changes must occur on the main thread. Hence, we assign the new value inside DispatchQueue.main.async.

Configuring the microphone 

Info.plist key 

Since access to the microphone is a privacy-sensitive feature, we first have to add a new key to the project's Info.plist file. Add the Privacy – Microphone Usage Description key and set its value to the string you'd like to show the user when asking for permission to access the device's microphone.
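In source form, that Xcode entry corresponds to the NSMicrophoneUsageDescription key (the description string below is just an example; use any message appropriate for your app):

```xml
<key>NSMicrophoneUsageDescription</key>
<string>We use the microphone to listen for music and identify songs.</string>
```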

Back to ContentViewModel 

Create a new function called startOrEndListening() that will be called every time a button (which we'll configure later in the view) is tapped. This function is responsible for starting the audio engine if it isn't already running, or stopping it if it is. To add this initial check, type this code inside the function.

guard !audioEngine.isRunning else {
    audioEngine.stop()
    DispatchQueue.main.async {
        self.isRecording = false
    }
    return
}
We've created a guard clause that only proceeds into the rest of the function if the audioEngine wasn't already running. If it was, the function stops the engine and sets the isRecording property (marked with @Published) to false, indicating the microphone isn't listening and no sound-signature processing is taking place.

Now, paste this code below the end of the guard statement. 
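The listing itself is not reproduced here, so below is a minimal sketch of what that code might look like. It assumes the standard AVAudioSession/AVAudioEngine APIs and matches the streaming buffer directly on the session via matchStreamingBuffer(_:at:), one alternative to building an SHSignature by hand with the signature generator; the buffer size is an illustrative choice.

```swift
// Activate the shared audio session for recording.
let audioSession = AVAudioSession.sharedInstance()
try? audioSession.setCategory(.record)
try? audioSession.setActive(true, options: .notifyOthersOnDeactivation)

// Grab the input node and its native recording format.
let inputNode = audioEngine.inputNode
let recordingFormat = inputNode.outputFormat(forBus: 0)

// Install a tap: each PCM buffer it delivers is streamed to the
// Shazam session for continuous matching.
inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { [weak self] buffer, time in
    self?.session.matchStreamingBuffer(buffer, at: time)
}

// Starting the engine can throw, so wrap it in do-catch.
audioEngine.prepare()
do {
    try audioEngine.start()
} catch {
    assertionFailure(error.localizedDescription)
}

// Flip the published flag on the main thread so the UI updates.
DispatchQueue.main.async {
    self.isRecording = true
}
```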

All this code does is set the shared audio session to an active state and create an input node and a recording format. Once these exist, a tap is installed on the inputNode. The tap's closure hands us a PCM buffer that we feed into the matching pipeline, so the session can continuously try to identify the audio.

Outside of this closure, we prepare the audioEngine before starting it. Since starting the engine can throw an error, we wrap the call in a do-catch block and handle errors by asserting with the error's description.

Once the engine starts successfully, we set the class variable isRecording to true to indicate the app is listening to audio and continuously matching the buffer against Shazam's database.

You can then create a view and simply call viewModel.startOrEndListening() in a button's tap action.
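For completeness, here is a minimal sketch of such a view; the layout and labels are illustrative, not part of the original tutorial:

```swift
import SwiftUI

struct ContentView: View {
    // Own the view model so @Published changes redraw this view.
    @StateObject private var viewModel = ContentViewModel()

    var body: some View {
        VStack(spacing: 16) {
            Text(viewModel.shazamMedia.title ?? "Unknown title")
                .font(.headline)
            Text(viewModel.shazamMedia.artistName ?? "Unknown artist")
                .font(.subheadline)

            // Toggle listening; the label tracks the published flag.
            Button(viewModel.isRecording ? "Stop Listening" : "Start Listening") {
                viewModel.startOrEndListening()
            }
            .buttonStyle(.borderedProminent)
        }
        .padding()
    }
}
```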

I do hope this short tutorial of the ShazamKit API has been helpful.  

