Multi-key DRM: A balance between security and accessibility

This article gives a brief technical overview of how multi-key DRM works. Multi-key DRM is a content security mechanism that Vidio also implements.

Multi-key DRM is a security method for media streaming that involves encrypting separate video track groups (such as HD and SD) with different DRM keys and security levels. This approach enhances flexibility by allowing content to be played on a wider range of devices while maintaining varying levels of protection.

Without multi-key DRM, content providers must choose a single security level for all video tracks, which either limits compatibility or compromises security. For instance, encrypting content at the highest security levels of two DRM systems might prevent playback on devices that don't support those levels. Conversely, using the lowest security levels of both systems ensures wider compatibility but exposes the content to greater piracy risk.

The dilemma of using traditional single-key DRM

It is always in a content provider’s best interest to provide both the best possible viewing experience and the best possible security to its users. A common practice to achieve the latter is to incorporate DRM into premium video content as a protection mechanism against piracy or other illegal distribution activities.

DRM secures video content by encrypting its media segments and signaling that protection in the manifest, using one or more DRM systems such as Widevine, PlayReady, FairPlay, or ClearKey. Some systems, such as Widevine and PlayReady, offer multiple security levels for content providers to choose from. For instance, Widevine provides three levels: L1 offers the highest security but is compatible with fewer devices, while L3 offers the lowest security but the widest device support. Widevine L3 uses software-based protection, suitable for devices without Widevine-certified hardware, whereas L1 requires hardware-level security, with decryption handled inside the trusted environment of a certified CPU.

Traditionally, DRM employed a single-key mechanism, limiting content providers to a single security level. This presented a challenge in balancing protection and accessibility. As a solution, multi-key DRM was developed.

Multi-key DRM enables content providers to apply multiple security layers from different DRM systems and levels, rather than relying on a single layer. Because modern content often includes multiple resolutions (e.g., HD and SD), multi-key DRM can encrypt each resolution category with a distinct key and security level. This resolves the previous dilemma: the highest security level protects the premium version (HD) for users whose devices support it, while a lower security level safeguards the alternative version (SD) for a broader audience. As a result, content providers can effectively balance security and accessibility.

The structure of multi-key DRM content

At the time of writing, multi-key DRM was available only for DASH streaming with the Widevine and PlayReady systems. DASH streams use a playlist manifest format called MPD (Media Presentation Description). This XML document, which follows its own schema, describes the DASH segments, including their timing, URLs, and media characteristics such as video resolution and bitrate, and client players read it to fetch and play the stream. Because DASH is an adaptive streaming technology, it offers multiple video resolutions and lets the client select the best option for the user's available bandwidth. Within the MPD manifest, each resolution is represented by a tag called Representation.

Figure 1. (Left) An example of the top-level organization of an MPD manifest for content protected by single-key DRM and (Right) by multi-key DRM

The Representation tag is not used exclusively for video; it also describes other streamable media components, such as audio or subtitles. An Adaptation Set groups Representations that are interchangeable versions of the same media component, for example the same video encoded at several resolutions and bitrates. Additional Adaptation Sets can be included for other media components, like subtitles or audio descriptions. For instance, one Adaptation Set might contain the primary video component, while another holds the primary audio component. Alternatively, there could be separate Adaptation Sets for HD and SD video components, along with a dedicated set for subtitles.
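To make the Adaptation Set and Representation structure concrete, here is a minimal sketch that parses a trimmed, hypothetical multi-key MPD with Python's standard `xml.etree.ElementTree`. It lists the Representation heights in each video Adaptation Set, together with the key ID carried by that set's `ContentProtection` element. The manifest, its IDs, and the KID values are invented for illustration. The sketch also assumes video sets are marked with `contentType="video"`, whereas real packager output may rely on `mimeType` and carry many more elements.

```python
import xml.etree.ElementTree as ET

# Namespaces used by DASH MPDs and by the Common Encryption (CENC) extension.
NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011", "cenc": "urn:mpeg:cenc:2013"}

# Trimmed, hypothetical multi-key MPD: one video AdaptationSet per key.
SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" xmlns:cenc="urn:mpeg:cenc:2013">
  <Period>
    <AdaptationSet id="0" contentType="video" maxWidth="854" maxHeight="480">
      <ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011" value="cenc"
          cenc:default_KID="11111111-1111-1111-1111-111111111111"/>
      <Representation id="v240" width="426" height="240" bandwidth="400000"/>
      <Representation id="v480" width="854" height="480" bandwidth="1200000"/>
    </AdaptationSet>
    <AdaptationSet id="1" contentType="video" maxWidth="1920" maxHeight="1080">
      <ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011" value="cenc"
          cenc:default_KID="22222222-2222-2222-2222-222222222222"/>
      <Representation id="v720" width="1280" height="720" bandwidth="3000000"/>
      <Representation id="v1080" width="1920" height="1080" bandwidth="6000000"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def describe_video_sets(mpd_xml: str) -> list:
    """Summarize each video AdaptationSet: its id, default KID, and track heights."""
    root = ET.fromstring(mpd_xml)
    summaries = []
    for aset in root.iterfind("mpd:Period/mpd:AdaptationSet", NS):
        if aset.get("contentType") != "video":
            continue
        protection = aset.find("mpd:ContentProtection", NS)
        kid = None
        if protection is not None:
            # Namespaced attributes are keyed as "{namespace-uri}name" in ElementTree.
            kid = protection.get(f"{{{NS['cenc']}}}default_KID")
        heights = [int(r.get("height")) for r in aset.iterfind("mpd:Representation", NS)]
        summaries.append({"id": aset.get("id"), "kid": kid, "heights": heights})
    return summaries


if __name__ == "__main__":
    for summary in describe_video_sets(SAMPLE_MPD):
        print(summary)
```

In a single-key manifest, the same function would return one video set whose heights span every resolution. In this multi-key layout, it returns two sets, each bound to its own KID.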

Typically, standard media content includes a single Adaptation Set for video components that covers all available resolutions. Standard media content protected by single-key DRM also includes one or more ContentProtection tags. These tags specify the DRM system used to secure the content and contain a PSSH (Protection System Specific Header) tag. PSSH is a standardized container holding metadata for a specific protection system. It does not contain the encryption key itself, which stays secret. Instead, it carries essential encryption information such as the key ID, the encryption scheme for some DRM systems, and the system-specific data the client needs to obtain the key from a license server. Within MPD files, the PSSH is encoded as a Base64 string.

Figure 2. The ContentProtection and PSSH within the MPD file
Figure 3. A sample of decoded information contained inside a PSSH value
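The PSSH box layout is fixed by the Common Encryption standard (ISO/IEC 23001-7). As a result, the Base64 value in the manifest can be decoded into the kind of information shown in Figure 3 with a few lines of Python. The sketch below reads the header fields: the version, the System ID, the key IDs that version 1 boxes list in the clear, and the opaque system-specific data. Decoding that data, such as the Widevine or PlayReady payload, is out of scope. The `build_pssh_v1` helper and the KIDs are hypothetical and exist only to produce a sample box.

```python
import base64
import struct
import uuid

# Well-known DRM System IDs from the DASH-IF identifier registry.
SYSTEM_IDS = {
    uuid.UUID("edef8ba9-79d6-4ace-a3c8-27dcd51d21ed"): "Widevine",
    uuid.UUID("9a04f079-9840-4286-ab92-e65be0885f95"): "PlayReady",
}


def parse_pssh(b64_value: str) -> dict:
    """Decode a Base64 PSSH box (as found in <cenc:pssh>) into its fields."""
    box = base64.b64decode(b64_value)
    size, box_type = struct.unpack(">I4s", box[:8])
    if box_type != b"pssh" or size != len(box):
        raise ValueError("not a single, complete PSSH box")
    version = box[8]  # the next 3 bytes are flags, unused here
    system_id = uuid.UUID(bytes=box[12:28])
    offset = 28
    kids = []
    if version > 0:  # version 1 boxes list the key IDs in the clear
        (kid_count,) = struct.unpack(">I", box[offset:offset + 4])
        offset += 4
        for _ in range(kid_count):
            kids.append(uuid.UUID(bytes=box[offset:offset + 16]))
            offset += 16
    (data_size,) = struct.unpack(">I", box[offset:offset + 4])
    offset += 4
    return {
        "version": version,
        "system": SYSTEM_IDS.get(system_id, str(system_id)),
        "kids": [str(k) for k in kids],
        "data": box[offset:offset + data_size],  # DRM-specific payload, left opaque
    }


def build_pssh_v1(system_id: uuid.UUID, kids: list, data: bytes = b"") -> str:
    """Hypothetical demo helper: build a version 1 PSSH box, Base64-encoded."""
    body = bytes([1, 0, 0, 0]) + system_id.bytes  # version 1, flags 0
    body += struct.pack(">I", len(kids)) + b"".join(k.bytes for k in kids)
    body += struct.pack(">I", len(data)) + data
    box = struct.pack(">I4s", 8 + len(body), b"pssh") + body
    return base64.b64encode(box).decode("ascii")


if __name__ == "__main__":
    widevine = uuid.UUID("edef8ba9-79d6-4ace-a3c8-27dcd51d21ed")
    sd_kid = uuid.UUID("11111111-1111-1111-1111-111111111111")
    hd_kid = uuid.UUID("22222222-2222-2222-2222-222222222222")
    print(parse_pssh(build_pssh_v1(widevine, [sd_kid, hd_kid])))
```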

DRM is applied during the packaging process. Packaging transforms encoded video into segmented chunks and a manifest playlist for efficient HTTP-based delivery over the internet. This process uses a tool called a Packager. Shaka Packager, an open-source tool developed by Google, is a widely used example.

When applying DRM protection, a Packager encrypts the content and embeds metadata in a container called a PSSH box. Multiple DRM systems can be applied to a single video, each with its own PSSH box containing system-specific data, such as the PlayReady Header for PlayReady-protected content.

The media itself is encrypted with symmetric encryption (AES, as defined by the Common Encryption standard), meaning the same content key both encrypts and decrypts it. The DRM system uses asymmetric cryptography only to deliver that key securely to the device. Packagers typically don't generate encryption keys themselves. Instead, they obtain the keys, along with the PSSH boxes, from a secure Key Service, a backend service that provides cryptographic material to the packaging process. The way the Key Service returns keys and PSSH boxes is defined by a key acquisition protocol, such as AWS SPEKE or the Widevine Common Encryption API.

When a player encounters encrypted content, it extracts the PSSH box and uses its information to request the decryption key from another backend service called the License Server, which is explained in a later section.

The creation of multi-key DRM manifest

As previously mentioned, DRM-enabled manifest creation occurs during the packaging process, the final stage of the transcoding workflow. The Packager receives the fully transcoded video stream as input. Before packaging starts, the information the Key Service needs to generate keys must be provided in a standardized format. CPIX (Content Protection Information eXchange Format), developed by DASH-IF, is a popular choice. This open specification facilitates the exchange of content protection information among the different systems in a video streaming environment. CPIX offers a common structure for representing DRM-related data such as keys, DRM system signaling, and encryption settings, which streamlines workflows and minimizes errors. A CPIX document is an XML file containing elements for DRM system information, Key Identifiers (KIDs), and the license acquisition URL.

<?xml version="1.0" encoding="utf-8"?>
<CPIX xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns="urn:dashif:org:cpix"
  xmlns:ds="http://www.w3.org/2000/09/xmldsig#"
  xmlns:enc="http://www.w3.org/2001/04/xmlenc#"
  xmlns:pskc="urn:ietf:params:xml:ns:keyprov:pskc">
  <ContentKeyList>
    <ContentKey 
      kid="abcd1234-ef56-gh78-ij90-qwer0987asdf" 
      explicitIV="tmbbKNEhVytcjpIppDYXGq==">
      <Data>
        <pskc:Secret>
          <pskc:PlainValue>sfHHMaldQGZYfLdxEhUVRO==</pskc:PlainValue>
        </pskc:Secret>
      </Data>
    </ContentKey>
    <ContentKey 
      kid="fdsa7890-09ji-87hg-65fe-4321dcbarewq" 
      explicitIV="EIQBqDFIAQKwPZUZkfzbRu==">
      <Data>
        <pskc:Secret>
          <pskc:PlainValue>nMXPykwzARsyTBNpFWWoEL==</pskc:PlainValue>
        </pskc:Secret>
      </Data>
    </ContentKey>
  </ContentKeyList>
  ... 
</CPIX>
Code 1. An example of a simple CPIX document
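Ultimately, the packaging job needs each KID paired with its content key. The sketch below extracts those pairs, assuming a CPIX response shaped like Code 1 in which keys arrive as `pskc:PlainValue`. Production Key Services usually encrypt the keys in transit, and this sketch does not handle that case. The sample document, its KID, and its key and IV values are hypothetical placeholders.

```python
import base64
import xml.etree.ElementTree as ET

NS = {
    "cpix": "urn:dashif:org:cpix",
    "pskc": "urn:ietf:params:xml:ns:keyprov:pskc",
}

# Hypothetical response with one plaintext key (placeholder values).
SAMPLE_CPIX = """<CPIX xmlns="urn:dashif:org:cpix"
    xmlns:pskc="urn:ietf:params:xml:ns:keyprov:pskc">
  <ContentKeyList>
    <ContentKey kid="11111111-1111-1111-1111-111111111111"
        explicitIV="AAAAAAAAAAAAAAAAAAAAAA==">
      <Data><pskc:Secret>
        <pskc:PlainValue>AAECAwQFBgcICQoLDA0ODw==</pskc:PlainValue>
      </pskc:Secret></Data>
    </ContentKey>
  </ContentKeyList>
</CPIX>"""


def extract_content_keys(cpix_xml: str) -> dict:
    """Map each KID to its content key and IV, decoded from Base64 to hex."""
    root = ET.fromstring(cpix_xml)
    keys = {}
    for ck in root.iterfind("cpix:ContentKeyList/cpix:ContentKey", NS):
        plain = ck.find("cpix:Data/pskc:Secret/pskc:PlainValue", NS)
        if plain is None:
            # Encrypted keys (pskc:EncryptedValue) would need the document key.
            raise ValueError(f"no plaintext key for KID {ck.get('kid')}")
        iv = ck.get("explicitIV")
        keys[ck.get("kid")] = {
            "key": base64.b64decode(plain.text.strip()).hex(),
            "iv": base64.b64decode(iv).hex() if iv else None,
        }
    return keys


if __name__ == "__main__":
    print(extract_content_keys(SAMPLE_CPIX))
```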

The initial step is specifying the desired DRM system(s). Typically, multiple systems are employed (multi-DRM, not to be confused with multi-key DRM), as opposed to a single system (single-DRM). In a multi-key DRM setup, separate DRM system information is required for each video track group (HD and SD).

The second crucial piece of information is the Key Identifier (KID), a unique identifier linked to a specific encryption key. Devices use the KID to determine which decryption key to apply, which simplifies key management. The KID is generated before the transcoding process. In Common Encryption, a KID is a 16-byte (128-bit) value, usually written as a UUID. Beyond that, how it is generated is up to the content provider, as long as it is unique for each DRM application process.
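A KID only has to be a unique 16-byte value, so a common approach is to generate a random UUID for each track group. The sketch below illustrates this. The `new_kids` function is hypothetical, and the `SD`/`HD` labels mirror this article. The sketch also shows the two textual forms a KID usually takes: the hyphenated UUID used in CPIX documents and in the MPD's `cenc:default_KID`, and the 32-digit hex string that packagers such as Shaka Packager accept on the command line.

```python
import uuid


def new_kids(labels=("SD", "HD")) -> dict:
    """Generate one random 128-bit key ID per track-group label."""
    return {label: uuid.uuid4() for label in labels}


if __name__ == "__main__":
    for label, kid in new_kids().items():
        # str(kid): hyphenated UUID form; kid.hex: 32 hex digits, no hyphens.
        print(label, str(kid), kid.hex)
```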

The final input is the license acquisition URL, which is embedded in the PSSH so that clients know where to send license requests. This URL is optional when the Key Service and License Server are provided by the same company (which is common), because the necessary information is then included automatically in the returned data.
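These inputs can be assembled into a CPIX request with the standard library. The sketch below is an illustration under assumptions, not Vidio's actual request. It lists the two KIDs without key values, since the Key Service fills those in. It also declares a `DRMSystem` entry for each combination of KID and DRM system, and tags each KID with a CPIX `ContentKeyUsageRule` whose `intendedTrackType` names the track group. The license acquisition URL is left out, matching the same-vendor case described above. Element and attribute names should be checked against the CPIX version your Key Service expects.

```python
import uuid
import xml.etree.ElementTree as ET

CPIX_NS = "urn:dashif:org:cpix"
ET.register_namespace("", CPIX_NS)  # serialize CPIX as the default namespace

# DASH-IF registered System IDs.
WIDEVINE = "edef8ba9-79d6-4ace-a3c8-27dcd51d21ed"
PLAYREADY = "9a04f079-9840-4286-ab92-e65be0885f95"


def _q(tag: str) -> str:
    """Qualify a tag name with the CPIX namespace."""
    return f"{{{CPIX_NS}}}{tag}"


def build_cpix_request(kids: dict, systems=(WIDEVINE, PLAYREADY)) -> str:
    """Build a CPIX request asking the Key Service for one key per track group.

    `kids` maps a track-group label (e.g. "SD", "HD") to a uuid.UUID.
    """
    root = ET.Element(_q("CPIX"))
    key_list = ET.SubElement(root, _q("ContentKeyList"))
    drm_list = ET.SubElement(root, _q("DRMSystemList"))
    rule_list = ET.SubElement(root, _q("ContentKeyUsageRuleList"))
    for label, kid in kids.items():
        # No key value here: the Key Service generates it.
        ET.SubElement(key_list, _q("ContentKey"), kid=str(kid))
        for system_id in systems:
            # One DRMSystem entry per (KID, DRM system) pair.
            ET.SubElement(drm_list, _q("DRMSystem"), kid=str(kid), systemId=system_id)
        # Tells the Key Service which track group this KID protects.
        ET.SubElement(rule_list, _q("ContentKeyUsageRule"),
                      kid=str(kid), intendedTrackType=label)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_cpix_request({"SD": uuid.uuid4(), "HD": uuid.uuid4()}))
```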

Figure 4. The process of obtaining encryption keys via CPIX request from a Key Service
Figure 5. The process of the Packager generating a multi-key DRM manifest. On the right are two PSSH values, one for each DRM system, each containing the two separate Key IDs for the HD and SD tracks

After the necessary information is prepared, it is incorporated into a CPIX document and sent to the Key Service. The Key Service returns an XML response containing two KIDs (one for the SD tracks and one for the HD tracks), the two corresponding secret encryption keys, and an Initialization Vector (IV) for each key. An IV is a random or pseudorandom value that the cipher uses together with the encryption key to produce the ciphertext. As a crucial component of the encryption process, the IV introduces variability, so identical plaintext does not produce identical ciphertext.

Once the necessary information is obtained from the Key Service, the next step is to label the video streams. The process begins by taking the video streams from the Transcoder, where each resolution has its own stream. Each resolution is labeled as either HD or SD. The resolution threshold that separates HD from SD is passed in when the system initiates the DRM packaging process, and it can be changed as needed.
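The labeling step is essentially a threshold comparison. Below is a minimal sketch of it. The 720-pixel default, the convention that a height equal to the threshold counts as HD, and the function name are all assumptions for illustration, since the article states only that the threshold is supplied when packaging starts.

```python
def label_streams(heights: list, hd_min_height: int = 720) -> dict:
    """Label each transcoded stream (keyed by height in pixels) as HD or SD.

    Assumption: streams at or above hd_min_height are HD, the rest SD.
    """
    return {h: ("HD" if h >= hd_min_height else "SD") for h in heights}


if __name__ == "__main__":
    print(label_streams([240, 360, 480, 720, 1080]))
    # {240: 'SD', 360: 'SD', 480: 'SD', 720: 'HD', 1080: 'HD'}
```

Keeping the threshold as a parameter mirrors the article's point that it can be changed per packaging job without touching the pipeline code.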

Finally, the keys and KIDs obtained from the Key Service are passed to the Packager, together with the labeled video streams and other DRM-specific settings. The Packager then generates an MPD manifest that includes the PSSH for each requested DRM system.
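The sketch below shows one way labeled streams and keys could meet at the Packager. It builds a Shaka Packager command line in raw-key mode, which is an assumption: the article describes keys coming from a Key Service and does not show Vidio's actual invocation. The file names are hypothetical. The flags used (`--enable_raw_key_encryption`, `--keys`, `--protection_systems`, `--mpd_output`) and the `drm_label` stream-descriptor field should be checked against the Shaka Packager documentation for your version. The function only assembles the argument list and does not run anything.

```python
import shlex


def build_packager_args(streams: dict, keys: dict, mpd_path: str = "manifest.mpd") -> list:
    """Assemble a Shaka Packager command for multi-key (HD/SD) raw-key encryption.

    streams: maps an input file name to its DRM label ("HD" or "SD")
    keys:    maps a DRM label to (key_id_hex, key_hex), 32 hex digits each
    """
    args = ["packager"]
    for src, label in streams.items():
        # drm_label ties this stream to the key entry carrying the same label.
        args.append(f"in={src},stream=video,output=enc_{src},drm_label={label}")
    key_specs = [f"label={label}:key_id={kid}:key={key}"
                 for label, (kid, key) in keys.items()]
    args += [
        "--enable_raw_key_encryption",
        "--keys", ",".join(key_specs),
        "--protection_systems", "Widevine,PlayReady",
        "--mpd_output", mpd_path,
    ]
    return args


if __name__ == "__main__":
    args = build_packager_args(
        {"video_480p.mp4": "SD", "video_1080p.mp4": "HD"},
        {"SD": ("1" * 32, "a" * 32), "HD": ("2" * 32, "b" * 32)},
    )
    print(shlex.join(args))
```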

Specifying the security levels of the multi-key DRM content

As discussed in the previous section, multi-key DRM content can be configured to use multiple levels of protection. This configuration is separate from the PSSH in the manifest created by the Packager; it is applied during decryption key retrieval.

Once the manifest is created, it's ready for consumption by the client player. However, before playback can begin, the content must be decrypted, which requires a decryption key. Multi-key DRM content needs two keys: one for the HD tracks and one for the SD tracks. Obtaining these keys involves a specialized entity called a License Server. The License Server is the central authority in the DRM ecosystem, responsible for managing and issuing licenses that allow authorized users to access and decrypt protected content.

Figure 6. The high level diagram of a multi-key DRM-License retrieval

Figure 6 illustrates the decryption key retrieval process:

  1. The process starts from the encryption information stored in the PSSH within the MPD manifest. Given an MPD manifest URL, the client player requests the manifest from the CDN.
  2. The CDN returns the manifest, which contains the PSSH sets for both the HD and SD tracks, as explained in previous sections. The client player stores this information on the device for use in the subsequent steps.
  3. The client player then requests the device-related configuration for the DRM systems referenced in the PSSH. This includes the required security level for each DRM system and additional security options (e.g., HDCP). The request is sent to a server owned by the content provider, which is typically separate from both the Key Service and the License Server. Keeping this configuration on a server lets the content owner change it at any time without forcing client players to update, which matters especially for mobile apps.
  4. The client player receives the license request data from that server, including a security policy.
  5. The client player sends the license request to the License Server.
  6. The License Server examines the security policy associated with the requested content and the capabilities of the client device. This determines whether the client is authorized to play the content and under what conditions (e.g., resolution, platform, location). Finally, the License Server returns the decryption keys the client is entitled to inside a DRM license.
  7. The client player retrieves the license, begins decrypting the media content, and ultimately plays it to the user.
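Step 6 is where multi-key DRM pays off. The License Server can release the SD key to any authorized device while withholding the HD key from devices that do not meet the HD security requirement. The sketch below models only that decision. The policy table, the use of Widevine-style level names, and the function are hypothetical. A real License Server also verifies entitlements, signs the license, and encrypts the keys it returns.

```python
# Higher number = stronger protection (Widevine-style levels, for illustration).
SECURITY_RANK = {"L3": 1, "L2": 2, "L1": 3}

# Hypothetical per-track policy: minimum device level needed to receive each key.
TRACK_POLICY = {"SD": "L3", "HD": "L1"}


def keys_to_release(device_level: str, requested_kids: dict) -> dict:
    """Return only the keys (by track label) this device may receive.

    requested_kids maps a track label ("SD"/"HD") to its KID.
    """
    rank = SECURITY_RANK[device_level]
    return {
        label: kid
        for label, kid in requested_kids.items()
        if rank >= SECURITY_RANK[TRACK_POLICY[label]]
    }


if __name__ == "__main__":
    kids = {"SD": "kid-sd", "HD": "kid-hd"}
    print(keys_to_release("L3", kids))  # {'SD': 'kid-sd'}
    print(keys_to_release("L1", kids))  # {'SD': 'kid-sd', 'HD': 'kid-hd'}
```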

To enhance security, encryption keys are often rotated periodically. This means that new keys are generated and used, while old keys are retired. This practice helps to mitigate the risk of compromised keys being exploited. The License Server or Key Service is typically responsible for managing key rotation.

Handling resolutions in the client player

Protecting content with multi-key DRM also presents a unique challenge for the player. Before it was adapted to multi-key DRM, Vidio's client player displayed every available resolution from both video track groups in an MPD manifest. This exposed HD resolution options to users whose devices couldn't meet the highest DRM level. If such a user selected an HD resolution, their player failed to play the content.

To address this, we modified the client player to read a flag. When the app retrieves content from the backend, it requests the content's detailed information before consuming the MPD manifest, and this flag indicates whether the content uses multi-key DRM. If it does, the player checks the device's DRM capabilities. If the device cannot handle high-level DRM, the player adjusts the resolution switcher to hide the higher resolution options from the user.
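On the player side, the check amounts to filtering the resolution list before it is shown. The sketch below makes several assumptions: `is_multi_key` stands for the hypothetical flag from the content-details response, the device's Widevine level arrives as a plain string (a real player would query the platform's DRM APIs), HD tracks require L1, and the HD threshold matches the one used at packaging time.

```python
def visible_resolutions(heights: list, is_multi_key: bool,
                        device_level: str, hd_min_height: int = 720) -> list:
    """Return the resolutions the player should offer in its switcher.

    Assumptions: HD tracks need Widevine L1, and any height at or above
    hd_min_height belongs to the HD track group.
    """
    if not is_multi_key or device_level == "L1":
        return sorted(heights)  # nothing to hide
    return sorted(h for h in heights if h < hd_min_height)


if __name__ == "__main__":
    offered = [1080, 720, 480, 360]
    print(visible_resolutions(offered, True, "L3"))  # [360, 480]
    print(visible_resolutions(offered, True, "L1"))  # [360, 480, 720, 1080]
```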

Handling the legacy client app

At this stage, nearly all use cases are addressed. The manifest now uses multiple levels of DRM, and client players can adjust the available resolutions adaptively. One issue remains: the legacy client app, whose player lacks the multi-key DRM resolution handling described in the previous section.

To resolve this, we modified the MPD manifest creation process. The MPD manifest received by the legacy app now only contains SD video tracks, preventing users with devices incapable of high-level DRM from accessing HD video tracks.

As mentioned earlier, MPD creation occurs at the end of the transcoding pipeline. While the MPD creation processes for Video on Demand (VOD) and Live Streaming differ slightly, they share a common implementation of an algorithm we call MPD Manifest Resolution Throttling.

MPD Manifest Resolution Throttling

This algorithm has a clear objective. Given an MPD manifest and the maximum resolution allowed for legacy clients, it parses the manifest's content to retain only one AdaptationSet, containing only the Representations up to the specified maximum resolution. Code 2 provides pseudocode for this process.

function parseMPDManifest(mpdManifest, maxResolution):
    # Read the contents of the MPD manifest XML
    doc = readMPDManifest(mpdManifest)

    # Query AdaptationSet tags and sort them by maxHeight in ascending order
    adaptationSets = queryAdaptationSets(doc)
    sortedAdaptationSets = sortAdaptationSetsByMaxHeight(adaptationSets, ascending)

    # Find the smallest maxHeight that still covers maxResolution;
    # if no AdaptationSet reaches maxResolution, keep the highest one
    selectedMaxHeight = last(sortedAdaptationSets).maxHeight
    for adaptationSet in sortedAdaptationSets:
        if adaptationSet.maxHeight >= maxResolution:
            selectedMaxHeight = adaptationSet.maxHeight
            break

    # Remove AdaptationSets with higher maxHeight
    removeAdaptationSetsWithHigherMaxHeight(doc, selectedMaxHeight)

    # Remove Representations whose height exceeds maxResolution
    removeRepresentationsWithHigherMaxHeight(doc, maxResolution)

    # Find the maximum width of remaining Representations
    maxWidth = findMaxRepresentationWidth(doc)

    # Update the remaining AdaptationSets with the new maxWidth and maxHeight
    modifyAdaptationSets(doc, maxWidth, maxResolution)

    # Return the modified MPD manifest
    return doc
Code 2. The MPD Manifest Resolution Throttling

Here's a breakdown of the algorithm:

  1. Read the MPD Manifest: The function reads the contents of the MPD manifest XML file.
  2. Query and Sort AdaptationSets: It queries the AdaptationSet tags within the MPD document and sorts them in ascending order based on their maxHeight attribute.
  3. Find Maximum Allowed Height: The function iterates through the sorted AdaptationSets and stops at the first one whose maxHeight is greater than or equal to maxResolution, storing that value as selectedMaxHeight. Because the list is sorted in ascending order, this is the smallest AdaptationSet that can still serve maxResolution. If none qualifies, the highest maxHeight is kept.
  4. Remove AdaptationSets with Higher MaxHeight: It removes any AdaptationSet tags from the MPD document that have a maxHeight greater than selectedMaxHeight.
  5. Remove Representations with Higher MaxHeight: It removes any Representation tags within the remaining AdaptationSet tags whose height is greater than maxResolution.
  6. Find Maximum Width: It finds the maximum width among the remaining Representation tags.
  7. Modify AdaptationSets: It updates the maxWidth and maxHeight attributes of the remaining AdaptationSet tags to the calculated maxWidth and to maxResolution, respectively.
  8. Return Modified MPD Manifest: The function returns the modified MPD manifest document.
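For readers who want something executable, here is a minimal Python version of the same idea, built on `xml.etree.ElementTree`. It is a sketch under simplifying assumptions, not Vidio's implementation. It treats AdaptationSets that declare `maxHeight` as the video track groups and reads each Representation's `height`. It also differs from Code 2 in three ways: it drops an AdaptationSet left with no Representations, it recomputes `maxWidth` and `maxHeight` per set from the tracks that survive rather than setting `maxHeight` to the limit, and, like any ElementTree round trip, it does not preserve the XML declaration or comments.

```python
import xml.etree.ElementTree as ET

MPD_NS = "urn:mpeg:dash:schema:mpd:2011"
NS = {"mpd": MPD_NS}
# Keep familiar prefixes when the tree is serialized again.
ET.register_namespace("", MPD_NS)
ET.register_namespace("cenc", "urn:mpeg:cenc:2013")


def throttle_mpd(mpd_xml: str, max_resolution: int) -> str:
    """Return the MPD re-serialized with video tracks capped at max_resolution."""
    root = ET.fromstring(mpd_xml)
    for period in root.iterfind("mpd:Period", NS):
        video_sets = [a for a in period.iterfind("mpd:AdaptationSet", NS)
                      if a.get("maxHeight") is not None]
        if not video_sets:
            continue
        # Smallest maxHeight that still covers max_resolution;
        # fall back to the tallest set if none reaches it.
        heights = sorted(int(a.get("maxHeight")) for a in video_sets)
        selected = next((h for h in heights if h >= max_resolution), heights[-1])
        for aset in video_sets:
            if int(aset.get("maxHeight")) > selected:
                period.remove(aset)
                continue
            for rep in aset.findall("mpd:Representation", NS):
                if int(rep.get("height", "0")) > max_resolution:
                    aset.remove(rep)
            remaining = aset.findall("mpd:Representation", NS)
            if not remaining:
                period.remove(aset)  # nothing playable is left in this set
                continue
            # Recompute the advertised limits from what actually survived.
            aset.set("maxWidth", str(max(int(r.get("width", "0")) for r in remaining)))
            aset.set("maxHeight", str(max(int(r.get("height", "0")) for r in remaining)))
    return ET.tostring(root, encoding="unicode")


# Hypothetical two-set manifest (SD up to 480p, HD up to 1080p).
SAMPLE = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"><Period>
<AdaptationSet maxWidth="854" maxHeight="480">
<Representation id="240" width="426" height="240"/>
<Representation id="480" width="854" height="480"/></AdaptationSet>
<AdaptationSet maxWidth="1920" maxHeight="1080">
<Representation id="720" width="1280" height="720"/>
<Representation id="1080" width="1920" height="1080"/></AdaptationSet>
</Period></MPD>"""

if __name__ == "__main__":
    print(throttle_mpd(SAMPLE, 480))  # only the SD AdaptationSet remains
```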

Handling the request from client

Before applying MPD Manifest Resolution Throttling, we needed to ensure it affected only requests from legacy client apps. Every client app embeds its version information in the request header. When a client app requests an MPD manifest URL from the CDN, the backend extracts this information from the request header and parses the app version. If the version is higher than a specified version, indicating that the client player can handle multi-key DRM, the original MPD manifest is returned. If it is equal or lower, the MPD manifest that has already been throttled by the MPD Manifest Resolution Throttling algorithm is returned instead.
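The version gate can be sketched as follows. The header name `X-App-Version`, the dotted numeric version format, the cutoff version, and the URLs are hypothetical placeholders. The rule of serving the throttled manifest when the header is missing or malformed is also an assumption; the article only says that apps at or below the configured version receive the throttled manifest.

```python
def parse_version(text: str) -> tuple:
    """Turn '5.12.3' into (5, 12, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def manifest_url_for(headers: dict, original_url: str, throttled_url: str,
                     last_legacy_version: str = "5.0.0") -> str:
    """Pick the manifest URL from the client's app-version header.

    headers is a plain dict here; real frameworks look headers up case-insensitively.
    """
    raw = headers.get("X-App-Version")
    if raw is None:
        return throttled_url  # unknown client: assume legacy (safe default)
    try:
        version = parse_version(raw)
    except ValueError:
        return throttled_url  # malformed version: same safe default
    if version > parse_version(last_legacy_version):
        return original_url  # player understands multi-key DRM
    return throttled_url  # equal or lower: legacy player
```

Comparing tuples of integers avoids the classic string-comparison bug, where "5.10.0" would sort below "5.9.0".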

Figure 7. The flowchart of MPD manifest URL retrieval for the client app

The alternative workflow serves the multi-key DRM manifest dynamically. After the Packager finishes creating the manifest, the transcoding pipeline saves it to storage. When a client app requests the MPD manifest, a backend API controller retrieves the manifest content from storage instead of providing a signed URL to a static manifest. It then decides whether to return the original manifest or a throttled version based on the client's request headers, and the chosen manifest is returned as the API response.

Vidio utilizes the former approach for VOD and the latter for Live Streaming.

Figure 8. The flowchart of an alternative approach to MPD manifest URL retrieval for the client app
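The dynamic variant, which Vidio uses for Live Streaming, can be sketched as a small WSGI application using only the standard library. Everything in it is a stand-in. The in-memory `STORAGE` dict replaces real object storage, `throttle` is a placeholder for the MPD Manifest Resolution Throttling step, and the `X-App-Version` header and cutoff version are hypothetical.

```python
# Stand-in for object storage: content ID -> original multi-key MPD text.
STORAGE = {"live-123": "<MPD><!-- original multi-key manifest --></MPD>"}

LAST_LEGACY_VERSION = (5, 0, 0)  # hypothetical cutoff


def throttle(mpd_text: str) -> str:
    """Placeholder for MPD Manifest Resolution Throttling."""
    return mpd_text.replace("original", "throttled")


def is_legacy(version_header) -> bool:
    """Treat missing or malformed versions as legacy, the safe choice."""
    try:
        return tuple(int(p) for p in version_header.split(".")) <= LAST_LEGACY_VERSION
    except (AttributeError, ValueError):
        return True


def manifest_app(environ, start_response):
    """WSGI app: GET /<content_id>.mpd returns the original or throttled MPD."""
    content_id = environ.get("PATH_INFO", "").strip("/").removesuffix(".mpd")
    mpd_text = STORAGE.get(content_id)
    if mpd_text is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown content"]
    # WSGI exposes the X-App-Version request header as HTTP_X_APP_VERSION.
    if is_legacy(environ.get("HTTP_X_APP_VERSION")):
        mpd_text = throttle(mpd_text)
    body = mpd_text.encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/dash+xml"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, manifest_app).serve_forever()` serves it on port 8000. Because the response depends on a request header, any cache in front of such an endpoint has to take that header into account.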

Conclusions and Verdict

Multi-key DRM offers an interesting solution for content providers seeking to balance security and accessibility in media streaming. By encrypting different video track groups with distinct DRM keys and security levels, it lets providers apply varying levels of protection to different video tracks and serve a wider range of devices. Premium content can be protected with higher security levels, while a more accessible option remains available to users with less secure devices. As a result, content can be played on a broader range of devices without lowering the protection of the premium tracks.

However, multi-key DRM is not a silver bullet for content protection. While it enhances security, it is not immune to attacks and vulnerabilities. Techniques for circumventing DRM, multi-key DRM included, keep evolving, so content providers must continuously evaluate and update their security measures to stay ahead of potential threats.

Despite these challenges, multi-key DRM remains a valuable tool for content providers seeking to strike a balance between security and accessibility. By carefully considering the potential benefits and drawbacks, content providers can make informed decisions about whether multi-key DRM is the right approach for their specific needs.