As of revision 5017 of GPAC, I’ve added first preliminary support for storing TTML in MP4. The support follows the ISOBMFF specification (MPEG-4 Part 30, to appear) and the DECE CFF format. This post describes why and how to use this implementation.
In short, TTML content is XML content describing subtitles. Typically, in an HTML5 environment, the browser retrieves a video file and a TTML file separately. However, in live scenarios, when subtitles are edited in real time, or when streaming multiple presentations with different subtitle documents, one wants to be able to package multiple XML documents in a stream, with timing information indicating when each document should be played. One might also want to package the subtitle documents together with the audio/video streams. That is what the storage-of-timed-text amendment to ISOBMFF was made for.
As described in my earlier post, in ISOBMFF each TTML document is stored as an independent, Random Access Point (RAP) sample of an XML subtitle track. The track header indicates the namespace of the documents (e.g. TTML, SMPTE-TT, EBU-TT). Each sample may also contain images, but this is not yet supported in GPAC.
I’ve added the ability to import XML documents into an MP4 file using GPAC’s MP4Box. The following command line shows how to add TTML documents to the file.mp4 file, creating an English stream, using an intermediate file called file.nhml:
MP4Box -add file.nhml:lang=en file.mp4
The NHML format is a feature of GPAC: it describes how MP4Box should import media data into an MP4 track. Full documentation is available here. An example NHML file for a TTML stream is given below.
<?xml version="1.0" encoding="UTF-8" ?>
<NHNTStream version="1.0" timeScale="1000" trackID="1" mediaType="subt" mediaSubType="stpp" width="800" height="600" parNum="1" parDen="1" xml_namespace="http://www.w3.org/ns/ttml">
<NHNTSample DTS="0" isRAP="yes" mediaFile="Not_Your_Fathers_Captions.ttml" duration="72000"/>
</NHNTStream>
In this example, a track is declared with trackID 1, of type “subt” (for subtitle) and subtype “stpp” (for XML subtitle); the namespace is given in the “xml_namespace” attribute. Each TTML document is declared with an NHNTSample element indicating its start time “DTS” (in timeScale units), its RAP status (always yes), and the file to use to build the sample (“mediaFile”). You can have as many NHNTSample elements as you like. Note that the last sample should have a duration attribute; for the other samples, the duration is computed as the difference between the sample’s DTS and the next one’s.
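For instance, a stream carrying several successive TTML documents could be described as follows. This is a sketch with made-up file names and timings; note that only the last sample carries an explicit duration, the others being implicit:

```shell
# Write a hypothetical NHML description with three TTML samples.
# The durations of the first two samples are implicit: DTS(next) - DTS(current),
# i.e. 10000 and 20000 timeScale units here; only the last one is explicit.
cat > multi.nhml << 'EOF'
<?xml version="1.0" encoding="UTF-8" ?>
<NHNTStream version="1.0" timeScale="1000" trackID="1" mediaType="subt" mediaSubType="stpp" xml_namespace="http://www.w3.org/ns/ttml">
<NHNTSample DTS="0" isRAP="yes" mediaFile="part1.ttml"/>
<NHNTSample DTS="10000" isRAP="yes" mediaFile="part2.ttml"/>
<NHNTSample DTS="30000" isRAP="yes" mediaFile="part3.ttml" duration="15000"/>
</NHNTStream>
EOF

# Import it as before (requires the three TTML files to exist):
# MP4Box -add multi.nhml:lang=en file.mp4
```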
Once you have created that file, you can use it as usual with MP4 files and MP4Box: concatenate two TTML tracks (-cat), split a track into two files (-split), export samples from a track (-raws 1), fragment the file (-frag), segment the file for DASH use (-dash)…
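Concretely, these operations look like the following command lines (the file names and durations are placeholders for illustration):

```shell
# Concatenate the tracks of a second file after those of the first:
MP4Box -cat other.mp4 file.mp4
# Split the file into chunks of 60 seconds:
MP4Box -split 60 file.mp4
# Export each sample of track 1 (one TTML document per output file):
MP4Box -raws 1 file.mp4
# Fragment the file with 1000 ms fragments:
MP4Box -frag 1000 file.mp4
# Segment the file for DASH with 1000 ms segments:
MP4Box -dash 1000 file.mp4
```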
Update as of 05/02/2014:
MP4Box tries to be media-agnostic when segmenting/fragmenting. So preparing TTML for DASHing involves using a proper encoding of your TTML content, just as preparing a video for DASHing requires specific encoding regarding GoP size: for a video stream, we don’t want MP4Box to re-encode frames at segment boundaries because it couldn’t find a RAP. The same is true for TTML: we don’t want to modify the TTML content when creating segments.

If the TTML stream contains one long sample (e.g. a whole movie), we indeed repeat the same TTML file in every segment, but with a different start and end time and with the ‘redundant’ flag, so that a player that is already playing knows it should simply extend the presentation of the previous sample. If you don’t want that behavior, you should split your TTML file into N TTML files, one per segment, and modify your NHML file to create N samples, each pointing to a different TTML file. At import time, MP4Box will create different samples, and at DASH time it will create segments with different samples (hopefully respecting the boundaries).
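A sketch of that per-segment workflow, with hypothetical file names and a 10-second segment duration chosen so that each sample aligns with one DASH segment:

```shell
# Hypothetical per-segment NHML: one TTML document per 10-second segment,
# so that each DASH segment contains exactly one (non-redundant) sample.
cat > segmented.nhml << 'EOF'
<?xml version="1.0" encoding="UTF-8" ?>
<NHNTStream version="1.0" timeScale="1000" trackID="1" mediaType="subt" mediaSubType="stpp" xml_namespace="http://www.w3.org/ns/ttml">
<NHNTSample DTS="0" isRAP="yes" mediaFile="seg1.ttml"/>
<NHNTSample DTS="10000" isRAP="yes" mediaFile="seg2.ttml"/>
<NHNTSample DTS="20000" isRAP="yes" mediaFile="seg3.ttml" duration="10000"/>
</NHNTStream>
EOF

# Then import and segment with a matching 10-second (10000 ms) duration:
# MP4Box -add segmented.nhml:lang=en file.mp4
# MP4Box -dash 10000 file.mp4
```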