WebVTT, MP4 files, DASH and GPAC

In a previous post, I described how to package and manipulate WebVTT content in MP4 files according to the latest ISO standard using MP4Box. Basic import of a WebVTT or SRT file is done as follows:

MP4Box -add file1.vtt:lang=en subtitle1.mp4


MP4Box -add file2.srt:FMT=VTT:lang=en subtitle2.mp4

Then, the basic command to create DASH subtitle segments of 10 seconds (the -dash duration is expressed in milliseconds) is:

MP4Box -dash 10000 subtitle.mp4:role=subtitle video.mp4
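These commands are easy to script. Below is a minimal Python sketch that assembles the argument lists shown above, using only the options from this post (-add with :lang= and :FMT=VTT, -dash with :role=subtitle). It assumes MP4Box is on the PATH, and the helper names import_cmd, dash_cmd and run are hypothetical, not part of GPAC.

```python
import subprocess
from pathlib import Path


def import_cmd(source: str, output: str, lang: str = "en") -> list[str]:
    """Build the MP4Box command importing a WebVTT or SRT file into an MP4 file."""
    spec = f"{source}:lang={lang}"
    # SRT input is imported as WebVTT with the FMT=VTT option, as in the post.
    if Path(source).suffix.lower() == ".srt":
        spec = f"{source}:FMT=VTT:lang={lang}"
    return ["MP4Box", "-add", spec, output]


def dash_cmd(subtitle_mp4: str, video_mp4: str, segment_s: float = 10) -> list[str]:
    """Build the MP4Box DASH command; -dash expects a duration in milliseconds."""
    return ["MP4Box", "-dash", str(int(segment_s * 1000)),
            f"{subtitle_mp4}:role=subtitle", video_mp4]


def run(cmd: list[str]) -> None:
    """Execute one MP4Box command, raising CalledProcessError on failure."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names matching the examples above; only prints the commands.
    for cmd in (import_cmd("file1.vtt", "subtitle1.mp4"),
                import_cmd("file2.srt", "subtitle2.mp4"),
                dash_cmd("subtitle.mp4", "video.mp4")):
        print(" ".join(cmd))
```

Keeping command construction separate from execution makes the options easy to check; call run(cmd) on each list to actually invoke MP4Box.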

These MP4 files can now be played with the GPAC players on all supported platforms (Windows, Mac, Linux, Android, iOS). Try it out and let me know if you find bugs. For instance, you can test this file or its DASH version:

MP4Client http://download.tsi.telecom-paristech.fr/gpac/webvtt/counter-vtt.mp4
MP4Client http://download.tsi.telecom-paristech.fr/gpac/webvtt/dash/counter-subtitles.mpd

This post describes some details of how the rendering is implemented.

[slideshare id=24802693&doc=m30301-reportonwebvttimplementation-130731093103-phpapp01]

WebVTT cues are rendered as follows. When a WebVTT stream is detected in an MP4 file, a dedicated decoding pipeline is created (by the vtt_in module), and a graphical layer is added on top of the video layer if the MP4 file also contains video.

The WebVTT decoder receives the WebVTT/MP4 samples, parses the box-structured cues and calls the graphical layer, which is in charge of rendering the WebVTT cues. We chose to base the rendering on the existing SVG rendering capabilities of GPAC: in real time, WebVTT cues are transformed into SVG elements, namely SVG Tiny 1.2 textArea elements. More precisely, upon reception of a new WebVTT sample, the decoder calls a JavaScript function for each cue to trigger its rendering. Because this logic lives in JavaScript, the rendering is highly configurable and can be modified without recompiling GPAC; for instance, you can set the font size or the text color by editing the JavaScript file.
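To make "box-structured cues" concrete, here is a minimal Python sketch of parsing one WebVTT sample. It assumes the box layout of ISO/IEC 14496-30 as we understand it: a sample holds one 'vttc' box per cue, whose 'iden', 'sttg' and 'payl' child boxes carry the cue identifier, settings and text as UTF-8 strings, or a single 'vtte' box when no cue is active. This is not GPAC's decoder code; the names Cue, iter_boxes, parse_sample and box are hypothetical, and 64-bit box sizes and other optional boxes are not handled.

```python
import struct
from dataclasses import dataclass


@dataclass
class Cue:
    identifier: str = ""  # from the 'iden' box
    settings: str = ""    # from the 'sttg' box, e.g. "line:0 align:start"
    payload: str = ""     # from the 'payl' box: cue text with its tags


def iter_boxes(data: bytes):
    """Yield (type, body) for boxes laid out back to back (32-bit sizes only)."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size < 8 or offset + size > len(data):
            raise ValueError(f"unsupported or invalid box size {size} at offset {offset}")
        yield box_type.decode("ascii"), data[offset + 8:offset + size]
        offset += size


def parse_sample(sample: bytes) -> list[Cue]:
    """Return the cues of one sample; an empty list means no active cue ('vtte')."""
    fields = {"iden": "identifier", "sttg": "settings", "payl": "payload"}
    cues = []
    for box_type, body in iter_boxes(sample):
        if box_type != "vttc":
            continue  # 'vtte' and unknown boxes carry no cue
        cue = Cue()
        for child_type, child_body in iter_boxes(body):
            if child_type in fields:
                setattr(cue, fields[child_type], child_body.decode("utf-8"))
        cues.append(cue)
    return cues


def box(box_type: str, body: bytes) -> bytes:
    """Serialize a box with a 32-bit size header (handy to build test samples)."""
    return struct.pack(">I4s", 8 + len(body), box_type.encode("ascii")) + body
```

For example, parse_sample(box("vttc", box("sttg", b"line:0") + box("payl", b"Hi"))) yields one Cue with settings "line:0" and payload "Hi", which is what a renderer would then hand over to the graphical layer.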

So far, WebVTT rendering in GPAC supports:

  • Top-based line positioning, bottom-based line positioning, explicit positioning
  • Line wrapping
  • Multiline cues
  • Multiple cues simultaneously
  • Basic text styling: italic, bold, underlined

Support for regions has not been added.
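As an illustration of the cue-to-SVG mapping, here is a minimal Python sketch that turns one cue into an SVG Tiny 1.2 textArea string. It is not the JavaScript shipped with GPAC: the function names, the fixed LINE_HEIGHT, the default video size and the default bottom placement are assumptions for the example. The WebVTT line setting is interpreted as in the list above: a non-negative integer counts lines from the top, a negative integer counts from the bottom, and a percentage gives an explicit position. Only the <b>, <i> and <u> tags are mapped (to tspan attributes), newlines become tbreak elements, and automatic wrapping is left to the textArea, so wrapped lines are not counted in the box height.

```python
import html
import re
from xml.sax.saxutils import escape

# Hypothetical rendering parameter; GPAC's script uses its own values.
LINE_HEIGHT = 20  # pixels per text line

# WebVTT styling tags mapped to SVG tspan attributes.
TAG_STYLES = {
    "b": 'font-weight="bold"',
    "i": 'font-style="italic"',
    "u": 'text-decoration="underline"',
}


def cue_top(line_setting: str, height: int, n_lines: int) -> float:
    """Return the y coordinate of the cue box for a WebVTT 'line' value."""
    if line_setting.endswith("%"):  # explicit position, percentage of the height
        return height * float(line_setting[:-1]) / 100
    line = int(line_setting) if line_setting else -1  # assumed default: bottom line
    if line >= 0:  # top-based: line 0 is the first line from the top
        return line * LINE_HEIGHT
    # bottom-based: line -1 puts the last line of the cue at the bottom edge
    return height + line * LINE_HEIGHT - (n_lines - 1) * LINE_HEIGHT


def payload_to_svg(payload: str) -> str:
    """Convert cue text with <b>, <i>, <u> tags and newlines into textArea content."""
    out = []
    for token in re.split(r"(<[^>]*>)", payload):
        tag = re.fullmatch(r"<(/?)([biu])(?:\.[^>]*)?>", token)
        if tag:  # styling tag, possibly with classes such as <b.loud>
            closing, name = tag.groups()
            out.append("</tspan>" if closing else f"<tspan {TAG_STYLES[name]}>")
        elif token.startswith("<"):
            continue  # other tags (<c>, <v>, timestamps) are ignored in this sketch
        else:
            # decode WebVTT escapes (&amp;, &lt;, ...) then re-escape for XML
            parts = [escape(html.unescape(p)) for p in token.split("\n")]
            out.append("<tbreak/>".join(parts))
    return "".join(out)


def cue_to_textarea(payload: str, line_setting: str = "",
                    width: int = 640, height: int = 360) -> str:
    """Build an SVG Tiny 1.2 textArea element for one cue."""
    n_lines = payload.count("\n") + 1  # explicit lines only, before wrapping
    y = cue_top(line_setting, height, n_lines)
    return (f'<textArea x="0" y="{y:g}" width="{width}" '
            f'height="{n_lines * LINE_HEIGHT}" text-align="center">'
            f"{payload_to_svg(payload)}</textArea>")
```

For example, cue_to_textarea("Hi", "0") places a one-line box at the top of a 640x360 frame. Generating markup rather than drawing text directly is what lets the existing SVG renderer take care of fonts, wrapping and styling.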

With this player, we have validated the playback of WebVTT cues under the following conditions:

  • Stored in MP4 files and read from disk or over HTTP;
  • Stored in fragmented MP4 files and read as part of a DASH presentation;
  • With or without video;
  • When seeking within the stream.

Currently, WebVTT playback is limited to cues packaged in MP4 files; raw WebVTT files cannot be played yet. The main reasons are:

  • Storing WebVTT cues in MP4 files ensures that each sample is a random access point (RAP): at each new sample, previously displayed cues can be safely discarded, which makes rendering easier.
  • Storage in MP4 also makes it easy to deliver WebVTT content in a DASH streaming session, which is currently of high interest.
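The RAP property follows from how cues are stored. Following the approach of ISO/IEC 14496-30 as we understand it, overlapping cues are split at every cue start and end time, so that each MP4 sample carries all the cues active during its interval, and gaps are filled with empty ('vtte') samples. Here is a minimal Python sketch of that splitting, with times in milliseconds; the names Cue, Sample and split_into_samples are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    start: int  # ms
    end: int    # ms
    text: str


@dataclass
class Sample:
    start: int       # ms
    duration: int    # ms
    cues: list[str]  # an empty list stands for an empty ('vtte') sample


def split_into_samples(cues: list[Cue]) -> list[Sample]:
    """Cut the timeline at every cue boundary; each sample lists all cues active in it."""
    # Time 0 is included so that a gap before the first cue gets an empty sample.
    bounds = sorted({t for cue in cues for t in (cue.start, cue.end)} | {0})
    samples = []
    for start, end in zip(bounds, bounds[1:]):
        # Every cue boundary is in bounds, so a cue covers an interval fully or not at all.
        active = [c.text for c in cues if c.start <= start and c.end >= end]
        samples.append(Sample(start, end - start, active))
    return samples
```

With cues A from 1000 to 4000 ms and B from 2000 to 5000 ms, this produces four samples: empty, [A], [A, B] and [B]. Because each sample is self-contained, the player can drop all previously displayed cues when a new sample arrives, and playback can start at any sample after a seek or at the beginning of a DASH segment.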

The associated MPEG contribution is available here:

http://biblio.telecom-paristech.fr/cgi-bin/ws/biblio.cgi?type=standardisation&etat=submitted&id=14196
