If you haven’t had a look at the HTML5 track element, check out my tutorial on HTML5 Rocks.
The track element provides a simple, standardised way to add subtitles, captions, screen reader descriptions and chapters to video and audio.
Tracks can also be used for other kinds of timed metadata. The source data for each track element is a text file made up of a list of timed cues, and cues can include data in formats such as JSON or CSV. This can enable DOM manipulation and other behaviour synchronised with media playback — as well as deep linking and media navigation via text search.
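As a rough illustration of the metadata case, here is a minimal sketch of reading JSON from cues as they become active. It assumes a track element with kind="metadata" and cues whose text is a JSON object; the `title` and `elementId` fields, the `active` class, and the helper names `parseCuePayload` and `activePayloads` are made up for this example and are not part of any standard.

```javascript
// Hypothetical cue as it might appear in a metadata track's .vtt file:
//
//   WEBVTT
//
//   00:00:05.000 --> 00:00:10.000
//   {"title": "Introduction", "elementId": "intro"}

// Parse the text of one cue as JSON; return null if it is not valid JSON.
function parseCuePayload(cueText) {
  try {
    return JSON.parse(cueText);
  } catch (err) {
    return null;
  }
}

// Collect the parsed payloads of all currently active cues on a text track.
// activeCues can be null (for example when the track is disabled).
function activePayloads(textTrack) {
  const payloads = [];
  const cues = textTrack.activeCues || [];
  for (let i = 0; i < cues.length; i++) {
    const data = parseCuePayload(cues[i].text);
    if (data !== null) payloads.push(data);
  }
  return payloads;
}

// Browser wiring (skipped when there is no DOM, e.g. when run in Node).
if (typeof document !== 'undefined') {
  const trackElement = document.querySelector('track[kind="metadata"]');
  if (trackElement) {
    const textTrack = trackElement.track;
    // 'hidden' loads the cues and fires events without rendering anything.
    textTrack.mode = 'hidden';
    textTrack.addEventListener('cuechange', () => {
      for (const data of activePayloads(textTrack)) {
        // elementId is an assumed field in the cue's JSON payload.
        const el = document.getElementById(data.elementId);
        if (el) el.classList.add('active');
      }
    });
  }
}
```

Keeping the parsing separate from the DOM wiring means malformed cues are simply skipped rather than breaking playback, and the same payloads could be indexed for text search or deep linking.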