CHM = Component-based Hypervideo Model
CHM nests high-level components to compose documents hierarchically. A toolbox of meaningful and useful high-level components eases the conceptualization and coding tasks, and the model remains extensible since further components can be created as needed from the existing lower-level ones.
The document content is abstracted from its form and delivery contexts through the adoption of a fully annotation-driven approach. The annotation structures add semantics and structure to video-based hypermedia and provide the needed infrastructure for augmenting the video with meaningful data that can be further used to construct rich views of the document.
A general overview of the model is provided in the figure above. A CHM hypervideo is formed by a set of low- and high-level components, building blocks that represent the formal information and composition units. These components are related to video flows contained within TimedMedia components. The rendering of these components allows the definition of a timeline reference, an abstract clock inferred from the video's intrinsic time and used to synchronize all document components. Component data is retrieved from annotation structures, accessed through readers.
The CHM annotation concept expresses the association of any data or resource with a video (or a fragment thereof) for enrichment, explanation, structuring, linking or visualization purposes. An annotation is defined as any data associated with a logical spatio-temporal video fragment, which defines its scope (begin and end instants) in video time. For instance, an anchor/hotspot annotation addresses a fragment that covers its presentation interval; a link and its source anchor are defined by annotations associated with the same video fragment. Attributes of an annotation include its type, media reference, begin/end timecodes and content. Depending on user requirements, the proposed annotation model can easily be extended or adapted to specific needs.
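As an illustration, the annotation attributes listed above can be sketched as a small record type. This is a minimal sketch, not the CHM syntax: the class name, the field names, the half-open interval convention and the sample values are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Data attached to a temporal fragment of a media (hypothetical layout)."""
    type: str          # e.g. "hotspot" or "link" (illustrative type names)
    media: str         # reference to the annotated media
    begin: float       # start of the fragment, in seconds of video time
    end: float         # end of the fragment, in seconds of video time
    content: str = ""  # payload: text, URL, serialized data...

    def __post_init__(self) -> None:
        if self.begin < 0 or self.end < self.begin:
            raise ValueError("expected 0 <= begin <= end")

    def covers(self, t: float) -> bool:
        """True when video time t lies in [begin, end) (assumed convention)."""
        return self.begin <= t < self.end


# A link and its source anchor share the same video fragment.
anchor = Annotation("hotspot", "lecture.mp4", 12.0, 18.5, "speaker name")
link = Annotation("link", "lecture.mp4", 12.0, 18.5, "#chapter-2")
```

Extending the model to specific needs would then amount to adding fields or new type values.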
The present work is concerned neither with video editing and feature extraction nor with proposing annotation tools, which already exist. The annotation data content may be written by hand or generated by third-party software such as Advene.
The figure presents the main elements and components involved in the hypervideo model. A hypervideo consists of a set of rendering components, expressed by the Component element, which is associated with a list of composition, placement, synchronization and behavioral attributes supplied by the author or retrieved from the annotation structure.
A hypervideo references at least one main audiovisual document accessed through the TimedMedia element, a "paramount" component that addresses at least one temporalized stream, audio or video. A TimedMedia component has an intrinsic duration which conveys a timing capability to the document, expressed by a virtual reference called TimeLine Reference (TLR) that synchronizes the presentation objects related to the TimedMedia element. Many TimedMedia components (and therefore many TLRs) may be present within the document, defining different hypervideo sub-documents.
A generic Component element within a hypervideo need not relate to any TLR, in which case it is said to be time-independent. Components bound to a TLR are more specialized TLComponent elements with synchronization constraints. The time-independent Components can be used as containers or for static illustrations. A GlobalTimelineRef element allows the synchronization between different TLRs and the synchronization of the time-independent components.
The figure presents the basic data components that form a hypervideo. While Component elements are generic for handling data, the content with visual manifestation is held by the VisualComponent elements. Presentation specification attributes are associated with the component and can be used by the rendering engine.
Specific synchronized display components offer interactive interfaces for rendering temporalized data, provided as annotations. Multiple AnnotationReaders can dispatch annotation data, either user supplied or possibly automatically extracted from the media elements (textual transcription, screenshots, videoclip references...).
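The role of a reader can be sketched as follows. The example assumes an in-memory list of annotations; the class layout, the method names `by_type` and `active_at`, and the sample data are hypothetical and only illustrate how display components could query annotation data.

```python
from typing import Iterable, List, NamedTuple


class Annotation(NamedTuple):
    type: str       # annotation type (illustrative values below)
    begin: float    # seconds of video time
    end: float
    content: str


class AnnotationReader:
    """Hypothetical generic reader over an in-memory annotation list."""

    def __init__(self, annotations: Iterable[Annotation]) -> None:
        # Keep annotations ordered by begin time for predictable output.
        self._annotations = sorted(annotations, key=lambda a: a.begin)

    def by_type(self, type_: str) -> List[Annotation]:
        """All annotations of one type, in temporal order."""
        return [a for a in self._annotations if a.type == type_]

    def active_at(self, t: float) -> List[Annotation]:
        """Annotations whose fragment contains video time t."""
        return [a for a in self._annotations if a.begin <= t < a.end]


reader = AnnotationReader([
    Annotation("transcript", 4.0, 9.5, "Today we look at hypervideo."),
    Annotation("transcript", 0.0, 4.0, "Welcome."),
    Annotation("screenshot", 3.0, 6.0, "frame-0003.png"),
])
```

A transcript viewer would call `reader.by_type("transcript")` once, while a synchronized overlay would call `reader.active_at(t)` on each time update.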
Among the plain display components, the continuous media players such as VideoPlayer and AudioPlayer present a generic Player interface for rendering and interacting with the content. Document content viewers such as TextViewer, RichTextViewer and ImageViewer display the textual and graphic content, retrieved from the annotation structure or manually introduced by the author. The rich text viewer is a general container for heterogeneous content like HTML pages, RSS feeds and broad XML-based content. For instance, synchronized Wikipedia content retrieved through a set of URLs can provide wider information about different topics. The Container element is an abstract receptacle for grouping components, which eases their spatial clustering and unifies their processing. The TimelineRefControlGUI element allows the definition of a graphical user interface for controlling and interacting with the TLR.
To enhance video accessibility, Text Captions, Graphic Overlays and wider Multimedia Overlays can be placed over the video object, such as subtitles, animations and different graphical annotations. Such containers are instances of the appropriate viewer placed over the video player interface with appropriate temporal and spatial attributes. These components are convenience elements and are not explicitly defined, since any visual component can be overlaid with multimedia content through a proper layout definition.
For hypervideo document design, we propose the set of high-level components shown in the above figure, built upon the plain ones. This extensible set of useful built-in components eases the coding task. When a needed component does not exist, the author can still create it from the existing lower-level components and possibly from other high-level ones.
A transcript component displays an interactive text generated from a textual transcription of the audiovisual document. It lets the user navigate from the transcription to the corresponding time in the video, and may highlight the text that corresponds to the video fragment being played.
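Both directions of this interaction can be sketched with a sorted list of transcript segments. The sketch assumes segments are non-overlapping and ordered by begin time; the function names and the tuple layout are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Segment = Tuple[float, float, str]  # (begin, end, text), times in seconds


def current_segment(segments: List[Segment], t: float) -> Optional[int]:
    """Index of the segment to highlight at video time t, or None in a gap."""
    starts = [s[0] for s in segments]
    i = bisect_right(starts, t) - 1      # last segment starting at or before t
    if i >= 0 and t < segments[i][1]:
        return i
    return None


def seek_target(segments: List[Segment], index: int) -> float:
    """Video time to jump to when the user clicks segment `index`."""
    return segments[index][0]


segments = [
    (0.0, 4.0, "Welcome."),
    (4.0, 9.5, "Today we look at hypervideo."),
    (12.0, 15.0, "First, annotations."),
]
```

The first function serves highlighting during playback; the second serves navigation from the text back to the video.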
Map and table-of-contents components offer guided navigation and selection of a particular narrative path through specific contexts (time intervals, layout, etc.) of the hypervideo. A document map gives a general sketch of the presented content and offers branching choices to navigate to a particular perspective. Many maps can be defined within the document to illustrate different features. The Map component can be made of screenshots or transcriptions. The TableOfContent (ToC) component defines a hypervideo table of contents with navigation capabilities and reveals the structure of the video regarding a selected feature or annotation type. Many tables of contents may be instantiated, presented as a plan or a hierarchical tree.
A Timeline component is a visual, interactive representation of the hypervideo time that places particular features along a graphical, chronological axis. The timeline is supplied with a slider to indicate the current position and standard buttons to control the presentation playback. Timelines place media elements, meaningful events and links along a timed axis, on different tracks. A Track component is the atomic temporal media representation, showing the active period of the corresponding annotation. The time axis is represented in a relative way since the effective begin, end and duration of a track may not be known, for instance when they are event-based.
The data access components presented in the figure are middleware components with functional interfaces that allow document components to query the data structures (annotations and resources). The AnnotationReader element describes a generic data access component. Among the dedicated data readers we can find:
The spatial model presented in the figure describes the physical layout of the document visual content on the presentation interface. The model is intended to be implemented on top of existing layout paradigms and implementations, such as HTML/CSS.
A VisualComponent is a component with an explicit visual manifestation, whose main information is conveyed by its rendered appearance on the user interface. A non-visual component is usually an audible-only component or a hidden one. The visual components are placed within spatial regions (SpatialRegion element) which may embed other spatial regions. Placements and dimensions can be expressed explicitly or implicitly, absolutely or relatively. The Layout root element determines the placement of these regions on a virtual rendering surface.
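The nesting of regions and the absolute/relative distinction can be illustrated by resolving each region to a box on the rendering surface. This is a sketch under assumptions: the field names, the `relative` flag (values read as fractions of the parent size) and the pixel values are hypothetical, and implicit placement is not covered.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height) on the surface


@dataclass
class SpatialRegion:
    """Hypothetical region whose placement is expressed against its parent."""
    name: str
    x: float
    y: float
    width: float
    height: float
    relative: bool = False                    # True: fractions of the parent
    parent: Optional["SpatialRegion"] = None  # None: root of the layout

    def absolute_box(self) -> Box:
        """Resolve the region to absolute coordinates on the surface."""
        if self.parent is None:
            return (self.x, self.y, self.width, self.height)
        px, py, pw, ph = self.parent.absolute_box()
        if self.relative:
            return (px + self.x * pw, py + self.y * ph,
                    self.width * pw, self.height * ph)
        return (px + self.x, py + self.y, self.width, self.height)


layout = SpatialRegion("layout", 0, 0, 1280, 720)
side = SpatialRegion("side", 0.75, 0.0, 0.25, 1.0, relative=True, parent=layout)
toc = SpatialRegion("toc", 10, 20, 300, 200, parent=side)
```

In an HTML/CSS implementation the same resolution would be delegated to the browser's layout engine.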
The layout and spatial regions have attributes for presentation specification like sound level, default fonts and dimensions. Along with attributes of the contained high-level and lower-level components, this model encapsulates the AHM notion of Channel used for the same purposes.
The document temporal specification is achieved through a timeline-based model. Timeline-based concepts are often used within continuous media editors such as Director and LimSee2. Such models use an explicit time scale as a common temporal reference for all media objects.
A Timeline Reference (TLR) is a virtual time reference attached to a video playback component or to the global document, in order to schedule all the related document components. The time-based components are activated or deactivated when the TLR reaches their timecodes, which are provided or computed by reference to the TLR. The time-independent objects are associated with the global document clock, the top-level reference for all user-declared TLRs.
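The activation rule can be sketched as a function evaluated at each TLR update. The sketch assumes each component has a known scope `(begin, end)` in TLR time and treats the interval as half-open; the function name and the component names are hypothetical.

```python
from typing import Dict, Set, Tuple

Scope = Tuple[float, float]  # (begin, end) in seconds of TLR time


def update_active(
    scopes: Dict[str, Scope], active: Set[str], position: float
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (now_active, to_activate, to_deactivate) for a TLR position."""
    now = {name for name, (begin, end) in scopes.items()
           if begin <= position < end}
    return now, now - active, active - now


scopes = {"caption": (5.0, 10.0), "hotspot": (8.0, 12.0)}

# The TLR reaches 9.0 s: both components become active.
active, started, stopped = update_active(scopes, set(), 9.0)

# The TLR reaches 10.5 s: the caption is deactivated, the hotspot stays.
active_later, started_later, stopped_later = update_active(scopes, active, 10.5)
```

Because the sets are recomputed from the position, the same function also handles jumps caused by navigation.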
A TLR is accessed and controlled through the "position", "state" and "duration" attributes. Position indicates the playback point of the TLR, while State indicates whether the TLR time is in progress, paused or stopped. Duration is a read-only attribute holding the length of the TLR. Any update of the TLR position or state affects the playback of all the related components.
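These three attributes can be sketched as properties of a small class that notifies the related components on each update. The class name, the `subscribe` method and the clamping of the position to the range 0..duration are assumptions of this sketch, not part of the model description.

```python
from enum import Enum
from typing import Callable, List


class State(Enum):
    PLAYING = "playing"
    PAUSED = "paused"
    STOPPED = "stopped"


class TimelineReference:
    """Hypothetical TLR: position and state are writable, duration is not."""

    def __init__(self, duration: float) -> None:
        self._duration = duration
        self._position = 0.0
        self._state = State.STOPPED
        self._listeners: List[Callable[["TimelineReference"], None]] = []

    @property
    def duration(self) -> float:  # read-only: no setter is defined
        return self._duration

    @property
    def position(self) -> float:
        return self._position

    @position.setter
    def position(self, value: float) -> None:
        # Assumption: out-of-range positions are clamped to [0, duration].
        self._position = min(max(value, 0.0), self._duration)
        self._notify()

    @property
    def state(self) -> State:
        return self._state

    @state.setter
    def state(self, value: State) -> None:
        self._state = value
        self._notify()

    def subscribe(self, listener: Callable[["TimelineReference"], None]) -> None:
        """Register a related component to be told about every update."""
        self._listeners.append(listener)

    def _notify(self) -> None:
        for listener in self._listeners:
            listener(self)


tlr = TimelineReference(duration=60.0)
seen = []
tlr.subscribe(lambda ref: seen.append((ref.position, ref.state)))
tlr.state = State.PLAYING
tlr.position = 75.0  # clamped to the duration
```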
Timeline reference data is also used to introduce the notion of hypervideo presentation context. A context captures the hypervideo state at a meaningful point of the presentation, corresponding to document events (user-defined events, time dynamics, navigation, interaction, layout modification, etc.). It is defined by the spatial positions of the components and their playback times and states at the observed instant of each TLR, combined with the global TLR state and position and the layout configuration.
The time scope of a component can be given as absolute timestamps, supplied by the author or retrieved for example from the timecodes of the related annotations, or can be event-based in order to define item-relative relations.
Media synchronization can be hard or soft. A hard synchronization forces the media to stay tightly synchronized with its TLR. A soft synchronization allows the media to slip in time with respect to the TLR; synchronization is then enforced only at some meaningful instants: the start and end of the media playback. During the playback interval, the synchronization is not actively maintained, which gives the presentation great flexibility. In both cases, pausing or stopping the TLR implies pausing or stopping all the related media components.
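The difference between the two modes can be sketched as two functions called at each TLR update. The function names, the 0.1 s drift tolerance and the convention of returning `None` when the media should not be playing are assumptions of this sketch.

```python
from typing import Optional


def hard_sync(media_time: float, tlr_position: float, begin: float,
              tolerance: float = 0.1) -> float:
    """Hard mode: force the media back in step when it drifts too far.

    `begin` is the TLR time at which the media starts; the tolerance value
    is an illustrative assumption.
    """
    expected = tlr_position - begin
    if abs(media_time - expected) > tolerance:
        return expected   # seek the media to where the TLR says it should be
    return media_time     # drift is negligible: leave the media alone


def soft_sync(media_time: float, tlr_position: float,
              begin: float, end: float) -> Optional[float]:
    """Soft mode: only the start and end instants are enforced."""
    if tlr_position < begin or tlr_position >= end:
        return None       # outside the interval: the media is not playing
    return media_time     # inside the interval: slipping is tolerated


corrected = hard_sync(media_time=10.5, tlr_position=22.0, begin=12.0)
tolerated = soft_sync(media_time=10.5, tlr_position=22.0, begin=12.0, end=30.0)
```

With the same half-second drift, the hard mode seeks the media back to 10.0 s while the soft mode lets it continue from 10.5 s.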
De-synchronization of document items is an issue in many scenarios, due for example to remote resource access through an unstable or unpredictable network. Main and added content are treated differently: the document presentation has to stay synchronized with the main video via the TLR. For instance, if the main video pauses for buffering, all the related items are paused. When added content becomes de-synchronized, however, the TLR tries to recover synchronization by re-resolving the content's temporal playback position at each time update.
CHM hypervideo links are defined in space and time and are unidirectional; they are therefore attributes of the source anchors, represented by the Anchor and Hotspot elements. There is no separate link component as found in AHM (SMIL and HTML also do not use separate link components).
A classic hypertext anchor can be defined on a specific region of a textual or graphical component. When placed on a region of a continuous media, with spatial and temporal constraints, it is represented by a HotSpot element. A hotspot is a TLComponent positioned over the video player interface which fires events when activated. Hotspots can be defined through a complex structure to describe a moving region whose location changes over time.
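One possible structure for a moving hotspot is a list of keyframes with linear interpolation between them. This is only an assumed representation chosen for illustration: the model does not prescribe how the moving region is encoded, and the function name and tuple layout are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Keyframe = Tuple[float, float, float]  # (time, x, y), sorted by time


def hotspot_position(keyframes: List[Keyframe],
                     t: float) -> Optional[Tuple[float, float]]:
    """Top-left corner of the hotspot at video time t, or None if inactive."""
    if not keyframes or t < keyframes[0][0] or t > keyframes[-1][0]:
        return None
    times = [k[0] for k in keyframes]
    i = bisect_right(times, t) - 1       # last keyframe at or before t
    if i == len(keyframes) - 1:
        return (keyframes[i][1], keyframes[i][2])
    t0, x0, y0 = keyframes[i]
    t1, x1, y1 = keyframes[i + 1]
    ratio = (t - t0) / (t1 - t0)         # t1 > t >= t0, so never divides by 0
    return (x0 + ratio * (x1 - x0), y0 + ratio * (y1 - y0))


# The region moves right between 10 s and 14 s, then down until 16 s.
keyframes = [(10.0, 100.0, 50.0), (14.0, 300.0, 50.0), (16.0, 300.0, 150.0)]
```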
A link may be internal or external. An internal link leads to a particular video hotspot, an instant of the timeline reference or any other point in the hypermedia space. Activating such a link causes a temporal shift of the presentation by an update of the TLR position. A return behavior can be specified to express whether the presentation will pause, stop, return to the source anchor point or continue from the target one. An external link leads to a foreign anchor expressed by a URL. The target anchor can be displayed in a new window or replace the current content. When such a link is activated, the current presentation can pause, continue or stop playing.
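The internal-link case can be sketched as two steps: a jump on activation, then the return behavior. The sketch assumes the return behavior is applied once the target fragment has finished playing, which is one reading of the description above; all class, field and function names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class ReturnBehavior(Enum):
    PAUSE = "pause"
    STOP = "stop"
    RETURN_TO_SOURCE = "return"
    CONTINUE = "continue"


@dataclass
class Playback:
    """Minimal stand-in for the TLR position and state."""
    position: float
    state: str = "playing"


@dataclass(frozen=True)
class InternalLink:
    target_begin: float
    target_end: float
    on_return: ReturnBehavior


def activate(link: InternalLink, playback: Playback) -> float:
    """Jump to the target and return the source position for later use."""
    source = playback.position
    playback.position = link.target_begin  # temporal shift of the presentation
    playback.state = "playing"
    return source


def finish(link: InternalLink, playback: Playback, source: float) -> None:
    """Apply the return behavior after the target fragment has played."""
    if link.on_return is ReturnBehavior.PAUSE:
        playback.state = "paused"
    elif link.on_return is ReturnBehavior.STOP:
        playback.state = "stopped"
    elif link.on_return is ReturnBehavior.RETURN_TO_SOURCE:
        playback.position = source
    # CONTINUE: keep playing from the end of the target fragment


playback = Playback(position=30.0)
link = InternalLink(120.0, 135.0, ReturnBehavior.RETURN_TO_SOURCE)
source = activate(link, playback)
```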
Unlike AHM, CHM does not rely only on a link-based model to navigate across independent story fragments but also uses an event model to trigger navigation actions. Moreover, different behaviors can be added for anchor and link activation thanks to event definition. For instance, a hotspot can also be used for other visualization needs, for example for displaying a pop-up window on mouse traversal.
The dynamic behavior of a CHM hypervideo is represented by an event-based mechanism, expressed by the Event and Action elements and shown in the figure. Defined as an instantaneous occurrence of interest, an event is triggered by a particular environment change at a specific point in space and time. Event listeners are responsible for detecting the occurrence of events and invoking appropriate treatment procedures called actions. An action can be an atomic instruction that acts on the document or a set of operations which may trigger other events and cause further actions.
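The listener/action mechanism, including an action that triggers a further event, can be sketched as follows. The dispatcher class, its `on` and `fire` methods and the event names are hypothetical and serve only to illustrate the chain from event to action.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Optional

Action = Callable[["EventBus", dict], None]


class EventBus:
    """Hypothetical dispatcher binding event types to actions."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Action]] = defaultdict(list)
        self.log: List[str] = []  # event types, in firing order

    def on(self, event_type: str, action: Action) -> None:
        """Register an action to run whenever event_type occurs."""
        self._listeners[event_type].append(action)

    def fire(self, event_type: str, data: Optional[dict] = None) -> None:
        """Record the event, then invoke every action bound to it."""
        self.log.append(event_type)
        for action in list(self._listeners.get(event_type, [])):
            action(self, data or {})


document = {"paused": False}
opened: List[str] = []


def pause_and_open_popup(bus: EventBus, data: dict) -> None:
    # A compound action: acts on the document, then triggers another event.
    document["paused"] = True
    bus.fire("popupOpened", {"source": data.get("id", "")})


bus = EventBus()
bus.on("hotspotActivated", pause_and_open_popup)
bus.on("popupOpened", lambda b, d: opened.append(d["source"]))
bus.fire("hotspotActivated", {"id": "h1"})
```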
Among the hypervideo domain specific events:
Many actions can be associated with an event: