The data model defined in the previous sections must be serialized to be exchanged. To this end, two serialization formats have been specified, in order to improve interoperability between applications that use the Cinelab data model. The first format is based on XML, while the second uses the JSON syntax. In addition, a Zip-based serialization has been specified to make it more convenient to store large, non-text resources.
To ease identification of Cinelab packages, applications SHOULD use the following file extensions: .cxp for plain XML files (Cinelab XML Package), .czp for compressed files (Cinelab Zip Package), .cjp for JSON files (Cinelab JSON Package).
XML does not offer a standard way to handle large binary objects such as images, application files, etc. Moreover, plain XML files can become very large. The same arguments apply to the JSON syntax. We thus use an OpenDocument-like format to store the XML representation of a package together with its associated binary files, and to compress this content. The file is a standard Zip file, whose structure is described below.
Information about the files present in the package is stored in an XML manifest file. It is always stored as META-INF/manifest.xml. Its main data is
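As an illustration, the extension convention can be turned into a small lookup. This is a minimal sketch: the function name guess_serialization and the returned labels are assumptions made for the example, only the three extensions come from this document.

```python
from pathlib import PurePosixPath

# Extension-to-format table taken from the recommendation above.
CINELAB_EXTENSIONS = {
    ".cxp": "xml",   # Cinelab XML Package
    ".czp": "zip",   # Cinelab Zip Package
    ".cjp": "json",  # Cinelab JSON Package
}


def guess_serialization(filename):
    """Return 'xml', 'zip' or 'json', or None for an unknown extension.

    Hypothetical helper: the comparison is case-insensitive, which is an
    assumption of this sketch and not a requirement of the specification.
    """
    suffix = PurePosixPath(filename).suffix.lower()
    return CINELAB_EXTENSIONS.get(suffix)
```

Since the extensions are only a SHOULD, an application may still need to inspect the file content when the lookup returns None.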
The package is serialized as a .zip file, using the same layout and principles as the OpenDocument format (see pp. 684-692 of http://www.oasis-open.org/committees/download.php/12572/OpenDocument-v1.0-os.pdf ).
General layout:
Contents (for annotations, relations, views, etc.) can either be stored directly in the XML file, or externalized in the data/ directory (cf. OpenDocument p. 686).
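The following sketch shows how an application might open a Zip-based package with the Python standard library. Only the manifest path and the data/ directory come from this document; the helper names, the placeholder manifest body and the member data/a2.png are assumptions made for the example.

```python
import io
import zipfile

MANIFEST_PATH = "META-INF/manifest.xml"  # location stated in this document


def read_manifest(package_file):
    """Return the raw bytes of the manifest of a Zip-based package."""
    with zipfile.ZipFile(package_file) as archive:
        if MANIFEST_PATH not in archive.namelist():
            raise ValueError("manifest is missing: " + MANIFEST_PATH)
        return archive.read(MANIFEST_PATH)


def list_externalized_contents(package_file):
    """List the archive members stored under the data/ directory."""
    with zipfile.ZipFile(package_file) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.startswith("data/") and not name.endswith("/")
        )


def build_demo_package():
    """Build a tiny in-memory archive for demonstration purposes.

    The manifest body is a placeholder, not the real manifest vocabulary,
    and data/a2.png is a hypothetical externalized content.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(MANIFEST_PATH, "<manifest/>")
        archive.writestr("data/a2.png", b"\x89PNG")
    buffer.seek(0)
    return buffer
```

Both readers accept a file name or a file-like object, so the same code can be used on a .czp file on disk.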
In a given file contained in a package, relative URIs are used to reference other files of the same package, but also to reference other files of the filesystem. The following restrictions are imposed for internal references:
* only files of the same package can be referenced internally
* URIs referencing another file of the same package MUST be relative and MUST NOT contain paths that are not part of the package. This notably means that files in a package MUST NOT be referenced through an absolute URI.
* a file in a package cannot be referenced from the outside of the package (either from the filesystem or another package)
A relative path present in a file contained in a package must be resolved exactly as it would be if the package were uncompressed into a directory with the same basename as the package. The base URI of a relative path is the URI of the directory containing the file in which the relative path appears.
For instance, userfiles/foo.txt references a user file (a package resource), while ../file.txt gives access to a file in the same directory as the package.
Any other URI reference, specifically one that specifies a protocol (e.g. http:), an authority (i.e. //) or an absolute path (i.e. /), does not need any specific processing. This means that absolute paths do not reference files inside the package, but files inside the hierarchy (most of the time the filesystem) containing the package.
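The resolution rules above can be sketched as a small classifier. This is an illustration under stated assumptions: the function name resolve_reference, the three result labels and the file name content.xml are hypothetical, and the package is modelled as a directory whose root is the empty path.

```python
import posixpath
from urllib.parse import urlsplit


def resolve_reference(reference, containing_file):
    """Classify a URI reference found in `containing_file`.

    `containing_file` is the path, inside the package, of the file holding
    the reference. Returns one of:
      ("unprocessed", reference)  protocol, authority or absolute path
      ("internal", path)          path of a file inside the package
      ("outside", path)           path relative to the package directory,
                                  pointing outside of it
    """
    parts = urlsplit(reference)
    if parts.scheme or parts.netloc or parts.path.startswith("/"):
        # No package-specific processing is needed for these references.
        return ("unprocessed", reference)
    # The base is the directory containing the file holding the reference.
    base_dir = posixpath.dirname(containing_file)
    resolved = posixpath.normpath(posixpath.join(base_dir, parts.path))
    if resolved == ".." or resolved.startswith("../"):
        return ("outside", resolved)
    return ("internal", resolved)
```

For example, resolve_reference("../file.txt", "content.xml") is classified as pointing outside the package, which matches the second example given in the text.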
A graphical, iconic representation of the document MAY be generated when the file is saved. It should be a representation of the default view for the package, and should be generated without effects, frames or borders.
The icon is saved as Thumbnails/thumbnail.png. The file and containing directory are not mentioned in the manifest.xml file, since they are not really part of the document.
In accordance with the Thumbnail Managing Standard (TMS) (cf. www.freedesktop.org), icons MUST be saved as 24-bit PNG files, non-interlaced, with complete alpha transparency. The required size is 128x128 pixels.
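Most of these constraints can be checked from the PNG header (the IHDR chunk) alone. The sketch below is an illustration, not a reference implementation. It assumes that "24-bit with complete alpha transparency" means 8 bits per channel with an alpha channel (PNG color type 6); this reading is an assumption of the example. The helper minimal_png_prefix builds only a signature and an IHDR chunk, not a complete image, and exists solely to exercise the check.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_header(data):
    """Return (width, height, bit_depth, color_type, interlace) from PNG bytes."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height, bit_depth, color_type, _compression, _filter, interlace = (
        struct.unpack(">IIBBBBB", data[16:29])
    )
    return width, height, bit_depth, color_type, interlace


def thumbnail_problems(data):
    """Return a list of human-readable problems, empty if none was found."""
    width, height, bit_depth, color_type, interlace = read_png_header(data)
    problems = []
    if (width, height) != (128, 128):
        problems.append("size is %dx%d, expected 128x128" % (width, height))
    if interlace != 0:
        problems.append("image is interlaced")
    # Assumption of this sketch: 8-bit RGBA (color type 6) is expected.
    if bit_depth != 8 or color_type != 6:
        problems.append("expected 8-bit RGBA (color type 6)")
    return problems


def minimal_png_prefix(width, height, bit_depth=8, color_type=6, interlace=0):
    """Build a PNG signature followed by an IHDR chunk (no image data)."""
    ihdr = struct.pack(">IIBBBBB", width, height, bit_depth, color_type, 0, 0, interlace)
    crc = zlib.crc32(b"IHDR" + ihdr) & 0xFFFFFFFF
    return PNG_SIGNATURE + struct.pack(">I", len(ihdr)) + b"IHDR" + ihdr + struct.pack(">I", crc)
```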
Cf OpenDocument spec, p. 687
The encoding of XML serialisation MUST be UTF-8.
In accordance with the model, package metadata MUST contain the following keys: dc:creator, dc:created, dc:contributor, dc:modified. In package elements, these metadata may be omitted from the serialisation, and are then inherited (since they must be available in the model) using the following rules:
In the example XML file, multiple commented cases are proposed.
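The inheritance cases commented in the example XML file can be sketched as follows. The rules coded here are inferred from those comments only: dc:contributor falls back to the element's own dc:creator, dc:modified to the element's own dc:created, and everything else to the package metadata. The fallback of dc:modified to the package value when dc:created is absent is an assumption made by symmetry, and the function name effective_meta is hypothetical.

```python
# Package-level metadata, copied from the example XML file.
PACKAGE_META = {
    "dc:creator": "pchampin",
    "dc:created": "2010-09-01T12:33:53.403508",
    "dc:contributor": "oaubert",
    "dc:modified": "2010-09-06T12:33:53.420459",
}


def effective_meta(element_meta, package_meta):
    """Return the element metadata completed with inherited values."""
    meta = dict(element_meta)
    # dc:contributor and dc:modified first look at the element's own
    # dc:creator and dc:created, then at the package (assumed rule).
    for key, source in (("dc:contributor", "dc:creator"),
                        ("dc:modified", "dc:created")):
        if key not in meta:
            meta[key] = element_meta.get(source, package_meta[key])
    # dc:creator and dc:created are inherited from the package.
    for key in ("dc:creator", "dc:created"):
        meta.setdefault(key, package_meta[key])
    return meta
```

Applied to the "shots" annotation type of the example, which only gives dc:creator and dc:created, this yields a dc:contributor of oaubert and a dc:modified equal to its dc:created.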
The package pm:namespaces metadata is specifically processed: it is encoded in the XML root element as xmlns attributes.
To make the generated XML easier to read, some metadata specified in the application model are encoded as attributes instead of plain metadata (type for annotations and relations), or as elements (annotation-type, relation-type, schema). See the RelaxNG below for more information.
The compact RelaxNG notation is used to specify the proposed format: cinelab.rnc
# Advene XML format description
# RNC Tutorial: http://relaxng.org/compact-tutorial-20030326.html
# Cinelab Application Model:
default namespace = "http://advene.org/ns/cinelab/"
namespace cam = "http://advene.org/ns/cinelab/"
# Dublin Core model
namespace dc = "http://purl.org/dc/elements/1.1/"
# XML Schema datatypes
datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"
grammar {
start = element package {
## id_attribute &
## Not really an ID, but something is needed to *suggest* ids for
## dynamic imports or application identifier (for the web server).
## Should probably be dynamically extracted from the URI instead.
uri_attribute?
&
element medias {
element media {
id_attribute &
url_attribute &
attribute unit { "ms" | "frame" }? & # defaults to ms
attribute origin { xsd:long }? & # defaults to 0
tags_element? &
element meta {
common_meta_elements &
element duration { xsd:long }? &
uri_meta_element?
}?
}*
}?
&
element imports {
element import {
attribute id { import-identifier } &
url_attribute &
uri_attribute? &
tags_element? &
element meta {
common_meta_elements
}?
}*
}?
&
element annotations {
element annotation {
id_attribute &
attribute media { identifier-ref } &
attribute begin { xsd:long } &
attribute end { xsd:long } &
content_element &
tags_element? &
element meta {
common_meta_elements &
element type { id-ref_attribute }
}?
}*
}?
&
element relations {
element relation {
id_attribute &
content_element? &
element members {
element member { id-ref_attribute }*
} &
tags_element? &
element meta {
common_meta_elements &
element type { id-ref_attribute }
}?
}*
}?
&
element tags {
element tag {
id_attribute &
tags_element? &
element imported-elements {
element \element { id-ref_attribute }*
}? &
element meta {
common_meta_elements &
constraint_meta_element?
}?
}*
}?
&
element annotation-types {
element annotation-type { type_structure }*
}?
&
element relation-types {
element relation-type { type_structure }*
}?
&
element lists {
element \list { list_structure }*
}?
&
element schemas {
element schema { list_structure }*
}?
&
element queries {
element query {
id_attribute &
content_element &
tags_element? &
element meta {
common_meta_elements &
constraint_meta_element?
}?
}*
}?
&
element views {
element view {
id_attribute &
content_element &
tags_element? &
element meta {
common_meta_elements &
constraint_meta_element?
}?
}*
}?
&
element resources {
element resource {
id_attribute &
content_element &
tags_element? &
element meta { common_meta_elements }?
}*
}?
&
element external-tag-associations {
element association {
attribute \element { identifier-ref } &
attribute tag { identifier-ref }
}*
}?
&
element meta {
element dc:creator { text } &
element dc:contributor { text } &
element dc:created { xsd:dateTime } &
element dc:modified { xsd:dateTime } &
element * -
(dc:creator | dc:contributor | dc:created | dc:modified | cam:*) {
id-ref_attribute | text
}*
}
}
##
## Reusable elements and structures
##
## tags_element is used in all elements.
tags_element = element tags {
element tag { id-ref_attribute }*
}
## content_element is used in annotation, relation, queries, views &
## resources.
## A content has a mimetype, and defines its data either through a
## reference to an external resource, or through its #DATA section.
content_element = element content {
attribute mimetype { text }?, # defaults to text/plain
attribute encoding { "base64" }?, # only encoding supported
(url_attribute | text)
}
## type_structure defines the common structure of the following elements:
## annotation-type, relation-type
type_structure =
id_attribute &
tags_element? &
element meta {
common_meta_elements &
constraint_meta_element? &
element representation { TALESstring }? &
element element-color { TALESstring }? &
element content-mimetype { text }? &
element content-model { id-ref_attribute }?
}
## list_structure defines the common structure of the following elements:
## list, schema
list_structure =
id_attribute &
element items {
element item { id-ref_attribute }*
}? &
tags_element? &
element meta {
common_meta_elements
&
constraint_meta_element?
}
##
## meta-data related elements and structures
##
common_meta_elements =
element dc:creator { text }? &
element dc:contributor { text }? &
element dc:created { xsd:dateTime }? &
element dc:modified { xsd:dateTime }? &
# Almost any element can define a color
element color { TALESstring }? &
# Allow to have user-defined metadata items
element * - (dc:creator | dc:contributor | dc:created | dc:modified |
cam:*) {
id-ref_attribute | text
}*
constraint_meta_element = element element-constraint {
# Reference an existing view/test
id-ref_attribute
}
uri_meta_element = element uri { xsd:anyURI }
##
## reusable attributes
##
## NB: it would have been sensible to use 'href' instead of 'url', which is
## common practice to hold a link (HTML, XLink), but since the API uses
## 'url', it seemed a better idea to keep the XML format consistent with the
## API
url_attribute = attribute url { xsd:anyURI }
uri_attribute = attribute uri { xsd:anyURI }
id_attribute = attribute id { identifier }
id-ref_attribute = attribute id-ref { identifier-ref }
##
## attribute special datatypes
##
## Naming identifiers
identifier = xsd:string {
pattern = "([a-zA-Z_][a-zA-Z0-9_\-]*|:[a-zA-Z0-9_:\-]*)"
}
## import identifiers have additional restrictions
import-identifier = xsd:ID { pattern = "[a-zA-Z_][a-zA-Z0-9_\-]*" }
## Identifier references allow linked elements (beginning with : )
identifier-ref = xsd:string {
pattern = "([a-zA-Z_:][a-zA-Z0-9_:\-]*)"
}
## a regexp for TALESstrings would be overly complex, so: A
## TALES-string is a character string which can contain TALES
## expressions embedded with the ${...} notation. For example, the
## TALES expression foo/bar alone is represented by the TALES
## string ${foo/bar}
TALESstring = text
}
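The three identifier datatypes of the grammar can be checked with ordinary regular expressions. The patterns below are copied from the grammar; the function names are assumptions of this sketch. XML Schema patterns implicitly match the whole value, hence the use of fullmatch.

```python
import re

# Patterns copied from the RelaxNG grammar above.
IDENTIFIER = re.compile(r"([a-zA-Z_][a-zA-Z0-9_\-]*|:[a-zA-Z0-9_:\-]*)")
IMPORT_IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z0-9_\-]*")
IDENTIFIER_REF = re.compile(r"([a-zA-Z_:][a-zA-Z0-9_:\-]*)")


def is_identifier(value):
    """True if `value` is acceptable as an element id."""
    return IDENTIFIER.fullmatch(value) is not None


def is_import_identifier(value):
    """True if `value` is acceptable as an import id (no colon allowed)."""
    return IMPORT_IDENTIFIER.fullmatch(value) is not None


def is_identifier_ref(value):
    """True if `value` is acceptable as a reference to an element."""
    return IDENTIFIER_REF.fullmatch(value) is not None
```

Note that a prefixed name such as cam:foo is a valid reference but not a valid id, since an id may only contain colons when it starts with one.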
An example of conforming XML is given below.
<package xmlns="http://advene.org/ns/cinelab/" xmlns:dc="http://purl.org/dc/elements/1.1/">
<meta>
<dc:creator>pchampin</dc:creator>
<dc:created>2010-09-01T12:33:53.403508</dc:created>
<dc:contributor>oaubert</dc:contributor>
<dc:modified>2010-09-06T12:33:53.420459</dc:modified>
<dc:description>Example Cinelab package</dc:description>
<dc:title>Nosferatu analysis</dc:title>
<default_utbv xmlns="http://www.advene.org/ns/advene/">start_view</default_utbv>
</meta>
<imports>
<import id="cam" url="http://liris.cnrs.fr/advene/cam/bootstrap" />
</imports>
<annotation-types>
<annotation-type id="free-text-annotation">
<tags>
<tag id-ref="important" />
<tag id-ref="todo" />
</tags>
<meta>
<dc:modified>2010-09-02T12:33:53.416368</dc:modified>
<!-- dc:creator, dc:created are inherited from the package -->
<!-- dc:contributor is inherited *from the package*, as no dc:creator
is explicitly specified here -->
<color>#55ff55</color>
<element-color>${here/tag_color}</element-color>
<element-constraint id-ref=":constraint:free-text-annotation" />
<dc:description>Shot layout of the movie</dc:description>
<dc:title>Shots</dc:title>
</meta>
</annotation-type>
<annotation-type id="shots">
<meta>
<dc:created>2010-09-02T12:33:53.414772</dc:created>
<dc:creator>oaubert</dc:creator>
<!-- dc:contributor, dc:modified are inherited from dc:creator and
dc:created, respectively -->
<element-constraint id-ref=":constraint:shots" />
</meta>
</annotation-type>
</annotation-types>
<tags>
<tag id="important">
<meta>
<color>#00ff00</color>
<dc:created>2010-09-02T12:33:53.407836</dc:created>
<dc:creator>oaubert</dc:creator>
<dc:modified>2010-09-03T12:33:53.409026</dc:modified>
<!-- dc:contributor is inherited from dc:creator -->
<dc:description>Important things to note</dc:description>
<dc:title>Important</dc:title>
</meta>
</tag>
<tag id="todo">
<meta>
<dc:contributor>pchampin</dc:contributor>
<dc:modified>2010-09-03T12:33:53.406964</dc:modified>
<!-- dc:creator and dc:created are inherited from the package -->
<color>#ff4444</color>
<dc:description>Things to work on</dc:description>
<dc:title>TODO</dc:title>
</meta>
</tag>
</tags>
<medias>
<media id="m1" url="/data/video/Nosferatu.avi" origin="0" unit="ms">
<meta>
<uri>http://liris.cnrs.fr/advene/videos/baz.avi</uri>
<dc:contributor>oaubert</dc:contributor>
<dc:created>2010-09-06T12:33:53.404347</dc:created>
<dc:creator>oaubert</dc:creator>
<dc:modified>2010-09-06T12:33:53.404904</dc:modified>
</meta>
</media>
</medias>
<annotations>
<annotation begin="1230" end="4560" id="a1" media="m1">
<content mimetype="text/plain">{ 'num' : 1, 'title': 'Introduction', 'characters': [ 'john doe', 'jane doe' ] }</content>
<meta>
<type id-ref="free-text-annotation" />
<dc:contributor>oaubert</dc:contributor>
<dc:created>2010-09-06T12:33:53.417550</dc:created>
<dc:creator>oaubert</dc:creator>
<dc:modified>2010-09-06T12:33:53.420459</dc:modified>
</meta>
</annotation>
<annotation begin="1230" end="4560" id="a3" media="m1">
<content encoding="base64" mimetype="application/json" />
<meta>
<type id-ref="shots" />
<dc:contributor>oaubert</dc:contributor>
<dc:created>2010-09-06T12:33:53.419975</dc:created>
<dc:creator>oaubert</dc:creator>
<dc:modified>2010-09-06T12:33:53.419975</dc:modified>
</meta>
</annotation>
<annotation begin="4560" end="7890" id="a2" media="m1">
<content encoding="base64" mimetype="image/png" />
<meta>
<type id-ref="free-text-annotation" />
<dc:contributor>oaubert</dc:contributor>
<dc:created>2010-09-06T12:33:53.418975</dc:created>
<dc:creator>oaubert</dc:creator>
<dc:modified>2010-09-06T12:33:53.418975</dc:modified>
</meta>
</annotation>
</annotations>
<views>
<view id=":constraint:free-text-annotation">
<content mimetype="application/x-advene-type-constraint">mimetype=application/json</content>
<meta>
<dc:contributor>oaubert</dc:contributor>
<dc:created>2010-09-06T12:33:53.410127</dc:created>
<dc:creator>oaubert</dc:creator>
<dc:modified>2010-09-06T12:33:53.416718</dc:modified>
</meta>
</view>
<view id=":constraint:shots">
<content mimetype="application/x-advene-type-constraint" />
<meta>
<dc:contributor>oaubert</dc:contributor>
<dc:created>2010-09-06T12:33:53.414208</dc:created>
<dc:creator>oaubert</dc:creator>
<dc:modified>2010-09-06T12:33:53.414208</dc:modified>
</meta>
</view>
</views>
</package>
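As an illustration of how such a file can be consumed, the sketch below extracts the annotations with the Python standard library. The function name list_annotations and the SAMPLE document are assumptions of the example; SAMPLE is a trimmed fragment that keeps only what the function reads, and is therefore not a complete package by itself (the package meta element is omitted).

```python
import xml.etree.ElementTree as ET

# Namespace of the Cinelab XML format, as declared in the grammar.
NS = {"cam": "http://advene.org/ns/cinelab/"}


def list_annotations(xml_text):
    """Return one dictionary per annotation found in the package."""
    root = ET.fromstring(xml_text)
    result = []
    for node in root.findall("cam:annotations/cam:annotation", NS):
        # The annotation type is serialized as a 'type' element in 'meta'.
        type_node = node.find("cam:meta/cam:type", NS)
        result.append({
            "id": node.get("id"),
            "media": node.get("media"),
            "begin": int(node.get("begin")),
            "end": int(node.get("end")),
            "type": type_node.get("id-ref") if type_node is not None else None,
        })
    return result


# Trimmed fragment used for demonstration only.
SAMPLE = """<package xmlns="http://advene.org/ns/cinelab/">
  <annotations>
    <annotation begin="1230" end="4560" id="a1" media="m1">
      <content mimetype="text/plain">hello</content>
      <meta><type id-ref="free-text-annotation" /></meta>
    </annotation>
  </annotations>
</package>"""
```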
The JSON serialization has been defined to facilitate the exchange of package information in web-based contexts.
The encoding of JSON serialisation MUST be UTF-8.
To make the generated JSON easier to read, some metadata specified in the application model are encoded as attributes instead of plain metadata (type for annotations and relations), or as elements (annotation-type, relation-type, schema).
The package is represented by a JSON object with the following properties:
In accordance with the model, package metadata MUST contain the following keys: creator, created, contributor, modified. In package elements, these metadata may be omitted from the serialisation, and are then inherited (since they must be available in the model) using the following rules:
The following JSON file provides an example package.
{
"format": "http://advene.org/ns/cinelab/",
"imports": [{
"id": "acav",
"url": "http://acav.dailymotion.com/std-schemas-v1.cjp"
}],
"medias": [{
"id": "video",
"url": "http://www.dailymotion.com/video/xdg0h0",
"meta": {
"title": "Ben se fait des films"
}
}],
"annotation_types": [
{
"id": "Character",
"meta": {
"description": "Appearance of the main characters.",
"content_mimetype": "application/json",
"content_model": { "id_ref": "Character_model" }
}
},
{
"id": "Supernatural",
"meta": {
"description": "An appearance of something supernatural."
}
}
],
"resources": [
{
"id": "Character_model",
"content": {
"data": {
"enum": ["Dracula", "Jonathan", "Nina", "Reinfield"]
}
}
},
],
"tags": [
{
"id": "funny"
},
{
"id": "scary"
}
],
"annotations": [
{
"id": "a1",
"type": "Supernatural",
"media": "video",
"begin": 1234,
"end": 5678,
"content": {
"data": "a flying toaster"
},
"tags": [ "funny" ]
},
{
"id": "a2",
"type": "acav:TimedText",
"media": "video",
"begin": 1234,
"end": 5678,
"content": {
"mimetype": "application/json",
"data": {
"text": "ceci est un sous-titre",
"style": "font-size: 120%"
},
"model": "acav:TimedText_model"
},
"tags": [ "funny", "scary" ]
},
{
"id": "a3",
"type": "Character",
"media": "video",
"begin": 234,
"end": 567,
"content": {
"data": "\"Nina\""
},
"tags": [ "funny" ]
}
],
"meta": {
"creator": "Pierre-Antoine Champin",
"created": "2011-06-09T07:25:43",
"contributor": "Pierre-Antoine Champin",
"modified": "2011-06-09T07:25:43"
}
}
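A consumer of this format typically loads the package and follows the references between elements. The sketch below groups annotation ids by type and checks that each type reference can be resolved. It assumes, from the "acav:TimedText" reference in the example and from the id_ref pattern of the schema, that a "prefix:name" reference points into the import whose id is the prefix. The function name and the SAMPLE document are assumptions of the example.

```python
import json


def index_annotations_by_type(package_text):
    """Map each annotation type reference to the ids of its annotations."""
    package = json.loads(package_text)
    local_types = {t["id"] for t in package.get("annotation_types", [])}
    imports = {i["id"] for i in package.get("imports", [])}
    index = {}
    for annotation in package.get("annotations", []):
        type_ref = annotation["type"]
        if ":" in type_ref and not type_ref.startswith(":"):
            # Assumed convention: "prefix:name" refers to an imported package.
            prefix = type_ref.split(":", 1)[0]
            if prefix not in imports:
                raise ValueError("unknown import in reference: " + type_ref)
        elif type_ref not in local_types:
            raise ValueError("unknown annotation type: " + type_ref)
        index.setdefault(type_ref, []).append(annotation["id"])
    return index


# Reduced version of the example package, for demonstration only.
SAMPLE = """{
  "format": "http://advene.org/ns/cinelab/",
  "imports": [{"id": "acav", "url": "http://acav.dailymotion.com/std-schemas-v1.cjp"}],
  "annotation_types": [{"id": "Character"}],
  "annotations": [
    {"id": "a2", "type": "acav:TimedText", "media": "video",
     "begin": 1234, "end": 5678, "content": {"data": "x"}},
    {"id": "a3", "type": "Character", "media": "video",
     "begin": 234, "end": 567, "content": {"data": "y"}}
  ]
}"""
```

The function does not load the imported package itself; it only checks that the prefix names a declared import.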
Two JSON-Schema schemas are proposed: a general schema, and a stricter schema that does not allow additional undefined properties to be added to elements.
We include below the more permissive schema:
{
"description":"Cinelab JSON Package (CJP)",
"version": "1.0",
"$schema" : "http://json-schema.org/draft-03/schema#",
"id" : "http://advene.org/ns/cinelab/cjp#",
"type":"object",
"properties":{
"__definitions": {
"description": "A placeholder for reusable subschemas",
"type": [
{
"id": "#id_ref",
"type": "string",
"pattern": "^([a-zA-Z_][a-zA-Z0-9_\\-]*:)?([a-zA-Z_][a-zA-Z0-9_\\-]*|:[a-zA-Z0-9_:\\-]*)$"
},
{
"id": "#strict-id_ref",
"type": "string",
"pattern": "^[a-zA-Z_][a-zA-Z0-9_\\-]*:([a-zA-Z_][a-zA-Z0-9_\\-]*|:[a-zA-Z0-9_:\\-]*)$"
},
{
"id": "#TALESstring",
"type": "string",
"description": "a regexp for TALESstrings would be overly complex, so: A TALES-string is a character string which can contain TALES expressions embedded with the ${...} notation. For example, the TALES expression foo/bar alone is represented by the TALES string ${foo/bar}"
},
{
"id": "#proto-content",
"type": "object",
"properties": {
"mimetype": {
"description": "TODO: better regex for mimetypes?",
"type": "string",
"pattern": "^[-+a-z0-9]+/[-+a-z0-9]+$",
"default": "text/plain"
},
"model": { "$ref": "#id_ref" }
}
},
{
"id": "#content-with-url",
"extends": [{ "$ref": "#proto-content" }],
"properties": {
"url": {
"type": "string",
"format": "uri",
"required": true
},
"data": {
"disallow": "any"
},
"encoding": {
"disallow": "any"
}
}
},
{
"id": "#content-with-data",
"extends": [{ "$ref": "#proto-content" }],
"properties": {
"data": {
"type": ["string", "object"],
"required": true
},
"encoding": {
"enum": ["base64"]
},
"url": {
"disallow": "any"
}
}
},
{
"id": "#content",
"type": [
{ "$ref": "#content-with-url" },
{ "$ref": "#content-with-data" }
]
},
{
"id": "#meta",
"type": "object",
"properties": {
"creator": {
"type": "string"
},
"created": {
"type": "string",
"format": "date-time"
},
"contributor": {
"type": "string"
},
"modified": {
"type": "string",
"format": "date-time"
}
},
"additionalProperties": {
"type": ["object", "string", "number", "boolean"],
"properties": {
"id_ref": { "$ref": "#id_ref" }
}
}
},
{
"id": "#element",
"type": "object",
"properties": {
"id": {
"type": "string",
"pattern": "^([a-zA-Z_][a-zA-Z0-9_\\-]*|:[a-zA-Z0-9_:\\-]*)$",
"required": true
},
"tags": {
"type": "array",
"items": { "$ref": "#id_ref" }
},
"meta": {
"extends": [{ "$ref": "#meta" }],
"properties": {
"color": { "$ref": "#TALESstring" }
}
}
}
},
{
"id": "#element-with-content",
"extends": [{ "$ref": "#element" }],
"properties": {
"content": {
"extends": [{ "$ref": "#content" }],
"required": true
}
}
},
{
"id": "#type",
"extends": [{ "$ref": "#element" }],
"properties": {
"meta": {
"properties": {
"content_mimetype": { "type": "string" },
"content_model": { "type": "object" },
"element_constraint": { "type": "object" },
"representation": { "$ref": "#TALESstring" },
"elementColor": { "$ref": "#TALESstring" }
}
}
}
},
{
"id": "#list",
"extends": [{ "$ref": "#element" }],
"properties": {
"items": {
"type": "array",
"items": { "$ref": "#id_ref" }
},
"meta": {
"properties": {
"element_constraint": { "type": "object" }
}
}
}
}
]
},
"format": {
"enum": [ "http://advene.org/ns/cinelab/" ],
"required": true
},
"@context": {
"type": "object",
"patternProperties": {
"[a-zA-Z_][a-zA-Z0-9-_.]*": {
"type": "string",
"format": "uri"
}
}
},
"@": {
"type": "string",
"format": "uri"
},
"imports": {
"type": "array",
"items": {
"extends": [{ "$ref": "#element" }],
"properties": {
"id": {
"type": "string",
"pattern": "^[a-zA-Z_][a-zA-Z0-9_\\-]*$"
},
"url": {
"type": "string",
"format": "uri",
"required": true
},
"uri": {
"type": "string",
"format": "uri"
}
}
}
},
"medias": {
"type": "array",
"items": {
"extends": [{"$ref": "#element"}],
"properties": {
"url": {
"type": "string",
"format": "uri",
"required": true
},
"unit": {
"type": "string",
"enum": ["ms", "frame"],
"default": "ms"
},
"origin": {
"type": "integer",
"default": 0
},
"meta": {
"properties": {
"duration": {
"type": "integer"
},
"uri": {
"type": "string",
"format": "uri"
}
}
},
"frame_of_reference": {
"type": "string",
"format": "uri"
}
}
}
},
"annotations": {
"type": "array",
"items": {
"extends": [{ "$ref": "#element-with-content" }],
"properties": {
"type": {
"extends": [{ "$ref": "#id_ref" }],
"required": true
},
"media": {
"extends": [{ "$ref": "#id_ref" }],
"required": true
},
"begin": {
"type": "integer",
"required": true
},
"end": {
"type": "integer",
"required": true
}
}
}
},
"relations": {
"type": "array",
"items": {
"extends": [{ "$ref": "#element" }],
"properties": {
"type": {
"extends": [{ "$ref": "#id_ref" }],
"required": true
},
"members": {
"type": "array",
"items": { "$ref": "#id_ref" }
},
"content": { "$ref": "#content" }
}
}
},
"tags": {
"type": "array",
"items": {
"extends": [{ "$ref": "#element" }],
"properties": {
"imported_elements": {
"type": "array",
"items": { "$ref": "#strict-id_ref" }
},
"meta": {
"properties": {
"element_constraint": { "type": "object" }
}
}
}
}
},
"annotation_types": {
"type": "array",
"items": { "$ref": "#type" }
},
"relation_types": {
"type": "array",
"items": { "$ref": "#type" }
},
"lists": {
"type": "array",
"items": { "$ref": "#list" }
},
"schemas": {
"type": "array",
"items": { "$ref": "#list" }
},
"queries": {
"type": "array",
"items": {
"extends": [{"$ref": "#element-with-content"}],
"properties": {
"meta": {
"properties": {
"element_constraint": { "type": "object" }
}
}
}
}
},
"views": {
"type": "array",
"items": {
"extends": [{ "$ref": "#element-with-content" }],
"properties": {
"meta": {
"properties": {
"element_constraint": { "type": "object" }
}
}
}
}
},
"resources": {
"type": "array",
"items": { "$ref": "#element-with-content" }
},
"tagging": {
"type": "array",
"items": {
"type": "object",
"properties": {
"element": {
"extends": [{ "$ref": "#strict-id_ref" }],
"required": true
},
"tag": {
"extends": [{ "$ref": "#strict-id_ref" }],
"required": true
}
}
}
},
"meta": {
"extends": [{ "$ref": "#meta" }],
"properties": {
"creator": { "required": true },
"created": { "required": true },
"contributor": { "required": true },
"modified": { "required": true }
}
}
}
}
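Where no JSON-Schema validator is available, the required properties of the schema can be checked by hand. The sketch below covers only a few of them (the format value, the four package metadata keys, and the required properties of annotations); it is an illustration and does not replace validation against the schema. The function name check_package is an assumption of the example.

```python
CINELAB_FORMAT = "http://advene.org/ns/cinelab/"
REQUIRED_META = ("creator", "created", "contributor", "modified")
REQUIRED_ANNOTATION = ("id", "type", "media", "begin", "end", "content")


def check_package(package):
    """Return a list of error messages for an already-parsed JSON package."""
    errors = []
    if package.get("format") != CINELAB_FORMAT:
        errors.append("format must be " + CINELAB_FORMAT)
    meta = package.get("meta", {})
    for key in REQUIRED_META:
        if key not in meta:
            errors.append("meta." + key + " is required")
    for position, annotation in enumerate(package.get("annotations", [])):
        for key in REQUIRED_ANNOTATION:
            if key not in annotation:
                errors.append("annotations[%d].%s is required" % (position, key))
    return errors
```

An empty list means that none of the checked constraints is violated; patterns, types and the constraints on content are not checked here.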