The Digital Media Project



Philip Merrill


DEU #40 Applying descriptions, ratings, processing and/or governance to a DM at the granularity required by the application




Philip Merrill

Affiliation/additional information:

Active Contributor, Pasadena, California, US

Date submitted:







Name of DEU

Applying descriptions, ratings, processing and/or governance to a DM at the granularity required by the application


Summary description of DEU

Applications permitting End-Users to perform commentary, quotation, and many other critical functions require the ability to address the source DM with sectional references as described in TRU quote. Such sectional references will require differing degrees of granularity depending on the application, and DMP standards should provide consistent methods ensuring that sufficient granularity can be achieved.

For example, a commentator might create a multimedia assemblage such that metadata-structured field sets are generated, able to exist either with the source DM file/bitstream being commented on or else stored in a scattered fashion with either personal or public linking to the source. Specific solutions to enable this should be able to handle selections from a wide range of content processing including but not limited to Resource Type transcoding, transmoding, and the assignment of sectional reference locators bookmarked for additional description(s) to be added subsequently.


Example usages of DEU

TRU WS usages

Three usages were presented at the TRU Workshop, April 26-27, 2004, in Santa Monica, by John Clevenger of Virtual Conservatory, George Kerscher as presented by Robert Martinengo, and Prof. Alan Melby of Brigham Young University.

John Clevenger (DMP 0082) presented a conventional "Director" CD-ROM using destructively edited musical excerpts for music theory auditory skills instruction.

Robert Martinengo presented the need for transmoding content for rendering in formats the disabled can perceive, including the unique hacking rights presently enjoyed by the disabled as a recognized DMCA exception (recognized by the Librarian of Congress).

Prof. Alan Melby (DMP 0060) showed how playlists and edit decision lists in the form of "Electronic Film Reviews" can enable altogether new ways of appreciating audiovisual content. Many university instructors would like a greater freedom to do this but are presently holding back because they don't want to be sued. For example, scenes containing one actor can be viewed in sequence, and browsed between.
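
The playlist and edit-decision-list idea can be illustrated with a minimal sketch. It assumes a hypothetical in-memory format in which each entry gives a start and end time (in seconds) within an unmodified source, plus a label. The names Cut, scenes_with and total_duration are illustrative assumptions and are not taken from the Electronic Film Reviews work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cut:
    """One entry of a hypothetical edit decision list (times in seconds)."""
    start: float
    end: float
    label: str


def scenes_with(edl, label):
    """Return the cuts carrying the given label, ordered by start time."""
    return sorted((c for c in edl if c.label == label), key=lambda c: c.start)


def total_duration(cuts):
    """Sum the running time of a sequence of cuts."""
    return sum(c.end - c.start for c in cuts)


# Illustrative data only: the source file itself is never modified.
edl = [
    Cut(600.0, 660.0, "actor_b"),
    Cut(120.0, 150.0, "actor_a"),
    Cut(30.0, 45.0, "actor_a"),
]

actor_a = scenes_with(edl, "actor_a")
print([(c.start, c.end) for c in actor_a])
print(total_duration(actor_a))
```

Because the list only points into the source, the same film can carry any number of such lists (one per actor, theme or reviewer) without a second copy of the content.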

Asked to come up with requirements, Robert Martinengo suggested by April 29, 2004 e-mail:

Digital media shall not be protected in such a way that blocks access to the content by persons with disabilities.

Digital media shall enable alternate presentations of media for use by persons with disabilities.

Although these point to the challenge, they may not be the best way to approach the correct specification of requirements. Leonardo commented on the first: "Is this the right formulation? The issue should not be that something is or is not protected in a way that blocks access to the content by persons with disabilities, but that persons with disabilities have access to the content." Leonardo commented on the second: "This again exchanges the goal (letting persons with disabilities access content) with the solution (media with different presentations)."

Desirable usages

By e-mail on the DMP reflector, I expressed strong opinions about desirable usages for what I then called "extensible transcodability" including the following (edited excerpts):

"Granular reference of multimedia assemblages such that metadata-structured field sets can be generated and spawned to exist and be edited independently - extensible transcoding and transmoding including XML and MPEG exploitations."

Commenting on Robert Martinengo's proposed requirements:
"I suggest extensible transcodability covers more than these, at a higher level, and could cover disabled users by allowing Audio only or Visual only presentation conversions built in, for example captions, animations and audio-only descriptions or 'readings'. This would also support extensible metadata (e.g., large XML fields of meaningful data) such as a personal diary with linked media quote-bookmarks."

"Here is my effort to define 'extensible transcodability'. Start with Martin on transcoding. Add playlists with timecode and the ability to blog derivative works by extracting data from any piece of digital content. We are not providing security by making extraction hard to do. We are providing tight security independently, so industry use of interoperable interfaces should support some wonderful things. For example, it liberates the black boxes to perform ingeniously cheap design breakthroughs because the box can use dedicated hardwired technology or software only technology or anything in between, always provided it meets conformance standards at the interface. However the data itself really must be plastic enough to be poured into a variety of formats, including text only (e.g., xml with coded video stored as text), audio only and video only. The skeleton of extensibility is important primarily to create derivative blogs and to allow a wide variety of community uses. As far as business models and getting paid for this stuff, we also provide that independently - security and e-commerce are off to the side. The data itself should lend itself to transcoding and extraction, particularly so that the disabled will not be entitled to hack the encryption since audio only or all visual will already be options. The need is to support this format-wise and in choice of technology. Once it has been accommodated it can sink or swim, but it is likely to comprise double digits of digital media consumption. Oh yeah, and accommodating this will make it easier to handle sectional references to support TRU quote."

Leonardo asked: "I understand transcoding, but it is not clear to me what is 'extensible' added to it. Do you mean the ability to add metadata (possibly sophisticated audio and visual metadata à la MPEG-7)?"
"Yes, but I also mean the ability to split the cell so that, for example, extensive personal metadata could be stored with a reduced-resolution version of rich media that exists in one file, while the derived metadata and low-res copy have spawned their own independent file. For example, if this is my commentary and analysis of the movie Spiderman such that the full-screen-resolution copy of Spiderman has restrictive security that my spawned low-res-plus-metadata does not have provided I use it on my home or personal network. By extensibility I mean file independence and the ability to reform the skeleton of sectional references around a newly generated database (aka metadata/XML)."

Craig later said, "Or, are you talking about 'transmoding'? Transmoding is when you take content in one form, e.g. audio and transform it into text or video -> a slide show of still images, etc. There has been a lot of work done in MPEG-21 DIA, Digital Item Adaptation that not only deals in great depth with 'transcoding' but 'transmoding' as well. If you can name an auditory or visual impairment, they've dealt with it. I think they have even dealt with physical impairments such as those affecting how one can interact with a given Digital Item. I could see Transmoding as being different than Transcoding."

More e-mails followed including the way transcoding and transmoding work together with sectional reference - for example the ability to create "hooks to hang things on" by bookmarking points within a piece of DM content at which one user has already placed fields to be filled out with comments for answers to questions by another user.
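
The "hooks to hang things on" idea can be sketched as a small data structure. This is only an illustration under stated assumptions: the class name Hook, the use of seconds as the position unit, and the field names are all hypothetical and are not drawn from any DMP specification.

```python
from dataclasses import dataclass, field


@dataclass
class Hook:
    """A bookmarked point in a DM item with named fields awaiting content."""
    position: float                             # seconds from start (assumed unit)
    fields: dict = field(default_factory=dict)  # field name -> answer, or None

    def unanswered(self):
        """Return the names of fields that nobody has filled in yet."""
        return [name for name, value in self.fields.items() if value is None]


# One user places the hook and its empty fields ...
hook = Hook(position=75.5, fields={"What key is this passage in?": None,
                                   "Comment": None})
# ... and another user fills one of them in later.
hook.fields["Comment"] = "Modulates at the cadence."
print(hook.unanswered())
```

The hook is stored apart from the content it points into, so it can travel with the source or be kept separately and linked to it.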

Keyword Indexing

Research at PARC - called "ScentIndex" [1] - allows custom keyword indexing of ebooks. This is a good example of a usage at the granular level of the textual "word".
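
To make word-level granularity concrete, here is a minimal sketch of a positional word index. It is not the ScentIndex method, whose details are not described here; the function name build_word_index and the choice of 0-based word positions are assumptions made for illustration.

```python
import re
from collections import defaultdict


def build_word_index(text):
    """Map each lowercased word to the list of its word positions (0-based)."""
    index = defaultdict(list)
    for position, match in enumerate(re.finditer(r"\w+", text)):
        index[match.group().lower()].append(position)
    return dict(index)


sample = "The quick fox. The lazy dog."
index = build_word_index(sample)
print(index["the"])
```

Each position acts as a sectional reference at the granularity of a single word, to which a description or comment could later be attached.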

Keyframe identification

A human MPEG video encoder will look over the frames to be data-compressed and identify desirable keyframes for subsequent processing of the "in between" frames based on changes from the keyframe.

Use Case examples for words used in the name of this DEU

  • descriptions — Prof. Melby's example of Electronic Film Reviews
  • ratings — if one could link sectional quotations with the media reviews at
  • processing — if one could quote a screenshot vantage point from a game such that the appropriate data would be stored as a "state" of the game and then rendered (as opposed to just a bitmap screenshot)
  • governance — the ability to quote to selected frames and short video sequences on low-resolution copies of classic or recent films could be expected to include extensive governance procedures to improve security protecting the underlying content


TRUs related to this DEU

1. TRU to quote
2. TRU to make personal copy
13. TRU to annotate for personal use
14. TRU to edit for personal use
18. TRU to apply a rating to a piece of content
26. TRU to transcode
44. TRU of reasonable modification
47. TRU of factual reporting (see Possible Digital Support)
50. TRU of translation
54. TRU of copying for classroom instruction (see Possible Digital Support)
68. TRU to assign content description
72. TRU to access information about content
74. TRU to improve end-user experience
82. TRU of adaptation
88. TRU to make a print of a video scene (repurposing)


Enabling technologies

There has never been a comprehensive solution for this, although W3C DOM addressed the issue for web pages. SMPTE timecode is normally used for audio and video. MIDI uses measure number followed by a subdivisional number (1.000-4.999) that can be accompanied by additional logical fields.
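
As an illustration of frame-level addressing, the sketch below converts a timecode string of the form HH:MM:SS:FF to an absolute frame count and back. It assumes non-drop-frame timecode at an integer frame rate (25 frames per second by default); drop-frame timecode, as used with 29.97 fps video, needs extra handling that is not shown. The function names are illustrative.

```python
def timecode_to_frames(timecode, fps=25):
    """Convert non-drop-frame 'HH:MM:SS:FF' to an absolute frame count."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if frames >= fps:
        raise ValueError("frame field must be less than fps")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def frames_to_timecode(total, fps=25):
    """Convert an absolute frame count back to 'HH:MM:SS:FF'."""
    seconds, frames = divmod(total, fps)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


count = timecode_to_frames("00:01:00:10")
print(count)
print(frames_to_timecode(count))
```

A pair of such frame counts is enough to express a sectional reference into audio or video at single-frame granularity.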

There is an interesting contrast between the way most audio editors, such as Pro Tools and Cubase, handle sectional trimming and working with reusable parts, and the approach of Sony ACID.

The ACID approach was developed by Sonic Foundry of Wisconsin (the maker of Sound Forge) to maximally exploit DirectX and related Windows-only commands, and it runs fast, with fewer performance hits and latencies than most Microsoft software (in other words, extremely well). It provides a panel for browsing the file/folder hierarchy and performs powerful processing operations on the original file source, making it easy to select small loops and put a lot of processing on them, while the source of the fragment remains the entire source file (augmented by some sort of recordkeeping metadata). Duties like trimming, or tempo-shifting without artificial-sounding frequency shifting, are all automated with easy commands, icons and other graphic representations, command key-combinations, etc.

The contrast between Pro Tools/Cubase on the one hand and ACID on the other boils down to what you have to do to get a loop and place it. In short, Pro Tools and Cubase force the user to act destructively on some source file unless "save as" is used, whereas ACID is basically non-destructive, working through some sort of extensible metadata. A fragment/loop in ACID is additional information (including start and stop) about which portion of the source file to use, whereas Pro Tools and Cubase presuppose destructive editing (an exception is Pro Tools looping). An informed user would either start by copying raw audio under a new name in Pro Tools or Cubase, or else export a loop and then quit without saving. A "pool" exists in Cubase listing fragments and loops. ACID, on the other hand, could let one do this too, but it is less effort to use ACID's unique subdividing metadata.
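
The non-destructive idea, a loop as start and stop metadata over an untouched source, can be sketched as follows. This is not ACID's actual implementation, which is not documented here; the class name Region and the use of a plain list as a stand-in for audio samples are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A loop described as metadata: which samples of the source to use."""
    start: int   # first sample index (inclusive)
    stop: int    # end sample index (exclusive)

    def render(self, source):
        """Return the selected samples without altering the source."""
        return source[self.start:self.stop]


source = [0, 1, 2, 3, 4, 5, 6, 7]   # stand-in for raw audio samples
loop = Region(start=2, stop=5)
print(loop.render(source))
print(len(source))                  # the source keeps its full length
```

Any number of regions can refer to the same source, which is what makes this style of sectional reference suitable for quotation and annotation.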

XML is now a developed tool that could be easily offered as an option. For example, raw video as well as annotational metadata commentary can be well represented in XML interface categories/tags. Such representation - capable of being stored on printed paper - could be only one option of a transcodable and convertible logic that could also be burned and hardwired into equipment. Also note Recordare's MusicXML as an instance of progressive uses of XML for digital media representation.
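
A minimal sketch of annotational metadata expressed in XML, using Python's standard xml.etree.ElementTree module. The element and attribute names (annotation, section, comment, source, start, end) and the identifier example-dm-001 are hypothetical and do not come from any DMP, MPEG or MusicXML schema.

```python
import xml.etree.ElementTree as ET


def make_annotation(source_id, start, end, comment):
    """Build a hypothetical annotation element pointing at a section of a source."""
    note = ET.Element("annotation", {"source": source_id})
    ET.SubElement(note, "section", {"start": start, "end": end})
    ET.SubElement(note, "comment").text = comment
    return note


note = make_annotation("example-dm-001", "00:01:00:10", "00:01:05:00", "Key scene.")
xml_text = ET.tostring(note, encoding="unicode")

# The text form can be stored or printed, then parsed back without loss.
parsed = ET.fromstring(xml_text)
print(parsed.find("section").get("start"))
print(parsed.find("comment").text)
```

Since the annotation holds only a reference and a comment, it can be kept in a file of its own, independent of the content it describes.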

MPEG-4 in particular has coped with a variety of different file types and it is suggested that it would be nice to know to what degree granular sectional reference can now be supported and for what formats. Many of the inherently mathematical formats are easily convertible, such as angles, distances and 3D points, but of course only support however many decimal points of subdivision are available in datatypes. A similar survey should be made for most common MIME types including Unicode support for the world's languages.

Of course it is expected that REL would be involved, and this should be convertible to other similar standards in use by groups such as OMA.


Benefits of DEU

Benefits End-Users. This DEU is a powerful enabling technology for many important TRUs that have not been adequately supported, or securely supported (with access restrictions), in the past. Adequate granularity support could open up a bold new level of social interaction for digital media.








[1] also described at