Metadata: What's in a Name
The industry strives for standardization, but how do you standardize the human mind?
By Ken Kerschbaumer -- Broadcasting & Cable, 11/24/2002 7:00:00 PM
Digital-asset management (DAM) is only as good as its metadata, the information associated with a digital asset that defines it and explains what is in the asset.
"Metadata must be organized with a standard vocabulary to enable users to search and locate media assets," says Brian Lay, Harris Automation Systems director of product marketing. "Archives are useless if you can't find what you're looking for."
After all, a metadata tag on the clip is subjective. One person's "riot" is another person's "civil disturbance," and how a clip is tagged has a lot to do with finding it later.
And that gets to the challenge of metadata collection. Ideally, it is about automating the process as much as possible. Using associated scripts and caption information can often help create a fairly comprehensive and accurate list of keywords.
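As a rough illustration of that kind of automation, the Python sketch below pulls candidate keywords from caption text and maps variant words onto one preferred term, so that "riot" and "unrest" are both filed under "civil disturbance." The vocabulary table, the stop-word list and the function name are assumptions invented for this example, not part of any product mentioned here.

```python
import re
from collections import Counter

# Hypothetical controlled vocabulary: variant word -> preferred term.
PREFERRED_TERMS = {
    "riot": "civil disturbance",
    "unrest": "civil disturbance",
    "blaze": "fire",
}

# Small illustrative stop list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "as", "at", "in", "of", "on", "the", "to", "was", "were"}


def extract_keywords(caption_text, limit=5):
    """Return the most frequent normalized terms in caption or script text."""
    words = re.findall(r"[a-z]+", caption_text.lower())
    # Drop stop words, then replace each variant with its preferred term.
    terms = [PREFERRED_TERMS.get(w, w) for w in words if w not in STOP_WORDS]
    return [term for term, _ in Counter(terms).most_common(limit)]


caption = "Police were called as the riot spread. The unrest ended at dawn."
keywords = extract_keywords(caption, limit=3)
print(keywords)
```

Because both variants collapse to the same term, a later search for "civil disturbance" finds the clip no matter which word the caption writer used.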
Workflow for digital-asset management begins when the material is captured, whether in the field with a camera or at a station pulling in a satellite feed. It also extends beyond the video images to the audio, graphics, stills, caption information and even music that may eventually be used in a package.
Where, when, what
Examples of simple metadata include producer or cameraperson comments during the shoot: information like where, when and what. Other information can include use of global positioning systems to provide truly accurate location information. At this point, the metadata is embedded on the tape in the camera or in the tape deck.
At the next step, ingest, the metadata is taken into the server for editing. This is where open standards become important. Unfortunately, it's a challenge for manufacturer and broadcaster alike.
"In general, we see asset-management systems being built around core open technology, but our own efforts to champion a standard like OMM have never met with wide appeal," says David Schleifer, director of Avid Broadcast. "We're delivering APIs into our systems that will allow third parties to achieve a tight level of integration for the custom implementations that we see many of them focusing on."
At ingest, even more metadata is entered, usually elaborating on the clips, noting shot changes, etc. The producers go through the shots ingested and begin logging which shots are good and which are bad. Here, too, the more metadata, the easier it is for the producer to get a handle on what the content is without having to watch all the video. Then, it's off to editing, playout and archiving.
The move to a DAM system also points up one of the new realities of the broadcast industry: IT has become as important as traditional broadcast engineering. Relying fully on computer systems, DAM also taps into several approaches to represent the metadata. In a presentation at the International Broadcasting Convention in September, D.J. Rayers, of the BBC's research and development department, laid out a few metadata format possibilities: XML, MXF, AAF and the UMID.
Search for a standard
"XML is a standard that allows us to represent data in a form that is easily exchanged in files or through the Web," he said in his presentation entitled "Metadata in TV Production. "The parties to the exchange still have to agree what the data is and on a data model as XML only solves the coding and formatting, not the design and agreement problem."
The big advantage of XML, Rayers says, is that it is relatively simple, open and well supported. The disadvantage is that it is inefficient in bandwidth and storage space, although efficiency can be improved with the use of a binary format called BiM, to which XML documents can be converted.
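To make the trade-off concrete, here is a minimal Python sketch that writes a clip record out as XML with the standard library and reads it back. The record and its field names ("title," "shot_date" and so on) are hypothetical; nothing here reflects a schema used by the BBC or any vendor.

```python
import xml.etree.ElementTree as ET

# Hypothetical clip record; field names are for illustration only.
clip = {
    "title": "City hall protest",
    "location": "Springfield",
    "shot_date": "2002-11-24",
    "keywords": "protest; city hall",
}

# Encode each field as a child element of <clip>.
root = ET.Element("clip")
for field, value in clip.items():
    ET.SubElement(root, field).text = value

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# Any XML parser can read the text back, but the receiver must already
# know what "shot_date" means: XML fixes the coding, not the data model.
parsed = ET.fromstring(xml_text)
print(parsed.findtext("shot_date"))

# Rough look at the markup overhead compared with the data itself.
payload_chars = sum(len(v) for v in clip.values())
print(len(xml_text), "characters of XML for", payload_chars, "characters of data")
```

The tags repeat every field name twice, which is where the bandwidth and storage cost comes from.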
MXF, or the Material Exchange Format, allows metadata to be combined directly in a simple file. In addition, says Rayers, it can be streamed, which means that the metadata and content material can be sent as a whole through and between systems as either files or streams.
"It can also be transferred with File Transfer Protocol [FTP], and, in fact, any existing known file-handling technique can carry MXF," says Rayers. "It is probable that MXF will become a key technology in our business."
The key to MXF is the use of the UMID protocol, or Unique Material IDentifier. The UMID serves as a lookup ID number for the metadata, essentially a pointer via a database.
"The advantage is that there is no need, and there may not be space, to hold the metadata directly on a tape or in a data stream," says Rayers. "The disadvantage is that relatively sophisticated database software is required to be operational at many points in the broadcast chain."
That leads to a caveat on the potential of UMID. Rayers notes that it isn't clear that the sophisticated database is a practical possibility in a real, complete broadcast chain. "The UMID may be better exploited initially within smaller subsystems like 'capture to ingest.'"
But the advantage of the UMID, which refers to metadata rather than having the values in the file itself, is clear. For one thing, it allows the value to be administered, checked and corrected at one central point.
Explains Rayers, "If someone's name and address were included as metadata in files, and they subsequently moved houses, then the reference would keep it up-to-date, whereas the value would not. It's important when designing a system to be clear that this behavior is correct for the application under construction."
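Rayers' change-of-address example can be sketched in a few lines of Python. The registry, the identifier "clip-0001" and the helper function are all hypothetical, and a short string stands in for a real UMID, which has its own defined format.

```python
# Hypothetical central registry, keyed by a material identifier.
registry = {
    "clip-0001": {"contributor": "J. Smith", "address": "1 Old Street"},
}

# Approach 1: the file carries its own copy of the value.
file_with_value = {"essence": "video-bytes", "address": "1 Old Street"}

# Approach 2: the file carries only a reference into the registry.
file_with_reference = {"essence": "video-bytes", "material_id": "clip-0001"}


def resolve_address(media_file):
    """Look up the current address through the file's reference."""
    return registry[media_file["material_id"]]["address"]


# The contributor moves house; only the central record is corrected.
registry["clip-0001"]["address"] = "2 New Road"

print(file_with_value["address"])            # the embedded copy is now stale
print(resolve_address(file_with_reference))  # the reference sees the update
```

The cost is the one Rayers names: the lookup only works where the registry is reachable, which is why the embedded copy can still be the right choice for some applications.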
Beyond defining the metadata is the issue of actually searching it. Keyword and category searches are good in some contexts, bad in others. Anyone who has ever used a search engine on the Internet is well-versed in the limits and frustrations. And while search engines for digital-asset management are improving, there is still work to be done.
It appears that XML will play an important part in asset-management integration for the foreseeable future. The advantage is that, as far as standards go, it's well-accepted, and, odds are, asset-management tools from different vendors will work with XML. But that still leaves the challenge of data integration, the establishment of naming conventions and the creation of data fields.
One effort striving for standardizing terms and fields is the Dublin Core Metadata Initiative (DCMI), which was created in 1995 (and named for Dublin, Ohio) and currently has more than 800 people from 45 countries working on a new metadata initiative.
The mission of the DCMI is to develop metadata standards for discovery across domains, define frameworks for the interoperation of metadata sets, and facilitate development of community-specific metadata sets that are consistent with the first two goals.
As it stands, the Dublin Core has defined 10 attributes from the ISO/IEC 11179 standard for the description of data elements (see box above).
The DCMI is working closely with other development communities, including W3C, the RDF and XML developer communities, and says that providing guidance for the encoding of Dublin Core metadata in HTML, XML and RDF is critical to the success of the Initiative.
But the challenge of getting the message out remains. As is the case with most such efforts, it depends on good old-fashioned meetings and word of mouth.
One of the attractive aspects of DCMI is that, of the 10 attributes, six are common to all Dublin Core elements: version, registration authority, language, obligation, datatype and maximum occurrence. So that reduces the elements that can change to four: name, identifier, definition and comment.
That gets to the core goal of the Dublin Core: simplicity. It's intended to be usable by non-catalogers as well as by resource-description specialists. Most of the elements have commonly understood semantics of roughly the complexity of a library catalog card.
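One way to picture the split between shared and variable attributes is a small Python data class in which the six common attributes carry defaults and only the other four are filled in per element. The class name is invented for this sketch, and the default values and sample definitions are assumptions that should be checked against the published Dublin Core element set.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ElementDescription:
    # The four attributes that vary from element to element.
    name: str
    identifier: str
    definition: str
    comment: str = ""
    # The six attributes shared by every element (values are assumptions).
    version: str = "1.1"
    registration_authority: str = "Dublin Core Metadata Initiative"
    language: str = "en"
    obligation: str = "Optional"
    datatype: str = "Character String"
    maximum_occurrence: str = "Unlimited"


SHARED = ["version", "registration_authority", "language",
          "obligation", "datatype", "maximum_occurrence"]

# Sample elements; the definitions are illustrative wording.
title = ElementDescription(
    name="Title",
    identifier="Title",
    definition="A name given to the resource.",
)
creator = ElementDescription(
    name="Creator",
    identifier="Creator",
    definition="An entity primarily responsible for making the content of the resource.",
)

print(len(fields(ElementDescription)), "attributes in total")
print(all(getattr(title, a) == getattr(creator, a) for a in SHARED))
```

Describing a new element then means supplying four values, not 10, which is the simplicity the initiative is after.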
The DCMI effort is not the only one delving into metadata standards. The BBC has created its own system, called Standard Media Exchange Framework (SMEF), which it will be sharing with other broadcasters. Another is MPEG-7, the "Multimedia Content Description Interface."
In the end, however, the key may be in making sure that whatever system is in place, the metadata is built around plain human language. No matter which way the standards zig or zag, one thing is certain: Odds are great that the best words to use in cataloging an element will be the best words for finding that element tomorrow.