The case for shared metadata standards
FAQs for those of us who don't know what that means

If you haven't followed recent developments in metadata (few have!) and didn't read Steven Vedro's article "Why metadata matters" in Current last September, you may feel as if you've missed a meeting on the subject. Consider this a make-up class. This article was commissioned by CPB.

Originally published in Current, May 13, 2002
By Mary Jane McKinven

"Metadata" sounds like the name of a mascot in "Star Wars." The word is starting to pop up in public broadcasting with alarming frequency, along with a Chewbacca-sized phrase, "Digital Asset Management" (DAM). Chances are that you will run into these two characters, Metadata and DAM, before too long. So by way of introduction, here are a few FAQs (Frequently Asked Questions) about them. There are those among you--the kind of person who knows what "Dublin Core" means--who may find this too elementary. Fine. The rest of you, keep reading.

What is metadata?

Very simply, metadata is a way of organizing data about data, or information used to retrieve information. A bibliographic record such as a library catalog card is metadata about a book; the nutritional label on a soup can is metadata about the soup. Because it uses a set of standardized fields and a "controlled vocabulary" to fill those fields, the nutritional label is a metadata model or "scheme"; because that scheme is universally accepted, it can also be considered a "metadata standard." The Dewey Decimal System and ZIP codes are other examples of metadata standards.
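If it helps to see the idea in code, here's a toy sketch of what a scheme boils down to: a fixed set of fields, some of which may be filled only from a controlled vocabulary. Everything here is invented for illustration--the field names aren't drawn from any real broadcast or library standard.

```python
# A toy metadata "scheme": required fields plus a controlled vocabulary.
# All names are invented for illustration, not taken from any standard.

CONTROLLED_GENRES = {"news", "documentary", "how-to", "performance"}
REQUIRED_FIELDS = {"title", "producer", "air_date", "genre"}

def validate_record(record):
    """Return a list of problems with a metadata record (empty if it conforms)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in record:
            problems.append("missing required field: " + field)
    genre = record.get("genre")
    if genre is not None and genre not in CONTROLLED_GENRES:
        problems.append("'%s' is not in the controlled vocabulary" % genre)
    return problems

record = {"title": "Asparagus Week", "producer": "WXYZ",
          "air_date": "2002-05-13", "genre": "how-to"}
print(validate_record(record))  # prints [] -- the record conforms
```

A standard, in this picture, is simply everyone agreeing to use the same fields and the same vocabulary.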

In public broadcasting we're most familiar with metadata for completed programs: a program's NOLA code, its rights and releases, descriptions of its content, and so on. But production components (or "chunks") also need metadata. Components that could be coded and retrieved according to a metadata scheme include footage and audio, full interview transcripts, stock footage clips and stills. Their usefulness extends beyond the broadcast: companion websites are now offering such program elements as full interviews.

But to date there is no systemwide agreement, or standard, about how completed programs or production components will be classified and sorted.

OK, got that. So what's DAM?

DAM helps you know what digital content you have that is of value, where it is, what form it is in and how to get hold of it. As you know, going digital means that everything--a video clip, an audio track, a still image--can be reduced to ones and zeros, fed through a wire or the air, and reassembled at the other end, whether on a TV screen, computer or Rio player. Call it MP3 or call it JPEG, it's all information that can now be accessed and manipulated to an unprecedented degree--but only if that information is tagged in a fashion that allows it to be found. Hence the need for metadata.

For broadcasters, the DAM process consists of: digitizing or "ingesting" text, audio, video, stills, etc. at various bit rates; naming and describing this content so that it can be cataloged or indexed; storing the content and the information associated with it on tape or server technology; browsing and searching both, via an intranet or web portal; and, finally, retrieving the content as needed via streaming, tape or other means. Whew.
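For the programmers in the audience, that whole pipeline can be caricatured in a few lines. This is a toy sketch with invented names--not any vendor's actual system:

```python
# A toy sketch of the DAM pipeline: ingest, catalog, store, search,
# retrieve. All class and method names are invented for illustration.

class AssetStore:
    def __init__(self):
        self.assets = {}    # asset_id -> content (bytes, a file path, etc.)
        self.catalog = {}   # asset_id -> metadata record

    def ingest(self, asset_id, content, metadata):
        """Digitize/ingest: store content and its metadata together."""
        self.assets[asset_id] = content
        self.catalog[asset_id] = metadata

    def search(self, **criteria):
        """Browse and search the catalog, not the content itself."""
        return [aid for aid, meta in self.catalog.items()
                if all(meta.get(k) == v for k, v in criteria.items())]

    def retrieve(self, asset_id):
        """Deliver the content once the catalog has located it."""
        return self.assets[asset_id]

store = AssetStore()
store.ingest("clip-001", b"...video bytes...",
             {"type": "stock footage", "subject": "dairy cow"})
print(store.search(subject="dairy cow"))  # prints ['clip-001']
```

Notice that the search never touches the video itself--only the metadata. That's the whole argument for naming and describing content carefully at ingest time.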

Video indexing and retrieval is still a relatively new field, but fortunately people are developing ways to automate the job. There are now "ingestion" systems that can search video using closed captions, on-screen text or speech recognition. Visitors to www.pbs.org can search the transcripts of The NewsHour with Jim Lehrer and Julia Child Lessons with Master Chefs and call up video segments on their topics of interest (the video plays from the point at which the specific word occurs). Higher-level systems can convert speech into searchable text, or do scene searches that return the first and last frames of a scene. Researchers are also working on video-search and database tools that will be able to find images directly.
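There's no magic behind a feature like that: each word of the transcript carries a time code, so a text match can be turned into a playback point. Here's a toy sketch--the data and function are invented, not PBS's actual system:

```python
# Toy sketch of transcript-based video search: each transcript word
# carries a time code (in seconds), so a text hit becomes a point
# at which the player can start the video.

transcript = [
    (0.0, "tonight"), (0.6, "on"), (0.8, "the"), (1.0, "newshour"),
    (2.5, "metadata"), (3.1, "standards"),
]

def find_playback_points(query, transcript):
    """Return the time codes at which the query word is spoken."""
    query = query.lower()
    return [t for t, word in transcript if word == query]

print(find_playback_points("metadata", transcript))  # prints [2.5]
# The player would then seek to 2.5 seconds and begin playback there.
```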

Is anyone in public broadcasting actually doing DAM?

Public broadcasting already has asset management systems of sorts--notebooks, tape libraries, edit lists, traffic software and even some databases. But these repositories of data aren't remotely integrated or accessible: they do not allow media to flow seamlessly throughout a station, let alone from Macs to PCs, or out through e-mail to constituents and customers. You may already have the appropriate technology for retrieval and use of digital media sitting on your desk in the form of a PC, but that dream scenario of flowing content can only work if there are agreed-upon metadata standards so that all parts of the system understand each other. The bottom line: you need a metadata standard to make DAM work.

The first major DAM initiatives in public television are taking place at major producing stations--notably WGBH, with 160,000 hours of program assets--and at stations with university connections that are encouraging content integration, such as Wisconsin PTV and KUED in Salt Lake City. South Carolina ETV and Kansas City's KCPT have been pioneers in making digital content accessible for education. As manager of the public radio satellite system, NPR is developing a service model for content distribution, the "Content Depot," which ultimately would allow stations to access programs on request. Public Radio International (PRI) is piloting a DAM infrastructure project with several stations to explore its potential use in program development and the industry's need for new methods of distribution.

Wait a minute--does anyone really want all this digital content?

As Antiques Roadshow demonstrates each week, you just never know what people will value in the future. But the most fundamental reason you can't just keep pumping out content without regard for its digital value is that audiences are slowly but relentlessly changing their behavior toward information.

The future of any modern media organization will depend on how well it can customize content for individual consumers. We are heading toward an on-demand future, when people will map their own media journeys and assemble their own content streams. No one is quite sure when we will reach the tipping point, but when we do, we will need to be able to provide a critical mass of "hits" to users. And we can't do that without a shared metadata standard.

The good news is that no one really thinks public broadcasting should digitize 50 years' worth of archival content. The challenge is to identify those assets that still have value. How you treat new content from this point forward raises another set of issues: What's the best scheme for the workflow in your station and your customer base?

How would a metadata model work at my station?

In a Platonically Perfect World (PPW), all the appropriate metadata about a program would be digitally pushed or pulled to where it would be most useful at your station: into your traffic and automation software, your print and Internet program guides, your promotion and underwriting departments, your website, and so forth. We're still a long way from that PPW. But it's possible to imagine a digital platform on which a program, its website, its promotion and other communications are built in a truly interactive manner, without unnecessary duplication.
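To see why that saves work, imagine one canonical program record feeding each department's format. The record and the formatting functions below are invented for illustration:

```python
# Sketch of "enter once, reuse everywhere": one canonical metadata
# record is formatted differently for different departments.
# All names and values are invented for illustration.

program = {
    "title": "Asparagus Week, Program 4",
    "air_datetime": "2002-05-13 20:00",
    "description": "Roasting, grilling, and the perfect hollandaise.",
}

def guide_listing(p):
    """Format for the print or Internet program guide."""
    return "%s  %s" % (p["air_datetime"], p["title"])

def promo_blurb(p):
    """Format for the promotion department."""
    return "Don't miss %s: %s" % (p["title"], p["description"])

print(guide_listing(program))
print(promo_blurb(program))
# A schedule change is made once in the record, not retyped in
# every department's paperwork.
```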

One current and very important destination for metadata is a TV station's PSIP (Program and System Information Protocol) encoder, which creates metadata that rides along with a DTV signal. That metadata set includes channel identification, program descriptions, program ratings, caption information, etc. Without it, digital set-top boxes don't know what channel is coming in, and electronic program guides can't find your programming.
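Conceptually, each event in a PSIP stream is a small record along these lines. The field names below are invented for illustration; the real PSIP tables are compact binary structures defined by the ATSC standard, not anything this readable:

```python
# A rough sketch of the kind of per-program information PSIP carries
# alongside a DTV signal. Field names are invented for illustration.

from dataclasses import dataclass

@dataclass
class PsipEvent:
    major_channel: int   # the "11" in channel 11.1
    minor_channel: int   # the "1" in channel 11.1
    title: str
    description: str
    rating: str          # content advisory, e.g. "TV-G"
    captioned: bool      # whether caption service is present

tonight = PsipEvent(11, 1, "Antiques Roadshow",
                    "Appraisals from the road.", "TV-G", True)
# A set-top box reads records like this to identify the channel and
# fill in the electronic program guide.
```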

How could DAM help my station in my community?

As a matter of survival, public radio and television stations have been urged to consider themselves "digital libraries" or "telecommunications centers" and to forge new partnerships in their communities. But as long as stations continue to be mainly over-the-air broadcasters, their wealth of intellectual content cannot be easily repurposed for community use. Digital asset management and a consistent metadata scheme allow content to be repurposed with a minimum of additional resources. There may be new revenue opportunities in liberating your assets; however, for public broadcasting the true value may be in a re-energized public service role.

But my station doesn't have a lot of original program assets. Why should I join the push for a shared metadata standard?

The Internet has been compared to a library where all the books are strewn randomly on the floor. If that's the case, then public broadcasting is like that library--with most of the books' pages glued shut. The system needs to think about program content as an information stream that must be coded if users are ever going to find it.

There are currently several large projects related to DAM under way in public broadcasting that will affect your station, no matter what its size. And the success of all of them will depend upon public broadcasting coming to some shared understandings about metadata and DAM.

The first is replacement of public television's interconnection system. The PBS satellite lease is up in 2006, but planning requires that public TV complete a design this year so that it can request federal funding and notify satellite carriers by a 2004 deadline whether it intends to renew its lease. Since the interconnection system is the backbone of content distribution activities, it will need uniform DAM agreements or protocols.

There is also PBS' Digital Television Advanced Traffic and Programming Project (DTV-ATP), a.k.a. Orion. When planners began this effort to replace the home-grown PBS traffic software known as NOLA several years ago, it quickly became apparent that stations needed more and better content information. The project grew into a plan to provide a common database at PBS, accessible to others. The plan is being realized piece by piece, with some aspects held back until technology catches up and costs come down. But it's clear that Orion will have to be interoperable with any major asset management initiative in public television.

NPR's Content Depot is another DAM project of far-reaching impact. Like Orion, it's a long-haul project that will advance in stages to enhance program delivery and scheduling.

The ultimate vision for the Content Depot is integrated access to all the services provided by the public radio satellite system, including on-demand archives for subscribers.

A television effort that will ultimately rely upon shared metadata is the ADDE (Advanced Digital Distribution Entity), a mechanism proposed to allow stations to share master control, digital storage and other facilities, reducing duplicative operational costs and freeing scarce resources to improve services to their local communities. CPB is supporting a regional prototype involving several Northwest stations and institutions, a project spearheaded by Dennis Haarsager of KWSU, a.k.a. the "ADDE Daddy." Any future ADDEs will require content storage and communication consistent with other asset management systems.

This sounds rather Big Brother-like. Aren't public stations too different for one metadata standard to work for all?

Public broadcasting stations vary widely in nature and content, so the idea of "one metadata or content classification scheme fits all" may chafe. But consider your choices:

Option A: You [the station] can devise your own method of organizing information. In doing so, you will isolate your program assets, limit your options and create more work for yourself, as well as additional profits for the vendors you hire to custom-tailor software for your "unique" situation.

Option B: You can participate in a shared metadata standard that allows you to more easily communicate about your content with other stations and national organizations, create records that are predictable and reliable, and take advantage of an industrywide format that assures data will still be compatible across platforms as computer technology advances.

"A" may seem more in the tradition of public broadcasting, but "B" would seem the wiser choice for the future.

What's holding us up, then? Why can't we just start sharing our assets?

You've got to start somewhere. Agreeing upon metadata standards is an essential and relatively inexpensive first step to take. According to many observers, the greatest obstacle to agreement may be public broadcasting's culture, which resists standardization in any form. But that resistance is now a luxury the system can't afford.

I find it vaguely depressing to think of so much content being endlessly recycled. Is this Groundhog Day come true?

The adoption of a shared metadata standard doesn't mean the end of original production--just the end of duplicative production. Shrinking production budgets are a fact of this new media age, as channels split finer and finer. Whether or not programs will ever be labeled like recycled paper products--"This program is made up of 10 percent postbroadcast recycled material"--recycling is already a fact of modern production life. A metadata standard would make it far less laborious for producers to find specific content.

DAM conceivably could generate revenue for stations that aren't currently in the business of footage sales. If there's a shared metadata standard and delivery mechanism, it may be your "cow shot"--to borrow a phrase from Wisconsin PTV's Byron Knight--that an advertiser buys for its dairy commercial. To paraphrase a famous cartoon, on the Internet no one knows you're a small station.

This brave new world will never come to pass--digital video storage is too expensive, right?

Yes, and a computer used to fill a room. If technology has taught us anything, it's that it eventually gets faster and cheaper. It's true that current digital storage is not cost-effective for large quantities of video material. But according to PBS's John Tollefson, PBS has looked at systems that store from one to three petabytes (a petabyte is one million gigabytes) of content--the equivalent of five to 15 years of new PBS content. Within two years, these systems will be available at costs competitive with current digital tape drive storage.

But won't copyright issues kill you?

Yes, they're daunting. But that won't prevent the future from arriving. The biggest corporations in America are discovering that litigation and regulation are not effective long-term strategies to protect their work, and are now launching services that will provide authorized access to content online. (For more on this issue, see "Hollywood Goes Digital, Like it or Not" by Charles C. Mann on the Frontline website.)

So what should we be doing now?

Ideally, stations contemplating DAM systems should be talking with each other; many already are. According to Alison White of CPB Television Operations, the corporation is encouraging collaboration by building an asset management website (soon to be launched), and supporting special meetings on this complex topic.

The CPB Future Fund is also supporting the creation of a public broadcasting Metadata Working Group. The project is being administered by WGBH, which has been active in this arena, and includes representatives from many disciplines.

Not surprisingly, library and industry groups are already far down the road on their own metadata schemes, and the Working Group is being advised by many experts from these communities. At the group's first meeting on April 24 and 25, participants agreed on several key points, including the urgent need to set metadata standards and the requirement that any metadata scheme be scalable to different licensees.

Even if your station isn't already engaged in DAM work, there are things you can do--and not do, according to CPB's White. She advises that it's probably not the time to rush into major purchasing decisions associated with DAM until there's greater system agreement on methods and models. But you should be assessing how well prepared your station is for information exchange. You could ask yourself, for example, how many separate databases you're using, White suggests. Who determines what terms are used, and who is affected? Stations can lay the groundwork for DAM by encouraging staff discussion about program information and workflow.

At this point you are probably worrying about what DAM will cost to implement. Candidly, there aren't yet any clear revenue models for DAM. CPB will soon engage a small group of station and national personnel to begin to address how to pay for asset management activities. Stay tuned. CPB will also shortly be launching a website devoted exclusively to DAM issues.

Let's presume this working group eventually comes up with metadata standards for public broadcasting. Who's going to make sure that all stations use them?

No one can compel stations to use them. But members of the new Metadata Working Group want the process of arriving at a metadata standard to serve the needs of all stations, small and large. In the long run, market forces may indeed bring about compliance: those that don't employ metadata standards will find themselves isolated and resource-poor.

So what is the "Dublin Core," anyway?

This is your reward for reading this far: use of the phrase "Dublin Core" will mark you as a real metadata maven. Only true cognoscenti know that "Dublin" refers to "Dublin, Ohio" rather than "Dublin, Ireland"--the Ohio town being the site of a 1995 meeting of librarians and computer types who locked themselves in a room until they came out with a core set of standards for resource description.

The "Dublin Core" is a set of 15 elements, or attributes, that can be applied to many forms of media. The group has now become the Dublin Core Metadata Initiative (DCMI), an international and cross-disciplinary effort, and their metadata scheme has been widely adopted by broadcasters and other institutions abroad. Most licensees who are seriously addressing the metadata issue, for example, Minnesota Public Radio, KUED, WGBH, et al., are using the Dublin Core as a starting point.

Mary Jane McKinven, a communications consultant in Washington, D.C., previously worked as director of science and explorations programming at PBS. She says metadata appeals to her "inner librarian." E-mail: [email protected].

Earlier article: "Why metadata matters: it greases digital wheels" by Steven Vedro, Sept. 10, 2001.

