Let Me Keep My Metadata!

Posted by on Jul 25, 2005 | 2 Comments

If you buy a (conventional audio) CD, you expect to find basic
information about the music printed on the disc and/or the
paper insert:

  • Year published (copyright date)
  • Song titles
  • Performers/composers
  • (maybe) lyrics and other descriptive information

If you buy an MP3 file, most of this metadata should be embedded in
the file in addition to the sounds. Programs like iTunes and
Musicmatch that organize libraries of MP3s depend on the metadata.
Sometimes these programs ask the user to provide some of the metadata.
(These programs mostly ignore the file name.
“MyFavoriteSong.mp3” is not a reliable indication of what song
is in the file!)

Fifty years ago, if you took a snapshot, you could expect the
photo processor to put the processing date on the prints and/or slides.
You might pencil a description on the back of the print or on the border of
the slide.

Today, if you take a digital photo, you can expect your camera
to put all sorts of metadata in the JPEG file: date, time,
exposure settings, focus settings, pixel counts
(e.g., 2048×1536), etc. Sometime in the future, though, you or
someone else might want to know things the camera did not record: the names of the people in the picture, where it was taken, etc.

MP3 and JPEG files are analogous:

  • Both depend on esoteric compression algorithms to reduce huge data files
    to relatively small files without a loss of quality that a non-expert
    would perceive.
  • Both are immensely popular on the Internet, in portable devices, on personal computers, etc.
  • Correspondingly, both have inspired prodigious numbers of
    software projects, from obscure individual programmers to
    multinational giants such as Microsoft.

Beyond the obvious difference that MP3 is for audio and JPEG is for still
images, a friend argues that the second biggest difference is that
JPEG metadata must be provided by individuals (though this difference fades
as more and more individuals create MP3s, e.g., for podcasts). In my
opinion, the second biggest difference is that MP3 efforts have been
enormously successful (ignoring the copyright wars), while JPEG software
has been relatively (but not totally) unsuccessful.

Why? MP3 software is targeted at typical end users. How many people listen to music?
JPEG software seems mostly targeted at very sophisticated users.
How many people understand f-stops and ISO film speeds?

Metadata, the arcane additional information stored in an MP3 file
to describe the audio or in a JPEG file to describe the image,
has been treated radically (and unnecessarily) differently in the
MP3 and JPEG worlds:

  1. With MP3, the metadata (ID3 and its competitors) started out simple, almost too
    simple. ID3 and its competitors have evolved slowly enough to
    become de facto standards.
    The original ID3 tag (ID3v1) totaled 128 bytes, allowing for

    • Track Name – e.g., the song title
    • Artist Name
    • Album Name
    • Year
    • Comment

    and one byte for genre (“blues,” “classic
    rock,” “country,” etc.). Metadata for JPEG, on the other hand,
    started out more comprehensive and leapt forward without apparent
    consensus.
    Where a small set of analogs of the above would have been a great starting
    point, there is a plethora of fields, each consuming as many bytes as
    needed. For example, instead of a single “Track Name” there
    are fields for “Headline,” “Location,”
    multiple variants of “Title,” and other fields that potentially
    name the image.
    There are just shy of a dozen ways to describe the “Creator.”
    There are at least two fields for “Description.” Get the
    “picture”?
    See JPEG captions and more,
    EXIF, and
    IPTC IIM for some of
    the background. If you do pursue these sources, pay attention to
    the complexity and redundancy.

  2. As a consequence of (1), MP3 software tends to be relatively
    consistent in its handling of metadata. Further, lots of
    utilities are readily available for manipulating MP3 metadata.
    JPEG software, on the other hand, often ignores the metadata
    entirely. The few programs that do notice the metadata handle it
    in different ways, making the metadata unreliable at best and useless
    at worst.
  3. Digital cameras typically include reasonable starting values for the
    metadata and store them in the photo file. Things start to fall apart
    when the file leaves the camera.
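To show just how simple the original MP3 scheme is, here is a minimal Python sketch of reading an ID3v1 tag. It assumes the file's bytes are already in memory and that the tag is the standard 128-byte block at the very end, beginning with the letters “TAG”. The function name parse_id3v1 is hypothetical (not from any particular library), and the genre table is deliberately truncated to the three examples mentioned above.

```python
import struct
from typing import Optional

# Abbreviated genre table (hypothetical subset); the real ID3v1 list is longer.
GENRES = {0: "Blues", 1: "Classic Rock", 2: "Country"}

# Layout of the 128-byte tag: "TAG"(3) title(30) artist(30) album(30)
# year(4) comment(30) genre(1). "3x" skips the "TAG" marker.
ID3V1_FORMAT = "<3x30s30s30s4s30sB"


def parse_id3v1(data: bytes) -> Optional[dict]:
    """Return the ID3v1 fields from the end of an MP3 file's bytes, or None."""
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[:3] != b"TAG":
        return None  # no ID3v1 tag present

    title, artist, album, year, comment, genre = struct.unpack(ID3V1_FORMAT, tag)

    def text(field: bytes) -> str:
        # Fields are NUL- (or space-) padded; Latin-1 is the usual encoding.
        return field.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(title),
        "artist": text(artist),
        "album": text(album),
        "year": text(year),
        "comment": text(comment),
        "genre": GENRES.get(genre, f"genre #{genre}"),
    }
```

For a real file, something like `parse_id3v1(open(path, "rb").read())` would do. Five fixed-width text fields and one byte: that is the whole scheme, which goes a long way toward explaining why MP3 software handles it consistently.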

I have tried numerous pieces of photo-oriented software (Windows,
Mac and Open Source) and several photo-oriented Web sites.
With few exceptions, these experiences have been very disappointing.
The software and Web sites not only ignore existing metadata that they
could put to good use; they usually discard it altogether. Aargh!

The two most promising sources of commercial
software for JPEG metadata seem to be Adobe and JASC.
Even these make it harder to find/edit the metadata than I would like,
but at least they have some usable provisions for handling metadata.
Google’s Picasa seems to comprehend some of the more interesting
metadata, but then seems to store changes to the metadata in a private
database, rather than the JPEG file.

Flickr seems to recognize some
interesting metadata when a file is uploaded, but after that point
seems to do things its own way. It does appear that the paid subscription
version of Flickr allows downloading of the original files, contrary to
what I said before. I have not yet paid for a subscription, so I have not
tested this.

Some programming languages have classes or other support for JPEG metadata.
Of these, PHP seems to be the most comprehensive and interesting.
Since PHP is widely used for Web sites, the PHP support seems
especially encouraging.
See Metadata Toolkit Example.
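Even finding where a JPEG keeps its camera metadata takes more work than reading an ID3v1 tag. EXIF data lives in an APP1 segment (marker 0xFFE1) whose payload begins with “Exif” followed by two NUL bytes. The Python sketch below uses only the standard library; find_exif_segment is a hypothetical helper, not PHP's (or any library's) API. It walks the JPEG segment markers and returns the raw EXIF bytes, or None if they are missing. It does not decode the TIFF-structured fields inside; that is the part the language-level metadata support takes care of.

```python
import struct
from typing import Optional


def find_exif_segment(jpeg: bytes) -> Optional[bytes]:
    """Return the raw EXIF (TIFF-structured) bytes from a JPEG, or None."""
    if jpeg[:2] != b"\xff\xd8":  # SOI marker: every JPEG starts with it
        raise ValueError("not a JPEG (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        if marker == 0xFF:  # optional fill byte before a marker
            pos += 1
            continue
        if marker in (0xD9, 0xDA):  # EOI or SOS: metadata segments come earlier
            break
        # The big-endian segment length includes its own two bytes.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        payload = jpeg[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return payload[6:]  # strip the "Exif\0\0" header
        pos += 2 + length  # skip marker (2 bytes) plus segment
    return None
```

Running this on a photo before and after it passes through an editor or Web site is a quick way to check whether the camera's metadata survived the trip.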

Consistent handling of metadata seems key to the ability to
identify, sort, search, and share digital photos.
Those are the next things to talk about.

“They Took My JPEGs! & won’t give ’em back!” (July 16, 2005)

“They Took My Kodachrome!” (July 2, 2005)

“Don’t take my Kodachrome away” – Paul Simon (1973)

“They took our jobs!” – South Park (April 28, 2004)

  • Bob Hall

    You should update the metadata thing. I am referring only to JPEGs and TIFFs. Lightroom does OK, but Bridge (in CS2 and CS3) is powerful. Microsoft has PhotoInfo, which is very slick combined with Image Viewer. The only way to PRINT metadata alongside the photo appears to be Qimage, a terrific printing package at a reasonable price.

  • Bob Hall

    I miss you guys on TechTV. What a shame that went away.