Let Me Keep My Metadata!
If you buy a (conventional audio) CD, you expect to find basic
information about the music printed on the disc and/or the
paper insert:
- Year published (copyright date)
- Song titles
- Performers/composers
- (maybe) lyrics and other descriptive information
If you buy an MP3 file, most of this metadata should be embedded in
the file in addition to the sounds. Programs like iTunes and
Musicmatch that organize libraries of MP3s depend on the metadata.
Sometimes these programs ask the user to provide some of the metadata.
(These programs mostly ignore the file name.
“MyFavoriteSong.mp3” is not a reliable indication of what song
is in the file!)
50 years ago, if you took a snapshot, you could expect the
photo-processor to put the processing date on the prints and/or slides.
You might pencil a description on the back of the print or the border of
the slide.
Today, if you take a digital photo, you can expect your camera
to put all sorts of metadata in the JPEG file: date, time,
exposure settings, focus settings, pixel counts
(e.g., 2048×1536), etc. Sometime in the future, you or
someone else might want to know the names of people in the picture, where it was taken, etc.
MP3 and JPEG files are analogous:
- Both depend on esoteric compression algorithms to reduce huge data files
to relatively small files, without any loss of quality that a
non-expert would perceive.
- Both are immensely popular on the Internet, in portable devices, personal computers, etc.
- Correspondingly, both have inspired prodigious numbers of
software projects, from obscure individual programmers to
mega-multinational companies, e.g., Microsoft.
Beyond the obvious difference that MP3 is for audio and JPEG is for still
images, a friend argues that the second biggest difference is that the
JPEG metadata must be provided by individuals (though as more and more
individuals create MP3s, e.g., for podcasts, this difference
fades). The second biggest difference, in my opinion, is that
MP3 efforts have been enormously successful (ignoring the copyright wars)
and JPEG software has been relatively (but not totally) unsuccessful.
Why? MP3 software is targeted at typical end users. How many people listen to music?
JPEG software seems mostly targeted at very sophisticated users.
How many people understand F-stops and ISO film speeds?
Metadata, the arcane additional information stored in an MP3 file
to describe the audio and stored in a JPEG file to describe the image,
has been treated radically, and unnecessarily, differently in the
MP3/JPEG worlds:
- With MP3, the metadata (ID3 and its competitors) started out simplistically, almost too
simplistically. ID3 and its competition have evolved slowly enough to
become de facto standards.
The initial ID3 totaled 128 bytes, allowing for
  - Track Name (e.g., the song title)
  - Artist Name
  - Album Name
  - Year
  - Comment
and one byte for genre: “blues,” “classic rock,” “country,” etc.
On the other hand, metadata for
JPEG started more comprehensively and leapt forward without apparent
consensus.
Where a small set of analogs of the above would be a great starting
point, there is a plethora of fields consuming as many bytes as
needed. For example, in lieu of the “Track Name” there
are fields for “Headline,” “Location,”
multiple variants of “Title” and other fields that potentially
name the image.
There are just shy of a dozen ways to describe the “Creator.”
There are at least two fields for “Description.” Get the
“picture?”
See JPEG captions and more,
EXIF, and
IPTC IIM for some of
the background. If you do pursue these sources, pay attention to
the complexity and redundancy.
- As a consequence of the first point, MP3 software tends to be relatively
consistent in handling of metadata. Further, there are lots of
utilities readily available for manipulating MP3 metadata.
On the other hand, JPEG software often ignores the metadata
entirely. The few programs that attempt to notice the metadata do
so in different ways, making the metadata unreliable at best and useless
at worst.
- Digital cameras typically include reasonable starting values for the
metadata and store them in the photo file. Things start to fall apart
when the file leaves the camera.
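
To make the 128-byte ID3 block described above concrete, here is a minimal sketch in Python of how such a tag could be read. The post only gives the total size and the list of fields; the individual field widths (30/30/30/4/30/1), the leading "TAG" marker and the tag's position in the last 128 bytes of the file are taken from the ID3v1 layout as it is commonly documented. The names Id3v1Tag, parse_id3v1 and read_id3v1 are made up for this illustration, and the Latin-1 decoding is an assumption, since the format does not pin down a text encoding.

```python
import struct
from typing import NamedTuple, Optional

# Assumed ID3v1 layout: "TAG" marker, four text fields, year, one genre byte.
_LAYOUT = struct.Struct("<3s30s30s30s4s30sB")  # 3+30+30+30+4+30+1 = 128 bytes

# Only the three genre codes the post mentions; the full table is longer.
GENRES = {0: "Blues", 1: "Classic Rock", 2: "Country"}


class Id3v1Tag(NamedTuple):
    title: str
    artist: str
    album: str
    year: str
    comment: str
    genre: int


def _text(raw: bytes) -> str:
    # Fields are fixed width; keep what precedes the first NUL, drop padding.
    # Latin-1 is an assumption, not something the format guarantees.
    return raw.split(b"\x00", 1)[0].decode("latin-1").rstrip()


def parse_id3v1(block: bytes) -> Optional[Id3v1Tag]:
    """Decode a 128-byte block, or return None if it is not an ID3v1 tag."""
    if len(block) != _LAYOUT.size:
        return None
    marker, title, artist, album, year, comment, genre = _LAYOUT.unpack(block)
    if marker != b"TAG":
        return None
    return Id3v1Tag(_text(title), _text(artist), _text(album),
                    _text(year), _text(comment), genre)


def read_id3v1(path: str) -> Optional[Id3v1Tag]:
    """Read the last 128 bytes of a file and try to decode them as a tag."""
    with open(path, "rb") as handle:
        handle.seek(0, 2)              # jump to the end to learn the size
        if handle.tell() < _LAYOUT.size:
            return None
        handle.seek(-_LAYOUT.size, 2)  # back up 128 bytes from the end
        return parse_id3v1(handle.read(_LAYOUT.size))
```

Calling read_id3v1 on an MP3 file that carries such a tag would return the five text fields plus the genre code. The whole format fits in one struct definition, which is a fair measure of how simple the starting point was.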
I have tried numerous pieces of photo-oriented software (Windows,
Mac and Open Source) and several photo-oriented Web sites.
With few exceptions, these experiences have been very disappointing.
The software and Web sites not only ignore existing metadata when they
could make good use of it, they usually discard it. Aargh!
The two most promising sources of commercial
software for JPEG metadata seem to be Adobe and JASC.
Even these make it harder to find/edit the metadata than I would like,
but at least they have some usable provisions for handling metadata.
Google’s Picasa seems to comprehend some of the more interesting
metadata, but then seems to store changes to the metadata in a private
database, rather than in the JPEG file.
Flickr seems to recognize some
interesting metadata when a file is uploaded, but after that point
seems to do things its own way. It does appear that the paid subscription
version of Flickr allows
downloading of the original files, contrary to what I said before.
I have not yet paid for a subscription, so I have not tested this.
Some programming languages have classes or other support for JPEG metadata.
Of these, PHP seems to be the most comprehensive and interesting.
Since PHP is widely used for Web sites, the PHP support seems
especially encouraging.
See Metadata Toolkit Example.
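
For a sense of what such language support does at the lowest level, here is a rough sketch in Python (not the PHP toolkit mentioned above) that walks the header segments of a JPEG and reports which kinds of metadata are present. It assumes the usual JPEG conventions: EXIF lives in an APP1 segment that starts with "Exif", IPTC data travels inside an APP13 segment that starts with "Photoshop 3.0", and free-text comments use the COM segment. The function names iter_jpeg_segments and metadata_kinds are invented for this example, and the parser stops at the start of the image data rather than handling every corner of the format.

```python
import struct
from typing import Iterator, List, Tuple

SOI = b"\xff\xd8"   # start-of-image marker that opens every JPEG
SOS = 0xDA          # start of scan: compressed image data follows
EOI = 0xD9          # end of image


def iter_jpeg_segments(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (marker, payload) for each header segment before the image data."""
    if data[:2] != SOI:
        raise ValueError("not a JPEG: missing start-of-image marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a segment marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:          # fill byte before the real marker
            pos += 1
            continue
        if marker in (SOS, EOI):    # headers are over; stop scanning
            break
        # The two-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            raise ValueError("corrupt segment length at offset %d" % pos)
        yield marker, data[pos + 4:pos + 2 + length]
        pos += 2 + length


def metadata_kinds(data: bytes) -> List[str]:
    """Name the metadata-bearing segments found in a JPEG byte string."""
    kinds = []
    for marker, payload in iter_jpeg_segments(data):
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            kinds.append("EXIF")
        elif marker == 0xED and payload.startswith(b"Photoshop 3.0\x00"):
            kinds.append("IPTC")
        elif marker == 0xFE:
            kinds.append("comment")
    return kinds
```

Running metadata_kinds on a file's bytes straight from the camera and again after a program or Web site has processed it is a quick way to see whether the metadata survived the trip.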
Handling metadata consistently seems key to being able to
identify, sort, search and share digital photos.
Those are the next things to talk about.
“They Took My JPEGs! & won’t give ‘em back!” (July 16, 2005)
“They Took My Kodachrome!” (July 2, 2005)
“Don’t take my Kodachrome away” – Paul Simon (1973)
“They took our jobs!” South Park (April 28, 2004)





