Friday, May 29, 2009

Grouping your files for storage

In my earlier post about image storage, I introduced some of the fundamental guidelines which should help you in designing a filing system that's going to work for you. I've already expanded on the topic of filenames; now we'll move on to where you put those files.

Group your files into different sets

Consider the various files you work on. For photography it's primarily going to be images and catalogs (e.g. Lightroom databases). I'll leave other more general-purpose files (e.g. email) out of this for now. Any system to organise your files is going to be much simpler if you can group the files together. For example, you might have a folder on your computer's internal drive called Images1 containing lots of photo files, and an Images2 on an external Firewire or USB drive containing more.

EOS 5DmkII, 24-105mm/4 (A2_020073)
Putting all your photo files under one or two folders makes managing them easier than having them scattered across the rest of your files. Not only will it make it simpler when it comes to backing them up (and restoring them if necessary) but it will also make your day-to-day operations simpler. If you create some new files (e.g. PSD composites) and want to add them to your Lightroom database, all you'll need to do is use Import or Synchronize Folder to find any new files within the tree.

A corollary to that is that if you're using something like Lightroom to manage the files within a folder tree, you should only place files into that tree that you intend to import to Lightroom's database. Don't clutter the folders up with exported JPEG copies (e.g. formatted for the web or for printing) if you don't need to keep those files long-term. If they're easy to recreate with a batch operation, you should write them to an output folder outside your storage area (e.g. a "temp" folder) for later deletion. If your workflow involves saving and cataloging the files delivered to clients you may want to place that output folder inside one of your storage areas, but consider keeping them separate from the original/"master" files. Keeping them amongst the master files can complicate things like using Synchronize Folder within Lightroom.

Whether you use the English "catalogue" or the American "catalog" is up to you. Here I'm just being consistent with the spelling used by Lightroom and Expression Media.
You should also have another folder somewhere to contain your Lightroom database (catalog) itself. For a Lightroom catalog called SampleCatalog there will be a folder called SampleCatalog which contains SampleCatalog.lrcat along with other files such as the matching Previews database. I like to have a Catalogs folder to contain the SampleCatalog sub-folder. This provides a known place to put any additional catalogs you decide to work with. I also place Expression Media catalogs within the Catalogs folder (I use EM to manage my video and audio files). If you're using Apple's Aperture you should be aware of where the Aperture database is being stored.

By default Lightroom will place its catalog within your Pictures (Mac) or My Pictures (Windows) folders, but you can easily move it to somewhere of your own choosing. On my laptop this Catalogs folder is located within my home directory so I can access the catalogs even if the external Firewire drives are disconnected. In some scenarios you might decide to place the Catalogs folder on the same drive as your Images folder so you can plug the drive into various computers and have both the catalogs and the images accessible (this is useful in school environments so students can work on the same files at home as easily as at school).

Don't be tempted to place the Catalogs folder within the Images folder. This will complicate things like backups, as well as opening yourself up to accidentally importing files within Catalogs as new images into the Lightroom database.

From here on I'll be referring to these top-level folders (e.g. Catalogs, Images1, Images2) as "sets". Later on we'll be talking about how to maintain backup copies of these sets. Your sets might contain images, catalogs/databases, movies, etc. By separating them we can easily do things like tell Apple's Time Machine to ignore them. Exactly how you separate them is in your hands: in my environment I put recorded videos, audio, and photos into the same set.

Don't tie the folder/set names to the name of the drive they're on. The drive's name will probably be something boring. Whatever naming scheme you use, be prepared for the future addition of more drives and don't get too attached to a name like Jim's Image Drive. The names I've been using for a while are Store01, Store02, etc.

Don't make the drive itself the root of the set: even if the drive only contains the Images2 set, you should have an Images2 folder. You should have the flexibility to move the sets between drives. Maybe because you're moving it to a bigger drive, maybe because the original drive is failing. The Images2 set might start out on Drive2, but in a year's time it might be on Drive5...

If you do move the location of a set containing images, all you have to do in your Lightroom catalog is use the Find Missing Folder function to reset the location of the top folder in the set. Lightroom will then reset the locations in the catalog of however many thousand affected images and you can then continue on as normal. Expression Media has an equivalent Reset Folder Path function.

Organising your files within sets

One of the things I haven't spelt out is how to organise the sub-folders within these sets. Different people find different systems work for them. In my own system I usually use date-based folder trees (ending in per-month, per-week, or even per-day folders). Some people prefer to group their files by job (e.g. SmithWedding). As long as you've adhered to the guideline that all your files should have unique names, you will have the flexibility to reorganise the files as you experiment and find a system that works for you.
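As a minimal sketch of a date-based tree, the Python function below works out a destination folder from a shooting date. The layout (a year folder containing per-month folders, optionally containing per-day folders) and the function name dated_folder are assumptions chosen for illustration, not a description of any particular setup. The month and day folder names repeat the year so that each sub-folder name stays unique on its own.

```python
from datetime import date
from pathlib import PurePosixPath


def dated_folder(set_name, shot_date, granularity="month"):
    """Return a date-based folder path inside a set (hypothetical layout)."""
    parts = [set_name, f"{shot_date:%Y}", f"{shot_date:%Y-%m}"]
    if granularity == "day":
        # Optional extra level for people who prefer per-day folders.
        parts.append(f"{shot_date:%Y-%m-%d}")
    return PurePosixPath(*parts)


print(dated_folder("Images1", date(2009, 5, 29)))
print(dated_folder("Images1", date(2009, 5, 29), granularity="day"))
```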

You should at least be consistent in your folder structures: you should consider the possibility of merging these folders in the future.
For example to make space on a drive you may wish to move a sub-folder to a storage area on a different drive, and life is going to be easier if the sub-folder doesn't need renaming. If you have two different SmithWedding folders which had been on different drives, you would have to first rename one of them to avoid a mess. This is very similar to the guideline about having unique filenames: sub-folder paths should be unique across your entire setup.
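One way to spot this kind of clash before merging is a small script. The Python sketch below is only an illustration: the function name find_folder_clashes is made up, and it assumes each set is an ordinary folder that can be walked from the set's top level. It reports every sub-folder path (relative to its set) that directly holds files in more than one set.

```python
from collections import defaultdict
from pathlib import Path


def find_folder_clashes(set_roots):
    """Map relative sub-folder paths to the sets that contain them,
    keeping only paths that hold files in more than one set."""
    seen = defaultdict(list)
    for root in map(Path, set_roots):
        for folder in root.rglob("*"):
            # Only count folders that directly contain at least one file;
            # empty parent folders are not a merge problem.
            if folder.is_dir() and any(p.is_file() for p in folder.iterdir()):
                seen[folder.relative_to(root).as_posix()].append(root.name)
    return {rel: names for rel, names in seen.items() if len(names) > 1}
```

Calling find_folder_clashes with the paths of an Images1 set and an Images2 set would list a SmithWedding folder if both sets contain one. A clash is a prompt to look more closely rather than proof of a problem.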

Keep checking back for the next installment in this series, where we'll talk about maintaining backups of your sets.

-- David


Tuesday, May 26, 2009

Naming your files

As mentioned in my previous article on image storage guidelines, each file in your collections of images should have a unique filename. It's time to talk about that some more:

You generally should NOT leave files with just their original camera-generated names. The standard "DCF" (Design rule for Camera File system) names have 8 characters, with the last 4 being digits (for example DSC_4756 or _IMG3921). You generally should set your cameras to "continuous" file numbering (so it doesn't reset the number to 0001 on each card format) but even then the camera can't guarantee that the filename will be unique. The camera is just trying to make sure that the filename is unique ON THAT CARD, but once you copy the file to your computer there's no longer any guarantee that the filename is unique.

Eventually the file number will wrap around back to 0001. You might be putting the new files into a different folder than you did 10,000 photos ago, but if you ever end up reorganising your files (e.g. making a collection of your favourite portraits) you might end up with conflicting filenames trying to be in the same folder. Also if you use a second camera (your own, or just a borrowed one) that camera might be producing conflicting filenames. In fact if you shoot with two cameras and swap cards between the bodies in mid-shoot, you can almost guarantee that you'll get the same filenames from each camera.

EOS 5DmkII, 24-105mm/4 (A2_020052)
In the previous article I wrote: Long filenames don't matter (within limits). When was the last time you used something other than pointing and clicking to open a photo file? Having unique filenames does matter.

While modern filesystems have for decades been able to cope with very long filenames, you can still today come across applications that have problems with this (including truncating your carefully-constructed filenames!). To avoid problems you should keep your filenames to no longer than 31 characters (not including the extension such as .TIF).

You should also avoid using ANY other characters than
ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz 0123456789_-
No spaces, no quotes, no periods ("full-stops"), no colons. Only the above characters. You may be using a Mac today, but at some point in the future you may need to recover your backups onto a different computer, so your filenames need to be accessible on all systems. Obviously the same will apply to any folders you use to organise the files.
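These two rules (at most 31 characters before the extension, and only the characters listed above) are easy to check mechanically. The following Python sketch is one way to do it; the function name filename_problems and the wording of its messages are assumptions made for this example.

```python
import string

ALLOWED = set(string.ascii_letters + string.digits + "_-")
MAX_STEM = 31  # characters, not counting the extension


def filename_problems(filename):
    """Return a list of reasons why a filename breaks the two rules."""
    # Everything before the last period is the name; the rest is the extension.
    stem = filename.rpartition(".")[0] or filename
    problems = []
    if len(stem) > MAX_STEM:
        problems.append(f"name is {len(stem)} characters (limit {MAX_STEM})")
    bad = sorted(set(stem) - ALLOWED)
    if bad:
        problems.append("disallowed characters: " + " ".join(repr(c) for c in bad))
    return problems


for name in ("200801261545_4756.NEF", "Jim's photo: beach.tif"):
    print(name, filename_problems(name) or "OK")
```

The same check can be applied to folder names, which have no extension to strip.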

EOS 5DmkII, 24-105mm/4 (A2_019991)
A common choice for constructing filenames is to extract the photo date/time from the EXIF data and combine this with the camera-generated file number (note that using the hours and minutes along with the date is usually enough: you're very unlikely to be generating the same camera file number within the same minute!). I'll talk about how to do this below, but first we need to think about what we're trying to achieve.
For example DSC_4756.NEF taken at 3:45PM on the 26th of January 2008 could get renamed to 200801261545_4756.NEF. By omitting the DSC_ from the name we're able to make the name a bit shorter, but you could leave it in there if you preferred.

Note that using the date/time in this order (YYYYMMDDhhmm) means that if you are navigating your filesystem with a browser such as Finder (or Windows Explorer) the filenames will be conveniently sorted by default into chronological order. It should also be obvious that if your camera is set to the correct time that will increase the usefulness of these names!

Some people like to prefix this name format with their identifier so they can distribute copies of files and have it obvious whose photos they are. For example if I used DBP (for David Burren Photography) the file might be called DBP_200801261545_4756.NEF
Note that the length of this filename is 25 characters, leaving just 6 characters before you hit the 31-character limit mentioned above. So a PSD file derived from this might be named
By keeping the base filename the same but adding a suffix it's easy to ensure that the new filename is also unique. The association between the source and derivative files is also easy to see. Naming the new PSD something like TwoBirdsOnBeach.psd wouldn't help on either count.
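The naming scheme described above can be sketched in a few lines of Python. This is an illustration only: the function names unique_name and derived_name are invented here, the capture time is passed in as a datetime (reading it from the EXIF data is left to whatever tool you use), and the camera filename is assumed to end in four digits before its extension.

```python
import re
from datetime import datetime


def unique_name(camera_filename, taken, prefix=""):
    """Build YYYYMMDDhhmm_NNNN.ext from a camera filename and capture time."""
    stem, _, ext = camera_filename.rpartition(".")
    match = re.search(r"(\d{4})$", stem)
    if not match:
        raise ValueError(f"no 4-digit file number in {camera_filename!r}")
    base = f"{taken:%Y%m%d%H%M}_{match.group(1)}"
    if prefix:
        # Optional owner identifier such as DBP.
        base = f"{prefix}_{base}"
    return f"{base}.{ext}"


def derived_name(source_filename, suffix, new_ext):
    """Name a derivative file by adding a suffix to the source's base name."""
    stem = source_filename.rpartition(".")[0]
    return f"{stem}-{suffix}.{new_ext}"


taken = datetime(2008, 1, 26, 15, 45)
print(unique_name("DSC_4756.NEF", taken))
print(unique_name("DSC_4756.NEF", taken, prefix="DBP"))
print(derived_name("200801261545_4756.NEF", "edited", "psd"))
```

Because the derivative keeps the source's base name, searching for that base name finds both files.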

To save on characters (maybe that versi above was trying to be version) some people will only use two digits of year (e.g. 0801261545_4756.NEF) although if you also have photos from last century in your collection that might get confusing.

Some people use a sequential counter to assign filenames rather than relying on the combination of date/time and camera file number, but you need to be careful that the counter is in fact sequential and never repeats the same number (especially if you use multiple computers and/or ever reinstall your computer). A variation of this could be to have a "job number" (call it XXXX) and use a format such as DBP_XXXX-yyyy (where yyyy is a sequential number starting at 1 for that job). That way you only need to remember the next job number (although there is still room for error).

But however you generate your filenames doesn't matter as long as they're going to be unique. Having consistency in your names will of course make things tidy...

Renaming the files as soon as they're copied to your system (or while they're being copied) and giving them unique filenames is your best option. Once each original image has its own filename, other images derived from those originals can inherit the name simply modified by the addition of a suffix to keep it unique. Rename early, rename once.

In my own setup I use a combination of date and unique number. I have it set so I can download a card and assign the files unique names without having to manually set the job number or sequence number for each card. Insert the card, initiate the download, and you're done.

Software can do the renaming for us (as long as you choose a suitable format: some software packages are not as flexible as others). If you are reorganising an existing collection of images you will want the functions to rename files in place, but for new images you will want the rename-during-import functions.

Generating names in Lightroom

Lightroom has a Rename Photo function available in the Library module, and it uses Filename Templates to define the filenames. When importing files, if you select the option to copy files to a folder on your hard drive it also uses Filename Templates to define the new filenames.

The example on the right is using a photo file originally called IMG_0294.CR2. You can build a template that uses fixed text as well as fields extracted from each photo's metadata to create new names. If you're naming a group of files in one operation, you can use the Sequence field to allocate consecutive numbers. It will allow you to choose the starting number, but it is easy to forget to change the number (or even what to change it to) and thus end up re-allocating the same numbers. This is discussed above, and an example screenshot is shown below.

Once you've set up the template the way you want it you should save it as a new Preset, allowing you to quickly access the appropriate name format from either the Rename Photos or Import Photos dialog.

When Lightroom renames a file it stores the original filename in its internal originalFilename metadata field. This can be useful, but take note that if you rename it a second time the original name will be lost. Incidentally, Filename Templates are also used in the Export function to generate the new names of exported files.

Note that if you're renaming files in your database as part of an overhaul, if you do it outside Lightroom it will get confused and not know where the files it was managing went to. Doing the renaming (and any folder reorganisation) through Lightroom lets the catalog keep track of the files.

Generating names in Bridge

Adobe Bridge is a file browser (but one with very good integration with Photoshop and Camera Raw, and thus with Lightroom if you export metadata to file XMP data).

Under its Tools menu is the Batch Rename function, and as you can see on the right you can build up similar name structures as you can with Lightroom. In some ways it's less flexible than Lightroom, but you can use the Load and Save buttons on that window to access a saved name format (similar to a Lightroom Filename Template).

You can use Batch Rename to rename files in-place, or as part of a copy operation. But there is also a Get Photos from Camera function that invokes the Bridge Photo Downloader. This is intended to rename photos as they're being copied from flash cards, but it's severely restricted in the name formats available. You can only choose from a pre-prepared list (even in the "Advanced Dialog") and I would not recommend this as a way of assigning good filenames.

Bridge's Batch Rename is much more useful, especially in light of the most common use for Bridge: browsing existing folders of files. But I generally consider Lightroom's rename function to be more practical. Your mileage may vary of course...

Generating names in Expression Media

Microsoft's Expression Media 2 (before Microsoft bought it the name was iView MediaPro) is a comprehensive program for cataloging and organising files. Not just photo files: everything. Before Lightroom I used iView to manage my photos, and I still use Expression Media to manage my video and audio files.

It has a Batch Rename function with similar capabilities to that of Bridge (although in the example on the right I was unable to strip the IMG_ portion of the original name).

Expression Media also has a function to import directly from a camera card, but its rename-on-import capability is extremely limited. You can either preserve the original names, or you can assign a new fixed string and let Expression Media append a sequence number to it. Not very flexible. Again you may be better off with the internal Batch Rename.

Other programs

I've always used an external program to rename incoming files according to my own format and then used Lightroom and/or Bridge to manage the new files that have been copied to the drive. But you can do useful work with the renaming functions of these applications (especially Lightroom). Third-party programs such as BreezeBrowser Download Pro, Photo Mechanic, Bibble, Image Ingester, and PteroFile have similar functions, but I won't go into them each in turn here.

That'll have to do for today. In the next installment I will be discussing the options for organising your folders.

-- David


Sunday, May 24, 2009

Guidelines for image storage

Today when the master copies of our images are usually digital (even slide workers often put a lot of work into digitising their transparencies) it's easier to store and organise thousands of files than it was to do the same with physical prints/negatives/transparencies. But it's also easier to lose everything when you make a mistake (or a disk crashes, or any other disaster happens).

Many of my students (and I'm sure many other professional photographers) start off struggling with achieving reliable storage of their photographs. But just a bit of organisation (which is also needed when your image collection starts growing to many thousands of images) will make things a lot easier. Since the late 1980s I've been involved in designing and operating large computer systems. Although it's been a few years since I did that for a living (I've been working as a photographer since 2002) the things I learnt in that work transfer very well to the world of digital photography. The gigabytes and terabytes of data involved in today's photography aren't really much bigger than the datasets we dealt with "back in the day".

I will write more on this in future posts, but wanted to start with this list of points to consider. Some of this is about data backups in general, although some of it is specific to organising photo files.

Backups are essential. OK, this sounds obvious, but too many people just hope that disaster won't happen to them. Eventually it will: all disk drives will fail at some point. All of them.

How paranoid do I have to be with my backups? That's up to you: it's all about mitigating risk. How much is it going to hurt you if you lose your files? A useful equation to keep in mind is:
Risk = probability of failure * cost of failure
If it's only going to take you 5 minutes to re-create some files, then it's only going to hurt you if they get lost frequently. If the files are the result of expensive trips, or opportunities that won't return (photos of an alien landing, or even just of your family) the risk could be regarded as high even if there's a very slight probability of failure.
A good backup system does not have to be incredibly complex to be effective, but it's worth applying a little thought to the risks that your data is exposed to. If you make backup copies to another folder on the source drive, that's not providing a lot of protection. If you make backup copies to another drive that's better, but if it's in the same location and always connected to the computer then it's still at the same risk of corruption, fire, theft, lightning strike, etc. When establishing your own backup system, you'll make your own decisions about how much is enough.
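To make the risk equation concrete, here is a small worked example in Python. All of the numbers are hypothetical, picked only to show how a rare but expensive loss can outweigh a frequent but cheap one; cost is measured in hours of work needed to recreate the files.

```python
def risk(probability_of_failure, cost_of_failure):
    """Risk = probability of failure * cost of failure."""
    return probability_of_failure * cost_of_failure


# Hypothetical numbers, with cost in hours of work to recreate the files.
scratch_exports = risk(0.20, 5 / 60)   # lost often, five minutes to rebuild
overseas_trip = risk(0.01, 200)        # lost rarely, weeks of work to reshoot

print(f"scratch exports: {scratch_exports:.3f}")
print(f"overseas trip:   {overseas_trip:.3f}")
```

Even though the trip photos are twenty times less likely to be lost in this example, their risk figure comes out far higher. For files that can't be recreated at all, the cost is effectively unbounded.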

EOS 5DmkII, 180mm macro (A2_020109)
RAID is not backup. It's just increasing the reliability of the storage device. It's still just one copy of the data. If you delete or corrupt a file, that deletion or corruption is stored reliably.

Backups must include more than just your data files. To use your files once they're restored you'll probably have to use specific software (e.g. Photoshop, Lightroom). So you need to make sure you'll have that backed up and restorable also.

Backups that are in "normal" format are usually best. For example a copy of the files onto another drive (be it optical, hard drive, or even tape media). If you need special restore software to recover files, you need to be sure to have that backed up too (and have a computer capable of running it, which can be a problem 5 years down the track)!

More than one stage of backup can be good. If your backup operation updates your data to multiple places at once, that can be as bad as RAID at "backing-up" corrupted data. Better systems provide access to "yesterday"'s data separately from "last week"'s data.

Backups need to be operated simply and regularly. Automatic backups can be good, but a manual process can work well: as long as you don't have to think too much about what to do.
Fancy schemes such as those that involve rotating backups manually usually have lots of opportunity for errors, from "I forgot" through to "I didn't think of that".

No backup media will last forever. Whether it's CD-R, DVD-R, magnetic tape, or hard drive, it won't last forever. Some media will fail over time (optical media are examples of this, especially if not stored carefully) whereas some will become obsolete (can you read those Jaz drives you backed up to years ago?). Any good backup system will evolve and will allow you to migrate backup data to new media. Whether by copying files from 7 CD-Rs to a new DVD-R, from 10 DVD-Rs to Blu-ray, from old hard drive to new hard drive, etc.

Restoration of backups needs to be tested regularly. There's no point making backups regularly if you only find out that they failed when you're trying to recover after a disaster.
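For backups kept as plain file copies, one simple test is to compare a restored (or backup) folder against the source, file by file. The Python sketch below is one possible approach rather than a feature of any backup product: verify_copy and file_digest are names invented for this example, and it uses SHA-256 checksums from the standard library.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source_root, copy_root):
    """List files under source_root that are missing or different under copy_root."""
    source_root, copy_root = Path(source_root), Path(copy_root)
    problems = []
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_root)
        dst = copy_root / rel
        if not dst.is_file():
            problems.append((rel.as_posix(), "missing"))
        elif file_digest(src) != file_digest(dst):
            problems.append((rel.as_posix(), "differs"))
    return problems
```

An empty list means every source file has an identical counterpart in the copy. Note that this only catches differences between the two copies: if the source file was already corrupt when it was backed up, both copies will match.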

EOS 40D, 24-105mm (A2_020132)
Give your drives sensible and unique names. OS X drives are identified by the volume name (unlike in Windows where the drive letter can change when the drive is connected). At least in Windows each drive can have a volume name although it's not so central to accessing the data on the drive.

Make top-level folders on drives to store any data. Don't just scatter data across the "root" folders of the drives. For example you might have a folder on one drive called "Images-A" containing photo files (with appropriate sub-folders of course). If you need to start using a second drive, you could give it a folder called "Images-B". This will simplify later rearrangements when you move data from one drive to another (e.g. consolidating data from multiple smaller drives to fewer larger drives).

Backup drives should have at least as much space as the drives being backed up. That is, be big enough to hold as much as the source can (not just how much data is there now). Otherwise you will find yourself eventually putting data onto your system that isn't fitting onto the backup, and you won't necessarily realise immediately that your system is failing to protect your data.
Sometimes this means that if you've bought a big new drive, that has to be used as a backup drive. Sometimes it means that you need to buy drives in pairs (at least).
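The capacity rule can be checked with a few lines of Python. In this sketch the function names are made up for illustration, and the paths you pass in are assumed to be folders on the source and backup drives; shutil.disk_usage from the standard library reports the total size of the drive holding a given path.

```python
import shutil


def backup_is_big_enough(source_total_bytes, backup_total_bytes):
    """Compare total capacities, not the amount of data currently stored."""
    return backup_total_bytes >= source_total_bytes


def check_drives(source_path, backup_path):
    """Check the drives holding the two paths against the capacity rule."""
    source_total = shutil.disk_usage(source_path).total
    backup_total = shutil.disk_usage(backup_path).total
    return backup_is_big_enough(source_total, backup_total)
```

Comparing the total capacity rather than the used space is the point of the rule: a backup drive that fits today's data can still be too small once the source drive fills up.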

You should always have at least 3 copies of your data. Ideally at least one of them should be offline (disconnected, where it can't be affected by accidental deletes/formats/etc) at any one time. Sometimes 2 copies is enough, but you are still at much higher risk than when you have 3 copies.

This includes when copying from your camera's flash cards. Ideally you should only format the cards when you know the data's been copied to at least 2 drives. Sometimes that's awkward, but if you only have your data on one drive you should be aware that you are tempting Professor Murphy.

EOS 5DmkII, 180mm macro (A2_020030)
Files in your storage system should all have unique names.
For example you should be able to later reorganise your files and never have to worry about overwriting a different file that happens to have a duplicate name.
The best time to rename files (to achieve the above uniqueness) is as they're added to your system (e.g. copied from flash cards). Keeping files with names such as DSC_1234.NEF is usually not sensible. A range of options are available, including combining the camera filename with the date/time of the photo. Most photo management software includes functions to automate this.
Some people put their name at the beginning of the filenames so they can give files to clients and have it obvious where the file came from. Some people only give out files that have been exported with simplified names (keeping details like the source ID in the image metadata).

Long filenames don't matter (within limits). When was the last time you used something other than pointing and clicking to open a photo file? Having unique filenames does matter.

Files in your storage system should not be renamed without good reason. Consider having a source file called 200901201245_1234.NEF (depending on your naming scheme, this might be a photo taken at 12:45PM on the 20th of January 2009 and named DSC_1234.NEF by the camera) and you've generated the file 200901201245_1234-edited.psd from it. If you're working on a PSD file in the future and decide you want to find the matching RAW image, if the PSD file was named TwoBirdsOnBeach.psd you wouldn't know what to search for. By keeping the filenames the same you should be able to find the file even if it's on an old backup copy.

Photo filenames within your storage system should not contain details of subject matter. The filenames just need to be unique. Using software such as Aperture, Lightroom, Expression Media, and even Bridge you can easily use metadata fields such as keywords to quickly find relevant images even when your image collection grows to hundreds of thousands of images. You can of course use folders to impose some subject-matter structure on your files if you want (as long as you know the filenames will be unique).

EOS 10D, 28-135mm (A1_2F12)

That's more than enough for now. Give all that some thought, and we will soon discuss some tools to help implement data backups for photographers.

-- David
