Over at Khromagery's site I've posted an article on my recommendations for processing infrared RAW files. If you have an IR-modified camera (e.g. one with an internal IR-pass filter) that produces RAW files, check it out!
Monday, June 15, 2009
Yes and No.
Or to put it another way: It Depends!
Another backup bites the dust
Barrientos Island, Antarctica
EOS 5DmkII, 100-400mm (A2_007594)
Today there are a variety of network backup services available, giving you the option to back up your data onto network servers via your Internet connection. Probably the biggest name is Amazon's S3, although there are others. S3's good in that it encrypts your data and distributes it across many servers, providing a redundant/reliable service. You're charged a monthly fee based on the amount of data you have in the cloud, plus fees for uploads and downloads.
Having a copy of your data stored offsite is always sensible (e.g. if your house/office is damaged by fire) and having the ability to access it online can be very convenient. It can be a useful component in your backup scheme, but it's not a panacea. In this article I explore some of the shortcomings.
Having your files available via a web interface from any Internet-connected machine is good in terms of convenience, and possibly bad in terms of security. You'd better be sure that there's not just an easily-stolen password protecting your data...
How much do you want to back up? Or restore?
Today Internet connections are generally faster than in the past, and for many of us it seems less of an issue to up/download megabytes (even gigabytes) of data. For those of us still stuck using dialup it's almost a moot point of course.
As a starting point, which data do you need to back up? If you've followed my earlier advice about separating your data into "sets", you should be able to identify which data to back up where. As photographers we tend to have more (and larger) files than many users (except for some video collections, I suppose). But with laptops having 320 GB and even 500 GB internal drives, it's very easy for all of us to build up large collections of files. Whether they're important files that need to be protected is up to you, of course...
Testing the backups
EOS 5DmkII, 100-400mm (A2_007968)
How long is it going to take to make the initial backup?
Consider for example that I have about a terabyte of files I want to back up. I may have a backup copy on external disks already, but maybe I decide I want another copy "out in the cloud".
My own Internet connection is via an ADSL2+ link that gets approximately 16 Mb/s downstream and 1 Mb/s upstream. Allowing for protocol overheads, that works out to roughly 1.6 MB/s download and 100 kB/s upload (big 'B' is for Byte, little 'b' is for Bit). For most of us this is not a slow connection. If I manage to keep the 1 Mb/s upstream link saturated 24/7, it's going to take me about 115 days to get my terabyte uploaded. Of course, I'll never be able to keep the link saturated (maybe 80% is a better estimate: closer to 150 days). It's slow and gradual, but it'll get done.
Thereafter for each 2 GB of new (or modified) data I add to the system, it will take at least 5.5 hours to update the network backup. 30GB would take at least 3.5 days. That could be manageable, depending on how much you shoot (and how large your files are). Today the RAW files from my main camera weigh in at 25 MB each, and derived TIFF/PSD files usually start off at 120 MB before adding any Photoshop layers. Space is getting used up faster each year.
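Those numbers are easy to sanity-check. Here's a quick sketch in Python, using the effective figure from above of about 100 kB/s for a 1 Mb/s upstream link after protocol overheads (the function name is just for illustration):

```python
# Effective upstream throughput: ~100 kB/s for a 1 Mb/s ADSL link once
# protocol overheads are accounted for (the figure used in the text).
UPLOAD_BYTES_PER_SEC = 100_000

def upload_days(size_bytes, bytes_per_sec=UPLOAD_BYTES_PER_SEC, utilisation=1.0):
    """Days needed to push size_bytes over the upstream link."""
    return size_bytes / (bytes_per_sec * utilisation) / 86400

TB = 10**12
GB = 10**9

upload_days(1 * TB)                    # ~116 days with the link saturated 24/7
upload_days(1 * TB, utilisation=0.8)   # ~145 days at a more realistic 80%
upload_days(2 * GB) * 24               # ~5.6 hours for 2 GB of new files
upload_days(30 * GB)                   # ~3.5 days for 30 GB
```

The same arithmetic applies to any connection: just substitute your own measured upstream throughput.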
What have I created?
Petermann Island, Antarctica
EOS 40D, 100-400mm (A2_013936)
How long to restore?
But what if I need to recover the files from the cloud? With my ADSL connection I can download files a lot faster than I can upload them: downloading that 1 terabyte of data would take me a bit over 7 days (again assuming that I could keep the line saturated and didn't want to use the Internet for anything else).
However... even if I had the most expensive plan my ISP offers (and it is one of the better ISPs around) I would be limited to 140 GB per month. Once I go over that I'm throttled back to 64 kb/s. So in fact it would take me more than 7 months to download all my data! Currently I'm on a 65 GB/month plan (which is more than enough for our normal traffic) so it would take me at least 16 months. My ISP only counts downloaded data, so at least the uploads aren't an issue. Some ISPs count both the uploads and downloads against your monthly allowance (and they tend to max out at 60 GB/month plans).
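The quota arithmetic dominates everything else here. A sketch of the sums (Python, using the plan figures quoted above):

```python
import math

def restore_months(size_gb, monthly_cap_gb):
    # Whole billing months needed to pull the data back down under an ISP cap.
    return math.ceil(size_gb / monthly_cap_gb)

restore_months(1000, 140)  # 8 months even on the most expensive plan
restore_months(1000, 65)   # 16 months on a 65 GB/month plan

# Without any cap, the link itself is the limit: ~1.6 MB/s downstream.
1e12 / 1.6e6 / 86400       # ~7.2 days to pull down a terabyte
```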
In some parts of the world there are Internet connections available with no traffic limitations, but the rest of us have to count our gigabytes.
Even if I could download as much as I wanted, it would still take me more than a week to get my 1 TB of data. Compare that with the day or so to get the backup disks that are sitting on the other side of the city and hook them up to my computers.
So for large sets, restoring everything from the cloud is rarely going to be a feasible option. For smaller sets of data it can work, as can restoring just a few files from a large set.
Avoiding the upload/download problem
Many providers recognise this, and provide services such as Amazon's AWS Import/Export. You ship them a drive with files on it, and they copy that data into the cloud for you. Or you ship them an empty drive and they ship it back to you with your data. Of course, packaging and shipping your drive(s) to/from Amazon in the U.S. securely does have its own complications of cost and time.
Contemplating a new life
Barrientos Island, Antarctica
EOS 5DmkII, 100-400mm (A2_007655)
Cloud backups can work, but you need to have done your research before you start. For example:
- How much will it cost to store your data?
- How long will it take and cost for the initial upload?
- How long will it take and cost to recover your files after a disaster?
In my own environment I've decided not to use network backups. I have large volumes of data to back up, and having established local disk backups (with off-site copies on the other side of the city) I have a system that works.
If a natural disaster strikes that takes out my whole city, I'm guessing I'll have other things to worry about. Luckily we're not on top of a major tectonic plate boundary...
Friday, June 12, 2009
The image galleries on this site (there's a link in the sidebar) have been updated. The gallery format has changed, and a few extra images are also there. If you're still getting galleries with a black background, you may need to purge your browser cache.
Also, if you've installed Cooliris for your browser you'll be able to explore the galleries with that. Read on for a preview...
Here's a preview of the sort of thing Cooliris can do (this needs at least Flash v9):
Try clicking and dragging within that panel...
Meanwhile, the next post in my series on image storage will be along soon!
Saturday, June 6, 2009
In an earlier article I wrote that:
RAID is not backup. It's just increasing the reliability of the storage device. It's still just one copy of the data. If you delete or corrupt a file, that deletion or corruption is stored reliably.
This is very true, but there are more risks than just the above. There are a few misconceptions about RAID floating around, so I'm going to expand on them a bit...
Ngorogoro Crater, Tanzania
EOS 30D, 100-400mm (F1_53A1)
First of all, let's recap some of the underlying technology:
RAID (Redundant Array of Inexpensive Disks) has been around for many years now. It usually refers to a group of disk drives that are treated by the computer as a single device for putting files onto, and behind the scenes the data is copied to multiple drives so that if one fails the computer can keep on running. The simplest form is RAID-1 (or "mirroring") with drives in pairs. Fancier forms such as RAID-5 distribute the data across more drives (typically 3-5) and cope with the failure of any one of them.
"Software RAID" is where the disks are connected as normal devices to the computer, and the RAID function is provided by software. For instance OS X has this support as standard.
"Hardware RAID" is a device that connects to the computer and usually appears as a single device, but internally connects to multiple disks.
More-advanced models of hardware RAID use multiple connections to the computer (e.g. Firewire or FibreChannel connections to separate controllers in the computer), use dual redundant power supplies, etc. These try to improve the reliability further by coping with more risks than just the possible failure of a disk drive, but the costs of these extra efforts are not insignificant, and are usually reserved for "enterprise-level" systems needing 99.99% uptime.
RAID is usually single-disk protection
When a disk in a RAID volume fails, it needs to be replaced by another, and the RAID controller will then rebuild the contents of that disk from the contents of the others. Once the new disk is synchronised, protection is restored.
If you've got a shiny new Drobo unit with multiple drives humming away and you decide to demonstrate to a friend or your boss how robust it is by pulling out a drive, you need to be aware that if any of the remaining disks choose that time to have a problem, the whole shebang will come to a crashing halt. Once you replace the drive the system will begin to resynchronise, and it's only when resynchronisation is complete that you are safe again. Smart RAID controllers may be able to optimise this with journals and not have to re-write the entire drive, but some (e.g. OS X's software RAID) can take hours to rebuild the disk. With software RAID the reason for the rebuild can be as annoying as not having all the USB drives powered up when you rebooted...
Upgrading disks in a RAID volume usually involves "failing" a disk (by removing it) and replacing it with a bigger disk. Once the new disk has been re-populated, the fancy RAID systems such as the ReadyNAS X-RAID and Drobo's BeyondRAID use this as the key to increasing the capacity of the device: once the device has been rebuilt, the extra space is added to the pool.
Few among Many
Serengeti Plain, Tanzania
EOS 30D, 100-400mm (F1_4E8A)
But until the disk is rebuilt your data is at risk: if any of the remaining disks hiccup then all your data will be lost. Most of these devices can only cope with a single disk failure. In rough terms, a ReadyNAS NV+ or a Drobo with four 1TB drives will provide you with almost 3TB of data space and the ability to cope with a single disk failure.
The DroboPro can handle up to eight drives, and with 1TB drives would provide you with 7TB of data space. However it does have an optional configuration where it only provides you with 6TB of space (assuming the same set of eight 1TB drives) but is able to handle the failure of any two drives. It's nice, but you do need to sacrifice a chunk of your capacity to provide the extra safety.
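The capacity trade-off is simple arithmetic: each drive failure you want to survive costs you roughly one drive's worth of space. A sketch (Python; the function name is mine, not Drobo's):

```python
def usable_tb(num_drives, drive_tb, failures_tolerated=1):
    """Usable capacity when one drive's worth of space is given up for each
    failure the array can survive (as in RAID-5 / dual-redundancy setups)."""
    return (num_drives - failures_tolerated) * drive_tb

usable_tb(4, 1)     # 3 TB: four 1 TB drives, survives one failure
usable_tb(8, 1)     # 7 TB: eight 1 TB drives in a DroboPro
usable_tb(8, 1, 2)  # 6 TB: same drives, survives any two failures
```

This is an approximation (real devices lose a little more to filesystem overheads), but it shows the shape of the trade-off.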
So put simply: upgrading disks in a RAID system is a risky exercise. You won't have any protection until it's complete (except in the above DroboPro configuration). Too many people assume that once they've put their data onto a RAID drive such as a Drobo then they're safe. Nothing's ever that simple.
All this isn't bad news though. It's not at all a reason to avoid using RAID. If you're using RAID it should be as part of your whole data management system (e.g. to protect the primary members of your data sets). You must also have regularly-updated backups of your data (possibly on another RAID device) and if you've updated a backup prior to doing a disk upgrade you shouldn't lose anything if the upgrade of your RAID fails.
In my own environment I use a system of multi-stage backups, with a primary copy of my data on local Firewire-connected disks. I'm starting to add Firewire-connected Drobo units for improved speed and robustness, but I will continue to keep backup copies of all the data on external drives that get rotated off-site. Currently these backups are on a mixture of external USB drives and "bare" SATA drives which connect into a USB/SATA "dock" on my desk.
We do have a ReadyNAS NV+ connected to the gigabit Ethernet LAN (configured with "jumbo frames" on all the machines for maximum speed). It currently has only 900GB of RAID (four 320GB drives) although I am considering upgrading the disks soon. The file-transfer speed of this over the LAN is as good as most Firewire-connected drives, but due to the inherent restrictions of network filesystems I don't currently store photo files on it. Lightroom catalogs can't be locked properly on network filesystems: Lightroom insists that they be on local storage. Also things like folder searches are always slower over a LAN than to a local disk. At one stage most of my photo files were on the NAS box protected by RAID, but even though the file read/write speed is impressive, once I moved the photo sets to Firewire disks, Lightroom and Bridge "felt" a lot faster because there was so much directory-lookup activity. Waiting for the machine to catch up with you is VERY frustrating...
Ngorogoro Crater, Tanzania
EOS 30D, 100-400mm (F1_53A6)
The NAS machine's major advantage is that it's always accessible to all the machines on the network: for example rebooting the image workstation doesn't affect other machines. As well as hosting shared "normal" files for the network machines, it also provides networked Time Machine storage for a wireless-connected Mac Mini.
Why use RAID?
RAID can be very useful, but it's up to you to decide if the risks are outweighed by the costs of your system.
If you have your primary storage on "normal" (non-RAID) drives and a drive fails, your machine will grind to a halt and you'll have to restore from your last backup onto a replacement drive. If your backups are frequent enough you shouldn't lose much work, although the "down-time" while you restore the system will affect your productivity. For some people this is enough protection.
If you have your primary storage on RAID units and a drive fails, your machine will keep running and you should be able to replace the drive and continue with no interruption. But there is still a risk that the system will die before the drive is replaced and resynchronised, so you'd better still have access to frequent complete data backups.
Even if only because today's "failure" was simply that you deleted the wrong file!
Thursday, June 4, 2009
If you've followed along with the recommendations of earlier articles in this thread, you will have identified "sets" containing your data files. Grouping them together makes it easier to manage the files on your system, and by extension makes it easier to manage your backups.
I'll get back to sets further below, but first a look at backups in general (hopefully you've already read the start of this series of posts).
"Imaging" your hard drive
Stonington Sunset, Antarctica
EOS 5DmkII, 24-105mm/4 (A2_010459)
The simplest form of backup is to get a second drive big enough to store all the data on your main drive, and make a copy. Software such as SuperDuper! and Carbon Copy Cloner can make bootable copies of OS X drives, even onto drives of different sizes.
Microsoft Windows backups are a bit more complex, and many users never manage to set them up. Acronis True Image and Symantec Ghost purport to be able to make backups of system drives. My own experience with Ghost on Windows XP in 2008 resulted in complete failure, but I hear good things about Acronis' product.
Even if there's no special Windows software available, it's possible to use a Linux or BSD bootable CD (e.g. Knoppix) and use that operating system to make a block-level copy of any drive on your system.
Of course any operation like this requires some "down time" to have the system idle (or shut down) while the copy is being made. The hassle involved in making a backup this way usually means that they happen infrequently, so it's not a perfect solution. But if your boot drive fails it can be nice to be able to boot off the backup drive and know you have a complete working system.
If you're using OS X 10.5 Leopard you may already be using Time Machine to keep your files backed up. Time Machine is a nice implementation of backups for most "normal" files: it's usually configured to back up to a dedicated USB or Firewire disk directly attached to the computer. Every hour it makes copies of any new/modified files, and preserves "snapshots" of the system for each backup. It keeps the last day's worth of hourly backups, the first backup of each day for the last month, and as many weekly backups as will fit on the backup disk. Note that it expects to be able to fill up the backup volume. You can put other files onto the volume, but any available space will eventually be used up by Time Machine.
If you use a GUID partition table and copy the contents of the OS X install DVD onto the backup partition (using Disk Utility's Restore function) the disk can be used as a standalone boot/restore disk to completely reinstall your machine to the state of the last TM backup. I have used this successfully on several machines to recover from a system drive crash (even onto a complete replacement machine - not so easy to do with a Microsoft operating system).
As well as complete restores, Time Machine has a flashy interface that lets you explore back through time (thus the name) and recover specific old versions of any file or folder it backed up.
However, Time Machine doesn't cope well with all files. When it decides that any part of a file has been updated, it simply makes a new copy of the entire file. If the file is a large Lightroom .lrcat (my largest is over 1.2GB) or a virtual disk image for VMware or Parallels (my Parallels disk image is 30GB) TM would spend a long time copying the file. Not only would a new copy of the file every hour quickly fill up your backup disk (causing the automatic deletion of older snapshots) but it's likely that the file was still being modified while it was being copied (resulting in a useless corrupted backup).
You can turn Time Machine backups off via System Preferences while using those large files (of course, it's easy to forget to turn the backups on again!) or you can configure Time Machine to always ignore specific folders. If you've consolidated your catalogs into one folder (see the earlier introduction of "sets") they make perfect folders to get Time Machine to ignore. Of course you need to be sure you're making backups of those files some other way! Note that by default Time Machine only backs up internal drives (not removable USB/Firewire drives).
Time Machine does a good job of making backups, but it only backs up to one drive. Not only does that drive need to be larger than the data on your internal drives (especially if you want to have some history available), but if that drive fails you will lose your protection (as well as any history of file changes). The way I cope with this risk is to make weekly DMG images of the Time Machine disk onto one of my backup drives (using Disk Utility). The DMG can't easily be used to restore from directly, but after a disaster I can use another machine to extract the DMG contents onto another drive.
Incidentally, the internals of Time Machine may get an overhaul in OS X 10.6 with the introduction of the ZFS filesystem. I guess we'll find out soon.
If you're using Adobe Lightroom you will have seen the regular request to "backup" your catalog. Hopefully you've realised that this just makes a copy of the catalog file (after doing some integrity checks of course) and doesn't actually do anything about backing up your image files. The default place it puts the backup files is a sub-folder of the catalog folder, but it's easy to set it to use a different physical disk if that suits your configuration.
Backing up the catalog is important, even if you have it set to write metadata out as XMP to the image files. As well as the image metadata and CameraRaw edits, the catalog contains lots of data that doesn't get saved to XMP: virtual copies, collections, flags, stacks, develop history, and any plugin-specific metadata. That's typically data you don't want to throw away.
Lounging Leopard, Antarctica
EOS 5DmkII, 24-105mm/4 (A2_015729)
Lightroom also keeps a Previews.lrdata folder/package for each catalog, containing the preview images and thumbnails. This speeds up Lightroom as well as allowing you to see the images even if the drive containing the actual image files is not mounted at the time. This Previews database is not critical to back up, as the previews will be regenerated as required (as long as you have the catalog and the image files).
When Lightroom backs up a catalog it doesn't bother making a backup copy of the Previews database: it can be huge (at last glance the Preview database for my main catalog is 45 GB!), it's made up of lots of small files (so a copy would be very slow), and it's rebuildable.
Mind you, sometimes you may wish to make a backup of the Previews database: that 45 GB database could take days to regenerate fully.
So, you've got Time Machine (or some other software) backing up your boot drive and all your normal files including applications, email, etc. But you've told it to ignore the set containing your Lightroom catalogs. And possibly sets containing your photo files. For small collections you might let Time Machine take care of your RAW files (they tend not to change, unless they're in DNG format and get the XMP re-written) but as your collection grows you'll probably find that Time Machine's single backup disk doesn't suit any more.
How can you make backups of these "sets"? The simplest (to comprehend) is just to make a copy via Finder/Explorer onto external drives which you can then disconnect from the system. If you need to access the copied files then you can just open them as normal files on whatever computer you connect the backup drive to. There are ways we can optimise the copy function, but before that it will help to introduce some more terminology:
So far where we've talked about a "set" of files, it's really just the "primary member" of that set. Any backup copies of the primary can be referred to as a "secondary" member of the set.
If you have the primaries of an Images1 set and a Catalogs set on your computer's internal hard drive, and the primary of an Images2 set on an external drive, you might decide to store secondary copies of all three sets on an external 1TB or 2TB drive. Or you could split the secondaries across multiple drives.
If you can set up a backup system where updating these secondary copies is quick and easy, it's easily extended so you can maintain multiple secondary copies. In my own system I have three secondaries for each set: one on a permanently-connected drive and updated daily, and two on removable drives (one updated weekly and stored on-site, and swapped monthly with a group of drives stored off-site).
This whole concept is fundamentally "low-tech". We make a copy of the files onto another disk. We don't automatically have a history of versions available (other than having extra secondaries that haven't been updated for a while). But it works, and can be used on systems ranging from a few megabytes to many terabytes.
Where the magic starts to come in is in automating the update process, so it's time to talk about file synchronisation. Rather than simply dragging/dropping a folder from one disk to another (and thus re-copying all the files) it's much easier if we can run a synchronisation program to only copy new or updated files (and possibly delete removed files) without wasting time on the unchanged files. Luckily these programs are available.
Microsoft offers the free SyncToy to do file synchronisation. 2BrightSparks offers the free SyncBack. Commercial offerings for OS X include ChronoSync. Some of these allow you to set up presets of a source/destination folder pair, although extending this to groups of secondaries that might not all be connected at once is sometimes not straightforward. Built into the base OS X is rsync (also available for other Unix-style OSes and for Windows), although to operate it you really need to set up command-line scripts.
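All of these tools share the same core idea: walk the source tree and copy a file only when it's new or has changed. Here's a minimal one-way sketch in Python (deliberately simplified: real tools like rsync also handle deletions, checksums, and partial transfers):

```python
import os
import shutil

def sync(src_dir, dst_dir):
    """Copy files from src_dir to dst_dir, skipping any that appear unchanged
    (same size and a destination mtime at least as new). Returns the relative
    paths of the files that were actually copied."""
    copied = []
    for root, _dirs, files in os.walk(src_dir):
        rel = os.path.relpath(root, src_dir)
        target_root = dst_dir if rel == "." else os.path.join(dst_dir, rel)
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target_root, name)
            s = os.stat(src)
            if os.path.exists(dst):
                d = os.stat(dst)
                if d.st_size == s.st_size and int(d.st_mtime) >= int(s.st_mtime):
                    continue  # unchanged: skip the copy
            shutil.copy2(src, dst)  # copy2 preserves the modification time
            copied.append(os.path.relpath(dst, dst_dir))
    return copied
```

Run it repeatedly and only the changed files are copied, which is what makes keeping several secondaries per set up to date practical.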
My own PteroFile program for OS X uses rsync to do the underlying file synchronisation, and builds on top of it with a configuration database and many photographer-specific workflow optimisations (including Expression Media and Lightroom plug-ins). PteroFile has been in heavy use by a small number of testers since mid-2008, and we're going to make it available to a wider audience of Mac users in June 2009. In the current version the user interface is a bit crude (configuration is done via the Terminal.app command line, though there are several GUI interfaces for things like initiating synchronisation).
I won't go on about PteroFile in this post, but you'll hear more about it soon!
Shooting the Midnight Sun, Antarctica
EOS 5DmkII, 24-105mm/4 (A2_010513)
My incentive for writing this series of articles was to introduce the underlying issues of backups and the concept of sets and members as a prelude to the release of PteroFile, but I'm sure the concepts will also be useful to many other people who might not be able to use PteroFile itself.
There's still lots more to talk about, including issues of backup media (hard disks, DVDs, "cloud storage", etc) backup verification (both that it can be restored and that the restored images are "correct") and more. So keep checking back here.
Monday, June 1, 2009
Another post for the photographers amongst you who are curious about my processes. This time it's about how I use GPS data to geo-tag my photos with exact GPS coordinates.
Choosing my first book
20° 36'41" N, 102° 1'58" E
EOS 40D, 50mm/1.8 (A2_003826)
By attaching location information to your photos, you add lots of flexibility in how you can use them. Not only do programs such as Adobe Lightroom, Apple's Preview, and iPhoto have functions to show you a map of the photo's location, but you can also do things like search for images within a specified radius of a given point. When exploring Google Earth you've probably noticed the little clickable icons that show you pictures of the location. With Google Earth and KML files you can even put together a log of a trip for yourself, complete with a trail on the map and clickable photos at each location.
All of this starts with geotagging the photos.
Incidentally, there are at least two locations in every photo: the photographer's and the subject's. While work is being done on ways to store multiple locations in image metadata, today we can just store one and usually just embed the photographer's location.
Attaching a GPS to your camera
Some cameras allow you to connect a GPS to them (usually mounted on the hot-shoe and connected via a short cable) and record the location information directly into the photo as it's taken. Nikon sell the tiny GP-1 GPS for this, and models from other manufacturers are also available. A few compact cameras have internal GPS units (even the iPhone 3G can geotag its photos). While this can be convenient, in real-world use it can have some issues:
- If you're using more than one camera you'll need more than one GPS. If you're thinking that using more than one camera sounds extravagant, maybe it's just because you're still using your first camera. Once you have more than one (even a shiny new model and an older model) you may find that it's very useful to be able to work with two bodies. For example in Africa and in Antarctica when I've been working from vehicles, having a wide-angle lens mounted on one body and a telephoto on another has been amazingly useful. But only having some of my photos geotagged would be annoying.
- When you turn a GPS on it will take a while to lock onto enough satellites to get a position fix. If it has had a fix recently this can be very quick, but depending on the local geography, current satellite orbits, as well as the sensitivity of the GPS the process can take minutes.
If the GPS has been off and you turn your camera on to take some photos immediately, you're likely to end up with no GPS data on those photos.
Another option is to carry a GPS unit with you that's continually on, recording logs of where you've been. The GPS gets accurate time from the satellites, and each position in the log has a timestamp. As long as you set the right time in your camera before you started shooting, software on your computer later can work out from the log where you were when the photo was taken. This is the function provided by geotagging software.
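Under the hood the correlation is straightforward: find the two track fixes that bracket the photo's capture time and interpolate between them. A simplified sketch in Python (timestamps here are plain Unix seconds; real tools parse them from the GPX XML and the photo's EXIF, and deal with timezones):

```python
from bisect import bisect_left

def position_at(track, photo_time):
    """Interpolate a (lat, lon) position from a time-ordered list of
    (timestamp, lat, lon) GPS fixes for a given photo capture time."""
    times = [t for t, _lat, _lon in track]
    i = bisect_left(times, photo_time)
    if i == 0:
        return track[0][1:]   # before the log starts: use the first fix
    if i == len(track):
        return track[-1][1:]  # after the log ends: use the last fix
    (t0, lat0, lon0), (t1, lat1, lon1) = track[i - 1], track[i]
    f = (photo_time - t0) / (t1 - t0)  # linear interpolation between fixes
    return (lat0 + f * (lat1 - lat0), lon0 + f * (lon1 - lon0))
```

This is also why the camera's clock matters so much: an offset of a few minutes can put a photo kilometres from where it was actually taken.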
A wide variety of GPS units are on the market and are suitable for this application. Some simply log data, some are fully-fledged navigation devices.
I have been recording GPS logs of my travels since the early 2000s. My first unit was a Garmin GPS II, which only had enough memory to record about 2 hours of highway travel. Later I used a Garmin eTrex Legend which had better sensitivity and enough memory to cope with a day's work. Today I use a Garmin eTrex Legend HCx which I am very happy with.
It's small enough to slip into a pocket, and the sensitivity is impressive: inside a car or under a freeway it usually has no problem getting a lock (although dense structures and wet foliage do cause some issues). I have clip-on mounts for it in my car and on my bicycle, and when walking I have it in a top pocket (or sometimes the top compartment in a camera bag). A pair of AA batteries lasts all day long (finding it ran out half-way through an outing would be ... disappointing). It has a microSD card inside which I've loaded up with detailed topo maps of Australia and of any international location I'm heading to. With the right maps it will even give turn-by-turn routing instructions, but I don't use it for that. I have it set to continually record my locations to the microSD card, and it has enough room for months of travel. It creates a GPX file for each day's data (GPX is an XML data format that's understood by probably 99% of GPS/mapping software) which I can easily copy to the computer over USB.
Celestial Pole over Castle Rock
24° 52'12" S, 133° 49'52" E
EOS 20D, 17-40mm/4 (F1_2EFC)
I do use the GPS for general navigation tasks, in-car and on-foot. I've built up a sizeable database of interesting locations in the GPS (including points recorded with my earlier units). If I'm driving somewhere and see an interesting location but don't have time to stop I will usually at least record a specific "waypoint". Several times on trips I've noticed on the unit's map that we're approaching a waypoint from an earlier trip. I haven't always remembered what it was about until I've arrived, but it's usually turned out to be worth a stop.
On-foot navigation includes finding where my tripod is in the dark when shooting star trails. The camera's been operating for hours, and you're out of your tent and wandering off to find it before dawn and dew ruin the photos. Being able to find your tripod in the dark without accidentally shining a torch towards the lens is very useful!
Geotagging your photos
Once you've got your photos and your GPS log, it's time to correlate them. Most tagging software allows you to do this and embed the locations into the photo's metadata, so that when you import the photos to your favourite management software the data is available. Some programs (e.g. Image Ingester, Downloader Pro) do the tagging for you while the photos are being copied from the flash card.
But this scenario rarely makes sense for me. I record GPS data, I shoot photos, I import the photos to the computer and start managing them with Lightroom immediately, and at some stage I will download the GPS data onto the computer and want to attach location info to the photos. Having to hold off on using Lightroom until I've got the GPS data in the right place just wouldn't work for me.
For instance after my last Antarctic voyage I had logs from my eTrex that covered most of my onshore excursions and much of the ship's movements, but there were large gaps where I had no GPS data (sometimes because I'd been in a location where the steel structure of the ship blocked the GPS signals). Through the 4 weeks of my travels I used Lightroom most days to manage the thousands of photos I was taking. After each burst of shooting (a Zodiac excursion, a whale sighting, passage through the Lemaire Channel, etc) I usually cycled full cards through the card reader in my cabin onto duplicate hard drives so I could go back out and be confident I wasn't going to run out of space.
Above and Below
66° 1'3" S, 65° 24'9" W
EOS 40D, 100-400mm (A2_012074)
On my busiest day in Antarctica I took around 2900 photos, and although I was using a 16 GB CF card, an 8 GB, and an assortment of 2 GB and 4 GB cards (spread across 4 cameras: a 5DmkII, a 40D, a G9, and an IR body) I wouldn't have had enough room on the cards for that whole day's work. There definitely wasn't opportunity to download GPS data before importing photos. I needed to be able to use Lightroom to review and work on the day's photos whenever I had a chance: geotagging photos on the day was a complication I would not have enjoyed. Every couple of days, or weekly, sure.
Incidentally, once I got home I received a copy of the ship's own GPS log, which I was then able to use to fill in the gaps in my own GPS data.
Geotagging within Lightroom
So for me it's normal procedure to geo-tag my photos once they're "inside" Lightroom, and for this I use Jeffrey Friedl's GPS Support Lightroom plug-in. Inside Lightroom you just select the images you want to geotag, invoke the plug-in and point it at a GPX file. Because of the current structure of Lightroom's plug-in API there's a multi-step process required if you then want to embed the location data back to the underlying files, but the plugin explains everything. If the clock in your camera was slightly off (the most common cause for this when travelling is forgetting to update devices when you change timezone) you can use Lightroom's Edit Capture Time function to fix this before using the plug-in to set (or re-set) locations.
Not only can it geotag photos based on a GPS log: you can also enter locations manually or via Google Earth, or even copy from another photo (e.g. a photo taken at the same location with your iPhone 3G).
Lightroom has a built-in function to call up Google or Yahoo Maps in your web browser looking at the current photo's location, but Jeffrey's plug-in also allows you to pass it over to local programs such as Google Earth.
When I came back from Antarctica I already had around 12,000 photos remaining in a Lightroom catalog, had done basic editing on many of them and applied many of the appropriate keywords (it's amazing how much you can get done waiting for hours in airport lounges with a laptop). Once I geotagged the photos it was a simple task to use Google Earth along with British Antarctic Survey maps (and tourist maps of our other destinations including Easter Island) to accurately identify each location and attach appropriate text metadata. And if I'm ever wondering exactly where a particular photo was taken I just have to click a button and the wonder that is Google Earth will remind me.
If you're using Adobe Lightroom and you're interested in geotagging, have a look at Jeffrey's plug-in!