With scanning technology being easier and cheaper than ever before, why aren’t more genealogy materials online? The answer isn’t, “Just scan it.” Here’s a look at everything involved in making genealogy records available online.
Recently, FamilySearch announced that they have completed digitizing their collection of 2.4 million rolls of microfilm. But… they aren’t all online yet. And when you think about the archives, libraries, government agencies, and other organizations that have records—why aren’t more of them online?
Let’s walk through a hypothetical (yet realistic) scenario. Let’s pretend you’re an archivist and there’s this really cool collection of thousands of letters that you just know would be useful to researchers. Wouldn’t it be great to digitize it and get it online? Let’s see what it would take.
For an archive or library to make their materials available online, they first need to make sure they have permission to do so. Sometimes that’s straightforward, like when it’s something in the public domain. But if it isn’t in the public domain and it’s something that was donated to them—such as original materials or manuscripts—they need to make sure they have the permission of the donor. Donor agreements sometimes restrict what that library or archive can do with the material. Also, if it isn’t spelled out that the repository does have rights to digitize and distribute, they might need to re-negotiate an agreement allowing them to do so.
If it’s a government agency, sometimes the records are restricted by law. The records might be from a time period that is still in “embargo” (such as states that restrict death records for 50 years). The record type itself might be restricted; this is often seen with state hospital and state asylum records.
In your (hypothetical) archive, you first need to find the donor agreement. When you find that it doesn’t have the necessary permission, you have to try reaching out to the donor, which isn’t always an easy or fast thing to do. But for this example, we’ll say that it only took a month to reach the donor and get his permission.
As you’ll see, there’s a fair amount of money involved in digitizing. Most archives have razor-thin budgets, and extra projects simply don’t make the cut.
You opt to go for grant funding for the necessary equipment and additional staffing. Fast-forward several months, and you’re notified you got the grant. (You aren’t always so fortunate. It isn’t unusual to go through several rounds of applications to various foundations before securing funding.)
Setting Some Ground Rules
Are you going to scan the front and back of all pages, or just the ones with writing? Are you going to scan the envelope? (I hope the answer to that one is, “Yes.”) How are you going to handle oversize pages that either won’t fit on the scanner or in the field of the camera without being really tiny?
What format will they be scanned into? What resolution? How will you keep together the files of the letters that are multiple pages? File naming conventions?
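One way to settle the file naming question is to write the convention down as a small function, so every image from the same letter sorts together and in page order. The sketch below is only an illustration: the collection code `smithletters`, the `r`/`v` (front/back) side codes, the use of page 0 for the envelope, and the `tif` extension are all hypothetical choices, not a standard your archive must follow.

```python
def image_filename(collection, letter_no, page_no, side="r", ext="tif"):
    """Build a sortable file name for one scanned image.

    Hypothetical convention: page_no 0 means the envelope,
    side is "r" (front) or "v" (back).
    """
    if side not in ("r", "v"):
        raise ValueError("side must be 'r' (front) or 'v' (back)")
    if letter_no < 1 or page_no < 0:
        raise ValueError("letter_no starts at 1, page_no at 0 (envelope)")
    # "env" sorts before "p01", so the envelope always comes first.
    part = "env" if page_no == 0 else f"p{page_no:02d}"
    # Zero-padding keeps letter 12 ahead of letter 100 in a plain sort.
    return f"{collection}_{letter_no:04d}_{part}_{side}.{ext}"


# Example: the envelope and the back of page 3 of letter 12.
print(image_filename("smithletters", 12, 0))
print(image_filename("smithletters", 12, 3, side="v"))
```

Because the numbers are zero-padded, an ordinary alphabetical sort of the file names keeps each letter's envelope and pages together in order.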
Arranging and Preparing the Material
Before you set up your scanner or digital camera, there is work to do. Those letters need to be opened and the papers unfolded, unstapled, and un-paperclipped. This takes time. (So. Much. Time.) And there needs to be a way to keep things in order so papers don’t get mixed up in the process.
Getting the Necessary Equipment
There needs to be a scanner or digital camera with the necessary accessories, such as batteries, lighting, camera stand, etc.
And people. You can’t digitize without the people to do it.
Honestly, this is usually the easiest part of the whole process, but it still takes time. Scanning a book can be fairly fast, but if you’re working with unbound material (like your hypothetical letter collection), you’re going to go a lot slower. Even if you can capture a new image every 5 seconds (which would be lightning speed for some unbound materials), it adds up. Let’s say it’s 5 seconds per image. Multiply that by a letter that’s 6 pages long plus an envelope, so 7 images… and there are 2000 letters to do. That’s 14,000 images, which comes to about 1167 minutes, or almost 20 hours. That’s presuming nothing slows you down. (Heads up: There’s always something that will slow you down.)
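If you want to check that math or plug in your own numbers, here is the same back-of-the-envelope estimate as a short script. The figures are the ones from the example above; the variable names are just for illustration, and the result is a best case that ignores setup, refocusing, and everything else that slows real scanning down.

```python
# Figures from the hypothetical letter collection.
SECONDS_PER_IMAGE = 5
PAGES_PER_LETTER = 6
ENVELOPES_PER_LETTER = 1
LETTERS = 2000

# Each letter produces one image per page plus one for the envelope.
images = LETTERS * (PAGES_PER_LETTER + ENVELOPES_PER_LETTER)
total_seconds = images * SECONDS_PER_IMAGE
total_minutes = total_seconds / 60
total_hours = total_minutes / 60

print(f"{images} images")
print(f"{total_minutes:.0f} minutes")
print(f"{total_hours:.1f} hours")
```

Changing `SECONDS_PER_IMAGE` to something more realistic for fragile, unbound paper shows how quickly the total grows.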
Those images don’t do anyone good if nobody knows what they are. That requires someone to set up metadata—information about something that makes it more usable. At a minimum, you need some kind of title for this group of images, but there usually needs to be a more robust description.
There’s also something called “structural metadata,” which shows how the images relate to one another. This includes things like the sequence of the images, so that page 5 comes after page 4 but before page 6. It can also be assigning “waypoints.” Essentially, this allows users to see where sections of the work are, much like a book’s table of contents does. This also takes time and someone to do it. (See a pattern here?)
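To make the idea of structural metadata a little more concrete, here is a minimal sketch of how sequence and waypoints might be recorded. It is not how any particular content management system stores this information; the `Image` and `Waypoint` classes, the field names, and the file names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Image:
    filename: str
    sequence: int  # position of this image within the item


@dataclass
class Waypoint:
    label: str
    first_sequence: int  # sequence number where this section starts


def in_order(images):
    """Return the images sorted by their recorded sequence."""
    return sorted(images, key=lambda img: img.sequence)


def waypoint_for(sequence, waypoints):
    """Return the label of the section that contains this sequence number."""
    label = None
    for wp in sorted(waypoints, key=lambda w: w.first_sequence):
        if wp.first_sequence <= sequence:
            label = wp.label
    return label


# One letter: an envelope followed by pages, scanned out of order.
images = [Image("scan_c.tif", 3), Image("scan_a.tif", 1), Image("scan_b.tif", 2)]
waypoints = [Waypoint("Envelope", 1), Waypoint("Letter", 2)]

for img in in_order(images):
    print(img.sequence, img.filename, waypoint_for(img.sequence, waypoints))
```

The sequence numbers keep the pages in reading order no matter what the files are called, and the waypoints play the role of the table of contents.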
If these images are going to be online, they need to be hosted somewhere. (This isn’t the same thing as the archive’s hard drive where they are stored. By the way, that’s more equipment that’s needed.) There also needs to be some sort of website. There are frameworks called “content management systems” that help libraries and archives manage this, but they’re often too expensive for small organizations to use. Even the organizations that do have a CMS still need to pay for the service and have people to work on the technical aspect. (I hope you figured that into your grant proposal.)
Storage and Backups
If you’ve ever had a hard drive fail, you’ll understand the need for a good system of backups. With computers, it isn’t a matter of if they will fail; it’s a matter of when. You’re going to need a good backup system (which includes a clear way to recover data), as well as a plan to migrate data to new media and/or formats when necessary.
You might be able to get some of the work done with the staff you already have at your (hypothetical) archive, but this is a big project. You were smart to include in your grant proposal some funding for a part-time employee or intern to work with the project. Even if you had opted to use all volunteers for this, volunteer labor is not free. It still requires time to train, supervise, and manage.
These Issues Affect Everyone
Whether it’s your (hypothetical) small archive or one of the major players in the genealogy space, these issues affect everyone who is trying to get things digitized and bring them online. True, the big players don’t need to apply for grant funding, but they still have issues of time constraints, staffing, and technology. Even for them, there are limits of people, time, and money. It’s tougher for smaller organizations, because they don’t have the economy of scale that the larger ones do.
So when you get frustrated (like I do sometimes) when that record you need isn’t online, remember that there’s more to bringing records online than just scanning.