We’ve all heard the horror stories about people using artificial intelligence (AI) with disastrous results. (Like the lawyer who used ChatGPT to write a brief, only to have it create fake legal citations.) So it might surprise you that you can use AI tools like ChatGPT for your genealogy research successfully and accurately. Let me show you how.
What is Artificial Intelligence?
Artificial intelligence (or AI) is the overarching field. As IBM defines it, “artificial intelligence is a field which combines computer science and robust datasets to enable problem solving.”
Chances are you have already used AI in your genealogy. For example, when you start typing a search into Google, the list of possible searches that you get is generated by a form of AI.
FamilySearch also uses AI. One example is when you’re looking at a profile in the FamilySearch Family Tree and it has something in the “Research Help” section. FamilySearch has analyzed the profiles in the tree and determined that couples who lived in a given time period and location typically didn’t have children more than three years apart. So when it sees a profile with children who are spaced further apart than that, it suggests that there might be another child in the middle.
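To make the spacing idea concrete, here is a minimal sketch in Python. It is not FamilySearch's actual code; the function name, the three-year threshold as a parameter, and the sample birth years are all assumptions made up for illustration.

```python
def find_birth_gaps(birth_years, max_gap_years=3):
    """Return (earlier, later) pairs of consecutive births more than max_gap_years apart."""
    years = sorted(birth_years)
    return [
        (earlier, later)
        for earlier, later in zip(years, years[1:])
        if later - earlier > max_gap_years
    ]


# Hypothetical family: children born in these years.
children = [1850, 1852, 1854, 1860, 1862]
print(find_birth_gaps(children))
```

Any pair it returns is only a hint that a child might be missing from the profile, not proof that one is.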
Ancestry also uses AI in a variety of ways. You might think immediately of Ancestry’s hints, and you wouldn’t be wrong. But they also use AI for things like the Newspapers.com Obituary Index. Ancestry doesn’t have a team of people going through individual newspapers figuring out which articles are obituaries. Instead, Ancestry’s AI looks at the language in each article. If an article has a lot of words like died, buried, cemetery, and survived by, chances are good that it’s an obituary.
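Here is a rough Python sketch of that "count the telltale words" idea. Ancestry's real system is certainly more sophisticated; the term list comes from the paragraph above, while the function names, the threshold of 3, and the sample sentences are assumptions for illustration only.

```python
import re

# Terms taken from the description above.
OBITUARY_TERMS = ["died", "buried", "cemetery", "survived by"]


def obituary_score(text):
    """Count how many times the obituary terms appear as whole words or phrases."""
    lowered = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
        for term in OBITUARY_TERMS
    )


def looks_like_obituary(text, threshold=3):
    """Guess that an article is an obituary when enough terms appear (threshold is assumed)."""
    return obituary_score(text) >= threshold


# Made-up sample articles.
obit = "John Smith died Tuesday. He was buried at Oak Hill Cemetery and is survived by his wife."
news = "The council met Tuesday to discuss the new cemetery road."
print(looks_like_obituary(obit), looks_like_obituary(news))
```

Notice that this looks only at language. It has no idea whether the person in the article is the one you are researching.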
What is ChatGPT?
This kind of language analysis is close to the AI that has a lot of people riled up right now: tools like ChatGPT. So what is ChatGPT? “Chat” refers to how you interact with it: You type a prompt, it types something back. It’s very much a text-based chat. GPT stands for Generative Pre-trained Transformer. Though that sounds technical, it really does describe what it is.
To use ChatGPT effectively for your genealogy research, you really have to understand what it is and what it is not. ChatGPT is not a search engine. It’s also not a fact checker. ChatGPT and similar tools, like Bard or Bing Chat, are built on what’s called a large language model. Basically, a large language model takes a huge data set (ChatGPT used billions of publicly available web pages) and analyzes it to find the patterns of language within certain contexts.
ChatGPT takes what’s in the prompt and compares it against the patterns it found in the training set. It then gives a reply in words that it thinks have the highest probability of fitting the pattern in that context. The underlying idea is not new. If you’ve ever gone to a business’s website and asked in their online chat, “Are you going to be open next Monday?” and it immediately answered, “Here are our store hours,” you’ve seen a similar principle at work. It takes your prompt, analyzes it, and finds what most closely matches an expected response.
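If you want to see "highest probability of fitting the pattern" in miniature, here is a toy Python sketch. It simply counts which word most often follows another word in a few made-up sentences. Real large language models use neural networks and far more context than one word, so treat this as an analogy and not as how ChatGPT is built; the function names and training sentences are assumptions.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count, for every word, which words follow it in the training sentences."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def most_likely_next(follows, word):
    """Return the word seen most often after `word`, or None if it was never seen."""
    options = follows.get(word.lower())
    if not options:
        return None
    return options.most_common(1)[0][0]


# Made-up training set.
training = [
    "he was buried in the cemetery",
    "she was buried in the churchyard",
    "he was born in ohio",
]
model = train_bigrams(training)
print(most_likely_next(model, "was"))
```

The model picks "buried" after "was" only because that pairing showed up most often. It is matching patterns, not checking whether anyone was actually buried.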
What’s revolutionary about ChatGPT, Bard, and similar tools is that for the first time, people who aren’t programmers have access to this technology. You don’t have to program anything in ChatGPT; you work with it in natural language.
The Biggest Mistake in Using ChatGPT
By far the biggest mistake that I see people making with ChatGPT is treating it like Google. When you create an account on ChatGPT and log in, you’ll see a box where you can enter your prompt, and it looks a lot like a Google search bar. I suspect that’s where that lawyer got into trouble. I suspect that he entered a prompt something like, “Write a legal brief about this particular topic,” and he expected ChatGPT to go scour the web, find all of the current facts, and synthesize them into a coherent and accurate legal brief that he could then turn in to the court.
But that isn’t how ChatGPT is designed to work. That lawyer asked for a legal brief, which set the context for ChatGPT. It looked at its training set and saw that legal briefs have these things called citations, which usually have name versus name, a bunch of numbers, and a year. So that’s what it gave him. That was the type of language that was expected. Again, we’re talking language, not fact checking.
How to Use ChatGPT for Genealogy Accurately
But genealogy is all about being accurate. So how can we use ChatGPT and similar tools and still be accurate with what we’re getting? As genealogists, we like facts. So when something new comes along, some of us test it by giving it a name that we already know is in the data set or, in this case, by asking it a question that we think it should know the answer to. But if I enter a question like, “When did Ohio birth records start,” ChatGPT is going to give me a response that makes sense in terms of language. In terms of fact, not quite. Below is ChatGPT’s response. I highlighted in yellow the text that is incorrect.
Here’s a vital thing to know about ChatGPT prompts:
If the prompt that you’re using is something that you would otherwise have typed into Google, it’s not a good prompt. Use Google for those kinds of things. That’s what Google is designed for. Just like you wouldn’t open up PowerPoint to send an email, don’t use ChatGPT for something that you otherwise would have used Google for.
So what does make a good ChatGPT prompt? It’s going to be things that are based on language, concepts, or transforming things. I really like ChatGPT for idea generation. I asked it recently to compile a list of 10 activities for a family reunion. I was intrigued by the second item on the list, “Family Olympics.” So I continued the chat and asked ChatGPT to give more specific examples for #2 Family Olympics from the previous list.
Getting Accuracy in ChatGPT Results
We have to address ChatGPT making up facts. One of my great-great-grandfathers was John Peter Kingery. He was a pretty average, obscure individual. There aren’t going to be massive amounts of references to John Peter Kingery in ChatGPT’s training set.
If I prompt ChatGPT, “write a biography of John Peter Kingery,” ChatGPT has no context. It doesn’t know when or where he lived or anything else about his life. All ChatGPT knows is that I asked it to write a biography of a person with the name John Peter Kingery. And that’s exactly what ChatGPT did:
There’s nothing in this biography that’s correct. But I wouldn’t expect it to be. I gave ChatGPT absolutely no context to work with. Even if my ancestor was somebody famous, somebody that ChatGPT would have somewhere in the training set, it doesn’t have a good way of differentiating between people who have the same name.
You’re setting up ChatGPT to fail when you give it a prompt with absolutely no context like this.
I didn’t want to leave it at that. So I put together a short little document that had the basics of John Peter Kingery’s life. And I also added a couple of extra facts, including that he served in the 173rd Ohio Infantry, and that he’s buried at Kingery Cemetery. Now look what happens when I give ChatGPT that prompt of “using these facts, write a biography of John Peter Kingery.” Honestly, it took me longer to type up the facts to put into ChatGPT than it took for the biography to be created. Below is part of the biography that ChatGPT wrote with my second prompt.
Is this a perfect biography? No. There’s some editorializing and a little bit of embellishment that I’m not quite comfortable with, but this makes a fantastic first draft. I can now take this biography out of ChatGPT and I can edit it myself. But I know that the facts that are in this biography are correct, because I told ChatGPT to include them. I didn’t leave it up to ChatGPT to just go make up stuff. When you want ChatGPT to create something like this, the more specific you can be and the more details you can give it, the better the response is going to be.
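If you like to keep your facts in a list, the same approach can be sketched in a few lines of Python. This only builds the text of the prompt for you to paste into ChatGPT; it does not call ChatGPT. The function name and the layout of the prompt are my own assumptions, and the two facts are the ones mentioned above.

```python
def build_biography_prompt(name, facts):
    """Assemble a prompt that hands ChatGPT the facts instead of letting it guess."""
    lines = [f"Using these facts, write a biography of {name}.", "", "Facts:"]
    lines += [f"- {fact}" for fact in facts]
    return "\n".join(lines)


facts = [
    "Served in the 173rd Ohio Infantry",
    "Buried at Kingery Cemetery",
]
print(build_biography_prompt("John Peter Kingery", facts))
```

The more facts you put in the list, the less room there is for made-up details in the draft.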
Other Uses for ChatGPT in Genealogy
Writing biographies is not the only way that ChatGPT can help us in our genealogy. Remember that the T stands for transformer. I recently found a newspaper article about a mine explosion on a website called Chronicling America. One of the things that you can do on that website is copy the OCR (optical character recognition) text. So I copied the OCR text from this article and opened up ChatGPT. The prompt I gave it was “take the following text and create a table with the person’s name if he was killed or injured occupation, family, relationships and residence,” and pasted in the text of the article.
And look what ChatGPT gave me. It gave me a table with the data extracted into those columns. Obviously, I would want to compare this with the actual article. But if you’re working with a lot of data like this, this is a huge time saver.
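Part of that comparison can be done mechanically. Here is a minimal Python sketch that flags any table value that does not appear word for word in the article text. The function name, the column names, and the sample article and rows are all invented for illustration. A value can be flagged simply because ChatGPT reworded it or the OCR is garbled, so a flag means "look at this one," not "this is wrong."

```python
def values_missing_from_source(rows, source_text):
    """Return (row number, column, value) for each value not found verbatim in the source."""
    # Lowercase and collapse whitespace so line breaks in the OCR don't cause false flags.
    haystack = " ".join(source_text.lower().split())
    missing = []
    for row_number, row in enumerate(rows, start=1):
        for column, value in row.items():
            needle = " ".join(str(value).lower().split())
            if needle and needle not in haystack:
                missing.append((row_number, column, value))
    return missing


# Invented example data, not from the real article.
article_text = "John Doe, a miner of Springfield, was killed. Richard Roe, a driver, was injured."
table_rows = [
    {"name": "John Doe", "status": "killed", "residence": "Springfield"},
    {"name": "Richard Roe", "status": "injured", "residence": "Shelbyville"},
]
print(values_missing_from_source(table_rows, article_text))
```

In this made-up example, the residence in the second row is flagged because it never appears in the article text.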
When it comes to tools like ChatGPT, Bard, or Bing Chat, we are just scratching the surface of how they can help us in our genealogy. How do you want to use these tools? Let me know in the comments.