Biographical and burial records scraped via FindAGrave's GraphQL API. This preview analyzes a single file (memorial IDs 0–999,999) from an ongoing scrape of ~290 million memorial IDs.
FindAGrave is the world's largest online cemetery database, with over 290 million memorial records contributed by volunteers. Each record includes name, birth/death dates, cemetery, and — for about 30–40% of records — a volunteer-written biographical text ranging from a one-line description to a multi-page life story.
We discovered that FindAGrave exposes an unauthenticated GraphQL API that supports batch queries of up to 10,000 memorial IDs per request. This is ~250x faster than HTML page scraping, enabling a full database scrape in approximately 30–35 hours.
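As a rough sketch of how the ID space might be divided into batch requests, the snippet below splits a contiguous ID range into chunks of at most 10,000 and counts the requests needed for ~290 million IDs. The helper name `batch_ranges` and the constants are illustrative assumptions; the GraphQL query itself and its endpoint are not shown.

```python
from typing import Iterator, Tuple

BATCH_SIZE = 10_000        # max memorial IDs per request, per the text above
TOTAL_IDS = 290_000_000    # approximate size of the ID space


def batch_ranges(start: int, stop: int, size: int = BATCH_SIZE) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (first_id, last_id) pairs covering [start, stop)."""
    for first in range(start, stop, size):
        yield first, min(first + size, stop) - 1


# One output file covers IDs 0-999,999.
file_batches = list(batch_ranges(0, 1_000_000))
print(len(file_batches), file_batches[0], file_batches[-1])

# Requests needed for the whole ID space (ceiling division).
total_requests = -(-TOTAL_IDS // BATCH_SIZE)
print(total_requests)
```

Under these assumptions one file corresponds to 100 requests and the full scrape to 29,000. If requests were issued strictly one at a time, a 30–35 hour run would work out to roughly 4 seconds per request on average.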
This page analyzes a single output file (IDs 0–999,999) to characterize the data. The biographies are the most valuable field — they contain structured information about occupations, family members, military service, and life events that can be extracted with LLMs.
| Property | Value |
|---|---|
| API | GraphQL (unauthenticated) |
| This File | IDs 0–999,999 |
| Records | -- |
| With Bios | -- |
| Files Complete | -- |
| Total IDs | ~290 million |
| Status | In Progress |
How complete is each field? Core identification fields (name, dates) are near-universal. Biographies, inscriptions, and military data are present for subsets of records.
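One way to measure field completeness is to stream the CSV and count, per column, how many rows have a non-empty value. This is a minimal sketch; the sample rows are made up for illustration and show only a few of the columns, and `field_completeness` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# Made-up stand-in for the real CSV; only a handful of columns are shown.
SAMPLE_CSV = """memorial_id,last_name,birth_year,bio,military_branch
1,Abbe,1838,Scientist.,
2,Smith,,,
3,Jones,1901,,United States Army
4,Brown,1875,Farmer.,
"""


def field_completeness(rows):
    """Return {field: fraction of rows where the field is non-empty}."""
    filled = Counter()
    total = 0
    for row in rows:
        total += 1
        for field, value in row.items():
            # Adding 0 still registers the field, so empty columns are reported.
            filled[field] += 1 if (value or "").strip() else 0
    return {field: n / total for field, n in filled.items()} if total else {}


completeness = field_completeness(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for field, share in completeness.items():
    print(f"{field:16s} {share:.0%}")
```

Because the rows are consumed one at a time, the same function could be pointed at a file handle for the full file without loading it into memory.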
Birth and death decades reveal the historical span of the collection.
Decade of birth for records with a birth year.
Decade of death. The 2000s–2020s spike reflects recent memorial creation.
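The decade breakdowns can be produced by flooring each year to its decade and counting. A minimal sketch, assuming missing years are represented as `None`; the sample years are invented for illustration.

```python
from collections import Counter


def decade(year: int) -> int:
    """Floor a year to the start of its decade, e.g. 1916 -> 1910."""
    return year // 10 * 10


# Invented sample of death_year values; None marks a record without a year.
death_years = [1916, 1912, 2004, 2009, 2015, None, 1999]

decade_counts = Counter(decade(y) for y in death_years if y is not None)
for start in sorted(decade_counts):
    print(f"{start}s: {decade_counts[start]}")
```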
When these memorial records were first added to FindAGrave by volunteers.
Volunteer-written biographical texts are the most valuable field for research. They contain information about occupations, family relationships, military service, and life events.
Most "biographies" are just a few words (e.g., an occupation label). The table below shows how many records have substantive biographical text at various word-count thresholds.
| Threshold | Records | % of Total | Median Words |
|---|---|---|---|
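The columns of the threshold table could be computed along the following lines. This is a sketch under stated assumptions: words are split on whitespace, the thresholds `(1, 5, 10)` are placeholders rather than the ones used on this page, and "Median Words" is taken to mean the median length among biographies that meet each threshold. The sample texts are illustrative.

```python
from statistics import median


def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def threshold_table(bios, total_records, thresholds=(1, 5, 10)):
    """Return (threshold, records, share_of_total, median_words) per threshold."""
    lengths = [word_count(b) for b in bios]
    table = []
    for t in thresholds:
        kept = [n for n in lengths if n >= t]
        table.append((t, len(kept), len(kept) / total_records,
                      median(kept) if kept else None))
    return table


# Illustrative input: 3 biographies out of 10 records in total.
sample_bios = [
    "Farmer.",
    "Scientist. A native of New York City",
    "Civil War veteran who served in the Union Army and later farmed",
]
table = threshold_table(sample_bios, total_records=10)
for t, n, share, med in table:
    print(f">= {t:2d} words: {n} records, {share:.0%} of total, median {med}")
```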
Distribution of biography lengths.
Records with military service data.
Examples of biographical texts from "famous" memorials in this ID range.
Examples of biographical texts from ordinary memorials.
FindAGrave's GraphQL API returns 25 fields per memorial but does NOT include structured family relationship data. However, Ancestry.com hosts a parallel index (Collection 60525) that includes Father, Mother, Spouse, and Children as structured fields.
For a targeted set of individuals (e.g., 10,000 people found in our data), we can query Ancestry's record pages to pull the structured family fields (Father, Mother, Spouse, Children) that aren't available through FindAGrave's API.
No Ancestry subscription is required — collection 60525 is free.
Each Ancestry record page takes ~0.56 seconds to fetch. With a 0.5-second politeness delay between requests:
| People | Estimated time |
|---|---|
| 100 people | ~2 minutes |
| 1,000 people | ~18 minutes |
| 10,000 people | ~3 hours |
| 100,000 people | ~30 hours |
Scraping all 146M Ancestry records at the same ~1.06 seconds per record would take ~4.9 years: impractical for bulk collection, but very feasible for targeted enrichment.
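The per-batch estimates follow from simple arithmetic: each record costs one fetch (~0.56 s) plus one delay (0.5 s). The sketch below reproduces that calculation and shows one possible shape for a sequential, rate-limited loop. `fetch_politely` and its `fetch` callable are hypothetical; no real Ancestry URL or client is assumed here.

```python
import time
from typing import Callable, Dict, Iterable

FETCH_SECONDS = 0.56   # time per record page, from the text
DELAY_SECONDS = 0.5    # politeness delay between requests, from the text


def estimated_seconds(n_records: int) -> float:
    """Sequential estimate: every record costs one fetch plus one delay."""
    return n_records * (FETCH_SECONDS + DELAY_SECONDS)


def fetch_politely(ids: Iterable[int],
                   fetch: Callable[[int], object],
                   delay: float = DELAY_SECONDS,
                   sleep: Callable[[float], None] = time.sleep) -> Dict[int, object]:
    """Call the caller-supplied fetch(id) for each ID, pausing between calls."""
    results = {}
    for i, record_id in enumerate(ids):
        if i:                      # no pause before the first request
            sleep(delay)
        results[record_id] = fetch(record_id)
    return results


for n in (100, 1_000, 10_000, 100_000):
    secs = estimated_seconds(n)
    print(f"{n:>7,} people: {secs / 60:8.1f} min ({secs / 3600:5.1f} h)")
```

This gives about 1.8 minutes, 17.7 minutes, 2.9 hours, and 29.4 hours, which round to the figures in the table. Passing `sleep` as a parameter keeps the loop testable without actually waiting.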
Each record in the CSV contains 25 fields. The bio and inscription fields originally contain HTML, which is stripped to plain text during conversion.
| Field | Type | Example |
|---|---|---|
| memorial_id | Integer | 1 |
| first_name, middle_name, last_name | String | Cleveland Abbe |
| maiden_name | String | (if applicable) |
| birth_year, birth_month, birth_day | Integer | 1838, 12, 3 |
| birth_place | String | New York |
| death_year, death_month, death_day | Integer | 1916, 10, 28 |
| death_place | String | Chevy Chase |
| cemetery_id, cemetery_name | ID, String | 104448, Rock Creek Cemetery |
| plot | String | Section M, Lot 292 |
| is_famous | Boolean | 1 |
| military_branch, military_rank | String | United States Army, Private |
| bio | Text (HTML stripped) | Scientist. A native of New York City... |
| inscription | Text | CLEVELAND ABBE... |
| date_created, date_modified | ISO 8601 | 1998-04-26T00:00:00.000Z |
| creator_name, bio_contributor_name | String | Find a Grave, Bigwoo |
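A reader loading the CSV would likely want to convert the string values into the types listed above. The following is a minimal sketch, assuming the column names match the table, empty cells mean "missing", and `is_famous` is stored as `1` or `0`; `parse_row` is a hypothetical helper, and the sample row uses only a subset of the 25 columns.

```python
import csv
import io
from datetime import datetime

INT_FIELDS = ("memorial_id", "birth_year", "birth_month", "birth_day",
              "death_year", "death_month", "death_day")
DATE_FIELDS = ("date_created", "date_modified")

SAMPLE_CSV = (
    "memorial_id,first_name,last_name,birth_year,birth_month,birth_day,"
    "death_year,is_famous,date_created\n"
    "1,Cleveland,Abbe,1838,12,3,1916,1,1998-04-26T00:00:00.000Z\n"
)


def parse_row(row: dict) -> dict:
    """Convert one CSV row (all strings) into typed Python values."""
    out = dict(row)
    for field in INT_FIELDS:
        value = row.get(field)
        out[field] = int(value) if value else None
    # Assumption: the boolean flag is serialized as "1" / "0".
    out["is_famous"] = row.get("is_famous") == "1"
    for field in DATE_FIELDS:
        value = row.get(field)
        # Replace the trailing "Z" so older Python versions can parse it.
        out[field] = (datetime.fromisoformat(value.replace("Z", "+00:00"))
                      if value else None)
    return out


record = parse_row(next(csv.DictReader(io.StringIO(SAMPLE_CSV))))
print(record["memorial_id"], record["birth_year"], record["is_famous"],
      record["date_created"].date())
```

Columns absent from a given file (here, for example, `death_month`) simply come back as `None`.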