There are several large-scale human name databases available in CSV format, each with different coverage, licensing, and accessibility. Here are the most notable options based on size and scope:
1. Name Census (census.name)
- Coverage: 1,507,690 validated first names and 3,251,185 surnames from 139 countries.
- Details: Includes gender and popularity data for each name. According to the vendor, the data is validated against official government sources and more than 22 million social media profiles.
- Format: CSV, SQL, and JSON.
- Access: The full database must be purchased, but the top 100 names per country can be previewed for free. It may be used for commercial and research purposes, but redistribution is restricted[1].
2. philipperemy/name-dataset (GitHub)
- Coverage: 491,655,925 records from 106 countries.
- Details: Each record includes a first name, last name, gender, and country code. The data is derived from the leaked Facebook dataset of 533 million accounts.
- Format: CSV (one file per country), about 10 GB uncompressed in total.
- Access: Free to download and use, but review the repository's license and the ethical and privacy implications of working with leaked personal data[2][3].
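Because the full dump is about 10 GB, it is usually easier to stream each per-country file row by row than to load it into memory at once. The sketch below assumes each CSV row holds the four fields listed above in the order first name, last name, gender, country code, with no header row; that column order is an assumption, so check it against the actual files. `NameRecord`, `stream_records`, and `top_first_names` are hypothetical helpers, not part of the repository.

```python
import csv
import io
from collections import Counter
from typing import Iterable, Iterator, List, NamedTuple, TextIO, Tuple


class NameRecord(NamedTuple):
    first_name: str
    last_name: str
    gender: str
    country_code: str


def stream_records(handle: TextIO) -> Iterator[NameRecord]:
    """Yield one NameRecord per CSV row without reading the whole file.

    ASSUMPTION: rows are first_name,last_name,gender,country_code, no header.
    """
    for row in csv.reader(handle):
        if len(row) != 4:
            continue  # skip malformed rows instead of aborting a long scan
        yield NameRecord(*(field.strip() for field in row))


def top_first_names(records: Iterable[NameRecord], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most common first names, compared case-insensitively."""
    counts = Counter(r.first_name.casefold() for r in records if r.first_name)
    return counts.most_common(n)


if __name__ == "__main__":
    # With real data, pass open("<country>.csv", newline="", encoding="utf-8")
    # (hypothetical file name); an in-memory sample keeps this demo self-contained.
    sample = io.StringIO("John,Smith,M,US\nMary,Jones,F,US\njohn,Doe,M,US\n")
    print(top_first_names(stream_records(sample), n=2))  # prints [('john', 2), ('mary', 1)]
```

Because `stream_records` is a generator, memory use stays roughly constant regardless of file size. If you prefer pandas, `pandas.read_csv` with its `chunksize` parameter follows the same streaming idea.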
3. sigpwned/popular-names-by-country-dataset (GitHub)
- Coverage: Thousands of unique first names and surnames, organized by country.
- Details: One file per country, plus master files for both forenames and surnames.
- Format: CSV, TXT, and JSON.
- Access: Free and open-source, but smaller in scale than the datasets above[4][3].
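If you only need names from a subset of countries, you can build your own cross-country index from the per-country files instead of using the master files. A minimal sketch, assuming you have already read each country's file into a list of names (for the TXT files, one name per line is an assumption); `normalize` and `merge_by_country` are hypothetical helpers. Names are compared after Unicode NFC normalization and case folding, so "José" and "JOSÉ" collapse into one entry.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set


def normalize(name: str) -> str:
    """Canonical form for comparison: trimmed, NFC-normalized, case-folded."""
    return unicodedata.normalize("NFC", name.strip()).casefold()


def merge_by_country(per_country: Mapping[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each normalized name to the set of country codes whose lists contain it.

    ASSUMPTION: per_country maps a country code to that country's name list.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for country, names in per_country.items():
        for name in names:
            key = normalize(name)
            if key:  # ignore blank lines
                index[key].add(country)
    return dict(index)


if __name__ == "__main__":
    lists = {"ES": ["José", "María"], "PT": ["JOSÉ", "João"]}
    merged = merge_by_country(lists)
    print(sorted(merged[normalize("José")]))  # prints ['ES', 'PT']
```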
4. diabolical-ninja/AllTheNames (GitHub)
- Coverage: 14,607,135 total rows, 289,755 unique names, 623 unique origins.
- Details: Aggregates 15 sources; each row includes a first name, its origin, and gender where available.
- Format: CSV, which loads directly into data analysis tools such as pandas.
- Access: Free and open-source, but accuracy and metadata vary by source[5].
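Because gender is present only for some sources, a per-name summary should treat missing values as unknown rather than guess. The sketch below assumes a header row with columns named `name`, `origin`, and `gender` (check the repository's actual column names before relying on it); `summarize` and `majority` are hypothetical helpers. It collects the distinct origins for each name and reports a gender only when one label strictly outnumbers the others among the rows that carry one.

```python
import csv
import io
from collections import Counter, defaultdict
from typing import Dict, Optional, Set, TextIO


def majority(votes: Counter) -> Optional[str]:
    """Most frequent gender label, or None if there are no votes or a tie."""
    top = votes.most_common(2)
    if not top or (len(top) == 2 and top[0][1] == top[1][1]):
        return None
    return top[0][0]


def summarize(handle: TextIO) -> Dict[str, dict]:
    """Aggregate rows per name into sorted origins and a gender estimate.

    ASSUMPTION: CSV header with columns name, origin, gender; gender may be blank.
    """
    seen: Set[str] = set()
    origins: Dict[str, Set[str]] = defaultdict(set)
    genders: Dict[str, Counter] = defaultdict(Counter)
    for row in csv.DictReader(handle):
        name = (row.get("name") or "").strip()
        if not name:
            continue
        seen.add(name)
        origin = (row.get("origin") or "").strip()
        if origin:
            origins[name].add(origin)
        gender = (row.get("gender") or "").strip().upper()
        if gender:  # rows without a gender are kept but cast no vote
            genders[name][gender] += 1
    return {
        name: {"origins": sorted(origins[name]), "gender": majority(genders[name])}
        for name in seen
    }


if __name__ == "__main__":
    sample = io.StringIO("name,origin,gender\nAndrea,Italian,M\nAndrea,Spanish,F\n")
    print(summarize(sample)["Andrea"])  # M and F tie, so gender is None
```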
5. Other Public Datasets
- Heise c’t magazine: Nearly 50,000 international names with gender and country popularity, available under a GNU license[3].
- US Social Security Administration: US baby name counts by year of birth and sex, downloadable in CSV format, with national data going back to 1880[3].
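The SSA national files are commonly distributed as one comma-separated file per birth year (named like `yob1880.txt`) whose rows are name, sex, count with no header; treat that layout as an assumption and check the readme shipped with the download. A minimal sketch that sums counts across several years and ranks names for one sex; `read_year` and `top_names` are hypothetical helpers.

```python
import csv
import io
from collections import Counter
from typing import Iterable, List, TextIO, Tuple


def read_year(handle: TextIO) -> Counter:
    """Parse one year's file into a Counter keyed by (name, sex).

    ASSUMPTION: no header; each row is name,sex,count.
    """
    counts: Counter = Counter()
    for row in csv.reader(handle):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        name, sex, count = row
        counts[(name, sex)] += int(count)
    return counts


def top_names(years: Iterable[Counter], sex: str, n: int = 5) -> List[Tuple[str, int]]:
    """Sum counts over all given years and return the n most common names for one sex."""
    total: Counter = Counter()
    for year in years:
        total.update(year)  # Counter.update adds counts rather than replacing them
    by_name = Counter({name: c for (name, s), c in total.items() if s == sex})
    return by_name.most_common(n)


if __name__ == "__main__":
    y1 = read_year(io.StringIO("Mary,F,100\nAnna,F,40\nJohn,M,90\n"))
    y2 = read_year(io.StringIO("Anna,F,70\nMary,F,5\nJohn,M,10\n"))
    print(top_names([y1, y2], "F", n=2))  # prints [('Anna', 110), ('Mary', 105)]
```

With real data, open each yearly file with `open(path, newline="")` and pass the handle to `read_year`.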
Summary Table
| Dataset/Source | First Names | Surnames | Countries | Format(s) | Access |
|---|---|---|---|---|---|
| Name Census | 1.5M | 3.25M | 139 | CSV, SQL, JSON | Paid; free preview[1] |
| philipperemy/name-dataset | 491M records (first and last name per record) | (included in records) | 106 | CSV (one file per country) | Free[2][3] |
| sigpwned/popular-names-by-country-dataset | Thousands | Thousands | Many | CSV, TXT, JSON | Free[4][3] |
| diabolical-ninja/AllTheNames | 289K unique | N/A | 623 origins | CSV | Free[5] |
Conclusion:
- The largest freely available dataset is likely the philipperemy/name-dataset, with almost half a billion records and full CSV access[2][3].
- For commercial-grade, highly validated data with global coverage, Name Census offers the most comprehensive option, but requires payment[1].
- For smaller open datasets, or data focused on a specific region such as the US, the other GitHub repositories and public sources listed above are also valuable.
If you need the largest free, ready-to-use CSV download, philipperemy/name-dataset is probably the best option[2][3]. For officially validated, structured data with commercial support, consider purchasing from Name Census[1].
[1] https://census.name
[2] https://github.com/philipperemy/name-dataset/blob/master/README.md
[3] https://opendata.stackexchange.com/questions/4756/searching-for-lists-of-babynames-containing-huge-10k-amounts-of-unique-name
[4] https://github.com/sigpwned/popular-names-by-country-dataset
[5] https://github.com/diabolical-ninja/AllTheNames
[6] https://www.kaggle.com/datasets/namecensus/first-name-database
[7] https://www.back4app.com/database/back4app/list-of-names-dataset
[8] https://www.tableau.com/pt-br/blog/55-million-rows-baby-name-data
[9] https://blog.gdeltproject.org/gkg-person-name-histogram-2015-2019/
[10] https://pypi.org/project/human-names/
[11] https://opendata.stackexchange.com/questions/46/multinational-list-of-popular-first-names-and-surnames
[12] https://www.npmjs.com/package/humannames
[13] https://www.kaggle.com/datasets/datagov/usa-names
[14] https://github.com/FinNLP/humannames
[15] https://www.kaggle.com/datasets/techsolutionsgib/list-of-people-names-by-countries
[16] https://www.ssa.gov/oact/babynames/decades/century.html
[17] https://www.kaggle.com/datasets/asthamular/people-10-m
[18] https://stackoverflow.com/questions/1452003/plain-computer-parseable-lists-of-common-first-names
[19] https://amore-upf.github.io/manynames/
[20] https://www.cs.williams.edu/~jeannie/cs134-f22/labs/06-names/index.html