We have some exciting news for fans of big batches of metadata: this year’s public data file is now available. As in years past, we’ve wrapped up all of our metadata records into a single download for anyone who wants to get started working with the full set of Crossref metadata records.
This is a companion discussion topic for the original entry at https://www.crossref.org/blog/2023-public-data-file-now-available-with-new-and-improved-retrieval-options/
This is my first time noticing that the data is available via a torrent – which is awesome! I took a look at the current peers:
- Eisenstadt, Austria
- Minneapolis, USA
- Kassel, Germany
- Amsterdam, Netherlands
- North Holland, Netherlands
- Broadside, England
I wonder if in the future it might make sense to seed from places other than North America and Western Europe? Maybe a few of those Western European spots could be swapped out for spots in Africa, Australia or Asia?
Thanks for the suggestion! In previous years, we relied on organic seeding from those who were downloading the torrent. This is the first year that we’re proactively seeding the file with a hosted solution. We’ll take your feedback into account and will continue considering ways to improve sustainable access to our public data files.
Is there any documentation on the schema of the contents of the public data files? I was unable to find anything that documents the format, and that’s crucial.
Good question. We don’t have a published schema for the public data file, although we do get a number of questions about metadata retrieval output schemas and are considering how we can make these available.
The public data file does use the OAI-PMH protocol, and the output in that file is based on our comprehensive XML input schema, which you can review here: Schema documentation for crossref5.3.1.xsd (most recent version).
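For anyone poking at the records before a published output schema exists, here is a minimal Python sketch of pulling identifiers and DOIs out of OAI-PMH-wrapped XML. It assumes records arrive in a standard OAI-PMH `ListRecords` envelope (namespace `http://www.openarchives.org/OAI/2.0/`). The sample record, the `10.5555` DOI, and the helper names `extract_dois` and `local_name` are hypothetical illustrations, not taken from the public data file itself. DOI elements are matched by local name so the code does not hard-code any particular Crossref schema namespace.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical sample record for illustration only; real records follow
# Crossref's schema and contain many more elements.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>info:doi/10.5555/example-1</identifier>
      </header>
      <metadata>
        <journal_article xmlns="http://www.crossref.org/schema/5.3.1">
          <doi_data>
            <doi>10.5555/example-1</doi>
          </doi_data>
        </journal_article>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix so matching works across schema versions."""
    return tag.rsplit("}", 1)[-1]


def extract_dois(xml_text: str) -> list[tuple[str, list[str]]]:
    """Return (OAI identifier, DOIs found in the metadata) for each record."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS).strip()
        metadata = record.find("oai:metadata", NS)
        dois = []
        if metadata is not None:
            # Walk every descendant and keep any element named "doi",
            # whatever Crossref schema namespace it lives in.
            dois = [el.text.strip() for el in metadata.iter()
                    if local_name(el.tag) == "doi" and el.text]
        results.append((identifier, dois))
    return results


if __name__ == "__main__":
    for identifier, dois in extract_dois(SAMPLE):
        print(identifier, dois)
```

Matching on local names trades strictness for resilience: it will keep working if the schema version in the namespace changes, but a stricter consumer could validate against crossref5.3.1.xsd instead.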