New public data file: 120+ million metadata records - Crossref - Interfaces for Machines

January 2022

Ceber

This public data file is indeed very useful. I would like to use the API to get the metadata records that are not included in the public data file and avoid duplicates. I guess I should pull those that are registered after “January, 7, 2021”, according to this post description? And which date field should I use? (e.g. indexed, created, deposited)

January 2022

ppolischuk

For incremental updates we recommend using the from-index-date filter. The timestamp that from-index-date filters on is guaranteed to be updated every time there is a change to metadata requiring a reindex. This way you’ll pick up updated records in addition to new records.
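A minimal sketch of how such an incremental query might be built, assuming the Crossref REST API `/works` endpoint with its `from-index-date` filter and cursor-based paging (`cursor=*` for the first page). The helper name `build_incremental_url`, the start date, and the page size are illustrative choices, not something stated in this thread.

```python
from urllib.parse import urlencode

# Assumption: the public Crossref REST API works endpoint.
API_BASE = "https://api.crossref.org/works"


def build_incremental_url(from_index_date, rows=1000, cursor="*"):
    """Build a /works URL filtered by index date (hypothetical helper).

    from_index_date: ISO date string such as "2021-01-07".
    cursor: "*" for the first page; afterwards, the cursor value
            returned with the previous page of results.
    """
    params = {
        "filter": f"from-index-date:{from_index_date}",
        "rows": rows,
        "cursor": cursor,
    }
    # urlencode percent-encodes ":" and "*" in the values.
    return f"{API_BASE}?{urlencode(params)}"


# Illustrative start date taken from the question above.
print(build_incremental_url("2021-01-07"))
```

Because the filter works on the index timestamp, the results can include records that already exist in the public data file but have changed since, so a loader would probably want to overwrite by DOI rather than append.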

I’m glad to hear you find the public data file useful! We’re preparing the 2022 public data file for release soon.

January 2022 ▶ ppolischuk

Ceber

Thanks for the reply, it is all clear now.
One last question regarding the dates. If I look up a DOI in the search engine and view it in XML format, for instance DOI 10.1039/d0se01062f, I can see a publication_date field (29 September 2020). However, in the public data file, the JSON for that DOI includes several date fields, such as indexed, created, published-online, issued, and deposited. Some of them, like issued or published-online, contain only the year. Which field in the public data file JSON corresponds to the publication date?
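To compare the candidate date fields side by side, a small helper along these lines may be useful. It assumes each date field is a JSON object carrying a `date-parts` list of `[year, month, day]` with month and day optional. The function name and the sample record are hypothetical and do not reproduce the actual record for the DOI above.

```python
def format_date_parts(field):
    """Turn a date field into 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD'.

    Returns None when the field is missing or holds no usable date.
    """
    parts = (field or {}).get("date-parts") or [[]]
    first = parts[0]
    if not first or first[0] is None:
        return None
    # Year is zero-padded to 4 digits, month and day to 2.
    return "-".join(
        f"{p:04d}" if i == 0 else f"{p:02d}" for i, p in enumerate(first)
    )


# Hypothetical record with made-up values, for illustration only.
record = {
    "issued": {"date-parts": [[2020]]},
    "created": {"date-parts": [[2020, 9, 29]]},
}

for name in ("issued", "published-online", "published-print", "created"):
    print(name, format_date_parts(record.get(name)))
```

Printing every field this way makes it easy to see which ones carry a full date and which only a year for a given record.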

May 2022

ppolischuk

The 2022 public data file is now live. Please see the blog post announcement: 2022 public data file of more than 134 million metadata records now available - Crossref

December 2022 ▶ ppolischuk

Garry

Hello, I need help reading the .json.gz files from the torrent using Python. Also, when I try to unzip a .gz file, it says it is corrupted. Why is this?
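One way to read such a file in Python is sketched below, using only the standard library. It assumes each .json.gz file holds a single JSON document, either an object with an `items` list or a bare list of records. That layout is an assumption and should be checked against the README shipped with the data file. The function name `read_records` and the sample file are hypothetical.

```python
import gzip
import json
import os
import tempfile


def read_records(path):
    """Load the metadata records from one gzip-compressed JSON file."""
    # "rt" decompresses and decodes to text in one step.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        data = json.load(fh)
    # Assumption: records sit under "items"; fall back to a bare list.
    if isinstance(data, dict):
        return data.get("items", [])
    return data


# Build a tiny sample file so the sketch runs without the real torrent.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.json.gz")
with gzip.open(sample_path, "wt", encoding="utf-8") as fh:
    json.dump({"items": [{"DOI": "10.1039/d0se01062f"}]}, fh)

records = read_records(sample_path)
print(len(records), records[0]["DOI"])
```

A file that is truncated, for example because the torrent client has not finished downloading it, would fail to decompress in the same call, so it may be worth confirming that the download is complete before suspecting the archive itself.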