January 2022

Ceber

This public data file is indeed very useful. I would like to use the API to get the metadata records that are not included in the public data file while avoiding duplicates. According to this post's description, I guess I should pull those registered after January 7, 2021? And which date field should I use (e.g. indexed, created, deposited)?

January 2022

ppolischuk

For incremental updates we recommend using the from-index-date filter. The timestamp that from-index-date filters on is guaranteed to be updated every time there is a change to metadata requiring a reindex. This way you’ll pick up updated records in addition to new records.
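As a rough illustration, a request using this filter against the REST API `/works` endpoint could be built as below. This is only a sketch: the helper name `incremental_update_url` is hypothetical, the date is a placeholder taken from the question above, and the `rows` and `cursor` values are assumptions about how you might page through results.

```python
from urllib.parse import urlencode

# Crossref REST API endpoint for works metadata.
API_URL = "https://api.crossref.org/works"


def incremental_update_url(since, rows=1000, cursor="*"):
    """Build a /works query for records (re)indexed on or after `since`.

    `since` is a date string such as "2021-01-07" (placeholder value).
    `cursor="*"` asks for the first page of a cursor-based result set.
    """
    params = {
        "filter": f"from-index-date:{since}",
        "rows": rows,
        "cursor": cursor,
    }
    # urlencode percent-encodes the ":" and "*" characters.
    return f"{API_URL}?{urlencode(params)}"


url = incremental_update_url("2021-01-07")
print(url)
```

Each response should include a `next-cursor` value to pass as `cursor` in the following request. Because the filter also returns records that were updated and reindexed, matching on DOI and overwriting your local copy is one way to avoid duplicates.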

I’m glad to hear you find the public data file useful! We’re preparing the 2022 public data file for release soon.

January 2022 ▶ ppolischuk

Ceber

Thanks for the reply, it is all clear now.
One last question regarding the dates. If I look up a DOI with the search engine in XML format, for instance DOI 10.1039/d0se01062f, I can see a publication_date field (29 September 2020). However, in the public data file, the JSON for that DOI includes several date fields, such as indexed, created, published-online, issued, and deposited. Some of them, like issued or published-online, contain only the year. Which field in the public data file JSON corresponds to the publication date?

February 2022

AaronNGray

Hi, how do I import the Crossref data dump of *.json.gz files?
Do I import it into Elasticsearch, and if so, how?

March 2022 ▶ ppolischuk

Jens

Thank you for creating and offering this huge amount of valuable data as a downloadable set of files. Is there an update on the release date for the 2022 public data file?

March 2022

ppolischuk

Hi Jens,

We’re close. Expect an update in the next few days. We’re planning to perform a reindex to finish up a few fixes and enhancements, after which we’ll generate and publish the 2022 public data file.

May 2022

ppolischuk

The 2022 public data file is now live. Please see the blog post announcement: 2022 public data file of more than 134 million metadata records now available - Crossref

December 2022 ▶ ppolischuk

Garry

Hello, I need help reading the .json.gz files from the torrent using Python. Also, when I try to unzip a .gz file, it says it’s corrupted. Why is this?
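For reference, a minimal sketch of reading one such file with Python's standard library is shown below. It assumes each file holds a single JSON object with the records under an "items" key; that layout is an assumption and should be checked against the actual files. The function name `read_records` is hypothetical. A file that fails with one of the errors caught here may simply be incomplete, for example if the torrent has not finished downloading, though that is only one possible cause.

```python
import gzip
import json
import zlib


def read_records(path):
    """Return the list of metadata records stored in one .json.gz file."""
    try:
        # "rt" decompresses and decodes to text in one step.
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            data = json.load(fh)
    except (EOFError, gzip.BadGzipFile, zlib.error) as exc:
        # Raised when the file is truncated or is not valid gzip data.
        raise RuntimeError(f"{path} is truncated or not valid gzip: {exc}") from exc
    # Assumption: records sit under "items"; fall back to the raw value otherwise.
    if isinstance(data, dict) and "items" in data:
        return data["items"]
    return data
```

Usage would look like `records = read_records("0.json.gz")`, where the file name is a placeholder; each record is then a plain dict with keys such as `DOI`.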