January 2022
This public data file is indeed very useful. I would like to use the API to get the metadata records that are not included in the public data file and avoid duplicates. I guess I should pull those that are registered after “January, 7, 2021”, according to this post description? And which date field should I use? (e.g. indexed, created, deposited)
January 2022
For incremental updates, we recommend using the from-index-date filter. The timestamp that from-index-date filters on is guaranteed to be updated every time there is a change to metadata requiring a reindex. This way you’ll pick up updated records in addition to new records.
I’m glad to hear you find the public data file useful! We’re preparing the 2022 public data file for release soon.
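As a rough illustration of the incremental-update approach described in this reply, the sketch below pages through the REST API /works endpoint with the from-index-date filter and cursor-based paging. The helper names (build_url, fetch_json, iter_updated_records) and the example date are hypothetical, chosen for illustration only; check the current API documentation for rate limits and etiquette before running this against the live service.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://api.crossref.org/works"


def build_url(from_index_date, cursor="*", rows=1000):
    """Build a /works query for records indexed on or after the given date."""
    params = {
        "filter": f"from-index-date:{from_index_date}",
        "rows": rows,
        "cursor": cursor,  # "*" asks the API to start a new cursor
    }
    return f"{API}?{urlencode(params)}"


def fetch_json(url):
    """Default fetcher: performs one HTTP GET and parses the JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def iter_updated_records(from_index_date, fetch=fetch_json, rows=1000):
    """Yield every record returned for the filter, following next-cursor."""
    cursor = "*"
    while True:
        message = fetch(build_url(from_index_date, cursor, rows))["message"]
        items = message.get("items", [])
        if not items:  # an empty page means the cursor is exhausted
            break
        yield from items
        cursor = message["next-cursor"]


# Example (makes network requests; the date is an assumed cutoff):
# for record in iter_updated_records("2021-01-07"):
#     print(record["DOI"])
```

Because the filter is keyed on the index timestamp, a record that was already in the public data file but has since been changed will be returned again, so a local store would presumably upsert by DOI rather than append.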
January 2022
▶ ppolischuk
Thanks for the reply, it is all clear now.
One last question regarding the dates. If I look for a DOI with the search engine in XML format, for instance, DOI 10.1039/d0se01062f, I can see a publication_date field (29 September 2020). However, in the public data file, the JSON for that DOI includes several date fields like indexed, created, published-online, issued, deposited. Some of them like issued or published-online have only the year part. What is the field in the public data file JSON that should correspond to the publication date?
February 2022
Hi, how do I import the Crossref data dump of *.json.gz files?
Do I import it into Elasticsearch, and if so, how, please?
March 2022
▶ ppolischuk
Thank you for creating and offering this huge amount of valuable data as a downloadable set of files. Is there an update on the release date for the 2022 public data file?
March 2022
Hi Jens,
We’re close. Expect an update in the next few days. We’re planning to perform a reindex to finish up a few fixes and enhancements, after which we’ll generate and publish the 2022 public data file.
May 2022
December 2022
▶ ppolischuk
Hello, I need help reading the .json.gz files from the torrent using Python. Also, when I try to unzip a .gz file, it says it’s corrupted. Why is this?
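A minimal sketch of one way to approach this with the Python standard library follows. It assumes each .json.gz file holds a single JSON document with the records under a top-level "items" key; that layout is an assumption, so inspect one file before relying on it. The function names are hypothetical. A "corrupted" message is often a sign of an incomplete download, and the first helper checks for that by decompressing the whole file.

```python
import gzip
import json
import zlib


def is_complete_gzip(path, chunk_size=1 << 20):
    """Return True if the file decompresses cleanly from start to end."""
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(chunk_size):  # read until an empty chunk is returned
                pass
        return True
    except (EOFError, OSError, zlib.error):
        # EOFError: file ends early; OSError/BadGzipFile: not valid gzip data
        return False


def read_records(path):
    """Load one .json.gz file and return its list of records."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        data = json.load(fh)
    # Assumed layout: {"items": [...]}; fall back to the raw document otherwise.
    if isinstance(data, dict) and "items" in data:
        return data["items"]
    return data


# Example usage (the file name is a placeholder):
# if is_complete_gzip("0.json.gz"):
#     for record in read_records("0.json.gz"):
#         print(record.get("DOI"))
```

If is_complete_gzip returns False for a file, rechecking or redownloading that file through the torrent client is a reasonable first step before suspecting the data itself.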