Event Data: Addition of missing events and API performance update

This is a brief update on our recent work to improve the volume and delivery of data through our Event Data API.

There are known issues with the reliability of Event Data, including large numbers of events missing since March 2020 and significant downtime for the API. Due to limited resourcing, we haven’t been able to address them nearly as quickly as we’d have liked. This has been frustrating given the amount of interest in Event Data. However, we have now started the process of backfilling data and restoring the API.

The majority of missing events for 2020 have now been incorporated into data dumps, so that high-volume users can quickly update any local data cache they keep. Progress can be tracked on GitLab here and here. Anyone who would like access to the data dumps can contact us by email at eventdata@crossref.org.
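For anyone keeping a local cache, here is a minimal sketch of how a dump could be merged in without creating duplicates. It assumes the dump is read as one JSON object per line and that each event carries a unique `id` field. The function name `merge_dump_into_cache` and the sample events are made up for illustration, and the actual dump layout may differ.

```python
import json


def merge_dump_into_cache(cache, dump_lines):
    """Add events from a dump to a local cache keyed by event ID.

    Assumes one JSON object per line with an "id" field (an assumption
    about the dump layout, not a documented format).
    Returns the number of events that were new to the cache.
    """
    added = 0
    for line in dump_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        event_id = event["id"]
        if event_id not in cache:
            cache[event_id] = event
            added += 1
    return added


# Made-up sample data: one event already cached, one new, one blank line.
cache = {"a1": {"id": "a1", "source_id": "twitter"}}
dump = [
    '{"id": "a1", "source_id": "twitter"}',
    '{"id": "b2", "source_id": "wikipedia"}',
    "",
]
added = merge_dump_into_cache(cache, dump)
print(added, sorted(cache))
```

Keying on the event ID means the same dump can be applied more than once without changing the result.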

We are now rebuilding the API index using an improved Elasticsearch service. This will be more reliable than what we have been using to date and is a first step in shoring up the API infrastructure. Right now we are reindexing the data in the new architecture, and we plan to switch over on Monday, 26 October 2020. At that point we expect the new index to contain all data for 2020, and we will be actively reindexing data from previous years over the following weeks.

We will give more details about our future plans for Event Data in a forthcoming blog post, and we will continue to highlight issues and fixes through GitLab and here on the Community Forum.

An update on progress with the API: we have loaded nearly 110 million events into the new API platform, but there are still some gaps in the 2020 data. We will fill those in before switching over from the current API, which holds 538 million events.
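To put those two figures in proportion, here is a quick back-of-the-envelope calculation. It uses the rounded numbers from this post, and since "nearly 110 million" is treated as exactly 110 million, the result is a slight overestimate. The variable names are only for illustration.

```python
# Rounded figures from this post; "nearly 110 million" is taken as an upper bound.
loaded_new = 110_000_000      # events loaded into the new API platform
total_current = 538_000_000   # events in the current API

share = loaded_new / total_current
print(f"{share:.1%}")  # prints 20.4%
```

This is a rough size comparison only. The new platform is being filled with 2020 data first, so the share does not measure how complete the 2020 data is.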