OAI-PMH - preprints and incremental updates

Hi!

We’re looking to improve our system’s ability to fetch up-to-date metadata about journal articles and preprints, including updates when changes occur over time due to new forward links and/or cross-links between preprints and journal articles being processed.

We currently use the REST API, but the OAI-PMH service is intriguing, since it seems potentially more fit for our use case of incremental metadata harvesting.

However, it’s not exactly clear what would trigger the inclusion of a record in the OAI-PMH list of updated records for a given date. I’d love to know if there is an easily-described set of rules that determine this.

Our primary motivating examples are:

  • Journal article X has a new inbound citation, so its citeby-count goes up by 1. Does article X show up in the OAI-PMH incremental update for today?
  • Preprint X has now been published in a journal, so a new related-item link exists. Does preprint X show up in the OAI-PMH incremental update for today?

If there is clear guidance to stick to the REST API for high-fidelity incremental updates like this, that will be helpful to know. (I did see the note in the API Swagger docs about using the from-index-date field to capture incremental metadata updates.)


Thanks for the interesting question. The short answer is that the OAI-PMH endpoint doesn’t give the information you’re looking for. It only includes two date fields: “created”, which is when the record was first deposited, and “last-update”, which is the last time the record was redeposited by the member. The latter doesn’t change when we add a citation or a cross-linked article to a preprint.

Your best solution (which it sounds like you are doing already) is to use the “indexed” field in the REST API. This updates whenever any value in the metadata changes. It covers a broader set of cases than the ones you are looking at, but you can compare old and new records to see what’s been updated.
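In case it helps, here is a rough sketch of that approach in Python, not an official client. It assumes the `/works` route with the `from-index-date` filter mentioned above plus cursor paging (`cursor=*`, then `next-cursor` from each response). The helper names, the `rows` value, and the sample records are made up for illustration, and the field names `is-referenced-by-count` and `relation` are my assumption for the two cases in the question.

```python
import json
import urllib.parse
import urllib.request

API = "https://api.crossref.org/works"


def build_url(from_index_date, cursor="*", rows=200):
    """Build a /works query for records indexed on or after a date (YYYY-MM-DD)."""
    params = {
        "filter": f"from-index-date:{from_index_date}",
        "rows": rows,        # page size; 200 is an arbitrary choice
        "cursor": cursor,    # "*" starts a new cursor session
    }
    return API + "?" + urllib.parse.urlencode(params)


def harvest(from_index_date):
    """Yield every record re-indexed since the given date, page by page."""
    cursor = "*"
    while cursor:
        with urllib.request.urlopen(build_url(from_index_date, cursor)) as resp:
            message = json.load(resp)["message"]
        items = message.get("items", [])
        if not items:
            break
        yield from items
        cursor = message.get("next-cursor")


def changed_fields(old, new, ignore=("indexed",)):
    """Return the top-level keys whose values differ between two record versions.

    "indexed" is ignored by default because it changes on every re-index.
    """
    keys = set(old) | set(new)
    return sorted(k for k in keys if k not in ignore and old.get(k) != new.get(k))


# Made-up records for illustration: the stored copy and a freshly harvested one.
stored = {
    "DOI": "10.5555/12345678",
    "is-referenced-by-count": 3,
    "indexed": {"date-time": "2024-05-01T00:00:00Z"},
}
fresh = {
    "DOI": "10.5555/12345678",
    "is-referenced-by-count": 4,
    "indexed": {"date-time": "2024-05-02T00:00:00Z"},
}
print(changed_fields(stored, fresh))
```

You would call `harvest()` with the date of your last successful run, look up each returned DOI in your own store, and use `changed_fields()` to decide whether the change is one you care about.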
