“I’d like to get all the articles from publisher X in March 2023”. That’s a fairly typical question you might start out with when you come to look at Crossref metadata. What you’ll quickly realise is that ‘in March 2023’ has a number of different interpretations. The concept of dates can be a tricky one, but with some care you will be able to get the results you want.
Here are the main dates we deal with (the REST API documentation has the full list of dates used in filters):
- Published / published-print / published-online: The publication date of an item. There can be several of these, because items are published in different formats at different times (e.g., in print or online). It is defined by the member who submits the metadata.
- Posted: When an item was available online. Some types don’t have peer review or a formal editorial process, so the concept of publication is different. This mainly applies to items of type posted content, such as preprints. It is also defined by the member who submits the metadata.
- Created: When did Crossref first receive metadata for the item? This is set by our deposit system.
- Updated: When was the metadata record last redeposited by the member that maintains it? This is also set by the deposit system.
- Indexed: When was the metadata last changed by anybody (including Crossref)? This can include changes to the citation count or relationships that were initially asserted by a different member (such as a preprint / article link). This is the only date determined by the REST API.
Which dates you want to use will depend on what you want to do with the metadata. For example, if it’s to count publications in an issue you will want to use the published date. On the other hand, if you want to keep a local cache of data updated you will be interested in the redeposit or reindex date.
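As a rough illustration, the opening question could be turned into a REST API query along the following lines. This is a minimal sketch, not a definitive recipe: the mapping from each date to a filter name reflects our reading of the REST API filter list and should be checked against it, the member ID `9999` is a made-up placeholder, and `date_filter` and `member_works_url` are hypothetical helper names. The code only builds the URL; it does not send a request.

```python
from urllib.parse import urlencode

# Assumed mapping from the dates described above to REST API filter stems.
# Each stem is used as "from-<stem>" and "until-<stem>".
FILTER_STEMS = {
    "published": "pub-date",
    "published-print": "print-pub-date",
    "published-online": "online-pub-date",
    "posted": "posted-date",
    "created": "created-date",
    "updated": "update-date",
    "indexed": "index-date",
}


def date_filter(kind, start, end):
    """Build a filter string selecting items whose `kind` date is in [start, end]."""
    stem = FILTER_STEMS[kind]
    return f"from-{stem}:{start},until-{stem}:{end}"


def member_works_url(member_id, kind, start, end, rows=100):
    """Build a works URL for one member (publisher), restricted by date."""
    query = urlencode(
        {"filter": date_filter(kind, start, end), "rows": rows},
        safe=":,",  # keep the filter readable instead of percent-encoding it
    )
    return f"https://api.crossref.org/members/{member_id}/works?{query}"


# "All the articles from publisher X in March 2023", by publication date:
print(member_works_url(9999, "published", "2023-03-01", "2023-03-31"))
# The same month, but by when Crossref first received the metadata:
print(member_works_url(9999, "created", "2023-03-01", "2023-03-31"))
```

The two URLs differ only in the filter name, but they can return quite different sets of items, which is the point of choosing the date deliberately.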
Some of the common cases we see are:
Statistics and analysis
This covers cases where someone is looking back at publication data to perform some analysis, with date as one of the variables. In this case, the published or posted dates are often the most useful because they fix when the work became publicly available.
Keeping ahead of the curve
Many users want to look for new content items with certain properties, such as from a specific journal or author. Here, the created date is often useful because it is always a date in the past (unlike publication date!) and you will only ever collect each item once.
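One way to picture the "collect each item once" property is to query one day of created dates at a time, with no overlap between days. The sketch below assumes the `from-created-date` and `until-created-date` filters and the per-journal works route of the REST API; the ISSN `0000-0000` is a placeholder, and `daily_created_filters` and `journal_works_url` are hypothetical helper names. It builds query URLs only and makes no requests.

```python
from datetime import date, timedelta


def daily_created_filters(first_day, last_day):
    """Yield one created-date filter per calendar day, with no overlap."""
    day = first_day
    while day <= last_day:
        stamp = day.isoformat()
        # Both bounds are the same day, so each item falls in exactly one window.
        yield f"from-created-date:{stamp},until-created-date:{stamp}"
        day += timedelta(days=1)


def journal_works_url(issn, date_filter):
    """Build a works URL for one journal, restricted by the given filter."""
    return f"https://api.crossref.org/journals/{issn}/works?filter={date_filter}"


# One query per day for the last few days of March 2023 (placeholder ISSN).
for created in daily_created_filters(date(2023, 3, 29), date(2023, 3, 31)):
    print(journal_works_url("0000-0000", created))
```

Because the created date does not change once it has been set, a day that has already been collected does not need to be queried again.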
Service providers and indexers
Those building services on top of Crossref metadata and providing services for their own users often want the latest metadata. Here, the updated or indexed date is a good choice. It means you can keep in sync on a regular basis.
Note that in the REST API, the created, updated, and indexed dates can include a time, so you can check with a resolution down to one minute without needing overlapping time ranges. For the created and updated dates there can be a lag between the metadata being deposited and it appearing in the REST API. It is safer to build in a delay of at least several hours, and up to 24 hours, to be sure you collect all of the metadata.
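A regular sync that respects this lag might be sketched as follows. The assumptions here are ours: the timestamp format `YYYY-MM-DDTHH:MM:SS` in the filter value, the 24-hour default lag, and the helper names `sync_window` and `update_filter` are illustrative, so check the REST API documentation for the exact format it accepts. Each run picks up where the previous one ended and stops short of the most recent, possibly incomplete, period.

```python
from datetime import datetime, timedelta, timezone


def sync_window(last_sync_end, now, lag=timedelta(hours=24)):
    """Return the next (start, end) window, or None if no complete window exists yet.

    The window ends `lag` before `now`, truncated to the whole minute, so that
    recently deposited metadata has had time to appear in the REST API.
    """
    end = (now - lag).replace(second=0, microsecond=0)
    if end <= last_sync_end:
        return None
    return last_sync_end, end


def update_filter(start, end):
    """Build an updated-date filter with times (assumed timestamp format)."""
    fmt = "%Y-%m-%dT%H:%M:%S"
    return (
        f"from-update-date:{start.strftime(fmt)},"
        f"until-update-date:{end.strftime(fmt)}"
    )


last_end = datetime(2023, 3, 1, 0, 0, tzinfo=timezone.utc)  # end of the previous run
window = sync_window(last_end, datetime.now(timezone.utc))
if window is not None:
    print(update_filter(*window))
```

Since the end of one window is reused as the start of the next, a record stamped exactly on the boundary could be returned twice; keying the local cache on DOI makes that harmless.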