Date fields in CrossRef

Hello CrossRef community,

I have a question about date fields - the motivation for this is a continuation of a question I asked previously about unexpectedly low volumes of results from my search query. My end users are still unhappy - we now get more results but apparently only a low proportion are truly relevant.

Our use case is to do a trawl of new academic literature being released each week on poverty in developing countries. Part of the solution last time was switching from publication dates to creation dates, and I’m wondering if altering the date field again could help.

I’ve reviewed the documentation on date fields and I’d like to understand a little more about some of the main categories:

  • pub-date - this is the publication date according to the publisher - we seem to have issues with these being a long way in the past (or sometimes the future!) even when a record was “created” in the last week. What could cause that? What’s the process & any validation checks for publishers depositing this data? Also, can this change e.g. could it be set for a preprint and then updated at final publication etc.? If a publication gets published in print and online, which value would this take?
  • created-date - this is the date of first deposit - what drives that? Is it normal for publishers to start registering very old work? Conversely, do they create deposits of anticipated releases well in advance of publication?
  • update-date - what events cause this to be updated? It’s a deposit or redeposit, but what real world events lead to this?
  • deposit-date - just checking this is identical to update-date?
  • index-date - how does this differ from update-date and deposit-date?

The solution I’m considering is switching from created date to update date. It seems to me this should catch publications at all significant steps in their publication journey, and the only downside is introducing duplicates into our dataset. Any comments on that strategy would also be welcome.

Please let me know if more context or clarification would be useful.

Hi Tom,

Thanks for your questions.

Yes, that’s correct. We allow both ‘print’ and ‘online’ publication dates; however, only one or other is strictly required. Publication dates can be supplied as year-only, year and month, or the full date with year, month, and day. When you filter or sort by publication date, and there are multiple publication dates in a record (both ‘print’ and ‘online’) the filter will use whichever is earliest.
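
To make the "whichever is earliest" rule concrete, here is a minimal Python sketch that picks the earlier of the print and online dates from a works record. It assumes the record is the JSON returned by the REST API, with `published-print` and `published-online` fields each holding a `date-parts` list. Padding missing months and days with 1 is an assumption made for the comparison only, not a statement of how Crossref itself orders partial dates, and `earliest_publication_date` is a hypothetical helper name.

```python
from typing import Optional, Tuple


def pad_date_parts(parts: list) -> Tuple[int, int, int]:
    # Assumption: treat a missing month or day as 1 so partial dates can be compared.
    padded = list(parts) + [1] * (3 - len(parts))
    return (padded[0], padded[1], padded[2])


def earliest_publication_date(record: dict) -> Optional[Tuple[int, int, int]]:
    """Return the earlier of the print and online dates, or None if neither is present."""
    candidates = []
    for field in ("published-print", "published-online"):
        parts = record.get(field, {}).get("date-parts", [[]])[0]
        if parts and parts[0] is not None:
            candidates.append(pad_date_parts(parts))
    return min(candidates) if candidates else None


# Digitized archival content: old print date, recent online date.
example = {
    "published-print": {"date-parts": [[1998, 5]]},
    "published-online": {"date-parts": [[2024, 3, 12]]},
}
print(earliest_publication_date(example))
```

A record like this one would match a publication-date filter on its 1998 print date, which is one way a newly created record can appear to be far in the past.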

Typically, current content is registered with Crossref shortly after it’s published. But we also do encourage the registration of older content, which was published prior to the advent of DOIs or which simply hasn’t been registered before, for whatever reason. This also includes archival content which has recently been digitized and therefore may have a very recent online publication date but a print publication date in the more distant past.

There is a validation check that the date exists. So, a publisher couldn’t supply a publication date of November 31st or February 30th. But, otherwise, like all DOI metadata, it’s up to the publisher to ensure that the dates are accurate.
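
As an illustration of the kind of check described here (not Crossref's actual implementation), the Python standard library rejects impossible calendar dates in the same way. `is_real_calendar_date` is a hypothetical helper name, and defaulting a missing month or day to 1 is an assumption so that partial dates can be tested too.

```python
from datetime import date


def is_real_calendar_date(year: int, month: int = 1, day: int = 1) -> bool:
    """Return True if the date exists on the calendar; says nothing about accuracy."""
    try:
        date(year, month, day)  # raises ValueError for dates such as November 31st
    except ValueError:
        return False
    return True


print(is_real_calendar_date(2024, 11, 31))  # November has 30 days
print(is_real_calendar_date(2024, 2, 30))   # February never has 30 days
print(is_real_calendar_date(2031, 6, 15))   # exists, even though it is in the future
```

Note that a check of this kind accepts any date that exists, including dates decades in the past or in the future, which is why implausible values can still appear in the metadata.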

Crossref members can update their DOI metadata as necessary, to correct errors or enrich a record with more complete data. But, publication dates should not change in the way you described. A preprint and a published article are two distinct works, and therefore should have two distinct DOIs. Content that’s published online ahead-of-print could have an online publication date that’s earlier than the print publication date, but it shouldn’t have a single date that changes.

We have no way to structurally prevent members from making changes to publication dates, but it would be considered bad practice. And, as far as I’m aware, it doesn’t happen often.

It’s not uncommon for publishers to register items that were published a long time ago. That’s something we support and encourage.

Registering content before it’s published is, on the other hand, strongly discouraged. It does happen sometimes, though usually not too far in advance. We have implemented a special record type - pending publication - intended to avoid that problem.

Updates can come from the member redepositing updated metadata with changes or additions. Updates also can result from Crossref updating citation counts or asserting other kinds of relationships based on relationships declared in the metadata of other works.

The deposit date won’t always be identical to the update date because some updates come from internal Crossref processes, while others come from member metadata deposits. The deposit date should reflect the date of the most recent metadata deposit by the member for that item.

The API record is re-indexed after every update, but not always on the same date. There can be a lag time between the update being made by the member and the reindexing in the API, usually no more than 24 hours. Crossref will also periodically do a full reindex, or a reindex of all works for a given member when certain administrative metadata (for example the member’s organization name) changes.

If you have a robust process for deduplicating, that’s a good idea, because it will ensure you have the most up-to-date metadata for the items you’re interested in.
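
For the deduplication step, one possible approach is to key on the DOI and keep only the most recently deposited copy of each record. This is a minimal Python sketch, assuming each record is works JSON from the REST API with a `DOI` field and a `deposited` object carrying a millisecond `timestamp`. `dedupe_by_doi` is a hypothetical helper name, and the DOIs in the example are made up.

```python
from typing import Dict, List


def deposited_timestamp(record: dict) -> int:
    # Assumption: records without a deposited timestamp sort as oldest.
    return record.get("deposited", {}).get("timestamp", 0)


def dedupe_by_doi(records: List[dict]) -> List[dict]:
    """Keep one record per DOI, preferring the most recently deposited copy."""
    latest: Dict[str, dict] = {}
    for record in records:
        doi = record["DOI"].lower()  # DOIs are case-insensitive
        kept = latest.get(doi)
        if kept is None or deposited_timestamp(record) >= deposited_timestamp(kept):
            latest[doi] = record
    return list(latest.values())


# The same (made-up) DOI harvested in two different weeks.
harvested = [
    {"DOI": "10.1234/EXAMPLE.1", "title": ["Old title"], "deposited": {"timestamp": 1700000000000}},
    {"DOI": "10.1234/example.2", "title": ["Other work"], "deposited": {"timestamp": 1700000000000}},
    {"DOI": "10.1234/example.1", "title": ["Corrected title"], "deposited": {"timestamp": 1710000000000}},
]
unique = dedupe_by_doi(harvested)
print(len(unique))
```

Keying on the lower-cased DOI means a work that reappears after a metadata update replaces its earlier copy instead of being counted twice.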

Please let me know if there’s anything else we can help with.

-Shayn


Hello @Shayn,

Thanks very much for your prompt and detailed response - this is incredibly useful.

I think the moral from everything you’ve said is that there’s no compelling reason to change the choice of date field, and the way we have it should be capturing the relevant works. If anything we could switch to deposit-date but since none of these things should really be changing very much, it won’t make much difference. (If anything I think we could filter on both publication and created dates, but obviously that will reduce volumes even more…)
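
For anyone wanting to try the combined filter, here is a rough Python sketch of how the weekly query URL could be built. It assumes the `/works` endpoint of the REST API, its `from-…-date` / `until-…-date` filter names, and that different filters joined by commas are applied together. The helper names, the seven-day window, and the example search term are my own choices for illustration.

```python
from datetime import date, timedelta
from typing import Iterable
from urllib.parse import urlencode

API_URL = "https://api.crossref.org/works"


def weekly_filter(field: str, week_end: date) -> str:
    """Build a from/until filter pair covering the seven days ending on week_end."""
    start = week_end - timedelta(days=6)
    return (
        f"from-{field}-date:{start.isoformat()},"
        f"until-{field}-date:{week_end.isoformat()}"
    )


def build_query_url(query: str, fields: Iterable[str], week_end: date) -> str:
    """Combine one weekly window per date field into a single works query URL."""
    filters = ",".join(weekly_filter(field, week_end) for field in fields)
    # Keep ':' and ',' unescaped so the filter string stays readable.
    return f"{API_URL}?{urlencode({'query': query, 'filter': filters}, safe=':,')}"


url = build_query_url("poverty", ["pub", "created"], date(2024, 3, 10))
print(url)
```

Swapping the field list for `["update"]` or `["deposit"]` gives the alternatives discussed above without changing anything else.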

With best wishes, and thanks again for your help,
Tom
