Article metadata does not include citation of CrossRef DOI for a dataset

I’ve discovered a case where an article cites a dataset that has a CrossRef DOI, but the dataset DOI is not included in the article's CrossRef metadata. In other words, a data citation has been missed. For context, this came up as part of the Make Data Count Kaggle challenge (see “Make Data Count - Finding Data References” on Kaggle).

The example is “A conserved allosteric element controls specificity and activity of functionally divergent PP2C phosphatases from Bacillus subtilis” (https://doi.org/10.1016/j.jbc.2021.100518), which cites Protein Data Bank record 3F7A. That record has a DOI registered with CrossRef (wwPDB: pdb_00003f7a). The DOI is cited in the web version of the article, but not in the CrossRef metadata, where the corresponding reference entry is:

```json
{
  "year": "2009",
  "series-title": "Structure of Orthorhombic Crystal Form of Pseudomonas aeruginosa RssB",
  "author": "Levchenko",
  "key": "10.1016/j.jbc.2021.100518_bib25"
}
```

I wonder whether this is because the data record has type “component” and is therefore overlooked when the DOI links are made.

It would be interesting to know what happened here. Data citations are currently a topic of great interest, and this is a case where CrossRef seems to have all the information needed to link the paper to the data, yet the link is missing!
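
To check a work's deposited references for entries that lack a DOI, as in the entry quoted above, one option is to query the public Crossref REST API (`https://api.crossref.org/works/{doi}`) and scan the `reference` list in the returned `message`. This is a minimal sketch, assuming that reference shape, where an entry with a known DOI carries a `DOI` field; the function names `fetch_work` and `references_without_doi` are hypothetical. The network call is kept separate so the filtering can be tried on a local sample.

```python
import json
import urllib.request

API = "https://api.crossref.org/works/"


def fetch_work(doi: str) -> dict:
    """Fetch the work record ("message") for a DOI from the Crossref REST API."""
    with urllib.request.urlopen(API + doi, timeout=30) as resp:
        return json.load(resp)["message"]


def references_without_doi(work: dict) -> list:
    """Return the deposited reference entries that carry no DOI field."""
    return [ref for ref in work.get("reference", []) if "DOI" not in ref]


if __name__ == "__main__":
    # Local sample built from the reference entry quoted in the report.
    sample = {
        "reference": [
            {
                "year": "2009",
                "series-title": "Structure of Orthorhombic Crystal Form of Pseudomonas aeruginosa RssB",
                "author": "Levchenko",
                "key": "10.1016/j.jbc.2021.100518_bib25",
            }
        ]
    }
    for ref in references_without_doi(sample):
        print(ref["key"], ref.get("series-title", ""))
```

Against the live API, `references_without_doi(fetch_work("10.1016/j.jbc.2021.100518"))` would list the unlinked entries for this article, though the result depends on the record's current state and on network access.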


Thanks for reporting this. It looks like a case where our matching methods haven’t picked up the DOI for the reference. This can happen for a number of reasons: no matching is perfect, and there will always be some level of false positives and false negatives.

Being a component isn’t a reason for us not to match this item. I suspect that either the metadata is too sparse or there are several similarly titled works that couldn’t be separated. We will be revisiting and updating our reference matching processes in the next 18 months or so, which will hopefully mean that we can pick up more cases like this.
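
To illustrate how sparse or ambiguous reference metadata can lead to a missed match, here is a deliberately simplified matcher. It is a hypothetical toy, not Crossref's actual matching algorithm: it scores candidate titles by word overlap (Jaccard similarity) with the reference title and declines to choose when the best score is too low (sparse reference) or when the runner-up scores too close to it (several similarly titled works). The thresholds `min_score` and `min_margin` and the candidate identifiers are made-up values.

```python
import re
from typing import Dict, Optional, Set


def tokens(text: str) -> Set[str]:
    """Lower-case alphanumeric word tokens of a title."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_reference(ref_title: str, candidates: Dict[str, str],
                    min_score: float = 0.6, min_margin: float = 0.1) -> Optional[str]:
    """Return the identifier of the best-matching candidate, or None.

    Hypothetical toy: refuses to match if the best score is below
    min_score, or if the second-best candidate is within min_margin.
    """
    ref = tokens(ref_title)
    scored = sorted(
        ((jaccard(ref, tokens(title)), cid) for cid, title in candidates.items()),
        reverse=True,
    )
    if not scored or scored[0][0] < min_score:
        return None  # too little overlap to trust any candidate
    if len(scored) > 1 and scored[0][0] - scored[1][0] < min_margin:
        return None  # top candidates cannot be separated
    return scored[0][1]


if __name__ == "__main__":
    # Two hypothetical candidate records with near-identical titles.
    candidates = {
        "candidate-A": "Structure of Orthorhombic Crystal Form of Pseudomonas aeruginosa RssB",
        "candidate-B": "Structure of Monoclinic Crystal Form of Pseudomonas aeruginosa RssB",
    }
    print(match_reference(candidates["candidate-A"], candidates))
    print(match_reference("Crystal Form of Pseudomonas aeruginosa RssB", candidates))
```

With the full title the toy picks `candidate-A`, but with a shortened title both candidates score the same and it returns `None`, which is the kind of false negative described above.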

@mrittman I wonder what data Elsevier sent to CrossRef. Did they include the DOI that they show on their own website, or just the metadata? I don’t know whether CrossRef caches publisher-supplied metadata, but if it does, that would help decide whether the issue lies with the data supplied by Elsevier or with the matching methods used by CrossRef.

If the DOI isn’t in the REST API metadata, then we didn’t receive it. It looks like that’s what happened in this case.
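
In the REST API, each entry in a work's `reference` list that has a DOI also carries a `doi-asserted-by` field, with the value `publisher` when the publisher deposited the DOI and `crossref` when Crossref added it by matching. A minimal sketch, assuming that shape, that tallies a work's references by provenance (the function name `doi_provenance` and the sample DOIs are hypothetical placeholders):

```python
from collections import Counter


def doi_provenance(work: dict) -> Counter:
    """Tally reference entries by who asserted their DOI.

    Entries with a DOI are counted under their "doi-asserted-by" value
    ("unknown" if the field is absent); entries without a DOI count as
    "no DOI", i.e. neither supplied by the publisher nor matched.
    """
    counts = Counter()
    for ref in work.get("reference", []):
        if "DOI" in ref:
            counts[ref.get("doi-asserted-by", "unknown")] += 1
        else:
            counts["no DOI"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample shaped like the "message" of a /works/{doi} response;
    # the DOIs below are placeholders, not real records.
    work = {
        "reference": [
            {"key": "bib1", "DOI": "10.5555/placeholder-1", "doi-asserted-by": "publisher"},
            {"key": "bib2", "DOI": "10.5555/placeholder-2", "doi-asserted-by": "crossref"},
            {
                "key": "10.1016/j.jbc.2021.100518_bib25",
                "series-title": "Structure of Orthorhombic Crystal Form of Pseudomonas aeruginosa RssB",
                "author": "Levchenko",
            },
        ]
    }
    print(dict(doi_provenance(work)))
```

An entry counted as "no DOI", like the 3F7A reference here, is one where the publisher did not send a DOI and the matching did not find one.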