In December, thanks to the concerns raised in the preprint “Incorrect Citation Association for Articles in Online-Only Springer Nature Journals”, we identified approximately 150k incorrect citation links in the Crossref database that pointed to works published by Springer Nature. Roughly 25% of these issues originated from publisher-asserted metadata, and the remaining 75% from our automated reference matching process.
Two main factors contributed to the problem:
- Incorrect reference metadata: the affected references included a first page value of “1”, even though the cited works do not use page numbers at all and instead use article numbers.
- Our legacy reference matching strategy was not sufficiently fault-tolerant to detect such inconsistencies in the deposited metadata, which allowed incorrect citation links to be created.
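The two factors interact in a specific way. A page value that contradicts the cited work should stop being treated as evidence. The sketch below is a minimal illustration of that idea and assumes several things. The record types `Work` and `Reference`, their fields, and the exact-title fallback are all hypothetical simplifications. This is not Crossref's actual schema or matching code. Under this rule, a first page of “1” cited against an article-numbered work is flagged as suspect. The matcher then needs other evidence before it links, and it declines to link unless exactly one candidate fits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Work:
    """Hypothetical, simplified bibliographic record (not Crossref's schema)."""
    doi: str
    journal: str
    volume: str
    title: str
    first_page: Optional[str] = None      # None for works without page numbers
    article_number: Optional[str] = None  # set for article-numbered works


@dataclass
class Reference:
    """Hypothetical, simplified deposited reference."""
    journal: str
    volume: str
    title: Optional[str] = None
    first_page: Optional[str] = None


def page_is_suspect(ref: Reference, cand: Work) -> bool:
    # A cited first page of "1" is inconsistent with a work that has no
    # pages and uses an article number, so it cannot count as evidence.
    return (ref.first_page == "1"
            and cand.first_page is None
            and cand.article_number is not None)


def match(ref: Reference, candidates: list[Work]) -> Optional[str]:
    """Return the DOI of the single plausible candidate, or None."""
    plausible = []
    for cand in candidates:
        if (cand.journal, cand.volume) != (ref.journal, ref.volume):
            continue
        if ref.first_page is not None and not page_is_suspect(ref, cand):
            # Trustworthy page value: it must agree with the candidate.
            if ref.first_page != cand.first_page:
                continue
        elif ref.title is None or ref.title.casefold() != cand.title.casefold():
            # No usable page evidence: fall back to an exact title check.
            continue
        plausible.append(cand)
    # Refuse to guess when the evidence does not single out one work.
    return plausible[0].doi if len(plausible) == 1 else None


works = [
    Work("10.1000/a", "J Online", "12", "Alpha study", article_number="101"),
    Work("10.1000/b", "J Online", "12", "Beta study", article_number="102"),
    Work("10.1000/c", "J Print", "5", "Gamma study", first_page="1"),
]

# Page "1" alone against article-numbered works: no link is created.
print(match(Reference("J Online", "12", first_page="1"), works))
# Same suspect page, but the title singles out one work.
print(match(Reference("J Online", "12", title="Beta study", first_page="1"), works))
# A genuine page "1" in a paginated journal still matches.
print(match(Reference("J Print", "5", first_page="1"), works))
```

The sketch does not reject the value “1” outright, because it is a legitimate first page in paginated journals. It discounts that value only when it contradicts the candidate work. When the remaining evidence is ambiguous, it prefers a missing link to a wrong one.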
We updated our matching strategy on 29 January to prevent this category of incorrect matches. After that date, newly deposited (or re-deposited) references are no longer affected. Correcting historical data remains future work.
150k links is a big number, and we acted on this information immediately. It’s important to note, though, that the issue affected only a small slice of the overall citation network of nearly 2 billion citation relationships: less than 0.01% of all links. The episode also highlighted how vital open metadata is, and how important a role the community plays in helping us identify and correct such issues when they occur.