Ticket of the month - June 2024 - What happens to submitted references

Shayn · 6 July 2024 21:57

Let’s say you’ve capitulated to all our nudging and have just begun supplying references along with your DOIs’ metadata records. The submission logs that come back confirming your successful metadata deposits now have a bunch of extra ‘stuff’ in them.

Instead of just a <msg>Successfully added</msg> or <msg>Successfully updated</msg> message for each submitted DOI (or, hopefully rarely, an error message), now you see a separate diagnostic for each reference submitted in your publications’ reference lists.

These will return one of three status results:

status="error"
status="resolved_reference"
status="stored_query"

Let’s look at each in context. In the example submission log in our documentation, the very first reference submitted returned an error

<citation key="10.5555/example_bb0030" status="error">Either ISSN or Journal title or Proceedings title must be supplied.</citation>

What kind of reference would result in that error? Well, it would have to be a reference where each element is tagged individually (aka, “a structured citation”) because that’s the only situation which requires an ISSN or Journal/Proceedings title. For example:

<citation key="ref1">
<author>Dobbs</author>
<volume>13</volume>
<issue>2</issue>
<first_page>16</first_page>
<cYear>2023</cYear>
<article_title>Cat Herding: A Systematic Review</article_title>
</citation>

The error itself is pretty straightforward in this case. When your publication is citing a journal article or conference paper, the structured reference data has to include some way to identify the journal or conference proceedings that it’s a part of. So, adding the journal title or title abbreviation like this would take care of the problem.

<citation key="ref1">
<journal_title>Journal of Impossible Tasks</journal_title>
<author>Dobbs</author>
<volume>13</volume>
<issue>2</issue>
<first_page>16</first_page>
<cYear>2023</cYear>
<article_title>Cat Herding: A Systematic Review</article_title>
</citation>
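
If you want to catch this before depositing, here is a minimal sketch of a pre-deposit check, not Crossref's actual validation logic. It assumes that <issn> or <journal_title> satisfies the rule, and that a citation carrying a <doi> or <unstructured_citation> is exempt from it. The function name and the tag list are hypothetical; the post doesn't say which element carries a proceedings title, so add it to the tuple once you know.

```python
import xml.etree.ElementTree as ET

# Assumption: these tags satisfy the "ISSN or Journal title" rule. The element
# that holds a proceedings title is not named in the post, so it is left out.
CONTAINER_TAGS = ("issn", "journal_title")


def missing_container(citation_xml, container_tags=CONTAINER_TAGS):
    """Return True if a structured citation names no journal or proceedings."""
    citation = ET.fromstring(citation_xml)
    child_tags = {child.tag for child in citation}
    # Assumption: a DOI or an unstructured citation is not subject to the rule.
    if child_tags & {"doi", "unstructured_citation"}:
        return False
    return not any(tag in child_tags for tag in container_tags)


broken = """<citation key="ref1">
<author>Dobbs</author>
<volume>13</volume>
<issue>2</issue>
<first_page>16</first_page>
<cYear>2023</cYear>
<article_title>Cat Herding: A Systematic Review</article_title>
</citation>"""

# Same reference with the journal title added as the first child element.
fixed = broken.replace(
    "<author>",
    "<journal_title>Journal of Impossible Tasks</journal_title>\n<author>",
    1,
)

print(missing_container(broken))
print(missing_container(fixed))
```

The first call flags the reference from the example above; the second, with the journal title added, passes.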

The second reference diagnostic in that log returns the status stored_query like this

<citation key="10.5555/example_bb0005" status="stored_query"></citation>

While further down the list, you can see a resolved_reference status like this

<citation key="10.5555/example_bb0015" status="resolved_reference">10.1590/S0006-87051960000100077</citation>

Both of those were the result of references that were formatted in a completely valid way. We know this, because the status was not “error”. So, what’s the difference between them?

In simplest terms, resolved_reference means our reference matching system could successfully match the reference that was supplied in that metadata deposit to the metadata associated with a specific DOI. That is, your publication is citing something, and we’ve figured out what exactly it was citing.

In contrast, stored_query means that we couldn’t find a distinct match. We don’t know what exactly your publication was citing via that reference. When that happens, the reference is “stored” for later re-querying. Periodically, we’ll try to match it again, in case the cited publication has been registered in the meanwhile.

When a citation match has been found, the DOI of the cited item is displayed in the submission log diagnostic. In our example, that’s 10.1590/S0006-87051960000100077
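
To sort through a long log, you could group the diagnostics by status. This is only a sketch: it uses the three <citation> lines quoted in this post and wraps them in a made-up <log> root element so that they parse as one document. A real submission log has more structure around these lines, so the wrapper and the function name are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The three diagnostics quoted in this post, wrapped in a hypothetical <log>
# root element so the fragment parses as a single XML document.
LOG_FRAGMENT = """<log>
<citation key="10.5555/example_bb0030" status="error">Either ISSN or Journal title or Proceedings title must be supplied.</citation>
<citation key="10.5555/example_bb0005" status="stored_query"></citation>
<citation key="10.5555/example_bb0015" status="resolved_reference">10.1590/S0006-87051960000100077</citation>
</log>"""


def group_by_status(log_xml):
    """Map each status to a list of (citation key, element text) pairs."""
    groups = defaultdict(list)
    for citation in ET.fromstring(log_xml).iter("citation"):
        # Empty elements have text None; normalise that to an empty string.
        text = (citation.text or "").strip()
        groups[citation.get("status")].append((citation.get("key"), text))
    return dict(groups)


groups = group_by_status(LOG_FRAGMENT)
for status, entries in groups.items():
    print(status, entries)
```

For resolved_reference entries the text is the matched DOI, for error entries it is the error message, and for stored_query entries it is empty.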

The reference that produced this citation match may have looked like this

<citation key="10.5555/example_bb0015">
<doi>10.1590/S0006-87051960000100077</doi>
</citation>

Or like this

<citation key="10.5555/example_bb0015">
<journal_title>Bragantia</journal_title>
<author>Bacchi</author>
<volume>19</volume>
<first_page>XLI</first_page>
<cYear>1960</cYear>
</citation>

Or like this

<citation key="10.5555/example_bb0015">
<unstructured_citation>Bacchi, O. (1960). Estudos sôbre a conservação de sementes. V - alface. Bragantia, 19(unico), XLI–XLV.</unstructured_citation>
</citation>

Any of those, as well as many variations of the latter two, could produce a successful citation match to 10.1590/S0006-87051960000100077 based on the metadata supplied to Crossref by its publisher.

A stored_query result, where a citation match has not been found, typically means that the referenced publication has not been registered with Crossref. While the majority of DOIs for scholarly publications are registered with Crossref, not all scholarly publications have DOIs (this is especially true for content that was published prior to the advent of the DOI system) and not all DOIs are Crossref DOIs. If a reference is citing something that isn’t registered with Crossref, then we won’t be able to match your reference to an identifier.

In some cases, the lack of a citation match is due to an inaccuracy in the way the citation has been submitted or formatted.

One common example happens when an author cites a paper directly from a prepublication manuscript and therefore puts the first page number “1” in their reference. The publisher then passes this placeholder page number along in the reference they submit to Crossref. Once that cited paper is published as an article in a journal, it’s given some other page range that doesn’t begin with “1”. So, the page number in the reference doesn’t match the page number in the cited work’s metadata record, and no citation match can be made.

For example, if an item that you’re registering cites the article “Damage Tolerance Related to the Damage Area of Impacted Carbon/Epoxy Composite Laminates” in volume 57 issue 19 of Journal of Composite Materials, but you supply the reference like this:

<citation key="5555.1">
<unstructured_citation>Targino, T. G., et al. (2023). Damage tolerance related to the damage area of impacted carbon/epoxy composite laminates. Journal of Composite Materials, 57(19), 1-9</unstructured_citation>
</citation>

That won’t be effective in producing a citation match to its DOI 10.1177/00219983231181942 because the page range in the metadata for that DOI is 2985-2993, not 1-9. However, if the first page number, or page range, was entirely omitted from the reference, that would match successfully. Page numbers can help disambiguate one item from another, but they’re not required - an inaccurate page number hurts more than an accurate one helps.
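
For structured references, dropping a page number you know to be a placeholder could look something like the sketch below. The structured version of the reference here is hypothetical (the example above is unstructured), and the function name is made up. It removes <first_page> unconditionally, so only apply it to references where you already know the page number is unreliable.

```python
import xml.etree.ElementTree as ET


def without_first_page(citation_xml):
    """Return the citation XML with any <first_page> child removed."""
    citation = ET.fromstring(citation_xml)
    for page in citation.findall("first_page"):
        citation.remove(page)
    return ET.tostring(citation, encoding="unicode")


# Hypothetical structured form of the reference above, with the placeholder
# first page "1" carried over from the manuscript.
placeholder_ref = """<citation key="5555.1">
<journal_title>Journal of Composite Materials</journal_title>
<author>Targino</author>
<volume>57</volume>
<issue>19</issue>
<first_page>1</first_page>
<cYear>2023</cYear>
</citation>"""

cleaned = without_first_page(placeholder_ref)
print(cleaned)
```

The remaining elements are left untouched, so the journal title, author, volume, issue and year are still there for the matcher to work with.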

In other instances, a missing citation match may be due to an overall sparsity of information in the reference. This is especially a problem with structured references where each element has its own tags. Unstructured references, where a whole formatted citation is submitted as one block of text, tend to be a bit more flexible.

So, to take another example, if an item that you’re registering cites the article “Cosmological consequences of Brans–Dicke theory in 4D from 5D scalar-vacuum” in volume 139, issue 2 of The European Physical Journal Plus, but you submit a reference like this:

<citation key="5555.2">
<journal_title>Eur Phys J Plus</journal_title>
<author>Lambiase</author>
<cYear>2024</cYear>
</citation>

That’s unlikely to produce a successful match to that cited work’s DOI - 10.1140/epjp/s13360-024-04905-w - simply because there’s not enough data included. The publication year, journal abbreviation, and first author’s surname are accurate, but including the volume and issue numbers and/or the article title would be more effective.
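
A rough way to spot sparse structured references before depositing is to list which of those helpful elements are absent. This is a sketch only: the tag list comes from the suggestion above (volume, issue, article title), the function name is hypothetical, and it says nothing about how the matching system actually weighs each field.

```python
import xml.etree.ElementTree as ET

# Elements that the paragraph above suggests adding to a sparse reference.
HELPFUL_TAGS = ("volume", "issue", "article_title")


def absent_helpful_tags(citation_xml, helpful_tags=HELPFUL_TAGS):
    """Return the helpful tags that a structured citation does not contain."""
    citation = ET.fromstring(citation_xml)
    present = {child.tag for child in citation}
    return [tag for tag in helpful_tags if tag not in present]


sparse_ref = """<citation key="5555.2">
<journal_title>Eur Phys J Plus</journal_title>
<author>Lambiase</author>
<cYear>2024</cYear>
</citation>"""

print(absent_helpful_tags(sparse_ref))
```

For the reference above this lists all three tags, which is a hint that it's worth pulling more detail out of your source data before submitting.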

And, of course, the simplest and most foolproof method to submit a reference is always to just use the DOI, if it exists, e.g.

<citation key="5555.3">
<doi>10.5555/12345678</doi>
</citation>

As long as that DOI exists in Crossref’s system, that is a 100% guarantee that you’ll end up with a successful citation relationship between your publication and the item it cites.

mccurley · 8 July 2024 01:57

We decided to jump headlong into reporting bibliographic references, but it’s easy for us because we collect structured information (bibtex) from authors and we check references in our copy editing process to make sure that DOIs are included wherever possible. Unfortunately, when I tested our newest issue, I saw a bunch of errors of the form:

<citation key="ref37:AC:CasLagTuc18" status="error">Reference DOI 10.1007/978-3-030-03329-3_25 not found in Crossref doi: 10.1007/978-3-030-03329-3_25</citation>

This is very peculiar since https://api.crossref.org/works/10.1007/978-3-030-03329-3_25 returns metadata for that DOI (and it’s from Springer Nature). This is not an isolated example; we had 315 such reports for DOIs that are registered. The DOI should be the most valuable key in identifying bibliographic references, so it makes me wonder what is going wrong.

Shayn · 8 July 2024 02:12

Is this a test deposit that was submitted to our test system (test.crossref.org) endpoint? If so, that almost certainly explains the “not found in Crossref” response.

The test system is good for testing the process of submitting files, but that’s about where its utility ends. (I’m exaggerating a bit - it will also tell you if your xml is misformatted or invalid against the schema, but there are other ways to do that)

Responses that relate to the existence of individual DOIs or the details of titles (title ownership, journal title text, ISSNs, for example) aren’t always going to sync up with the real data in the production system.

So, for practical purposes, since you know that DOI does actually exist, you can ignore that error from the test system. It won’t happen in production.

mccurley · 8 July 2024 21:28

That explains a lot - I was in fact using test.crossref.org.

This mostly seems to work for standard references like journal articles or books or articles in conference proceedings. As you mentioned, there are lots of things that get many citations and lack a DOI or ISBN. For example, this paper has over 2200 citations and has had a stable URL for almost 20 years. Some publishers like the Internet Society and Usenix don’t use DOIs (this one has almost 6000 citations). It feels like a great oversight to omit a field for a stable URL. Should we be using elocation_id for those URLs? A lot of people regard URLs as unstable identifiers, but the web has changed a lot due to SEO and in some cases this is the best identifier we have. A DOI is always superior of course.

Shayn · 9 July 2024 20:06

The way references are handled in our schema was geared towards matching references to Crossref DOIs, when DOIs for those cited items exist. That’s why we haven’t allowed for URLs in the citation markup.

elocation_id is intended for article IDs or page locators. So, that’s not suitable for stable URLs.

The best option, if you have a reference for something that you already know doesn’t have a DOI, is actually just to use <unstructured_citation> rather than marking up the individual elements. You can put the URL in there the same way that you would have it at the end of a formatted reference.
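
As a sketch of what that could look like if you’re generating the XML programmatically: the reference text, key and URL below are entirely made up for illustration, and the helper name is hypothetical. Building the element with a library rather than by string concatenation means characters like & in a URL get escaped for you.

```python
import xml.etree.ElementTree as ET


def unstructured_citation(key, formatted_reference):
    """Build a <citation> holding one formatted reference as plain text."""
    citation = ET.Element("citation", {"key": key})
    ET.SubElement(citation, "unstructured_citation").text = formatted_reference
    # Special characters such as "&" are escaped during serialisation.
    return ET.tostring(citation, encoding="unicode")


# Hypothetical reference with the URL at the end, as in a formatted citation.
example_xml = unstructured_citation(
    "ref42",
    "Example, A. (2005). An example report. https://example.org/report?id=1&v=2",
)
print(example_xml)
```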

mccurley · 10 July 2024 00:36

It makes sense that you’re targeting references with DOIs (after all that is your business). Unfortunately we would have to do quite a bit of work to generate <unstructured_citation> from the BibTeX format, because <unstructured_citation> is misnamed - it’s really just a different structure with its own tag set. For example, many of our references contain mathematics in the title, and while <article-title> supports inline mathematics as <tex-math>, <unstructured_citation> only seems to support <mml:math>. The conversion is non-trivial to handle so we’ll probably just send the structured version and have them ignored.

Since some publishers use only a stable URL as their identifier, that’s what we will be sending in <elocation_id> since there is no other field for it. If Crossref chooses to drop it, then whatever, but if a client of the data wants to match references, then sending more data gives them a better chance.
