This topic is related to topic #14533, which is a question specifically about the REST JSON API.
My question here is specifically about a different problem in the UNIXREF XML API, as used by Zotero, via the HTTP endpoint:
curl -LH "Accept: application/vnd.crossref.unixref+xml" https://doi.org/10.1007/s00253-024-13397-8
The UNIXREF XML API currently does not properly provide HTML-like content when two inline HTML-like elements are adjacent with zero whitespace between them.
As a concrete real example, I provide the yummy case of the MATa gene allele of ale beer yeast. What better than a delicious example! The DOI example above is for the libation of PMC11754353. Its abstract in particular writes about well-known cell lines with a gene allele that I write below in three formats.
In HTML (as rendered by Wikipedia):
<i>MAT<b>a</b></i>
In Wikitext:
''MAT'''a'''''
In markdown:
_MAT**a**_
Scientific contributors to Wikipedia can write Wikitext and have it properly rendered on Wikipedia, as seen on the page about the [Mating of yeast](https://en.wikipedia.org/wiki/Mating_of_yeast).
For the abstract of PMC11754353 (10.1007/s00253-024-13397-8), the Crossref UNIXREF XML API returns the following around the gene allele name:
... We generated stable
<italic>MAT</italic>
<bold>
<italic>a</italic>
</bold>
or
<italic>MATα </italic>
lines of four different Kveik yeasts, named Odin, Thor, Freya and Vör.
I bet Freya tastes the best. The JATS XML in PMC for this part of the abstract is as follows:
We generated stable <italic>MAT</italic><bold><italic>a</italic></bold> or <italic>MATα </italic>lines of four different Kveik yeasts, named Odin, Thor, Freya and Vör.
I bet Thor tastes really bitter. But what about Odin and Vör?
The problem here is the insertion of whitespace between the </italic> and the <bold>. Any downstream application, such as Zotero, will now render this incorrectly as two words, “MAT” and “a”, rather than one word, “MATa”, because Crossref has modified the XML mixed content of the abstract in an invalid way.
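To make the difference concrete, here is a minimal Python sketch that compares the two fragments quoted above. It rests on a few assumptions of mine: the wrapping `<p>` element is only there to make each fragment a well-formed document, the line breaks in the UNIXREF fragment are present in the actual response as shown, and a consumer collapses whitespace runs to a single space the way HTML rendering does. `rendered_text` is a hypothetical helper, not something Zotero or Crossref provides.

```python
import xml.etree.ElementTree as ET

# Fragment as it appears in the PMC JATS XML: no whitespace between
# </italic> and <bold>. The <p> wrapper is an assumption for illustration.
JATS = (
    "<p>We generated stable <italic>MAT</italic><bold><italic>a</italic></bold>"
    " or <italic>MATα </italic>lines</p>"
)

# Fragment as returned by the UNIXREF XML API, assuming the line breaks
# shown in the post are really part of the response.
UNIXREF = """<p>We generated stable
<italic>MAT</italic>
<bold>
<italic>a</italic>
</bold>
or
<italic>MATα </italic>
lines</p>"""


def rendered_text(xml: str) -> str:
    """Join all text nodes, then collapse whitespace runs to single spaces."""
    root = ET.fromstring(xml)
    return " ".join("".join(root.itertext()).split())


print(rendered_text(JATS))     # We generated stable MATa or MATα lines
print(rendered_text(UNIXREF))  # We generated stable MAT a or MATα lines
```

In mixed content, the newline between `</italic>` and `<bold>` is a text node like any other, so a consumer cannot tell it apart from a space the author intended.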
My questions for Crossref are:
What are Crossref’s plans for this?
Are abstracts intended for data-mining only and not re-display for human reading?
Does Crossref plan to fix this or should developers use the REST JSON API if they care? (assuming the invalid removal of whitespace is fixed in the REST JSON API)
Cheers (with a beer ale),
Castedo