We are a Japanese organization, and we’re interested in understanding how much Japanese-language content has been registered with Crossref and is being distributed via metadata.
While exploring Crossref’s REST API, we noticed that records include a field like the following:
"language": "en"
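For anyone trying to measure Japanese-language content this way, here is a minimal sketch of how one might read this field from the REST API and tally it across records. It assumes the usual response envelope: a single work sits under `message`, and a list of works sits under `message.items`. The sample record and the helper names (`fetch_work`, `work_language`, `tally_languages`) are hypothetical, for illustration only. The tally only counts whatever the `language` field contains, so what that field actually represents is the key question for this kind of count.

```python
import json
from collections import Counter
from typing import Dict, List, Optional
from urllib.parse import quote
from urllib.request import urlopen

API_BASE = "https://api.crossref.org/works/"

def fetch_work(doi: str) -> Dict:
    """Fetch one work record from the Crossref REST API (requires network access)."""
    with urlopen(API_BASE + quote(doi)) as resp:
        return json.load(resp)

def work_language(response: Dict) -> Optional[str]:
    """Return the 'language' value from a single-work response, or None if it is absent."""
    return response.get("message", {}).get("language")

def tally_languages(items: List[Dict]) -> Counter:
    """Count 'language' values across work records; records without the field count as None."""
    return Counter(item.get("language") for item in items)

# Hypothetical sample shaped like a /works/{doi} response, trimmed to the relevant fields.
sample = {"status": "ok", "message-type": "work",
          "message": {"DOI": "10.0000/example", "language": "en"}}
print(work_language(sample))  # en

# With network access, something like: work_language(fetch_work("10.0000/example"))
```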
We would like to ask for clarification on this field. Does “language”: “en” indicate:
That the article itself is written in English?
Or that the journal is English-language?
Or simply that the metadata (such as title and abstract) was submitted in English?
That field represents the language of the journal as a whole. We take it from the journal-level metadata in the XML that is submitted to us. You can see this in the XML file registered for DOI 10.2964/jsik_2025_002 in submission https://doi.crossref.org/servlet/submissionAdmin?sf=detail&submissionID=1685138502
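To illustrate the difference between the journal-level and article-level values, here is a minimal sketch. It reads the `language` attribute from both `journal_metadata` and `journal_article` in a deposit file. The XML below is a hypothetical, heavily trimmed deposit. The namespace version, titles, and attribute values are placeholders and are not copied from the submission linked above. It also assumes that a deposit may carry a separate `language` attribute on `journal_article`. This sketch does not establish whether that attribute is exposed by the REST API.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed deposit: an English-language journal containing a Japanese article.
# All values are placeholders for illustration only.
DEPOSIT = """<doi_batch xmlns="http://www.crossref.org/schema/5.3.1">
  <body>
    <journal>
      <journal_metadata language="en">
        <full_title>Example Journal</full_title>
      </journal_metadata>
      <journal_article publication_type="full_text" language="ja">
        <titles><title>Example title</title></titles>
      </journal_article>
    </journal>
  </body>
</doi_batch>"""

def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]

def language_attributes(xml_text: str) -> dict:
    """Map journal_metadata / journal_article to the 'language' attribute of their
    first occurrence (None if the attribute is missing)."""
    root = ET.fromstring(xml_text)
    found = {}
    for elem in root.iter():
        name = local_name(elem.tag)
        if name in ("journal_metadata", "journal_article"):
            found.setdefault(name, elem.get("language"))
    return found

print(language_attributes(DEPOSIT))
# {'journal_metadata': 'en', 'journal_article': 'ja'}
```

Matching on local names rather than full namespaced tags keeps the sketch independent of which schema version a deposit declares.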
Thank you very much for your detailed explanation.
Based on your response, I understand that in this case the “language”: “en” value in the REST API output is not inferred from the article metadata, but instead comes from the language specified at the journal level.
I have a few follow-up questions:
Does Crossref’s deposit schema include a metadata field specifically intended to indicate the language in which the article itself is written?
For example, does the language attribute of the journal_article element, as in <journal_article language="">, serve this purpose?
We have some concerns that metadata aggregators such as OpenAlex may not be accurately capturing the publication language of non-English articles, including Japanese content.
Since these services rely heavily on Crossref metadata, are there any best practices for depositing metadata for non-English articles to ensure that the article language is properly identified by downstream systems?
We would appreciate any guidance you can provide on this topic.