We are a Japanese organization, and we’re interested in understanding how much Japanese-language content has been registered with Crossref and is being distributed via metadata.
While exploring Crossref’s REST API, we noticed that records include a field like the following:
"language": "en"
We would like to ask for clarification on this field. Does "language": "en" indicate:
That the article itself is written in English?
Or that the journal is English-language?
Or simply that the metadata (such as title and abstract) was submitted in English?
That value represents the language of the journal as a whole; we take it from the journal-level metadata in the XML that is submitted to us. You can see this in the XML file that was registered for DOI 10.2964/jsik_2025_002 in submission https://doi.crossref.org/servlet/submissionAdmin?sf=detail&submissionID=1685138502
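For anyone who wants to check this value themselves, here is a minimal sketch of reading the language field from a REST API works record. The helper names (extract_language, fetch_work) and the sample record are illustrative assumptions, not part of any Crossref library; the sample only mirrors the field discussed above and is not a full API response.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "https://api.crossref.org/works/"


def extract_language(record: dict):
    """Return the 'language' value of a works response, or None if it is absent."""
    return record.get("message", {}).get("language")


def fetch_work(doi: str) -> dict:
    """Fetch a single works record (hypothetical helper; needs network access)."""
    with urllib.request.urlopen(API_BASE + quote(doi)) as resp:
        return json.load(resp)


# Hand-written sample that only mirrors the field discussed in this thread.
sample = {
    "status": "ok",
    "message": {"DOI": "10.2964/jsik_2025_002", "language": "en"},
}

print(extract_language(sample))
```

To try it against the live API, call extract_language(fetch_work("10.2964/jsik_2025_002")). Keep in mind that, as explained above, the value reflects what was deposited at the journal level.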
Thank you very much for your detailed explanation.
Based on your response, I understand that in this case the "language": "en" value in the REST API output is not inferred from the article metadata, but instead comes from the language specified at the journal level.
I have a few follow-up questions:
Does Crossref’s deposit schema include a metadata field specifically intended to indicate the language in which the article itself is written?
For example, does the attribute <journal_article language=""> serve this purpose?
We have some concerns that metadata aggregators such as OpenAlex may not be accurately capturing the publication language of non-English articles, including Japanese content.
Since these services rely heavily on Crossref metadata, are there any best practices for depositing metadata for non-English articles to ensure that the article language is properly identified by downstream systems?
We would appreciate any guidance you can provide on this topic.
Does Crossref’s deposit schema include a metadata field specifically intended to indicate the language in which the article itself is written?
For example, does the attribute <journal_article language=""> serve this purpose?
I am going to preface my answer by saying that the information in the metadata should reflect how you anticipate that the journal and its articles will be cited. If you are publishing this journal in Japanese, then it would make sense that anyone citing journal articles published by this journal would do so using a citation in Japanese. Thus, including a Japanese tag in the journal-level metadata makes sense, like this:
<journal_metadata language="ja">
<full_title>Joho Chishiki Gakkaishi</full_title>
<abbrev_title>Journal of Japan Society of Information and Knowledge</abbrev_title>
<issn>0917-1436</issn>
<issn>1881-7661</issn>
</journal_metadata>
That’s at the journal level.
Then, at the article-level, you can also declare the language of that article using the <journal_article> tags, like this: <journal_article language="ja">.
In addition, you can include multiple titles and abstracts in multiple languages per record. There can be multiple title tags and multiple abstract tags within a <journal_article> section of the XML.
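As a rough way to check a deposit file before submitting it, the sketch below reads the language attribute at both levels. It assumes a simplified fragment with XML namespaces and most required elements omitted, and declared_languages is a hypothetical helper name, so treat it as an illustration rather than a validator for the real schema.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment for illustration only; a real deposit
# file carries namespaces and many more required elements.
SAMPLE = """
<journal>
  <journal_metadata language="ja">
    <full_title>Joho Chishiki Gakkaishi</full_title>
  </journal_metadata>
  <journal_article language="ja">
  </journal_article>
</journal>
"""


def declared_languages(xml_text: str):
    """Return (journal_language, article_language); each is None if not declared."""
    root = ET.fromstring(xml_text.strip())
    journal = root.find("journal_metadata")
    article = root.find("journal_article")
    journal_lang = journal.get("language") if journal is not None else None
    article_lang = article.get("language") if article is not None else None
    return journal_lang, article_lang


print(declared_languages(SAMPLE))
```

A result where the second value is None would mean the article-level attribute was not declared in the file.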
Since these services rely heavily on Crossref metadata, are there any best practices for depositing metadata for non-English articles to ensure that the article language is properly identified by downstream systems?
Not beyond what I have mentioned above. In the example above, I will say that we are matching citations for this journal using both the title Joho Chishiki Gakkaishi and Journal of Japan Society of Information and Knowledge. And for that article you have asked about - DOI 10.2964/jsik_2025_002 - we are matching the article title if it is included in a citation as Differences in Areas of Interest between User Companies and IT Vendors in Agile Development OR アジャイル開発におけるユーザ企業とITベンダの関心領域の相違. But, including the language tags that I have included in my first answer would be best practice for your example.