Questions Regarding Japanese Content and the language Field in REST API

Dear Crossref Support & Community,

I hope this message finds you well.

We are a Japanese organization, and we’re interested in understanding how much Japanese-language content has been registered with Crossref and is being distributed via metadata.

While exploring Crossref’s REST API, we noticed that records include a field like the following:

"language": "en"

We would like to ask for clarification on this field. Does "language": "en" indicate:

  • That the article itself is written in English?
  • Or that the journal is English-language?
  • Or simply that the metadata (such as title and abstract) was submitted in English?

Thank you very much for your support.

Hello @ksakai ,

Thanks for asking!

This typically means that the content in question is published in English and thus will be cited in English.

If you have a specific example I’d be happy to give you more details.

-Isaac

Dear @ifarley ,

Thank you for your response and clarification.

As a follow-up, may I ask how the "language" field in the REST API is determined?

For example, in the following record, we did not specify a language attribute such as <journal_article language="en"> in the deposited metadata:

https://api.crossref.org/works/10.2964/jsik_2025_002

Despite this, the REST API output shows "language": "en". Could you let us know how this value is inferred in such cases?

We’d appreciate any further insight you could provide.

Best regards,
Kohei

Hello Kohei,

Thanks for the example. We’re not inferring the language here; we’re taking it directly from the metadata registered with us.

In the REST API result for DOI 10.2964/jsik_2025_002 - https://api.crossref.org/works/10.2964/jsik_2025_002 - you’ll see the language element here:

"language": "en"

That represents the language of the journal as a whole, and we get it from the XML that is submitted to us at the journal level. You can see this in the XML file that was registered for DOI 10.2964/jsik_2025_002 in submission https://doi.crossref.org/servlet/submissionAdmin?sf=detail&submissionID=1685138502
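For anyone who wants to check this value for their own DOIs, here is a minimal Python sketch, not an official Crossref client. It assumes the /works/{doi} response wraps the record in a "message" object and that "language" may be absent; the helper names fetch_work and extract_language are made up for illustration.

```python
import json
import urllib.request
from urllib.parse import quote

# Base URL taken from the example link in this thread.
API_BASE = "https://api.crossref.org/works/"


def extract_language(record):
    """Return the "language" value of a REST API record, or None if absent.

    Assumption: the work's metadata sits inside a "message" object.
    If the dict passed in is already that object, it is used directly.
    """
    message = record.get("message", record)
    return message.get("language")


def fetch_work(doi, timeout=30):
    """Fetch the JSON record for one DOI (requires network access)."""
    url = API_BASE + quote(doi, safe="/")
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Example call (needs network access, so it is left commented out):
# print(extract_language(fetch_work("10.2964/jsik_2025_002")))
```

Keep in mind that, as described above, the value returned for this DOI reflects what was deposited at the journal level, not a language declared on the article itself.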

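There is no language declared for the journal article, although two titles were submitted for it:

Differences in Areas of Interest between User Companies and IT Vendors in Agile Development
アジャイル開発におけるユーザ企業とITベンダの関心領域の相違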

Thus, we’re matching references based on both journal article titles.

Warm regards,

Isaac

Dear @ifarley ,

Thank you very much for your detailed explanation.
Based on your response, I understand that in this case the "language": "en" value in the REST API output is not inferred from the article metadata, but instead comes from the language specified at the journal level.

I have a few follow-up questions:

  1. Does Crossref’s deposit schema include a metadata field specifically intended to indicate the language in which the article itself is written?
    For example, does the attribute <journal_article language=""> serve this purpose?
  2. We have some concerns that metadata aggregators such as OpenAlex may not be accurately capturing the publication language of non-English articles, including Japanese content.
    Since these services rely heavily on Crossref metadata, are there any best practices for depositing metadata for non-English articles to ensure that the article language is properly identified by downstream systems?

We would appreciate any guidance you can provide on this topic.

Thank you again for your continued support!

Hello @ksakai ,

Thanks for following up.

  1. Does Crossref’s deposit schema include a metadata field specifically intended to indicate the language in which the article itself is written?
    For example, does the attribute <journal_article language=""> serve this purpose?

I am going to preface my answer by saying that the information in the metadata should reflect how you anticipate that the journal and its articles will be cited. If you are publishing this journal in Japanese, then it would make sense that anyone citing journal articles published by this journal would do so using a citation in Japanese. Thus, including a Japanese tag in the journal-level metadata makes sense, like this:

<journal_metadata language="ja">
  <full_title>Joho Chishiki Gakkaishi</full_title>
  <abbrev_title>Journal of Japan Society of Information and Knowledge</abbrev_title>
  <issn>0917-1436</issn>
  <issn>1881-7661</issn>
</journal_metadata>

That’s at the journal level.

Then, at the article-level, you can also declare the language of that article using the <journal_article> tags, like this:
<journal_article language="ja">.

In addition, you can include multiple titles and abstracts in multiple languages per record. There can be multiple title tags and multiple abstract tags within a <journal_article> section of the XML.
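If you generate your deposit XML programmatically, the two language attributes can be set roughly like this. This is only a sketch of a fragment under stated assumptions: the function name build_journal_fragment is made up, the <journal> wrapper and the <titles>/<title> nesting are my assumptions about the layout, and a real deposit needs further elements (ISSNs, publication dates, DOI data and so on) that are left out here. Please check the result against the deposit schema before submitting.

```python
import xml.etree.ElementTree as ET


def build_journal_fragment(journal_lang, article_lang, full_title, article_title):
    """Build an illustrative, incomplete <journal> fragment.

    Only the parts discussed in this thread are included: the language
    attribute on <journal_metadata> and on <journal_article>.
    """
    journal = ET.Element("journal")

    # Journal-level language: the language of the journal as a whole.
    metadata = ET.SubElement(journal, "journal_metadata", {"language": journal_lang})
    ET.SubElement(metadata, "full_title").text = full_title

    # Article-level language: the language of this particular article.
    article = ET.SubElement(journal, "journal_article", {"language": article_lang})
    titles = ET.SubElement(article, "titles")  # nesting assumed for illustration
    ET.SubElement(titles, "title").text = article_title

    return journal


fragment = build_journal_fragment(
    "ja",
    "ja",
    "Joho Chishiki Gakkaishi",
    "アジャイル開発におけるユーザ企業とITベンダの関心領域の相違",
)
print(ET.tostring(fragment, encoding="unicode"))
```

The point of the sketch is simply that the journal-level and article-level attributes are set independently, so an article can declare its own language even when the journal-level value differs.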

Since these services rely heavily on Crossref metadata, are there any best practices for depositing metadata for non-English articles to ensure that the article language is properly identified by downstream systems?

Not beyond what I have mentioned above. In the example above, we are matching citations for this journal using both titles, "Joho Chishiki Gakkaishi" and "Journal of Japan Society of Information and Knowledge". And for the article you asked about - DOI 10.2964/jsik_2025_002 - we match the article title if a citation includes it as either "Differences in Areas of Interest between User Companies and IT Vendors in Agile Development" or "アジャイル開発におけるユーザ企業とITベンダの関心領域の相違". That said, including the language attributes shown in my first answer would be best practice for your example.

Warm regards,

Isaac