ROR, affiliations, and raw affiliation strings in American Physical Society (APS) publications

Hi Crossref,

It seems APS is now sending affiliations using ROR IDs. This has a side effect, and I don’t know whether Crossref can suggest a better way to use the API.

It is important for Crossref API clients to have the full raw affiliation string.

Example: 10.1103/PhysRevC.110.034315
Original affiliations for the author Bernerd:

  • (2) CERN (https://ror.org/01ggx4157), 1211 Geneva 23, Switzerland
  • (6) Instituut voor Kern- en Stralingsfysica, KU Leuven, B-3001 Leuven, Belgium

What the Crossref API returns (https://api.crossref.org/works/10.1103/PhysRevC.110.034315):

{
  "given": "C.",
  "family": "Bernerd",
  "sequence": "additional",
  "affiliation": [
    {
      "id": [
        {
          "id": "https://ror.org/01ggx4157",
          "id-type": "ROR",
          "asserted-by": "publisher"
        }
      ],
      "name": "CERN"
    },
    {
      "name": "Instituut voor Kern- en Stralingsfysica"
    }
  ]
}

We lose the university (KU Leuven), the city, the country, etc.
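To see exactly what survives in a Crossref record, a client can list every author’s affiliation names and ROR IDs. This is a minimal Python sketch, assuming the REST API layout shown in the excerpt above (each entry in `message.author` has an `affiliation` list whose items carry a `name` and an optional `id` list); `fetch_work` and `author_affiliations` are hypothetical helper names, not part of any Crossref library.

```python
import json
import urllib.request

# Public Crossref REST API; the DOI is appended to the path as-is.
CROSSREF_WORKS = "https://api.crossref.org/works/"


def fetch_work(doi: str) -> dict:
    """Fetch one work record (the "message" object) from the Crossref REST API."""
    with urllib.request.urlopen(CROSSREF_WORKS + doi, timeout=10) as resp:
        return json.load(resp)["message"]


def author_affiliations(work: dict) -> list[tuple[str, str, list[str]]]:
    """Return one (author, affiliation name, ROR IDs) row per affiliation entry."""
    rows = []
    for author in work.get("author", []):
        who = f"{author.get('given', '')} {author.get('family', '')}".strip()
        for aff in author.get("affiliation", []):
            # Keep only identifiers the record labels as ROR.
            rors = [i["id"] for i in aff.get("id", []) if i.get("id-type") == "ROR" and "id" in i]
            rows.append((who, aff.get("name", ""), rors))
    return rows
```

Calling `author_affiliations(fetch_work("10.1103/PhysRevC.110.034315"))` lists whatever the record holds at the time it is fetched; going by the excerpt above, that is “CERN” (with its ROR ID) and “Instituut voor Kern- en Stralingsfysica” for Bernerd, with no trace of KU Leuven.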

Could that be fixed? APS is an important publisher.

Thanks :)

Ren


Hi Ren,

Thanks for writing in! APS is indeed now sending affiliations using ROR. Here’s the announcement they made about that, with an embedded presentation: Research Organization Registry (ROR) | American Physical Society Becomes Largest Society Publisher to Adopt ROR (cross-post)

The organization name and location for the first affiliation in your example (CERN) can be retrieved using the ROR ID, either by looking it up in the Research Organization Registry (ROR) search on the web or with a second call to the ROR API at https://api.ror.org/organizations/https://ror.org/01ggx4157.
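That second API call can be scripted. This is a sketch under stated assumptions: it targets the ROR REST API v2 path (`/v2/organizations/`) rather than the v1-style link above, and it reads the display name and first location from the v2 record layout (a `names` list whose `types` include `ror_display`, and `locations[].geonames_details`). The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: ROR REST API v2; the link in the post above uses the older v1 path.
ROR_API_V2 = "https://api.ror.org/v2/organizations/"


def bare_ror_id(ror_id: str) -> str:
    """Accept 'https://ror.org/01ggx4157' or '01ggx4157' and return '01ggx4157'."""
    return ror_id.rstrip("/").rsplit("/", 1)[-1]


def fetch_ror_record(ror_id: str) -> dict:
    """Fetch one organization record from the ROR API (network call)."""
    with urllib.request.urlopen(ROR_API_V2 + bare_ror_id(ror_id), timeout=10) as resp:
        return json.load(resp)


def name_and_place(record: dict) -> tuple[str, str]:
    """Return (display name, 'City, Country') from a v2-style ROR record."""
    name = next(
        (n["value"] for n in record.get("names", []) if "ror_display" in n.get("types", [])),
        "",
    )
    place = ""
    locations = record.get("locations", [])
    if locations:
        geo = locations[0].get("geonames_details", {})
        # Join whichever of city and country are present.
        place = ", ".join(p for p in (geo.get("name"), geo.get("country_name")) if p)
    return name, place
```

Note that this recovers only what ROR knows about the organization (its canonical name and registered location), not the author’s original wording, so it complements rather than replaces a raw affiliation string.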

For the second affiliation, “Instituut voor Kern- en Stralingsfysica” (the Institute for Nuclear and Radiation Physics at KU Leuven), I think APS is most likely still working on parsing out the correct part of the affiliation to send to Crossref, which in this case would be KU Leuven, https://ror.org/05f950310.

There isn’t any way to get the raw affiliation strings from the Crossref API unless APS sends them to Crossref, but I’ll pass on your message and see what they can do.

You could also get in touch with them directly to ask that they send full affiliation strings, at least until they have added more ROR IDs to their metadata so that they can identify affiliations like the one for KU Leuven in the example you give.

Does that help?

Amanda


@Ren By the way, can you tell me what you use the affiliation information for? I’d be interested to know!

Hi! I work for the APS journals and have been somewhat involved with our affiliation work. I believe that before May 2024 we weren’t sending any affiliation data; when we implemented ROR, we also started sending the associated affiliation strings to Crossref, whether or not the affiliation was identified with a ROR ID. However, the data being sent may be incomplete. Thanks for pointing out this issue; I’m asking some of our IT staff to look into it.


Hi Amanda, is there a standard field in the Crossref deposit format that represents the “raw affiliation string”? It looks like all we’re sending (aside from ROR IDs) is the “name” field, which holds whatever our system thinks the name is.

@apsmith No, there isn’t a place for that “raw affiliation string” – that’s a good point. It’s broken down into three separate fields: institution_name, institution_place, and institution_department. You can also add institution_acronym and of course institution_id.

That said, I wouldn’t be surprised if there are a lot of existing records with a raw affiliation string in institution_name. Here’s the documentation: Affiliations and ROR - Crossref

For the KU Leuven example @Ren gives, the ideal markup would look like this:

<institution>
    <institution_name>KU Leuven</institution_name>
    <institution_place>Leuven, Belgium</institution_place>
    <institution_id type="ror">https://ror.org/05f950310</institution_id>
    <institution_department>Instituut voor Kern- en Stralingsfysica</institution_department>
</institution>

That said, in this ideal example you don’t strictly need institution_name and institution_place, since those can be fetched from ROR.
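For a deposit pipeline, a small builder can emit only the fields that are actually known. This is a hypothetical Python sketch using the standard library’s `xml.etree.ElementTree`; the helper name is illustrative, and the child-element order simply mirrors the example above, so it should be checked against the Crossref schema version being deposited.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def crossref_institution(
    name: Optional[str] = None,
    place: Optional[str] = None,
    ror: Optional[str] = None,
    department: Optional[str] = None,
) -> ET.Element:
    """Build a Crossref <institution> element, skipping fields we don't have.

    Child order mirrors the forum example; verify it against the schema.
    """
    inst = ET.Element("institution")
    for tag, value in (
        ("institution_name", name),
        ("institution_place", place),
        ("institution_id", ror),
        ("institution_department", department),
    ):
        if value:
            child = ET.SubElement(inst, tag)
            child.text = value
            if tag == "institution_id":
                child.set("type", "ror")
    return inst
```

For instance, `ET.tostring(crossref_institution(ror="https://ror.org/05f950310", department="Instituut voor Kern- en Stralingsfysica"), encoding="unicode")` yields just the `institution_id` and `institution_department` children, relying on ROR for the name and place as suggested above.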

Hmm, that’s not really a good match for our JATS affiliation data. All our vendors are tagging right now is <aff>, and within that <institution-wrap>, which holds the <institution-id> (ROR) and <institution> (name) elements. So there is no “department” or “place” designation. Are you recommending just using the full string (without tags) as the name value if that’s all we have?

No, I definitely wouldn’t recommend putting the full string as the institution name value. I’d be curious to hear from @Ren what parts of the raw string are most important for their needs.
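Because the JATS `<aff>` described above still contains the full affiliation text around the `<institution-wrap>`, the raw string and the tagged parts can be separated before anything is sent to Crossref. This is a minimal Python sketch assuming that tagging (`<label>`, `<institution-wrap>`, `<institution-id institution-id-type="ror">`, `<institution>`); the function names and sample markup are hypothetical, not APS’s actual files.

```python
import xml.etree.ElementTree as ET

# Elements whose text is not part of the human-readable affiliation string.
SKIP = {"label", "institution-id"}


def _collect_text(elem: ET.Element, out: list[str]) -> None:
    """Depth-first text walk that drops SKIP elements but keeps their tails."""
    if elem.tag in SKIP:
        return
    if elem.text:
        out.append(elem.text)
    for child in elem:
        _collect_text(child, out)
        if child.tail:  # text after a child belongs to the parent context
            out.append(child.tail)


def parse_jats_aff(aff_xml: str) -> dict:
    """Split a JATS <aff> into ROR IDs, tagged institution names and the raw string."""
    aff = ET.fromstring(aff_xml)
    rors = [
        e.text.strip()
        for e in aff.iter("institution-id")
        if e.get("institution-id-type") == "ror" and e.text
    ]
    names = [e.text.strip() for e in aff.iter("institution") if e.text]
    parts: list[str] = []
    _collect_text(aff, parts)
    raw = " ".join("".join(parts).split())  # normalise whitespace
    return {"ror": rors, "names": names, "raw": raw}
```

On a hypothetical `<aff><label>2</label><institution-wrap><institution-id institution-id-type="ror">https://ror.org/01ggx4157</institution-id><institution>CERN</institution></institution-wrap>, 1211 Geneva 23, Switzerland</aff>`, this returns the ROR ID, the name “CERN”, and the raw string “CERN, 1211 Geneva 23, Switzerland”, so the full wording remains available even where only the tagged name is deposited.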

Thanks, Amanda and Arthur, for your prompt replies.

This change affects one very important emerging service, OpenAlex (I don’t work for them). I see the effect on recent APS publications:

https://api.openalex.org/works/doi:10.1103/PhysRevC.110.034315

It is now very hard for the OpenAlex affiliation-to-institution resolver to resolve institutions: the APS record fails to match a basic university, “KU Leuven”, because that name is no longer present, nor are the city and country from the address (which are fed into the matcher).

Example:

{
  "author_position": "middle",
  "author": {
    "id": "https://openalex.org/A5081842888",
    "display_name": "Cyril Bernerd",
    "orcid": "https://orcid.org/0000-0002-2183-9695"
  },
  "institutions": [],   <= OpenAlex cannot match KU Leuven (the array is empty)
  "countries": [],
  "is_corresponding": false,
  "raw_author_name": "C. Bernerd",
  "raw_affiliation_strings": [
    "Instituut voor Kern- en Stralingsfysica"   <= the rest of the raw affiliation is missing in OpenAlex
  ],
  "affiliations": [
    {
      "raw_affiliation_string": "Instituut voor Kern- en Stralingsfysica",
      "institution_ids": []
    }
  ]
}
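To spot records affected in this way, a client can scan an OpenAlex work for authorships that carry raw affiliation strings but no matched institution. This is a minimal Python sketch assuming the `authorships` layout shown above and the `/works/doi:` lookup used earlier in this post; the helper names are hypothetical.

```python
import json
import urllib.request

OPENALEX_WORKS = "https://api.openalex.org/works/doi:"


def fetch_openalex_work(doi: str) -> dict:
    """Fetch one OpenAlex work by DOI (network call)."""
    with urllib.request.urlopen(OPENALEX_WORKS + doi, timeout=10) as resp:
        return json.load(resp)


def unmatched_authorships(work: dict) -> list[tuple[str, list[str]]]:
    """Return (raw author name, raw affiliation strings) where no institution was matched."""
    return [
        (a.get("raw_author_name", ""), a.get("raw_affiliation_strings", []))
        for a in work.get("authorships", [])
        if a.get("raw_affiliation_strings") and not a.get("institutions")
    ]
```

Run over `fetch_openalex_work("10.1103/PhysRevC.110.034315")`, an authorship like the one above would be reported with its truncated string, which makes the gap between the printed affiliation and the deposited one easy to list.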

It seems the raw affiliation is available in NASA ADS:
https://ui.adsabs.harvard.edu/abs/2024PhRvC.110c4315Y/abstract
(I don’t know how they get the raw affiliation from APS; Arthur, do you know?)

Conclusion (you may disagree!):

  • It is important to keep the concept of a raw affiliation; scientists have used it for many years to express their affiliation. We need the complete raw affiliation string. There is no standard way to split it up (lab, department, university, city, country is a perfect world that does not exist). I hope Crossref can clarify where to put this raw affiliation information.
  • Matching a raw affiliation to a ROR ID is only a second, imperfect step; external systems (like OpenAlex) must still be able to match institutions themselves. We also need to let users simply search within raw affiliations for specific terms (e.g. an acronym or a specific name); OpenAlex provides this feature.
  • It is an “imperfect step” because ROR may even be incomplete at the time you enter your affiliation in APS.
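The raw-affiliation search mentioned in the second point can be reached through the OpenAlex API. This is a hedged sketch: it assumes OpenAlex’s `raw_affiliation_strings.search` filter on the `/works` endpoint and its `per-page` parameter, and the helper name is hypothetical.

```python
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"


def raw_affiliation_search_url(term: str, per_page: int = 25) -> str:
    """Build an OpenAlex works query filtered on raw affiliation strings.

    Assumption: the `raw_affiliation_strings.search` filter. Commas separate
    filters in OpenAlex, so they are replaced with spaces in the search term.
    """
    term = " ".join(term.replace(",", " ").split())
    params = {"filter": f"raw_affiliation_strings.search:{term}", "per-page": per_page}
    return f"{OPENALEX_WORKS}?{urlencode(params)}"
```

Such a search only finds what was deposited: a query for “KU Leuven” cannot surface the record above while its raw string stops at “Instituut voor Kern- en Stralingsfysica”.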

Thanks @Ren for that explanation! And apologies for not having remembered earlier that the Crossref schema doesn’t support raw affiliation strings. I can pass your request on to the metadata team.


Hi,

Another example: https://journals.aps.org/prb/abstract/10.1103/PhysRevB.110.174434
Affiliation: “Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China”
is exported to Crossref (https://api.crossref.org/works/10.1103/PhysRevB.110.174434, accessed 2024/11/21) as:
“Institute of Theoretical Physics” (no ROR)
=> Not yet loaded in OpenAlex (checked today), but I don’t think it will match correctly.

OpenAlex could match this affiliation perfectly if it were complete, e.g. https://api.openalex.org/works/doi:10.1103/physrevd.65.084014
=> For “Institute of Theoretical Physics, Chinese Academy of Sciences, P.O. Box 2735, Beijing 100080, China”, OpenAlex gives two ROR IDs: one for “Institute of Theoretical Physics” (in Beijing) and one for “Chinese Academy of Sciences”.

After I started looking at what authors submit for ROR, I’ve noticed that they sometimes submit incorrect information. Part of the problem was that the structure of the ROR database can be confusing with parent/child relationships and various other oddities. As it turns out, the name is crucial in resolving differences and what people put in their articles for the name of an institution can vary somewhat (for example, if they write in English but the primary name is actually in another language). It’s also problematic for people who work for global companies. Walmart has subsidiary research labs in various countries, but they aren’t represented in ROR. IBM has a research lab in Cambridge, but it isn’t represented in ROR. Brown University has a campus in Bologna that isn’t represented. I guess it depends on whether you take the name from the author or the ROR database. The ROR database has multiple names for each institution, and there are complicated relationships between institutions.

This came up in part because we use ROR to show a geographic view of a corpus, and we were getting wrong locations. See the “Map view of Cryptology ePrint Archive statistics” page for an example.

Hi Kevin, @mccurley

Thanks for your contribution on that. A few comments linking your statements to the “raw affiliation” question:

ROR historically got its data from GRID, which was initially an export of Wikidata (in Wikidata, most companies have a unique local-headquarters entry, unlike other, non-company entries). This explains why current ROR company names resemble Wikidata names, such as “GlaxoSmithKline (Germany)” (Wikidata: Q29123139, ROR: 05gedqb32).

ROR is used in different ways to model organisations, which is closely related to how research is financed and organised. The Italian national research council CNR has ROR records that are headquarters-based (there are no ROR records for the city branches; they all fall under the same organisation acronym and name). The Spanish national research council CSIC has research-lab records in ROR that are quite city-centric, often attached to a university.

  • ROR has labels in multiple languages, e.g. 02k8cbn47 (so I think this is not an issue to resolve).

  • You are free to request a record for the Brown University campus in Bologna; “Georgetown University in Qatar” exists in ROR (029e47x73). It seems unfair to force an author to affiliate a work with a US-based ROR record when they actually work in Italy. Properly handling raw affiliations (without ROR) throughout the publication chain mitigates this risk.

  • The raw affiliation contains the real place: the city where people actually live and work. This allows creating maps that do not rely (only) on imperfect ROR locations.

Renaud,

PS: Regarding ROR search for end users (the scientists): I remember using GRID search, and I think it indexed the text of Wikipedia articles. This meant that if a company had an old name mentioned in its Wikipedia article (and not in the GRID labels or anywhere else), GRID could still push that record up in the results. I can’t fully confirm this, since GRID search is no longer active. ROR search does not behave this way: for example, https://ror.org/search?query=Nitta%20Gelatin (mentioned in the French Wikipedia article on Arkema) does not return Arkema.

ROR is always more than happy to accept requests for new records, and our usual processing time is 2-4 weeks. Our curation request form is available at https://curation-request.ror.org, or you can submit issues directly to our issue queue in the ror-community/ror-updates repository on GitHub.

Brown does not seem to have a campus in Bologna; it’s just a study-abroad program (see “Brown in Bologna”, Office of Global Engagement, Brown University). However, we’re willing to evaluate that and the other organizations you mention for addition to ROR if you submit requests for them with the relevant information.

There are many global companies in ROR, as @Ren helpfully points out, and we did inherit the way those are managed from GRID, and that emulates Wikidata. Most global companies should have their local headquarters represented in ROR. See for instance Nokia, 3M, and Google. Again, we’re happy to accept requests to update our data.

By the way, I have also since learned, @Ren and @apsmith, that many publishers do send full affiliation strings to Crossref and store them in the name field of the affiliation, so perhaps that’s something APS could do as well, though I don’t think it’s optimal in terms of structured metadata.

Thanks Amanda,

Yes, using the name field for the full affiliation is not optimal. But if, for now, this is the only way to keep raw affiliations as open data, could this be added to the Crossref documentation?

Do we know whether Elsevier plans to implement ROR in its Crossref metadata soon?

Note: I am starting to see labs that have an existing ROR record but were not linked by the user in APS (the user only linked the university’s ROR ID); the lab then disappears from Crossref and from client databases like OpenAlex.
OpenAlex has started working on tools to help the community fix and curate ROR associations in its database: https://www.youtube.com/watch?v=OIFHhz2OQPg (they use Crossref data, and Crossref is the main source of trust; if the lab name is no longer there, curators won’t be able to do much).


Elsevier has indeed been saying they will begin using ROR in Crossref, but it’s time for me to check back with them!
