Ticket of the month - March 2022 - Getting started with REST API queries

Hi @ifarley,
Thank you for your quick answer and help.
I’ve tried the request you suggested and it works (I added &rows=1000&offset=0 because otherwise I retrieved only 56 docs). I managed to download the metadata of articles published in this journal, except the cited references, which is the category we are working on with our bibliometric methods.
Do you know why it is possible to download the cited references when I focus on one specific article’s metadata, but not when I want to download a full set of metadata?
Thanks a lot for your help.
Alex

Hi @AlRen ,

Thanks for following up. I think this query might give you everything you’re looking for. I’ve tried to reduce some of the noise and only select for elements of interest. So, at the time of this writing, there are only 667 DOIs registered for this journal that include reference metadata. The remaining DOIs have not had reference metadata registered for them.

https://api.crossref.org/journals/1939-1854/works?mailto=support@crossref.org&filter=has-references:true&select=DOI,title,type,reference,references-count,is-referenced-by-count&rows=700

My best,
Isaac

Thanks again @ifarley. I downloaded the 667 metadata entries.
May I ask two more questions and make one last request?

1/ Could you break down the request you wrote, i.e. explain its logic? Then I can reproduce it later, fully aware of what I am doing.

Q1: Is it normal that the cited references don’t all have the same form? Indeed, some are cited this way:

{"doi-asserted-by":"crossref","unstructured":"Austin. Another view of dynamic criteria: A critical reanalysis of Barrett, Caldwell, and Alexander. 42 583 1989","key":"r2_10.1037/0021-9010.86.3.446","DOI":"10.1111/j.1744-6570.1989.tb00670.x"}

while others are cited this way:

{"doi-asserted-by":"publisher","key":"r5_10.1037/0021-9010.86.3.446","DOI":"10.2307/256190"},

I am used to using Scopus data, where we collect raw data exactly as the authors cite them in their publications. Does Crossref clean the cited references?

Q2: Do you have any idea why only 667 have references? Is it because the journal is poorly referenced and doesn’t provide full data to Crossref? Or is it because most of its publications don’t cite any references? In both cases, the difference between 10K documents and 667 is impressive…

Thanks a lot.
Sincerely
Alex

Hey @AlRen ,

See my answers below.

Sure thing.

https://api.crossref.org/journals/1939-1854/works?mailto=support@crossref.org&filter=has-references:true&select=DOI,title,type,reference,references-count,is-referenced-by-count&rows=700

I’ll go in order of the query, so you can follow the logic:

  1. I’m querying the journals route for any works with ISSN 1939-1854.

  2. Because I include my email address as the mailto parameter, my query is served by the Polite pool of our API.

  3. I am then filtering the responses so that I only get results for DOIs with ISSN 1939-1854 in the metadata that have references registered with us.

  4. Next, using the select parameter, I am telling the API to reduce some of the noise of the response, so I don’t get the full metadata record. Instead, I only want: DOI, title of the article registered for the DOI, content type of the DOI (these are all journal articles, so moot), the references registered for this DOI, the count of references registered for this DOI, and then the number of other Crossref DOIs that have cited this DOI (using the is-referenced-by-count parameter).

  5. And, finally, since I know there are 667 results and I want to see all of the results in the response, I have told the API to give me the first 700 rows (or, resulting records) back.
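
For anyone who prefers to assemble this request in code, here is a minimal Python sketch of the same five steps. The helper name `journal_works_url` and its argument names are made up for illustration; only the route and the query parameters come from the query above. Please swap in your own email address for the mailto value.

```python
from urllib.parse import urlencode

BASE = "https://api.crossref.org"


def journal_works_url(issn, mailto, filters, select, rows):
    """Build a /journals/{issn}/works URL (hypothetical helper, not a library call)."""
    params = {
        "mailto": mailto,             # step 2: routes the query to the Polite pool
        "filter": ",".join(filters),  # step 3: e.g. has-references:true
        "select": ",".join(select),   # step 4: only the elements of interest
        "rows": rows,                 # step 5: how many records to return
    }
    # Leave ':', ',' and '@' unescaped so the result reads like the URL above.
    return f"{BASE}/journals/{issn}/works?" + urlencode(params, safe=":,@")


url = journal_works_url(
    issn="1939-1854",  # step 1: the journals route for this ISSN
    mailto="support@crossref.org",
    filters=["has-references:true"],
    select=["DOI", "title", "type", "reference",
            "references-count", "is-referenced-by-count"],
    rows=700,
)
print(url)
```

The printed URL can be pasted into a browser or passed to whichever HTTP client you already use.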

Q1: Is it normal that the cited references don’t all have the same form? Indeed, some are cited this way:

{"doi-asserted-by":"crossref","unstructured":"Austin. Another view of dynamic criteria: A critical reanalysis of Barrett, Caldwell, and Alexander. 42 583 1989","key":"r2_10.1037/0021-9010.86.3.446","DOI":"10.1111/j.1744-6570.1989.tb00670.x"}

while others are cited this way:

{"doi-asserted-by":"publisher","key":"r5_10.1037/0021-9010.86.3.446","DOI":"10.2307/256190"},

This is somewhat normal. I suspect that whoever registered this content for the American Psychological Association (APA), the Crossref member that stewards the Journal of Applied Psychology (ISSN 1939-1854), had the citation for the work at DOI 10.1111/j.1744-6570.1989.tb00670.x but did not know its DOI. They submitted the citation and we matched it to the DOI, which is why the reference metadata is presented this way.

For the reference metadata where only the DOI is present, American Psychological Association (APA) only gave us the DOI to establish the cited-by match. They didn’t need to provide us with the citation because they had the DOI. That’s the most definitive way to establish a cited-by match between the citing and cited DOIs.
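
If you want to separate these two cases in your own data, the following Python sketch reads the doi-asserted-by value of each reference. The function name `describe_reference` is hypothetical, and the sketch only covers the two shapes quoted above plus references with no DOI at all; other shapes may exist in the metadata.

```python
import json


def describe_reference(ref):
    """Say where the DOI of one `reference` entry came from (illustrative only)."""
    doi = ref.get("DOI")
    asserted_by = ref.get("doi-asserted-by")
    if doi is None:
        return "no DOI in this reference"
    if asserted_by == "crossref":
        # The member deposited a citation and Crossref matched it to a DOI.
        return f"{doi}: matched by Crossref"
    if asserted_by == "publisher":
        # The member supplied the DOI directly.
        return f"{doi}: supplied by the member"
    return f"{doi}: asserter not stated"


# The two references quoted in this thread.
matched = json.loads(
    '{"doi-asserted-by":"crossref","unstructured":"Austin. Another view of '
    'dynamic criteria: A critical reanalysis of Barrett, Caldwell, and '
    'Alexander. 42 583 1989","key":"r2_10.1037/0021-9010.86.3.446",'
    '"DOI":"10.1111/j.1744-6570.1989.tb00670.x"}'
)
supplied = json.loads(
    '{"doi-asserted-by":"publisher","key":"r5_10.1037/0021-9010.86.3.446",'
    '"DOI":"10.2307/256190"}'
)

for ref in (matched, supplied):
    print(describe_reference(ref))
```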

Does crossref clean the cited references?

No.

Q2: Do you have any idea why only 667 have references? Is it because the journal is poorly referenced and doesn’t provide full data to Crossref? Or is it because most of its publications don’t cite any references? In both cases, the difference between 10K documents and 667 is impressive…

We have 667 DOIs with reference metadata for this ISSN because this is what the American Psychological Association (APA) has registered with us. Unfortunately, the questions about the content and why only 667 DOIs have reference metadata registered are best answered by APA. I can tell you that we do not require our members to register reference metadata, so it is likely that there is reference metadata for DOIs of this journal that has just not (yet) been registered with us. Adding reference metadata to any existing Crossref DOI’s metadata record is free for our members, but they’d need to register the references.

Warm regards,
Isaac

Hello @ifarley
Thank you for all your answers and advice. They were very informative for me and helped me a lot. I have a broader question, not sure if this is the right place to ask it, but anyway. I wanted to explore CrossRef’s coverage of journals referenced in a scientific ranking. I found the right command to explore a single journal, but is it possible to get the list of journals present in CrossRef, or at least the ISSN list? And if so, what information can be added: for example, would it be possible to have the years covered and the number of articles included in each journal?
Thank you for your help, which is always valuable.
regards
Alex

Hi Alex,

We don’t have queries that are going to give you exactly what you are requesting, but here are a few that get close/are a starting point:

This will give you all journal articles registered with us (I have selected for only the article DOI, article title, journal title, and ISSN):
https://api.crossref.org/works?filter=type:journal-article&select=DOI,title,container-title,ISSN&rows=1000&mailto=support@crossref.org

Similar to the first query, this gives you journal articles sorted by date created (or, registered with Crossref). The newest DOIs are atop the results:
https://api.crossref.org/works?sort=created&filter=type:journal-article&select=DOI,ISSN,container-title,created&rows=1000&mailto=support@crossref.org

If you know the ISSN you want to see works for, you can include it in your query, like the one below. These results are sorted in order of most recently created (registered with Crossref):
https://api.crossref.org/works?sort=created&filter=issn:1939-1854&select=DOI,title,container-title,created&rows=1000&mailto=support@crossref.org
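
To get from results like these to a per-journal overview (number of works and years covered), you could tally the returned records yourself. Here is a minimal Python sketch. It assumes the records have already been downloaded and parsed, that each one carries the ISSN and created elements (so both need to be in the select parameter), and that the records sit under message.items in the response. The function name and the sample records are made up for illustration. Note that created is the date the DOI was registered with Crossref, which is not necessarily the publication year.

```python
from collections import defaultdict


def summarise_by_issn(items):
    """Count works per ISSN and track the range of `created` years seen."""
    summary = defaultdict(
        lambda: {"count": 0, "first_year": None, "last_year": None}
    )
    for item in items:
        # `created` holds the registration date as [[year, month, day]].
        year = item["created"]["date-parts"][0][0]
        # A work can carry several ISSNs; it is counted under each of them.
        for issn in item.get("ISSN", []):
            entry = summary[issn]
            entry["count"] += 1
            if entry["first_year"] is None or year < entry["first_year"]:
                entry["first_year"] = year
            if entry["last_year"] is None or year > entry["last_year"]:
                entry["last_year"] = year
    return dict(summary)


# Made-up records, only to show the shape of the input.
sample_items = [
    {"DOI": "10.5555/example.1", "ISSN": ["0000-0001", "0000-0002"],
     "created": {"date-parts": [[2019, 5, 2]]}},
    {"DOI": "10.5555/example.2", "ISSN": ["0000-0001"],
     "created": {"date-parts": [[2022, 3, 14]]}},
]

summary = summarise_by_issn(sample_items)
for issn, stats in sorted(summary.items()):
    print(issn, stats)
```

A single request returns at most the number of rows asked for, so a full overview would need paging through many requests, which this sketch leaves out.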

My best,
Isaac

Hi Isaac,
I am back here with another question ^^
Do you know if it is possible to get the list of metadata available for a specific journal or a specific publisher?
I have 2 specific concerns:

  • does the journal/publisher provide: (i) the abstracts; (ii) the cited references

I am interested in this journal: Accounting, Organizations and Society (Print ISSN: 1873-6289 and e-ISSN: 0361-3682); and this publisher: Springer, with the prefix 10.1016

Thanks a lot.
Regards,
Alexandre

Hi @AlRen ,

Good questions. You can retrieve this information in our REST API:

There are no works registered with us that include ISSN 1873-6289 in the metadata record, as you can see with this query: https://api.crossref.org/journals/18736289/works?mailto=support@crossref.org

As for works registered with us for ISSN 0361-3682, some have references registered:
https://api.crossref.org/journals/03613682/works?filter=has-references:true,&mailto=support@crossref.org

But, none have abstracts registered with us:
https://api.crossref.org/journals/03613682/works?filter=has-abstract:true,&mailto=support@crossref.org

You can confirm that here (all works are returned when including the parameter: filter=has-abstract:false)
https://api.crossref.org/journals/03613682/works?filter=has-abstract:false,&mailto=support@crossref.org

Which works have an abstract registered with us on prefix 10.1016:
https://api.crossref.org/prefixes/10.1016/works?filter=has-abstract:true&mailto=support@crossref.org

Which works have references registered with us on prefix 10.1016:
https://api.crossref.org/prefixes/10.1016/works?filter=has-references:true&mailto=support@crossref.org
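
If you only need the counts rather than the records, one option is to read the total-results value from each response. The Python sketch below builds the count URLs and shows where that value sits in the parsed JSON. The helper names are hypothetical, and it assumes that rows=0 is accepted and returns just the summary; if that is not the case, any small rows value would do. Nothing is fetched when the block runs; `fetch_total` is there for when you want to make the actual request.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def count_url(route, identifier, flt=None, mailto="support@crossref.org"):
    """URL for counting works on a route such as 'journals' or 'prefixes'."""
    params = {"rows": 0, "mailto": mailto}  # rows=0: assumed to return counts only
    if flt:
        params["filter"] = flt
    query = urlencode(params, safe=":,@")
    return f"https://api.crossref.org/{route}/{identifier}/works?{query}"


def total_results(response):
    """Pull the number of matching works out of a parsed API response."""
    return response["message"]["total-results"]


def fetch_total(url):
    """Perform the request (needs network access) and return the count."""
    with urlopen(url) as resp:
        return total_results(json.load(resp))


for flt in (None, "has-abstract:true", "has-references:true"):
    print(count_url("prefixes", "10.1016", flt))
```

Dividing the has-abstract:true or has-references:true count by the unfiltered count gives the share of works with that metadata for the journal or prefix.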

My best,
Isaac

Thank you so much for your precise and so helpful answers. 🙌
I think I have everything I need. Thank you so much again 🏆

You’re welcome! We’re always happy to help 🙂

Hello, Isaac,

In your first message in this thread you wrote an example of a query, which processes the query tokens with OR:

My question is: how do I make a query with an AND operator between several tokens?

Hi @edgolovin ,

Thanks for following up.

You’re right that with the query https://api.crossref.org/works?query.affiliation=Science+State+University&select=DOI,title,author&rows=500&mailto=support@crossref.org I get back all results that include the word Science OR State OR University (that’s why I get back more than 12 million results). However, the results are returned in order of their relevance for Science AND State AND University. That relevance is expressed as a score, which you can see if you include it in the select parameter, like this:

https://api.crossref.org/works?query.affiliation=Science+State+University&select=DOI,title,author,score&rows=500&mailto=support@crossref.org
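
If you need strict AND behaviour, one workaround is to filter the returned records yourself. The Python sketch below keeps only works where a single affiliation string contains every token. It assumes the records are already downloaded and parsed, and that affiliations sit under author, then affiliation, then name in each record. The function name and the sample records are made up for illustration.

```python
import re


def affiliation_has_all(item, tokens):
    """True if one affiliation of one author contains every token as a whole word."""
    wanted = {t.lower() for t in tokens}
    for author in item.get("author", []):
        for aff in author.get("affiliation", []):
            words = set(re.findall(r"\w+", aff.get("name", "").lower()))
            if wanted <= words:  # every wanted token is present
                return True
    return False


# Made-up records, only to show the shape of the input.
sample_items = [
    {"DOI": "10.5555/example.1",
     "author": [{"family": "A", "affiliation": [
         {"name": "Science State University, Department of Physics"}]}]},
    {"DOI": "10.5555/example.2",
     "author": [{"family": "B", "affiliation": [
         {"name": "State College"}]}]},
]

tokens = ["Science", "State", "University"]
matches = [i["DOI"] for i in sample_items if affiliation_has_all(i, tokens)]
print(matches)
```

Because the API already returns the best matches first, the works satisfying all tokens should cluster near the top of the results.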

My best,
Isaac

Thank you, @ifarley.

Your answer is very clear and helpful. I will make use of this relevance score, then.

Could you point out if there is any documentation on how this score is calculated, or maybe the particular name of the relevance model?

Hi @edgolovin ,

We use the default scoring in Elasticsearch (ES), BM25, which is described in the Elastic Blog post “Practical BM25 - Part 2: The BM25 Algorithm and its Variables”. The query is scored against a field which is a concatenation of several metadata fields.

There is also one additional thing to keep in mind about these scores: in search engines, such as ES, scoring is not supposed to be meaningful across different queries, i.e. the score is not some sort of objective global measure of similarity. The number is not scaled to any known range, and it will depend a lot on the query itself. Scores are only supposed to allow us to compare the similarity of different indexed documents with the same query, and so they only enable us to sort the results for a given query.
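
To make that concrete, here is a toy BM25 calculation in Python. It is a sketch only: it follows the formula as presented in the blog post, uses k1 = 1.2 and b = 0.75 (the Elasticsearch defaults), and runs on a made-up three-document corpus. It is not our actual index or configuration.

```python
import math


def bm25_score(query_tokens, doc_tokens, corpus, k1=1.2, b=0.75):
    """Score one tokenised document against a query over a small corpus."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for term in query_tokens:
        # Number of documents that contain the term at least once.
        doc_freq = sum(1 for d in corpus if term in d)
        idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
        term_freq = doc_tokens.count(term)
        # Longer-than-average documents are penalised through b.
        norm = term_freq + k1 * (1 - b + b * len(doc_tokens) / avg_len)
        score += idf * term_freq * (k1 + 1) / norm
    return score


# Made-up corpus of tokenised affiliation strings.
corpus = [
    ["science", "state", "university"],
    ["state", "college"],
    ["university", "of", "science"],
]
query = ["science", "state", "university"]

scores = [bm25_score(query, doc, corpus) for doc in corpus]
for doc, s in zip(corpus, scores):
    print(" ".join(doc), round(s, 3))
```

The document containing all three tokens scores highest, but the absolute numbers change as soon as the corpus or the query changes, which is why the scores are only useful for ordering the results of one query.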

My best,
Isaac

hi Isaac,
I am wondering if it is possible to write a query on the keywords used by articles’ authors? I can’t find the command.
Thank you in advance for your help
sincerely

Hi @AlRen ,

Unfortunately, keywords asserted by authors are not a part of the metadata record registered with Crossref, so there’s no way to search for an author’s keywords using our API.

-Isaac

Hello all,

For those of you looking to get started with using our API, I highly recommend the video tutorial “API 101 for publishers, researchers, and librarians with Postman and Crossref” on ConTech.Live, hosted by my colleague Rachael Lammey, our Director of Product, and Claire Froelich, Student Community Manager at Postman. So well done!

-Isaac

Hello @ifarley,

I discovered that in order to access all works published in a particular journal, I need to query all of the journal’s ISSNs registered in Crossref. To do this, I search for a known ISSN using the following link: https://api.crossref.org/journals/{known_issn}. After obtaining the journal’s ISSNs registered in Crossref, I fetch the works for each ISSN separately.

However, I find this approach counterintuitive. When I read the FAQ for journal queries (crossref*/swagger-ui/index.html#/Journals/get_journals__issn_), the description states “Returns information about a journal with the given ISSN…”, which led me to believe that by providing one of the journal’s ISSNs, I should receive all the associated works regardless of the ISSN assigned to each work. Moreover, when I double-checked the works obtained with different ISSNs from several journals and received the same results, my misunderstanding was reinforced. Only later did I accidentally come across a journal where the lists of works differed between the print ISSN and the eISSN (and I know there can be more ISSNs, e.g. if the journal was renamed).

I wonder why this functionality isn’t implemented so that a journal query to api.crossref.org/journals/ with any one of the journal’s ISSNs directly gives access to all works associated with the journal, without the need for additional queries that mostly return duplicate results. Instead, it currently seems to replicate another query: https://api.crossref.org/works?filter=issn:{known_issn}.

Good question @SibTony. We get versions of this question from our metadata users from time to time, so I’m glad you asked it here.

You’re right; it would be helpful for us to have a better journal identifier to use for retrieving metadata records across an entire journal. We do use an internal identifier within our system to aggregate all records against one journal. We call these internal identifiers publication or journal IDs, and you can see those IDs within some of our reporting features, like our journal depositor reports, which do aggregate records exactly like you’re wanting. Let’s use the EKSAKTA Journal of Sciences and Data Analysis from the Universitas Islam Indonesia (Islamic University of Indonesia) (10.20885) as an example:

In the REST API, there are different counts for the number of DOIs registered for the pISSN and eISSN:

Print: https://api.crossref.org/journals/25032364/works?mailto=support@crossref.org (116 total works)
versus
Electronic: https://api.crossref.org/journals/14111047/works?mailto=support@crossref.org (66 total works)

Our internal journal ID for the EKSAKTA Journal of Sciences and Data Analysis is 300093, and if I append that ID to our depositor report address, I can see all of the DOIs registered against this journal: https://data.crossref.org/depositorreport?pubid=J300093

Now, and I think it is important to note, there are legitimate business and publishing practices that would lead to different totals between works with the pISSN and eISSN. For instance, maybe some of the journal articles are online-only articles and the member has omitted the pISSN in the metadata records of works that only appear online. I can’t speak to the discrepancy for this specific journal, but the point is that the issue with the REST API query lies with the identifier being used in the query, and not necessarily with metadata errors or inconsistencies on the part of any individual Crossref member (although human and machine publishing and registration errors can certainly lead to differing counts in records as well).

We are aware of this, and are working on a longer-term solution for being able to retrieve all works via the REST API for a journal using a better global identifier than an ISSN. Unfortunately, we’re still early in those discussions, so there is no timetable for that arrival. Therefore you’ll need to take the above into consideration when searching for a full list of articles for a specific journal. Note: we also expose the journal IDs in our Title List tool on crossref.org (click on the ID link on your result).
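
In the meantime, if you fetch the works for each ISSN separately, you can merge the lists and drop the duplicates by DOI. Here is a minimal Python sketch, assuming the records for each ISSN have already been downloaded and parsed. The function name and the sample records are made up for illustration. DOIs are compared in lower case because they are case-insensitive.

```python
def merge_works_by_doi(*work_lists):
    """Merge several lists of works, keeping the first record seen for each DOI."""
    merged = {}
    for works in work_lists:
        for work in works:
            # Lower-case the DOI so that differences in case do not create duplicates.
            merged.setdefault(work["DOI"].lower(), work)
    return list(merged.values())


# Made-up records standing in for the results of the print and electronic ISSN queries.
print_issn_works = [
    {"DOI": "10.5555/example.1"},
    {"DOI": "10.5555/example.2"},
]
electronic_issn_works = [
    {"DOI": "10.5555/EXAMPLE.2"},
    {"DOI": "10.5555/example.3"},
]

all_works = merge_works_by_doi(print_issn_works, electronic_issn_works)
print(len(all_works), [w["DOI"] for w in all_works])
```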

My best,
Isaac

@Shayn has published a great post on using Postman for API queries, which builds nicely on this post and thread: Ticket of the month - August 2023 - Using Postman for API Queries

My best,
Isaac