Interpreting Public Data File


I’m hoping to have a play with the Crossref public data file to see what types of analysis might be possible with it. I’ve downloaded the file and opened one of the individual JSON files in a text editor to start exploring the contents. I have a few questions, if anybody is able to help:

  • Is there any documentation on the structure of these documents?
  • What does each item in the main array represent? Are they a mixture of the different types that can be returned by the API, e.g. funders, members and works?
  • Does the JSON map to a particular schema that will help me interpret it?



Hi @jswainston,

Thanks for your message, and welcome to the community forum!

We don’t currently have a JSON output schema, but there is an existing feature request with our technical team to develop one. This work has not yet been prioritized. Other metadata users have also requested this output schema, and the more users who request it, the higher its priority will become. For now, it remains in our backlog. You can follow our progress or contribute to the discussion here: As a metadata user, I'd like a JSON output schema for the REST API (#317) · Issues · crossref / User stories · GitLab

I will say that the JSON output does map to the input schema, and we have extensive documentation for that. You may review the most recent version of the XML input schema here: Schema documentation for crossref5.3.1.xsd
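Since there is no output schema yet, one practical way to line the JSON up against the input schema documentation is to survey which top-level fields actually occur in a file. Below is a minimal Python sketch of that. It assumes each data-file chunk is a JSON document (optionally gzip-compressed) whose top level is either an array of records or an object holding them under an "items" key; the helper names load_items and field_frequencies are hypothetical, so adjust the loader if your files are laid out differently.

```python
import gzip
import json
from collections import Counter
from pathlib import Path


def load_items(path):
    """Load one data-file chunk and return its list of records.

    Assumption: the file is JSON (gzip-compressed if it ends in .gz) and its
    top level is either a list of records or an object with an "items" list.
    """
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as fh:
        data = json.load(fh)
    if isinstance(data, dict):
        return data.get("items", [])
    return data


def field_frequencies(items):
    """Count how many records contain each top-level field."""
    counts = Counter()
    for item in items:
        counts.update(item.keys())
    return counts
```

For example, `field_frequencies(load_items("0.json.gz")).most_common()` lists each field with the number of records that contain it, which you can then look up in the schema documentation.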

As for what each item represents, our Swagger documentation may prove helpful for that bit. It is available here: Swagger UI
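To check directly whether the records are a mixture of types, you could tally the "type" field across every file in the download. This is a minimal Python sketch under two assumptions: that each file is a .json or .json.gz document whose top level is a list of records or an object with an "items" list, and that records carry a "type" field as they do in the Swagger documentation's description of works. The names iter_records and type_counts are hypothetical.

```python
import gzip
import json
from collections import Counter
from pathlib import Path


def iter_records(directory):
    """Yield every record from every *.json / *.json.gz file in a directory.

    Assumption: each file's top level is a list of records, or an object
    whose "items" key holds that list.
    """
    for path in sorted(Path(directory).iterdir()):
        if path.name.endswith(".json.gz"):
            opener = gzip.open
        elif path.name.endswith(".json"):
            opener = open
        else:
            continue  # skip anything that is not a JSON chunk
        with opener(path, "rt", encoding="utf-8") as fh:
            data = json.load(fh)
        records = data.get("items", []) if isinstance(data, dict) else data
        yield from records


def type_counts(records):
    """Tally the "type" field; records without one count as "<missing>"."""
    return Counter(rec.get("type", "<missing>") for rec in records)
```

Because iter_records is a generator, only one file is held in memory at a time, which matters for a download of this size. If the tally shows a single family of values, the files contain one kind of record rather than a mix.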

Take a look at these resources, and do let me know if you have any follow-up questions.