In a data exploration of all works in CrossRef I used the July 6, 2023 Facets counts.
The JSON gives me the total count for each works type, like so:
I put the “works type” versus “counts” into a dataset and calculated the total number of works from the dataset: 146251674.
I noticed that the JSON includes a total-results count as well, which is: 146284740.
Are these two totals supposed to be the same?
What causes the difference of 33066?
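For anyone who wants to reproduce the comparison, here is a minimal Python sketch. It assumes the facet counts come from a request like `https://api.crossref.org/works?facet=type-name:*&rows=0` and that the response nests the per-type counts under `message.facets["type-name"].values`, next to `message["total-results"]`. The function names and the tiny sample are made up for illustration and are not real Crossref figures.

```python
import json
from urllib.request import urlopen

# Assumed request; the facet name and response layout are assumptions.
URL = "https://api.crossref.org/works?facet=type-name:*&rows=0"


def fetch_message(url=URL):
    """Download the response and return its "message" object (hypothetical helper)."""
    with urlopen(url) as resp:
        return json.load(resp)["message"]


def facet_discrepancy(message, facet="type-name"):
    """Return (total-results, sum of facet counts, difference) for one response."""
    values = message["facets"][facet]["values"]  # e.g. {"Journal Article": 123, ...}
    facet_sum = sum(values.values())
    total = message["total-results"]
    return total, facet_sum, total - facet_sum


# Made-up sample with the assumed layout, NOT real Crossref counts.
sample = {
    "total-results": 100,
    "facets": {"type-name": {"values": {"Journal Article": 80, "Book": 17}}},
}

total, facet_sum, diff = facet_discrepancy(sample)
print(f"total-results={total} facet sum={facet_sum} difference={diff}")
```

Calling `facet_discrepancy(fetch_message())` would run the same check against a live response. With the figures from my dataset, the difference is 146284740 - 146251674 = 33066.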
Hi Dave, thanks for your many good questions lately.
In this case, there may be one or more explanations for the difference between the total-results count and the sum of the facet counts by type.
Some items in our corpus may have type=null and would not show up in the facet counts.
On top of that, total-results counts and facet counts may or may not be 100% accurate at all times due to optimizations or limitations of Elasticsearch, the underlying system powering the REST API.
There may be other explanations for slight differences in counts at different times given the dynamic nature of such a large corpus.
While I can’t offer a complete accounting for this specific discrepancy, these are some possible explanations.
We may be revisiting how we calculate and make available record counts in the future to make them more consistent and performant. Please stay tuned.