(I’m not sure if Reports is really the best section to post this – feel free to move if it fits better elsewhere.)
I just received our resolution report for February, and was surprised to see a very low ratio of successful resolutions (50 failures out of 237 attempts – compare with January, where there were only three failures out of 136 attempts). Looking at the failed DOI resolutions, two are valid failures from a title I’d forgotten to submit. The other 48 are not mistyped or unregistered DOIs at all, but attempts to resolve strings in which the base URL of our final landing page appears in place of the actual DOI suffix.
That is perhaps not very clear, so let me explain: the landing page URLs on our website all have the format /titles/[ISBN] or /series/[ISSN]; these then redirect to details.asp?id=[identifier] and series.asp?issn=[ISSN], respectively. These latter URLs are not meant to be used externally, the pretty URLs registered as landing pages with Crossref taking precedence.
The 48 failed resolutions in question are for 10.XXXXX/DETAILS.ASP (44 failures), 10.XXXXX/SERIES.ASP (three failures) and 10.XXXXX/SOME-IMAGE-NAME.GIF (one failure). That is, the registered DOI suffix has been replaced by the base URL (without query string) of the landing page after redirection, or in one instance by the URL of a random, tiny image somewhere on the landing page.
Given that these have never appeared before, but account for about 20% of all our resolution attempts in February, they worry me a bit. I assume they must be the result of some sort of scraping mechanism, but I wanted to know if this sort of thing is a known issue that has been reported before? Might it indicate some kind of problem that we can do anything about on our end?
Patterns of failed resolutions like that are almost certainly the result of bots/crawlers.
The purpose of sending the failed resolution csv along with the resolution report is to alert you to any possible gaps in your DOI registration workflow that resulted in you publishing or distributing DOIs without having first registered them.
So, all you really need to do is review the list and see if there are any legitimate DOIs on there that you have assigned to your publications. If so, they should be registered as soon as possible.
Otherwise, you can fairly assume that the remaining failed resolutions are the result of bot activity or human error (for example, accidentally picking up a stray full stop at the end of a DOI when copy-pasting from a reference). Those can be safely ignored.
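For anyone who would rather script that review than eyeball it, here is a minimal sketch in Python. It assumes you have already pulled the failed DOIs out of the csv into a plain list, and that you keep your own list of DOIs you have assigned. The function name, the bucket names and the list of file extensions are all made up for illustration, and the example DOIs are placeholders.

```python
import re

# Hypothetical set of endings that suggest a script or file name, not a DOI suffix.
FILE_LIKE = re.compile(r"\.(asp|aspx|php|html?|pdf|gif|png|jpe?g)$", re.IGNORECASE)

def triage(failed_dois, assigned_dois):
    """Sort failed DOIs into three buckets:
    'register'   - matches a DOI you assigned, so it probably needs registering
    'likely_bot' - ends like a file or script name, probably crawler noise
    'review'     - anything else (typos, stray punctuation, etc.)
    """
    # DOIs are case-insensitive, so compare in upper case.
    assigned = {d.strip().upper() for d in assigned_dois}
    buckets = {"register": [], "likely_bot": [], "review": []}
    for doi in failed_dois:
        key = doi.strip().upper()
        if key in assigned:
            buckets["register"].append(key)
        elif FILE_LIKE.search(key):
            buckets["likely_bot"].append(key)
        else:
            buckets["review"].append(key)
    return buckets

# Placeholder data, not taken from any real report.
failed = ["10.XXXXX/DETAILS.ASP", "10.XXXXX/EXAMPLE-TITLE", "10.XXXXX/OTHER-TITLE."]
assigned = ["10.xxxxx/example-title", "10.xxxxx/other-title"]
result = triage(failed, assigned)
```

In this placeholder run, the entry with the stray trailing full stop lands in the 'review' bucket rather than being matched automatically, so a person still decides what to do with it.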
We’re seeing something similar, I think. Our primary content source uses DOIs in the form 10.54846/jshap/#### where the #### is the internal manuscript tracking number. Essentially all of our failed DOI lookups this past month are in the form 10.54846/JSHAP/filename.PDF where the filename is something like V16N5P254 – how I store the HTML and PDF forms of the papers on our web server. Except that the actual filename is lowercase, v16n5p254 for example, and the server is case sensitive.
I’ve tried googling a sampling of the incorrect DOIs in quotes to try to see whether I might have written some code where I output the wrong field, but no results. (And no surprise; I would have to have written unnecessary code to incorrectly change the filename to uppercase.)
Interestingly, our landing page for correct DOIs is a script that outputs the key information (title, authors, etc) as HTML and includes a link to the PDF… but that link is in the form …retrieval_page?id=#### where the ID is just an auto-generated primary key in a database table and has no relationship to the manuscript number in the DOI.
Anyway, supporting the stupid [my term] web crawler hypothesis shared by @Shayn, the HTML versions of our papers link to the DOI at doi.org immediately followed by a relative link to the PDF in the same folder/directory/path, so the closing tag for the link to 10.54846/JSHAP/#### is immediately followed by the opening <a href="filename.pdf"> tag. Somehow some bot must be conflating these… though it also somehow decides to remove the ####? And make everything all-caps? Well, like I said, stupid bot.
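One possible explanation for the missing ####, and this is only a guess: if the bot resolves the relative PDF link against the doi.org URL it has just seen, rather than against the page the link actually sits on, standard relative-URL resolution replaces the last path segment, which is exactly the manuscript number. A minimal Python sketch of that mix-up follows. The manuscript number 1234 and the function name are invented for illustration, and the upper-casing is only there to mirror how the failures appear in the report (DOIs are case-insensitive).

```python
from urllib.parse import urljoin, urlsplit

def doi_a_confused_bot_would_request(doi_url, relative_href):
    """Resolve a relative href against the DOI link itself (the suspected
    mistake) and return the resulting DOI-like string in upper case."""
    # urljoin drops the last path segment of the base before appending the href.
    wrong_url = urljoin(doi_url, relative_href)
    # Keep only the path: that is the part a DOI resolver treats as the DOI.
    return urlsplit(wrong_url).path.lstrip("/").upper()

# Hypothetical manuscript number; the file name is the example from the post.
print(doi_a_confused_bot_would_request(
    "https://doi.org/10.54846/jshap/1234", "v16n5p254.pdf"))
```

This prints 10.54846/JSHAP/V16N5P254.PDF, which has the same shape as the failures in our report. It doesn't prove that's what the bot is doing, but it would account for the #### vanishing without anyone having written code to strip it.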
At least for us, it’s only about 6% of our attempted resolutions that are failures of this sort.
(edits to fix some of the things I’d typed that were swallowed/transformed by the editor/posting mechanism filtering out html or, surprisingly, accepting it. Sorry for the weird formatting. I’ll try to remember to try the markdown editor next time.)