https://doi.org/10.13003/c23rw1d9
This is a companion discussion topic for the original entry at https://www.crossref.org/blog/news-crossref-and-retraction-watch
The blog mentions “a community call on 27th September at 1 p.m. UTC to discuss this new development in the pursuit of research integrity.” How is that accessed? Have details gone out already?
Hello! Thanks for reaching out. You can register here: Webinar Registration - Zoom
We are looking forward to seeing you online!
Rosa
Will the retraction dataset be fully integrated into Crossref's API? For example, will we be able to search for retractions by journal, author name, institution, etc.?
Yes, that's our plan. Geoffrey, our Director of Technology & Research, mentioned that very thing in a thread earlier today:
'Our recently announced opening of the Retraction Watch data will only ever be made available via the REST API.'
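To give a sense of what querying the REST API looks like in practice, here is a minimal sketch against the existing works endpoint (the DOI below is just a placeholder, and this only shows the Crossmark "update-to" metadata that records already carry; the fuller Retraction Watch integration is what's still in progress):

```python
import requests

# Fetch the metadata record for a single work (placeholder DOI).
resp = requests.get("https://api.crossref.org/works/10.5555/12345678")
resp.raise_for_status()
message = resp.json()["message"]

# Records that update another work (e.g. retraction notices) carry an
# "update-to" list; entries of type "retraction" point at the retracted DOI.
for update in message.get("update-to", []):
    print(update.get("type"), update.get("DOI"))
```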
-Isaac
Will the Retraction Watch Hijacked Journals Checker also be implemented somehow? That would be marvelous!!!
Hey, Crossref, I wanted to let you know that the CSV version of the database that can be directly downloaded from this article has mixed character encoding. It's mostly UTF-8, but has embedded Windows-1252, which frequently happens when text is copy-pasted from different sources. This unfortunately makes importing that file into any program a real PITA to figure out.
I strongly encourage you to correct the encoding and replace the version at that link (and anywhere else it lives, if there are multiple endpoints). Heck, email me, and I will send you a version in pure UTF-8. You will save many people the time and frustration it takes to figure out why none of the normal encodings are working and to find an appropriate conversion tool.
In the meantime, a tip for anyone using Python: UnicodeDammit.detwingle() can be used to fix this file.
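For anyone who wants to try that, here is a minimal sketch (the filenames are hypothetical; UnicodeDammit ships with the beautifulsoup4 package):

```python
from bs4 import UnicodeDammit  # pip install beautifulsoup4

# Read the raw bytes; the file mixes UTF-8 with embedded Windows-1252.
with open("retraction_watch.csv", "rb") as f:
    raw = f.read()

# detwingle() rewrites the embedded Windows-1252 byte runs as UTF-8,
# so the result is consistently UTF-8 throughout.
fixed = UnicodeDammit.detwingle(raw)

with open("retraction_watch_utf8.csv", "wb") as f:
    f.write(fixed)
```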
Thanks for that tip!
The context provided by my colleagues on our technical team who are working most closely on the Retraction Watch data is that the character encoding of the file that is given to us by Retraction Watch is somewhat broken. We can either decode it as UTF-8, ignoring errors like the ones you've pointed out, or decode it as Latin1, which incorrectly displays Russian names (among other things).
In January, we switched the encoding from Latin1 to UTF-8, just ignoring the errors. Neither solution is ideal, but thatâs the trade-off we have for the moment.
Another workaround in Python is to pass errors="ignore" to the decode function. There will still be a few problematic entries, but it should make working with the file go more smoothly. For example:
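A minimal sketch, assuming the raw bytes have been downloaded to a local file (the filename is hypothetical):

```python
# Read the raw bytes of the downloaded CSV.
with open("retraction_watch.csv", "rb") as f:
    raw = f.read()

# errors="ignore" silently drops byte sequences that are not valid UTF-8
# instead of raising UnicodeDecodeError; a handful of entries will still
# look slightly mangled, but the file as a whole becomes usable.
text = raw.decode("utf-8", errors="ignore")
```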
We are working with Retraction Watch to eventually build a system that will produce the data without those encoding errors.