Analysing hybrid Elsevier articles for correct access rights

Having read Ross Mounce’s interesting article on wrongly paywalled Open Access articles in hybrid Elsevier journals, I came up with the idea of throwing a bit more data at the issue. We have a lot of those articles in our OpenAPC data collection (2630 articles in 670 different journals, to be precise), and since the contributing institutions reported an APC payment for each of them, we can be pretty confident that they were all paid for and none should be paywalled at the moment.

So, after fiddling around with some regular expressions and the structure of the landing pages at sciencedirect.com, I came up with a small Python script which does the following:

Resolving the DOIs and obtaining the page contents is quite time-consuming, so the whole process took several hours. Here is a screenshot of the script’s output while it was still busy:

screenshot of the test script while running

All green at the University of Bristol - but Elsevier is not off the hook yet. :-) The script collects all negative results and prints out a concise summary after finishing. So let’s have a look at the bottom line:

screenshot of the test script after finishing

Surprise: 6 of the 2630 articles (about 0.23%) were still hidden behind a paywall, although according to our data they should be openly available.
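The general approach described above (resolve each DOI, fetch the landing page, look for an access marker with a regular expression, collect the negative results and print a summary) can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the script used for this analysis: the function names are hypothetical, and the `OPEN_ACCESS_MARKER` pattern is a placeholder that assumes an open access landing page contains the words "Open Access" in its static HTML, which may not match ScienceDirect's actual page structure.

```python
import re
import urllib.request
from typing import Callable, Iterable, List, Tuple

# Hypothetical marker: assumes an open access landing page contains the words
# "Open Access" in its static HTML. The real page structure may differ.
OPEN_ACCESS_MARKER = re.compile(r"open\s+access", re.IGNORECASE)


def fetch_landing_page(doi: str, timeout: float = 30.0) -> str:
    """Resolve a DOI through doi.org and return the HTML of the final page.

    urllib follows HTTP redirects automatically; redirects done via
    JavaScript or <meta http-equiv="refresh"> are NOT followed here.
    """
    request = urllib.request.Request(
        "https://doi.org/" + doi, headers={"User-Agent": "oa-check-sketch"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def is_open_access(html: str) -> bool:
    """Return True if the page contains the assumed open access marker."""
    return OPEN_ACCESS_MARKER.search(html) is not None


def check_articles(
    dois: Iterable[str], fetch: Callable[[str], str] = fetch_landing_page
) -> Tuple[int, List[str]]:
    """Check every DOI and collect those whose page lacks the marker."""
    total = 0
    paywalled: List[str] = []
    for doi in dois:
        total += 1
        if not is_open_access(fetch(doi)):
            paywalled.append(doi)
    return total, paywalled


def summary(total: int, paywalled: List[str]) -> str:
    """Format a concise report of the negative results."""
    share = 100 * len(paywalled) / total if total else 0.0
    lines = [f"{len(paywalled)} of {total} articles ({share:.2f}%) appear to be paywalled:"]
    lines.extend(f"  {doi}" for doi in paywalled)
    return "\n".join(lines)


if __name__ == "__main__":
    # Offline demo with two fake pages instead of live HTTP requests.
    pages = {"10.1016/a": "<span>Open Access</span>", "10.1016/b": "<a>Purchase PDF</a>"}
    print(summary(*check_articles(pages, fetch=pages.__getitem__)))
```

Passing the page source in through the `fetch` parameter keeps the classification logic testable without network access; for the real run, the default `fetch_landing_page` would be used. With 6 negative results out of 2630, `summary` reports a share of 0.23%, matching the figure above.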

Here is a detailed overview of the articles in question:

institution | period | APC paid (€) | DOI | journal
Wellcome Trust | 2015 | 1829.97 | 10.1016/j.jth.2015.04.502 10.1016/j.jth.2015.06.005 | Journal of Transport & Health
University College London | 2014 | 822.53 | 10.1016/j.jns.2013.01.022 | Journal of the Neurological Sciences
University College London (confirmed case) | 2014 | 2257.64 | 10.1378/chest.13-0179 | Chest
University of Glasgow | 2014 | 1882.50 | 10.1016/j.renene.2014.11.024 | Renewable Energy
University of Cambridge | 2015 | 2294.44 | 10.1016/j.epsl.2014.11.034 | Earth and Planetary Science Letters
University of Milan | 2016 | 400.00 | 10.1016/j.puhe.2016.10.024 | Public Health


To be fair, while we can clearly tell that something is wrong here, it’s too early to lay the blame on Elsevier. Five of the items in question were part of aggregated data sets (the first one comes from the Wellcome Trust open data, and the next four articles from British universities were contained in the JISC collections data), so it is quite possible that something went wrong during the aggregation process. The article from the University of Milan was only published this month, so perhaps the transaction has not been fully processed yet and the full text will become available in the near future.

Nonetheless, this issue seems worth investigating further. We will try to contact the institutions involved and see if we can shed some light on this.

Update, Mar 1, 2017:

We have now cleared up 4 of the 6 cases:

We have corrected all of the entries mentioned above in our data collection.

As a side note, according to UCL’s records the original payment was not made to Elsevier but to the ACCP (American College of Chest Physicians), and it seems the journal has moved to Elsevier since then. This is interesting, as it points to changes in journal ownership as another possible source of such mistakes.
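Such ownership changes can sometimes be spotted in the data itself: the Chest article in the table above carries the DOI prefix 10.1378 rather than the 10.1016 prefix of the other Elsevier articles, since a DOI keeps its original prefix when a journal changes hands. A minimal sketch of a check for this, assuming 10.1016 as the expected Elsevier prefix (the function names are hypothetical, and since Elsevier also registers DOIs under some other prefixes, a hit is only a hint to be checked by hand):

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumption: 10.1016 is the prefix Elsevier usually assigns to its articles.
# Elsevier also uses some other prefixes (e.g. for older imprints), so a hit
# is only a hint that needs to be checked by hand.
EXPECTED_PREFIX = "10.1016"


def doi_prefix(doi: str) -> str:
    """Return the registrant prefix of a DOI, i.e. the part before the first '/'."""
    return doi.strip().split("/", 1)[0]


def unexpected_prefixes(
    dois: Iterable[str], expected: str = EXPECTED_PREFIX
) -> Dict[str, List[str]]:
    """Group all DOIs whose prefix differs from the expected one by their prefix."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for doi in dois:
        prefix = doi_prefix(doi)
        if prefix != expected:
            groups[prefix].append(doi)
    return dict(groups)


if __name__ == "__main__":
    # A few DOIs taken from the table above.
    dois = [
        "10.1016/j.jns.2013.01.022",
        "10.1378/chest.13-0179",
        "10.1016/j.renene.2014.11.024",
    ]
    print(unexpected_prefixes(dois))  # {'10.1378': ['10.1378/chest.13-0179']}
```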