TU Berlin joins OpenAPC

We welcome the Technische Universität Berlin as a new contributing institution!

The TU Berlin University Library is in charge of the University’s Open Access Publishing Fund, which is supported under the DFG’s Open Access Publishing Programme.

Contact person is Steffi Grimm.

Cost data

The first dataset provided by the TU covers publication fees for 30 articles. Total expenditure amounts to 40 074€ and the average fee is 1 336€.

The following table shows the payments the TU Berlin University Library has made to publishers in 2017.

| Publisher | Articles | Fees paid in EURO | Mean fee paid in EURO |
|---|---|---|---|
| MDPI AG | 14 | 17019 | 1216 |
| Public Library of Science (PLoS) | 4 | 6193 | 1548 |
| Frontiers Media SA | 3 | 5490 | 1830 |
| Ubiquity Press, Ltd. | 2 | 2500 | 1250 |
| AIP Publishing | 1 | 1325 | 1325 |
| BMJ | 1 | 1425 | 1425 |
| Cambridge University Press (CUP) | 1 | 634 | 634 |
| Copernicus GmbH | 1 | 1067 | 1067 |
| Hindawi Publishing Corporation | 1 | 1768 | 1768 |
| Mechanical Engineering Faculty in Slavonski Brod | 1 | 718 | 718 |
| Springer Nature | 1 | 1934 | 1934 |

Overview

With the recent contribution included, the overall APC data for the TU Berlin now looks as follows:

Fees paid per publisher (in EURO)

plot of chunk tree_tuberlin_2018_01_30_full

Average costs per year (in EURO)

plot of chunk box_tuberlin_2018_01_30_year_full

Average costs per publisher (in EURO)

plot of chunk box_tuberlin_2018_01_30_publisher_full

Reverse DOI lookup using the Crossref API

When preparing the Jisc Collections data for ingestion into OpenAPC in September 2017, more than 8000 articles unfortunately had to be excluded because no DOI had been supplied for them. However, a closer investigation revealed that a significant share of those entries (about 3000), while lacking persistent identifiers, did include the title of the article.

In principle, knowing an article title allows for rather easy retrieval of the associated DOI: it usually involves copy-pasting the string into a search engine, following the first link to a landing page or full-text PDF and looking for the DOI there. While this is doable for a small number of data points, it quickly becomes infeasible for larger sets of data, and 3000 articles are clearly too many to look up manually. Fortunately, there is a solution: at OpenAPC, we usually employ the Crossref REST API to import metadata like ISSNs, journal title or publisher using the DOI as query (that’s why we put so much emphasis on this identifier). However, the API can also be used to search various other metadata fields - like the title of an article.

Let’s have a look at an example from those 3000 articles in question: We want to look up the DOI for an article named

“The phosphorylation of Hsp20 enhances its association with amyloid-<beta> to increase protection against neuronal cell death”

The corresponding query to the Crossref API looks as follows:

https://api.crossref.org/works?rows=5&query.title=The+phosphorylation+of+Hsp20+enhances+its+association+with+amyloid-%3Cbeta%3E+to+increase+protection+against+neuronal+cell+death

This returns a rather comprehensive JSON structure. The URL parameter rows=5 ensures that at most 5 results are delivered, which is more than sufficient - usually the first one will already be what we are looking for, as is the case here:

`"DOI": "10.1016/j.mcn.2014.05.002",`

`[…]`

`"title": ["The phosphorylation of Hsp20 enhances its association with amyloid-\u03b2 to increase protection against neuronal cell death"]`
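The query URL shown above can also be assembled programmatically. The following is only a minimal sketch and not the actual OpenAPC script: the function names `build_query_url` and `extract_candidates` are made up for illustration, and the sample response is a hand-written stub that merely mimics the `message` / `items` nesting of a Crossref answer. No network request is made.

```python
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"


def build_query_url(title, rows=5):
    """Build a title query URL; urlencode turns spaces into '+' and '<', '>' into %3C, %3E."""
    return CROSSREF_WORKS + "?" + urlencode({"rows": rows, "query.title": title})


def extract_candidates(response):
    """Return (DOI, title) pairs from a decoded Crossref JSON response."""
    candidates = []
    for item in response.get("message", {}).get("items", []):
        titles = item.get("title") or [""]  # titles are delivered as a list
        candidates.append((item.get("DOI"), titles[0]))
    return candidates


title = ("The phosphorylation of Hsp20 enhances its association with "
         "amyloid-<beta> to increase protection against neuronal cell death")
url = build_query_url(title)
print(url)

# Hand-written stub standing in for the decoded JSON answer (illustration only).
stub_response = {"message": {"items": [{
    "DOI": "10.1016/j.mcn.2014.05.002",
    "title": ["The phosphorylation of Hsp20 enhances its association with "
              "amyloid-\u03b2 to increase protection against neuronal cell death"],
}]}}
print(extract_candidates(stub_response))
```

In a real lookup, the stub would be replaced by the decoded JSON body fetched from the generated URL.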

However, a look at the title string reveals a problem: the spelling in Crossref might differ from our search title (here the Greek letter β is correctly encoded in Unicode in Crossref, but written out as a word in our data). This is an obstacle for automated lookup, because a simple string comparison will fail in these cases. The solution is to use a string metric like the Levenshtein distance, which provides a measure of difference between two character sequences. In our case we employ a so-called Levenshtein ratio (LR), which is derived from the Levenshtein distance by normalising to string length. An LR of 1.0 indicates that the two strings are identical, while an LR of 0.0 means that they are totally different. In our example, the LR between the Crossref title and our search title is 0.97, which indicates a high level of similarity.
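To make the idea of a Levenshtein ratio more concrete, here is a small pure-Python sketch. The exact normalisation used by the lookup script is not spelled out in this post, so the one below is an assumption: the edit distance counts a substitution as 2 (one deletion plus one insertion) and is normalised by the combined length of both strings. The function names are chosen for illustration only.

```python
def edit_distance(a, b, substitution_cost=2):
    """Levenshtein-style distance via dynamic programming (insert/delete cost 1)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else substitution_cost
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1]


def levenshtein_ratio(a, b):
    """Similarity in [0, 1]: 1.0 for identical strings, 0.0 if nothing is shared."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return (total - edit_distance(a, b)) / total


query_title = ("The phosphorylation of Hsp20 enhances its association with "
               "amyloid-<beta> to increase protection against neuronal cell death")
crossref_title = ("The phosphorylation of Hsp20 enhances its association with "
                  "amyloid-\u03b2 to increase protection against neuronal cell death")
print(round(levenshtein_ratio(query_title, crossref_title), 2))
```

Under this particular normalisation the example pair evaluates to roughly 0.97, in line with the figure quoted above. Other normalisations (for instance dividing by the length of the longer string) would give slightly different values.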

Now we have everything at hand to put together a script for automated reverse DOI lookup which does the following:

For every DOI-less article with a title:

  • Query the Crossref API, using the article title as search string
  • For every returned result, calculate the Levenshtein ratio (LR) between the query title and the result title
  • Take the result with the highest LR and do the following:
    • If it’s higher than 0.9: Assume identity and import the associated DOI.
    • If it’s between 0.8 and 0.9: Present the strings to the user and ask what to do.
    • If it’s lower than 0.8: Discard results.

There’s no deeper magic behind those numbers; the 0.8 and 0.9 thresholds simply turn out to work pretty well in practice.
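The decision logic described in the list above can be sketched in a few lines. This is not the original script: `best_match` and `decide` are hypothetical names, `difflib.SequenceMatcher` from the standard library merely stands in for a Levenshtein ratio, and the handling of values exactly at 0.8 or 0.9 is an assumption, since the list does not say which side they fall on. The first DOI in the candidate list is a made-up placeholder.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity in [0, 1]; a Levenshtein ratio could be swapped in here."""
    return SequenceMatcher(None, a, b).ratio()


def best_match(query_title, candidates):
    """Return (ratio, doi, title) of the most similar candidate, or None if there are none."""
    scored = [(similarity(query_title, title), doi, title) for doi, title in candidates]
    return max(scored, key=lambda entry: entry[0]) if scored else None


def decide(ratio):
    """Map a similarity value to one of the three actions from the list."""
    if ratio > 0.9:
        return "import"   # assume identity and take over the DOI
    if ratio >= 0.8:
        return "ask"      # show both strings to the user
    return "discard"


query_title = ("The phosphorylation of Hsp20 enhances its association with "
               "amyloid-<beta> to increase protection against neuronal cell death")
candidates = [
    ("10.9999/hypothetical", "A completely unrelated article title"),  # placeholder entry
    ("10.1016/j.mcn.2014.05.002",
     "The phosphorylation of Hsp20 enhances its association with "
     "amyloid-\u03b2 to increase protection against neuronal cell death"),
]
ratio, doi, matched_title = best_match(query_title, candidates)
print(decide(ratio), doi)
```

Keeping the threshold logic in a separate function makes it easy to experiment with different cut-off values without touching the matching code.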

screenshot of the lookup script

While the whole process took some time, results were very convincing. Here’s a breakdown for the 2970 articles in question:

| Match type | Levenshtein ratio | Count | Percent |
|---|---|---|---|
| perfect | 1.0 | 1469 | 49.5 |
| good | between 1.0 and 0.9 | 935 | 31.5 |
| possible | between 0.9 and 0.8 | 114 | 3.8 |
| no match | lower than 0.8 | 452 | 15.2 |
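For readers who want to retrace the numbers, the shares in the table follow directly from the counts. The short snippet below simply redoes that arithmetic. The variable names are arbitrary, and counting the "perfect" and "good" rows as automatically retrieved is our reading of the thresholds described earlier.

```python
# Counts per match type, copied from the table above.
counts = {"perfect": 1469, "good": 935, "possible": 114, "no match": 452}

total = sum(counts.values())
shares = {match_type: round(100 * n / total, 1) for match_type, n in counts.items()}

# Matches with a ratio above 0.9 need no user interaction.
automatic = counts["perfect"] + counts["good"]

print(total, shares, automatic)
```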

Altogether, this method could retrieve more than 2400 DOIs automatically. In the end, this led to a net ingestion of 1520 new articles into the OpenAPC core data file.

As always, the Python code has been put under an MIT license, so feel free to reuse and experiment!

University of Hannover reports its 2017 APC expenditures

The Leibniz Universität Hannover has provided its APC expenditures for the 2017 period.

Leibniz Universität Hannover’s Open Access publishing fund has received support from the DFG under its Open-Access Publishing Programme since 2013.

Contact person is Marco Tullney.

Cost data

The new dataset covers publication fees for 40 articles. Total expenditure amounts to 56 622€ and the average fee is 1 416€.

The following table shows the payments the University has made to publishers in 2017.

| Publisher | Articles | Fees paid in EURO | Mean fee paid in EURO |
|---|---|---|---|
| MDPI AG | 14 | 21605 | 1543 |
| Copernicus GmbH | 7 | 4723 | 675 |
| Springer Nature | 4 | 6697 | 1674 |
| Public Library of Science (PLoS) | 3 | 4832 | 1611 |
| Dove Medical Press Ltd. | 2 | 3651 | 1825 |
| Institute of Electrical & Electronics Engineers (IEEE) | 2 | 3178 | 1589 |
| AIP Publishing | 1 | 1519 | 1519 |
| Hindawi Publishing Corporation | 1 | 1877 | 1877 |
| Informa UK Limited | 1 | 1334 | 1334 |
| IOP Publishing | 1 | 1392 | 1392 |
| JMIR Publications Inc. | 1 | 1525 | 1525 |
| Royal Society of Chemistry (RSC) | 1 | 663 | 663 |
| Schweizerbart | 1 | 1806 | 1806 |
| Wiley-Blackwell | 1 | 1821 | 1821 |

Overview

With the recent contribution included, the overall APC data for the Leibniz Universität Hannover now looks as follows:

Fees paid per publisher (in EURO)

plot of chunk tree_hannover_2018_01_15_full

Average costs per year (in EURO)

plot of chunk box_hannover_2018_01_15_year_full

Average costs per publisher (in EURO)

plot of chunk box_hannover_2018_01_15_publisher_full