Analysing the article coverage of offsetting contracts

As a result of the first ESAC Offsetting Workshop 2016, the collection of articles published under offsetting contracts like the Springer Compact agreements has been established as a side project of OpenAPC. Data providers include the Austrian Academic Library Consortium (KEMÖ), the Max Planck Digital Library, VSNU / UKB for all Dutch universities, the Bibsam Consortium for Sweden and JISC Collections for the UK. The collection begins with data from 2015; the years 2016 and 2017 are now completely available.

While those articles are not associated with cost data in the sense of APCs, they can still be aggregated and visualised with treemaps, using a simple article count as the measure. With the offsetting collection now covering more than 13,000 articles from three years, it became feasible to measure the coverage of offsetting contracts by answering three questions:

For every hybrid journal appearing in the offsetting data set:

  1. How many articles have been published in a distinct period (usually a calendar year)?
  2. How many of those were published Open Access?
  3. How many of those were published under offsetting contracts?
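For illustration, the three questions can be captured as one small record per journal and period. This is a minimal sketch: the `JournalPeriod` class and its field names are hypothetical and not part of the OpenAPC data schema.

```python
from dataclasses import dataclass


@dataclass
class JournalPeriod:
    """Article counts for one hybrid journal in one period (hypothetical record)."""
    journal: str
    period: int
    total_articles: int       # question 1: all articles published in the period
    oa_articles: int          # question 2: articles published Open Access
    offsetting_articles: int  # question 3: OA articles published under offsetting contracts

    def oa_share(self) -> float:
        """Percentage of all articles that are Open Access (0 if there are no articles)."""
        return 100 * self.oa_articles / self.total_articles if self.total_articles else 0.0

    def offsetting_share_of_oa(self) -> float:
        """Percentage of the OA articles that were financed through offsetting."""
        return 100 * self.offsetting_articles / self.oa_articles if self.oa_articles else 0.0
```

Keeping the raw counts rather than precomputed percentages means records can still be summed across journals or years before any share is calculated.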

The analysis is especially appealing for the Springer Compact agreements: since all current project members report their data to OpenAPC, the article base should be (almost) complete. This would provide important insights into the extent to which the articles financed through these offsetting contracts have increased the open access share in Springer Compact journals, and into how offsetting contributes to the OA2020 initiative's goal of flipping journals from the subscription system to open access.

Method

While the idea itself is straightforward, obtaining the required data on journal publication numbers turned out to be tricky. Our first approach, inspired by the Hybrid OA Monitor of our former colleague Najko Jahn, was to make use of the Crossref API. As an example, this query obtains journal metrics on one of the largest journals (in terms of offsetting articles) in our collection, European Radiology, from 2015 up to now. Digging through the returned JSON structure manually is cumbersome, but the relevant information can be found quite easily:

```json
"license": {
    "value-count": 3,
    "values": {
        "http:\/\/www.springer.com\/tdm": 1263,
        "http:\/\/creativecommons.org\/licenses\/by\/4.0": 171,
        "http:\/\/creativecommons.org\/licenses\/by-nc\/4.0": 43
    }
},
```

and

```json
"total-results": 1772,
```
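A lookup of this kind can be sketched with the Python standard library. This is an illustration under assumptions, not the exact query behind the numbers discussed here: it assumes the Crossref REST API route `/journals/{issn}/works` with a `from-pub-date`/`until-pub-date` filter, `facet=license:*` and `rows=0`, and, as in the text, it counts every Creative Commons license URL as Open Access. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode

CROSSREF_API = "https://api.crossref.org"


def build_journal_query(issn: str, from_date: str, until_date: str) -> str:
    """Build a Crossref works query for one journal that returns counts only."""
    params = {
        # restrict to a publication date range (YYYY-MM-DD)
        "filter": f"from-pub-date:{from_date},until-pub-date:{until_date}",
        "facet": "license:*",  # ask for the number of works per license URL
        "rows": 0,             # no individual records needed, only totals and facets
    }
    return f"{CROSSREF_API}/journals/{issn}/works?{urlencode(params)}"


def count_articles(message: dict) -> tuple[int, int]:
    """Return (total works, OA works), counting every Creative Commons license as OA."""
    total = message["total-results"]
    licenses = message["facets"]["license"]["values"]
    oa = sum(n for url, n in licenses.items() if "creativecommons.org" in url)
    return total, oa


def count_articles_from_body(body: str) -> tuple[int, int]:
    """Parse a raw API response body; the payload sits under the "message" key."""
    return count_articles(json.loads(body)["message"])
```

Applied to the values in the excerpt above, `count_articles` yields 1,772 works in total and 171 + 43 = 214 CC-licensed ones.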

So, according to Crossref, 1,772 articles have been published in European Radiology from 2015 until the current date (March 21, 2018), 214 of them Open Access (the sum of all CC license occurrences). While these results do not seem implausible, let’s verify them by consulting a second source: SpringerLink, the publisher’s own web portal, which also features a journal search function. Using the same parameters, we get this results page for the total number of articles and this one for the number of OA articles. As of today, we end up with 1,956 articles in total, 272 of them OA. So there’s quite a spread in numbers, and unfortunately this is no exceptional case. Here are some more results for the five journals occurring most often in the offsetting dataset, also retrieved on 2018-03-21:

| Journal | # Articles (Crossref) | # Articles (SpringerLink) | # OA Articles (Crossref) | # OA Articles (SpringerLink) |
|---|---:|---:|---:|---:|
| European Radiology | 1772 | 1956 | 214 | 272 |
| Synthese | 1117 | 1242 | 176 | 187 |
| Diabetologia | 1098 | 1185 | 241 | 321 |
| J. of Autism and Developmental Disorders | 1180 | 1371 | 132 | 174 |
| J. of Business Ethics | 1316 | 1859 | 119 | 142 |

The results are clear: When it comes to journal metrics (both OA and total), Crossref data is too sketchy to rely on.
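To put a number on the spread, the following sketch computes, for each journal in the table above, how much of the SpringerLink count is missing from Crossref. It uses only the figures from that table; treating SpringerLink as the reference value is a choice made here for illustration.

```python
# Counts retrieved on 2018-03-21, copied from the table above:
# (Crossref total, SpringerLink total, Crossref OA, SpringerLink OA)
COUNTS = {
    "European Radiology": (1772, 1956, 214, 272),
    "Synthese": (1117, 1242, 176, 187),
    "Diabetologia": (1098, 1185, 241, 321),
    "J. of Autism and Developmental Disorders": (1180, 1371, 132, 174),
    "J. of Business Ethics": (1316, 1859, 119, 142),
}


def shortfall(crossref: int, springerlink: int) -> float:
    """Percentage of the SpringerLink count that does not show up in Crossref."""
    return 100 * (springerlink - crossref) / springerlink


for journal, (cr_all, sl_all, cr_oa, sl_oa) in COUNTS.items():
    print(f"{journal}: total -{shortfall(cr_all, sl_all):.1f}%, "
          f"OA -{shortfall(cr_oa, sl_oa):.1f}%")
```

For European Radiology, for example, Crossref misses about 9.4% of all articles but about 21.3% of the OA articles, so the gap is not uniform across the two measures.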

(Update (2018-04-16): As Najko Jahn correctly points out, the gap between SpringerLink and Crossref numbers can be explained with filtering issues: Springer journals may contain “Online First” articles which are not assigned to a print issue and therefore won’t show up in a search for print publication dates on Crossref. We will have to look more closely into this.)

This brought us directly to our second approach: using the statistics from SpringerLink instead of Crossref. Technically there are two ways to do this: the first is to download search result files from SpringerLink in CSV format and count the number of entries; the second is to use web scraping to read the result count directly from the search page. While this approach led to an accurate count of articles, it had an unforeseen drawback which rendered it unusable in the end: the time frame of the articles on SpringerLink is not identical to the one in our collection. While the “period” field in the offsetting dataset relates to the acceptance date of an article, the date filter on SpringerLink relates to the print publication date, which usually occurs significantly later, often not until the next calendar year. As far as we know, SpringerLink offers no way to filter by another type of date, so using these numbers directly would lead to incorrect comparisons and is thus not an option.
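The first variant, counting the entries of a downloaded CSV file, amounts to counting data rows. A minimal sketch using Python's `csv` module; the column name `Publication Year` is an assumption about the export format, not a documented SpringerLink field.

```python
import csv
import io
from collections import Counter


def count_entries(csv_text: str) -> int:
    """Count the data rows of a search-result CSV export (the header row is not counted)."""
    return sum(1 for _ in csv.DictReader(io.StringIO(csv_text)))


def count_by_year(csv_text: str, year_column: str = "Publication Year") -> Counter:
    """Count the rows per value of the (assumed) year column."""
    return Counter(row[year_column] for row in csv.DictReader(io.StringIO(csv_text)))
```

Using a real CSV parser rather than counting lines keeps quoted titles that contain commas or line breaks from skewing the count. If the year column follows the same print publication date as SpringerLink's date filter, `count_by_year` inherits exactly the time-frame problem described above.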

In the end we had to include an additional normalisation step to make the dates compatible: this involved downloading the aforementioned CSV lists from SpringerLink for every journal in our dataset, looking up every single offsetting article in those lists (via the DOI) and retrieving its print publication date. This effectively converted our offsetting data to “Springer print publication time” and made it comparable to the SpringerLink journal metrics. We then aggregated the articles by journal and year and created an OLAP cube and a treemap visualisation.
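The normalisation step might look roughly like this. A sketch under assumptions: the CSV columns `Item DOI` and `Publication Year` are assumed for the SpringerLink export, the offsetting records are assumed to carry OpenAPC-style `doi` and `journal_full_title` fields, and DOIs are lower-cased before comparison because DOIs are case-insensitive.

```python
import csv
import io
from collections import Counter


def doi_to_year(csv_text: str, doi_column: str = "Item DOI",
                year_column: str = "Publication Year") -> dict[str, str]:
    """Map each (lower-cased) DOI in a journal's CSV export to its publication year."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[doi_column].strip().lower(): row[year_column] for row in reader}


def normalise(offsetting: list[dict], lookup: dict[str, str]) -> Counter:
    """Re-date offsetting articles by their print year and count them per (journal, year).

    Articles whose DOI is missing from the export are skipped, since no print
    publication date can be assigned to them.
    """
    counts = Counter()
    for article in offsetting:
        year = lookup.get(article["doi"].strip().lower())
        if year is not None:
            counts[(article["journal_full_title"], year)] += 1
    return counts
```

The resulting (journal, year) counts are directly comparable with per-journal totals taken from the same exports. Skipped articles (not yet in print) are one reason why a re-dated dataset can hold fewer articles per year than the original collection.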

While this approach proved successful and made an analysis of offsetting coverage possible, there are two potential problems:

  • Since the time frame of the offsetting_coverage dataset has been altered, it is no longer comparable to the original offsetting collection in terms of years (for instance, the offsetting collection lists 6,892 articles for the 2016 period, compared with 5,113 articles in offsetting_coverage).
  • It’s not a catch-all solution, since it relies on Springer’s web portal and search engine. A long-term approach should be based on a public, publisher-independent data source like Crossref, but as we saw, this would likely require improvements in data completeness and quality.

Results

The following brief presentation of the results relates to the publication years 2016 and 2017, as the data for the above-mentioned institutions are completely available for these two years. In total, 13,941 articles were made open access through the offsetting contracts mentioned above. This corresponds to 4.28% of all articles in the Springer Compact journals (326,008) during this period. We found a total of 27,622 open access articles in the Springer Compact journals, a share of 8.47%. Thus, offsetting was responsible for just over half of all open access articles in the hybrid Springer Compact journals.
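The shares quoted above follow directly from the three totals. A short check, using only the figures given in the paragraph:

```python
def pct(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100 * part / whole


OFFSETTING_OA = 13_941   # articles made OA through the offsetting contracts, 2016-2017
ALL_OA = 27_622          # all OA articles found in Springer Compact journals
ALL_ARTICLES = 326_008   # all articles in Springer Compact journals

print(f"offsetting share of all articles: {pct(OFFSETTING_OA, ALL_ARTICLES):.2f}%")  # 4.28%
print(f"OA share of all articles:         {pct(ALL_OA, ALL_ARTICLES):.2f}%")          # 8.47%
print(f"offsetting share of OA articles:  {pct(OFFSETTING_OA, ALL_OA):.1f}%")         # 50.5%
```

The last figure, about 50.5%, is what "just over half" refers to.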

In 500 Springer Compact journals, the open access articles were financed exclusively through offsetting, but very few of these journals achieved high open access shares. Only two journals from this group have an open access share of 50% or more (Cambridge Journal of Evidence-Based Policing with 66.67% and The Astronomy and Astrophysics Review with exactly 50%).

The following figure shows all hybrid Springer Compact journals with open access shares greater than or equal to 50%.

table top offsetting journals

In 2017, at least one offsetting article was published in 1,311 Springer Compact journals, of which only 13 achieved an open access share of 50% or more. Assuming 1,700 Springer Compact journals altogether, no offsetting articles were published in around 400 titles. On the same basis, only 0.76% of the Springer Compact journals achieved an open access share of 50% or more in 2017.
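The journal-level figures can be reproduced from the three numbers in the paragraph above. Note that 1,700 is the assumed total number of Springer Compact journals, so the results are only as reliable as that assumption.

```python
COMPACT_JOURNALS = 1_700   # assumed total number of Springer Compact journals in 2017
WITH_OFFSETTING = 1_311    # journals with at least one offsetting article in 2017
OA_MAJORITY = 13           # of those, journals with an OA share of 50% or more

without_offsetting = COMPACT_JOURNALS - WITH_OFFSETTING
print(without_offsetting)                                      # 389
print(f"{100 * without_offsetting / COMPACT_JOURNALS:.1f}%")   # 22.9%
print(f"{100 * OA_MAJORITY / COMPACT_JOURNALS:.2f}%")          # 0.76%
```

The 389 titles without offsetting articles are what the text rounds to "around 400"; as a share of 1,700 they come to roughly 23%.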

Furthermore, we observe strong fluctuations within individual journals, especially where the overall number of articles per journal is low. In 2017, for example, the journal Psychotherapy Forum had an open access share of 95%, but only 5.26% of those open access articles were financed through offsetting. In 2016, the same journal had an open access share of 10.71%, all of which was financed through offsetting.

The first interim conclusion is that offsetting has contributed to a significant increase in open access articles in some of the hybrid Springer Compact journals from 2016 to 2017. Around 23% of all Springer Compact journals (roughly 390 of an assumed 1,700) do not appear to be relevant as publication venues for the academic institutions involved in the offsetting contracts. So far, the number and distribution of additional open access articles generated through these offsetting agreements are not sufficient to flip individual journals completely to open access.

University of Stuttgart joins OpenAPC

We welcome the University of Stuttgart as a new contributing institution!

The University Library of Stuttgart (ULS) is in charge of the University’s Open Access Publishing Fund, which is supported under the DFG’s Open Access Publishing Programme.

Contact person is Stefan Drößler.

Cost data

The initial datasets provided by the University Library of Stuttgart cover publication fees for 210 articles published from 2011 to 2017. Total expenditure amounts to 254 329€ and the average fee is 1 211€.

| Publisher | Articles | Fees paid in EURO | Mean fee paid in EURO |
|---|---:|---:|---:|
| Optical Society of America (OSA) | 42 | 61630 | 1467 |
| Springer Nature | 35 | 54519 | 1558 |
| Springer Science + Business Media | 20 | 20096 | 1005 |
| Frontiers Media SA | 15 | 20073 | 1338 |
| Copernicus GmbH | 13 | 11074 | 852 |
| MDPI AG | 13 | 12643 | 973 |
| Public Library of Science (PLoS) | 12 | 15359 | 1280 |
| International Union of Crystallography (IUCr) | 11 | 1466 | 133 |
| Hindawi Publishing Corporation | 7 | 6331 | 904 |
| American Physical Society (APS) | 5 | 8307 | 1662 |
| Scientific Research Publishing, Inc, | 5 | 2980 | 596 |
| Wiley-Blackwell | 5 | 9188 | 1838 |
| Elsevier BV | 4 | 745 | 186 |
| IOP Publishing | 3 | 3147 | 1049 |
| AIP Publishing | 2 | 2241 | 1121 |
| American Association for the Advancement of Science (AAAS) | 2 | 5371 | 2685 |
| Nature Publishing Group | 2 | 3975 | 1988 |
| Oxford University Press (OUP) | 2 | 3290 | 1645 |
| Schweizerbart | 2 | 3017 | 1508 |
| American Society for Microbiology | 1 | 2551 | 2551 |
| Bentham Science Publishers Ltd. | 1 | 444 | 444 |
| Canadian Center of Science and Education | 1 | 283 | 283 |
| Dove Medical Press Ltd. | 1 | 1906 | 1906 |
| EDP Sciences | 1 | 800 | 800 |
| Gesellschaft für Erdkunde zu Berlin | 1 | 500 | 500 |
| Informa UK Limited | 1 | 352 | 352 |
| PeerJ | 1 | 859 | 859 |
| Redfame Publishing | 1 | 284 | 284 |
| SAGE Publications | 1 | 898 | 898 |
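The headline figures for this dataset are simple aggregates of the per-publisher table. A minimal sketch with a hypothetical `PublisherRow` type shows how they relate; note that the overall mean is total expenditure divided by the number of articles, not the average of the per-publisher means.

```python
from typing import NamedTuple


class PublisherRow(NamedTuple):
    """One row of the per-publisher table (hypothetical type for illustration)."""
    publisher: str
    articles: int
    fees_eur: int  # total fees paid to this publisher, in EUR


def summarise(rows: list[PublisherRow]) -> tuple[int, int, float]:
    """Return (number of articles, total expenditure, mean fee per article).

    The mean is weighted by article count: total fees / total articles.
    """
    articles = sum(r.articles for r in rows)
    total = sum(r.fees_eur for r in rows)
    return articles, total, total / articles
```

Applied to all 29 rows of the Stuttgart table, `summarise` returns 210 articles, 254 329€ in total and a mean of about 1 211€ per article, matching the figures stated above.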

Overview

A detailed analysis of the contributed data provides the following overview:

Fees paid per publisher (in EURO)

plot of chunk tree_stuttgart_2018_03_20_full

Average costs per year (in EURO)

plot of chunk box_stuttgart_2018_03_20_year_full

Average costs per publisher (in EURO)

plot of chunk box_stuttgart_2018_03_20_publisher_full

Bielefeld University shifts to APC data harvesting

Bielefeld University has changed its workflows and will now provide APC data directly via its institutional repository PUB. An initial data set has been harvested via OAI-PMH.
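Harvesting via OAI-PMH means issuing `ListRecords` requests and following `resumptionToken`s until the repository signals the end of the list. A minimal sketch using only the Python standard library; the base URL and metadata prefix are placeholders, since the PUB endpoint and the metadata format carrying the APC data are not described here, and this is not the workflow used for the actual harvest.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records(base_url: str, metadata_prefix: str,
                 fetch: Callable[[str], str]) -> Iterator[ET.Element]:
    """Yield every <record> of an OAI-PMH ListRecords harvest, following resumption tokens.

    `fetch` takes a URL and returns the response body; it is passed in so the
    harvesting logic can be exercised without network access.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        yield from root.iterfind(".//oai:record", OAI_NS)
        token = root.find(".//oai:resumptionToken", OAI_NS)
        if token is None or not (token.text or "").strip():
            break  # no token or an empty token: the list is complete
        # resumptionToken is an exclusive argument: only the verb may accompany it
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

In practice, `fetch` could wrap `urllib.request.urlopen` and decode the response body.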

Bielefeld University Library is in charge of Bielefeld University’s Open Access Publishing Fund, which is supported by the DFG under its Open-Access Publishing Programme.

Contact person is Dirk Pieper.

Cost data

The new data covers publication fees for 77 articles. Total expenditure amounts to 114 756€ and the average fee is 1 490€.

| Publisher | Articles | Fees paid in EURO | Mean fee paid in EURO |
|---|---:|---:|---:|
| Springer Nature | 34 | 47419 | 1395 |
| Frontiers Media SA | 15 | 31110 | 2074 |
| Public Library of Science (PLoS) | 8 | 13378 | 1672 |
| MDPI AG | 6 | 4540 | 757 |
| Dove Medical Press Ltd. | 3 | 5979 | 1993 |
| Scientific Research Publishing, Inc, | 3 | 3059 | 1020 |
| Wiley-Blackwell | 2 | 3641 | 1821 |
| American Society for Microbiology | 1 | 630 | 630 |
| BMJ | 1 | 1495 | 1495 |
| Elsevier BV | 1 | 1601 | 1601 |
| Informa UK Limited | 1 | 738 | 738 |
| Korean Society for Microbiology and Biotechnology | 1 | 1026 | 1026 |
| Scientific & Academic Publishing | 1 | 140 | 140 |

Overview

With the first set of harvest data included, the overall APC data for Bielefeld now looks as follows:

Fees paid per publisher (in EURO)

plot of chunk tree_bielefeld_2018_03_19_full

Average costs per year (in EURO)

plot of chunk box_bielefeld_2018_03_19_year_full

Average costs per publisher (in EURO)

plot of chunk box_bielefeld_2018_03_19_publisher_full