Analysing JournalTOCs as a possible source of journal hybrid status

One of OpenAPC’s key features is its use of many external information providers for automated metadata enrichment, both to improve efficiency and to relieve our participating institutions of having to compile large sets of bibliographical data along with their reported articles. Here’s an overview of the whole process:

[Figure: OpenAPC enrichment overview]

Naturally, we are always on the lookout for additional sources to simplify the ingestion process even further, and the most obvious candidate for switching to an external import is the is_hybrid field. It is a simple boolean indicator that describes whether the journal an article was published in is Gold/Fully OA (FALSE) or hybrid (TRUE). Given that classifying journal OA types is a complex bibliometric question, our distinction is necessarily very rough, and the value TRUE should more or less be interpreted as “anything other than Gold OA” (it also includes journals with a delayed OA/moving wall policy, for example).

There are many approaches to creating listings of gold OA journals (most notably the DOAJ or ROAD, but also compiled reports like the ISSN Gold OA list created by our colleagues at the INTACT OA analytics group), and there’s some inclination to use those compilations as “inverted classifiers” (“If it’s not on the list, it’s not fully OA.”). In our experience this is not a valid approach: in our daily work at OpenAPC we often stumble upon journals which are clearly OA but not part of any known list. Our conclusion is that a source of journal OA types should go beyond a one-sided list and report explicitly on both fully OA and hybrid status. Unfortunately, a service which holds the desired information (and offers a public API) seems hard to come by.

At our OpenAPC workshop held at the German Librarians’ Day this year, the JournalTOCs site was brought to our attention. As the name implies, the scope of this service is to provide metadata on journal contents (TOCs), but it also features a classification of journal types and offers an API (free after registration). Since this sounded promising, we went ahead and analysed the API and its data quality.

Technical details

Of the four different calls the JournalTOCs API offers, “Search for Journals” is the one we are looking for. It accepts both journal titles and ISSNs and returns a list of possible matches along with some metadata. The construction of queries is straightforward, and the results are easy to parse and seem appropriate in terms of precision. However, there are two problems:

  • At times the system fails to answer at all, leading to socket timeouts. This is no dealbreaker, but it requires careful monitoring and event handling when processing large batches of journals.
  • Unfortunately, the API does not return the journal type, which is the only piece of information we are interested in - the type can only be found on a journal’s individual landing page. This forced us to construct a complicated workaround:
    • Search for a journal using a title string or ISSN
    • Obtain the corresponding journaltocID (an internal identifier)
    • Use the journaltocID to navigate to the journal’s landing page
    • Extract the journal type from HTML via web scraping

We wrote a small Python script which implements this heuristic. As with all OpenAPC software it’s placed under an MIT license, so feel free to reuse and adapt it.
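The four steps above can be sketched roughly as follows. This is not the OpenAPC script itself, only a minimal illustration: the URL templates, the two regular expressions and all function names are hypothetical placeholders and would have to be replaced with the real JournalTOCs endpoints and page markup. The retry helper reflects the timeout problem mentioned in the first bullet point.

```python
import re
import time
from typing import Callable, Optional
from urllib.parse import quote
from urllib.request import urlopen

# ASSUMPTIONS: both URL templates and both regular expressions are hypothetical
# placeholders; check the JournalTOCs API documentation and the real landing
# page markup before using anything like this.
SEARCH_URL = "https://journaltocs.example/api/journals/{query}?user={user}"
LANDING_URL = "https://journaltocs.example/index.php?action=browse&journalID={jid}"
ID_PATTERN = re.compile(r"journalID=(\d+)")
TYPE_PATTERN = re.compile(r'class="journal-type"[^>]*>\s*([A-Za-z_ ]+?)\s*<')


def http_get(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL and return the decoded response body."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_with_retry(url: str, fetch: Callable[[str], str] = http_get,
                     retries: int = 3, pause: float = 1.0) -> Optional[str]:
    """Call fetch(url), retrying on network errors such as socket timeouts."""
    for attempt in range(retries):
        try:
            return fetch(url)
        except OSError:  # socket.timeout, TimeoutError and URLError are all OSErrors
            if attempt < retries - 1:
                time.sleep(pause)
    return None  # every attempt failed


def lookup_journal_type(query: str, user: str,
                        fetch: Callable[[str], str] = http_get,
                        pause: float = 1.0) -> Optional[str]:
    """Return the journal type for a title or ISSN, or None if any step fails."""
    # Step 1: search for the journal using a title string or ISSN.
    search_url = SEARCH_URL.format(query=quote(query), user=quote(user))
    search_result = fetch_with_retry(search_url, fetch, pause=pause)
    if search_result is None:
        return None
    # Step 2: obtain the internal identifier from the first match.
    id_match = ID_PATTERN.search(search_result)
    if id_match is None:
        return None
    # Step 3: navigate to the journal's landing page.
    landing = fetch_with_retry(LANDING_URL.format(jid=id_match.group(1)), fetch, pause=pause)
    if landing is None:
        return None
    # Step 4: extract the journal type from the HTML.
    type_match = TYPE_PATTERN.search(landing)
    return type_match.group(1).upper() if type_match else None
```

Passing the fetch function as a parameter keeps the network access replaceable, so the parsing logic can be exercised with canned responses when the service is slow or unreachable.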

Analysis

When our analysis took place, the OpenAPC and offsetting data sets combined contained 6,210 unique journal titles. The journal_full_title is not a perfect identifier since there are cases of titles belonging to different journals (for example, there are two journals named “Medicine”, one published by Elsevier and one by Wolters Kluwer), but those are so rare that we can ignore them. In a preprocessing step, our script searched through the data and created a mapping table of all journal titles, associated ISSNs and hybrid status. 18 journals were identified as “flipped” (they were both TRUE and FALSE in the is_hybrid column) and excluded from the data. For every journal, all ISSNs were looked up in JournalTOCs until the first hit.
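A preprocessing step along these lines could look like the sketch below. It assumes each record is a dictionary with the keys journal_full_title and is_hybrid plus a single issn key; the record layout, the function names and the lookup callable are illustrative assumptions, not the actual script.

```python
from collections import defaultdict


def build_journal_table(records):
    """Map each journal title to its ISSNs and hybrid status.

    Titles that occur with both TRUE and FALSE in is_hybrid are treated as
    "flipped" and returned separately instead of being put into the table.
    """
    issns = defaultdict(set)
    statuses = defaultdict(set)
    for record in records:
        title = record["journal_full_title"]
        statuses[title].add(record["is_hybrid"])
        if record.get("issn"):  # assumed field name; missing ISSNs are skipped
            issns[title].add(record["issn"])

    table, flipped = {}, []
    for title, seen in statuses.items():
        if len(seen) > 1:
            flipped.append(title)
            continue
        table[title] = {"issns": sorted(issns[title]), "is_hybrid": next(iter(seen))}
    return table, sorted(flipped)


def first_hit(issns, lookup):
    """Try each ISSN in turn and return the first non-empty lookup result."""
    for issn in issns:
        result = lookup(issn)
        if result is not None:
            return result
    return None
```

Here lookup stands for whatever function queries JournalTOCs for a single ISSN and returns None when nothing is found.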

Coverage

Our first step was to determine the share of OpenAPC journals covered by JournalTOCs. We found that out of the 1629 fully OA journals in OpenAPC 1372 were also listed in JournalTOCs (84.22%), while the coverage of hybrid journals was 4376 out of 4563 (95.9%).
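For reference, the two coverage shares follow directly from the counts given above. The helper name below is only illustrative.

```python
def coverage(listed: int, total: int) -> float:
    """Share of journals found in JournalTOCs, in percent, rounded to two decimals."""
    return round(100 * listed / total, 2)


fully_oa_coverage = coverage(1372, 1629)  # fully OA journals
hybrid_coverage = coverage(4376, 4563)    # hybrid journals
print(fully_oa_coverage, hybrid_coverage)
```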

These results were a bit surprising, as we would have expected OA journals to be easier to discover and ingest. The following table lists the 15 OA journals occurring most often in OpenAPC which are not covered by JournalTOCs. There does not seem to be a distinctive pattern, so the reason for this phenomenon remains unclear.

Title | Count | Publisher
OncoTarget | 216 | Impact Journals, LLC
Genome Announcements | 46 | American Society for Microbiology
Acta Crystallographica Section E Structure Reports Online | 42 | International Union of Crystallography (IUCr)
Geoscientific Model Development Discussions | 27 | Copernicus GmbH
ACS Omega | 20 | American Chemical Society (ACS)
Natural Hazards and Earth System Sciences Discussions | 16 | Copernicus GmbH
Aging | 15 | Impact Journals, LLC
Journal of new frontiers in spatial concepts | 10 | KIT Scientific Publishing
Annals of Transplantation | 9 | International Scientific Literature
BioResources | 9 | BioResources
International Journal of Electrochemical Science | 9 | ESG
Journal of Psychiatry & Neuroscience | 9 | Joule Inc.
International Journal of Clinical & Medical Imaging | 7 | OMICS Publishing Group
Microbial Cell | 7 | Shared Science Publishers OG
British Journal of Psychiatry Open | 6 | Royal College of Psychiatrists

Journal status

The second part of our analysis focussed on the journal type as reported by JournalTOCs and whether it is consistent with the classification in OpenAPC.

Hybrid journals

Ignoring all titles not listed in JournalTOCs, we get the following results for the 4376 journals categorized as hybrid in our data set:

We listed a journal in the ERROR category when JournalTOCs failed to display an article list, because in that case a journal type indicator isn’t shown either. As discussed before, our notion of a “hybrid” journal is fairly broad, so we can safely assume that the HYBRID, SUBSCRIPTION and PARTIALLY_FREE categories match our definition (they are clearly not Gold OA), so only the journals marked OA and FREE represent clear mismatches (40 cases, or 0.91% of our sample). Again, here’s a table of the 15 journals in question appearing most often in our data sets. To get a clearer picture of the situation, we performed a manual lookup of the journals to determine their real status:

Title | Count | JTOCs type | Real status
Semiconductor Science and Technology | 12 | OA | Hybrid
Health Expectations | 10 | OA | OA
Journal of Leukocyte Biology | 10 | OA | Hybrid
Catalysis Science & Technology | 6 | FREE | Hybrid
Nucleus | 6 | OA | OA
Bulletin of the American Meteorological Society | 5 | OA | status unclear
Zeitschrift für Naturforschung B | 3 | OA | Hybrid (flipped in 2015)
Annals of Pure and Applied Logic | 2 | OA | Delayed OA (48 months)
Big Data | 2 | OA | Hybrid
Healthcare Technology Letters | 2 | OA | OA (flipped in 2017)
Reproductive Medicine and Biology | 2 | OA | OA (flipped in 2017)
Research in the Mathematical Sciences | 2 | OA | Hybrid (flipped in 2018)
Tumor Biology | 2 | OA | OA (flipped in 2017)
BioMolecular Concepts | 1 | OA | OA (flipped in 2018)
Bulletin of the Veterinary Institute in Pulawy | 1 | OA | OA

Fully OA journals

We conducted the same kind of analysis for the journals classified as fully OA in OpenAPC (1372 titles, again ignoring those not listed in JournalTOCs):

Here we focussed on the JournalTOCs types SUBSCRIPTION, HYBRID and PARTIALLY_FREE which indicate a mismatch. Again we did a manual lookup for the 15 titles appearing most often:

Title | Count | JTOCs type | Real status
Journal of High Energy Physics | 31 | HYBRID | OA (flipped in 2014)
Meteorologische Zeitschrift | 31 | SUBSCRIPTION | OA (flipped in 2014)
Clinical Epigenetics | 24 | HYBRID | OA
Journal of Clinical Investigation | 23 | SUBSCRIPTION | status unclear
Journal of Cancer | 9 | SUBSCRIPTION | OA
Microbiome | 9 | HYBRID | OA
Gut Pathogens | 8 | SUBSCRIPTION | OA
Nanophotonics | 5 | HYBRID | OA
The European Physical Journal C | 5 | HYBRID | OA (flipped in 2014)
EPJ Techniques and Instrumentation | 4 | SUBSCRIPTION | OA
Geophysical & Astrophysical Fluid Dynamics | 4 | HYBRID | Hybrid
International Journal of Advanced Robotic Systems | 4 | SUBSCRIPTION | OA
Climate Research | 3 | HYBRID | Delayed OA (48 months)
Health Education Research | 3 | HYBRID | Delayed OA (12 months)
Solid State Nuclear Magnetic Resonance | 3 | HYBRID | Hybrid
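The matching rules used in both subsections can be summarised in a few lines. The sketch below assumes the JournalTOCs type arrives as one of the category strings named in the text. Treating FREE as consistent with fully OA is an assumption that follows from the text listing only SUBSCRIPTION, HYBRID and PARTIALLY_FREE as mismatches for fully OA journals. The function and constant names are illustrative.

```python
# JournalTOCs categories that are clearly not Gold OA.
NON_GOLD_TYPES = {"HYBRID", "SUBSCRIPTION", "PARTIALLY_FREE"}
# Categories treated as consistent with a fully OA journal (FREE is an assumption).
GOLD_TYPES = {"OA", "FREE"}


def compare(is_hybrid: bool, jtocs_type: str) -> str:
    """Compare OpenAPC's is_hybrid flag with a JournalTOCs type.

    Returns "match", "mismatch", or "unknown" (ERROR or an unexpected category).
    """
    if jtocs_type not in NON_GOLD_TYPES | GOLD_TYPES:
        return "unknown"
    expected = NON_GOLD_TYPES if is_hybrid else GOLD_TYPES
    return "match" if jtocs_type in expected else "mismatch"


# Share of clear mismatches among the hybrid journals, from the counts in the text.
hybrid_mismatch_share = round(100 * 40 / 4376, 2)
```

For example, a journal that is hybrid in OpenAPC but typed OA in JournalTOCs yields "mismatch", as does a fully OA journal typed SUBSCRIPTION.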

Conclusion

Putting all our results together, we get the following chart:

To our mind we can draw the following conclusions:

  • In general, JournalTOCs offers good data quality regarding journal types. Most classifications are in line with OpenAPC, and when there’s a mismatch, our manual lookups show that errors are evenly spread between both collections. We can also see that many mismatches seem to be based on journals having recently flipped in their OA policy, so these cases might be more related to information being outdated than to initial misclassification.
  • From an OpenAPC perspective, completeness is an issue. Out of 6192 OpenAPC journal titles, 482 (7.8%) were not listed in JournalTOCs. While this is not a bad result in general, it would be a problem when trying to use JournalTOCs as a source for the OpenAPC hybrid status in automated metadata enrichment (our content policy requires each journal to be classified as hybrid or OA).
  • Both completeness and classification quality in JournalTOCs seem to be a bit worse on OA journals.

The bottom line is that JournalTOCs is probably not a candidate for automated metadata enrichment of the journal hybrid status in OpenAPC, at least not on its own. On the other hand, it is clearly useful as an additional corrective, and it might still serve as a source of the is_hybrid status when combined with other sources (like the DOAJ or the ISSN Gold OA list).

University of Potsdam reports its 2017 APC expenditures

The University of Potsdam has updated its APC expenditures. The latest contribution provides data for the 2017 period.

Potsdam University Library is in charge of the University’s Open Access Publishing Fund, which is supported under the DFG’s Open Access Publishing Programme.

Contact person is Marco Winkler.

Cost data

The new dataset covers publication fees for 53 articles. Total expenditure amounts to 78 435€ and the average fee is 1 480€.

The following table shows the payments Potsdam University Library has made to publishers in 2017.

Publisher | Articles | Fees paid in EURO | Mean Fee paid
Frontiers Media SA | 14 | 23461 | 1676
Springer Nature | 11 | 17528 | 1593
Copernicus GmbH | 7 | 9889 | 1413
Public Library of Science (PLoS) | 5 | 7778 | 1556
Dove Medical Press Ltd. | 2 | 3708 | 1854
IOP Publishing | 2 | 2599 | 1299
MDPI AG | 2 | 1724 | 862
BMJ | 1 | 2028 | 2028
International Union of Crystallography (IUCr) | 1 | 227 | 227
JMIR Publications Inc. | 1 | 2303 | 2303
Optical Society of America (OSA) | 1 | 1224 | 1224
Ovid Technologies (Wolters Kluwer Health) | 1 | 1956 | 1956
PAGEPress Publications | 1 | 416 | 416
SAGE Publications | 1 | 915 | 915
The Company of Biologists | 1 | 1684 | 1684
Ubiquity Press, Ltd. | 1 | 401 | 401
Walter de Gruyter GmbH | 1 | 595 | 595
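As a quick consistency check, the totals quoted above can be recomputed from the table rows. The sketch below simply copies the article counts and fee sums from the table. Because the per-publisher sums are shown rounded to whole euros, the recomputed total can differ from the reported 78 435€ by a euro.

```python
# (articles, fees paid in EUR) per publisher, copied from the table above.
payments = {
    "Frontiers Media SA": (14, 23461),
    "Springer Nature": (11, 17528),
    "Copernicus GmbH": (7, 9889),
    "Public Library of Science (PLoS)": (5, 7778),
    "Dove Medical Press Ltd.": (2, 3708),
    "IOP Publishing": (2, 2599),
    "MDPI AG": (2, 1724),
    "BMJ": (1, 2028),
    "International Union of Crystallography (IUCr)": (1, 227),
    "JMIR Publications Inc.": (1, 2303),
    "Optical Society of America (OSA)": (1, 1224),
    "Ovid Technologies (Wolters Kluwer Health)": (1, 1956),
    "PAGEPress Publications": (1, 416),
    "SAGE Publications": (1, 915),
    "The Company of Biologists": (1, 1684),
    "Ubiquity Press, Ltd.": (1, 401),
    "Walter de Gruyter GmbH": (1, 595),
}

total_articles = sum(articles for articles, _ in payments.values())
total_fees = sum(fees for _, fees in payments.values())
average_fee = round(total_fees / total_articles)

print(total_articles, total_fees, average_fee)
```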

Overview

With the recent contribution included, the overall APC data for Potsdam now looks as follows:

Fees paid per publisher (in EURO)

[Plot: tree_potsdam_2018_07_04_full]

Average costs per year (in EURO)

[Plot: box_potsdam_2018_07_04_year_full]

Average costs per publisher (in EURO)

[Plot: box_potsdam_2018_07_04_publisher_full]

New articles ingested from OpenAPC Sweden

Our partner project OpenAPC Sweden collects APC data from institutions in Sweden.

New articles ingested after our first exchange in 2017 have now been added to OpenAPC as well.

Cost data

The ingested data covers publication fees for 2,770 articles published by Swedish institutions (this includes all articles already imported in September 2017). Total expenditure amounts to 4 737 102€ and the average fee is 1 710€.

Note that OpenAPC included neither articles already present in its offsetting data collection nor articles with a cost of 0€, so the net intake is lower than what might be expected.
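The two exclusion rules can be illustrated with a small filter. This is only a sketch: it assumes each article is a dictionary with a doi and a euro key and that duplicates against the offsetting collection are detected by DOI, which the text does not state. The function name is hypothetical.

```python
def select_for_ingest(articles, offsetting_dois):
    """Keep articles that cost more than 0 EUR and are not in the offsetting data.

    Matching by lower-cased DOI is an assumption made for this sketch.
    """
    known = {doi.lower() for doi in offsetting_dois}
    return [
        article for article in articles
        if article["euro"] > 0 and article["doi"].lower() not in known
    ]
```

With three reported articles, of which one costs 0€ and one is already part of the offsetting collection, only the remaining one would be ingested.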

The following table and plots show a breakdown of the payments.

Overview

Publisher | Articles | Fees paid in EURO | Mean Fee paid
Springer Nature | 506 | 853246 | 1686
Elsevier BV | 349 | 763646 | 2188
Informa UK Limited | 301 | 495012 | 1645
Wiley-Blackwell | 275 | 665622 | 2420
Public Library of Science (PLoS) | 243 | 329956 | 1358
MDPI AG | 172 | 184352 | 1072
Frontiers Media SA | 121 | 203233 | 1680
Copernicus GmbH | 83 | 103808 | 1251
Oxford University Press (OUP) | 72 | 146895 | 2040
Royal Society of Chemistry (RSC) | 71 | 111076 | 1564
Springer Science + Business Media | 58 | 98118 | 1692
American Chemical Society (ACS) | 55 | 77131 | 1402
Hindawi Publishing Corporation | 50 | 29475 | 589
SAGE Publications | 34 | 53104 | 1562
IOP Publishing | 30 | 55601 | 1853
BMJ | 20 | 37874 | 1894
S. Karger AG | 18 | 17048 | 947
American Geophysical Union (AGU) | 16 | 40880 | 2555
Dove Medical Press Ltd. | 16 | 28718 | 1795
Scientific Research Publishing, Inc, | 15 | 12526 | 835
AIP Publishing | 13 | 20999 | 1615
Resilience Alliance, Inc. | 13 | 12015 | 924
Institute of Electrical & Electronics Engineers (IEEE) | 12 | 16702 | 1392
American Association for the Advancement of Science (AAAS) | 10 | 28680 | 2868
Cambridge University Press (CUP) | 10 | 23146 | 2315
American Meteorological Society | 9 | 19086 | 2121
Impact Journals, LLC | 9 | 23517 | 2613
Proceedings of the National Academy of Sciences | 9 | 20438 | 2271
Walter de Gruyter GmbH | 9 | 13075 | 1453
OMICS Publishing Group | 8 | 6758 | 845
American Physical Society (APS) | 7 | 13171 | 1882
Ovid Technologies (Wolters Kluwer Health) | 7 | 10045 | 1435
The Royal Society | 7 | 14332 | 2047
American Dairy Science Association | 6 | 11951 | 1992
The Company of Biologists | 6 | 17146 | 2858
American Society of Plant Biologists (ASPB) | 5 | 11556 | 2311
Pensoft Publishers | 5 | 2678 | 536
Acta Dermato-Venereologica | 4 | 4041 | 1010
American Society for Microbiology | 4 | 9066 | 2266
Genetics Society of America | 4 | 7045 | 1761
JMIR Publications Inc. | 4 | 7059 | 1765
Sciedu Press | 4 | 1198 | 299
The Electrochemical Society | 4 | 2700 | 675
Wageningen Academic Publishers | 4 | 6650 | 1662
David Publishing Company | 3 | 3400 | 1133
EMBO | 3 | 11343 | 3781
Emerald | 3 | 4191 | 1397
Geological Society of London | 3 | 5319 | 1773
SPIE-Intl Soc Optical Eng | 3 | 2033 | 678
Academic Journals | 2 | 590 | 295
American Psychological Association (APA) | 2 | 5400 | 2700
Bentham Science Publishers Ltd. | 2 | 1452 | 726
Brill | 2 | 2660 | 1330
Cogitatio | 2 | 1727 | 864
Gavin Publishers | 2 | 2646 | 1323
Inter-Research Science Center | 2 | 2900 | 1450
International Union of Crystallography (IUCr) | 2 | 1971 | 986
John Benjamins Publishing Company | 2 | 3400 | 1700
Mary Ann Liebert Inc | 2 | 5034 | 2517
MDPI | 2 | 1584 | 792
Optical Society of America (OSA) | 2 | 3277 | 1638
Scandinavian Journal of Work, Environment and Health | 2 | 3500 | 1750
Society for Sociological Science | 2 | 1109 | 554
Taylor & Francis | 2 | 1844 | 922
Trans Tech Publications | 2 | 480 | 240
Academic Conferences and Publishing International | 1 | 276 | 276
Acoustical Society of America (ASA) | 1 | 1780 | 1780
American Association for Cancer Research (AACR) | 1 | 2641 | 2641
American Society for Biochemistry & Molecular Biology (ASBMB) | 1 | 3404 | 3404
American Society for Clinical Investigation | 1 | 2629 | 2629
American Society of Civil Engineers (ASCE) | 1 | 1397 | 1397
American Society of Clinical Oncology (ASCO) | 1 | 3095 | 3095
American Society of Nephrology (ASN) | 1 | 650 | 650
American Speech Language Hearing Association | 1 | 2174 | 2174
Annex Publishers, LLC | 1 | 611 | 611
Association for Research in Vision and Ophthalmology (ARVO) | 1 | 1219 | 1219
Baishideng Publishing Group Inc. | 1 | 1971 | 1971
BioMed Central | 1 | 140 | 140
BioResources | 1 | 931 | 931
BioScientifica | 1 | 3950 | 3950
Canadian Center of Science and Education | 1 | 265 | 265
Canadian Science Publishing | 1 | 2833 | 2833
Cappelen Damm AS - Cappelen Damm Akademisk | 1 | 668 | 668
Cold Spring Harbor Laboratory Press | 1 | 2354 | 2354
Crop Science Society of America | 1 | 890 | 890
CSIRO Publishing | 1 | 2488 | 2488
eLife Sciences Organisation, Ltd. | 1 | 2002 | 2002
Elsevier | 1 | 270 | 270
Graphyonline Publications PVT, Ltd. | 1 | 374 | 374
Informa Healthcare | 1 | 3040 | 3040
Intellect | 1 | 1140 | 1140
IOS Press | 1 | 1250 | 1250
IWA Publishing | 1 | 1562 | 1562
Japan Society for Occupational Health | 1 | 266 | 266
Magnolia Press | 1 | 270 | 270
MIT Press - Journals | 1 | 1125 | 1125
MyJove Corporation | 1 | 1578 | 1578
Nature Publishing Group | 1 | 2400 | 2400
Portland Press Ltd. | 1 | 652 | 652
Redfame Publishing | 1 | 227 | 227
Remedy Publications LLC. | 1 | 1620 | 1620
Royal College of Psychiatrists | 1 | 1425 | 1425
Scientific Research Publishing | 1 | 95 | 95
Scitechnol Biosoft Pvt. Ltd. | 1 | 360 | 360
Symbiosis Group | 1 | 646 | 646
The Endocrine Society | 1 | 1009 | 1009
Tresorix Ltd | 1 | 179 | 179
Ubiquity Press, Ltd. | 1 | 413 | 413
University of California Press | 1 | 1223 | 1223
World Scientific Pub Co Pte Lt | 1 | 1379 | 1379

Fees paid per publisher (in EURO)

[Plot: tree_openapc_se_2018_07_04_full]

Average costs per year (in EURO)

[Plot: box_openapc_se_2018_07_04_year_full]

Average costs per publisher (in EURO)

[Plot: box_openapc_se_2018_07_04_publisher_full]