Data Liberator through PLAZI
By Siya Zamisa
Dr John Midgley and I attended a Plazi workshop that seeks to promote the development of persistent and openly accessible digital taxonomic literature.
"Plazi, amongst other things, maintains a digital taxonomic literature repository to enable archiving of taxonomic treatments, participates in the development of new models for publishing taxonomic treatments and advocates and educates about the vital importance of maintaining free and open access to scientific discourse and data" (www.plazi.org). This plays a fundamental role because it means the information is provided freely, which benefits research in developing countries that might not have the funds to purchase journals.
During the workshop we were trained to extract taxonomic treatments (scientific descriptions of biological species) and to enhance data from scientific literature in PDF format, using a program called GoldenGATE-Imaging (GGI). GGI opens PDF documents, extracts metadata, marks taxon names and treatments, decodes embedded fonts where required, and finally segments pages into columns, blocks, paragraphs, and lines. All of this allows the Plazi system to separate out the parts of a document that can be shared without breaking copyright law, and allows the end user to find things easily.
Figure 1: Example of an extraction using GoldenGATE-Imaging (GGI).
The final step is to use the eBIODiv Matching Service, which is designed to let users match and link the material citations contained in the academic literature to the respective specimens in natural history collections.
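To illustrate the idea behind this matching step, here is a minimal Python sketch. It is not how the eBIODiv Matching Service itself works, which the workshop did not cover at the code level; it only shows the general principle of linking a material citation to a specimen record. The field names `institutionCode` and `catalogNumber` follow Darwin Core terms, but the functions, the institution code "MUS" and all the records are hypothetical and made up for this example.

```python
import re


def normalise(catalog_number):
    """Uppercase the number and drop everything except letters and digits,
    so that 'dip-001234' and 'DIP 001234' compare as equal."""
    return re.sub(r"[^A-Z0-9]", "", catalog_number.upper())


def match_citations(citations, specimens):
    """Link each material citation to a specimen record that has the same
    institution code and (normalised) catalogue number.

    Returns a list of (citation, specimen) pairs and a list of the
    citations for which no specimen record was found.
    """
    # Build a lookup table of the published specimen records.
    index = {}
    for specimen in specimens:
        key = (specimen["institutionCode"].upper(),
               normalise(specimen["catalogNumber"]))
        index[key] = specimen

    matched, unmatched = [], []
    for citation in citations:
        key = (citation["institutionCode"].upper(),
               normalise(citation["catalogNumber"]))
        specimen = index.get(key)
        if specimen is None:
            unmatched.append(citation)
        else:
            matched.append((citation, specimen))
    return matched, unmatched


# Hypothetical records, invented purely for illustration.
specimens = [
    {"institutionCode": "MUS", "catalogNumber": "DIP-001234"},
    {"institutionCode": "MUS", "catalogNumber": "DIP-005678"},
]
citations = [
    {"institutionCode": "mus", "catalogNumber": "DIP 001234"},
    {"institutionCode": "MUS", "catalogNumber": "DIP-999999"},
]

matched, unmatched = match_citations(citations, specimens)
print(len(matched), "matched;", len(unmatched), "unmatched")
```

In this sketch a citation can only be linked if the specimen it refers to has already been published as a record, so the more specimen records a collection makes available, the more citations can find a match.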
As a data technician at the KZN Museum, one of my responsibilities is to publish specimen data to GBIF (Global Biodiversity Information Facility). This helps with the matching by increasing the pool of specimens available for linking. Not all collection data is found in literature, so my work plays a big role in trying to narrow that gap.
Plazi seeks to make quality literature available to everyone in the world, with access that is free and fair.