Biodiversity informatics

Biodiversity Informatics is the application of informatics techniques to biodiversity information for improved management, presentation, discovery, exploration and analysis. It typically builds on a foundation of taxonomic, biogeographic, or ecological information stored in digital form, which, with the application of modern computer techniques, can yield new ways to view and analyse existing information, as well as predictive models for information that does not yet exist (see niche modelling). Biodiversity informatics is a relatively young discipline (the term was coined in or around 1992) but has hundreds of practitioners worldwide, including the numerous individuals involved with the design and construction of taxonomic databases.
The term 'Biodiversity Informatics' is generally used in the broad sense to apply to computerized handling of any biodiversity information; the somewhat broader term 'bioinformatics' is often used synonymously with the computerized handling of data in the specialized area of molecular biology. Biodiversity informatics (distinct from, but linked to, bioinformatics) is the application of information technology methods to the problems of organizing, accessing, visualizing and analyzing primary biodiversity data. Primary biodiversity data comprise names, observations and records of specimens, and the genetic and morphological data associated with a specimen. Biodiversity informatics may also have to manage information from unnamed taxa, such as that produced by environmental sampling and sequencing of mixed-field samples. The term is also used to cover the computational problems specific to the names of biological entities: the development of algorithms to cope with variant representations of identifiers such as species names and authorities; the multiple classification schemes within which these entities may reside according to the preferences of different workers in the field; and the syntax and semantics by which the content of taxonomic databases can be made machine-queryable and interoperable for biodiversity informatics purposes.
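
The problem of variant representations of species names and authorities can be illustrated with a minimal sketch. It assumes that each input string is a binomial optionally followed by an authorship string, reduces it to a lower-case "genus epithet" key, and compares keys with the standard-library `difflib.SequenceMatcher`. The function names `canonical_name` and `names_match` and the 0.9 threshold are illustrative choices made here, not part of any established name-matching tool.

```python
import re
from difflib import SequenceMatcher


def canonical_name(raw: str) -> str:
    """Reduce 'Genus epithet Author, year' to a lower-case 'genus epithet' key.

    Assumption: the input is a binomial, optionally followed by authorship.
    Uninomials with authorship (e.g. a genus plus author) are not handled.
    """
    tokens = raw.split()  # split() also collapses repeated whitespace
    # Keep only the first two tokens and drop anything that is not a letter or hyphen.
    cleaned = [re.sub(r"[^a-z-]", "", token.lower()) for token in tokens[:2]]
    return " ".join(part for part in cleaned if part)


def names_match(a: str, b: str, threshold: float = 0.9) -> bool:
    """Treat two name strings as the same name if their keys are near-identical."""
    ratio = SequenceMatcher(None, canonical_name(a), canonical_name(b)).ratio()
    return ratio >= threshold
```

With these definitions, "Homo sapiens L." and the misspelled "Homo sapeins Linnaeus, 1758" are treated as the same name, while "Homo sapiens" and "Pan troglodytes" are not. A similarity threshold is a blunt instrument: it can also conflate distinct valid names that differ by a single letter, so in practice a match of this kind would normally be checked against an authoritative name list.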
Biodiversity Informatics can be considered to have commenced with the construction of the first computerized taxonomic databases in the early 1970s. It progressed through the development of distributed search tools towards the late 1990s, including the Species Analyst from the University of Kansas, the North American Biodiversity Information Network (NABIN), CONABIO in Mexico, and others; the establishment of the Global Biodiversity Information Facility in 2001; and the parallel development, from the mid-1980s onwards, of a variety of niche modelling and other tools to operate on digitized biodiversity data. In September 2000, the U.S. journal Science devoted a special issue to 'Bioinformatics for Biodiversity'; the journal 'Biodiversity Informatics' commenced publication in 2004; and several international conferences through the 2000s have brought together Biodiversity Informatics practitioners, including the London e-Biosphere conference in June 2009. A supplement to the journal BMC Bioinformatics (Volume 10 Suppl 14), published in November 2009, also deals with Biodiversity Informatics.
According to correspondence reproduced by Walter Berendsohn, the term 'Biodiversity Informatics' was coined by John Whiting in 1992 to cover the activities of an entity known as the Canadian Biodiversity Informatics Consortium, a group involved with fusing basic biodiversity information with environmental economics and geospatial information in the form of GPS and GIS. Subsequently, the term appears to have lost any obligate connection with the GPS/GIS world and to have become associated with the computerized management of any aspect of biodiversity information.
One major issue for biodiversity informatics at a global scale is the current absence of a complete master list of currently recognised species of the world, although this is an aim of the Catalogue of Life project, which has ca. 1.65 million species of an estimated 1.9 million described species in its 2016 Annual Checklist. A similar effort for fossil taxa, the Paleobiology Database, documents some 100,000+ names for fossil species, out of an unknown total number.
Application of the Linnaean system of binomial nomenclature for species, and uninomials for genera and higher ranks, has led to many advantages but also to problems with homonyms (the same name being used for multiple taxa, either inadvertently or legitimately across multiple kingdoms), synonyms (multiple names for the same taxon), and variant representations of the same name due to orthographic differences, minor spelling errors, variation in the manner of citation of author names and dates, and more. In addition, names can change through time on account of changing taxonomic opinions (for example, the correct generic placement of a species, or the elevation of a subspecies to species rank or vice versa), and the circumscription of a taxon can change according to different authors' taxonomic concepts. One proposed solution to this problem is the usage of Life Science Identifiers (LSIDs) for machine-machine communication purposes, although there are both proponents and opponents of this approach.
Organisms can be classified in a multitude of ways (see main page Biological classification), which can create design problems for Biodiversity Informatics systems aimed at incorporating either a single or multiple classifications to suit the needs of users, or at guiding them towards a single 'preferred' system. Whether a single consensus classification system can ever be achieved is probably an open question; however, the Catalogue of Life has commissioned activity in this area, which has been succeeded by a published system proposed in 2015 by M. Ruggiero and co-workers.
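
To make the Life Science Identifiers mentioned above more concrete: an LSID is a URN of the general form `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`. The following is a minimal parsing sketch under that assumption. The class name `LSID`, the function `parse_lsid`, and the `example.org` identifier used in the usage note are hypothetical and serve only as illustration; the sketch does no resolution and does not check that the authority is a real service.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Components of a Life Science Identifier (illustrative container)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    # The 'urn:lsid' prefix is compared case-insensitively.
    prefix = (parts[0].lower(), parts[1].lower())
    if prefix != ("urn", "lsid") or not all(parts[2:5]):
        raise ValueError(f"not an LSID: {text!r}")
    # The revision component is optional; an empty one is treated as absent.
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(parts[2], parts[3], parts[4], revision)
```

For example, `parse_lsid("urn:lsid:example.org:names:12345")` yields the authority `example.org`, the namespace `names`, the object identifier `12345`, and no revision. Because the identifier is opaque, two differently spelled versions of a name can point to the same record, which is the property that makes such identifiers attractive for machine-to-machine exchange.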
'Primary' biodiversity information can be considered the basic data on the occurrence and diversity of species (or indeed, any recognizable taxa), commonly in association with information regarding their distribution in space, time, or both. Such information may take the form of retained specimens and associated information, for example as assembled in the natural history collections of museums and herbaria, or of observational records, for example from formal faunal or floristic surveys undertaken by professional biologists and students, or from amateur and other planned or unplanned observations, including those increasingly coming under the scope of citizen science. Providing online, coherent digital access to this vast collection of disparate primary data is a core Biodiversity Informatics function that is at the heart of regional and global biodiversity data networks, examples of the latter including OBIS and GBIF.
As a secondary source of biodiversity data, relevant scientific literature can be parsed either by humans or (potentially) by specialized information retrieval algorithms to extract the primary biodiversity information reported therein, sometimes in aggregated or summary form but frequently as primary observations in narrative or tabular form. Elements of such activity (such as extracting key taxonomic identifiers, keywording or index terms, etc.) have been practiced for many years at a higher level by selected academic databases and search engines. However, for maximum Biodiversity Informatics value, the actual primary occurrence data should ideally be retrieved and then made available in a standardized form or forms; for example, both the Plazi and INOTAXA projects are transforming taxonomic literature into XML formats that can then be read by client applications, the former using TaxonX-XML and the latter using the taXMLit format.
The Biodiversity Heritage Library is also making significant progress in its aim to digitize substantial portions of the out-of-copyright taxonomic literature, which is then subjected to OCR (optical character recognition) so as to be amenable to further processing using Biodiversity Informatics tools.
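
The idea of making primary occurrence data available in a standardized, machine-readable form can be sketched as follows. The example models a single occurrence record (a taxon name tied to a place, a date, and a basis of record) and round-trips a list of such records through XML using Python's standard `xml.etree.ElementTree` module. The `Occurrence` class and the element names (`occurrences`, `occurrence`, `scientificName`, and so on) are assumptions invented for this illustration; they are not the TaxonX-XML or taXMLit schemas, and the sample values in the usage note are made up.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Occurrence:
    """One primary occurrence record (hypothetical field set)."""
    scientific_name: str
    latitude: float   # decimal degrees
    longitude: float  # decimal degrees
    event_date: str   # ISO 8601 date string
    basis: str        # e.g. "specimen" or "observation"


def to_xml(records: List[Occurrence]) -> str:
    """Serialize records to a simple XML document (illustrative schema)."""
    root = ET.Element("occurrences")
    for rec in records:
        node = ET.SubElement(root, "occurrence", {"basis": rec.basis})
        ET.SubElement(node, "scientificName").text = rec.scientific_name
        # repr() of a float round-trips exactly through float().
        ET.SubElement(node, "latitude").text = repr(rec.latitude)
        ET.SubElement(node, "longitude").text = repr(rec.longitude)
        ET.SubElement(node, "eventDate").text = rec.event_date
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> List[Occurrence]:
    """Read records back, as a client application might."""
    root = ET.fromstring(text)
    return [
        Occurrence(
            scientific_name=node.findtext("scientificName", default=""),
            latitude=float(node.findtext("latitude", default="nan")),
            longitude=float(node.findtext("longitude", default="nan")),
            event_date=node.findtext("eventDate", default=""),
            basis=node.get("basis", ""),
        )
        for node in root.findall("occurrence")
    ]
```

A record such as `Occurrence("Panthera leo", -2.33, 34.83, "2009-06-01", "observation")` survives the round trip unchanged, so `from_xml(to_xml(records)) == records` holds for well-formed input. Real data networks rely on agreed community schemas rather than an ad hoc layout like this one, since interoperability depends on every provider using the same element names and semantics.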
