10. Geographic distribution in the IOPI Checklist

Scope of the model

Geographic information is the only major area of "indirectly linked information" to be included in the IOPI Global Plant Checklist project and thus in the present model. In contrast to the treatment of core taxonomic information and literature references, the part of the model dealing with geographical information was designed specifically for the IOPI checklist project. Rather than attempting to provide a generalized model of distribution data, it uses geographic data as an example of how information can be attached to taxonomic data.

Geographic information in botanical systems may encompass a wide variety of data areas, such as place names, coordinate-defined point locations, grid occurrence data, geomorphologic features, and meteorological information. In some areas, geographic and ecological information merge, for example in the case of collection site descriptions, where vegetation type, habitat, and edaphic data may be included.

Modeling geographic information is greatly complicated by the multitude of formats and standards in existence. After intensive discussion, the CDEFD project group concluded that commercially available geographic information systems (GIS) should be used whenever detailed coverage of geographic information is required. Such systems can be linked to relational databases, so that integration with the presented model for taxonomic data can be achieved. An example of such an approach is the Environmental Resources Information Network (ERIN) in Australia, where specimen information, meteorological data, geographic data and taxonomic data have been combined into a highly integrated system (A. Chapman, D. Green, pers. comm.). A similar approach is taken by the Federal Agency for Nature Conservation in Germany (May, 1994). At present, commercial GIS packages are still relatively expensive. However, now that desktop computers with workstation-like capabilities are available, mass use of GIS programs in business is likely to bring down prices (Mauth, 1995).

For the IOPI checklist project, only taxon distribution and occurrence status information is to be included. In the interest of rapid checklist production, this information will, for the time being, not be specimen-based. Rather, area occurrence data are to be recorded from existing datasets or publications and, wherever possible, converted to the TDWG standard geographical area scheme (Hollis & Brummitt 1992). The occurrence status information should be converted to the POSS standard. In addition, a summary phrase describing the taxon's worldwide distribution is to be included; it will be formulated by the IOPI coordinator team for the respective taxonomic group (Wilson 1994).

Attachment to potential taxa

As described in chapter 9, "indirectly linked information" carries a link to a source reference for the data, as well as a source reference for its linkage to a specific potential taxon. Thus one and the same dataset can be linked to several potential taxa. This approach may be obligatory, for example in the case of specimen data, where the assignment to a potential taxon represents the result of an identification. The sequence of assignments then records the identification history of a specimen, information which may be important in specimen management and taxonomic research.
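A minimal Python sketch of this double referencing follows. The class, field and method names (`Reference`, `PotentialTaxon`, `Assignment`, `IndirectRecord`, `assign`, `current`) are hypothetical and not taken from the model's entity list, and the specimen and determinations are invented. The point is only that the data carry one reference for their origin while each assignment carries its own, so that the list of assignments doubles as an identification history.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Reference:
    """A published or unpublished source, reduced here to a title."""
    title: str


@dataclass(frozen=True)
class PotentialTaxon:
    """A name in the sense of one reference ("name sec. reference")."""
    name: str
    sec: Reference


@dataclass(frozen=True)
class Assignment:
    """Links a record to a potential taxon; `by` is the source of that link."""
    taxon: PotentialTaxon
    by: Reference


@dataclass
class IndirectRecord:
    """Indirectly linked information, e.g. a specimen or a distribution set."""
    source: Reference  # source reference for the data themselves
    assignments: List[Assignment] = field(default_factory=list)

    def assign(self, taxon: PotentialTaxon, by: Reference) -> None:
        # Append rather than overwrite, so earlier assignments are kept.
        self.assignments.append(Assignment(taxon, by))

    def current(self) -> PotentialTaxon:
        """The most recent assignment, e.g. the latest identification."""
        return self.assignments[-1].taxon


# Hypothetical specimen identified twice by different workers.
herbarium = Reference("Herbarium X, sheet 123")
det_1 = Reference("Determinavit P, 1960")
det_2 = Reference("Determinavit Q, 1990")
taxon_1 = PotentialTaxon("Lathyrus sp.", det_1)
taxon_2 = PotentialTaxon("Lathyrus odoratus L.", det_2)

specimen = IndirectRecord(herbarium)
specimen.assign(taxon_1, det_1)
specimen.assign(taxon_2, det_2)
```

Keeping every assignment, rather than a single foreign key to the "accepted" taxon, is what allows the earlier determination by P to survive alongside Q's later one.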

However, for geographic information, and for database tasks as large as the IOPI checklist, a different approach is recommended (though not prescribed by the data model!). Because taxon distribution data are so closely tied to taxonomic judgment, the assignment reference of a geographic record should be required to equal the reference of the potential taxon to which it is assigned. An example may serve to clarify the concept:

Potential taxon records have been created in the database for all taxa imported from the Med Checklist Database. Thus, a potential taxon "Lathyrus odoratus L. sec. Greuter et al. (1989)" exists. Geographical distribution records from Med Checklist have been imported as well, i.e. a set of area records with the original name link "Lathyrus odoratus L." also exists. In this case, the assignment reference (i.e. who assigned the geographic set to this particular potential taxon) can be set automatically to Greuter et al. (1989).

Lathyrus odoratus L. is also reported by Standley and Calderón (1925) as introduced in El Salvador, C.A. A geographical record with Standley and Calderón as source reference is created, but not a potential taxon "Lathyrus odoratus L. sec. Standley & Calderón (1925)", because the reference is a rather outdated geographical checklist. A taxonomic coordinator team ("A & B") wants to assign this geographical record to a potential taxon in the database. Thus, "A & B" are the source reference of the assignment (remember that the bibliographical part of the model is capable of treating something like "A & B, IOPI taxonomic coordination" as a reference title). Although the basic referential integrity of the data model allows this record to be assigned to the potential taxon referred to Greuter et al., it is recommended that a data integrity rule be incorporated which enforces the creation of a potential taxon "Lathyrus odoratus L. sec. A & B". This has the advantage that geography records attached to a potential taxon can be merged without having to account for later additions. If "A & B" want to add the data of Greuter & al. to their potential taxon, they can either ascribe that geographical set to it, or include the potential taxon of Greuter & al. as a system synonym.
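The recommended integrity rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of the data model: references are reduced to plain strings, and the names `PotentialTaxon`, `GeoSet`, `assign_strict`, `assign_with_new_taxon` and `AssignmentRuleError` are hypothetical. The rule itself is the one recommended above: an assignment is accepted only if its reference equals the reference ("sec.") of the potential taxon, and otherwise a new potential taxon in the sense of the assigner is created.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PotentialTaxon:
    name: str
    sec: str  # reference in whose sense the name is used


@dataclass
class GeoSet:
    source: str  # source reference of the distribution data
    assignments: List[Tuple[PotentialTaxon, str]] = field(default_factory=list)


class AssignmentRuleError(ValueError):
    """Raised when the recommended integrity rule is violated."""


def assign_strict(geo: GeoSet, taxon: PotentialTaxon, by: str) -> None:
    """Accept an assignment only if its reference equals the taxon's sec."""
    if by != taxon.sec:
        raise AssignmentRuleError(
            f"{by!r} may not assign to {taxon.name} sec. {taxon.sec}")
    geo.assignments.append((taxon, by))


def assign_with_new_taxon(geo: GeoSet, name: str, by: str,
                          taxa: Dict[Tuple[str, str], PotentialTaxon]) -> PotentialTaxon:
    """Find or create the potential taxon 'name sec. by', then assign to it."""
    taxon = taxa.setdefault((name, by), PotentialTaxon(name, by))
    assign_strict(geo, taxon, by)
    return taxon


# The worked example from the text.
NAME = "Lathyrus odoratus L."
taxa: Dict[Tuple[str, str], PotentialTaxon] = {}
greuter = PotentialTaxon(NAME, "Greuter et al. (1989)")
taxa[(greuter.name, greuter.sec)] = greuter

med_set = GeoSet("Greuter et al. (1989)")
assign_strict(med_set, greuter, "Greuter et al. (1989)")  # set automatically

salvador_set = GeoSet("Standley & Calderón (1925)")
try:
    assign_strict(salvador_set, greuter, "A & B")  # violates the rule
    rejected = False
except AssignmentRuleError:
    rejected = True
ab = assign_with_new_taxon(salvador_set, NAME, "A & B", taxa)
```

Because the rule is only recommended, a database could equally run `assign_strict` as a warning rather than an error; the sketch simply shows the strict variant.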

The data structure

Three levels may be distinguished in the geographical dataset: data relating to the whole set, data relating to a taxon record within the set, and data relating to a single area.

The first level consists of a description of the dataset, such as its source and the gazetteer and occurrence standards used for the entire set. The dataset may be imported in electronic form, or it may be input manually from a published or unpublished reference.

The second level consists of data ascribed to a specific taxon within the set, such as a summary of its geographic distribution, its original name, and its assignment(s) to a potential taxon.

Finally, the third level lists individual geographical areas ("named areas") in which the occurrence of the taxon has been recorded.
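The three levels map naturally onto nested records. The following Python sketch assumes hypothetical class and field names (`GeoDataset`, `TaxonEntry`, `AreaOccurrence`, `areas_of`) and invented values for the gazetteer and status fields; it is meant only to show which data belong to which level, not to reproduce Diagram 21.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AreaOccurrence:
    """Level 3: one named area in which the taxon is recorded."""
    area: str    # area as given by the source
    status: str  # occurrence status as given by the source


@dataclass
class TaxonEntry:
    """Level 2: one taxon within the set."""
    original_name: str
    summary: str = ""
    assigned_to: List[str] = field(default_factory=list)  # potential taxa
    areas: List[AreaOccurrence] = field(default_factory=list)


@dataclass
class GeoDataset:
    """Level 1: properties that hold for the entire set."""
    source: str
    gazetteer: str        # area scheme used throughout the set
    status_standard: str  # occurrence scheme used throughout the set
    imported: bool        # True = electronic import, False = keyed in
    entries: List[TaxonEntry] = field(default_factory=list)

    def areas_of(self, original_name: str) -> List[str]:
        """All areas recorded for one original name in this set."""
        return [occ.area
                for entry in self.entries if entry.original_name == original_name
                for occ in entry.areas]


# Hypothetical values for the El Salvador record discussed above.
ds = GeoDataset("Standley & Calderón (1925)", gazetteer="country names",
                status_standard="source wording", imported=False)
ds.entries.append(TaxonEntry(
    "Lathyrus odoratus L.",
    assigned_to=["Lathyrus odoratus L. sec. A & B"],
    areas=[AreaOccurrence("El Salvador", "introduced")]))
```

Placing the gazetteer and status standard on the dataset rather than on each area record reflects the statement that these standards apply to the entire set.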


Diagram 21: Geographic distribution in the IOPI Global Plant Checklist


Summary range phrase

Geographical distribution of taxa is often given as a concise statement citing only a large area (e.g. "Neotropics") or the limits of a range ("Mexico to Northern South America"). Such summary range statements are of limited value when the actual occurrence in a specified area is sought. For example, the latter statement leaves open whether the plant actually occurs in all countries between the extremes of the range, or whether it is confined to either the Atlantic or the Pacific side of the continent, i.e. Venezuela vs. Ecuador, Belize vs. El Salvador. However, such statements are often the only information provided by datasets, and for the IOPI checklist the data definition subgroup has prescribed one statement summarizing the global native occurrence and another summarizing the global non-native occurrence of the taxon (Bisby, 1994). The detailed data structure for summary statements is given in Diagram 22.
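The limitation described above can be made concrete with a three-valued lookup. In this Python sketch, the class `DistributionSummary`, its fields, the method `occurs_in` and the status value `"absent"` are all hypothetical, and the area records are invented for illustration, not real data for any taxon. The two free-text phrases are stored as prescribed, but a question about a single country can only be answered from area-level records; otherwise the answer is "unknown" (`None`).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DistributionSummary:
    native: str = ""      # e.g. "Mexico to Northern South America"
    non_native: str = ""  # summary of the introduced range
    # Area-level records, if any: area name -> occurrence status.
    areas: Dict[str, str] = field(default_factory=dict)

    def occurs_in(self, area: str) -> Optional[bool]:
        """True/False from an area record; None if the area is not recorded."""
        status = self.areas.get(area)
        if status is None:
            return None            # the summary phrase alone cannot decide
        return status != "absent"  # "absent" is an assumed status value


s = DistributionSummary(native="Mexico to Northern South America")
before = s.occurs_in("Ecuador")   # only the phrase exists
s.areas["Venezuela"] = "native"   # invented record
s.areas["Ecuador"] = "absent"     # invented record
```

Returning `None` rather than `False` for unrecorded areas avoids treating a gap in the data as evidence of absence.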


Diagram 22: Geographic distribution summary


Area occurrence data

Data on the occurrence of a taxon in an area consist of two principal parts: the description and circumscription of the area itself, and the occurrence status of the taxon within the area. Both present severe problems regarding definitions and standards, which may greatly limit the usefulness of the information obtained.

"Named areas" as a term may include administrative areas like states or provinces, geographical areas like islands or mountain ranges, climatic areas like "tropics", ecological areas like deserts or lakes, phytogeographic units, or special-purpose areas such as national parks. For the purpose of the IOPI checklist, areas which can be considered summary statements (e.g. tropics) are excluded, as well as smaller scale areas which are normally used only in the context of collection site descriptions (e.g. private properties or national parks). However, it must be possible to cite small scale areas in the context of local endemites.

As with the taxon concept, the circumscription of an area is not directly defined by its name. Administrative areas in particular may change greatly over time, a fact the data model must accommodate by including a validity time period for every named area.
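A validity period turns "which areas existed at a given date?" into a simple query. The Python sketch below uses hypothetical names (`NamedArea`, `valid_on`, `areas_valid_on`) and an invented province that is split in two at the start of 1951; open-ended periods are represented by `None`.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class NamedArea:
    name: str
    valid_from: Optional[date] = None  # None = no known start
    valid_to: Optional[date] = None    # None = still valid

    def valid_on(self, day: date) -> bool:
        """True if the area existed, in this circumscription, on `day`."""
        if self.valid_from is not None and day < self.valid_from:
            return False
        if self.valid_to is not None and day > self.valid_to:
            return False
        return True


def areas_valid_on(areas: List[NamedArea], day: date) -> List[str]:
    """Names of all areas whose validity period includes `day`."""
    return [a.name for a in areas if a.valid_on(day)]


# Hypothetical province split in two at the start of 1951.
gazetteer = [
    NamedArea("Province A", valid_to=date(1950, 12, 31)),
    NamedArea("Province A North", valid_from=date(1951, 1, 1)),
    NamedArea("Province A South", valid_from=date(1951, 1, 1)),
]
```

A record published in 1925 would then be interpreted against "Province A", while current data refer to its successors.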

The World Geographical Scheme for Recording Plant Distributions (Hollis & Brummitt 1992) has been accepted by TDWG as a standard for recording plant distributions. In some cases, e.g. the states within the former Soviet Union, the standard already needs to be revised.

To define the occurrence status (e.g. "native", "introduced", "extinct") of a taxon in a specified area, different schemes have been in use. IOPI decided to follow the "Plant Occurrence and Status Scheme" (POSS, Leon & al. 1989), which has been accepted by TDWG as a draft, although some uncertainty persists as to its final form. IOPI aims to provide the POSS occurrence status of species in all TDWG standard areas. Since the TDWG scheme has a well-defined hierarchy, data may be generalized from lower units to provide the occurrence in, e.g., standard continents.
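Generalizing along the area hierarchy amounts to rolling child records up to their ancestors. The Python sketch below is illustrative only: `PARENT` is a small fragment of the TDWG hierarchy chosen for the example (check codes against the published scheme before relying on them), the simplified status values stand in for POSS categories rather than encoding POSS itself, and the precedence rule (any native child makes the parent native, and so on) is an assumption. `ancestors` and `generalize` are hypothetical names.

```python
from typing import Dict, Iterator, Set

# Illustrative fragment of the TDWG hierarchy: level-3 unit -> level-2 region
# -> level-1 continent. Only a few codes are shown.
PARENT: Dict[str, str] = {"SPA": "12", "FRA": "12", "ITA": "13", "12": "1", "13": "1"}

# Assumed precedence when child statuses differ; POSS itself is not encoded here.
PRECEDENCE = ["native", "introduced", "extinct"]


def ancestors(code: str) -> Iterator[str]:
    """Yield the parent, grandparent, ... of an area code."""
    while code in PARENT:
        code = PARENT[code]
        yield code


def generalize(records: Dict[str, str]) -> Dict[str, str]:
    """Derive a status for every higher unit from lower-level records."""
    collected: Dict[str, Set[str]] = {}
    for code, status in records.items():
        for higher in ancestors(code):
            collected.setdefault(higher, set()).add(status)
    # min() by precedence picks e.g. "native" over "introduced";
    # an unknown status raises ValueError from list.index.
    return {higher: min(statuses, key=PRECEDENCE.index)
            for higher, statuses in collected.items()}
```

For example, `generalize({"SPA": "introduced", "ITA": "native"})` yields `{"12": "introduced", "13": "native", "1": "native"}`: the continent is native because one of its units is.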

A third TDWG standard which may be considered for inclusion is the scheme of Takhtajan (1989) for phytogeographic units. It consists of a 3-level hierarchical subdivision of the floristic regions of the world. However, it has not been adopted by IOPI's data definition group for inclusion in the Global Plant Checklist.

Records imported from existing datasets will in most cases adhere to neither standard. Such incoming datasets may include geographical area records and occurrence statements either in a non-standard form or in a form standardized according to a specific reference (e.g. Flora Europaea, Tropicos, Med-Checklist).

To keep incoming datasets with well-structured geographic data accessible to geographic coordinators, the incoming record must be preserved in the database, and the standard system it uses must be documented. Where possible, an automatic translation into TDWG standard areas can be effected upon import of the dataset; alternatively, this information may be derived by a coordinator (but the resulting data should be marked as derived). Diagram 23 depicts the data items involved in such a treatment of geographical areas.
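The import procedure can be sketched as follows. All names (`AreaRecord`, `import_areas`, `assign_by_coordinator`, `derived_by`) are hypothetical, and the translation table is an invented example loosely modelled on Flora Europaea-style abbreviations, not an actual mapping; real translations are not always one-to-one. Marking automatic translations as derived, and not only coordinator-supplied ones, is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AreaRecord:
    source_area: str                  # kept exactly as imported
    source_scheme: str                # documents the system the source used
    tdwg_code: Optional[str] = None   # filled in by translation, if possible
    derived_by: Optional[str] = None  # marks the TDWG code as derived


def import_areas(names: List[str], scheme: str,
                 table: Dict[str, str]) -> List[AreaRecord]:
    """Preserve each incoming area; translate automatically where the table allows."""
    records = []
    for name in names:
        code = table.get(name)
        records.append(AreaRecord(name, scheme, code,
                                  "automatic import" if code else None))
    return records


def assign_by_coordinator(rec: AreaRecord, code: str, coordinator: str) -> None:
    """A coordinator supplies the TDWG code for an untranslated record."""
    rec.tdwg_code = code
    rec.derived_by = coordinator


# Hypothetical translation table (not an actual Flora Europaea mapping).
table = {"Hs": "SPA", "Ga": "FRA"}
recs = import_areas(["Hs", "Ga", "It"], "Flora Europaea", table)
assign_by_coordinator(recs[2], "ITA", "coordinator C")
```

Because `source_area` and `source_scheme` are never overwritten, a faulty translation can later be corrected from the preserved original.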


Diagram 23: Geographic area data


The entity relation model for geographic distributions in the IOPI checklist


Diagram 24: Geographical data in the IOPI checklist



Last updated: June 23, 1995