The number of viruses occupying ecological niches as pathogens or silent passengers of humans, animals, plants, invertebrates, protozoa, fungi, and bacteria is very large. As we search in new niches and as the sensitivity and specificity of our detection techniques improve, our lists expand. Today, the ICTV recognizes more than 3,600 virus species, but specialty groups keep track of far more viruses, strains, and subtypes. It has been estimated that more than 30,000 viruses, strains, and subtypes are being tracked in specialty laboratories, reference centers, and culture collections communicating with the WHO, FAO, and other international agencies. Further, the viral quasispecies concept, with its prediction of rapid evolution of variants that may become fixed in nature as new species, forecasts the need to track even more viral entities.
A major goal of the ICTV was to design a universal virus database, the ICTVdB, to become available to all virologists. Another goal was to identify and implement user-friendly software for taxonomic research and phylogenetic analysis that would be directly accessible to users of the database. These goals have been achieved, thanks to the seminal contributions of the American Type Culture Collection (ATCC), to the sponsorship of database development by the US National Science Foundation (NSF), and to the commitment of researchers at the Australian National University (ANU) and CSIRO in Canberra. Available on the World Wide Web (WWW) since 1993 with information content equivalent to the Committee's printed reports and linked to genomic and other databases, ICTVdB is now (late 1998) consulted over 10,000 times daily. Its data and software can be accessed from servers maintained by Dr Cornelia Büchen-Osmond in the Bioinformatics Group, Research School of Biological Sciences, Institute of Advanced Studies, ANU Canberra (http://life.anu.edu.au/viruses/welcome.htm), and from mirror sites at NCBI, Bethesda, USA (http://www.ncbi.nlm.nih.gov/ICTVdB/) and at IACR, Rothamsted, UK (http://www.res.bbsrc.ac.uk/mirror/auz/welcome.htm).
The ICTVdB uses the same information as ICTV's Study Groups in developing and managing the universal system for virus taxonomy. The ICTV decided to use the DELTA system (DEscription Language for TAxonomy) developed by Dallwitz (1980), a flexible and powerful method of recording taxonomic descriptions for computer processing. Now adopted as a standard for data exchange by the International Taxonomic Databases Working Group, it is in use for diverse kinds of organisms, including corals, crustaceans, insects, fish, fungi, and plants.
The DELTA programs are continually refined and enhanced in response to feedback from users, through http://biodiversity.uno.edu/delta/.
The programs' facilities include the generation and typesetting of descriptions and conventional keys, conversion of DELTA data for use by classification programs, and the construction of INTKEY packages for interactive identification and information retrieval. The DELTA system is particularly useful in the international context envisaged by ICTV because INTKEY packages can be prepared in different languages by translating the character list. Chinese, English, French, German, Malay, Portuguese, and Spanish versions are currently available.
Another objective of ICTVdB is to facilitate descriptions of viruses down to the species level and below, beyond the practicalities of hard copy reports. The database presently maintains links to genomic databases (GenBank, EBI) and catalogs (ATCC, DMZ) containing data relevant to the identification of subspecies, strains, variants, and isolates, i.e., to taxonomic levels important in medicine, agriculture, and other fields. Several virus databases will be integrated into the ICTVdB. Presently, only the plant virus database VIDE (Brunt, Crabtree, Dallwitz, Gibbs and Watson, 1996a; Brunt, Crabtree, Dallwitz, Gibbs, Watson and Zurcher, 1996b) is incorporated; the veterinary virus database VIREF is the next candidate (A Della-Porta, personal communication, 1998). The arbovirus catalogue operated for the American Committee on Arthropod-borne Viruses (ACAV) by the Centers for Disease Control in Fort Collins, Colorado, USA is another attractive data collection for translation and incorporation into the ICTVdB; the expertise and the willingness to embark on this endeavor are there (C.H. Calisher, personal communication, 1994).
Büchen-Osmond and Dallwitz (1996) and Büchen-Osmond (1997) have published reports on progress with the ICTVdB.
The ICTVdB is a data set compiled by virologists organized in the ICTV and formatted for the DEscription Language for TAxonomy (DELTA). The data are accessible through the World Wide Web. The family descriptions are consistent, i.e., hierarchically analogous and of uniform terminology, as expected from the DELTA format. It is intended eventually to use the ICTVdB for input of primary data at the species level and below. At the strain level the information will be encyclopedic rather than taxonomic, and DELTA's automatic generation of taxonomic information will have to be peer-reviewed by the Committee before acceptance. The ICTVdB will eventually become the main repository for virus characteristics. The data set can be organized in more than one format, e.g., in that of a relational database.
Much work has gone into the ICTVdB. The software tools are being completed, and the virology community, led by ICTV, has expressed its willingness to deposit and help to curate the data. ICTVdB will thereby become a valuable and widely used resource for virus research. It has all the elements necessary to become such a resource, provided that its development is maintained or controlled. The use of XML, or its contemporary equivalent tagging system, to provide vocabulary switching and concept search facilities, together with similar developments, is vital for effective use of diverse information (genome, host range, etc.) in ICTVdB, expanding its usefulness for research in phylogeny and evolutionary relationships.
From the beginning, ICTVdB recognized the difficulties inherent in the unequivocal identification of a virus. For the purposes of the database it was found convenient to use a decimal numbering system similar to that used for enzyme nomenclature (Büchen-Osmond and Blaine, 1998). The numbering system gives an internal structure to the database, and the numbers assigned to each virus also serve as reference and locator numbers. It also represents an economical and unambiguous means of virus citation, whose use will grow with the acceptance of the database.
The families have been sorted in alphabetical order and each has been assigned a number which represents a particular family, or genus, if the genus is not yet assigned to a family (Table 1). The system can carry more levels to accommodate strains and isolates and, in common with present practice in genomic and protein databases, new families, revisions, etc., are added as they appear without alphabetical consideration.
Description is on taxonomic level of species. Virus is the type species of the genus. Virus belongs to the genus Respirovirus (48.1.1.); subfamily Paramyxovirinae (48.1.); family Paramyxoviridae (48.); order Mononegavirales (VO01.).
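The hierarchical reading of such decimal codes can be illustrated with a minimal Python sketch. It assumes only what the example above shows: dot-separated components with a trailing dot, where each shorter code names the enclosing taxon. The function names are hypothetical and are not part of ICTVdB or DELTA, and the order code (VO01.) follows a different pattern that is not handled here.

```python
def parse_code(code: str) -> tuple:
    """Split a decimal code such as '48.1.1.' into its components.

    Components are kept as strings so that zero-padded parts
    (e.g. '07') are preserved exactly as written.
    """
    return tuple(part for part in code.strip().split(".") if part)


def ancestors(code: str) -> list:
    """Return the codes of all enclosing taxa, outermost first."""
    parts = parse_code(code)
    return [".".join(parts[:i]) + "." for i in range(1, len(parts))]


def is_within(code: str, taxon_code: str) -> bool:
    """True if `code` lies inside the taxon identified by `taxon_code`.

    Comparison is component-wise, so '48.1.1.' is inside '48.'
    but not inside '4.'.
    """
    parts, prefix = parse_code(code), parse_code(taxon_code)
    return parts[:len(prefix)] == prefix


if __name__ == "__main__":
    print(parse_code("48.1.1."))        # genus-level code from the example
    print(ancestors("48.1.1."))         # subfamily and family codes
    print(is_within("48.1.1.", "48."))  # membership in the family
```

Comparing components rather than raw string prefixes avoids false matches between, say, family 4 and family 48.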
The database has been constructed using a character list developed from the above virus properties. These have been translated into single property statements that can be recognized by the search facilities of the database, and that can be used for virus identification and phylogenetic analysis.
One of the best models for the kind of description necessary to avoid ambiguity in virus strain identification is that of the American Type Culture Collection in its frequently updated Catalogue of Animal Viruses and Antisera, Chlamydiae and Rickettsiae (1990). For example, St. Louis encephalitis virus is listed as:
St. Louis encephalitis virus Class III ATCC VR-80
Strain: Hubbard. Original source: Brain of patient, Missouri, 1937. Reference: McCordock, H. A., et al., Proc. Soc. Exp. Biol. Med. 37:288, 1937. Preparation: 20% SMB in 50% NIRS infusion broth; supernatant of low speed centrifugation. Host of Choice: sM (i.c.); M (i.c.). Incubation: 3-4 days. Effect: Death. Host Range: M, Ha, CE, HaK, CE cells. Special Characteristics: Infected brain tissue will have a titer of about 10^7. Agglutinates goose and chicken RBC. Cross reacts with many or all members of group B arboviruses . . .
Such descriptions are now embedded in the ICTVdB and accessible via the decimal code unique to St. Louis encephalitis virus Class III (ICTVdB 26.0.1.2.4.07).
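To suggest how a labelled catalogue entry of this kind lends itself to structured storage, the following Python sketch splits an excerpt of the entry above into field/value pairs. The list of field labels is taken from the entry itself; the function name and the parsing approach are illustrative assumptions and do not describe how ICTVdB actually ingests catalogue data.

```python
import re

# Field labels as they appear in the catalogue entry quoted above.
FIELDS = [
    "Strain", "Original source", "Reference", "Preparation",
    "Host of Choice", "Incubation", "Effect", "Host Range",
    "Special Characteristics",
]

# Match only the known labels, so that colons inside values
# (e.g. the volume:page pair "37:288") are not mistaken for fields.
_LABEL = re.compile(r"\b(" + "|".join(re.escape(f) for f in FIELDS) + r"):\s*")


def parse_entry(text: str) -> dict:
    """Split a catalogue entry into a {label: value} dictionary."""
    matches = list(_LABEL.finditer(text))
    record = {}
    for i, match in enumerate(matches):
        # A value runs from the end of its label to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        value = text[match.end():end].strip()
        record[match.group(1)] = value.removesuffix(".")
    return record


ENTRY = (
    "Strain: Hubbard. Original source: Brain of patient, Missouri, 1937. "
    "Reference: McCordock, H. A., et al., Proc. Soc. Exp. Biol. Med. 37:288, 1937. "
    "Host of Choice: sM (i.c.); M (i.c.). Incubation: 3-4 days. Effect: Death. "
    "Host Range: M, Ha, CE, HaK, CE cells."
)

if __name__ == "__main__":
    for label, value in parse_entry(ENTRY).items():
        print(f"{label}: {value}")
```

Restricting the split to a fixed label list is a deliberate choice here: free-text values in such entries contain periods and colons of their own, so splitting on punctuation alone would break fields apart.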
This structured approach is facilitated by ICTVdB. Recognizing that a picture is worth a thousand words, EM images have been used as analogs of the “type specimen” in the strongly visual presentation of the database on the WWW. Images also play a critical role in preliminary identification of novel viruses using ICTVdB, in which enhanced images of all categories of particles and many virus genera are illustrated with negatively stained preparations, and thin sections of infected tissues when available. Using INTKEY, the identification and data retrieval facility of ICTVdB, novel viruses can then be compared with known viruses, and further evaluated against standard characters. In principle, there is no limit to the type or amount of comparative data that can be interrogated in ICTVdB, or in databases linked to it.
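The idea of comparing a novel virus with known viruses character by character can be sketched as a simple elimination procedure. The Python example below is not INTKEY and does not use DELTA data; the character names, the four-family table, and the function name are assumptions chosen only to illustrate the principle of narrowing candidates as observations accumulate.

```python
# A toy character table; a real character list is far larger and is
# maintained by the Study Groups. The states shown are textbook properties.
TAXA = {
    "Paramyxoviridae": {"envelope": "present", "genome": "ssRNA(-)"},
    "Picornaviridae": {"envelope": "absent", "genome": "ssRNA(+)"},
    "Adenoviridae": {"envelope": "absent", "genome": "dsDNA"},
    "Herpesviridae": {"envelope": "present", "genome": "dsDNA"},
}


def candidates(observed: dict, taxa: dict = TAXA) -> list:
    """Return taxa whose recorded states do not contradict the observations.

    A character that is not recorded for a taxon is treated as unknown
    rather than as a mismatch, so incomplete descriptions are not excluded.
    """
    return sorted(
        name
        for name, states in taxa.items()
        if all(states.get(char, value) == value for char, value in observed.items())
    )


if __name__ == "__main__":
    # Each additional observation narrows the list of remaining candidates.
    print(candidates({"envelope": "present"}))
    print(candidates({"envelope": "present", "genome": "dsDNA"}))
```

Treating missing characters as unknown reflects the situation described in the text, where descriptions are still being completed and a gap in the data should not rule a taxon out.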
The ICTVdB does not yet contain all taxonomically relevant information. The list of descriptors is incomplete and needs to be extended. Descriptors for data input in DELTA format differ from those in a relational database and require special considerations. However, data sets can be translated between formats.
The work on new descriptors is progressing, and the terminology used for existing descriptors will be made consistent in a way acceptable to the Study Groups. The process of translating the formats into those accepted by the relational/object-oriented database management system has been started. These codes will have to be reviewed by the Study Groups to ensure that acceptable terminology is being used and/or synonym lists are produced, although the basic terms are those used in the DELTA software.
Character lists of symptoms are among the most difficult data to specify in ICTVdB. Images of symptoms in hosts are invaluable and will become commonplace in the database. However, the system used to construct ICTVdB also permits the inclusion of optional comments in the jargon experts use to specify complex symptoms in different organisms.
Database Maintenance, Editorial Supervision and Updating
Frequent updates are of paramount importance to keep the database attractive. A compromise must be sought between the need for up-to-date information, and the need for ICTV-approved information.
Traditionally, the activities of a Study Group are overseen by the parent Subcommittee; the Chair of the Subcommittee then co-ordinates considered views. This is definitely the way that the Plant Virus Subcommittee works. The principle is that views coming from Study Groups should carry the agreement of the wider relevant community of virologists, and the Subcommittee can give this overview as it comprises the Chairs of all Study Groups. The effect of this constitution is to have a check against a Study Group with a particular bias developing ideas not acceptable to other virologists. If Study Group Chairs were to have direct access to the database, there would be a risk of conflicts of opinion. Notoriously, different branches of virology differ in how they organize their taxonomic consultation, and the ICTV constitution accommodates this.
Both the web format of the ICTVdB and the DELTA software have been developed at the Australian National University (ANU). A close networking style and the immediate involvement of ICTV are required for efficient database maintenance; procedures are being developed to maintain the ICTVdB, to keep up with technology, and to have it evolve to its full potential. ICTV will be kept informed about the progress and direction of DELTA development, the number and profiles of persons involved, their locations and connections, etc.
For database maintenance and data uploading, a scientist should be employed by ICTV to serve as the webmaster; this person should keep a log of activities, which may serve for tracking the changes made. The webmaster should operate in a virology environment, preferably with a taxonomic inclination (e.g., in a Subcommittee Chair's department). Feedback and motivation are essential to procure progress at a steady pace.
In the past, it has been difficult to attract funds for database work in general, except for the major sequence banks and a few others. However, both in the U.S. and in Europe there is a growing interest in bioinformatics and in the interoperability of biological information resources. Furthermore, emerging infectious diseases are attracting more and more attention.
Printed reports like the present one are published by the ICTV at regular intervals, ideally before the triennial International Congresses of Virology. A Report is a “taxonomic bible”, a quotable publication, which WWW information is not. It was the original objective of using the DELTA format to have automatic Report generation, with illustrations, tables, references and other supplementary material added by a process of desk editing. It is ICTV's goal to retain the copyright of the printed Report and of the database published in any form anywhere. This issue needs critical attention: new intellectual property laws are coming, especially in Europe (the SUI GENERIS directive), that need scrutiny by all scientific organizations. ICTV must make sure it maintains the right to use its data in whatever way it chooses. Data cannot be copyrighted, whereas the format in which they are published can be protected. The ICTVdB on the Web and the Reports are data formats and should remain the property of ICTV.
The success of pioneering databases like VIDE (Virus Identification Data Exchange) depended on the willingness of individual plant virologists to contribute their expertise via completion of lengthy hard copy questionnaires. The larger task of filling out ICTVdB depends on a similar collaborative effort, but will be facilitated by electronic data input and management on the Web. Using the data entry facility, identified as EntVir (for "enter virus"), the contributor identifies her/himself and follows the prompts, adding data that generate or improve the description of a virus in her/his area of expertise.
Guided by a coding template (the character list plus already available data), the contributor proceeds to make additions, or proposes changes, to the data presented in her/his area of expertise. The existing natural language description, with images and links to other databases, can be consulted in the process of data submission. Subsequent to submission, data will be subject to review in two steps: by ICTVdB management to check for errors, and by ICTV Study Groups for formal approval. Management will generate a natural language translation of the description from the data submitted, and this will be returned to the contributor for checking, and forwarded to the Study Groups for review.
The ICTVdB team and the Virus Data Subcommittee of the ICTV will continue to research new tools for improved data structures and management. Present software systems rely on a set of controlled vocabulary descriptors that have been cross-mapped to solve the semantic and syntactic disparities that often complicate efforts to permit interoperability across data sets. Both the controlled vocabularies and the cross-mappings have limitations with respect to changing semantics and vocabularies, which are particularly challenging at the interfaces between genomic data, proteins and structural components of viruses. The database team will explore new software to facilitate vocabulary switching through concept search, permitting semantic retrieval of data from large collections (Schatz, 1997). Such tools will convert ICTVdB from a reference resource to a research tool for studies of phylogeny and evolutionary relationships.
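The cross-mapping of controlled vocabulary descriptors can be pictured as a lookup from the many terms contributors use to one canonical descriptor. The Python sketch below is a minimal illustration under that assumption; the synonym lists, canonical terms, and function names are invented for the example and are not taken from the ICTVdB character list.

```python
# Hypothetical synonym lists: canonical descriptor -> variant terms.
SYNONYMS = {
    "envelope present": ["enveloped", "has envelope", "lipid envelope"],
    "envelope absent": ["non-enveloped", "nonenveloped", "naked"],
}


def build_index(synonyms: dict) -> dict:
    """Map every canonical term and variant (case-folded) to its canonical term."""
    index = {}
    for canonical, variants in synonyms.items():
        for term in [canonical, *variants]:
            index[term.casefold()] = canonical
    return index


def normalize(term: str, index: dict):
    """Return the canonical descriptor for `term`, or None if it is unmapped.

    Whitespace is collapsed and case is ignored before the lookup.
    """
    key = " ".join(term.split()).casefold()
    return index.get(key)


if __name__ == "__main__":
    index = build_index(SYNONYMS)
    for raw in ["Non-Enveloped", "  lipid   envelope ", "icosahedral"]:
        print(repr(raw), "->", normalize(raw, index))
```

A static table of this kind also shows the limitation noted in the text: any term absent from the synonym lists simply fails to map, which is why concept-based retrieval is being explored.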