This post briefly explains the types of biological databases used in bioinformatics: primary databases, secondary databases, and specialised databases.
A database is a digital repository that stores and organises data so that it can be easily retrieved using a variety of search criteria. Databases are made up of computer hardware and software for data management. The primary goal of building a database is to organise data into a set of structured records that can be easily retrieved.
Biological databases often have a further requirement, known as knowledge discovery: finding connections between pieces of information that were not known when the information was first entered.
Current biological databases use all three kinds of database structure: flat files, relational databases, and object-oriented databases. Despite the obvious drawbacks of flat files for database management, many biological databases still use them. The reason is that this format requires very little database design, and the search results are simple enough for working biologists to understand.
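The difference between a flat file and a relational database can be illustrated with a minimal sketch. The record layout below (two-letter keys, `//` as the end-of-record marker), the accessions, and the table name `entry` are all invented for illustration and do not reproduce the format of any real database.

```python
import sqlite3

# Made-up records in a simple "KEY   value" flat-file layout, with "//"
# closing each record. Layout and accessions are illustrative only.
FLAT_FILE = """\
ID   X00001
DE   hypothetical kinase
OS   Drosophila melanogaster
//
ID   X00002
DE   hypothetical transporter
OS   Caenorhabditis elegans
//
"""


def parse_flat_file(text):
    """Split the flat file into one dict per record."""
    records, current = [], {}
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            records.append(current)
            current = {}
        elif line.strip():
            # First two characters are the key, the value starts at column 6.
            current[line[:2]] = line[5:].strip()
    return records


records = parse_flat_file(FLAT_FILE)

# Flat-file search: scan every record in turn.
flat_hits = [r["ID"] for r in records if r["OS"].startswith("Drosophila")]

# Relational search: load the same records into a table and query it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entry (id TEXT PRIMARY KEY, description TEXT, organism TEXT)"
)
conn.executemany(
    "INSERT INTO entry VALUES (?, ?, ?)",
    [(r["ID"], r["DE"], r["OS"]) for r in records],
)
sql_hits = [
    row[0]
    for row in conn.execute("SELECT id FROM entry WHERE organism LIKE 'Drosophila%'")
]

print(flat_hits, sql_hits)
```

Both searches return the same entry. The flat file needs no schema at all, which is the low design cost mentioned above, while the relational version needs a table definition up front but can then answer queries without scanning the text by hand.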
Types of Biological Databases in Bioinformatics
Biological databases can be split into three groups based on their content: primary databases, secondary databases, and specialised databases.
Primary databases contain original biological data. They are repositories for raw sequence or structure data submitted by the scientific community. GenBank and the Protein Data Bank (PDB) are examples of primary databases.
Secondary databases hold information that has been computationally processed or manually curated from the data in primary databases. This category includes translated protein sequence databases with functional annotation. SWISS-PROT and the Protein Information Resource (PIR) are two examples.
Specialised databases cater to a particular research interest. FlyBase, the HIV sequence database, and the Ribosomal Database Project, for example, each specialise in a specific organism or type of data.
GenBank, the European Molecular Biology Laboratory (EMBL) database, and the DNA Data Bank of Japan (DDBJ) are three main public sequence databases that store raw nucleic acid sequence data collected and submitted by researchers all over the world. They are all publicly available on the Internet.
To ensure that essential molecular data is made freely available, sequence submission to GenBank, EMBL, or DDBJ is currently a prerequisite for publishing in most scientific journals.
These three public databases work closely together and exchange new data daily. Together they form the International Nucleotide Sequence Database Collaboration.
This means that connecting to any of the three databases should give you access to the identical nucleotide sequence information.
For the three-dimensional structures of biological macromolecules, there is fortunately only one centralised database: the PDB. It stores the atomic coordinates of macromolecules determined by X-ray crystallography and NMR.
Protein names, authors, experimental details, secondary structure, cofactors, and atomic coordinates are all represented in a flat file format.
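As a rough sketch of how atomic coordinates can be read from such a flat file, the snippet below parses `ATOM` lines by character position, assuming the fixed-column layout of the legacy PDB text format. The two sample lines, their coordinates, and the names `Atom` and `parse_atom_line` are made up for illustration and do not come from a real entry.

```python
from typing import NamedTuple

# Invented sample text in the fixed-column PDB layout (not a real entry).
SAMPLE = """\
HEADER    EXAMPLE
ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00 20.00           N
ATOM      2  CA  MET A   1      12.560  13.300   2.400  1.00 21.50           C
END
"""


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Slice one ATOM line by column (0-based Python slices)."""
    return Atom(
        serial=int(line[6:11]),        # columns 7-11: atom serial number
        name=line[12:16].strip(),      # columns 13-16: atom name
        res_name=line[17:20].strip(),  # columns 18-20: residue name
        chain=line[21],                # column 22: chain identifier
        res_seq=int(line[22:26]),      # columns 23-26: residue number
        x=float(line[30:38]),          # columns 31-38: x coordinate
        y=float(line[38:46]),          # columns 39-46: y coordinate
        z=float(line[46:54]),          # columns 47-54: z coordinate
    )


# Keep only coordinate records; header and terminator lines are skipped.
atoms = [parse_atom_line(l) for l in SAMPLE.splitlines() if l.startswith("ATOM")]

for atom in atoms:
    print(atom.name, atom.res_name, atom.chain, atom.x, atom.y, atom.z)
```

Because every field sits at a fixed position, no delimiter is needed, which is part of why the format is easy to read by eye and by simple scripts.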
The sequence annotation in primary databases is often minimal. Turning raw sequence information into more sophisticated biological knowledge requires a great deal of post-processing. This creates the need for secondary databases, which store computationally processed sequence information derived from the primary databases.
The degree of computer processing work varies widely among secondary databases; some are simply archives of translated sequence data from recognised open reading frames in DNA, whereas others give extra annotation and information related to higher levels of structure and functions.
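The simplest kind of processing mentioned above, translating an open reading frame into protein sequence, can be sketched in a few lines. The example assumes the standard genetic code and a reading frame that starts at the first base; the function name `translate_orf` and the input sequence are invented for illustration, and real pipelines also handle alternative genetic codes, ambiguity codes, and both strands.

```python
from itertools import product

# Standard genetic code, listed in TCAG order with the first base varying
# slowest; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_orf(dna: str) -> str:
    """Translate codon by codon from the first base until a stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # "X" for unrecognised codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate_orf("ATGGCCAAATGA"))  # MAK
```

A secondary database of translated sequences stores the output of this kind of step for every recognised reading frame, so that users do not have to repeat the translation themselves.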
SWISS-PROT, for example, is a well-known secondary database that provides extensive sequence annotation, including structure, function, and protein family assignment.
Other secondary databases deal with protein family classification based on function or structure. The Pfam and Blocks databases contain aligned protein sequences together with derived motifs and patterns, which can be used to classify protein families and infer protein function.
The DALI database is a secondary database of protein structures. It is essential for protein structure classification and for threading analysis, which is used to detect distant evolutionary relationships between proteins.
Specialised databases usually serve a particular research community or focus on a single organism. Their content may be sequences or other kinds of information.
The sequences in these databases may overlap with those in a primary database, but they may also include new data submitted directly by authors.
Because they are often curated by experts in the field, they may have their own organisation and additional annotations associated with the sequences.
Many taxon-specific genome databases fall into this group. FlyBase, WormBase, AceDB, and TAIR are examples.
There are also specialised databases that contain original data from functional analyses. Two examples are gene expression databases: the GenBank EST database and the Microarray Gene Expression Database at the European Bioinformatics Institute (EBI).