A Novel Complexity Measure for Comparative Analysis of Protein Sequences from Complete Genomes

Tannistha Nandi, Debasis Dash, Rohit Ghai, Chandrika B-Rao, Krishnamoorthy Kannan, Samir K. Brahmachari, C. Ramakrishnan, Srinivasan Ramachandran

2003

Abstract Analysis of the sequence complexity of proteins is an important step in the characterization and classification of new genomes. A new measure based on linguistic complexity is proposed for computing the sequence complexity of proteins. The algorithm requires a single parameter, is computationally simple, and provides a framework for comparative genomic analysis. Protein sequences were classified as ‘high’ or ‘low’ complexity using a quantitative measure termed Fc, which is proportional to the fraction of low-complexity sequence present in the protein. The algorithm was tested on the sequences of 196 non-homologous proteins whose crystal structures are available at ≤2.0 Å resolution. Protein sequences of high complexity had ‘globular’ structures (95% agreement), whereas those of low complexity had ‘non-globular’ structures (80% agreement). Application of this measure to proteins of unknown structure/function from different genomes revealed that the sequences of ‘high’ complexity con...
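The abstract does not spell out how the measure is computed, so the following is only a minimal sketch of a generic windowed linguistic-complexity score, not the authors' algorithm. It illustrates how a quantity in the spirit of Fc (a fraction of low-complexity sequence) could be obtained. The function names, the window length, the threshold, and the maximum word length are all hypothetical choices made for illustration; the published method is stated to need only a single parameter.

```python
def linguistic_complexity(seq, max_k=3, alphabet_size=20):
    """Generic linguistic complexity: for each word length k, divide the
    number of distinct k-mers observed by the maximum number possible,
    then multiply the ratios together. Values lie in (0, 1]."""
    score = 1.0
    for k in range(1, max_k + 1):
        n_positions = len(seq) - k + 1
        if n_positions <= 0:
            break
        observed = len({seq[i:i + k] for i in range(n_positions)})
        # At most alphabet_size**k distinct words exist, and a segment
        # cannot contain more distinct words than it has positions.
        possible = min(alphabet_size ** k, n_positions)
        score *= observed / possible
    return score


def low_complexity_fraction(seq, window=20, threshold=0.5, max_k=3):
    """Fraction of residues covered by at least one sliding window whose
    linguistic complexity falls below `threshold` (assumed cutoff)."""
    if not seq:
        return 0.0
    flagged = [False] * len(seq)
    # A sequence shorter than the window is scored as a single segment.
    for start in range(max(1, len(seq) - window + 1)):
        segment = seq[start:start + window]
        if linguistic_complexity(segment, max_k) < threshold:
            for i in range(start, start + len(segment)):
                flagged[i] = True
    return sum(flagged) / len(seq)


if __name__ == "__main__":
    diverse = "ACDEFGHIKLMNPQRSTVWY"   # every residue type once
    repeat = "A" * 20                  # homopolymer stretch
    print(low_complexity_fraction(diverse))
    print(low_complexity_fraction(repeat))
    print(low_complexity_fraction(diverse + repeat))
```

Under these assumed settings, a protein would be labelled ‘low complexity’ when the returned fraction exceeds some cutoff; the cutoff, like the other settings here, would have to be taken from the full paper rather than from this sketch.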

