A Novel Complexity Measure for Comparative Analysis of Protein Sequences from Complete Genomes

2003 
Abstract
Analysis of the sequence complexity of proteins is an important step in the characterization and classification of new genomes. A new measure, based on linguistic complexity, is proposed for computing the sequence complexity of proteins. The algorithm requires a single parameter, is computationally simple, and provides a framework for comparative genomic analysis. Protein sequences were classified as ‘high’ or ‘low’ complexity using a quantitative measure, termed Fc, which is proportional to the fraction of low-complexity sequence present in the protein. The algorithm was tested on the sequences of 196 non-homologous proteins whose crystal structures are available at ≤2.0 Å resolution. Protein sequences of high complexity had ‘globular’ structures (95% agreement), whereas those of low complexity had ‘non-globular’ structures (80% agreement). Application of this measure to proteins of unknown structure/function from different genomes revealed that the sequences of ‘high’ complexity con...
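The abstract does not give the formula for Fc. As a rough illustration of the underlying idea only, the Python sketch below computes one common formulation of linguistic complexity: the number of distinct substrings observed in a sequence divided by the maximum number possible for its length and a 20-letter amino-acid alphabet. It then reports the fraction of residues covered by low-complexity sliding windows. The function names, the window length and the cutoff are hypothetical choices made for illustration; this is not the authors' Fc algorithm, and the paper's single parameter may differ.

```python
def linguistic_complexity(seq: str, alphabet_size: int = 20) -> float:
    """Distinct substrings observed / maximum possible (one common LC variant, assumed here)."""
    n = len(seq)
    if n == 0:
        return 0.0
    observed = 0
    maximum = 0
    for k in range(1, n + 1):
        # Distinct words of length k actually present in the sequence.
        observed += len({seq[i:i + k] for i in range(n - k + 1)})
        # At most alphabet_size**k words exist, and only n - k + 1 positions fit one.
        maximum += min(alphabet_size ** k, n - k + 1)
    return observed / maximum


def low_complexity_fraction(seq: str, window: int = 12, cutoff: float = 0.75,
                            alphabet_size: int = 20) -> float:
    """Fraction of residues lying in any window whose LC falls below a hypothetical cutoff."""
    n = len(seq)
    if n == 0:
        return 0.0
    if n < window:
        # Sequence shorter than one window: score it as a single window.
        return 1.0 if linguistic_complexity(seq, alphabet_size) < cutoff else 0.0
    masked = [False] * n
    for start in range(n - window + 1):
        if linguistic_complexity(seq[start:start + window], alphabet_size) < cutoff:
            for i in range(start, start + window):
                masked[i] = True
    return sum(masked) / n
```

For example, a homopolymer such as `"Q" * 20` is flagged as entirely low complexity, while a sequence containing all 20 amino acids once is not flagged at all. Scoring by sliding window rather than over the whole protein lets a short low-complexity segment stand out even inside an otherwise diverse sequence.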