Scholarly article on aminocarbonyloxamide 617-49-2 from Zhurnal Russkago Fiziko-Khimicheskago Obshchestva p. 278

DOI: 10.1093/bioinformatics/17.1.13

Source and publish data:

Zhurnal Russkago Fiziko-Khimicheskago Obshchestva p. 278 (1885)

Update date:2022-08-17

Topics:: Authors:

Rudinskaja


Article abstract of DOI:10.1093/bioinformatics/17.1.13

Full text of DOI:10.1093/bioinformatics/17.1.13

Vol. 17 no. 1 2001

Pages 13–15

BIOINFORMATICS

Pro-Frame: similarity-based gene recognition in eukaryotic DNA sequences with errors

Andrey A. Mironov, Pavel S. Novichkov and Mikhail S. Gelfand

State Scientific Center for Biotechnology NIIGenetika, Moscow, 113545, Russia

Received on September 22, 1999; revised on June 5, 2000; accepted on July 27, 2000

ABSTRACT

Summary: Performance of existing algorithms for similarity-based gene recognition in eukaryotes drops when the genomic DNA has been sequenced with errors. A modification of the spliced alignment algorithm allows for gene recognition in sequences with errors, in particular frameshifts. It tolerates up to 5% of sequencing errors without considerable drop of prediction reliability when a sufficiently close homologous protein is available (normalized evolutionary distance similarity score 50% or higher).

Availability: The program is free for academic users and available upon request at http://www.anchorgen.com

Contact: mgelfand@anchorgen.com

Analysis of sequence similarity is a powerful tool for gene recognition. It is employed in a number of database search programs, most notably BLASTX (Gish and States, 1993), and programs for exact prediction of exon–intron structure, in particular, Procrustes (Gelfand et al., 1996; Mironov et al., 1998), INFO (Hultner et al., 1994; Laub and Smith, 1998), GeneWise (Birney and Durbin, 1997). The common idea behind these algorithms is that among numerous possible exon chains, an algorithm chooses the chain having the highest similarity to a related protein (target). This is done by modified dynamic programming treating introns as a special case of gaps (GeneWise) or by spliced alignment (Procrustes).

Testing of the similarity-based gene recognition programs demonstrated that given sufficiently close relatives, they produce highly reliable predictions. In particular, the correlation between predicted and real human genes is 6–99% when homologous vertebrate genes are available (Mironov et al., 1998; Laub and Smith, 1998). However, the quality of gene predictions when the genomic DNA contains sequencing errors is much lower (Burset and Guigo, 1996). One possibility to avoid this problem is to use the DNA spliced alignment instead of aligning translated candidate exons with proteins (Sze and Pevzner, 1997). However, it is well known that protein alignments are much more sensitive to distant similarities than nucleotide alignments. Thus it is indicative that there exist numerous protein–DNA alignment algorithms accounting for frameshifts (Posfai and Roberts, 1992; Birney et al., 1996; Guan and Uberbacher, 1996; Zhang et al., 1997; Pearson et al., 1997). However, none of them handles introns.

We have implemented a modified version of the spliced alignment algorithm performing gene recognition in the presence of frameshift errors. The algorithm treats introns as non-penalized gaps that may start only at dinucleotide GT and end at dinucleotide AG. Frameshifts and in-frame stop codons in the genomic sequence are allowed, but heavily penalized. There is an option for acceleration of the dynamic programming, using the k-tuple alignment technique due to M.Roytberg (Nazipova et al., 1995). Since sequencing errors can destroy the invariant dinucleotides at splicing sites, the program has a post-processing step. At this step the program identifies runs of deletions at exon termini, and moves the exon–intron boundary even if there are no suitable dinucleotides. More exactly, observing more than 50% deleted positions in the region (−30, +30) around the exon junction, the program searches for the optimal position of the donor and acceptor splicing sites allowing for a single deviation from the invariant dinucleotide at each site. The program outputs the exon positions before and after the correction and the alignment of the predicted exons and the target protein.

Results of testing the algorithm on a sample of human genes and related proteins from (Mironov et al., 1998) are given in Figure 1. This sample consists of 256 genes. The average length of genomic sequences is approximately 8100 nucleotides, with the longest sequence exceeding 180 000 nucleotides. The number of exons ranges from 1 through 54, the average number of exons per gene is 5.5. The average length of exons in multi-exon genes is 140 nucleotides. The total number of protein targets is 731, their average length is 575 amino acids. Five independent rounds of mutations were performed for each sequence. The 3655 predictions (five times 731 comparisons) were done in about 5 h on a PC with Pentium II 400 MHz processor under Windows NT.
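The intron model described in the text (non-penalized gaps that may start only at GT and end at AG) can be illustrated with a short sketch. This is not the Pro-Frame implementation: it only enumerates the gaps that such a rule would permit, without any alignment or scoring. The function names and the `min_len` parameter are hypothetical, introduced here for illustration.

```python
def candidate_introns(dna, min_len=4):
    """Enumerate (start, end) pairs such that dna[start:end] begins with GT
    and ends with AG. min_len is a hypothetical lower bound on intron length."""
    dna = dna.upper()
    # Positions where a donor dinucleotide GT begins.
    donors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "GT"]
    # Exclusive end positions just after an acceptor dinucleotide AG.
    acceptors = [j + 2 for j in range(len(dna) - 1) if dna[j:j + 2] == "AG"]
    return [(s, e) for s in donors for e in acceptors if e - s >= min_len]


def splice(dna, intron):
    """Remove one candidate intron, joining the flanking sequence."""
    start, end = intron
    return dna[:start] + dna[end:]


if __name__ == "__main__":
    seq = "ATGGTAAAGCC"
    for intron in candidate_introns(seq):
        print(intron, splice(seq, intron))
```

In a spliced alignment, each such candidate would be one admissible zero-cost gap for the dynamic programming to consider; frameshifts and in-frame stop codons would be handled by separate, heavily penalized transitions.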

The performance at different error levels is estimated using the standard correlation coefficient measure (Burset and Guigo, 1996; Mironov et al., 1998). For comparison we also present the correlation coefficient demonstrated by the original Procrustes algorithm and results of gapped BLASTX (Altschul et al., 1997) using the target protein as the query sequence (Figure 2). Since the performance depends on the similarity between the gene and a target, the figures feature plots of the correlation coefficient at different similarity levels. The similarity measure is the score of the alignment of the actual and target proteins divided by the half-sum of the scores of (trivial) alignments of the actual protein and the target protein with themselves. Such normalization accounts for varying protein length and amino acid composition. Sequencing errors were modeled as random nucleotide substitutions (80%), insertions (10%) and deletions (10%); the latter two types of errors had length one through three with equal probabilities. For instance, at the error rate 15% this means that nucleotides at 12% of all positions have been changed, 1.5% of nucleotides were deleted (0.5% of single nucleotide deletions, 0.5% of dinucleotide deletions, and 0.5% of trinucleotide deletions), and 1.5% positions contained insertions (with 0.5% being single nucleotides, 0.5%, dinucleotides, and 0.5%, trinucleotides). Errors at GT-AG invariant intron termini were not treated as a special case, however, the fraction of mutated termini can be easily estimated given the overall error rate.

At all similarity and error levels Pro-Frame provides better recognition than straightforward BLASTX (cf. Figures 1 and 2). It is noteworthy that in the absence of sequencing errors Pro-Frame performs almost as well as Procrustes when the target protein is close to the analyzed gene, but the performance drops for distant relatives. This agrees with our observations about importance of the statistical filtering procedure implemented in Procrustes (Mironov et al., 1998). On the other hand, up to 3% rate of sequencing errors does not considerably influence the reliability of predictions, and further, up to 6% of errors are easily tolerated if the target protein is sufficiently close to the analyzed gene.

The above results demonstrate that Pro-Frame may be a useful tool for analysis of preliminary sequencing data, e.g. phase I or II output of major sequencing projects, draft human genome sequences, etc. The initial identification of target proteins should be done by BLASTX, and then Pro-Frame can be used to exactly map the exon boundaries and to find relatively short exons that could be missed by the straightforward similarity analyses. Since Pro-Frame does not rely on the statistical properties of the analyzed genome (the only requirement is the GT–AG rule for the intron termini), the program can be used for gene recognition in invertebrate, plant, fungal, and even protist sequences.

Fig. 1. Testing of Pro-Frame on a sample of human genes. Horizontal axis: similarity between actual genes and related proteins (in %, see the text for definition). Vertical axis: correlation coefficient (in %). Each curve corresponds to a specific level of sequencing errors (the percent of erroneous positions is given in the legend on the right). ‘Pro’ corresponds to the original Procrustes algorithm.

Fig. 2. Prediction of coding regions in the sample of Figure 1 using BLASTX. Axes and notation as in Figure 1.

© Oxford University Press 2001

ACKNOWLEDGEMENTS

We are grateful to Drs V.Bafna, J.Fickett, P.Pevzner and M.Roytberg for useful discussions. This work was partially supported by grants from the Russian State Scientific Program ‘Human Genome’, the Russian Fund for Basic Research (99-04-48347 and 00-15-99362), the Merck Genome Research Institute (244), and INTAS (99-476).
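The error model used in the tests (80% substitutions, 10% insertions, 10% deletions, with insertions and deletions of length one through three chosen with equal probability) can be sketched as follows. This is an illustrative reading of the description, not the authors' code: it assumes that each position independently becomes the site of one error event with probability `error_rate`, and that an inserted run is placed before the current base. The names `mutate` and `expected_rates` are hypothetical.

```python
import random


def expected_rates(error_rate):
    """Split an overall error rate into the three error types (80/10/10)."""
    return {
        "substitution": 0.8 * error_rate,
        "insertion": 0.1 * error_rate,
        "deletion": 0.1 * error_rate,
    }


def mutate(dna, error_rate, seed=0):
    """Return a copy of dna with simulated sequencing errors.

    Assumption: one error event per affected position; event type is drawn
    with weights 0.8/0.1/0.1 and indel length uniformly from 1 to 3.
    """
    rng = random.Random(seed)
    out = []
    i = 0
    while i < len(dna):
        if rng.random() < error_rate:
            kind = rng.choices(["sub", "ins", "del"], weights=[0.8, 0.1, 0.1])[0]
            if kind == "sub":
                # Replace the base with a different nucleotide.
                out.append(rng.choice([b for b in "ACGT" if b != dna[i]]))
                i += 1
            elif kind == "ins":
                # Insert 1-3 random nucleotides, then keep the original base.
                out.extend(rng.choice("ACGT") for _ in range(rng.randint(1, 3)))
                out.append(dna[i])
                i += 1
            else:
                # Skip 1-3 nucleotides of the original sequence.
                i += rng.randint(1, 3)
        else:
            out.append(dna[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(expected_rates(0.15))
    print(mutate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG", 0.15, seed=1))
```

At an overall rate of 15%, `expected_rates` reproduces the split quoted in the text: about 12% substitutions and 1.5% each for insertions and deletions.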

REFERENCES

Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.

Birney,E. and Durbin,R. (1997) Dynamite: a flexible code generating language for dynamic programming methods used in sequence comparison. Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology. AAAI Press, Menlo Park, CA, pp. 56–64.

Birney,E., Thompson,J.D. and Gibson,T.J. (1996) PairWise and SearchWise: finding the optimal alignment in a simultaneous comparison of a protein profile against all DNA translation frames. Nucleic Acids Res., 24, 2730–2739.

Burset,M. and Guigo,R. (1996) Evaluation of gene structure prediction programs. Genomics, 34, 353–367.

Gelfand,M.S., Mironov,A.A. and Pevzner,P.A. (1996) Gene recognition via spliced sequence alignment. Proc. Natl. Acad. Sci. USA, 93, 9061–9066.

Gish,W. and States,D.J. (1993) Identification of protein-coding regions by database similarity search. Nature Genet., 3, 266–272.

Guan,X. and Uberbacher,E.C. (1996) Alignments of DNA and protein sequences containing frameshift errors. Comput. Applic. Biosci., 12, 31–40.

Hultner,M., Smith,D.W. and Wills,C. (1994) Similarity landscapes: a way to detect many structural and sequence motifs in both introns and exons. J. Mol. Evol., 38, 188–203.

Laub,M.T. and Smith,D.W. (1998) Finding intron/exon splice junctions using INFO, INterruption Finder and Organizer. J. Comput. Biol., 5, 307–321.

Mironov,A.A., Roytberg,M.A., Pevzner,P.A. and Gelfand,M.S. (1998) Performance-guarantee gene predictions via spliced alignment. Genomics, 51, 332–339.

Nazipova,N.N., Shabalina,S.A., Ogurtsov,A.Yu., Kondrashov,A.S., Roytberg,M.A., Buryakov,G.V. and Vernoslov,S.E. (1995) SAMSON: a software package for the biopolymer primary structure analysis. Comput. Applic. Biosci., 11, 423–426.

Pearson,W.R., Wood,T., Zhang,Z. and Miller,W. (1997) Comparison of DNA sequences with protein sequences. Genomics, 46, 24–36.

Posfai,J. and Roberts,R.J. (1992) Finding errors in DNA sequences. Proc. Natl. Acad. Sci. USA, 89, 4698–4702.

Sze,S.-H. and Pevzner,P.A. (1997) Las Vegas algorithms for gene recognition: suboptimal and error-tolerant spliced alignment. J. Comput. Biol., 4, 297–310.

Zhang,Z., Pearson,W.R. and Miller,W. (1997) Aligning a DNA sequence with a protein sequence. J. Comput. Biol., 4, 339–349.
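As a supplementary illustration, the similarity measure defined in the text (the score of the alignment of the actual and target proteins, divided by the half-sum of the scores of the two self-alignments) can be written as a small helper. The sketch takes the three alignment scores as inputs and makes no assumption about how they are computed; the function and argument names are hypothetical.

```python
def normalized_similarity(score_actual_target, score_actual_self, score_target_self):
    """Alignment score normalized by the half-sum of the two self-alignment scores.

    Returns a fraction; multiply by 100 for a percentage as used on the
    horizontal axis of the figures.
    """
    half_sum = (score_actual_self + score_target_self) / 2.0
    if half_sum <= 0:
        raise ValueError("self-alignment scores must have a positive half-sum")
    return score_actual_target / half_sum


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    print(normalized_similarity(60, 100, 140))
```

Dividing by the half-sum rather than by either self-score alone keeps the measure symmetric in the two proteins, which matches the stated aim of accounting for varying protein length and amino acid composition.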
