ASSP is a sequence analysis tool for the prediction and classification
of splice sites. It is based on an analysis of constitutive, skipped,
cryptic, and alternative exon isoform splice sites (Wang and Marín 2006)
retrieved from the Altextron database (Clark and Thanaraj 2002, Thanaraj
et al. 2004; see definitions for splice site and exon nomenclature). This
analysis revealed several features that distinguish alternative isoform
and cryptic splice sites, but not skipped ones, from constitutive ones.
ASSP identifies putative splice sites using pre-processing models (position
specific score matrices) and subsequently classifies them as either
constitutive or alternative isoform/cryptic splice sites using backpropagation
networks (see model description). These networks combine several models and
sequence statistics, such as position specific score matrices for the splice
sites, GC content, and oligonucleotide frequency models. Once the score of a
candidate site surpasses the cutoff value of the corresponding pre-processing
model, the site is labeled "real" and subsequently
classified as "alternative isoform/cryptic" or "constitutive"
by the corresponding neural network. For more information about cutoff
values see evaluation.
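
The pre-processing step described above can be illustrated with a minimal sketch. This is not the ASSP implementation: the function names, the pseudocount, the uniform background frequency of 0.25, and the toy training sites are all assumptions made for illustration. The sketch builds a log-odds position specific score matrix from aligned example sites, scores every window of a sequence, and keeps the windows whose score exceeds a cutoff.

```python
import math

BASES = "ACGT"

def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds matrix from equally long, aligned example sites.

    The pseudocount and the uniform background are illustrative assumptions.
    """
    length = len(sites[0])
    pssm = []
    for pos in range(length):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pssm

def score_window(pssm, window):
    """Sum the per-position log-odds scores for one window."""
    return sum(column[base] for column, base in zip(pssm, window))

def find_putative_sites(sequence, pssm, cutoff):
    """Return (start, score) for every window scoring above the cutoff."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous symbols such as N
        score = score_window(pssm, window)
        if score > cutoff:
            hits.append((start, score))
    return hits

# Toy donor-like training sites (hypothetical, not taken from Altextron).
toy_sites = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "GAGGTAAGT"]
toy_pssm = build_pssm(toy_sites)
toy_hits = find_putative_sites("TTTTCAGGTAAGTTTTT", toy_pssm, cutoff=5.0)
```

In ASSP, sites that pass this kind of cutoff are then handed to a neural network for classification; that second stage is not reproduced here.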
The classification performance of ASSP, i.e. the distinction between
constitutive and alternative isoform/cryptic splice sites, is 67.45 percent
for acceptor sites and 71.23 percent for donor sites (see evaluation
for details). However, the percentage of correctly identified splice
sites depends strongly on the score thresholds applied for the pre-processing
models.
Exon identification is facilitated by the calculation of codon usage
(log-likelihood) for a variable window size, and the identification
of stop codons for each reading frame. A sample output of ASSP is given
in figure 1.
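
As a rough illustration of the two exon-identification aids mentioned above, the following sketch locates stop codons in each forward reading frame and computes a sliding-window codon usage log-likelihood. It is not ASSP's code: the function names, the codon frequency table passed in as `coding_freq`, the uniform background of 1/64, and the floor value for unseen codons are assumptions chosen for the example.

```python
import math

STOP_CODONS = {"TAA", "TAG", "TGA"}

def stop_codon_positions(sequence):
    """Map each forward reading frame (0, 1, 2) to the start indices of its stop codons."""
    seq = sequence.upper()
    frames = {0: [], 1: [], 2: []}
    for i in range(len(seq) - 2):
        if seq[i:i + 3] in STOP_CODONS:
            frames[i % 3].append(i)
    return frames

def codon_log_likelihood(sequence, coding_freq, frame=0, window_codons=3,
                         background=1 / 64):
    """Sliding-window sum of log2(coding frequency / background) for one frame.

    coding_freq is a hypothetical table of codon frequencies in coding
    sequence; codons missing from it receive a small floor value.
    """
    seq = sequence.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    floor = 1e-6  # assumed frequency for codons absent from the table
    scores = [math.log2(coding_freq.get(c, floor) / background) for c in codons]
    return [sum(scores[i:i + window_codons])
            for i in range(len(scores) - window_codons + 1)]

# Toy usage with a made-up two-entry frequency table.
toy_freq = {"ATG": 1 / 64, "GCC": 1 / 32}
toy_stops = stop_codon_positions("ATGTAAGCTGA")
toy_scores = codon_log_likelihood("ATGGCCGCC", toy_freq, frame=0, window_codons=2)
```

The `window_codons` parameter stands in for the variable window size; positive window scores indicate codon usage closer to the coding table than to the background.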
Wang M., Marín A. 2006. Characterization
and prediction of alternative splice sites. Gene 366:219-227.
Seqool - A sequence analysis tool. A software package for general sequence analysis and for building pattern recognition models for finding biological signals in DNA, RNA, or proteins. Available at http://www.biossc.de/seqool/index.html.
Altextron database:
Clark F., Thanaraj T.A. 2002. Categorization
and characterization of transcript-confirmed constitutively and alternatively
spliced introns and exons from human. Human Molecular Genetics 11:
Thanaraj T.A., Stamm S., Clark F., Riethoven
J.-J., Le Texier V., Muilu J. 2004. ASD: the Alternative Splicing
Database. Nucleic Acids Res. 32 (Database issue): D64-D69.