Talavera I., Silva F., Hernández N., González R., Palau J., Ferreira M. M. C., "Application of Chemometrics Tools for Automatic Classification and Profile Extraction of DNA Samples in Forensic Tasks". Águas de Lindóia, SP, Brazil, 10-15/09/2006: 10th International Conference on Chemometrics in Analytical Chemistry (CAC-2006, CAC-X), Book of Abstracts (2006) OP09. Oral 09.
10th International Conference on Chemometrics in Analytical Chemistry OP09
Application of Chemometrics Tools for Automatic Classification and Profile Extraction of DNA Samples in Forensic Tasks
Isneri Talavera1*, Francisco Silva1, Noslén Hernández1, Ricardo González1, Juan Palau1, Márcia M. C. Ferreira2
italavera@cenatav.co.cu
1Advanced Technologies Application Center. MINBAS. Havana, Cuba.
2Instituto de Química, Universidade Estadual de Campinas, Campinas – SP, 13083-970 BRAZIL
Keywords: Classification, Support Vector Machine, Image Analysis
_____________________________________________________________________________________
DNA profiling has attracted a good deal of public attention in recent years. The practical application of DNA technology to the identification of biological material has had a significant impact on forensic biology, because it enables much stronger conclusions in genetic identity tasks [1]. During laboratory data generation, the forensic scientist conducts experiments to transform the biological samples into observable DNA data [2]. To carry out this transformation, our forensic specialists apply electrophoresis analysis on polyacrylamide gels with a silver staining reagent, and the DNA sequences are visualized as black spots on the gels. There is a standardized method to manually detect DNA spots and assign their number designations, but it is a tedious and inefficient way to do the task. In this paper an automatic solution is presented which integrates image processing and pattern recognition techniques.
After digital image acquisition and image preprocessing [3], the next step is the description of the spots. A representation using 14 boundary and region descriptors was chosen. To find out which of them are the most significant for characterizing DNA spots, a combination of PCA and a C4.5 decision tree was used. All the spots present on the polyacrylamide gel images were described automatically, using the most significant features obtained. For the profile extraction only DNA spots were useful; therefore, it was necessary to solve a two-class classification problem between DNA spots and non-DNA spots. In order to perform the classification with high speed, effectiveness and robustness, comparative classification studies among Support Vector Machine (SVM), k-NN and PLS-DA classifiers were carried out. The best results, obtained with the SVM classifier, demonstrated the advantages attributed to it in the literature as a two-class classifier [4].
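As an illustration of the two-class problem only, the following minimal sketch implements k-NN, one of the three classifiers compared. It is not the authors' implementation: the two features, their values, the labels and the function name `knn_predict` are all invented for the example, and a real study would use the selected descriptors after scaling.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors."""
    # Euclidean distance from the query to every training vector, nearest first.
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-feature spot descriptors (area, mean darkness); values invented.
train_X = [(120.0, 0.90), (110.0, 0.85), (130.0, 0.95),
           (15.0, 0.30), (20.0, 0.25), (10.0, 0.40)]
train_y = ["DNA", "DNA", "DNA", "non-DNA", "non-DNA", "non-DNA"]

print(knn_predict(train_X, train_y, (115.0, 0.88)))  # → DNA
print(knn_predict(train_X, train_y, (18.0, 0.35)))   # → non-DNA
```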
After the classification process, an image with only DNA spots was obtained. For the profile extraction, it was first necessary to determine the regions and sub-regions in the image that contain the DNA sequence patterns (STR loci). These patterns contain all the possible alleles present in a population, each with a specific number, and they can be visualized in the image as sequences of black DNA spots for each STR locus. A set of 12 STR loci is needed in order to build the profiles. To solve the task, the first step was the detection of the candidate regions according to the intensity histogram along the x axis; the second was the determination, inside these regions, of the periodic sequence of the image according to the characteristics of the patterns; and the last was the determination of the sub-regions by applying a Sequential Cluster Leader Algorithm [5].
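The leader idea behind the sub-region step can be sketched as follows. This is a generic single-pass leader clustering in one dimension, not the authors' code: the function name, the pixel coordinates and the threshold value are assumptions made for the example.

```python
def leader_clustering(values, threshold):
    """Single-pass leader clustering of 1-D values.

    Each value joins the first cluster whose leader lies within `threshold`;
    otherwise it starts a new cluster and becomes that cluster's leader.
    """
    leaders = []
    clusters = []
    for v in values:
        for i, leader in enumerate(leaders):
            if abs(v - leader) <= threshold:
                clusters[i].append(v)
                break
        else:
            # No existing leader is close enough: v becomes a new leader.
            leaders.append(v)
            clusters.append([v])
    return clusters

# Hypothetical x-coordinates (pixels) of spot centroids inside one candidate region.
xs = [12, 15, 14, 60, 58, 63, 110, 108]
print(leader_clustering(xs, threshold=10))
# → [[12, 15, 14], [60, 58, 63], [110, 108]]
```

The result depends on the order of the input and on the threshold, which is the usual trade-off of a one-pass method.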
Sometimes, as a consequence of a malfunction of the classification algorithm or of difficulties in the electrophoresis chemical process, one or more spots inside the sequence of an STR locus pattern were missing; a new algorithm was developed to restore them. The process ends with the profile extraction of the experimental DNA samples. To solve this task, the formula for the distance from a point to a straight line was applied. Each experimental spot was assigned the number of the pattern line closest to it.
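A minimal sketch of this assignment, assuming each pattern line is stored as coefficients (a, b, c) of a*x + b*y + c = 0, so that the distance from a spot (x0, y0) is |a*x0 + b*y0 + c| / sqrt(a^2 + b^2). The function names, allele numbers and coordinates are hypothetical and chosen only for illustration.

```python
import math

def point_line_distance(point, line):
    """Distance from (x0, y0) to the line a*x + b*y + c = 0."""
    x0, y0 = point
    a, b, c = line
    return abs(a * x0 + b * y0 + c) / math.hypot(a, b)

def assign_allele(spot, pattern_lines):
    """Return the allele number of the pattern line closest to `spot`."""
    return min(
        pattern_lines,
        key=lambda allele: point_line_distance(spot, pattern_lines[allele]),
    )

# Hypothetical pattern: horizontal lines y = 100, 120, 140 for alleles 8, 9, 10.
pattern_lines = {
    8: (0.0, 1.0, -100.0),
    9: (0.0, 1.0, -120.0),
    10: (0.0, 1.0, -140.0),
}
print(assign_allele((35.0, 123.0), pattern_lines))  # → 9
```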
A set of original plates was processed by the expert using the standardized manual procedure, and the results of the profile extraction were compared with the results obtained by applying the automatic method. A success rate of 97% and a significant decrease in response time indicated that this method has very good computational behavior and effectiveness, and that it provides a very useful tool to reduce the time and increase the quality of the forensic specialist's responses. Software implementing the whole method was developed.
__________________________________________________________________________________________________________________________
References
1 Gill U.; Millican E.; Oldroyd N.; Watson S.; Sparkers. Advances in Forensic Haemogenetics, 1996; 6:235.
2 Weber J.; May P. Am. J. Hum. Genet. 1989; 44:388-396.
3 Silva F.; Talavera I.; González R.; Hernández N.; Palau J.; Santiesteban M. LNCS 3773, 2005; 242-251.
4 Nianyi.; Wecong L.; Jie Y.; Guozheng L. Support Vector Machine in Chemistry. World Scientific Publishing Co. 2004.
5 Hartigan J. Clustering Algorithms. John Wiley and Sons. New York 1975.