Ph.D. Thesis
2004: awarded the Ph.D. degree by the University of Cambridge after successfully completing almost four years of research at the European Bioinformatics Institute (EBI), in the Ensembl group, under the supervision of Heikki Lehväslaiho, Liisa Holm and Michael Ashburner.
Here you can download or browse the dissertation.
No authorization is necessary to use its contents for research purposes
(although credit would be appreciated).
DNA Phonology: Investigating the Codon Space
Summary:
The main part of the thesis is concerned with large-scale studies of codon usage in
completely sequenced genomes. A new compositional analysis scheme, named codon profiling,
is presented together with a number of computational and visualisation tools. The thesis
discusses the benefits of this very general scheme and compares it with the closely
related synonymous codon usage. Codon profiling is then applied to several domains of
interest, with the aim of addressing a number of questions about the compositional
constraints of coding sequences.
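To make the distinction between a codon profile and synonymous codon usage concrete, here is a minimal Python sketch. The exact definition of codon profiling is not given in this summary, so it is assumed here to be the relative frequency of each of the 64 codons in a coding sequence; synonymous usage instead normalises each codon within its amino-acid (or stop) family. The function names (`codon_counts`, `codon_profile`, `synonymous_usage`) are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]
GENETIC_CODE = dict(zip(CODONS, AMINO_ACIDS))


def codon_counts(cds):
    """Count in-frame codons, skipping a trailing partial codon and any non-ACGT codon."""
    cds = cds.upper()
    counts = Counter()
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in GENETIC_CODE:
            counts[codon] += 1
    return counts


def codon_profile(cds):
    """Assumed profile: relative frequency of each of the 64 codons (sums to 1)."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in CODONS}


def synonymous_usage(cds):
    """Frequency of each codon among all codons encoding the same amino acid (or stop)."""
    counts = codon_counts(cds)
    family_totals = Counter()
    for codon, n in counts.items():
        family_totals[GENETIC_CODE[codon]] += n
    usage = {}
    for c in CODONS:
        family_total = family_totals[GENETIC_CODE[c]]
        usage[c] = counts[c] / family_total if family_total else 0.0
    return usage


profile = codon_profile("ATGAAAAAGAAATAA")
synonymous = synonymous_usage("ATGAAAAAGAAATAA")
# AAA is 2 of the 5 codons overall, but 2 of the 3 lysine codons.
print(profile["AAA"], synonymous["AAA"])
```

The difference lies in the normalisation: the profile retains the amino-acid composition of the encoded protein, whereas synonymous usage factors it out, so two genes encoding different proteins can share identical synonymous usage while having different profiles.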
The heterogeneity of codon usage among the coding sequences of each genome was
examined, noting the consistency of the intra-genomic distributions of codon
similarity and atypicality. These distributions provide the basis for practical
applications that exploit these properties.
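One simple way to picture intra-genomic atypicality is as the distance between each gene's codon frequencies and the pooled frequencies of its whole genome. The sketch below assumes half the L1 (total variation) distance and a z-score cutoff of 2 to flag atypical genes; both choices, the toy counts, and all function names are illustrative assumptions rather than the measures used in the thesis.

```python
from collections import Counter
from statistics import mean, pstdev


def normalise(counts):
    """Convert codon counts into relative frequencies."""
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


def distance(p, q):
    """Half the L1 (total variation) distance: 0 for identical profiles, 1 for disjoint ones."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def atypicality(genes):
    """Distance of each gene's codon frequencies from the pooled genome-wide frequencies."""
    pooled = Counter()
    for counts in genes.values():
        pooled.update(counts)
    reference = normalise(pooled)
    return {name: distance(normalise(counts), reference) for name, counts in genes.items()}


def flag_atypical(scores, z_cutoff=2.0):
    """Names of genes whose atypicality lies more than z_cutoff standard deviations above the mean."""
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return sorted(name for name, s in scores.items() if (s - mu) / sigma > z_cutoff)


# Toy genome: nine genes favour AAA, one favours AAG (hypothetical counts).
genes = {f"g{i}": {"AAA": 9, "AAG": 1} for i in range(1, 10)}
genes["g10"] = {"AAA": 1, "AAG": 9}
scores = atypicality(genes)
print(flag_atypical(scores))  # ['g10']
```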
A computationally inexpensive methodology was developed to detect horizontal gene
transfers and, for the first time, to identify their donor genomes. It exploits
measures of codon similarity, combining a compositional identification approach with a
phylogenetic verification process.
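The compositional identification step can be pictured as a nearest-profile search: a gene whose codon usage is closer to that of another genome than to that of its own host becomes a candidate transfer, and the closest genome becomes a candidate donor. This is a hypothetical simplification that uses a total variation distance as an assumed similarity measure; the genome profiles and function names are invented for illustration, and the phylogenetic verification step is not reproduced here.

```python
def distance(p, q):
    """Half the L1 (total variation) distance between two codon frequency profiles."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def candidate_donor(gene_profile, host, genome_profiles):
    """Return (genome, distance) for the genome closest to the gene if it beats the host, else None."""
    host_distance = distance(gene_profile, genome_profiles[host])
    others = [g for g in genome_profiles if g != host]
    if not others:
        return None
    best = min(others, key=lambda g: distance(gene_profile, genome_profiles[g]))
    best_distance = distance(gene_profile, genome_profiles[best])
    # Compositional evidence only: a phylogenetic check would still be needed to confirm a transfer.
    return (best, best_distance) if best_distance < host_distance else None


# Hypothetical genome-average profiles over the two lysine codons.
genome_profiles = {
    "host": {"AAA": 0.9, "AAG": 0.1},
    "donorA": {"AAA": 0.15, "AAG": 0.85},
    "donorB": {"AAA": 0.5, "AAG": 0.5},
}
print(candidate_donor({"AAA": 0.1, "AAG": 0.9}, "host", genome_profiles))
print(candidate_donor({"AAA": 0.88, "AAG": 0.12}, "host", genome_profiles))  # None
```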
The thesis also presents a detailed procedure for characterising coding sequences
with atypical codon usage, illustrated by a study of a group of human RNA-binding
proteins whose codon usage bears a striking similarity to that of some human-infecting
retroviruses.
Finally, the concept of codon usage space, the space of all possible codon usages, is
discussed. The theoretical extent of this space was calculated, and the region visited
by known biological sequences was mapped and its dimensionality computed. Comparison
with the results of several algorithms for the random generation of codon usages
quantifies the constraints imposed on biological sequences and allows the unexplored
regions of the space to be investigated and characterised.
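To illustrate what a codon usage space and a random generator of codon usages can look like, the sketch below assumes that a codon usage is the vector of 64 codon frequencies. Under that assumption the space is a 63-dimensional simplex, while synonymous codon usage under the standard genetic code (stop codons excluded) has 61 - 20 = 41 free parameters. Two hypothetical generators are compared: uniform sampling over the simplex (a flat Dirichlet, obtained by normalising exponential draws) and the codon usage of random sequences with equiprobable codons. These are not necessarily the algorithms used in the thesis, and all names are illustrative.

```python
import random
from collections import Counter

N_CODONS = 64
# Standard genetic code (NCBI table 1), codons in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# 64 non-negative frequencies summing to 1 form a simplex of dimension 64 - 1.
CODON_USAGE_DOF = N_CODONS - 1


def synonymous_degrees_of_freedom(amino_acids):
    """Free parameters of synonymous codon usage: (k - 1) per amino acid with k sense codons."""
    families = Counter(aa for aa in amino_acids if aa != "*")
    return sum(k - 1 for k in families.values())


def uniform_codon_usage(rng):
    """Codon usage drawn uniformly over the simplex (flat Dirichlet) via normalised exponentials."""
    draws = [rng.expovariate(1.0) for _ in range(N_CODONS)]
    total = sum(draws)
    return [d / total for d in draws]


def random_sequence_codon_usage(rng, n_codons):
    """Codon usage of a random sequence of n_codons codons, each codon equally likely."""
    counts = Counter(rng.randrange(N_CODONS) for _ in range(n_codons))
    return [counts[i] / n_codons for i in range(N_CODONS)]


def distance_from_centre(usage):
    """Euclidean distance from the centre of the simplex (every codon at 1/64)."""
    return sum((f - 1 / N_CODONS) ** 2 for f in usage) ** 0.5


rng = random.Random(42)
print(CODON_USAGE_DOF, synonymous_degrees_of_freedom(AMINO_ACIDS))  # 63 41
generators = [("uniform", lambda: uniform_codon_usage(rng)),
              ("random sequence", lambda: random_sequence_codon_usage(rng, 300))]
for label, sample in generators:
    spread = sum(distance_from_centre(sample()) for _ in range(1000)) / 1000
    print(label, round(spread, 4))
```

The two generators spread quite differently around the centre of the simplex (random 300-codon sequences stay closer to equal usage than uniform samples), which suggests why the choice of random model matters when quantifying how constrained biological codon usages are.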
Files: