Homeobox Genes DataBase


Data Mining


    Data mining, or Knowledge Discovery in Databases, is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data, or the search for relationships and global patterns that exist in large databases.

    The data mining process is usually broken down into selection, preprocessing, transformation, data mining, interpretation, and evaluation. The final two steps should involve visual and global representations that give an overall picture (a gestalt) of the data [Klevecz, 1999].

    A gene-expression database should ideally provide comparisons of spatial regions and patterns, and this implies that data must be mapped into a common spatial and temporal framework [Bard et al., 1998]. New visualisation techniques are essential for this.

Data Mining in the HOX Pro

    Cluster analysis of promoter regions.

Apart from the homeobox itself, homeobox genes contain other conserved sites, including non-coding regions. The number of available 300-800 bp promoter regions is comparable with the number of sequenced homeoboxes, so phylogenetic analysis of these regions is as promising as that of the 183-bp homeoboxes. Cluster analysis revealed several gene groups on the basis of sequence conservation in the promoter regions. Two compact groups are clearly seen: homologs of the Drosophila Deformed gene (Dfd) and homologs of the Sex combs reduced gene (Scr).
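
As an illustration of the general idea only, the sketch below groups sequences by shared k-mer content using single-linkage clustering. It is not the method used in HOX Pro, which is not specified here. The function names, the 3-mer Jaccard distance, the 0.5 threshold, and the toy sequences are all assumptions made for the example; the sequences are invented and are not real promoter data.

```python
from itertools import combinations


def kmer_set(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b, k=3):
    """1 - shared k-mers / all k-mers; 0.0 means identical k-mer content."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def single_linkage(seqs, threshold, k=3):
    """Merge clusters while their closest members are within `threshold`."""
    clusters = [{name} for name in seqs]
    merged = True
    while merged:
        merged = False
        for c1, c2 in combinations(clusters, 2):
            d = min(jaccard_distance(seqs[x], seqs[y], k)
                    for x in c1 for y in c2)
            if d <= threshold:
                # Replace the two clusters by their union, then rescan.
                clusters.remove(c1)
                clusters.remove(c2)
                clusters.append(c1 | c2)
                merged = True
                break
    return sorted(sorted(c) for c in clusters)


# Invented toy sequences (hypothetical names, NOT real promoter data).
toy = {
    "geneA1": "ACGTACGTACGGTTAACC",
    "geneA2": "ACGTACGTACGGTTAACG",
    "geneB1": "TTTTGGGGCCCCAAAATT",
    "geneB2": "TTTTGGGGCCCCAAAATA",
}
groups = single_linkage(toy, threshold=0.5)
print(groups)
```

With real promoter regions, an alignment-based distance would normally replace the k-mer distance used here; the k-mer version is chosen only because it needs no external libraries.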
 

    Similarity of Hox retinoic acid (RA)-responsive enhancers to Alu sequences.

We collect cases of sequence similarity between documented Hox retinoic acid response elements (RAREs) and known Alu motifs.
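
One way such candidate sites might be located is sketched below, under stated assumptions. It assumes the commonly cited RARE half-site consensus (A/G)G(G/T)TCA arranged as a direct repeat with a 1, 2, or 5 bp spacer. The function name, the spacer list, and the toy sequence are hypothetical and are not taken from the database. Only the forward strand is scanned.

```python
import re

# Assumed half-site consensus: (A/G)G(G/T)TCA.
HALF_SITE = "[AG]G[GT]TCA"


def find_direct_repeats(seq, spacers=(1, 2, 5)):
    """Return (start, spacer_length, matched_text) for each direct repeat
    of the half-site found on the forward strand."""
    seq = seq.upper()
    hits = []
    for n in spacers:
        # The lookahead lets overlapping candidates be reported as well.
        pattern = re.compile(
            f"(?=({HALF_SITE}[ACGT]{{{n}}}{HALF_SITE}))"
        )
        for m in pattern.finditer(seq):
            hits.append((m.start(), n, m.group(1)))
    return sorted(hits)


# Invented toy sequence: two half-sites separated by a 2 bp spacer.
toy_seq = "TTTAGGTCAGCAGGTCATTT"
hits = find_direct_repeats(toy_seq)
print(hits)
```

The same scan could be run over both an enhancer sequence and an Alu consensus to list positions where each carries a repeat of this form; scanning the reverse complement as well would be needed for a complete search.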