TY - JOUR
T1 - A dichotomic search algorithm for mining and learning in domain-specific logics
AU - Ferré, Sebastien
AU - King, Ross Donald
N1 - Ferré, S. and King, R. D. (2004) A dichotomic search algorithm for mining and learning in domain-specific logics. Fundamenta Informaticae. IOS Press. To appear
PY - 2004/11
Y1 - 2004/11
N2 - Abstract. Many application domains make use of specific data structures such as sequences and
graphs to represent knowledge. These data structures are ill-fitted to the standard representations
used in machine learning and data-mining algorithms: propositional representations are not expressive
enough, and first order ones are not efficient enough. In order to efficiently represent and reason
on these data structures, and the complex patterns that are related to them, we use domain-specific
logics. We show these logics can be built by the composition of logical components that model
elementary data structures. The standard strategies of top-down and bottom-up search are ill-suited
to some of these logics, and lack flexibility. We therefore introduce a dichotomic search strategy,
that is analogous to a dichotomic search in an ordered array. We prove this provides more flexibility
in the search, while retaining completeness and non-redundancy. We present a novel algorithm for
learning using domain specific logics and dichotomic search, and analyse its complexity. We also
describe two applications which illustrates the search for motifs in sequences; where these motifs
have arbitrary length and length-constrained gaps. In the first application sequences represent the
trains of the East-West challenge; in the second application they represent the secondary structure of
Yeast proteins for the discrimination of their biological functions.
AB - Abstract. Many application domains make use of specific data structures such as sequences and
graphs to represent knowledge. These data structures are ill-fitted to the standard representations
used in machine learning and data-mining algorithms: propositional representations are not expressive
enough, and first order ones are not efficient enough. In order to efficiently represent and reason
on these data structures, and the complex patterns that are related to them, we use domain-specific
logics. We show these logics can be built by the composition of logical components that model
elementary data structures. The standard strategies of top-down and bottom-up search are ill-suited
to some of these logics, and lack flexibility. We therefore introduce a dichotomic search strategy,
that is analogous to a dichotomic search in an ordered array. We prove this provides more flexibility
in the search, while retaining completeness and non-redundancy. We present a novel algorithm for
learning using domain specific logics and dichotomic search, and analyse its complexity. We also
describe two applications which illustrates the search for motifs in sequences; where these motifs
have arbitrary length and length-constrained gaps. In the first application sequences represent the
trains of the East-West challenge; in the second application they represent the secondary structure of
Yeast proteins for the discrimination of their biological functions.
M3 - Article
SN - 0169-2968
VL - 66
SP - 1
EP - 32
JO - Fundamenta Informaticae
JF - Fundamenta Informaticae
IS - 1-2
ER -