Mining parasite data using genetic programming

John Barrett, J. A. Raga, A. Kostadinova

Research output: Contribution to journal › Article › peer-review

Genetic programming is a technique that can be used to tackle the hugely demanding data-processing problems encountered in the natural sciences. Application of genetic programming to a problem using parasites as biological tags demonstrates its potential for developing explanatory models using data that are both complex and noisy.

In many areas of biology, the ability to collect data outstrips the ability to analyse it. Techniques are needed to mine large datasets and extract biologically meaningful relationships. Genetic programming (GP) is a stochastic optimization approach that helps to discover comprehensible rules for data mining. It is one of a group of supervised, evolutionary programming techniques that use Darwinian concepts to generate and optimize predictive mathematical models. This is done by mimicking ‘natural selection’ using ‘populations’ of mathematical models.

Initially, a population of n models (short computer programmes) is generated, each model representing a different, random combination of variables, constants and mathematical functions. The fitness of each model is determined (in terms of how well it solves the problem). The ‘best’ models are then selected for ‘breeding’ to produce the next generation of ‘fitter’ models, and so on until a model is evolved that solves the problem with the required degree of accuracy or until a specified stopping criterion is reached. During breeding, different parts of the models are recombined, and the mathematical functions and variables can be changed: the equivalent of crossover and mutation.

Because GP is a randomized algorithm, it is not deterministic, and each new run with a dataset evolves an independent model. Therefore, several alternative solutions to a problem can be evolved. For complex problems for which there is no single answer, each run can result in a different best model, and a validation process must then be devised to select the most appropriate one.
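The loop described above (random initial population, fitness evaluation, selection, crossover and mutation, stopping criterion) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy dataset `DATA`, the function set, tournament selection, elitism, the size cap `max_nodes` and all parameter values are assumptions chosen only to keep the example short.

```python
import operator
import random

# Assumed function set; the source does not specify one.
FUNCTIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

# Hypothetical toy dataset: y = x*x + x sampled on a small grid.
DATA = [(x, x * x + x) for x in (i / 4 for i in range(-8, 9))]

def random_tree(rng, depth=3):
    """Random model: a variable, a constant, or a function of two sub-models."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else round(rng.uniform(-2, 2), 2)
    name = rng.choice(sorted(FUNCTIONS))
    return (name, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x):
    """Compute the value of a model for one input x."""
    if isinstance(tree, tuple):
        name, left, right = tree
        return FUNCTIONS[name](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree  # variable or numeric constant

def error(tree, data):
    """Mean squared error over the data; lower means fitter."""
    total = 0.0
    for x, y in data:
        diff = evaluate(tree, x) - y
        total += diff * diff
    return total / len(data)

def subtree_paths(tree, path=()):
    """Paths (tuples of child indices) to every node in the tree."""
    paths = [path]
    if isinstance(tree, tuple):
        for i in (1, 2):
            paths += subtree_paths(tree[i], path + (i,))
    return paths

def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace_subtree(tree[path[0]], path[1:], new)
    return tuple(parts)

def crossover(rng, a, b):
    """Graft a random part of model b into a random position of model a."""
    donor = get_subtree(b, rng.choice(subtree_paths(b)))
    return replace_subtree(a, rng.choice(subtree_paths(a)), donor)

def mutate(rng, tree):
    """Replace a random part of the model with a fresh random sub-model."""
    return replace_subtree(tree, rng.choice(subtree_paths(tree)), random_tree(rng, 2))

def tournament(rng, scored, k=3):
    """Pick the fittest of k randomly drawn (error, model) pairs."""
    return min(rng.sample(scored, k), key=lambda pair: pair[0])[1]

def evolve(data, seed, pop_size=60, generations=30, target_error=1e-6, max_nodes=50):
    rng = random.Random(seed)
    population = [random_tree(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(((error(t, data), t) for t in population), key=lambda p: p[0])
        best_error, best = scored[0]
        if best_error <= target_error:  # required accuracy reached
            break
        next_gen = [best]  # elitism: keep the current best model unchanged
        while len(next_gen) < pop_size:
            child = crossover(rng, tournament(rng, scored), tournament(rng, scored))
            if rng.random() < 0.2:
                child = mutate(rng, child)
            if len(subtree_paths(child)) <= max_nodes:  # discard oversized models
                next_gen.append(child)
        population = next_gen
    return min(population, key=lambda t: error(t, data))

if __name__ == "__main__":
    for seed in (1, 2, 3):  # independent runs may evolve different models
        model = evolve(DATA, seed)
        print(seed, error(model, DATA), model)
```

Running the sketch with several seeds mirrors the point about non-determinism: each run is an independent search, so the resulting models would still need to be compared on held-out data before one is chosen.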
Original language: English
Pages (from-to): 207-209
Number of pages: 3
Publication status: Published - 2005

