Abstract
The expressive power, powerful search capability, and the explicit nature of the resulting models make evolutionary methods very attractive for supervised learning applications in bioinformatics. However, their characteristics also make them highly susceptible to overtraining or to discovering chance relationships in the data. Identification of appropriate criteria for terminating evolution and for selecting an appropriately validated model is vital. Some approaches that are commonly applied to other modelling methods are not necessarily applicable in a straightforward manner to evolutionary methods. An approach to model selection is presented that is not unduly computationally intensive. To illustrate the issues and the technique, two bioinformatic datasets are used, one relating to metabolite determination and the other to disease prediction from gene expression data.
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 187-196 |
Number of pages | 10 |
Journal | BioSystems |
Volume | 72 |
Issue number | 1-2 |
Publication status | Published - Nov 2003 |
Keywords
- validation
- genetic programming
- gene expression
- model selection
- generalisation