Abstract
We address the effect of spatial scale and temporal variation on model generality when building predictive models for fish assignment by applying a new data mining approach, Random Forests (RF), to variable biological markers (parasite community data). Models were implemented for a fish host-parasite system sampled along the Mediterranean and Atlantic coasts of Spain and were validated using independent datasets. In evaluating the importance of variation in parasite infracommunities for assigning individual fish to their populations of origin, we considered 2 basic classification problems: a multiclass task (2–5 population models, using 2 seasonal replicates from each population) and a 2-class task (using 4 seasonal replicates from each of 1 Atlantic and 1 Mediterranean population). The main results are that (i) RF are well suited for multiclass population assignment using parasite communities in non-migratory fish; (ii) RF provide an efficient means of cross-validating models on the baseline data, which allows sample size limitations in parasite tag studies to be tackled effectively; (iii) the performance of RF depends on the complexity and the spatial extent/configuration of the problem; and (iv) the development of predictive models is strongly influenced by seasonal change, which stresses the importance of both temporal replication and model validation in parasite tagging studies.
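The abstract relies on Random Forests and their built-in cross-validation without showing the mechanics. As a minimal sketch, assuming Breiman-style bagged CART trees with a random feature subset at every split (depth-limited here for brevity) and using out-of-bag (OOB) votes as the internal cross-validation estimate, the pure-Python code below shows how individual fish could be assigned to populations from parasite abundance vectors. The data are synthetic and hypothetical (two made-up populations, four made-up parasite taxa), and the function names and parameters (`n_trees`, `mtry`, `max_depth`) are illustrative choices, not the settings or implementation used in the study.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, mtry, max_depth, min_size, rng, depth=0):
    """Grow a CART-style tree; each split considers only `mtry` randomly chosen features."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(y) <= min_size or len(set(y)) == 1:
        return majority  # leaf node: a class label
    best = None  # (weighted Gini, feature index, threshold)
    for f in rng.sample(range(len(X[0])), mtry):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return majority  # no valid split among the sampled features
    _, f, t = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          mtry, max_depth, min_size, rng, depth + 1)

    left_idx = [i for i, row in enumerate(X) if row[f] <= t]
    right_idx = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t, grow(left_idx), grow(right_idx))  # internal node


def predict_tree(node, row):
    """Follow splits down to a leaf and return its class label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def random_forest(X, y, n_trees=100, mtry=None, max_depth=6, min_size=2, seed=0):
    """Bagged trees plus the out-of-bag (OOB) misclassification rate."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mtry = mtry or max(1, int(p ** 0.5))  # common default: sqrt(number of features)
    trees, oob_votes = [], [Counter() for _ in range(n)]
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of fish
        tree = build_tree([X[i] for i in idx], [y[i] for i in idx],
                          mtry, max_depth, min_size, rng)
        trees.append(tree)
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:  # fish not used to grow this tree
                oob_votes[i][predict_tree(tree, X[i])] += 1
    scored = [(v.most_common(1)[0][0], y[i]) for i, v in enumerate(oob_votes) if v]
    oob_error = sum(pred != true for pred, true in scored) / len(scored)
    return trees, oob_error


def predict_forest(trees, row):
    """Assign a fish to the population that receives the most tree votes."""
    return Counter(predict_tree(t, row) for t in trees).most_common(1)[0][0]


def synthetic_fish(pop, means, n, rng):
    """HYPOTHETICAL data: per-taxon parasite counts scattered around population means."""
    X = [[max(0, round(rng.gauss(m, 1.0))) for m in means] for _ in range(n)]
    return X, [pop] * n


# Two made-up populations and four made-up parasite taxa (not data from the study).
rng = random.Random(42)
means = {"Atlantic": [5, 0, 3, 1], "Mediterranean": [1, 4, 3, 1]}
X_train, y_train, X_test, y_test = [], [], [], []
for pop, mu in means.items():
    Xa, ya = synthetic_fish(pop, mu, 30, rng)  # baseline sample
    Xb, yb = synthetic_fish(pop, mu, 20, rng)  # independent validation sample
    X_train += Xa
    y_train += ya
    X_test += Xb
    y_test += yb

trees, oob_error = random_forest(X_train, y_train, n_trees=100, seed=1)
accuracy = sum(predict_forest(trees, r) == t for r, t in zip(X_test, y_test)) / len(y_test)
print(f"OOB error: {oob_error:.2f}; independent-set accuracy: {accuracy:.2f}")
```

Each fish is scored only by trees whose bootstrap sample excluded it, so the OOB error approximates held-out error without setting aside part of a small baseline sample; the separate test set stands in for the independent validation data. A real analysis would more likely use an established implementation, such as R's randomForest package or scikit-learn's `RandomForestClassifier`.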
| Original language | English |
|---|---|
| Pages (from-to) | 1833-1847 |
| Number of pages | 15 |
| Journal | Parasitology |
| Volume | 137 |
| Issue number | 12 |
| Early online date | 06 Jul 2010 |
| Publication status | Published - 01 Oct 2010 |
Keywords
- predictive models
- random forests
- fish population discrimination
- parasites as tags
- Boops boops
- Mediterranean
- north-east Atlantic