TY - JOUR
T1 - Random forests, a novel approach for discrimination of fish populations using parasites as biological tags
AU - Perdiguero-Alonso, Diana
AU - Montero, Francisco E.
AU - Kostadinova, Aneta
AU - Raga, Juan Antonio
AU - Barrett, John
N1 - Perdiguero-Alonso, D., Montero, F. E., Kostadinova, A., Raga, J. A., Barrett, J. (2008). Random forests, a novel approach for discrimination of fish populations using parasites as biological tags. International Journal for Parasitology, 38, pp. 1425-1434
PY - 2008
Y1 - 2008
N2 - Due to the complexity of host–parasite relationships, discrimination between fish populations using parasites as biological tags is difficult. This study introduces, to our knowledge for the first time, random forests (RF) as a new modelling technique in the application of parasite community data as biological markers for population assignment of fish. This novel approach is applied to a dataset with a complex structure comprising 763 parasite infracommunities in population samples of Atlantic cod, Gadus morhua, from the spawning/feeding areas in five regions in the North East Atlantic (Baltic, Celtic, Irish and North seas and Icelandic waters). The learning behaviour of RF is evaluated in comparison with two other algorithms applied to class assignment problems, the linear discriminant function analysis (LDA) and artificial neural networks (ANN). The three algorithms are used to develop predictive models applying three cross-validation procedures in a series of experiments (252 models in total). The comparative approach to RF, LDA and ANN algorithms applied to the same datasets demonstrates the competitive potential of RF for developing predictive models since RF exhibited better accuracy of prediction and outperformed LDA and ANN in the assignment of fish to their regions of sampling using parasite community data. The comparative analyses and the validation experiment with a ‘blind’ sample confirmed that RF models performed more effectively with a large and diverse training set and a large number of variables. The discrimination results obtained for a migratory fish species with largely overlapping parasite communities reflect the high potential of RF for developing predictive models using data that are both complex and noisy, and indicate that it is a promising tool for parasite tag studies. Our results suggest that parasite community data can be used successfully to discriminate individual cod from the five different regions of the North East Atlantic studied using RF.
AB - Due to the complexity of host–parasite relationships, discrimination between fish populations using parasites as biological tags is difficult. This study introduces, to our knowledge for the first time, random forests (RF) as a new modelling technique in the application of parasite community data as biological markers for population assignment of fish. This novel approach is applied to a dataset with a complex structure comprising 763 parasite infracommunities in population samples of Atlantic cod, Gadus morhua, from the spawning/feeding areas in five regions in the North East Atlantic (Baltic, Celtic, Irish and North seas and Icelandic waters). The learning behaviour of RF is evaluated in comparison with two other algorithms applied to class assignment problems, the linear discriminant function analysis (LDA) and artificial neural networks (ANN). The three algorithms are used to develop predictive models applying three cross-validation procedures in a series of experiments (252 models in total). The comparative approach to RF, LDA and ANN algorithms applied to the same datasets demonstrates the competitive potential of RF for developing predictive models since RF exhibited better accuracy of prediction and outperformed LDA and ANN in the assignment of fish to their regions of sampling using parasite community data. The comparative analyses and the validation experiment with a ‘blind’ sample confirmed that RF models performed more effectively with a large and diverse training set and a large number of variables. The discrimination results obtained for a migratory fish species with largely overlapping parasite communities reflect the high potential of RF for developing predictive models using data that are both complex and noisy, and indicate that it is a promising tool for parasite tag studies. Our results suggest that parasite community data can be used successfully to discriminate individual cod from the five different regions of the North East Atlantic studied using RF.
KW - Random forests
KW - Classification algorithms
KW - Fish population discrimination
KW - Parasite communities
KW - Atlantic cod
KW - Gadus morhua
KW - North East Atlantic
U2 - 10.1016/j.ijpara.2008.04.007
DO - 10.1016/j.ijpara.2008.04.007
M3 - Article
C2 - 18571175
VL - 38
SP - 1425
EP - 1434
JO - International Journal for Parasitology
JF - International Journal for Parasitology
ER -