TY - JOUR
T1 - P-found: Grid-enabling distributed repositories of protein folding and unfolding simulations for data mining
AU - Swain, Martin Thomas
AU - Silva, Cândida G.
AU - Loureiro-Ferreira, Nuno
AU - Ostropytskyy, Vitaliy
AU - Brito, João
AU - Riche, Olivier
AU - Stahl, Frederick
AU - Dubitzky, Werner
AU - Brito, Rui M.M.
PY - 2010/3
Y1 - 2010/3
N2 - The P-found protein folding and unfolding simulation repository is designed to allow scientists to perform data mining and other analyses across large, distributed simulation data sets. There are two storage components in P-found: a primary repository of simulation data, and a data warehouse that is populated from the primary repository and contains important molecular properties. These properties may be used for data mining studies. Here we demonstrate how grid technologies can support multiple, distributed P-found installations. In particular, we look at two aspects: firstly, how grid data management technologies can be used to access the distributed data warehouses; and secondly, how the grid can be used to transfer analysis programs to the primary repositories. The latter is an important and challenging aspect of P-found because of the large data volumes involved and the desire of scientists to maintain control of their own data. The grid technologies we are developing with the P-found system will allow new large data sets of protein folding simulations to be accessed and analysed in novel ways, with significant potential for enabling scientific discovery.
KW - Data mining
KW - Distributed systems
KW - Service-oriented architecture
KW - Grid
UR - http://hdl.handle.net/2160/12378
U2 - 10.1016/j.future.2009.08.008
DO - 10.1016/j.future.2009.08.008
M3 - Article
SN - 0167-739X
VL - 26
SP - 424
EP - 433
JO - Future Generation Computer Systems
JF - Future Generation Computer Systems
IS - 3
ER -