TY - GEN
T1 - Parallel computing TEDA for high frequency streaming data clustering
AU - Gu, Xiaowei
AU - Angelov, Plamen Parvanov
AU - Gutierrez, German
AU - Iglesias, Jose Antonio
AU - Sanchi, Araceli
N1 - Publisher Copyright:
© Springer International Publishing AG 2017.
PY - 2016/10/23
Y1 - 2016/10/23
AB - In this paper, a novel online clustering approach called Parallel_TEDA is introduced for processing high frequency streaming data. This newly proposed approach is developed within the recently introduced TEDA theory and inherits all of its advantages. In the proposed approach, a number of data stream processors collaborate efficiently with each other to achieve parallel computation and a much higher processing speed. A fusion center gathers the key information from the processors, each of which works on a chunk of the whole data stream, and generates the overall output. The quality of the generated clusters is monitored continuously within the data processors, and stale clusters are removed to ensure the correctness and timeliness of the overall clustering results. This, in turn, gives the proposed approach a stronger ability to handle shifts/drifts that may take place in live data streams. Numerical experiments with Parallel_TEDA on benchmark datasets show higher performance and faster processing speed compared with alternative well-known approaches. The processing time has been demonstrated to fall exponentially as more data processors are involved. This new online clustering approach is well suited to, and promising for, real-time processing and analytics of high frequency streaming data.
KW - Clustering
KW - High frequency streaming data
KW - Parallel computation
KW - Real time
KW - TEDA
UR - http://www.research.lancs.ac.uk/portal/en/publications/parallel-computing-teda-for-high-frequency-streaming-data-clustering(435cc049-019a-4284-964c-3ecaaff36030).html
UR - http://www.scopus.com/inward/record.url?scp=84994529004&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-47898-2_25
DO - 10.1007/978-3-319-47898-2_25
M3 - Conference Proceeding (Non-Journal item)
SN - 9783319478982
SN - 9783319478975
T3 - Advances in Intelligent Systems and Computing
SP - 238
EP - 253
BT - Advances in Big Data
A2 - Roy, Asim
A2 - Vellasco, Marley
A2 - Manolopoulos, Yannis
A2 - Iliadis, Lazaros
A2 - Angelov, Plamen
ER -