Projects per year
Abstract
String factorization is an important tool for partitioning data for parallel processing and other algorithmic techniques often found in the context of big data applications such as bioinformatics or compression. Duval's well-known algorithm uniquely factors a string over an ordered alphabet into Lyndon words, i.e., patterned strings which are strictly smaller than all of their cyclic rotations. While Duval's algorithm produces a pre-determined factorization, modern applications motivate the demand for factorizations with specific properties, e.g., those that minimize the number of factors or consist of factors with similar lengths. In this paper, we consider the problem of finding an alphabet ordering that yields a Lyndon factorization with such properties. We introduce a flexible evolutionary framework and evaluate it on biological sequence data. For the minimization case, we also propose a new problem-specific heuristic, Flexi-Duval, and a problem-specific mutation operator for Lyndon factorization. Our results show that our framework is competitive with Flexi-Duval for minimization and yields high quality and robust solutions for balancing where no problem-specific algorithm is available.
Original language | English |
---|---|
Title of host publication | Parallel Problem Solving from Nature – PPSN XVI - 16th International Conference, PPSN 2020, Proceedings |
Subtitle of host publication | 16th International Conference, Leiden, The Netherlands, September 5-9, 2020, Proceedings |
Editors | Thomas Bäck, Mike Preuss, André Deutz, Michael Emmerich, Hao Wang, Carola Doerr, Heike Trautmann |
Publisher | Springer Nature |
Pages | 390-403 |
Number of pages | 14 |
ISBN (Electronic) | 9783030581121 |
ISBN (Print) | 9783030581114 |
DOIs | |
Publication status | Published - 31 Aug 2020 |
Event | Parallel Problem Solving Nature - Leiden, Netherlands Duration: 05 Sept 2020 → 09 Sept 2020 Conference number: 16 https://ppsn2020.liacs.leidenuniv.nl |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 12269 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | Parallel Problem Solving Nature |
---|---|
Abbreviated title | PPSN |
Country/Territory | Netherlands |
City | Leiden |
Period | 05 Sept 2020 → 09 Sept 2020 |
Internet address |
Keywords
- Alphabet ordering
- Biosequences
- Duval’s algorithm
- Lyndon words
- Permutations
- String factorization
Fingerprint
Dive into the research topics of 'Evaluation of a Permutation-Based Evolutionary Framework for Lyndon Factorizations'. Together they form a unique fingerprint.Datasets
-
Supporting Information for Evaluation of a Permutation-Based Evolutionary Framework for Lyndon Factorizations - Data
Major, J. A., Clare, A., Daykin, J., Mora, B., Peña Gamboa, L. & Zarges, C., Prifysgol Aberystwyth | Aberystwyth University, 05 Sept 2020
DOI: 10.20391/e74e16cc-aa91-41b2-99d7-04465c23cf0a
Dataset
File -
Supporting Information for Evaluation of a Permutation-Based Evolutionary Framework for Lyndon Factorizations - Code
Major, L., Clare, A., Daykin, J., Mora, B., Peña Gamboa, L. & Zarges, C., Prifysgol Aberystwyth | Aberystwyth University, 05 Sept 2020
DOI: 10.5281/zenodo.3896389, https://github.com/jam86/Evaluation-of-a-Permutation-Based-Evolutionary-Framework-for-Lyndon-Factorizations
Dataset
Projects
- 1 Finished
-
SeqInfo : Big Data Stringology Algorithms for 2nd and 3rd Generation Sequencing - Dr Jacqueline Daykin - EU/WG
Clare, A. (PI)
15 Jan 2018 → 07 Jan 2021
Project: Externally funded research