Improving Transcription Factor Binding Site Predictions by Using Randomised Negative Examples

F. Rezwan, Yi Sun, Neil Davey, Rod Adams, Alistair G. Rust, Mark Robinson

Research output: Chapter in Book/Report/Conference proceeding (Conference Proceeding, Non-Journal item)

Abstract

It is known that much of the genetic change underlying morphological evolution takes place in cis-regulatory regions, rather than in the coding regions of genes. Identifying the binding sites in these regions of a genome is a non-trivial problem. Experimental methods for finding binding sites exist, but they are limited in their applicability, accuracy, availability or cost. Computational prediction algorithms, on the other hand, perform rather poorly. The aim of this research is to develop and improve computational approaches for the prediction of transcription factor binding sites (TFBSs) by integrating the results of computational algorithms and other sources of complementary biological evidence, with particular emphasis on the use of the Support Vector Machine (SVM). Data from two organisms, yeast and mouse, were used in this study. The initial results were not particularly encouraging, as the predictions were still of low quality. However, when the vectors labelled as non-binding sites in the training set were replaced by randomised training vectors, a significant improvement in performance was observed: a substantial improvement for the yeast data and an even greater improvement for the mouse data. In fact, the resulting classifier found over 80% of the binding sites in the test set and, moreover, 80% of its predictions were correct.
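The two ideas in the abstract that lend themselves to a small illustration are the replacement of negative training vectors and the two reported figures, which correspond to recall (the fraction of true binding sites found) and precision (the fraction of predictions that are correct). The following is a minimal sketch only, not the authors' implementation. The function names `randomise_negatives` and `recall_precision` are hypothetical, the toy data are invented for illustration, and drawing replacement values uniformly from [0, 1) is an assumption, since the abstract does not say how the randomised vectors were generated.

```python
import random


def randomise_negatives(vectors, labels, seed=0):
    """Return a copy of the training vectors in which every vector
    labelled 0 (non-binding) is replaced by a random vector of the
    same length. Vectors labelled 1 (binding) are kept unchanged.

    Assumption: replacement values are sampled uniformly from [0, 1);
    the abstract does not specify the sampling scheme.
    """
    rng = random.Random(seed)
    result = []
    for vec, label in zip(vectors, labels):
        if label == 1:
            result.append(list(vec))
        else:
            result.append([rng.random() for _ in vec])
    return result


def recall_precision(true_labels, predicted_labels):
    """Compute recall and precision for the positive class (label 1)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


# Invented toy training set: each vector stands in for the scores of
# several prediction algorithms at one sequence position.
train_x = [[0.9, 0.8, 0.7], [0.1, 0.2, 0.1], [0.8, 0.9, 0.6], [0.2, 0.1, 0.3]]
train_y = [1, 0, 1, 0]
new_x = randomise_negatives(train_x, train_y, seed=42)

# Invented test labels and predictions: 4 of 5 true sites are found,
# and 4 of the 5 positive predictions are correct.
truth = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
preds = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
recall, precision = recall_precision(truth, preds)
print(recall, precision)
```

In a full pipeline the vectors returned by `randomise_negatives` would be passed, together with the unchanged labels, to an SVM trainer; that step is omitted here to keep the sketch free of external dependencies.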

Original language: English
Title of host publication: Information Processing in Cells and Tissues - 9th International Conference, IPCAT 2012, Proceedings
Subtitle of host publication: 9th International Conference, IPCAT 2012, Cambridge, UK, March 31 -- April 2, 2012, Proceedings
Editors: Michael A. Lones, Stephen L. Smith, Sarah Teichmann, Felix Naef, James A. Walker, Martin A. Trefzer
Publisher: Springer Nature
Pages: 225-237
Number of pages: 13
ISBN (Electronic): 978-3-642-28792-3
ISBN (Print): 978-3-642-28791-6
Publication status: Published - 05 Apr 2012
Externally published: Yes

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7223 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349
