MeDReaders: a database for transcription factors that bind to methylated DNA

To whom correspondence should be addressed. Tel: +86 139 4609 4199; Fax: +86 451 8641 3309; Email: ghwang@hit.edu.cn. Correspondence may also be addressed to Yadong Wang. Tel: +86 186 4511 8639; Fax: +86 451 8641 3309; Email: ydwang@hit.edu.cn

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China

Jianan Wang

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China

Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA

Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA

Department of Pharmacology and Molecular Sciences, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA

Jiang Qian

The Wilmer Eye Institute, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA

Yadong Wang

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China

Nucleic Acids Research, Volume 46, Issue D1, 4 January 2018, Pages D146–D151, https://doi.org/10.1093/nar/gkx1096

Received: 15 August 2017; Revision received: 16 October 2017; Accepted: 21 October 2017; Published: 14 November 2017

Guohua Wang, Ximei Luo, Jianan Wang, Jun Wan, Shuli Xia, Heng Zhu, Jiang Qian, Yadong Wang, MeDReaders: a database for transcription factors that bind to methylated DNA, Nucleic Acids Research, Volume 46, Issue D1, 4 January 2018, Pages D146–D151, https://doi.org/10.1093/nar/gkx1096

Abstract

Understanding the molecular principles governing interactions between transcription factors (TFs) and their DNA targets is a central problem in transcriptional regulation. Emerging evidence has demonstrated that some TFs can bind to DNA motifs containing highly methylated CpGs both in vitro and in vivo. Identifying such TFs and elucidating their physiological roles have become important stepping stones toward understanding the mechanisms underlying methylation-mediated biological processes, which have crucial implications for human disease and its development. Hence, we constructed a database, named MeDReaders, to collect information about methylated DNA binding activities. A total of 731 TFs that can bind to methylated DNA sequences were manually curated from human and mouse studies reported in the literature. In silico approaches were applied to predict methylated and unmethylated motifs of 292 TFs by integrating whole genome bisulfite sequencing (WGBS) and ChIP-Seq datasets in six human cell lines and one mouse cell line extracted from the ENCODE and GEO databases. MeDReaders provides a comprehensive resource for further studies and aids the design of related experiments. The database gives users unified access to most TFs involved in such methylation-associated binding activities. The website is available at http://medreader.org/.

INTRODUCTION

During gene transcription, cooperative interactions between transcription factors (TFs) and DNA methylation play an important role in regulating gene expression. The classical view of TF–DNA interaction is that TFs usually bind to non-methylated DNA motifs in open chromatin regions, whereas high levels of methylation at CpG dinucleotides (mCpG) in cis-regulatory elements prevent the recruitment of TFs, with the exception of a few proteins that carry an mCpG-binding domain (MBD), including MeCP2, MBD1, MBD2 and MBD4. These MBD proteins are known to recognize methylated DNA in a sequence-independent manner (1, 2). However, sporadic earlier studies found that several TFs without MBDs also interact with methylated DNA. For example, the transcription factors KLF4 (3), Kaiso (4), ZFP57 (5) and CEBPα (6) were shown to bind distinct methylated DNA sequences with high affinity. More recently, systematic efforts have revealed that hundreds of TFs can specifically bind to methylated DNA, using tandem mass spectrometry (7), functional protein microarrays (3), DNA microarrays (8), systematic evolution of ligands by exponential enrichment (SELEX) (9) and ChIP-BS-seq (10). Identifying such TFs and elucidating their functions are important stepping stones towards understanding the mechanisms underlying these methylation-mediated biological processes, with crucial implications for human diseases, including cancer.

Over the past 30 years, many databases have been constructed to archive information on TF binding sites, providing invaluable resources for the transcription community and beyond. For instance, TRANSFAC (11), JASPAR (12) and UniPROBE (13) are among the most commonly used databases, containing hundreds of TF position weight matrices (PWMs) constructed from DNA binding sequences. These PWMs can be used to search for and predict potential TF binding sites across the whole genome. TF regulatory activity is also known to be species-dependent. Hence, many species-specific TF databases have been created, such as PlantTFDB for plants (14), AnimalTFDB for animals (15) and ITFP for human, mouse and rat (16). Some databases, such as TFBSshape (17), not only contain extensive nucleotide sequences of TF binding sites, but also calculate DNA structural features from the nucleotide sequences provided by motif databases. Unfortunately, none of these databases records methylated DNA binding sites for TFs.
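To illustrate how a PWM can be used to search a sequence for candidate binding sites, the following minimal sketch converts a count matrix into log-odds scores and scores every window of a sequence. The count matrix, the uniform background frequency of 0.25, the pseudocount and the function names are assumptions made for illustration only; they are not taken from TRANSFAC, JASPAR, UniPROBE or MeDReaders.

```python
import math

# Hypothetical 4-position count matrix (one dict per motif position).
# The counts are invented for illustration, not taken from any database.
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 1, "C": 1, "G": 0, "T": 8},
]


def counts_to_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts to log2-odds scores.

    Assumes a uniform background; the pseudocount avoids log(0).
    """
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm):
    """Score every window of len(pwm) bases; return (offset, score) pairs.

    The sequence must contain only uppercase A, C, G and T.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits


def best_hit(sequence, pwm):
    """Return the (offset, score) pair with the highest score."""
    return max(scan(sequence, pwm), key=lambda hit: hit[1])


pwm = counts_to_pwm(COUNTS)
offset, score = best_hit("TTACGTTT", pwm)
print(offset, round(score, 2))
```

A standard PWM of this kind treats cytosine and 5-methylcytosine identically, which is why motif matrices alone cannot represent methylation-dependent binding.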

With the advance of next-generation sequencing technologies, DNA methylation sites can be determined at single-base-pair resolution. A number of systematic DNA methylation databases have been developed for epigenetic studies. As the first DNA methylation database, MethDB stores DNA methylation data and gene expression information (18). NGSMethDB archives DNA methylation profiles generated by bisulfite sequencing (19). MethBank (20), MethyCancer (21) and MENT (22) focus on DNA methylation status in specific biological contexts, such as embryonic development and various cancers. MethSMRT hosts DNA N6-methyladenine and N4-methylcytosine methylomes (23). The ENCODE database also contains many whole genome bisulfite sequencing (WGBS) and ChIP-Seq datasets obtained from many cell lines. Together, these databases provide a large number of profiles, including TF binding sequences and the corresponding DNA methylation status. However, none of the existing databases systematically documents the interactions between TFs and methylated DNA sequences.

To fill this gap and help researchers better understand the interactions between DNA methylation and TFs, we collected information about methylated DNA–TF interactions from two major public sources: the published literature and the ENCODE database. We developed a database, named MeDReaders, in which 753 methylated DNA–TF interactions involving 731 TFs were manually curated from the literature. A total of 292 TFs were predicted to bind to distinct methylated and unmethylated DNA motifs based on the integration of WGBS and ChIP-Seq data in six human cell lines and one mouse cell line extracted from the ENCODE and GEO databases. MeDReaders can help scientists compare methylated DNA binding activities across species and datasets, and further understand the biological processes that are mediated by DNA methylation. MeDReaders is publicly available at http://medreader.org/ without restriction on use.

MATERIALS AND METHODS

Data sources

To extract experimentally confirmed methylated DNA–TF interactions from the published literature, we first searched the PubMed literature database for all relevant papers. CEBPα (3, 6), ZFP57/KAP1 (5, 24), ZBTB33 (4) and CEBPB/ATF4 (25) were found to interact with methylated DNA in EMSA or ChIP-BS-seq experiments. Hundreds of TFs were found to prefer CpG-methylated sequences by high-throughput technologies, such as tandem mass spectrometry (MS/MS) (26, 27), protein microarrays (3) and methylation-sensitive SELEX (9). In total, we manually curated 753 methylated DNA–TF interactions involving 731 TFs from four human cell lines/tissues and four mouse cell lines/tissues (Table 1). However, the retrieved records differ in content because the individual experiments used different methods. For example, in vitro SELEX experiments yield TF binding motifs rather than binding sequences, whereas protein arrays yield protein-bound DNA sequences, from which methylated binding motif logos can be retrieved for only a few specific TFs.
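Because records from different experimental methods carry different kinds of evidence, a curated interaction can be modelled as a record with optional fields. The sketch below is only one possible representation: the class name, field names and placeholder values are hypothetical and do not describe the actual MeDReaders schema or its contents.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CuratedInteraction:
    """One literature-derived methylated DNA-TF interaction (hypothetical schema)."""
    tf: str
    species: str
    method: str
    motif: Optional[str] = None                         # reported motif, if any
    sequences: List[str] = field(default_factory=list)  # bound sequences, if any

    def evidence_kind(self) -> str:
        """Describe which kind of binding evidence the record carries."""
        if self.motif and self.sequences:
            return "motif+sequences"
        if self.motif:
            return "motif"
        if self.sequences:
            return "sequences"
        return "interaction-only"


def summarize(records):
    """Return (number of interactions, number of distinct TFs)."""
    return len(records), len({r.tf for r in records})


# Placeholder records; the names and values are not real database entries.
records = [
    CuratedInteraction("TF_A", "human", "SELEX", motif="PLACEHOLDER_MOTIF"),
    CuratedInteraction("TF_B", "human", "protein microarray",
                       sequences=["PLACEHOLDER_SEQ"]),
    CuratedInteraction("TF_B", "mouse", "MS/MS"),
]

for record in records:
    print(record.tf, record.method, record.evidence_kind())
print(summarize(records))
```

Counting interactions and distinct TFs separately reflects the fact that the number of curated interactions (753) exceeds the number of TFs (731): a TF can appear in more than one record.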