Motivation: High-throughput sequencing methods produce massive amounts of data. The most common first step in interpreting these data is to map them to genomic intervals and then overlap the intervals with genome annotations. A major interest in computational genomics is spatial genome-wide correlation among genomic features (e.g. between transcription and histone modification). The key hypothesis here is that features that are similarly distributed along a genome may be functionally related.
Results: Here, we propose a method that rapidly estimates genome-wide correlation of genomic annotations; these annotations can be derived from high-throughput experiments, databases, or other means. The method goes far beyond the simple overlap and proximity tests that are commonly used: it enables correlation of continuous data, so the loss of data that occurs upon reduction to intervals is unnecessary. To include analysis of nonoverlapping but spatially related features, we use kernel correlation. Our implementation of the method performs correlation analysis of two or three profiles across the human genome in a few minutes on a personal computer. Another novel and extraordinarily powerful feature of our approach is the local correlation track output, which enables overlap with other correlations (correlation of correlations). We applied our method to datasets from the Human Epigenome Atlas and FANTOM CAGE. We observed changes in the correlation between epigenomic features across developmental trajectories of several tissue types, and found an unexpectedly strong spatial correlation of CAGE clusters with splice donor sites and with poly(A) sites.
Please refer to our bioRxiv preprint for details of the method and for examples.
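To give a feel for why kernel correlation picks up nonoverlapping but nearby features, here is a minimal sketch in plain Python. It is not the StereoGene implementation: it assumes profiles already binned into equal-width bins, a Gaussian kernel, and a Pearson coefficient computed on the smoothed profiles. The function names and the default kernel width are illustrative assumptions.

```python
import math


def gaussian_kernel(sigma_bins):
    """Normalized Gaussian weights spanning +/- 4 sigma (width given in bins)."""
    half = int(math.ceil(4 * sigma_bins))
    weights = [math.exp(-0.5 * (i / sigma_bins) ** 2) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(profile, kernel):
    """Spread each non-empty bin over its neighbours according to the kernel."""
    half = len(kernel) // 2
    n = len(profile)
    out = [0.0] * n
    for i, value in enumerate(profile):
        if value == 0:
            continue  # sparse profiles: nothing to spread from empty bins
        for j, w in enumerate(kernel):
            pos = i + j - half
            if 0 <= pos < n:
                out[pos] += value * w
    return out


def pearson(a, b):
    """Plain Pearson correlation; returns NaN if either profile is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


def kernel_correlation(f, g, sigma_bins=5.0):
    """Correlate two equally binned profiles after Gaussian smoothing."""
    kernel = gaussian_kernel(sigma_bins)
    return pearson(smooth(f, kernel), smooth(g, kernel))


# Toy example: peaks in g sit 3 bins to the right of peaks in f.
f = [0.0] * 1000
g = [0.0] * 1000
for peak in (100, 300, 500):
    f[peak] = 1.0
    g[peak + 3] = 1.0

print("plain correlation: ", pearson(f, g))
print("kernel correlation:", kernel_correlation(f, g))
```

In this toy example the two profiles never share a bin, so the plain correlation is close to zero, while the smoothed profiles correlate strongly because the peaks are near each other.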
Source code and Galaxy integration scripts
The source code of the tool and the Galaxy integration files are available as a GitHub repository.
The StereoGene program takes as input files in one of the standard Genome Browser formats: BED, WIG, BedGraph, or BroadPeak. Program parameters are read from a config file; some parameters can also be given on the command line.
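As an illustration of how an interval file becomes a continuous profile, the sketch below bins BedGraph records (chromosome, zero-based start, end, value) into fixed-width bins. This is an assumption-laden example, not StereoGene's own reader: the function name, the 100 bp default bin size, and the choice to weight each value by the number of covered bases are all hypothetical.

```python
def bedgraph_to_bins(lines, bin_size=100):
    """Accumulate BedGraph signal into fixed-width bins, per chromosome.

    Returns {chrom: {bin_index: sum of value * covered bases}}.
    Coordinates are treated as zero-based, half-open intervals.
    """
    profile = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and UCSC track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, value = line.split()[:4]
        start, end, value = int(start), int(end), float(value)
        bins = profile.setdefault(chrom, {})
        pos = start
        while pos < end:
            index = pos // bin_size
            upto = min(end, (index + 1) * bin_size)  # stop at the bin boundary
            bins[index] = bins.get(index, 0.0) + value * (upto - pos)
            pos = upto
    return profile


example = [
    "track type=bedGraph",
    "chr1\t0\t50\t2.0",
    "chr1\t50\t150\t1.0",
]
print(bedgraph_to_bins(example, bin_size=100))
```

An interval that crosses a bin boundary is split, so each bin receives only the signal from the bases it actually contains.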
A full description of the program and its parameters is presented here.
A set of command-line invocation examples is here. Refer to the examples README file.
Program test: Human Epigenome Atlas Pairwise Correlation Anthology
As a simple, productive test of our method, we prepared an anthology of pairwise correlations of the profiles from the Human Epigenome Atlas. We built a pipeline that analyzes colocalization for all pairs of different profiles from the same tissue (or cell line) and for all pairs of the same profile from different tissues. The results were organized in a web page.
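The pairing scheme described above can be sketched as follows. This is only an illustration of how the two groups of comparisons could be enumerated, not the pipeline we ran; the function name and the tissue and mark labels are hypothetical placeholders.

```python
from itertools import combinations


def build_pairs(profiles):
    """Split all profile pairs into the two groups used in the anthology.

    profiles: iterable of (tissue, mark) tuples.
    Returns (within_tissue, across_tissues):
      within_tissue  - different marks measured in the same tissue
      across_tissues - the same mark measured in different tissues
    """
    unique = sorted(set(profiles))
    within_tissue = []
    across_tissues = []
    for a, b in combinations(unique, 2):
        if a[0] == b[0] and a[1] != b[1]:
            within_tissue.append((a, b))
        elif a[1] == b[1] and a[0] != b[0]:
            across_tissues.append((a, b))
    return within_tissue, across_tissues


# Hypothetical labels, for illustration only.
profiles = [
    ("liver", "H3K4me3"),
    ("liver", "H3K27ac"),
    ("brain", "H3K4me3"),
    ("brain", "H3K27ac"),
]
within, across = build_pairs(profiles)
for pair in within:
    print("same tissue:", pair)
for pair in across:
    print("same mark:  ", pair)
```

Pairs that differ in both tissue and mark are deliberately left out, matching the two comparison types described in the text.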
An immediate observation is that almost every comparison of Epigenomics Roadmap profiles shows a significant positive correlation, while negative correlations appear rarely.
If you find a bug or need additional information, please email Elena Stavrovskaya.