1 Download data and set working directory

This workbook requires the sample data used for the DiffBind vignette. These data can be obtained as follows:

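# Download the vignette data into a temporary directory
tmpdir <- tempdir()
url <- 'https://content.cruk.cam.ac.uk/bioinformatics/software/DiffBind/DiffBind_vignette_data.tar.gz'
file <- basename(url)
options(timeout=600)  # allow up to 600 seconds for the download
download.file(url, file.path(tmpdir,file))
# Unpack the archive and make the extracted directory the knitr root directory
untar(file.path(tmpdir,file), exdir = tmpdir )
knitr::opts_knit$set(root.dir=file.path(tmpdir,"DiffBind_Vignette"))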

The examples in this workbook all follow the analysis of the impact of tamoxifen resistance on ER binding, as discussed in detail in the DiffBind vignette.

2 Introduction

The dba.plotProfile() function enables the computation of peakset profiles and the plotting of complex heatmaps. It serves as a front end that makes the profiling and plotting functionality of the profileplyr package, written by Tom Carroll and Doug Barrows, easier to use for experiments analyzed with DiffBind.

Processing proceeds in two phases.

In the first phase, specific peaksets are extracted from a DiffBind DBA object and profiles are calculated for these peaks for a set of samples in the DiffBind experiment. Profiles are calculated by counting the number of overlapping reads in a series of bins upstream and downstream of each peak center.
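The binning idea can be illustrated with a toy example in base R. This is only a sketch of the concept, not the code used by DiffBind or profileplyr: the function name bin_profile(), the flank and bin_width values, and the simplification of each read to a single position are all assumptions made for illustration.

```r
# Toy illustration only: NOT the DiffBind/profileplyr implementation.
# Each read is reduced to a single genomic position (e.g. its midpoint).
bin_profile <- function(read_pos, center, flank = 1000, bin_width = 100) {
  # Bin edges covering the window [center - flank, center + flank)
  breaks <- seq(center - flank, center + flank, by = bin_width)
  # Keep only the reads that fall inside the window
  in_window <- read_pos >= center - flank & read_pos < center + flank
  # Count the reads in each half-open bin [edge, next edge)
  counts <- table(cut(read_pos[in_window], breaks = breaks, right = FALSE))
  as.integer(counts)
}

# Simulated read positions clustered around a hypothetical peak center at 5000
set.seed(1)
reads <- round(rnorm(500, mean = 5000, sd = 300))
profile <- bin_profile(reads, center = 5000)
length(profile)  # one count per bin: 10 bins upstream and 10 downstream
```

Repeating this for every peak and every sample gives the matrix of values that the heatmaps display, one row per peak and one column per bin.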

In the second phase, the derived profiles are plotted in a series of complex heatmaps showing the relative intensity of overlapping reads in each bin for each peak in each sample, along with summary plots showing the average profile across the sites for each sample.

Due to the computational cost of this function, it is advised that the calculation of profiles and the plotting be separated into two calls, so that the profiles do not need to be regenerated if something goes wrong in the plotting. By default, when a DBA object is passed in to generate profiles, plotting is turned off and a profileplyr object is returned. When dba.plotProfile() is called with a profileplyr object, a plot is generated by default.
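One way to avoid recomputing profiles across R sessions is to store the returned object on disk. The helper below is a minimal sketch and is not part of DiffBind: cache_rds() and the file name profiles.rds are hypothetical, and the approach assumes that the profileplyr object serializes cleanly with saveRDS().

```r
# Hypothetical helper (not part of DiffBind): compute a result once,
# then reuse the copy saved on disk in later calls.
cache_rds <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))  # reuse the previously saved result
  }
  result <- compute()      # the expensive step runs only when no cache exists
  saveRDS(result, path)
  result
}

# Intended use with DiffBind (commented out because it needs the sample data):
# profiles <- cache_rds("profiles.rds", function() dba.plotProfile(tamoxifen))
# dba.plotProfile(profiles)
```

If the plotting call fails or the plot needs adjusting, only the second call has to be repeated.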

The main aspects of the profile plot are which samples are plotted (the X-axis) and which sites are plotted (the Y-axis). These can be specified in a number of flexible ways. Other parameters to dba.plotProfile() determine how the data are treated, controlling aspects such as how many sites are included in the plot, data normalization, sample merging (computing mean profiles for groups of samples), and control over the appearance of the plot.

3 Default plots

The default plot depends on whether or not an analysis has been completed.

3.1 Default plot: no analysis

If no analysis has been completed, the default plot will include all samples, in a single group. They will be merged based on the DBA_REPLICATE attribute, such that each sample class will have one heatmap based on the normalized mean read counts for all the replicate samples in that class.
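The effect of merging replicates can be pictured with a small base R example. This is a conceptual sketch under assumed data, not the DiffBind implementation: the matrix of binned values, the sample names, and the class labels are invented for illustration.

```r
# Toy data: rows are bins, columns are samples (values are invented)
binned <- cbind(A_rep1 = c(2, 4, 6),
                A_rep2 = c(4, 6, 8),
                B_rep1 = c(1, 1, 1))
# Hypothetical sample classes: replicates of the same class share a label
sample_class <- c("A", "A", "B")

# Average the replicate columns within each class, bin by bin
merged <- sapply(split(seq_along(sample_class), sample_class),
                 function(idx) rowMeans(binned[, idx, drop = FALSE]))
merged  # one column per sample class
```

Each column of the merged matrix corresponds to one heatmap in the default plot, rather than one heatmap per replicate.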

By default, up to 1,000 of the consensus sites (randomly sampled) will be included, in a single group. If the genome is supported, annotation to nearby genomic features (promoters, genes, intragenic) will be determined and plotted.

data(tamoxifen_counts)
tamoxifen$config$RunParallel <- TRUE
profiles <- dba.plotProfile(tamoxifen)
## 
## 
## Generating profiles...
dba.plotProfile(profiles)
## Plotting...