ChIP-seq QC
Xiaole Shirley Liu
STAT115, STAT215
Initial QC
• FASTQC
• Mappability
• Uniquely mapped reads
• Uniquely mapped locations
• Uniquely mapped locations / uniquely mapped reads
• Good to keep one read / location in peak calling
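A minimal sketch of the locations / reads ratio and of keeping one read per location. It assumes each uniquely mapped read has already been reduced to a hypothetical (chrom, position, strand) tuple; the function names are illustrative and do not come from any specific tool.

```python
def unique_location_ratio(alignments):
    """Uniquely mapped locations divided by uniquely mapped reads.

    alignments: iterable of (chrom, position, strand) tuples, one per
    uniquely mapped read (an assumed, simplified representation).
    """
    n_reads = 0
    locations = set()
    for aln in alignments:
        n_reads += 1
        locations.add(aln)
    return len(locations) / n_reads if n_reads else 0.0


def keep_one_read_per_location(alignments):
    """Keep the first read seen at each location, preserving input order."""
    seen = set()
    kept = []
    for aln in alignments:
        if aln not in seen:
            seen.add(aln)
            kept.append(aln)
    return kept


reads = [("chr1", 100, "+"), ("chr1", 100, "+"), ("chr1", 100, "-"), ("chr2", 5, "+")]
ratio = unique_location_ratio(reads)
deduplicated = keep_one_read_per_location(reads)
```

A ratio close to 1 suggests few duplicated locations; a low ratio suggests many reads pile up on the same positions.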
Peak Calls
• Tag distribution along the genome ~ Poisson distribution (λBG = total tag / genome size)
• ChIP-seq shows local biases in the genome
– Chromatin and sequencing bias
– 200-300bp control windows have too few tags
– But can look further
Dynamic λlocal = max(λBG, [λctrl, λ1k,] λ5k, λ10k)
[Figure: ChIP and control tag tracks with 300bp, 1kb, 5kb, and 10kb windows]
http://liulab.dfci.harvard.edu/MACS/
Zhang et al, Genome Biol, 2008
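The dynamic λlocal idea can be sketched as follows. This is an illustration under stated assumptions, not the MACS implementation: control tag counts are given per window size around a candidate peak, each count is rescaled to the candidate window width, ChIP and control are assumed to have equal sequencing depth, and all function names are hypothetical.

```python
import math


def background_lambda(total_tags, genome_size, window_bp):
    """Expected tags per window under a uniform genome-wide background."""
    return total_tags * window_bp / genome_size


def lambda_local(lam_bg, ctrl_counts, window_bp):
    """Largest of the genome background and the rescaled control-window rates.

    ctrl_counts: dict mapping control window size in bp (e.g. 1000, 5000,
    10000) to the control tag count in that window (assumed input layout).
    """
    rates = [lam_bg]
    for size_bp, count in ctrl_counts.items():
        rates.append(count * window_bp / size_bp)
    return max(rates)


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed term by term.

    Suitable for small lam only; exp(-lam) underflows for very large lam.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


lam_bg = background_lambda(total_tags=1_000_000, genome_size=100_000_000, window_bp=200)
lam = lambda_local(lam_bg, {1000: 20, 5000: 50, 10000: 80}, window_bp=200)
p_value = poisson_sf(15, lam)  # chance of seeing >= 15 tags in the window
```

Taking the maximum makes the test conservative: a region must beat the highest plausible local background, not just the genome-wide average.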
Peak Call Statistics
• P-value and FDR
• Simulation: random sampling of reads?
• FDR = A / B, BH correction or Q-value
• P-value / FDR changes with sequencing depth
• Fold change does not
[Figure: MAT quality control; background vs. enriched DNA (<1% enriched), with areas A and B marked]
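For the BH correction mentioned above, a minimal pure-Python sketch of Benjamini-Hochberg adjusted p-values. The function name is illustrative; in practice a statistics library would normally be used.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


peak_pvalues = [0.01, 0.04, 0.03, 0.005]
peak_fdr = benjamini_hochberg(peak_pvalues)
```

With these four p-values the adjusted values are approximately 0.02, 0.04, 0.04, and 0.02.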
ChIP-seq QC
• Number of peaks with good FDR and fold change
• FRiP score:
– Fraction of reads in peaks
– Often higher for histone modifications than transcription factors
– Often increases slightly with increasing read depth
• Overlap with union of peaks in public DNase-seq data
– In a working ChIP-seq, > 70% of peaks overlap the union DHS
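A minimal FRiP sketch, assuming reads are reduced to a single (chrom, position) point and peaks are non-overlapping half-open (chrom, start, end) intervals. These representations and the function name are assumptions for illustration.

```python
from bisect import bisect_right


def frip(read_positions, peaks):
    """Fraction of reads whose position falls inside any peak."""
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()
    starts = {c: [s for s, _ in iv] for c, iv in by_chrom.items()}

    in_peaks = 0
    for chrom, pos in read_positions:
        if chrom not in by_chrom:
            continue
        # Rightmost peak starting at or before pos.
        idx = bisect_right(starts[chrom], pos) - 1
        if idx >= 0 and pos < by_chrom[chrom][idx][1]:
            in_peaks += 1
    return in_peaks / len(read_positions) if read_positions else 0.0


peaks = [("chr1", 100, 200), ("chr1", 500, 600)]
reads = [("chr1", 150), ("chr1", 199), ("chr1", 200), ("chr1", 550), ("chr2", 150)]
score = frip(reads, peaks)
```

Here three of five reads fall in peaks (position 200 is outside the half-open interval), so the score is 0.6.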
DNase-seq
• Captures all regulatory sequences in the prostate genome
Sabo et al, Nat Methods 2006; Thurman et al, Nat 2012
ChIP-seq QC
• Evolutionary conservation
– Can be used for ChIP QC
• Conserved sites more functional?
– Majority of functional sites not conserved
Odom et al, Nat Genet 2007
Enrichment Distribution
• CEAS (Shin et al, Bioinfo, 2009)
– Meta-gene profiles: TF and histone marks
– % of peaks at promoter, exons, introns, and distal intergenic sequences
– SitePro of signal at specific sites
• Replicate agreement: > 60% peak overlap or > 0.6 correlation
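The slide does not spell out how replicate agreement is measured. One possible reading is the fraction of peaks in one replicate that overlap a peak in the other; the sketch below assumes that reading, with half-open (chrom, start, end) peaks and hypothetical function names.

```python
def overlaps(a, b):
    """True if two half-open (chrom, start, end) intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_overlapping(peaks_a, peaks_b):
    """Fraction of peaks in peaks_a that overlap at least one peak in peaks_b."""
    if not peaks_a:
        return 0.0
    hits = sum(1 for a in peaks_a if any(overlaps(a, b) for b in peaks_b))
    return hits / len(peaks_a)


rep1 = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 10, 50)]
rep2 = [("chr1", 150, 250), ("chr2", 50, 90)]
agreement = fraction_overlapping(rep1, rep2)
```

The quadratic scan is fine for a toy example; real peak sets would call for sorted intervals or an interval tool.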
Human TF Binding Distribution
• Most TF binding sites are outside promoters
• How to assign targets?
• Nearest distance?
• Binding within 10KB?
• Number of binding sites?
• Other knowledge?
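Two of the simple assignment rules above, nearest TSS and binding within 10KB, can be sketched as below. The sketch assumes a single chromosome and a hypothetical mapping from gene name to TSS coordinate.

```python
def nearest_gene(site, tss_by_gene):
    """Gene whose TSS is closest to the binding site (ties broken by name)."""
    return min(tss_by_gene, key=lambda g: (abs(tss_by_gene[g] - site), g))


def genes_within(site, tss_by_gene, max_distance=10_000):
    """All genes whose TSS lies within max_distance bp of the binding site."""
    return sorted(g for g, tss in tss_by_gene.items() if abs(tss - site) <= max_distance)


tss_by_gene = {"G1": 1_000, "G2": 8_000, "G3": 50_000}  # toy coordinates
closest = nearest_gene(5_000, tss_by_gene)
nearby = genes_within(5_000, tss_by_gene)
```

The two rules can disagree: the nearest-gene rule always returns exactly one gene, while the distance rule may return several or none.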
How to Assign Targets for Enhancer Binding Transcription Factors?
• Regulatory potential: sum of binding sites weighted by distance to TSS with exponential decay
• Decay modeled from Hi-C experiments
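A minimal sketch of a regulatory potential score with exponential decay. The decay length (10 kb) and the 100 kb cutoff are placeholder values chosen for illustration, not parameters given in the slides, and the function name is hypothetical.

```python
import math


def regulatory_potential(tss, site_positions, decay_bp=10_000.0, max_distance=100_000):
    """Sum of binding-site weights that decay exponentially with distance to the TSS.

    decay_bp and max_distance are assumed placeholder values.
    """
    total = 0.0
    for pos in site_positions:
        distance = abs(pos - tss)
        if distance <= max_distance:
            total += math.exp(-distance / decay_bp)
    return total


score = regulatory_potential(50_000, [50_000, 60_000, 500_000])
```

A site at the TSS contributes 1, a site 10 kb away contributes about 0.37, and the site 450 kb away is ignored.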
Direct Target Identification
• Binary decision?
• Rank product of regulatory potential and differential expression
• BETA
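A sketch of the rank product idea, assuming both inputs are dicts from gene to a score where larger means stronger (regulatory potential, and some differential expression statistic). The normalization by gene count and the tie-breaking rule are illustrative choices, not a description of BETA's exact procedure.

```python
def ranks_descending(scores):
    """Map gene -> rank, where rank 1 is the highest score (ties broken by name)."""
    ordered = sorted(scores, key=lambda g: (-scores[g], g))
    return {g: i + 1 for i, g in enumerate(ordered)}


def rank_product(reg_potential, diff_expr_score):
    """Product of normalized ranks; smaller values suggest likelier direct targets."""
    genes = sorted(set(reg_potential) & set(diff_expr_score))
    n = len(genes)
    rp_rank = ranks_descending({g: reg_potential[g] for g in genes})
    de_rank = ranks_descending({g: diff_expr_score[g] for g in genes})
    return {g: (rp_rank[g] / n) * (de_rank[g] / n) for g in genes}


rp = {"A": 3.0, "B": 1.0, "C": 2.0}
de = {"A": 5.0, "B": 4.0, "C": 1.0}
products = rank_product(rp, de)
top_target = min(products, key=products.get)
```

Gene A ranks first on both lists, so it has the smallest rank product and comes out as the top candidate target.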
Is My Factor an Activator, Repressor, or Both?
• Most labs have differential expression profiling of transcription factor together with TF ChIP-seq
• Do genes with higher regulatory potential show more up- or down-expression than all the genes in the genome?
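One simple way to look at the question above is to compare the fraction of up- and down-regulated genes among the highest regulatory potential genes with the fraction among all genes. The sketch below assumes a hypothetical per-gene label of "up", "down", or "none"; a proper analysis would add a statistical test on top of this comparison.

```python
def direction_enrichment(reg_potential, expr_change, top_n):
    """For 'up' and 'down', return (fraction in top_n genes, fraction in all genes)."""
    genes = [g for g in reg_potential if g in expr_change]
    top = sorted(genes, key=lambda g: (-reg_potential[g], g))[:top_n]

    def fraction(subset, label):
        if not subset:
            return 0.0
        return sum(1 for g in subset if expr_change[g] == label) / len(subset)

    return {label: (fraction(top, label), fraction(genes, label)) for label in ("up", "down")}


rp = {"g1": 5.0, "g2": 4.0, "g3": 3.0, "g4": 2.0, "g5": 1.0, "g6": 0.5}
change = {"g1": "up", "g2": "up", "g3": "none", "g4": "down", "g5": "none", "g6": "none"}
result = direction_enrichment(rp, change, top_n=2)
```

In this toy data the top genes are all up-regulated while only a third of all genes are, which would point toward an activating role.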
ChIP-chip/seq Motif Finding
• ChIP-chip gives 10-5000 binding regions ~200-1000bp long. Precise binding motif?
– Raw data is like perfect clustering, plus enrichment values
• MDscan
– High ChIP ranking => true targets, contain more sites
– Search TF motif from highest ranking targets first (high signal / background ratio)
– Refine candidate motifs with all targets
Similarity Defined by m-match
For a given w-mer and any other random w-mer
TGTAACGT 8-mer
TGTAACGT matched 8
AGTAACGT matched 7
TGCAACAT matched 6
TGACACGG matched 5
AATAACAG matched 4
m-matches for TGTAACGT
Pick a reasonable m to call two w-mers similar
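The m-match counts in the table above can be reproduced with a short sketch; the function names are illustrative.

```python
def matched_positions(a, b):
    """Number of positions at which two equal-length w-mers agree."""
    if len(a) != len(b):
        raise ValueError("w-mers must have the same length")
    return sum(1 for x, y in zip(a, b) if x == y)


def is_m_match(a, b, m):
    """True if the two w-mers agree at m or more positions."""
    return matched_positions(a, b) >= m


reference = "TGTAACGT"
others = ["TGTAACGT", "AGTAACGT", "TGCAACAT", "TGACACGG", "AATAACAG"]
counts = [matched_positions(reference, other) for other in others]
```

For the five 8-mers on the slide this gives 8, 7, 6, 5, and 4 matched positions.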
MDscan Seeds
[Figure: ChIP-chip selected upstream sequences, ordered by enrichment (higher enrichment first); each 9-mer is grouped with its m-matches into a seed motif pattern]
A 9-mer: ATTGCAAAT
Seed motif pattern: ATTGCAAAT, TTTGCGAAT, TTTGCAAAT
Other seed groups shown in the figure:
TTGCAAATC: TTGCAAATC, TTGCGAATA, TTGCAAATT, TTGCCCATC
CAAATCCAA, CAAATCCAA, GAAATCCAC
GCAAATCCA, GCAAATTCG, GCAAATCCA, GGAAATCCA, GGAAATCCT
TGCAAATCC, TGCAAATTC
GCCACCGTACCACCGTACCACGGTGCCACGGC…
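A much simplified sketch of seed construction: enumerate every w-mer in the top-ranked sequences, group each with its m-matches, and keep the largest group. This only illustrates the grouping step. It ignores the reverse strand and replaces MDscan's motif scoring with a plain group-size count, and all names are hypothetical.

```python
def wmers(seq, w):
    """All overlapping w-mers of a sequence, left to right."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


def matched(a, b):
    """Number of positions at which two equal-length w-mers agree."""
    return sum(1 for x, y in zip(a, b) if x == y)


def best_seed(top_sequences, w, m):
    """Return (seed, members) for the w-mer with the most m-matches.

    Group size stands in for a real motif score (a simplifying assumption).
    """
    pool = [k for seq in top_sequences for k in wmers(seq, w)]
    best, best_members = None, []
    for seed in dict.fromkeys(pool):  # distinct w-mers in first-seen order
        members = [k for k in pool if matched(seed, k) >= m]
        if len(members) > len(best_members):
            best, best_members = seed, members
    return best, best_members


seed, members = best_seed(["ACGTAA", "ACGTCC"], w=4, m=4)
```

In this toy input only ACGT occurs in both sequences, so it becomes the seed with two members.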
Further Refine Motifs
• Could also be used to examine known motif enrichment
• Is motif enrichment correlated with ChIP-seq enrichment?
• Is motif more enriched in peak summits than peak flanks?
• Motif analysis could identify transcription factor partners of ChIP-seq factors
Estrogen Receptor
• Carroll et al, Cell 2005
• Overactive in > 70% of breast cancers
• Where does it go in the genome?
• ChIP-chip on chr21/22, motif and expression analysis found its “pioneering factor” FoxA1
[Figure: TF ?? and ER]
Estrogen Receptor (ER) Cistrome in Breast Cancer
• Carroll et al, Nat Genet 2006
• ER may function far away (100-200KB) from genes
• Only 20% of ER sites have PhastCons > 0.2
• ER has different effect based on different collaborators
[Figure: ER with collaborators AP1 and NRIP]
Cell Type-Specific Binding
• The same TF binds to very different locations in different tissues and conditions. Why?
• TF concentration?
• Collaborating factors, esp. pioneering factors
• Interesting observations about pioneering factors
Summary
• ChIP-seq identifies genome-wide in vivo protein-DNA interaction sites
• ChIP-seq peak calling: shift reads, and calculate correct enrichment and FDR
• Functional analysis of ChIP-seq data:
– Strong vs weak binding, conserved vs non-conserved
– Target identification
– Motif analysis
• Cell type-specific binding; epigenetics