Search of Therapeutic Targets in Metabolic Pathways of TGF...

1
Search of Therapeutic Targets in Metabolic Pathways of TGF-β Using Graph Coloring Approaches Maxime Folschette 1, 2 , Anne Siegel 1 , Vincent Legagneux 2 , Carito Guziolowski 3 , Nathalie Th ´ eret 1, 2 1 Univ Rennes, Inria, CNRS, IRISA 2 IRSET Inserm UMR 1085 3 ´ Ecole centrale de Nantes, LS2N Journ´ ees Ouvertes Biologie, Informatique et Math´ ematiques (JOBIM) — 3 rd –6 th of July 2018 Introduction & Context Hepatocellular carcinoma (HCC) is the most widespread type of liver cancer. Its occurrence is associated to chronic hepatitis (virus, alcohol, steatosis, etc.) that often happen with the development of fibrosis which, in its terminal phase, cirrhosis, is a major cause of HCC. Because of a late detection, the prognostic is very bad with a survival rate ranging from weeks to months. The aim of this work is to identify new invasiveness markers by exploiting the knowledge stored into ontological and experimental databases. Transforming Growth Factor β (TGF-β) The TGF-β is a protein implied in the signaling of many biological functions (cellular growth, differenciation, etc.). It plays a major role in the tumor progression of HCC by especially triggering the fibrosis, the epithelial-mesenchymal transition (EMT) and thus the tumor invasion. Epithelial-Mesenchymal Transition (EMT) Early stage of tumor Epithelial cells Adhesive Late stage of tumor Mesenchymal cells Motile & invasive EMT The EMT happens when epithelial cancerous cells transform into mesenchymal cells, able to reshape the extra-cellular matrix and move in it, thus increasing tumor aggressiveness by creating metastasis. We use an EMT signature which is a set of genes (including TGF-β) known to be over-expressed during and after the EMT and thus constitutes a good marker to discriminate two tumor stages: early and non-invasive versus late and invasive. The signature is proposed by the Broad institute and is available in MSigDB [Subramanian et al., 2005]. 1. Select Differently Expressed Genes in Invasive Tumors Data: 294 samples extracted from liver cancers taken from the project LIHC-US of the database ICGC [Hudson et al., 2010]. Objective: Discriminate two groups (early and late stages of tumor) to carry a differential analysis, and establish two sets of genes that are over- and under-expressed in the late group compared to the early group. Method: Clustering study based on the EMT signature of MSigDB (H-v6.1). The two sets are extracted by setting thresholds for the P-value (P adj < 10 -5 ) and the fold-change (log 2 (FC) > 2 for the over-expression and log 2 (FC) < -2 for the under-expression). Results: 821 over-expressed genes and 89 under-expressed genes, among which 80 are part of the EMT signature. The volcano plot on the right represents the P-value as a function of the fold-change in logarithmic scales. The genes of the EMT signature are marked in red. EMT signature Under-expressed Over-expressed 2. Extract a Regulatory Graph from Pathway Commons Data: Previous 910 seed genes (821 over- and 89 under-expressed genes) Objective: Extract a regulatory graph from Pathway Commons, which is a gathering of 25 pathway databases [Cerami et al., 2010]. Method: The tool BRAvo [Lefebvre et al., 2017] allows to interrogate this database using SPARQL queries. From the 910 seed genes, it computes all (direct and indirect) upstream regulations. Results: A graph in SIF format containing 1197 nodes and 10551 edges. Only 645 genes of the initial 910 genes were found. Difficulties: Pathway Commons is very heterogeneous and the result requires postprocessing. Some bases should also be excluded from the search. 3. Apply Coloring Propagation to Obtain Gene Predictions Data: Previous graph & sets of 821 over- and 89 under-expressed seed genes Objective: Infer if the new nodes in the graph should be over-expressed (+), under-expressed (-) or invariant (0) based on the value of the seed genes. Method: The tool Iggy [Thiele et al., 2015] achieves this with consistency rules: An affectation (+, - or 0) must be explained by at least one predecessor; There must be a path between an affectation + or - and one of the 910 seed genes; Without such a path or in case of contradictory influences, coloring 0 can be affected. A prediction is a node affectation that is common to all compatible colorings. Results: 185 (34%) non-seed nodes are predicted, and 61 (11%) as + or -. 4. Exploitation, Validation & Outlooks Exploitation: The obtained predictions may be suitable as new tumor invasiveness markers or therapeutic targets. We can also search for the regulating miRNA and point to those of particular interest. Validation: Cross-validation (by hiding a part of the observations and checking the new predictions) would allow to assess the robustness of our approach. Outlooks: Searching for key controllers: minimal sets of genes that have a maximum impact on predictions. Dynamical analysis on attractors reachability [Poret and Guziolowski, 2018]. References Cerami, E. G. et al. (2010). Pathway Commons, a web resource for biological pathway data. Nucleic acids research, 39. http://www.pathwaycommons.org/. Hudson, T. J. et al. (2010). International network of cancer genome projects. Nature, 464. http://icgc.org/. Lefebvre, M. et al. (2017). Regulatory and signaling network assembly through linked open data. In Journ´ ees Ouvertes en Biologie, Informatique et Math´ ematiques. Demo paper. https://github.com/symetric-group/bionets-demo. Poret, A. and Guziolowski, C. (2018). Therapeutic target discovery using boolean network attractors: improvements of kali. Royal Society Open Science, 5(2). Subramanian, A. et al. (2005). Gene Set Enrichment Analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. of the Nat. Ac. of Sci., 102(43). http://software.broadinstitute.org/gsea/. Thiele, S. et al. (2015). Extended notions of sign consistency to relate experimental data to signaling and regulatory network topologies. BMC Bioinformatics, 16(1). http://bioasp.github.io/iggy/. EMT signature (MSigDB) Expression data (ICGC) Clustering & fold-change Partial coloring +/ - Genes of interest 910 Querying of Pathway Commons (BRAvo) Graph of regu- latory pathways 1197 nodes 10551 edges Consistency rules (Iggy) Predictions 0 + - 22% 8.7% 2.4% Exploitation Validation Other outcomes Key controllers Dynamical analysis

Transcript of Search of Therapeutic Targets in Metabolic Pathways of TGF...

Search of Therapeutic Targets in Metabolic Pathways of TGF-β Using Graph Coloring ApproachesMaxime Folschette1, 2, Anne Siegel1, Vincent Legagneux2, Carito Guziolowski3, Nathalie Theret1, 2

1Univ Rennes, Inria, CNRS, IRISA 2IRSET Inserm UMR 1085 3Ecole centrale de Nantes, LS2N

Journees Ouvertes Biologie, Informatique et Mathematiques (JOBIM) — 3rd–6th of July 2018

Introduction & ContextHepatocellular carcinoma (HCC) is the mostwidespread type of liver cancer. Its occurrence isassociated to chronic hepatitis (virus, alcohol, steatosis,etc.) that often happen with the development of fibrosiswhich, in its terminal phase, cirrhosis, is a major causeof HCC. Because of a late detection, the prognostic isvery bad with a survival rate ranging from weeks tomonths. The aim of this work is to identify newinvasiveness markers by exploiting the knowledge storedinto ontological and experimental databases.

Transforming Growth Factor β (TGF-β)The TGF-β is a protein implied in the signaling of many biological functions (cellular growth,differenciation, etc.). It plays a major role in the tumor progression of HCC by especially triggeringthe fibrosis, the epithelial-mesenchymal transition (EMT) and thus the tumor invasion.

Epithelial-Mesenchymal Transition (EMT)Early stage

of tumor

Epithelialcells

Adhesive

Late stageof tumor

Mesenchymalcells

Motile & invasive

EMT

The EMT happens when epithelial cancerous cellstransform into mesenchymal cells, able to reshape theextra-cellular matrix and move in it, thus increasingtumor aggressiveness by creating metastasis. We usean EMT signature which is a set of genes (includingTGF-β) known to be over-expressed during and after theEMT and thus constitutes a good marker to discriminatetwo tumor stages: early and non-invasive versus late andinvasive. The signature is proposed by the Broad instituteand is available in MSigDB [Subramanian et al., 2005].

1. Select Differently Expressed Genes in Invasive TumorsData: 294 samples extracted from liver cancers taken from the project

LIHC-US of the database ICGC [Hudson et al., 2010].Objective: Discriminate two groups (early and late stages of tumor) to

carry a differential analysis, and establish two sets of genes that areover- and under-expressed in the late group compared to the early group.

Method: Clustering study based on the EMT signature ofMSigDB (H-v6.1). The two sets are extracted by setting thresholds forthe P-value (Padj < 10−5) and the fold-change (log2(FC) > 2 for theover-expression and log2(FC) < −2 for the under-expression).

Results: 821 over-expressed genes and 89 under-expressedgenes, among which 80 are part of the EMT signature. The volcanoplot on the right represents the P-value as a function of the fold-changein logarithmic scales. The genes of the EMT signature are marked in red.

EMT signature

Und

er-e

xpre

ssed

Ove

r-exp

ress

ed

2. Extract a Regulatory Graph from Pathway CommonsData: Previous 910 seed genes (821 over- and 89 under-expressed genes)Objective: Extract a regulatory graph from Pathway Commons, which

is a gathering of 25 pathway databases [Cerami et al., 2010].Method: The tool BRAvo [Lefebvre et al., 2017] allows to interrogate this

database using SPARQL queries. From the 910 seed genes, it computes all(direct and indirect) upstream regulations.

Results: A graph in SIF format containing 1197 nodes and 10551 edges. Only645 genes of the initial 910 genes were found.

Difficulties: Pathway Commons is very heterogeneous and the resultrequires postprocessing. Some bases should also be excluded from the search.

TEF

CERCAM

CST6

MYB

TCF21

FOXO1

JPH1

JAG1

CALB2

BICC1

KIRRELADAMTS3

C1QTNF7

SNCA

BHLHE22

TFDP1

CDC5L

PAX3

CTXN1

ATF3

CCBE1

FOXA1

GFPT2

CFTR

SAMD11

FLRT2

LAMC2

APBA2

IGJ

TF

TFCP2

E4F1

POU1F1

BEGAIN

CRX

ROR1

F3

HHIP PITX2

ETV7

IL7RPLCgamma1

CLIP3

MAP1B

WT1

IRF7

PGR

p300

CDO1

F12

SLC26A9

F2RL2

EGR3

ARG1

DKK4

SFRP4

TNC

ADAM28

CA5A

DPT

HNF1A

WNT10A

ELF2

STRA6

MGP

HAND1

VCAN

COL3A1

ITGA3

FZD2

CXCL14

DPYSL4

REN

EGR4

TRIM55

PRRX1

TM6SF1

SRF

ESRRA

HAO1

STON1

Histamine_HRH2_complex

OVOL2

MN1

CHAD

RAP1GAP2

AP1

C1orf126

APOM

SCRN1

AR_T-DHT

CH25H

NCAM1

PHLDB1

TSPYL5

CTHRC1

DARC

EGR1_Csda

CXCR3

CLIC6

STMN3

BRAFEGR1

LEF1_beta_catenin_PITX2

SOD3

TRPV4

Fra2_JUN

CTNNA2

TCF1E_beta_catenin

GGT6

KCNF1

FGF18

BGN

COL10A1LARP6

LEF1

PLG

INMT

KLHDC8A

COL5A1

WISP2

PLXNB3

BMP5

CPS1

NOTCH3

KIF3C

HTR2B

CILP2

PRELP

FZD10

SGCD

TNFRSF11B

MEP1A

MFAP5

MGC29506

TEK

FHL2

C19orf21

PDPN

LTF

LRRN2

FAM167A

BHMT2

ciglitazone_PPARG_complex

KRT7

THBD MLF1DUOX2

ABI3BP

JUNDWNT3A

RASSF5

PLAURCTNNB1

DDC

Heterocyclic_Compounds_Thiazoles_complex

CYBRD1 PRSS16

PTPLA

PROC

PHLDA3

AZGP1

HTRA3

MYOZ1

KCNS3SOAT2

MLLT3

LOXL4

NXNTIMP1PIPOX

RTP3

HSPB2

JUN_FOS

EVC

FBN1

PRSS8

TGFB2

MAZ

PRDM16

REST

GRAIL

FGFR1

PRL

NHLH1

FOXO3

WNT4

PDLIM3

C10orf107

DCDC2

PDE10A

GJB3

RRAD

Disulfiram_Copper_complex

ALOX5AP

HFE2

WISP1

BMPR1B

LSAMP

OLFML2A

FMOD

JPH2

TNFSF15

NTRK3

AIB1

MAT1A

PDLIM4

CRIP1

C1QTNF1

NSC_689534_Copper_complex

COL15A1

TMEM51

COL22A1

LINGO1

ROR2

CYP2D6

KCNK3

PFKFB1

FAM184A

VEPH1

LEPREL1

PBX1

TFAP4

NFE2 PTEN

REPIN1

SF1

HIF1A

NR2F2

RUNX1

MARK1

ELK1

PRSS35

PAX4

KAT2B

AG2

RORA

RBP4

CCL22

JUNB

PPARG

ELF1

ZHX2

HMX1

PMEPA1

CUX1

HSPB7

OSBPL5

STMN2

CHST1

PTH1R

ATF4

STAT3CCDC80

ITIH1

VTCN1

Organoselenium_Compounds_Copper_complex

F9

NMNAT2

MYCBP

APLNR

DBP

BACH2

LIF

PLA2G5ARHGEF4

EMILIN1

PKHD1

IGFBP6

PRND

FANK1

OLFM1

SRPX

MAFA

AXL

WNK2

BAI2SFRP5

LIME1

STAT6

ADAMTS12

FOXL1

EDNRAHGF

GTF2A2

E-cad-CTF2_p120_catenin_KaisoECM1

NOG

CHRDL1

PMP22

MTF1

CFHR5

HAND2

Dinoprostone_PTGER1_complex

NUDT10

GABPB2

FGF2

GATA6

LDOC1

LHX3

ANO4

KLF15

CRLF1

CTNND2

SERPINC1

RCAN2

TTYH1

ADAM33

LHX2

MIR21

OLFML3

FZD7

TMEM158

BID

WNT2B

ZBTB18

TRPV6

AAA1

F2

STAT5B

LPAR1

RXRs_LXRs_DNA_OxysterolsLUM

PIK3CA_gene_mutant_form

COL13A1

TNFSF10

DHH

NOV

ADAM12

IL34

SYT13

FOXI1

EPS8L1

ONECUT1

ITGB8

ESR1

TBP

DMKN

CDX2

RUNX2

PPARA

PRRX2RGS1

FOXC1

EVC2

SIRT1

MIF

MSX1

CACNA1H

DCN

NRP2

HAPLN3

ABR

NGN3

MAMDC2

SEL1L3 SPINT1PLN

SCARA5

ITGB6PARP1

HOXB4

CART1

MAFG

CREB1

NR3C1SEMA6A

PCBP1

ETS1

PTGDS

PDE5A

OLR1

COLEC10

SPON1

RGS4

HSPB6 SFRP1

CAPN6

COL8A1

ITGBL1

RREB1ADAMTSL2

SMARCD3

RELA

FXYD2 STAT1

TGIF EN1

HAMP

PNMAL1

DACT3

ZEB1

SLC1A2

FOXF2

OGN

KLF12

KRT81

COL12A1

ACAN

NTRK2

MSMB

FGF1

SRPX2

COL4A3

MAF

QSOX1

ETA_receptor_Endothelin-1

MARVELD1

SLC34A2

MFAP2

LRFN4

GTF3A

SPINT2

BACH1

HMCN1

PROCR

HOXA9

dNp63a

AQP1

GFI1

FOXO4

PAX2

SULF2

FOXJ1

BEND6

NKX2-1

MMP7

NF1

ETV4

GCM1

TFAP2A

CXCL5

CBFA2T3

MAPK10

ZG16

PI16

SELP

AQP9

TEAD1

FOXM1

HOXA5

GEFT

MACC1

HEPH

NTS

p70S6K

TGFB1

RB1_E2F1-3_DP

IL5

naphthalene_pyrrolo_2_1-c__1_4_benzodiazepine_complex

PRKN

PML

ICAM1

SOX30

Niclosamide_Polyethyleneimine_complex

TP53

EFEMP2

ESR2

TPBG

JUN_ATF2

SRF__MAL_

KCNH2

TCF4E_beta_catenin

CALHM2

HIF2A_ARNT

sELK1

TWIST2

PTN

IL1RL1

insulin

FAM43BJNK1-2-active

SLC6A6MYC_Max_MIZ-1

SMAD3_SMAD4_JUN_FOSF2R NHS

VSTM2L

CITED2

PROL1

DAO

ZNF462LECT2

SULT1E1

CDR2L

UGT2B10

MYOF

BASP1CTSE

SMAD3_SMAD4_ER_alpha

Fra2_JUNB ETB_receptor_Endothelin-1

SMAD3_SMAD4_SP1

COL1A2

PODNSDK2

SMAD4_SMAD1-5-8MSC

ERK1-2-activeCLDN4

FSTL1

PTGES

POU6F1

ATF6

TC2N

ZIC1

LEPREL2

GJC2

HOXD10

MEIS1

C18orf2

SLC22A17

TCF12

NALCN

BACE2

FOXA3

MYL9

NR1H4

TCF4

UBP1

ADAMTS2

DCLK1

RHOV

CREB3L1

SP3

PKLR

CORO2B

SULF1

MECOM

RHOA

Fra1_JUNB

MMP2

LPA_LPA1

PDE3A

MMP28

ARMCX2

PLEKHH2

CEBPG

ADAMTS15

FA2H

MAB21L2

MIR663A

TWIST1

MAGEL2

ANXA3

MMP11

C14orf132

CCL15

GLIS3

DUSP4

LHX6

TFDP2

ASAM

MAP1A

HMGA1

CYP2A7

NRF1

ANXA8

FMO2

ANTXR1

ABCG5

GPR124

F2RL1IGDCC4

ANKRD1

APLP1

CNN1

PRSS22

TLX2

DPYSL3

p107

CADM3 GPR161

PDGFRA

PPP2R2C

CDHR5

CPN1

RASEF

ARSI

ATF2_c-Jun

AEBP1

PROM2

IAPP

Jnk1

JUN

SREBF1

ZFPM2

CPA3

AMPH

TUSC3

LTC4S

FAP

ZIC2

MAPK12

NFE2L1

CYTSB

ZIC3

OTC

HOXD9

MAP9

SGCA

KCNA5

SERPIND1

APOA2

SLC4A3

HIF1A_ARNT

LGI2

NR1H3

TAp73a

FGF16

PRKG1

ALX4

KRT17

CYP17A1

ANK3

HEYL

CTBP2

NTM

GSTM5 CHST4

ERCC6_gene_mutant_form

ALOX5

MST1R

SERPINE2

CRTAC1

C7

CEBPD PTF1A

HLF NFATc

RXRB

ASPN

SPP2

STAT5A

BMX

FOXD1

SOX9

ZBTB14

NGFR

SLC38A3

GATA3

TCF3

CEBPA

ADAMTSL1

LMO2

VSNL1

Myb_GATA1

SMAD1

GATA1

MASP2

FOXN1

ATF2PAX6

CA4

VSX1

NEXN

BNC2

POSTN

SPZ1

PLA2G4A

GSTP1

PCDH18

DDIT3

NTF3

CRMP1

PRODH2

COL6A3

MYO10

SMAD4 NFAT1

PPP1R14C

SLC7A9SLC4A11

SPIB

CEBPB

IFNA1

SOX5WFDC2

FAM133A

OR5I1

POU3F2

SLIT3

F13B

HAS2

C20orf103

ZNF423

FGF10

KAP1

PTPRS

SRY

NFIL3

POU2F1

MYC_Max_TRRAP_GCN5_STAF65G_SPT3_TAF9_TAF10_TAF12_core_Mediator

Mad_Max

SP3_HDAC2

HPV-16_E6_E6AP

SP1_HDAC2

TERT

IFN-gamma_IRF1

Interferon_type_I

Smad3_MYC

EGF_EGFR

NFX1_SIN3_HDAC_complex

PKC_isoenzymes

Cbp_p300

GLI2

STAT3-active

SNAI1

GSN

PROM1

CHST3

NAP1L3

SMAD3_SMAD4

SERPINB5

p53

AHR

PCDH7

PAX5 TFAP2C

MYCNMAX

CDKN1B

PPAP2C

WBSCR17

LEF1_beta_catenin

ITGA11RASSF1

LRRC32 TGFB3

FAM180A

E2_ERA

RAB25

CCDC8WNT2

PAPLN

CRYABGAS1

HSPB8

MTTP

DES

MFAP4

GATA5

VEGFC

FASLG

pirinixic_acid_PPARA_complex

GUCY1A3

CXCL12

CTSK

SLIT2

SPEG

cortisol_GR_alpha

KEAP1

JUN_FOSNFAT1-c-4

RBPJ

MMP23B

CEBP

SRRM3

Gold_Dimethyldithiocarbamate_complex MYL3

PTX3

TP53_geneSP1_3

HOXB3

C1orf116

ZDHHC1

ESRP1

ENPP5

TAGLN

CLEC11A

DOK6

POU3F1

MRC2

CCDC74B

Kaiso

SMAD3

GUCY1B3

MST1 PKP3

CYP8B1

IGF1R

JNK2

IL1A

COX7B2

EP300

SLC39A5

THBS2

CD40LG

CXCL1

troglitazone_PPARG_complex

IFNG

IL12A_IL12B_complexHRH4

CTAG2

TAC1

PTK7

AHSG

GPR88

ELN

MRVI1

STAT5-active

REM1

STAT4

HNF4

FABP1

RBMS3

IL2_IL2R_alpha_beta_gamma_JAK1_LCK_JAK3_STAM1-2

SKI_SIN3a_NCoR1_TIF1B_Myb

FOXM1C_SP1

E2F1-3_DP

PP2A_HeterotrimerNF_kappa_B1_RelA

MYC_Max

MEOX1 DACT1

SFRP2GEM

TNXB

FOXP3

TMEM132A

SMOC2EPHB3

XYLT1

CCNJLSULT2A1

GALNT7

CDCP1SSPN

OMDEGR2

MYH11NBL1

MDM2

FSTL3C13orf33SEMA3C

BRCA1_NFE2L2_complex

NFAT1-c-4_PPARG

MOXD1

DEFB1

IGF1

FOXA2

PELI2

CDH3

JDP2

CHST8

IKBKB

SFTPA1_polymorphism_SFTPA2_polymorphism_complex

Dioxins_AHR_complex

Lipopolysaccharides_TLR4_complex

CLEC4E

TNF

NF_Kappa_B1

Serotonin_HTR2A_complex

AHR_IL6_complex

IL11B

IFNA2

MPO

IL1RNIL6

LKB1_ER_alpha

Fra1_USF2

HRAS

IL13

TNFSF11

NFAT1-c-4_MAF_IRF4

C5

APOA1

CSF1

CYR61

JUNB_Fra1_NFAT1-c-4

JUN_FOS_NFAT1-c-4IL27_IL27R_JAK2_TYK2

NFAT1-c-4_ICER1

JUN_NFAT1-c-4_p21SNFT NFAT1_FOXP3

CD2

JUN_FOS_NFAT1

IL2

E2F4_DP2_TRRAP

F10

PLA2G12B

MIM_GLI2A

CtBP_CBP_TCF1_TLE1_AES

GLYAT

TCF3_beta_catenin_TERT_BRG1

CFHR2

RBPJ_NICD

Thiosemicarbazones_Copper_complex

HNF4A

ZNF521

PKNOX2

EZH2

TMEM59L

RB1

PLXDC2 TMEM156

YY1

FOXQ1

FOXA1-2

G6PC

ENO1

S100B

PFKFB3

CSF2

RFX1

CLSTN2

TTR

APOC4

PAX8

PCDHGB7

IRF1 IRF2

E2F7-8

E2F1

RBPJ_NCoR_HDAC1

NICD1 HES1_HEY1-2 E2F4_DP2_p107-p130_HDAC1

SHARP_RBPJ

E2F6_DP

HES1

nonivamide_CCND1_complex

IKKs

PAGE4

TACSTD2

MYOCD

NFE2L2

TAp63g

EFEMP1

FOXD3COL4A2

KCTD12

TMEM100

AGR2

GREM1

NR6A1

SPI1

FOXJ2

HOXA4

CDH6

GATA2

ARHGAP6

UCHL1

APP_modified_form_Aluminum_complex

XBP1

ISYNA1

PDZK1IP1PAMR1 ARHGAP8

FN1JUN_JUND

NFAT1-c-4-active

HAS1

SUSD2

GPC4

RBPJ_NICD_MAML1-2_p300

ABCG8

Myb

GYS2 HNF1B

SRSF2

OSM

GPR20

N_N__-1_4-butanediylbis_N_-_3-isothiocyanatophenyl__thiourea_P2RY6_complex

Vitallium_analog_titanium_dioxide_complex

PLAT

HSF2

CLDN10

TXN

EFNB3SDK1

OSBPL10

HMOX1

MAPK3

BMP2

PRKCA

IL23_IL23R_JAK2_TYK2

PRODH

NAGS

HOXC4

dNp73a

KCNH3

CRP

VIP

PDGFB

AKR1B10

NFKB2_BCL3_complex

TMEM125

PEG10

CCND1

SEZ6L2

RAB31

TLR2

FCER2

PSEN2_mutant_form

COLEC11

PTGS2

HUNK

TLX1

CCL2

EGF

MARCO

TYMP

GAPT

HSF1

SPP1

LRRK1

DIO3

NPM1

COL16A1

PTGIS

AFAP1

C11orf63

ETNK2

DDB2

IL4

ADCY5

GABRB3

RAI2

CD14

KLHL30

ITPR3

CFD

ETS2

PAPPA

TP53_CDKN1A_complexHDAC1

TGIF1

EGR1_CDKN1A_complex

FOS_JUN_complex

n_1_25-dihydroxy-26_27-dimethylcholecalciferol_VDR_complex CDKN1A

n_3_4_3__4_-tetrachlorobiphenyl_AHR_complex

n_1_4-dihydroxy-2-naphthoic_acid_AHR_complex AHR_CYP1A1_complex

copper_II__nitrate_1_10-phenanthroline_Quinolones_complex

copper_II__nitrate_5-methyl-1_10-phenanthroline_Quinolones_complex

MIR892A

chrysene_AHR_complex

JUN_CXCL8_complex

GSTO1_gene_polymorphism

PLA2G1B

adenosine_5_-O-_3-thiotriphosphate__P2RY11_complex n_4_4_-_carbonylbis_imino-3_1-phenylenecarbonylimino-3_1-_4-methylphenylene_carbonylimino__bis_1_3-xylene-alpha_alpha_-diphosphonic_acid__P2RY11_complex

IL24Uridine_Diphosphate_CYSLTR1_complex

Toluene_2_4-Diisocyanate_ALB_complex

TLR4

ormosil_Polyethylene_Glycols_complex CXCL8

E2_ERA_PDI

E2_ERA_ERAP140

E2_ERA_PCNA

IL17A

E2_ERA_DP97

EDN1

E2_ERA_TIF2_CoCoA

E2_ERA_Src-1_MPG

E2_ERA_TIF2_P72

TFF1

Ketoconazole_AHR_complex

AHR_gene_mutant_form

n_3-__2_4-dimethylpyrrol-5-yl_methylidenyl_-2-indolinone_AHR_complex

AHR_ARNT_complex

copper_II__nitrate_bathophenanthroline_Quinolones_complex NRIP1_alternative_form_CYP1A1_complex AHR_CYP1A1_gene_polymorphism_complex

FOS

JUN_CDKN1A_complex TP63_mutant_form

STK11_mutant_form TP53_SP1_complex

NR1I3

Benzo_a_pyrene_AHR_complex

IFNB1

n_9_10-Dimethyl-1_2-benzanthracene_AHR_complex

NF_kappa_B1_p50_RelA

SFTPA1_polymorphism

IL33

Tetrachlorodibenzodioxin_AHR_complex benzo_k_fluoranthene_AHR_complex n_1-hydroxy-2-naphthoic_acid_AHR_complex Itraconazole_AHR_complex CYP1A1

tungsten_carbide_Cobalt_complex

SFTPA2_polymorphism

CREBBP_CDKN1A_complexHMGB1

Curcumin_VDR_complex

TP53_TP53_gene_complex

n_3-_2-hydroxy-4-_1_1-dimethylheptyl_phenyl_-4-_3-hydroxypropyl_cyclohexanol_CNR2_complex

CPNE7

NPTXR

ENPP7

PODNL1

C2orf40

WFDC1

TIMP2

SP1

LTBP2

MDFI

GPR173 BHLHE41

C4orf7

NGF

BDNF_modified_formTSPAN2

CaM_Ca2__Calcineurin_A_alpha-beta_B1

IGF2

LAMA2

TESC

TBX10

USH1C

HSPA1A

C19orf33

FRAS1

FAS

EPAS1

OBSCN

EEF1A2TCEAL7

GAL3ST4

KIAA1199

INS

ID4

CNTN4

UBASH3B

SYN1

NUDT11

CILP

TPPP3

PAQR5

CYP3A4

IL18

MUC1

STAT1_Cbp_p300

CPXM1

Fra1_JUN

ALB

RASSF9

C12orf39

IL1B

PTPRR

FLNC

EBF2

FOXM1B

MYOG

GATA4

REL

MYOD1

NPTX2

PDX1

ABHD1

HNF6

SLC24A3

TBX1

GALR3

RASL11B

IGFBP5

GPBAR1

OSR1

ARHGEF17

CPZ

HS3ST1 GLI3

ATF1

TMC5

DZIP1

LOXL1

Copper_Isatin_2-_2-aminoethyl_pyridine_complex

RB1CC1

CRABP2

Copper_Isatin_trimethylenediamine_complex

FXYD1

GABPB1

FGFR2

UNC5B

AGT_modified_form

NRXN3

BPTF

IRF8

STAT2

RUNX1T1

APP_modified_form

NME5

AR

PTHLH

NQO1

CAV1_mutant_form

MIR630

LY96

Estradiol_ESR2_complex

FBXL2

CDH11

FBLN1

KCNE4

CXCL6

FGF7

KRT19

FOXS1

COUP-TFII

EPCAM

PADI2

VLDLR

LRRC15

RASAL1

A1BG

LHFP

Bezafibrate_PPARA_complex

RNF34

CCL26

ATP10A

LEP

TGFA

FBLN2

KRT23

VEGFA

SCG2

CACNA2D1

TFPI2

FBLN5

FOXM1B_HNF6

CCL11

C3orf36

Glucocorticoids_NR3C1_complex

Ketoconazole_NR3C1_complex

NFKB1_RELB_complex

CYP2E1

CASP3

Miconazole_NR3C1_complex

CXCR4

NR1I2

TP63

GHRL_modified_form

DPP3

PTK2

SP1_SOD2_complex

n_2__5_-oligoadenylate_RNASEL_complex

SQSTM1

TP73_alternative_form

SOD2

C1QTNF2HIST3H2A

RASGEF1A PTMA

GPR182

SLCO2A1 EPHA3

FCER1A

CCL25

PDGFB_PDGFB_complex

CRHBP

COL1A1

Fra1_JUND IL8 TCF4_beta_catenin

HB-EGF_EGFR

ATF2_JUND_macroH2A

PPARD

Fog1

Akt1

CCDC3ABCB11

RXRA

E2F4_DP2_GCN5

GJA5

TSHZ3

GPM6A

PDZD4

E2F1_DP

LAMA1

Methotrexate_Lipids_complex

HOXA3

DKK2

EFNA5

TCF4_beta_catenin_TNIK

CEBPA_BRM_RB1_E2F4

NICD_YY1

TCF4_beta_catenin_JUN_DVL3

beta_catenin_TCF1_CtBP_CBP_TLE1_AES_SMAD4

MIM_GLI1

MYC

E2_ERA_AIB1

LBH

CYP7A1

ADAMTS1

FKBP10

3. Apply Coloring Propagation to Obtain Gene PredictionsData: Previous graph & sets of 821 over- and 89 under-expressed seed genesObjective: Infer if the new nodes in the graph should be over-expressed (+),

under-expressed (−) or invariant (0) based on the value of the seed genes.Method: The tool Iggy [Thiele et al., 2015] achieves this with consistency rules:

• An affectation (+, − or 0) must be explained by at least one predecessor;• There must be a path between an affectation + or − and one of the 910 seed genes;• Without such a path or in case of contradictory influences, coloring 0 can be affected.A prediction is a node affectation that is common to all compatible colorings.

Results: 185 (34%) non-seed nodes are predicted, and 61 (11%) as “+” or “−”.

4. Exploitation, Validation & OutlooksExploitation: The obtained predictions may be suitable as new tumor invasiveness markers or therapeutic targets.

We can also search for the regulating miRNA and point to those of particular interest.Validation: Cross-validation (by hiding a part of the observations and checking the new predictions) would allow to

assess the robustness of our approach.Outlooks: • Searching for key controllers: minimal sets of genes that have a maximum impact on predictions.

• Dynamical analysis on attractors reachability [Poret and Guziolowski, 2018].

ReferencesCerami, E. G. et al. (2010). Pathway Commons, a web resource for biological pathway data. Nucleic acids research, 39. http://www.pathwaycommons.org/.Hudson, T. J. et al. (2010). International network of cancer genome projects. Nature, 464. http://icgc.org/.Lefebvre, M. et al. (2017). Regulatory and signaling network assembly through linked open data. In Journees Ouvertes en Biologie, Informatique et Mathematiques. Demo paper. https://github.com/symetric-group/bionets-demo.Poret, A. and Guziolowski, C. (2018). Therapeutic target discovery using boolean network attractors: improvements of kali. Royal Society Open Science, 5(2).Subramanian, A. et al. (2005). Gene Set Enrichment Analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. of the Nat. Ac. of Sci., 102(43). http://software.broadinstitute.org/gsea/.Thiele, S. et al. (2015). Extended notions of sign consistency to relate experimental data to signaling and regulatory network topologies. BMC Bioinformatics, 16(1). http://bioasp.github.io/iggy/.

EMT signature(MSigDB)

Expressiondata (ICGC)

Clustering& fold-change

Partialcoloring

+ / −

Genes ofinterest

910

Querying ofPathway Commons

(BRAvo)

Graph of regu-latory pathways

1197 nodes10551 edges

Consistencyrules (Iggy)

Predictions0 + −

22% 8.7% 2.4%

ExploitationValidation

Otheroutcomes

Key controllersDynamical analysis