URP_course

This project is maintained by sarbal

Lesson 4: Another potentially fun example.

Let’s do some analyses.

Download these files into your working directory:

####
To check your working directory:

getwd()

To set your working diretory:

setwd("H:/URP")

Run this to install/load libraries

source("helper.R") 
load("lesson4.Rdata")

The many ways of gene set enrichment

Gene set enrichment (GSE)

A method to identify properties of genes or proteins that are over-represented. GSE is a (relatively) straightforward method to sumamrize the results of your experiment, in particualr when you believe there is some link or association to a known phenotype (e.g., enrichment for dopamine receptors in Parkinson’s disease). Uses statistical approaches to identify significantly enriched groups of genes, and the main annotation database is the Gene Ontology (GO) and the Molecular Signatures database (MSigDB).

install.packages("enrichR")
library(enrichR)
dbs <- listEnrichrDbs()
dbs <- c("GO_Molecular_Function_2015", "GO_Cellular_Component_2015", "GO_Biological_Process_2015")
enriched <- enrichr(c("Runx1", "Gfi1", "Gfi1b", "Spi1", "Gata1", "Kdr"), dbs)

Multifunctionality

Replicating a figure

Let’s take a look at this paper: https://genome.cshlp.org/content/22/4/602

  1. Get data
gene_expression_file = " Gene.expression.data/Normalized.expression.data.txt" 
exprs = read.table(gene_expression_file, header=T, row.names=1) 
samples = matrix(unlist(strsplit(colnames(exprs)[33:94] , "_" ) ) , byrow=T, ncol=2)
  1. Clean up data
    exprs.means = t(sapply(1:dim(exprs)[1], function(i) tapply(  as.numeric(exprs[i,33:94]), samples[,1], mean, na.rm=T) ))
    samples.sub = names(tapply(  as.numeric(exprs[i,33:94]), samples[,1], mean, na.rm=T)) 
    colnames(exprs.means) = samples.sub
    rownames(exprs.means) = rownames(exprs)
    
  2. Data analysis
    • Which genes are present in enough species?
    • Which species does not have enough data? ``` colSums(is.na(exprs.means )) filt.species = which.max(colSums(is.na(exprs.means ))) gene.subset = rowSums(is.na(exprs.means[,-filt.species ] ) ) >5

samples.cor = cor(exprs.means[gene.subset,-filt.species ], m=”s”, use=”p”)

4. Plot/graph 

heatmap.3(samples.cor) samples.cor = cor(exprs.means[ ,-filt.species], m=”s”, use=”p”) heatmap.3(samples.cor) ```

Back to the homepage