R Lab - Part II

Association modeling

In this part of the exercise, we model (on the log-scale) the association of miRNA espression on protein expression adjusting for the corresponding mRNA.

Investigate miR-107 and B-RAF (Aure et al, 2015, Figure 2H)

prt.BRAF = prt[12,]
rna.BRAF = rna[12,]
mir.107 = mir[16,]

(a) Linear regression model

on the log-scale, Aure et al. 2015, equation (3)

fitA <- lm(prt.BRAF ~ mir.107 + rna.BRAF)
summary(fitA)


Call:
lm(formula = prt.BRAF ~ mir.107 + rna.BRAF)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.3028 -0.6126  0.0453  0.6153  3.1361 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -7.315e-08  5.068e-02   0.000        1    
mir.107      4.324e-01  5.079e-02   8.513 1.06e-15 ***
rna.BRAF     3.200e-01  5.079e-02   6.301 1.15e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.851 on 279 degrees of freedom
Multiple R-squared:  0.281, Adjusted R-squared:  0.2758 
F-statistic: 54.51 on 2 and 279 DF,  p-value: < 2.2e-16

Add smooth non-linear cures to the scatterplots: use existing panel.smooth() function, and add linear regression lines to the scatterplots:

panel.linear <- function (x, y, col.regres = "blue", ...) 
{ 
  points(x, y, pch=19) 
  ok <- is.finite(x) & is.finite(y) 
  if (any(ok)) 
    abline(stats::lm(y[ok] ~ x[ok]), col = col.regres, ...) 
} 

pairs(data.frame(mir.107, prt.BRAF, rna.BRAF), 
      lower.panel = panel.smooth,
      upper.panel = panel.linear)

(b) Lasso-penalised linear model

with all miRNAs (Aure et al. 2015, equation (4))

library(glmnet)

# 10-fold CV to determine the optimal lambda:
# Note: rna.BRAF is penalised together with all the mir variables. 
# You can use the penalty.factor option to avoid this.
set.seed(1234)
cvfit <- cv.glmnet(y=prt.BRAF, x=t(rbind(mir, rna.BRAF)),
                   alpha=1, nfolds=10, standardize=TRUE)

par(mfrow=c(1,1))
plot(cvfit)
lambda.opt <- cvfit$lambda.min

# Coefficient path plot and coefficients for optimal lambda:
fitB <- cvfit$glmnet.fit

plot(fitB, xvar="lambda")
abline(v=log(lambda.opt))

coef(fitB, s=lambda.opt)
predict(fitB, type="nonzero", s=lambda.opt)

Compare the regression coefficient of mir.107 from the models in (a) and (b):

coef(fitA)["mir.107"]
as.matrix(coef(fitB, s=cvfit$lambda.min))["hsa-miR-107",]

Task 3

Repeat the lasso analysis, but this time do not penalise the rna.BRAF variable together with the mir variables.

Check out the information on the penalty.factor option in ?glmnet to understand how.

--- title: "R Lab - Part II" format: html: code-fold: false code-tools: true editor: visual --- ## Association modeling In this part of the exercise, we model (on the log-scale) the association of miRNA espression on protein expression adjusting for the corresponding mRNA. Investigate **miR-107** and **B-RAF** (Aure et al, 2015, Figure 2H) ```{r} #| label: load-data-againn #| warning: false #| echo: false # need to load the data in this page again. DO NOT SHOW mir <- read.table("lab/data/miRNA-421x282.txt", header=T, sep="\t", dec=".") rna <- read.table("lab/data/mRNA-100x282.txt", header=T, sep="\t", dec=".") prt <- read.table("lab/data/prot-100x282.txt", header=T, sep="\t", dec=".") mir <- as.matrix(mir) rna <- as.matrix(rna) prt <- as.matrix(prt) ``` ```{r} #| label: select-braf #| warning: false #| echo: true prt.BRAF = prt[12,] rna.BRAF = rna[12,] mir.107 = mir[16,] ``` ### (a) Linear regression model on the log-scale, Aure et al. 2015, equation (3) ```{r} #| echo: true #| eval: true fitA <- lm(prt.BRAF ~ mir.107 + rna.BRAF) summary(fitA) ``` Add smooth non-linear cures to the scatterplots: use existing `panel.smooth()` function, and add linear regression lines to the scatterplots: ```{r} #| echo: true #| eval: true panel.linear <- function (x, y, col.regres = "blue", ...) { points(x, y, pch=19) ok <- is.finite(x) & is.finite(y) if (any(ok)) abline(stats::lm(y[ok] ~ x[ok]), col = col.regres, ...) } pairs(data.frame(mir.107, prt.BRAF, rna.BRAF), lower.panel = panel.smooth, upper.panel = panel.linear) ``` ### (b) Lasso-penalised linear model with **all miRNAs** (Aure et al. 2015, equation (4)) ```{r} #| echo: true #| eval: false library(glmnet) # 10-fold CV to determine the optimal lambda: # Note: rna.BRAF is penalised together with all the mir variables. # You can use the penalty.factor option to avoid this. set.seed(1234) cvfit <- cv.glmnet(y=prt.BRAF, x=t(rbind(mir, rna.BRAF)), alpha=1, nfolds=10, standardize=TRUE) par(mfrow=c(1,1)) plot(cvfit) lambda.opt <- cvfit$lambda.min # Coefficient path plot and coefficients for optimal lambda: fitB <- cvfit$glmnet.fit plot(fitB, xvar="lambda") abline(v=log(lambda.opt)) coef(fitB, s=lambda.opt) predict(fitB, type="nonzero", s=lambda.opt) ``` Compare the regression coefficient of mir.107 from the models in (a) and (b): ```{r} #| echo: true #| eval: false coef(fitA)["mir.107"] as.matrix(coef(fitB, s=cvfit$lambda.min))["hsa-miR-107",] ``` ::: callout-note ## Task 3 Repeat the lasso analysis, but this time do not penalise the `rna.BRAF` variable together with the mir variables. Check out the information on the `penalty.factor` option in `?glmnet` to understand how. :::