= prt[12,]
prt.BRAF = rna[12,]
rna.BRAF .107 = mir[16,] mir
R Lab - Part II
Association modeling
In this part of the exercise, we model (on the log-scale) the association of miRNA espression on protein expression adjusting for the corresponding mRNA.
Investigate miR-107 and B-RAF (Aure et al, 2015, Figure 2H)
(a) Linear regression model
on the log-scale, Aure et al. 2015, equation (3)
<- lm(prt.BRAF ~ mir.107 + rna.BRAF)
fitA summary(fitA)
Call:
lm(formula = prt.BRAF ~ mir.107 + rna.BRAF)
Residuals:
Min 1Q Median 3Q Max
-2.3028 -0.6126 0.0453 0.6153 3.1361
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -7.315e-08 5.068e-02 0.000 1
mir.107 4.324e-01 5.079e-02 8.513 1.06e-15 ***
rna.BRAF 3.200e-01 5.079e-02 6.301 1.15e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.851 on 279 degrees of freedom
Multiple R-squared: 0.281, Adjusted R-squared: 0.2758
F-statistic: 54.51 on 2 and 279 DF, p-value: < 2.2e-16
Add smooth non-linear cures to the scatterplots: use existing panel.smooth()
function, and add linear regression lines to the scatterplots:
<- function (x, y, col.regres = "blue", ...)
panel.linear
{ points(x, y, pch=19)
<- is.finite(x) & is.finite(y)
ok if (any(ok))
abline(stats::lm(y[ok] ~ x[ok]), col = col.regres, ...)
}
pairs(data.frame(mir.107, prt.BRAF, rna.BRAF),
lower.panel = panel.smooth,
upper.panel = panel.linear)
(b) Lasso-penalised linear model
with all miRNAs (Aure et al. 2015, equation (4))
library(glmnet)
# 10-fold CV to determine the optimal lambda:
# Note: rna.BRAF is penalised together with all the mir variables.
# You can use the penalty.factor option to avoid this.
set.seed(1234)
<- cv.glmnet(y=prt.BRAF, x=t(rbind(mir, rna.BRAF)),
cvfit alpha=1, nfolds=10, standardize=TRUE)
par(mfrow=c(1,1))
plot(cvfit)
<- cvfit$lambda.min
lambda.opt
# Coefficient path plot and coefficients for optimal lambda:
<- cvfit$glmnet.fit
fitB
plot(fitB, xvar="lambda")
abline(v=log(lambda.opt))
coef(fitB, s=lambda.opt)
predict(fitB, type="nonzero", s=lambda.opt)
Compare the regression coefficient of mir.107 from the models in (a) and (b):
coef(fitA)["mir.107"]
as.matrix(coef(fitB, s=cvfit$lambda.min))["hsa-miR-107",]
Task 3
Repeat the lasso analysis, but this time do not penalise the rna.BRAF
variable together with the mir variables.
Check out the information on the penalty.factor
option in ?glmnet
to understand how.