sociogenomics2022

Lab week 8. Working with summary statistics

Manhattan Plots, QQplots and genetic correlations matrices

Alternative plots using package manhattanly

This first part of the lab will use Rstudio.

Use R to download the summary statistics

download.file("http://ssgac.org/documents/EduYears_Main.txt.gz", dest="EA2_results.txt.gz")

Import the summary statistics in R

EAgwasResults<-read.table("EA2_results.txt.gz", header=T)

head(EAgwasResults)
dim(EAgwasRsults)

This is a very large file, we can spped up things by selecting ony SNPS with Pvalue <0.005

EAgwasResults<-subset(EAgwasResults, Pval<0.0005)
dim(EAgwasResults)

We can use a library created to plot manhattan plots Load the manhattanly library

install.packages("manhattanly")
library(manhattanly)

Create Manhattan plot

help(manhattanly)
manhattanly(EAgwasResults, snp = "MarkerName" , bp="POS", p="Pval", chr="CHR")

We can use the same library to plot QQplots

qqly(EAgwasResults, snp = "MarkerName" , bp="POS", p="Pval", chr="CHR")

PLotting genetic correlations in R

import data on genetic correlation

data_rg<-read.table("LD-Hub_genetic_correlation_example.txt",fill =T, sep="\t", header=T, quote="") 

draw heatmap

install.packages("ggplot2")
library(ggplo2)
ggplot(data = data_rg, aes(Trait1, Trait2, fill = rg))+
    geom_tile(color = "white")+
    scale_fill_gradient2(low = "blue", high = "red", mid = 
                                 "white",  midpoint = 0, limit = 
            c(-1.1,1.1), space = "Lab",
              name="Genetic\nCorrelation") +
      theme_minimal()+ 
    theme(axis.text.x = element_text(angle = 45, vjust = 1, 
          size = 8, hjust = 1))+
 coord_fixed()

Locus zoom

https://my.locuszoom.org

Look at results for BMI

https://my.locuszoom.org/gwas/236887/ LOOK at SNP rs9939609 https://www.snpedia.com/index.php/Rs9939609

search at 16:53820527

Look for Genetic correlation from the ATLAS

https://atlas.ctglab.nl

The next commands show how to install it in google shell

First, some data cleaning to free some space on the google shell account

cd /home/nicola_barban/Sociogenomics/Data/
rm *.*

cd /home/nicola_barban/Sociogenomics/Results/
rm *.*

This requires time and installs miniconda (which is a python distribution) in the system

cd /home/nicola_barban/Sociogenomics/Software
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh
bash ~/miniconda.sh -b -p 
rm ~/miniconda.sh
source $HOME/miniconda3/bin/activate

This installs LDSC in your system

cd /home/nicola_barban/Sociogenomics/Software
git clone https://github.com/bulik/ldsc.git


cd ldsc

Congratulations you have installed LDSC now you have to install some packeges that LDSC uses for its calculations

conda env create --file environment.yml
source activate ldsc

Now, let’s check if everything works

./ldsc.py -h
./munge_sumstats.py -h

This is the LDSC command to estimate heritability. We need the following files:

Giant_Height2018.txt.gz
w_hm3.snplist

cd /home/nicola_barban/Sociogenomics

First we need to “munge” summary statistics, that is to make ti comparable with reference data that have info on LD scores

##This takes a lot of time. you can skip this and find the prepared data in the folder

python Software/ldsc/munge_sumstats.py \
  --sumstats Data/Giant_Height2018.txt.gz \
  --snp SNP \
  --N-col N \
  --a1 Tested_Allele \
  --a2 Other_Allele \
  --p P \
  --frq Freq_Tested_Allele_in_HRS \
  --out Results/height \
  --merge-alleles Data/w_hm3.snplist

The folder eur_w_ld_chr/ contains the LD scores that are merged with our summary statistics

python Software/ldsc/ldsc.py \
 --h2 Data/height.sumstats.gz \
 --ref-ld-chr Data/eur_w_ld_chr/ \
 --w-ld-chr Data/eur_w_ld_chr/ \
 --out Results/height_h2

Now we can try to estimate genetic correlation between height and educational attainment.

First, we need to prepare data for Educational Attainment