Metagenomics

Overview & Terminology

collected genomic information of a sample
unbiased targeting (no special marker like metabarcoding)
Metabarcoding (16S) is used for large number of samples, i.e., multiple patients, longitudinal studies, etc. but offers limited taxonomical and functional resolution in comparision
Whole Metagenome sequencing (WMS), or shotgun metagenome sequencing, provides insight into community biodiversity and function in contrast to Metabarcoding, where only a specific region of the bacterial community (the 16s rRNA) is sequenced
WMS aims at sequencing all the genomic material present in the environment
WMS is more expensive but offers increased resolution, and allows the discovery of viruses as well as other mobile genetic elements

Microbiota organism within a set environment mainly focused on bacteria * exluding viruses & mobile elements

Microbiome includes all factors of an environment (organism, genetic elements, metabolites, proteins)

Holobiont assemblages of different species that form a ecological unit divided into: virome, microbiome and macrobiological hosts

Whole Metagenome Sequencing - practical

comparing samples from the Pig Gut Microbiome against the Human Gut Microbiome

Software used: - FastQC - Kraken - R - Pavian

Data used:

curl -O -J -L https://osf.io/h9x6e/download
tar xvf subset_wms.tar.gz

we ssh using additionally the -X flag to allow a graphical interface
we used fastqc do check the quality of our data

1. Taxonomic read assignment with `Kraken`

first we need the Kraken database

#create a databases directory and download the database
wget https://ccb.jhu.edu/software/kraken/dl/minikraken_20171019_4GB.tgz
tar xzf minikraken_20171019_4GB.tgz

running kraken on the reads
produces a tab-delimited file with an assigned TaxID for each read

# for the script you need to declare the KRAKEN_DB path
KRAKEN_DB="/mnt/databases/minikraken_20171013_4GB"
for i in *_1.fastq
do
    prefix=$(basename $i _1.fastq)
    # print which sample is being processed
    echo $prefix
    kraken --db $KRAKEN_DB --threads 2 --fastq-input ${prefix}_1_corr.fastq ${prefix}_2_corr.fastq > /home/student/wms/results/${prefix}.tab
    kraken-report --db $KRAKEN_DB /home/student/wms/results/${prefix}.tab > /home/student/wms/results/${prefix}_tax.txt
done

2. Visualization with Pavian in R

Installing and run Pavian in R with those commands:

#one time only installations and setup
options(repos = c(CRAN = "http://cran.rstudio.com"))
if (!require(remotes)) { install.packages("remotes") }
remotes::install_github("fbreitwieser/pavian")
#Start up command
pavian::runApp(port=5000)

creates a new R-tab were you can open the kraken files
creates a nice sankey chart of your organism within your sample
its scalable, allows comparision between samples (as a table) and indepth analysis of samples (see figure)

Example results in one sample as a sankey chart using pavian

KRONA chart is also for visualization of metagenomic data
good for explanation of your sample
but PAVIAN is way better

Metagenome assembly and binning - practical

AIM: inspect and assemble metagenomic data and retrieve draft genomes

using a mock community of 20 bacteria simulating Illumina HiSeq created with InSilicoSeq
selected from the Tara Ocean study that recovered 957 distinct Metagenome-assembled-genomes (or MAGs) that were previsouly unknown

1. Correction and assembly of the illumina reads

check the fastq data with fastqc
do sickle and scythe as needed; see HTS
assembly here with megahit

sickle pe -f tara_reads_R1.fastq.gz -r tara_reads_R2.fastq.gz -t sanger -o tara_trimmed_R1.fastq -p tara_trimmed_R2.fastq -s /dev/null
megahit -1 tara_trimmed_R1.fastq -2 tara_trimmed_R2.fastq -o tara_assembly

2. Binning

First map the reads back against the assembly (for coverage information)
bowtie2 is a ultra fast short read mapper/aligner

ln -s tara_assembly/final.contigs.fa . #link contigs to current folder
bowtie2-build final.contigs.fa final.contigs #index
bowtie2 -x final.contigs -1 tara_trimmed_R1.fastq -2 tara_trimmed_R2.fastq | samtools view -bS -o tara_to_sort.bam #alignment and bam conversion
samtools sort tara_to_sort.bam -o tara.bam
samtools index tara.bam

then we run metabat

runMetaBat.sh -m 1500 final.contigs.fa tara.bam
mv final.contigs.fa.metabat-bins1500 metabat

3. QC of the bins

first time you run checkm you have to create the database

sudo checkm data setRoot ~/.local/data/checkm

checkm lineage_wf -x fa metabat checkm/
checkm bin_qa_plot -x fa checkm metabat plots

4. Get organism information and plot it

#tells you which organism, saves under checkm/lineage needs folder checkm
checkm qa checkm/lineage.ms checkm
#plots its
checkm bin_qa_plot -x fa checkm metabat plots

you get something like this, which should represent 10 different species (theoretically):
now you can annotate each bin (fasta files are in metabat dir)
more informations can be found here and here

5. Barrnap

BAsic Rapid Ribosomal RNA Predictor
isnt tsuitable for novel stuff since it has to be in the database, but seems still to work (see figure above of unknown marine bacteria)
you can download it here

Keys	Action
`?`	Open this help
`n`	Next page
`p`	Previous page
`s`	Search