SARS-CoV2_ARTIC_Illumina新冠病毒分型和突变分析
一. 本文适用于使用Artic扩增子扩增,Illumina双端测序,用于分析新冠病毒突变及分型鉴定
二. 概览:按照惯例,先上一张概览图
流程输入 | SRR22216743_1.fastq.gz SRR22216743_2.fastq.gz 测试数据下载 参考文件,默认路径/opt/ref下 GCF_009858895.2_ASM985889v3_genomic.gff GFF文件 分析流程文件(可一键导入sliverworkspace运行)及报告文件,conda环境文件下载,导入操作 |
---|---|
运行环境 | docker image based on ubuntu21.04 Conda Mamba(默认使用清华源) ssh |
分析软件 | - fastp=0.23.2 |
输出结果 | multiqc_report.html 测序数据trim前后质量数据 |
环境搭建: 为了快速完成环境搭建,节省95%以上时间。
本文使用docker + conda (mamba) 作为基础分析环境,镜像获取:docker/docker-compoes 的安装及镜像构建见《基于docker的生信基础环境镜像构建》,docker镜像基于ubuntu21.04构建,并安装有conda/mamba,ssh服务。并尝试初次运行时初始化安装所需软件下载所需文件(作为代价首次运行时间会较长,切需网络通畅),即实现自动初始化的分析流程。
备注:docker运行的操作系统,推荐为Linux,windows,macOS系统改下docker可能部分功能(网络)不能正常运行
<code style="margin-left:0"> # 拉取docker镜像 docker pull doujiangbaozi/sliverworkspace:latest # 查看docker 镜像 docker images</code>
基础环境配置,docker-compose.yml 配置文件,可以根据需要自行修改调整
<code style="margin-left:0"> version: "3" services: SarsCov2: image: doujiangbaozi/sliverworkspace:latest container_name: SarsCov2 volumes: - /media/sliver/Data/data:/opt/data:rw #挂载原始数据,放SC2目录下 - /media/sliver/Manufacture/SC2/envs:/root/mambaforge-pypy3/envs:rw #挂载envs conda环境目录 - /media/sliver/Manufacture/SC2/config:/opt/config:rw #挂载config conda配置文件目录 - /media/sliver/Manufacture/SC2/ref:/opt/ref:rw #挂载reference目录 - /media/sliver/Manufacture/SC2/result:/opt/result:rw #挂载中间文件和输出结果目录 ports: - "9024:9024" #ssh连接端口可以按需修改 environment: - TZ=Asia/Shanghai #设置时区 - PS=20191124 #修改默认ssh密码 </code>
基础环境运行
<code style="margin-left:0"> # docker-compose.yml 所在目录下运行 docker-compose up -d # 或者 docker-compose up -d -f /路径/docker-compose.yaml # 查看docker是否正常运行,docker-compose.yaml目录下运行 docker-compose ps # 或者 docker ps</code>
docker 容器使用,类似于登录远程服务器
<code style="margin-left:0"> # 登录docker,使用的是ssh服务,可以本地或者远程部署使用 ssh root@192.168.6.6 -p9024 # 看到如下,显示如下提示即正常登录 (base) root@SliverWorkstation:~# </code>
三. 分析流程
1. 变量设置
<code style="margin-left:0"> #样本编号 export sn=SRR22216743 #数据输入目录 export data=/opt/data #数据输出、中间文件目录 export result=/opt/result #conda安装的环境目录 export envs=/root/mambaforge-pypy3/envs #artic primer 版本V1,V2,V3,V4,V4.1 export artic_primer_version=4.1 #设置可用线程数 export threads=8</code>
2. 分析前QC,看下数据质量
<code style="margin-left:0"> #conda检测环境是否存在,首次运行不存在创建该环境并安装软件 if [ ! -d "${envs}/qc" ]; then mamba env create -f /opt/config/qc.yaml fi conda activate qc mkdir -p ${result}/${sn}/clean mkdir -p ${result}/${sn}/qc fastqc ${data}/SC2/${sn}_1.fastq.gz ${data}/SC2/${sn}_2.fastq.gz -o ${result}/${sn}/qc fastp -i ${data}/SC2/${sn}_1.fastq.gz -I ${data}/SC2/${sn}_2.fastq.gz \ -o ${result}/${sn}/clean/${sn}_1_clean.fastq.gz -O ${result}/${sn}/clean/${sn}_2_clean.fastq.gz \ -h ${result}/${sn}/qc/${sn}_fastp.html -j ${result}/${sn}/qc/${sn}_fastp.json fastqc ${result}/${sn}/clean/${sn}_1_clean.fastq.gz ${result}/${sn}/clean/${sn}_2_clean.fastq.gz \ -o ${result}/${sn}/qc #汇总一下之前结果,得到一个总体报告 multiqc ${result}/${sn}/qc/ -f -o ${result}/${sn}/qc conda activate</code>
3. 比对到参考基因组上,得到bam文件并排序
<code style="margin-left:0"> mkdir -p ${result}/${sn}/aligned if [ ! -d "/opt/ref/artic-ncov2019" ]; then apt-get install -y git git clone https://github.com/artic-network/artic-ncov2019.git "/opt/ref/artic-ncov2019" fi #conda检测环境是否存在,首次运行不存在创建该环境并安装软件 if [ ! -d "${envs}/SC2" ]; then mamba env create -f /opt/config/SC2.yaml fi conda activate SC2 #如果没有索引,创建索引 if [ ! -f /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.reference.fasta.amb ] || [ ! -f /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.reference.fasta.ann ] || [ ! -f /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.reference.fasta.bwt ] || [ ! -f /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.reference.fasta.pac ] || [ ! -f /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.reference.fasta.sa ]; then if [ ! -f /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.reference.fasta ]; then cp -f /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/SARS-CoV-2.reference.fasta \ /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.reference.fasta fi bwa index /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.reference.fasta fi bwa mem -t ${threads} \ /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.reference.fasta \ ${result}/${sn}/clean/${sn}_1_clean.fastq.gz ${result}/${sn}/clean/${sn}_1_clean.fastq.gz | \ samtools sort | samtools view -F 4 -o ${result}/${sn}/aligned/${sn}.sorted.bam samtools index ${result}/${sn}/aligned/${sn}.sorted.bam mv ${result}/${sn}/aligned/${sn}.sorted.bam.bai ${result}/${sn}/aligned/${sn}.sorted.bai conda deactivate</code>
4. 去除artic primer (primer trim)
<code style="margin-left:0"> conda activate SC2 if [ ! -f /opt/ref/artic-ncov2019/ARTIC-${artic_primer_version}.bed ]; then if [ ! -f /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.scheme.bed ]; then cp -r /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/SARS-CoV-2.scheme.bed \ /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.scheme.bed fi perl -ne 'my @x=split m/\t/; print join("\t",@x[0..3], 60, $x[3]=~m/LEFT/?"+":"-"),"\n";' \ < /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.scheme.bed \ > /opt/ref/artic-ncov2019/ARTIC-${artic_primer_version}.bed fi ivar trim -e -i ${result}/${sn}/aligned/${sn}.sorted.bam \ -b /opt/ref/artic-ncov2019/ARTIC-${artic_primer_version}.bed \ -p ${result}/${sn}/aligned/${sn}.primertrim conda deactivate</code>
5. primer trim 排序
<code style="margin-left:0"> conda activate SC2 samtools sort ${result}/${sn}/aligned/${sn}.primertrim.bam \ -o ${result}/${sn}/aligned/${sn}.primertrim.sorted.bam conda deactivate</code>
6. 获取拼接后一致性序列
<code style="margin-left:0"> conda activate SC2 samtools mpileup -A -d 1000 -B -Q 0 \ --reference /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.reference.fasta \ ${result}/${sn}/aligned/${sn}.primertrim.sorted.bam | ivar consensus -p ${result}/${sn}/${sn}.consensus -n N -i ${sn} grep -v ">" ${result}/${sn}/${sn}.consensus.fa | grep -o -E "C|A|T|G" | wc -l conda deactivate</code>
7. 使用Pangolin获取序列分型信息
<code style="margin-left:0"> #conda检测环境是否存在,首次运行不存在创建该环境并安装软件 if [ ! -d "${envs}/pangolin" ]; then mamba env create -f /opt/config/pangolin.yaml fi conda activate pangolin pangolin ${result}/${sn}/${sn}.consensus.fa --outdir ${result}/${sn} conda deactivate</code>
8. 获取突变信息
<code style="margin-left:0"> conda activate SC2 if [ ! -f "/opt/ref/GCF_009858895.2_ASM985889v3_genomic.gff" ]; then aria2c https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/858/895/GCF_009858895.2_ASM985889v3/GCF_009858895.2_ASM985889v3_genomic.gff.gz -d /opt/ref gzip -f -d /opt/ref/GCF_009858895.2_ASM985889v3_genomic.gff.gz fi samtools mpileup -aa -A -d 0 -B -Q 0 \ --reference /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.reference.fasta \ ${result}/${sn}/aligned/${sn}.primertrim.sorted.bam | \ ivar variants -p ${result}/${sn}/${sn}_variants -q 30 -t 0.01 -m 0 \ -r /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.reference.fasta \ -g /opt/ref/GCF_009858895.2_ASM985889v3_genomic.gff ivar filtervariants -p ${result}/${sn}/${sn}_variantsfiltered ${result}/${sn}/${sn}_variants.tsv conda deactivate</code>
9. 获取quast报告及bam覆盖度、测序深度、baseQ、mapQ等质控信息
<code style="margin-left:0"> conda activate SC2 if [ ! -f "/opt/ref/GCF_009858895.2_ASM985889v3_genomic.gff" ]; then aria2c https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/858/895/GCF_009858895.2_ASM985889v3/GCF_009858895.2_ASM985889v3_genomic.gff.gz -d /opt/ref gzip -f -d /opt/ref/GCF_009858895.2_ASM985889v3_genomic.gff.gz fi quast ${result}/${sn}/${sn}.consensus.fa \ -r /opt/ref/artic-ncov2019/primer_schemes/nCoV-2019/${artic_primer_version}/nCoV-2019.reference.fasta \ --features /opt/ref/GCF_009858895.2_ASM985889v3_genomic.gff \ --ref-bam ${result}/${sn}/aligned/${sn}.sorted.bam \ --output-dir ${result}/${sn}/quast samtools coverage ${result}/${sn}/aligned/${sn}.sorted.bam -o ${result}/${sn}/aligned/${sn}.samcov.tsv conda deactivate</code>
10. 报告(可选)
11. 使用IGV Browser查看突变信息(可选)
四. 参考链接
https://github.com/artic-network/artic-ncov2019
https://github.com/CDCgov/SARS-CoV-2_Sequencing/tree/master/protocols/BFX-UT_ARTIC_Illumina
未经允许不得转载:木盒主机 » SARS-CoV2_ARTIC_Illumina新冠病毒分型和突变分析