lineagespot 1.2.0
lineagespot is a framework written in R that aims to identify
SARS-CoV-2-related mutations from a single variant file or a list of
variant files in Variant Call Format (VCF). The method can facilitate the
detection of SARS-CoV-2 lineages in wastewater samples sequenced with
next-generation sequencing, and attempts to infer the potential
distribution of these lineages.
lineagespot is distributed as a Bioconductor package. It requires R
(version "4.1"), which can be installed on any operating system from
CRAN, and Bioconductor (version "3.14").
To install the lineagespot package, enter the following commands in
your R session:
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("lineagespot")
## Check that you have a valid Bioconductor installation
BiocManager::valid()
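To confirm that a session meets the requirements stated above, a small check along these lines may help. It is a sketch rather than part of the package instructions, and relies only on base R's getRversion(), utils::packageVersion() and BiocManager::version(); the two optional checks are simply skipped when the corresponding package is not installed.

```r
# Is the running R at least the version stated above (4.1)?
r_ok <- getRversion() >= "4.1"
cat("R version:", as.character(getRversion()),
    if (r_ok) "(meets 4.1)" else "(older than 4.1)", "\n")

# Bioconductor release in use, if BiocManager is installed
if (requireNamespace("BiocManager", quietly = TRUE)) {
  cat("Bioconductor version:", as.character(BiocManager::version()), "\n")
}

# Installed lineagespot version, if the package is present
if (requireNamespace("lineagespot", quietly = TRUE)) {
  cat("lineagespot version:",
      as.character(utils::packageVersion("lineagespot")), "\n")
}
```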
Example FASTQ files are provided through Zenodo. The bioinformatics analysis pipeline used to pre-process them is provided here.
Once lineagespot is successfully installed, it can be loaded as follows:
library(lineagespot)
lineagespot can be run by calling a single function, lineagespot(),
which implements the overall pipeline:
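results <- lineagespot(
    # folder containing the example VCF files shipped with the package
    vcf_folder = system.file("extdata", "vcf-files", package = "lineagespot"),
    # GFF3 annotation of the SARS-CoV-2 reference genome (NC_045512.2)
    gff3_path = system.file("extdata", "NC_045512.2_annot.gff3",
                            package = "lineagespot"),
    # folder containing the lineage reference files
    ref_folder = system.file("extdata", "ref", package = "lineagespot")
)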
The function returns three tables:
# overall table
head(results$variants.table)
#>          CHROM POS                         ID   REF  ALT DP AD_ref AD_alt
#> 1: NC_045512.2 328   NC_045512.2;328;ACA;ACCA   ACA ACCA 36     34      1
#> 2: NC_045512.2 355        NC_045512.2;355;C;T     C    T 42     41      1
#> 3: NC_045512.2 366        NC_045512.2;366;C;T     C    T 42     28     14
#> 4: NC_045512.2 401 NC_045512.2;401;CTTAA;CTAA CTTAA CTAA 37     35      2
#> 5: NC_045512.2 406     NC_045512.2;406;AGA;AA   AGA   AA 35     34      1
#> 6: NC_045512.2 421        NC_045512.2;421;C;A     C    A 35     34      1
#>    Gene_Name  Nt_alt AA_alt         AF codon_num                sample
#> 1:     ORF1a  64dupC  Q22fs 0.02777778        21 SampleA_freebayes_ann
#> 2:     ORF1a   90C>T   G30G 0.02380952        30 SampleA_freebayes_ann
#> 3:     ORF1a  101C>T   S34F 0.33333333        34 SampleA_freebayes_ann
#> 4:     ORF1a 138delT  D48fs 0.05405405        46 SampleA_freebayes_ann
#> 5:     ORF1a 142delG  D48fs 0.02857143        47 SampleA_freebayes_ann
#> 6:     ORF1a  156C>A   G52G 0.02857143        52 SampleA_freebayes_ann
# lineages' hits
head(results$lineage.hits)
#>    Gene_Name AA_alt                sample   DP AD_alt        AF lineage
#> 1:         M   I82T SampleC_freebayes_ann 3984   2770 0.6952811    AY.1
#> 2:         N   D63G SampleC_freebayes_ann 2180    787 0.3610092    AY.1
#> 3:         N  R203M SampleC_freebayes_ann 4147   4125 0.9946950    AY.1
#> 4:         N  G215C SampleC_freebayes_ann 4477   2574 0.5749386    AY.1
#> 5:         N  D377Y SampleC_freebayes_ann 4271   1623 0.3800047    AY.1
#> 6:     ORF1a A1306S SampleC_freebayes_ann 2202   1267 0.5753860    AY.1
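The hits table can be explored with ordinary data-frame operations. As a minimal sketch, assuming a hand-built copy of the six rows printed above in place of results$lineage.hits, the code counts the AY.1 hits per gene and lists the mutations with an allele frequency of at least 0.5 (an arbitrary cut-off chosen for illustration).

```r
# Six rows copied by hand from the printed head of results$lineage.hits
hits <- data.frame(
  Gene_Name = c("M", "N", "N", "N", "N", "ORF1a"),
  AA_alt    = c("I82T", "D63G", "R203M", "G215C", "D377Y", "A1306S"),
  DP        = c(3984, 2180, 4147, 4477, 4271, 2202),
  AD_alt    = c(2770, 787, 4125, 2574, 1623, 1267),
  AF        = c(0.6952811, 0.3610092, 0.9946950, 0.5749386, 0.3800047,
                0.5753860),
  lineage   = "AY.1",
  stringsAsFactors = FALSE
)

# Number of AY.1 hits per gene among these rows
gene_counts <- table(hits$Gene_Name)
print(gene_counts)

# Mutations carried by at least half of the reads (illustrative cut-off)
majority <- hits[hits$AF >= 0.5, c("Gene_Name", "AA_alt", "AF")]
print(majority)
```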
# lineagespot report
head(results$lineage.report)
#>    lineage                sample     meanAF meanAF_uniq minAF_uniq_nonzero N
#> 1:    AY.1 SampleA_freebayes_ann 0.08333333   0.0000000                 NA 1
#> 2:    AY.1 SampleB_freebayes_ann 0.08333333   0.0000000                 NA 1
#> 3:    AY.1 SampleC_freebayes_ann 0.43162568   0.0000000                 NA 6
#> 4:    AY.2 SampleA_freebayes_ann 0.07692308   0.0000000                 NA 1
#> 5:    AY.2 SampleB_freebayes_ann 0.07692308   0.0000000                 NA 1
#> 6:    AY.2 SampleC_freebayes_ann 0.33117826   0.1198191          0.1594335 4
#>    lineage N. rules lineage prop.
#> 1:               31    0.03225806
#> 2:               31    0.03225806
#> 3:               31    0.19354839
#> 4:               29    0.03448276
#> 5:               29    0.03448276
#> 6:               29    0.13793103
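In the rows printed above, lineage prop. equals N divided by lineage N. rules (for example, 6 / 31 = 0.194 for AY.1 in SampleC). The sketch below checks this on a hand-built copy of those six rows, with the two columns renamed to the hypothetical names n_rules and prop for convenience, and then picks, for each sample, the lineage with the largest proportion among these rows. This ranking is only an illustration over a subset of the report, not an abundance estimate produced by lineagespot.

```r
# Six rows copied by hand from the printed head of results$lineage.report;
# n_rules and prop are hypothetical stand-ins for the "lineage N. rules"
# and "lineage prop." columns
samples <- c("SampleA_freebayes_ann", "SampleB_freebayes_ann",
             "SampleC_freebayes_ann")
report <- data.frame(
  lineage = rep(c("AY.1", "AY.2"), each = 3),
  sample  = rep(samples, times = 2),
  N       = c(1, 1, 6, 1, 1, 4),
  n_rules = c(31, 31, 31, 29, 29, 29),
  prop    = c(0.03225806, 0.03225806, 0.19354839,
              0.03448276, 0.03448276, 0.13793103),
  stringsAsFactors = FALSE
)

# Proportion recomputed as matched rules / total rules for the lineage
prop_check <- report$N / report$n_rules
print(isTRUE(all.equal(report$prop, prop_check, tolerance = 1e-6)))

# Per sample, the lineage with the largest proportion among these rows
best <- do.call(rbind, lapply(split(report, report$sample), function(d) {
  d[which.max(d$prop), c("sample", "lineage", "N", "n_rules", "prop")]
}))
print(best)
```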
Here is the output of sessionInfo() on the system on which this document
was compiled (running pandoc 2.5):
#> R version 4.2.1 (2022-06-23)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.5 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] lineagespot_1.2.0 RefManageR_1.4.0 BiocStyle_2.26.0
#>
#> loaded via a namespace (and not attached):
#> [1] bitops_1.0-7 matrixStats_0.62.0
#> [3] lubridate_1.8.0 bit64_4.0.5
#> [5] filelock_1.0.2 progress_1.2.2
#> [7] httr_1.4.4 GenomeInfoDb_1.34.0
#> [9] tools_4.2.1 backports_1.4.1
#> [11] bslib_0.4.0 utf8_1.2.2
#> [13] R6_2.5.1 DBI_1.1.3
#> [15] BiocGenerics_0.44.0 tidyselect_1.2.0
#> [17] prettyunits_1.1.1 bit_4.0.4
#> [19] curl_4.3.3 compiler_4.2.1
#> [21] cli_3.4.1 Biobase_2.58.0
#> [23] xml2_1.3.3 DelayedArray_0.24.0
#> [25] rtracklayer_1.58.0 bookdown_0.29
#> [27] sass_0.4.2 rappdirs_0.3.3
#> [29] stringr_1.4.1 digest_0.6.30
#> [31] Rsamtools_2.14.0 rmarkdown_2.17
#> [33] XVector_0.38.0 pkgconfig_2.0.3
#> [35] htmltools_0.5.3 bibtex_0.5.0
#> [37] MatrixGenerics_1.10.0 BSgenome_1.66.0
#> [39] dbplyr_2.2.1 fastmap_1.1.0
#> [41] rlang_1.0.6 RSQLite_2.2.18
#> [43] jquerylib_0.1.4 BiocIO_1.8.0
#> [45] generics_0.1.3 jsonlite_1.8.3
#> [47] BiocParallel_1.32.0 dplyr_1.0.10
#> [49] VariantAnnotation_1.44.0 RCurl_1.98-1.9
#> [51] magrittr_2.0.3 GenomeInfoDbData_1.2.9
#> [53] Matrix_1.5-1 Rcpp_1.0.9
#> [55] S4Vectors_0.36.0 fansi_1.0.3
#> [57] lifecycle_1.0.3 stringi_1.7.8
#> [59] yaml_2.3.6 SummarizedExperiment_1.28.0
#> [61] zlibbioc_1.44.0 plyr_1.8.7
#> [63] BiocFileCache_2.6.0 grid_4.2.1
#> [65] blob_1.2.3 parallel_4.2.1
#> [67] crayon_1.5.2 lattice_0.20-45
#> [69] Biostrings_2.66.0 GenomicFeatures_1.50.0
#> [71] hms_1.1.2 KEGGREST_1.38.0
#> [73] knitr_1.40 pillar_1.8.1
#> [75] GenomicRanges_1.50.0 rjson_0.2.21
#> [77] codetools_0.2-18 biomaRt_2.54.0
#> [79] stats4_4.2.1 XML_3.99-0.12
#> [81] glue_1.6.2 evaluate_0.17
#> [83] data.table_1.14.4 BiocManager_1.30.19
#> [85] png_0.1-7 vctrs_0.5.0
#> [87] assertthat_0.2.1 cachem_1.0.6
#> [89] xfun_0.34 restfulr_0.0.15
#> [91] tibble_3.1.8 GenomicAlignments_1.34.0
#> [93] AnnotationDbi_1.60.0 memoise_2.0.1
#> [95] IRanges_2.32.0 ellipsis_0.3.2