Loading a ChAS export file and doing a bit of cleaning
ChAS export files contain only basic information about the copy number (gain,
loss or LOH), and a segment may overlap the centromere.
When the file is loaded by oncoscanR (load_chas function), all segments are
assigned to a chromosomal arm and split if necessary.
The LOH segments are given by ChAS independently of the copy number variation
segments. Therefore an LOH segment may overlap a copy loss. As
this information is redundant (a copy loss always implies LOH), we need to
trim and split these LOH segments with the adjust_loh function.
library(magrittr)
# Load the ChAS file and assign subtypes.
chas.fn <- system.file("extdata", "chas_example.txt", package = "oncoscanR")
segments <- load_chas(chas.fn, oncoscan_na33.cov)
# Clean the segments: restricted to Oncoscan coverage, LOH not overlapping
# with copy loss segments, smooth&merge segments within 300kb and prune
# segments smaller than 300kb.
segs.clean <- trim_to_coverage(segments, oncoscan_na33.cov) %>%
    adjust_loh() %>%
    merge_segments() %>%
    prune_by_size()
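The LOH trimming step described above can be illustrated with a minimal base R sketch. The helper `trim_loh` is hypothetical and is not the implementation of adjust_loh; it assumes one LOH interval and one loss interval on the same arm, with inclusive coordinates.

```r
# Hypothetical helper (not part of oncoscanR): subtract one loss interval
# from one LOH interval, both given as c(start, end) on the same arm.
trim_loh <- function(loh, loss) {
  # No overlap: the LOH segment is kept unchanged.
  if (loss[2] < loh[1] || loss[1] > loh[2]) {
    return(list(loh))
  }
  pieces <- list()
  # Part of the LOH lying before the loss.
  if (loh[1] < loss[1]) {
    pieces[[length(pieces) + 1]] <- c(loh[1], loss[1] - 1)
  }
  # Part of the LOH lying after the loss.
  if (loh[2] > loss[2]) {
    pieces[[length(pieces) + 1]] <- c(loss[2] + 1, loh[2])
  }
  pieces
}

trim_loh(c(100, 500), c(200, 300))  # two pieces: 100-199 and 301-500
trim_loh(c(100, 500), c(50, 600))   # empty list: LOH fully covered by the loss
```

A loss in the middle of an LOH segment therefore splits it in two, while a loss covering it entirely removes it.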
Of note, the oncoscan_na33.cov object contains the genomic coverage of the
Oncoscan assay (start/end for each chromosomal arm, hg19). One could re-compute
it by downloading the annotation file from the ThermoFisher website
and processing it with the get_oncoscan_coverage_from_probes function.
Computation of arm-level alteration
Function armlevel_alt
An arm is declared globally altered if more than 90% of its bases are altered
with a similar CNV type (amplifications [3 extra copies or more], gains [1-2
extra copies], losses or copy-neutral losses of heterozygosity [LOH]). For
instance, “gain of 3p” indicates that more than 90% of the arm has 3
copies but less than 90% has 5 (otherwise it would be an amplification). Prior
to computation, segments with the same copy number and separated by less than
300Kbp (the genome-wide Oncoscan resolution) are merged. The remaining segments
are filtered to a minimum size of 300Kbp.
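The merge and size filter described above could look like the following base R sketch. It is illustrative only and is not the code behind merge_segments or prune_by_size; the function `merge_and_prune` and the data frame layout (columns start, end, cn for a single arm) are assumptions made for the example.

```r
# Illustrative only: segments on a single arm as a data frame with columns
# start, end (bp) and cn (copy number). Assumes at least one segment.
merge_and_prune <- function(segs, max_gap = 300e3, min_size = 300e3) {
  segs <- segs[order(segs$start), ]
  merged <- segs[1, ]
  for (i in seq_len(nrow(segs))[-1]) {
    last <- nrow(merged)
    same_cn <- segs$cn[i] == merged$cn[last]
    close_by <- segs$start[i] - merged$end[last] < max_gap
    if (same_cn && close_by) {
      # Extend the previous segment instead of adding a new one.
      merged$end[last] <- max(merged$end[last], segs$end[i])
    } else {
      merged <- rbind(merged, segs[i, ])
    }
  }
  # Drop segments shorter than the minimum size.
  merged[merged$end - merged$start + 1 >= min_size, ]
}

segs <- data.frame(
  start = c(1e6, 2.1e6, 5e6),
  end   = c(2e6, 3e6, 5.1e6),
  cn    = c(3, 3, 1)
)
merge_and_prune(segs)  # one segment left: 1e6-3e6 with cn = 3
```

The first two segments share a copy number and are 100Kbp apart, so they are merged; the third is only about 100Kbp long and is pruned.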
For instance if we want to get all arms that have a global LOH alteration, we
run:
chas.fn <- system.file("extdata", "triploide_gene_list_full_location.txt",
    package = "oncoscanR")
segments <- load_chas(chas.fn, oncoscan_na33.cov)
armlevel.loh <- get_loh_segments(segments) %>%
    armlevel_alt(kit.coverage = oncoscan_na33.cov)
The variable armlevel.loh is a named vector containing the arms whose
percentage of bases with LOH is above the threshold (90%). To obtain the
percentage of LOH bases in all arms, one can set the threshold to zero:
armlevel.loh <- get_loh_segments(segments) %>%
    armlevel_alt(kit.coverage = oncoscan_na33.cov, threshold = 0)
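To make the threshold concrete, here is a small base R sketch of the underlying arithmetic. The helper `arm_fraction` is hypothetical (it is not how armlevel_alt is implemented) and assumes non-overlapping segments with inclusive coordinates on a single arm.

```r
# Hypothetical helper: fraction of an arm covered by non-overlapping segments.
# 'segs' has columns start and end; the arm spans arm_start..arm_end.
arm_fraction <- function(segs, arm_start, arm_end) {
  covered <- sum(segs$end - segs$start + 1)
  covered / (arm_end - arm_start + 1)
}

loh <- data.frame(start = c(1, 3501), end = c(3000, 10000))
frac <- arm_fraction(loh, arm_start = 1, arm_end = 10000)
frac        # 0.95
frac > 0.9  # TRUE: the arm would be reported as globally altered
```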
Global level of alteration
Several functions are available to perform such computation:
- score_avgcn: computes the average copy number across the genome
- score_estwgd: computes an estimation of the number of whole-genome doubling
events
- score_mbalt: computes the total number of Mbp that have an alteration (w/o
LOH segments)
mbalt <- score_mbalt(segments, oncoscan_na33.cov)
percent.alt <- mbalt['sample']/mbalt['kit']
message(paste(mbalt['sample'], 'Mbp altered ->', round(percent.alt*100),
'% of genome'))
#> 2503 Mbp altered -> 88 % of genome
avgcn <- score_avgcn(segments, oncoscan_na33.cov)
wgd <- score_estwgd(segments, oncoscan_na33.cov)
message(paste('Average copy number:', round(avgcn, 2), '->', wgd['WGD'],
'whole-genome doubling event'))
#> Average copy number: 3.25 -> 1 whole-genome doubling event
HRD scores
The package contains several HRD scores described below.
Score LST
Function score_lst
Procedure based on the paper from Popova et al., Cancer Res 2012 (PMID: 22933060).
First, segments smaller than 3Mb are removed; then segments are smoothed with
respect to copy number at a distance of 3Mb.
The number of LSTs is the number of breakpoints (breakpoints closer than 3Mb are
merged) that have a segment larger than or equal to 10Mb on each side. This
score was linked to BRCA1/2-deficient tumors.
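The counting procedure can be sketched in base R as follows. This is a simplified reading of the steps above for a single arm, not the score_lst implementation; the function `count_lst` and the data frame layout (columns start, end, cn) are assumptions made for illustration.

```r
# Simplified sketch for a single arm; 'segs' has columns start, end, cn.
# Assumes at least one segment of 3 Mb or more.
count_lst <- function(segs, small = 3e6, large = 10e6) {
  size <- function(s) s$end - s$start + 1
  # 1. Remove segments smaller than 3 Mb.
  segs <- segs[size(segs) >= small, ]
  segs <- segs[order(segs$start), ]
  # 2. Smooth: merge neighbours with equal copy number closer than 3 Mb.
  sm <- segs[1, ]
  for (i in seq_len(nrow(segs))[-1]) {
    k <- nrow(sm)
    if (segs$cn[i] == sm$cn[k] && segs$start[i] - sm$end[k] < small) {
      sm$end[k] <- segs$end[i]
    } else {
      sm <- rbind(sm, segs[i, ])
    }
  }
  # 3. Count breakpoints flanked by segments of at least 10 Mb on both sides.
  n <- nrow(sm)
  if (n < 2) return(0)
  left <- sm[-n, ]
  right <- sm[-1, ]
  sum(size(left) >= large & size(right) >= large &
        right$start - left$end < small)
}

segs <- data.frame(
  start = c(1, 12e6 + 1, 13e6 + 1, 30e6 + 1, 45e6 + 1),
  end   = c(12e6, 13e6, 30e6, 45e6, 50e6),
  cn    = c(2, 4, 2, 3, 2)
)
count_lst(segs)  # 1
```

In this toy input the 1Mb segment is discarded, its two neighbours are smoothed into one 30Mb segment, and only the breakpoint at 30Mb has 10Mb or more on both sides; the breakpoint at 45Mb is followed by a 5Mb segment and is not counted.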
Score HR-LOH
Function score_loh
Procedure based on the paper from Abkevich et al., Br J Cancer 2012 (PMID:
23047548).
Number of LOH segments larger than 15Mb but excluding segments on chromosomes
with a global LOH alteration. This score was linked to BRCA1/2-deficient tumors.
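As a rough illustration of this count, the base R sketch below filters LOH segments by size and by arm. The function `count_hrloh` and its inputs are hypothetical and do not reproduce the score_loh signature; excluding by arm (rather than by whole chromosome) is an assumption of the sketch.

```r
# Illustrative only: LOH segments as a data frame with columns arm, start, end.
# 'arms_excluded' lists the arms with a global LOH alteration.
count_hrloh <- function(loh, arms_excluded, min_size = 15e6) {
  keep <- !(loh$arm %in% arms_excluded)
  sum(keep & (loh$end - loh$start + 1) > min_size)
}

loh <- data.frame(
  arm   = c("1p", "5q", "8p"),
  start = c(1, 50e6 + 1, 1),
  end   = c(20e6, 60e6, 40e6)
)
count_hrloh(loh, arms_excluded = "8p")  # 1: only the 20 Mb segment on 1p counts
```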
Score nLST
Function score_nlst
HRD score developed at HUG and based on the LST score by Popova et al. but
normalized by an estimation of the number of whole-genome doubling events.
Of note, copy-neutral LOH segments are removed before computation.
nLST = LST - 7*W/2
where W is the number of whole-genome doubling events.
The score is positive if there are at least 15 nLST.
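A short worked example of the formula in base R, assuming an LST count of 26 and one whole-genome doubling event. The helper `nlst_score` is hypothetical and only restates the formula; it is not the score_nlst function.

```r
# Worked example of nLST = LST - 7*W/2.
# The cut-off of 15 is the positivity threshold stated in the text.
nlst_score <- function(lst, wgd) lst - 7 * wgd / 2

nlst <- nlst_score(lst = 26, wgd = 1)
nlst                                        # 22.5
ifelse(nlst >= 15, "Positive", "Negative")  # "Positive"
```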
Score gLOH
Function score_gloh
The percentage genomic LOH score is computed as described in the FoundationFocus
CDx BRCA LOH assay, i.e. the percentage of bases covered by the Oncoscan that
display a loss of heterozygosity independently of the number of copies,
excluding chromosomal arms that have a global LOH (>=90% of the arm length; to
be computed with the armlevel_alt function on LOH segments only). This score
was linked to BRCA1/2-deficient tumors.
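The arithmetic behind such a score might be sketched as follows in base R. The function `gloh` and its inputs are hypothetical, and removing the excluded arms from the denominator as well as the numerator is an assumption of this sketch, not a statement about score_gloh.

```r
# Hypothetical inputs: LOH segments (columns arm, start, end) and a named
# vector of covered arm lengths in bp.
gloh <- function(loh, arm_len, arms_excluded) {
  keep_arms <- setdiff(names(arm_len), arms_excluded)
  loh <- loh[loh$arm %in% keep_arms, ]
  # Fraction of the retained bases that show LOH.
  sum(loh$end - loh$start + 1) / sum(arm_len[keep_arms])
}

arm_len <- c("1p" = 100e6, "1q" = 100e6, "2p" = 50e6)
loh <- data.frame(
  arm   = c("1p", "1q", "2p"),
  start = c(1, 1, 1),
  end   = c(20e6, 30e6, 50e6)
)
gloh(loh, arm_len, arms_excluded = "2p")  # 0.25
```

Here 2p is entirely LOH and is left out, so 50Mb of LOH over the remaining 200Mb gives 25%.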
Example
First we need to load and clean the ChAS export file (from a female patient). We
adjust the Oncoscan coverage to exclude the 21p arm as it is only partially
covered.
# Load data
chas.fn <- system.file("extdata", "LST_gene_list_full_location.txt",
package = "oncoscanR")
segments <- load_chas(chas.fn, oncoscan_na33.cov)
# Clean the segments: restricted to Oncoscan coverage, LOH not overlapping
# with copy loss segments, smooth&merge segments within 300kb and prune
# segments smaller than 300kb.
segs.clean <- trim_to_coverage(segments, oncoscan_na33.cov) %>%
    adjust_loh() %>%
    merge_segments() %>%
    prune_by_size()
# Then we need to compute the arm-level alteration for loss and LOH since many
# scores discard arms that are globally altered.
arms.loss <- names(get_loss_segments(segs.clean) %>%
    armlevel_alt(kit.coverage = oncoscan_na33.cov))
arms.loh <- names(get_loh_segments(segs.clean) %>%
    armlevel_alt(kit.coverage = oncoscan_na33.cov))
# Get the number of LST
lst <- score_lst(segs.clean, oncoscan_na33.cov)
# Get the number of HR-LOH
hrloh <- score_loh(segs.clean, arms.loh, arms.loss, oncoscan_na33.cov)
# Get the genomic LOH score
gloh <- score_gloh(segs.clean, arms.loh, arms.loss, oncoscan_na33.cov)
# Get the number of nLST
wgd <- score_estwgd(segs.clean, oncoscan_na33.cov) # Get the avg CN, including 21p
nlst <- score_nlst(segs.clean, wgd["WGD"], oncoscan_na33.cov)
print(c(LST=lst, `HR-LOH`=hrloh, gLOH=gloh, nLST=nlst))
#> LST HR-LOH gLOH nLST.nLST
#> "26" "25" "0.411605161891022" "22.5"
#> nLST.HRD
#> "Positive"
TDplus score
Function score_td
Procedure based on the paper from Popova et al., Cancer Res 2016 (PMID:
26787835). The TDplus score is defined as the number of regions larger than 1Mb
but smaller or equal to 10Mb with a gain of one or two copies. This score was
linked to CDK12-deficient tumors.
They also identified a second category of tandem duplications, smaller than or
equal to 1Mb (around 300Kb), but could not link it to a phenotype.
Note that due to its resolution the Oncoscan assay will most likely miss this
second category. Nonetheless it is reported by the function, though not by the
standard workflow.
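The TDplus definition above amounts to a size and copy-number filter, sketched here in base R. The function `count_tdplus` is hypothetical and is not the score_td implementation; it assumes a diploid baseline, so that a gain of one or two copies corresponds to a total copy number of 3 or 4.

```r
# Illustrative only: 'segs' has columns start, end (bp) and cn (total copies).
# Assumption: diploid baseline, so gains of 1-2 copies mean cn 3 or 4.
count_tdplus <- function(segs) {
  size <- segs$end - segs$start + 1
  is_gain <- segs$cn %in% c(3, 4)
  # Larger than 1 Mb but smaller than or equal to 10 Mb.
  sum(is_gain & size > 1e6 & size <= 10e6)
}

segs <- data.frame(
  start = c(1, 20e6 + 1, 40e6 + 1, 60e6 + 1),
  end   = c(5e6, 20.5e6, 55e6, 62e6),
  cn    = c(3, 3, 4, 1)
)
count_tdplus(segs)  # 1: only the 5 Mb gain qualifies
```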
# Load data
chas.fn <- system.file("extdata", "TDplus_gene_list_full_location.txt",
package = "oncoscanR")
segments <- load_chas(chas.fn, oncoscan_na33.cov)
# Clean the segments: restricted to Oncoscan coverage, LOH not overlapping
# with copy loss segments, smooth&merge segments within 300kb and prune
# segments smaller than 300kb.
segs.clean <- trim_to_coverage(segments, oncoscan_na33.cov) %>%
    adjust_loh() %>%
    merge_segments() %>%
    prune_by_size()
td <- score_td(segs.clean)
print(td$TDplus)
#> [1] 93
Main workflow (as used at the Geneva University Hospitals)
The main workflow used for routine analysis can be launched either in R via the
workflow_oncoscan.run(chas.fn, gender) function or via the script
bin/run_oncoscan_workflow.R:
Usage:
Rscript path_to_oncoscanR_package/bin/oncoscan-workflow.R CHAS_FILE
- CHAS_FILE: Path to the text export file from ChAS or a compatible text file.
The script will output a JSON string into the terminal with all the computed
information:
{
  "armlevel": {
    "AMP": [],
    "LOSS": ["17p", "2q", "4p"],
    "LOH": ["14q", "5q", "8p", "8q"],
    "GAIN": ["19p", "19q", "1q", "20p", "20q", "3q", "5p", "6p", "9p", "9q",
             "Xp", "Xq"]
  },
  "scores": {
    "HRD": "Negative, nLST=12",
    "TDplus": 22,
    "avgCN": "2.43"
  },
  "file": "H19001012_gene_list_full_location.txt"
}
Or to launch the workflow within R:
segs.filename <- system.file('extdata', 'LST_gene_list_full_location.txt',
package = 'oncoscanR')
dat <- workflow_oncoscan.run(segs.filename)
message(paste('Arms with copy loss:',
paste(dat$armlevel$LOSS, collapse = ', ')))
#> Arms with copy loss: 15q
message(paste('Arms with copy gains:',
paste(dat$armlevel$GAIN, collapse = ', ')))
#> Arms with copy gains: 11q, 12p, 12q, 16p, 1q, 20p, 20q, 21q, 2p, 4p, 5p, 6p, 6q, 7p, 7q, 8q, 9q
message(paste('HRD score:', dat$scores$HRD))
#> HRD score: Positive, nLST=22.5
Please read the manual for a description of all available R functions.