Copy number data can be difficult to interpret when presented as a list or a tab-delimited file. When projected onto a scatterplot, large-scale changes are visible, but smaller, gene-level changes are often missed. CNViz takes copy number data at the probe, gene, and/or segment level and launches an interactive Shiny application to visualize your sample. Loss of heterozygosity (LOH) and variant data (SNVs and short indels) can also be included.
launchCNViz
This function launches an interactive Shiny application with your copy number data. See detailed descriptions and examples for each input below.
sample_name
A character string that identifies your sample, e.g. "sample123".
meta_data
Can include any information about your sample that you would like. For example, you might include the purity of the sample, tumor ploidy, sex of the patient, diagnosis, etc. It should contain only one row. The only value relevant to the visualization is ploidy. If ploidy is not included, it is assumed to be 2.
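As a minimal sketch of what such a table might look like: the column names below are illustrative assumptions rather than names required by CNViz (apart from the ploidy value the text mentions), and all values are made up.

```r
# Hypothetical one-row metadata table. According to the text, only the
# ploidy value affects the visualization; the other columns are free-form.
meta_data <- data.frame(
  purity = 0.65,
  ploidy = 2.4,
  sex = "female",
  diagnosis = "example diagnosis",
  stringsAsFactors = FALSE
)

# Guard against accidentally supplying more than one row.
stopifnot(nrow(meta_data) == 1)
```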
probe_data
This should be a dataframe containing probe-level data. Each row should have the chromosome, corresponding gene, chromosomal starting location (hg38), chromosomal end location, and log-2 ratio. A weight column is optional, but if included it should represent a level of confidence in the log-2 value. This dataframe will likely have hundreds to thousands of rows, depending on how many genes you wish to include. Only one of probe_data, gene_data or segment_data is required, but having all three will make the visualization more informative. A GRanges object is also accepted, provided gene and log2 are metadata columns.
data(probe_data)
head(probe_data)
#> chr start end gene log2 weight
#> 1 chr1 2556608 2556788 TNFRSF14 0.1795960 0.657040
#> 2 chr1 2557689 2557869 TNFRSF14 0.2031640 0.692880
#> 3 chr1 2558285 2558525 TNFRSF14 0.1353360 0.761431
#> 4 chr1 2558849 2559029 TNFRSF14 -0.0260057 0.683930
#> 5 chr1 2559780 2560020 TNFRSF14 0.0926252 0.743339
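If you would rather pass a GRanges object, one possible way to build it from a data frame with these columns is `makeGRangesFromDataFrame()` from the GenomicRanges package. This is only a sketch: it reuses the first three rows printed above, and it assumes GenomicRanges is installed (the conversion is skipped otherwise).

```r
# First three rows of the example probe data shown above.
probe_df <- data.frame(
  chr = c("chr1", "chr1", "chr1"),
  start = c(2556608, 2557689, 2558285),
  end = c(2556788, 2557869, 2558525),
  gene = "TNFRSF14",
  log2 = c(0.1795960, 0.2031640, 0.1353360),
  weight = c(0.657040, 0.692880, 0.761431),
  stringsAsFactors = FALSE
)

# keep.extra.columns = TRUE carries gene, log2 and weight over as
# metadata columns of the resulting GRanges object.
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
  probe_gr <- GenomicRanges::makeGRangesFromDataFrame(
    probe_df,
    keep.extra.columns = TRUE
  )
  print(probe_gr)
}
```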
gene_data
This is similar to probe_data, but each row corresponds to a single gene. If your gene data comes with a copy number estimate instead of a log-2 ratio, you can convert copy number to log-2 with log(C/2, 2), where C is the copy number estimate. In the example below, the copy numbers were 4, 4, 4 and 0. You will notice that 0 corresponds to a log-2 value of -Inf. Do not worry about this; the function will adjust these values so they appear on the plot. If your data does not come with weight values and you have probe data available, you can use as the weight either the number of probes targeted to that gene or the sum of the weights of all probes targeted to that gene (if your probe data has a weight column). This data frame can have an LOH column indicating whether or not the gene is suspected to have loss of heterozygosity; this column should take the values TRUE or FALSE. A GRanges object is also accepted, provided gene and log2 are metadata columns.
* If your probe data corresponds to raw/unadjusted values, and your gene data corresponds to values adjusted for tumor purity and/or ploidy, take note of this, and explain this to any users of the application. This is important because the log-2 values on the gene plot and probe plots may not align.
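The copy number conversion and the two suggested ways of deriving gene weights can be sketched in base R as follows. The copy numbers are the ones quoted in the text; the small probe table and its gene names are hypothetical and only serve to illustrate the aggregation.

```r
# Convert copy number estimates to log-2 ratios with log(C/2, 2).
copy_number <- c(4, 4, 4, 0)
log2_ratio <- log(copy_number / 2, 2)
print(log2_ratio)  # a copy number of 0 yields -Inf

# Hypothetical probe-level table with a weight column.
probes <- data.frame(
  gene = c("GENE_A", "GENE_A", "GENE_B"),
  weight = c(0.6, 0.7, 0.8),
  stringsAsFactors = FALSE
)

# Option 1: use the number of probes per gene as the gene weight.
n_probes <- aggregate(weight ~ gene, data = probes, FUN = length)

# Option 2: use the sum of probe weights per gene as the gene weight.
sum_weights <- aggregate(weight ~ gene, data = probes, FUN = sum)

print(n_probes)
print(sum_weights)
```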
segment_data
Each row will correspond to a segment as shown below. An LOH column can be included in this dataframe as well, indicating whether or not the entire segment is suspected of having loss of heterozygosity.
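For illustration only, a segment table might be laid out like this. The column names (chr, start, end, log2, loh) are assumptions modeled on the probe_data example, and the coordinates and values are invented; check ?launchCNViz for the names the function actually expects.

```r
# Hypothetical segment-level table: one row per segment.
segment_data <- data.frame(
  chr = c("chr1", "chr1", "chr2"),
  start = c(1, 50000001, 1),
  end = c(50000000, 120000000, 90000000),
  log2 = c(0.02, -0.85, 0.55),
  loh = c(FALSE, TRUE, FALSE),  # TRUE if the whole segment is suspected LOH
  stringsAsFactors = FALSE
)

# Basic sanity checks on the layout.
stopifnot(is.logical(segment_data$loh))
stopifnot(all(segment_data$end > segment_data$start))
```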
variant_data
Each row will correspond to a single nucleotide variant (SNV) or small indel. The gene and mutation_id (in any format) must be included. Other columns, such as depth and starting location (start), are optional. Any additional columns you include will be displayed in the table. If start is included, CNViz can plot the location of the mutation on the probe plot. This table can include short insertions and deletions as long as they are formatted in the same manner as SNVs; in this case, you may want to include ref and alt columns. A VRanges object is also accepted, provided gene is a metadata column.
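A small variant table along these lines could look like the sketch below. The variants are invented for illustration, and the mutation_id format is arbitrary, since the text allows any format.

```r
# Hypothetical variant table: gene and mutation_id are required,
# the remaining columns are optional extras shown in the table.
variant_data <- data.frame(
  gene = c("TNFRSF14", "TNFRSF14"),
  mutation_id = c("chr1:2556700:C>T", "chr1:2557700:delAG"),
  start = c(2556700, 2557700),  # enables plotting on the probe plot
  depth = c(212, 180),
  ref = c("C", "CAG"),
  alt = c("T", "C"),
  stringsAsFactors = FALSE
)

# The two required columns must be present.
stopifnot(all(c("gene", "mutation_id") %in% names(variant_data)))
```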
This will launch a Shiny application. See https://rebeccagreenblatt.shinyapps.io/cnviz_example for a live example using simulated data.
* Note: if any of meta_data, probe_data, gene_data, segment_data or variant_data is not included, it will be set to an empty data frame (data.frame()). For more details, see ?launchCNViz in the console.
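Putting the pieces together, a call might look like the following sketch. It assumes the argument names match the input names described above and that the packaged probe_data example is available; confirm both against ?launchCNViz. The call is guarded so it only runs in an interactive session with CNViz installed.

```r
# Minimal inputs; anything omitted falls back to an empty data.frame(),
# as described in the note above.
sample_name <- "sample123"
meta_data <- data.frame(ploidy = 2)

if (interactive() && requireNamespace("CNViz", quietly = TRUE)) {
  # Load the example probe data that ships with the package.
  data("probe_data", package = "CNViz", envir = environment())

  # Argument names are assumed to match the input names in this document.
  CNViz::launchCNViz(
    sample_name = sample_name,
    meta_data = meta_data,
    probe_data = probe_data
  )
}
```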