Page 45 - Molecular features of low-grade developmental brain tumours
SEGA IN TSC HAVE TSC1/TSC2 BIALLELIC INACTIVATION & NO BRAF MUTATIONS
ligated to the dual-indexed adaptors for Illumina sequencing. A MiSeq run was performed to quantify each library. Libraries were then pooled in equal mass and captured with the custom bait set using the Agilent SureSelect hybrid capture kit. The captured libraries were then sequenced on either the HiSeq 2500 or the HiSeq 3000 instrument.
The sequencing output was deconvoluted into individual sample reads and sorted using Picard tools 39. Reads were aligned to the hg19 reference sequence from the Genome Reference Consortium using BWA 40-43, and duplicate reads were identified and marked using Picard tools. The alignments were further refined using GATK for localized realignment around indel sites, and base quality scores were recalibrated, also using GATK 40,42,44. Single nucleotide variants (SNVs) were called using MuTect v1.1.4 and annotated with Variant Effect Predictor (VEP) 45,46. Insertions and deletions were called using Indel Locator and the SomaticIndelDetector tool 40,47. MuTect was run in paired mode using a CEPH sample as the normal, since matched normal DNA samples were not available, and a germline variant filter was then applied. Variants were filtered against the 6,500-exome release of the Exome Sequencing Project (ESP) database, ExAC (excluding variants seen in more than 3 normal subjects; http://exac.broadinstitute.org), 1000G and GnomAD 48,49. Variants present at >1% in either the African-American or European-American subsets of these reference databases, and not seen more than twice (> 2x) in COSMIC, were considered to be germline. Variants found in BRAF were analysed using cBio (http://www.cbioportal.org) and were further assessed for functionality using 3 different in silico prediction tools: PROVEAN (http://provean.jcvi.org), SIFT (http://sift.jcvi.org) and MutationAssessor (http://mutationassessor.org) 50-54.
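The germline filter described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the `Variant` record, its field names and the helper `is_likely_germline` are hypothetical, population frequencies are assumed to be stored as fractions, and "> 2x" is read here as "more than two COSMIC entries".

```python
from dataclasses import dataclass

POP_AF_THRESHOLD = 0.01  # ">1%" population frequency, from the text
COSMIC_MAX_COUNT = 2     # "> 2x" read as "more than two COSMIC entries" (assumption)


@dataclass
class Variant:
    """Hypothetical record; field names are illustrative only."""
    name: str
    af_african_american: float   # allele frequency as a fraction (0.01 == 1%)
    af_european_american: float  # allele frequency as a fraction
    cosmic_count: int            # number of COSMIC entries for this variant


def is_likely_germline(v: Variant) -> bool:
    """Common in either population subset and not recurrent in COSMIC."""
    common = (v.af_african_american > POP_AF_THRESHOLD
              or v.af_european_american > POP_AF_THRESHOLD)
    recurrent_in_cosmic = v.cosmic_count > COSMIC_MAX_COUNT
    return common and not recurrent_in_cosmic
```

Under these assumptions, a variant that is common in a population subset is retained as potentially somatic only when it recurs in COSMIC.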
A second approach was used in parallel to analyse the sequence data, with capture of read calls at all positions using SAMtools Pileup, followed by custom processing in Python and MATLAB to determine base call frequency at each position in each read orientation. These data were then filtered to eliminate variant calls that were observed in only a single read orientation or seen in multiple samples, in order to exclude artifacts derived from the sequencing process. All variants observed at a frequency of >1% were directly reviewed using the Integrative Genomics Viewer, to identify bona fide variant calls and exclude sequencing or alignment artifacts 21,23,26. Potential pathogenic variants seen at a frequency of >1% were also examined in the GnomAD database and the TSC LOVD database.
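As a rough illustration of this second approach, the sketch below applies the orientation, recurrence and frequency filters to per-sample base call counts. The input layout (one dict per call with forward and reverse counts), the function names and the `max_samples` cut-off for "seen in multiple samples" are assumptions made for the example, not details taken from the custom Python and MATLAB code.

```python
from collections import Counter

MIN_FREQ = 0.01  # ">1%" variant frequency, from the text


def variant_frequency(fwd_alt, rev_alt, fwd_total, rev_total):
    """Fraction of all reads at the position that carry the variant base."""
    total = fwd_total + rev_total
    return (fwd_alt + rev_alt) / total if total else 0.0


def filter_calls(calls, max_samples=1):
    """Keep calls seen in both read orientations, in at most `max_samples`
    samples (assumed cut-off), and at a frequency above MIN_FREQ.

    Each call is a dict with keys: sample, variant, fwd_alt, rev_alt,
    fwd_total, rev_total (hypothetical layout).
    """
    # Count the number of distinct samples in which each variant appears.
    seen = {(c["sample"], c["variant"]) for c in calls}
    samples_per_variant = Counter(variant for _, variant in seen)

    kept = []
    for c in calls:
        both_orientations = c["fwd_alt"] > 0 and c["rev_alt"] > 0
        not_recurrent = samples_per_variant[c["variant"]] <= max_samples
        freq = variant_frequency(c["fwd_alt"], c["rev_alt"],
                                 c["fwd_total"], c["rev_total"])
        if both_orientations and not_recurrent and freq > MIN_FREQ:
            kept.append({**c, "freq": freq})
    return kept
```

Calls that survive such a filter would still need manual review, as described above, since the sketch does not detect alignment artifacts.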
A minimum median read depth of 20x for the coding exons of TSC1 and TSC2 was required for the samples reported here. Among the 31 samples, the median read depth for the coding exons of TSC1 and TSC2 was 107 (range 20–1120).
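The coverage requirement amounts to a simple per-sample check, sketched below. How the median was computed in the study (per base or per exon) is not stated, so the input here is assumed to be a list of read depths over the coding exons of TSC1 and TSC2; the function name is hypothetical.

```python
from statistics import median

MIN_MEDIAN_DEPTH = 20  # "20x" minimum median read depth, from the text


def passes_depth_qc(coding_depths):
    """True if the median depth over TSC1/TSC2 coding exons is at least 20x.

    `coding_depths` is an assumed input: read depths sampled across the
    coding exons of both genes for one sample.
    """
    return median(coding_depths) >= MIN_MEDIAN_DEPTH
```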
LOH was assessed using two allele frequencies: 1) the mutant allele frequency at the site of mutation, using Unix grep to precisely quantify mutant vs. wild-type reads for indel mutations; and 2) the allele frequency at all SNPs identified in the TSC1 and TSC2 genes that had a population allele frequency of > 0.05% in the GnomAD database. If either the mutant allele frequency for the mutation was > 55%, or the median SNP minor allele frequency for TSC1/TSC2 was < 40%, this was considered evidence of CN-LOH. LOH was assessed only in the tumour samples; normal brain tissue adjacent to the tumour was not available.
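The two LOH criteria can be expressed as a short decision rule, sketched here in Python under stated assumptions. The read-counting helper mimics a grep-style search for mutant-specific and wild-type-specific sequences in reads of a single orientation; the SNP minor allele frequency is taken to be the fraction of reads supporting the less abundant allele at each SNP. All function names and input layouts are hypothetical.

```python
from statistics import median

MUTANT_AF_THRESHOLD = 0.55     # "> 55%" mutant allele frequency, from the text
SNP_MINOR_AF_THRESHOLD = 0.40  # "< 40%" median SNP minor allele frequency


def mutant_allele_fraction(reads, mutant_seq, wildtype_seq):
    """Grep-like count of reads containing the mutant or wild-type sequence.

    Assumes reads are plain strings in one orientation; reverse-complement
    reads are not handled in this sketch.
    """
    n_mut = sum(mutant_seq in r for r in reads)
    n_wt = sum(wildtype_seq in r for r in reads)
    total = n_mut + n_wt
    if total == 0:
        raise ValueError("no informative reads")
    return n_mut / total


def minor_allele_fraction(ref_reads, alt_reads):
    """Fraction of reads supporting the less abundant allele at one SNP."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads at SNP")
    return min(ref_reads, alt_reads) / total


def has_cn_loh_evidence(mutant_af, snp_counts):
    """Apply either criterion: mutant AF > 55% or median SNP minor AF < 40%.

    `mutant_af` may be None when no mutation was quantified; `snp_counts`
    is a list of (ref_reads, alt_reads) pairs, possibly empty.
    """
    if mutant_af is not None and mutant_af > MUTANT_AF_THRESHOLD:
        return True
    if snp_counts:
        med = median(minor_allele_fraction(r, a) for r, a in snp_counts)
        return med < SNP_MINOR_AF_THRESHOLD
    return False
```

Because matched normal tissue was not available, a rule of this kind relies entirely on allelic imbalance within the tumour sample.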