|
Using PennCNV within BeadStudio via UniversalCNVAdapter
This note describes the procedure to run PennCNV within the Illumina BeadStudio software to facilitate automatic processing and visualization of CNV calls. This procedure has only been tested on 32-bit Windows XP and Windows Vista with an ActivePerl 5.8 installation. Before using PennCNV within BeadStudio, one should be aware of some of the advantages and disadvantages. The advantage is obvious: one can simply click mouse buttons to perform CNV detection and visualization in BeadStudio. The disadvantages are: (1) It is very slow: the CNV calling is implemented by exporting signal files from BeadStudio one by one and then calling PennCNV separately for each file, reloading all necessary model files into memory each time, which is an extremely inefficient way to perform CNV analysis with PennCNV. (2) It does not allow flexible post-processing of CNV calling results; in contrast, when PennCNV is run from the command line, the output file can be easily processed by downstream analysis scripts. (3) It does not allow family-based PennCNV calls, which represent a significant enhancement over individual-based CNV calls.
In general, due to the inefficiency of running PennCNV within BeadStudio, if you have >500 samples, use only a Windows system, and do not want to wait >1 day for the CNV analysis, you are probably better off running PennCNV from the command line in a Windows shell or in Cygwin, and using the BeadStudio plug-in only for validating important CNV calls in chosen samples. We now provide an auxiliary program (visualize_cnv.pl) to transform PennCNV output directly into BeadStudio bookmarks, so that you can run PennCNV from the command line and then directly import the calls into the Illumina Genome Viewer for visualization.
The procedure for using PennCNV with BeadStudio is described below in step-by-step fashion.
Make sure your computer is a 32-bit computer with Windows XP or Windows Vista. To make sure that PennCNV works correctly on your operating system, open a command terminal (click the “Start” button in the taskbar in the lower left of your computer screen, select “Run …”, then type “cmd.exe”), then type in “perl C:\penncnv\detect_cnv.pl” to see whether the program runs successfully (a list of command line options will be printed on the screen).
If you have not done so, download the Illumina Universal CNV Adapter plugin from http://www.illumina.com/pagesnrn.ilmn?ID=229, and install the program with all default options (the default installation location is C:\Program Files\Illumina\BeadStudio 2.0\CNVAlgorithm\UniversalCNVAdapter).
Now go to the directory C:\Program Files\Illumina\BeadStudio 2.0\CNVAlgorithm\UniversalCNVAdapter and rename the UniversalCNVAdapterPlugin.dll.config file to UniversalCNVAdapterPlugin.dll.config.bak, so that the original settings can be restored if wanted. Then copy the new UniversalCNVAdapterPlugin.dll.config file from C:\penncnv\extra\ into this directory. This configuration file contains the command line options necessary for running PennCNV through BeadStudio. Note that by default, all necessary options have been selected (the PFB file is hhall.hg18.pfb, the HMM file is hhall.hmm, the GCModel file is hhall.hg18.gcmodel, and the default minsnp threshold is set to 10). (The GCModel is a new feature that is still under beta-testing, but since it usually improves performance it is set as the default option in the Universal CNV Adapter.) We can open the file in a text editor (for example, click Start -> All programs -> Accessories -> Notepad) and change some of the default options there, such as the file path for the executable programs, or change the default PFB file from hhall.hg18.pfb to hh550.hg18.pfb if using HumanHap550 arrays. In addition, we will also have the chance to change some of the command line options before running PennCNV in BeadStudio to fine-tune the parameters.
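The backup-and-replace step for the configuration file can also be scripted. The following is only a minimal sketch in Python, assuming the default paths quoted above; the helper name install_penncnv_config is hypothetical and is not part of PennCNV or BeadStudio.

```python
import shutil
from pathlib import Path

# Default locations quoted in this note; adjust them if your installation differs.
ADAPTER_DIR = Path(r"C:\Program Files\Illumina\BeadStudio 2.0\CNVAlgorithm\UniversalCNVAdapter")
PENNCNV_EXTRA = Path(r"C:\penncnv\extra")
CONFIG_NAME = "UniversalCNVAdapterPlugin.dll.config"


def install_penncnv_config(adapter_dir, source_dir, name=CONFIG_NAME):
    """Back up the adapter's config file as <name>.bak, then copy in the PennCNV one.

    Hypothetical helper for illustration only.
    """
    adapter_dir, source_dir = Path(adapter_dir), Path(source_dir)
    original = adapter_dir / name
    backup = adapter_dir / (name + ".bak")
    # Rename only if no backup exists yet, so that running the helper twice
    # never overwrites the original settings with the PennCNV version.
    if original.exists() and not backup.exists():
        original.rename(backup)
    # Copy the PennCNV-specific configuration file into the adapter directory.
    shutil.copyfile(source_dir / name, original)
    return backup
```

On a default installation this would be called as install_penncnv_config(ADAPTER_DIR, PENNCNV_EXTRA); writing into C:\Program Files may require administrator rights.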
(Note that in a previous version of PennCNV, the default library files were for the HumanHap550 array, yet some users were analyzing HumanHap1M arrays without changing the default library files, so they were essentially using half the information for CNV inference. To make things simpler for users, in the new version of PennCNV the default is set to hhall, which handles HumanHap1M, HumanHap550, HumanHap300, Human610, HumanCNV370, Human1, etc.)
Now we can open a BeadStudio project file. We can use the same project file as used in the PennCNV tutorial, which contains genotyping data for 3 individuals (father, mother and autistic child) within a family. Click the “Analysis” menu, then click “CNV analysis” (see below).
The CNV analysis dialogue will show up. Now select “PennCNV” from the dropdown menu. See below:
We can now change some parameters if necessary. For example, by clicking the “CommandLineParams” text box, we can change the path to the detect_cnv.pl program, or change the HMM and PFB files. Note that all the parameters are separated by commas (NOT by spaces!).
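Because the comma separator is easy to get wrong, a small helper can build the parameter string from a list of tokens. This is a sketch only: the function name to_adapter_params is hypothetical, and the example tokens are limited to options mentioned in this note.

```python
def to_adapter_params(tokens):
    """Join command line tokens with commas, as the CommandLineParams box expects.

    Hypothetical helper for illustration; it is not part of PennCNV.
    """
    parts = [str(t) for t in tokens]
    for part in parts:
        # A comma inside a token would be read as a separator, so reject it.
        if "," in part:
            raise ValueError(f"token contains a comma: {part!r}")
    return ",".join(parts)


print(to_adapter_params(["-minsnp", 10, "-chrx"]))  # prints "-minsnp,10,-chrx"
```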
Now we can click “Calculate New CNV Analysis”; a progress dialogue will show up:
It takes about 3-5 minutes to process one sample on a modern computer. If it takes >10 minutes for one sample, then clearly something is going wrong: open Windows Task Manager and check the CPU/memory usage to see whether this is due to insufficient memory. The CNV call results will be given in the CNV Region Display (see below). Each colored bar represents one CNV call, and the color indicates the copy number (see the legend in the upper right of the figure). If you like the generated figure, you can right-click the legend area, then select “save as image…” to save a TIFF file of the CNV calls.
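The per-sample timing of 3-5 minutes explains why the BeadStudio route is discouraged for large projects. As a rough back-of-the-envelope sketch (the function name estimated_hours is hypothetical, and samples are assumed to be processed strictly one after another):

```python
def estimated_hours(n_samples, minutes_per_sample):
    """Total wall-clock hours when samples are processed one at a time."""
    return n_samples * minutes_per_sample / 60


low = estimated_hours(500, 3)   # lower bound for 500 samples
high = estimated_hours(500, 5)  # upper bound for 500 samples
print(f"{low:.1f} to {high:.1f} hours")  # prints "25.0 to 41.7 hours"
```

So 500 samples already take more than one day, consistent with the recommendation to use the command line for projects of that size.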
You can use the mouse scroll wheel (middle button) to zoom in and out of particular genomic regions when the cursor is located on top of a region of interest in the graph. For example, we can zoom in on the CNV on chr5 and see that there are two CNVs (one deletion, one duplication) adjacent to each other in the father, and that the deletion is inherited by the offspring.
In the above calculation, we used the 10-SNP threshold for CNV calling (this means that only CNV calls containing >=10 SNPs are printed in the output). Now we can try again using a 3-SNP threshold. We can type in a new name (for example, 3SNP) for the analysis in the “CNV Analysis Name” box, then click the text box next to “CommandLineParams”, and change “-minsnp,10” to “-minsnp,3” (see below).
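The edit from “-minsnp,10” to “-minsnp,3” can be expressed as a small string operation on the comma-separated parameter list. The sketch below assumes that each option is followed directly by its value; set_option is a hypothetical name, not a PennCNV or BeadStudio function.

```python
def set_option(params, flag, value):
    """Return the comma-separated parameter string with `flag` set to `value`.

    If `flag` is already present, the token after it is replaced; otherwise
    the flag and its value are appended at the end.
    """
    tokens = params.split(",") if params else []
    if flag in tokens:
        i = tokens.index(flag)
        if i + 1 < len(tokens):
            tokens[i + 1] = str(value)   # overwrite the existing value
        else:
            tokens.append(str(value))    # flag was last and had no value
    else:
        tokens += [flag, str(value)]
    return ",".join(tokens)


print(set_option("-minsnp,10", "-minsnp", 3))  # prints "-minsnp,3"
```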
After running the program again, we will have the following output in the “CNV region display”:
When the 3-SNP threshold is used, many more CNV calls are generated. Note that by default, PennCNV only processes autosomes. To handle chrX, one needs to add “,-chrx” to the command line parameters and run PennCNV again from there. In the near future, we will try to modify the program to process all chromosomes simultaneously.
The generated CNV calls can be further visualized along the chromosome in the Illumina Genome Viewer. Click the “Tools” menu, then click “Show Genome Viewer …” to open the Genome Viewer. Then click the “View” menu, and select “CNV analysis as bookmarks” (see below). In the dropdown box, we can select the PennCNV analysis that we have just performed. It is probably a good idea to increase the default Opacity level to about 80% for easier visual identification of small CNVs in the Genome Viewer.
After clicking “OK”, we can then check the CNV calls visually to eliminate spurious calls. One example is shown below. (However, the Genome Viewer below used human genome build 35, resulting in small discordances. I will try to replace this figure in the future with one using build 36.) Make sure that the “Immediate Mode” checkbox is selected; otherwise one must click “Update Plots” after selecting different samples for the plot to update.
Through visual examination of the signal intensity patterns within a predicted CNV region, we may be able to gain additional confidence in CNV calls, or eliminate false positive calls due to random signal fluctuation. With the “Bookmark Viewer” (see below), we can examine the details of the CNV calls and export them as text files for further processing.
The CNV calls can be saved as an XML file for further processing by clicking the “Save Selected Bookmark Analysis” button. However, when you have a lot of samples, it is much easier and more informative to run PennCNV directly from the command line and save the output files.
The PennCNV program also generates some LOG files containing program running information and sample quality logs. By default the LOG files will be stored at C:\Documents and Settings\<username>\Local Settings\Application Data\Illumina\UniversalCNVAdapter. When something seems to be wrong, it is a good idea to examine the LOG file, which will help identify the problem. In some other cases, a sample generates an extraordinarily large number of CNV calls, so examining the sample quality summary will help identify low-quality samples not suitable for CNV calling. |
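To find the most recent LOG files quickly, the log directory can be listed by modification time. This is a minimal sketch: newest_logs is a hypothetical helper, and since the LOG file names are not specified here, it simply lists every file in the directory.

```python
from pathlib import Path


def newest_logs(log_dir, limit=5):
    """Return up to `limit` files from log_dir, most recently modified first."""
    files = [p for p in Path(log_dir).iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]
```

The function would be called with the log directory given above, with <username> replaced by the actual Windows account name.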