BIT 815, Deep Sequencing Data Analysis
Instructor: Dr. Ross Whetten
Course Description
The Analysis of Deep Sequencing Data course is designed to introduce biologists to the Linux command-line computing environment, to cloud computing, and to open-source software for analysis of next-generation sequencing data. Class sessions consist of two-hour blocks, each beginning with presentation and discussion of a specific topic, followed by hands-on computing exercises using model datasets. A total of 45 two-hour blocks are scheduled over a 15-week period, and the course is offered once per calendar year. Cloud computing is emphasized because analysis and storage of high-throughput DNA sequencing data place increasing demands on RAM and storage space, and because cloud computing solutions are cost-effective and flexible.
Applications of sequencing discussed include genome sequencing (both de novo and resequencing), transcriptome analysis, discovery of sequence and structural variation, ChIP-seq methods for mapping DNA-protein interactions, and genotyping by sequencing (GBS and RAD-seq methods). For each application, discussion topics include experimental design strategies, methods for library construction, sources of experimental and biological variation, and analytical approaches available in open-source software packages. Computing exercises use the software discussed and give participants the opportunity to analyze sample datasets using a virtual machine image through the NC State University Virtual Computing Lab. This Linux system is customized to provide the bioinformatics software described during the course, and is available for class participants to use at any time.
The objective of the course is not to make participants experts in every aspect of sequence analysis, but to empower them to learn the specific skills they need by teaching basic command-line Linux computing skills and providing an introduction to the literature and online resources. The course is directed at graduate students, but has also attracted participation from faculty, postdoctoral researchers, and research technicians interested in expanding their skills in sequence data analysis.
- Semester Overview, 2022
- Readings and Resources
- NC State Bioinformatics Users Group (BUG)
- Course Notes
- Global overview books and papers
- Data Management and Project Organization
- Library construction and experimental design
- Data formats and alignment software tools
- Data quality assessment, filtering, and correction
- De novo assembly
- Chromatin analysis
- Transcriptome analysis
- Comparing genomes and assemblies; variant detection
- Population Genomics
- Workspace environments
- Links for Exercise Data
- Links to other useful sites
- Computing Hardware: The High-Performance Computing Cluster (HPC) at NC State
- Introduction to Linux and the Command-line Interface
- Sequencing Instruments
- Experimental Design
- Data Preprocessing and Quality Control
- Error Correction and Alignment
- Transcriptome Assembly
- Genome Sequencing and Assembly
- Re-sequencing, Alignment, Structural Variation
- Discovering and Genotyping Genetic Variation
- R and RStudio
- Transcriptome Analysis: Differential Gene Expression and Annotation
- Genome or Chromatin Structural Analysis: Chromatin immunoprecipitation, DNase hypersensitivity, 3-D conformation
- Awk, Sed, and Bash: Command-line File Editing and Processing
- CLC Genomics Workbench
- HPC and LSF
Last modified 3 April 2020. Edits by Ross Whetten, Will Kohlway, & Maria Adonay.