December 7 - 9, 2016

Wednesday - Friday

Germantown, MD

The Bioscience Education Center


1 Hour Lunch Break

This in-depth lecture and hands-on, computer-based laboratory workshop is designed for those interested in advanced bioinformatics analysis of DNA-seq data. Ideal candidates are research scientists with some NGS analysis experience and bioinformaticians with little or some NGS analysis experience. Familiarity with the Linux command line is a prerequisite for this workshop.

Lecture and Hands-on Interactive Training
Team taught by active researchers
Comprehensive binder containing workshop material
Space limited to 24 participants
Registration Fee: $995

Course Director


Sijung Yun, PhD,
C.E.O. of Yotta Biomed, LLC

This workshop first gives participants a strong foundation of introductory material during the lecture hours, then moves on to the hands-on analysis portion of the program.

Each student is assigned a personal cloud computing account for processing real, large-scale datasets as overnight assignments during the program.

Laptops are supplied for the daytime instruction hours, but you are welcome to bring your own. Students will need access to their own laptop for the evening assignments. Mac, PC, or Linux will work; Mac is preferred.

  • Overview of command-line DNA-seq analysis, with emphasis on GATK best practices
  • Advanced preprocessing - quality trimming, masking, etc. (2 hr)
  • Short-read aligners explained
  • Anatomy of the SAM/BAM file format
  • Details of SNP calling algorithms
  • Details of insertion/deletion (indel) calling algorithms
  • Anatomy of the VCF file format
  • Annotation with ANNOVAR
    • Gene based annotation: RefSeq genes, UCSC genes, ENSEMBL genes, etc.
    • Region-based annotation: conserved regions among 44 species, predicted transcription factor binding sites, segmental duplication regions, GWAS hits, Database of Genomic Variants, DNase I hypersensitivity sites, ENCODE H3K4Me1/H3K4Me3/H3K27Ac/CTCF sites, ChIP-Seq peaks, RNA-Seq peaks
  • Filtering variants with ANNOVAR
    • Filter-based annotation: dbSNP, allele frequencies from the 1000 Genomes Project, NHLBI-ESP 6500 exomes, ExAC (Exome Aggregation Consortium), SIFT, PolyPhen, LRT, MutationTaster, MutationAssessor, FATHMM, MetaSVM, MetaLR scores, GERP++ score, etc.

Databases and Applications

  • DNA-seq for cancer
    • Somatic variant calling algorithm
    • Using TCGA database
  • NCBI databases for DNA-seq
  • De Novo assembly
  • What to be aware of when preparing your libraries

Additional topics being considered:

  • Advanced visualization
  • Systems biology approach for DNA-seq
  • Homology modeling of mutated proteins and molecular dynamics simulations

Dr. Yun obtained his Ph.D. in computational biology from Boston University, where his research focused on the aggregation of amyloid beta protein in Alzheimer's disease. He then took a postdoctoral position at the National Cancer Institute (NCI) of the National Institutes of Health (NIH), studying structural bioinformatics and proteomics. Later, he worked at the genomics core of the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). Currently, he is an independent contract bioinformatician working primarily for the NIH, C.E.O. of Yotta Biomed, LLC, and a lead instructor in numerous next-generation sequencing (NGS) bioinformatics training activities. Dr. Yun has directed our Bio-Trac NGS-related workshops since 2009 and has provided NGS instruction to over 700 Bio-Trac participants.

Header image: Darryl Leja, NHGRI

The Bioscience Education Center

Montgomery College
20200 Observation Drive
Germantown, MD 20876