Introduction: Single Cell Genomics Tutorial


Single cell genomics is an emerging technology that makes it possible to explore the genome sequence of individual cells. During this tutorial you will work with real single cell genome data from a cell that was isolated from a hot spring in Yellowstone National Park (USA). The data are part of a larger project ('PUZZLE_CELL') that aims to identify and genomically probe novel prokaryotic lineages, and to gain insight into the origin and evolution of life.

The data you will work with are paired-end Illumina reads. We have chosen to have you work with both HiSeq (2x100 bp, dataset 1) and MiSeq (2x250 bp, dataset 2) datasets. Both datasets were generated from the same single cell, which allows you to develop a feeling for what you might want to use in any future single cell genomics (SCG) project. In addition, there is a third MiSeq dataset (2x250 bp, dataset 3) that comes from a completely new organism, which might be a bit more challenging to work with. Depending on how much time you have at the end of the tutorial, you can choose what you want to do. It would be good to at least take some time to think about what kind of analysis would give you an answer to the question ‘What is this cell?’. There are several more suggestions for optional exercises that you can choose from if you happen to have some extra time at the end of the tutorial.

Overview of steps in this exercise

We will have a lunch break :-). We will split the discussion of results into two parts: one when you have the assembly results, and another at the end to summarize the whole day. Here is a schematic workflow of what you will be doing.

  1. Connecting to UPPMAX
  2. Familiarizing with data
  3. Single-cell genome assembly
    3.1 Organize working folder
    3.2 Pre-processing
    3.3 Assembly
    3.4 Assessing assembly quality using Quast
    3.5 Gene prediction using Prodigal
    3.6 Running completeness estimates
    3.7 Identifying ribosomal RNAs
  4. Assessing read coverage and chimera checking
    4.1 Reads mapping
    4.2 Assessing coverage bias
    4.3 Detection and inspection of chimeric reads
    4.4 Insert size
  5. Exploring your single cell genome
    5.1 Contamination analysis in MEGAN
    5.2 Functional analysis in MEGAN
  6. Analysis of a novel single-cell genome