Setup

Fastq filename convention

The permanent filename should use the following format:

{LANE}_{DATE}_{FLOW-CELL}_{SAMPLE-ID}_{BARCODE-SEQ}_{DIRECTION 1/2}.fastq[.gz]

Some elements have a required type or format:

  • LANE = Integer

  • DATE = YYMMDD

  • BARCODE-SEQ = A, C, G, T or integer

  • DIRECTION = 1 or 2

The case_id and sample_id(s) need to be unique, and each sample id supplied should be identical to the {SAMPLE-ID} in the filename. An underscore cannot be part of any element in the filename, since it is used as the separator between elements.
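The convention above can be checked mechanically. The following is a minimal sketch, not MIP's own implementation: the function name `parse_fastq_name` and the returned keys are hypothetical, and the pattern assumes that the optional compression suffix is `.gz` and that FLOW-CELL and SAMPLE-ID may contain any character except an underscore.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Dict, Optional

# Hypothetical pattern for the convention; no element may contain "_".
FASTQ_NAME = re.compile(
    r"(?P<lane>\d+)"                   # LANE = integer
    r"_(?P<date>\d{6})"                # DATE = YYMMDD
    r"_(?P<flow_cell>[^_]+)"           # FLOW-CELL (assumed: anything but "_")
    r"_(?P<sample_id>[^_]+)"           # SAMPLE-ID (assumed: anything but "_")
    r"_(?P<barcode_seq>[ACGT]+|\d+)"   # BARCODE-SEQ = A, C, G, T or integer
    r"_(?P<direction>[12])"            # DIRECTION = 1 or 2
    r"\.fastq(?:\.gz)?"                # assumed suffix: .fastq or .fastq.gz
)


def parse_fastq_name(filename: str) -> Optional[Dict[str, str]]:
    """Return the filename elements, or None if the name breaks the convention."""
    match = FASTQ_NAME.fullmatch(Path(filename).name)
    if match is None:
        return None
    fields = match.groupdict()
    try:
        # Reject six-digit strings that are not real calendar dates.
        datetime.strptime(fields["date"], "%y%m%d")
    except ValueError:
        return None
    return fields


# Example with a made-up filename:
fields = parse_fastq_name("1_140128_H8AHNADXX_1-1-1A_GATCAG_1.fastq.gz")
print(fields["sample_id"] if fields else "filename does not follow the convention")
```

A sample id that itself contains an underscore shifts the remaining elements and makes the name fail to parse, which is why the separator is reserved.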

However, MIP will accept filenames in other formats as long as the filename contains the sample id and the mandatory information can be collected from the fastq header.
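To illustrate what collecting the information from the fastq header could look like, here is a minimal sketch that assumes the Illumina Casava 1.8+ header layout (`@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index`). The function name and the returned keys are hypothetical, and which header fields MIP actually reads is not stated here.

```python
from typing import Dict


def parse_illumina_header(header: str) -> Dict[str, str]:
    """Extract flow cell, lane, direction and index from a Casava 1.8+ header."""
    if not header.startswith("@"):
        raise ValueError("a fastq header line must start with '@'")
    try:
        # The read name and the description are separated by one space.
        read_name, description = header[1:].strip().split(" ", 1)
        _instrument, _run, flow_cell, lane, _tile, _x, _y = read_name.split(":")
        direction, _filtered, _control, index = description.split(":")
    except ValueError as error:
        raise ValueError(f"unsupported header layout: {header!r}") from error
    return {
        "flow_cell": flow_cell,
        "lane": lane,
        "direction": direction,
        "barcode_seq": index,
    }


# Example with a made-up header line:
info = parse_illumina_header("@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG")
print(info["flow_cell"], info["lane"], info["direction"], info["barcode_seq"])
```

Headers written by older instruments or by other vendors use different layouts, so the sketch raises an error for them instead of guessing.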

Meta-Data

MIP requires pedigree information recorded in a pedigree.yaml file and a config file.

Dependencies

MIP comes with an install application, which will install all necessary programs to execute models in MIP via conda and/or $SHELL. Make sure you have installed all dependencies via the MIP install application and that you have loaded your MIP base environment. You only need to install the dependencies required for the recipes that you want to run. If a dependency for a module is not installed, MIP will tell you which dependencies you need to install and then exit.
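If you want to check your environment yourself before starting MIP, something like the following sketch may help. It is not how MIP performs its own check: the function name is hypothetical, the program names in the usage comment are placeholders, and it only tests whether an executable can be found on the PATH of the currently loaded environment.

```python
import shutil
from typing import Iterable, List


def missing_programs(programs: Iterable[str]) -> List[str]:
    """Return the programs that cannot be found on the current PATH."""
    return [program for program in programs if shutil.which(program) is None]


# Usage, with placeholder program names for the recipes you plan to run:
#
#     missing = missing_programs(["bwa", "samtools"])
#     if missing:
#         raise SystemExit("Missing dependencies: " + ", ".join(missing))
```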

Extra CPANM modules

You can speed up, for instance, the Readonly module by also installing the companion module Readonly::XS. No change to the code is required and the Readonly module will call the Readonly::XS module if available.

CADD

MIP is currently unable to install the CADD binary for dynamic calculation of indels, and there is no support for downloading the CADD reference files. If you want to use these features in MIP, you have to install and download them manually.

Programs

  • Simple Linux Utility for Resource Management (SLURM) (version: 18.08.0)

Pipeline: Rare disease

The version numbers listed after the software names have been tested for compatibility with MIP.

Databases/References

MIP can download many program prerequisites automatically via the download application: mip download [PIPELINE].

MIP will build references and meta files (if required) before starting an analysis pipeline with mip analyse [PIPELINE].

Automatic Build:

Human Genome Reference Meta Files:

  1. The sequence dictionary (".dict")

  2. The ".fasta.fai" file

BWA:

  1. The BWA index of the human genome

Star:

  1. Star index files of the human genome

Note

If you do not supply these parameters (Bwa/Star), MIP will create them from scratch using the supplied human reference genome as template.
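To see in advance which derived files already sit next to a reference, a sketch along these lines can be used. It rests on assumptions: the function names are hypothetical, the reference is an uncompressed ".fasta" file, the derived files live beside it, and the BWA index uses the five suffixes written by the classic `bwa index` command. Star index files are left out because they are written to a separate directory.

```python
from pathlib import Path
from typing import Dict, List

# Suffixes written by the classic "bwa index" command (assumption: not bwa-mem2).
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def expected_reference_files(reference: str) -> Dict[str, List[Path]]:
    """List the derived files expected beside an uncompressed fasta reference."""
    fasta = Path(reference)
    return {
        # genome.fasta -> genome.dict
        "sequence_dictionary": [fasta.with_suffix(".dict")],
        # genome.fasta -> genome.fasta.fai
        "fasta_index": [Path(str(fasta) + ".fai")],
        # genome.fasta -> genome.fasta.amb, genome.fasta.ann, ...
        "bwa_index": [Path(str(fasta) + suffix) for suffix in BWA_INDEX_SUFFIXES],
    }


def missing_reference_files(reference: str) -> Dict[str, List[Path]]:
    """Return, per group, the expected files that do not exist on disk."""
    return {
        group: [path for path in paths if not path.exists()]
        for group, paths in expected_reference_files(reference).items()
    }
```

Any group with a non-empty list from `missing_reference_files` corresponds to files that would have to be built from the reference.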

Capture target files:

  1. The ".infile_list" and ".pad100.infile_list" files used in picardtools_collecthsmetrics.

  2. The ".pad100.interval_list" file used by some GATK recipes.

Note

If you do not supply these parameters MIP will create these from scratch using the supplied "latest" supported capture kit ".bed" file and the supplied human reference genome as template.
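The "pad100" part of these file names suggests that the capture targets are extended by 100 bases on each side. That reading is an assumption, and the sketch below is not MIP's implementation: it only illustrates padding on BED-style intervals (0-based start, exclusive end), with a hypothetical function name, and it merges intervals that overlap after padding.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (contig, start, end), BED-style coordinates


def pad_intervals(intervals: Iterable[Interval], padding: int = 100) -> List[Interval]:
    """Pad each interval on both sides and merge the ones that then overlap."""
    padded = sorted(
        (contig, max(0, start - padding), end + padding)  # never go below 0
        for contig, start, end in intervals
    )
    merged: List[Interval] = []
    for contig, start, end in padded:
        if merged and merged[-1][0] == contig and start <= merged[-1][2]:
            # Overlaps the previous interval on the same contig: extend it.
            prev_contig, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_contig, prev_start, max(prev_end, end))
        else:
            merged.append((contig, start, end))
    return merged


# Example with made-up targets:
print(pad_intervals([("1", 50, 150), ("1", 300, 400), ("2", 1000, 1100)]))
# -> [('1', 0, 500), ('2', 900, 1200)]
```

The sketch does not clamp intervals at the end of a contig, which would need the contig lengths from the sequence dictionary, and it sorts contigs as plain strings.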