diff --git a/README.md b/README.md
index 624aa55..85aed78 100644
--- a/README.md
+++ b/README.md
@@ -102,35 +102,7 @@ The --bind (Singularity) or --volume (Docker) paramete
-Inside the container, we will first set up our local instance of the Michigan Imputation Server.
-
-The setup-hadoop command will start a Hadoop instance on your computer, which consists of four background processes. When you are finished processing all your samples, you can stop them with the stop-hadoop command. If you are using Docker, then these processes will be stopped automatically when you exit the container shell.
-
-The setup-imputationserver script will then verify that the Hadoop instance works, and then install the 1000 Genomes Phase 3 v5 genome reference that will be used for imputation (around 15 GB of data, so it may take a while).
-
-If you are resuming analyses in an existing working directory, and do not still have the Hadoop background processes running, then you should re-run the setup commands. If they are still running, then you can skip this step.
-
-```bash
-setup-hadoop --n-cores 8
-setup-imputationserver
-```
-
-If you encounter any warnings or messages while running these commands, you should consult with an expert to find out what they mean and if they may be important. However, processing will usually complete without issues, even if some warnings occur.
-
-If something important goes wrong, then you will usually see a clear error message that contains the word "error". Please note that if the commands take more than an hour to run the setup, then that may also indicate that an error occurred.
-
-Next, go to /data/mds, and run the script enigma-mds for your .bed file set. The script creates the files mdsplot.pdf and HM3_b37mds2R.mds.csv, which are summary statistics that you will need to share with your working group as per the ENIGMA Imputation Protocol.
+Inside the container, we will first go to /data/mds, and run the script enigma-mds for your .bed file set. The script creates the files mdsplot.pdf and HM3_b37mds2R.mds.csv, which are summary statistics that you will need to share with your working group as per the ENIGMA Imputation Protocol.
 
 Note that this script will create all output files in the current folder, so you should use cd to change to the /data/mds/sample folder before running it.
@@ -167,6 +139,34 @@ enigma-mds --bfile /data/raw/sample_b
 
+
+Next, we will set up our local instance of the Michigan Imputation Server.
+
+The setup-hadoop command will start a Hadoop instance on your computer, which consists of four background processes. When you are finished processing all your samples, you can stop them with the stop-hadoop command. If you are using Docker, then these processes will be stopped automatically when you exit the container shell.
+
+The setup-imputationserver script will then verify that the Hadoop instance works, and then install the 1000 Genomes Phase 3 v5 genome reference that will be used for imputation (around 15 GB of data, so it may take a while).
+
+If you are resuming analyses in an existing working directory, and do not still have the Hadoop background processes running, then you should re-run the setup commands. If they are still running, then you can skip this step.
+
+```bash
+setup-hadoop --n-cores 8
+setup-imputationserver
+```
+
+If you encounter any warnings or messages while running these commands, you should consult with an expert to find out what they mean and if they may be important. However, processing will usually complete without issues, even if some warnings occur.
+
+If something important goes wrong, then you will usually see a clear error message that contains the word "error". Please note that if the commands take more than an hour to run the setup, then that may also indicate that an error occurred.
+
 Next, go to /data/qc, and run enigma-qc for your .bed file sets. This will drop any strand ambiguous SNPs, then screen for low minor allele frequency, missingness and *Hardy-Weinberg equilibrium*, then remove duplicate SNPs (if necessary), and finally convert the data to sorted .vcf.gz format for imputation.
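A few usage sketches for the steps this patch touches, none of which are part of the patch itself. First, the --bind (Singularity) / --volume (Docker) mapping referenced in the first hunk's context: a minimal sketch of mounting a host working directory onto the container's /data tree, where the image names and host path are placeholders rather than values from this repository.

```bash
# Hypothetical image names and host path; substitute your own.
# Singularity: map the host working directory to /data inside the container.
singularity shell --bind /host/workdir:/data imputation-protocol.sif

# Docker: the same mapping via --volume, dropping into an interactive shell.
docker run -it --volume /host/workdir:/data imputation-protocol /bin/bash
```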
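In the reordered flow, the first in-container step is enigma-mds. Since the script writes all of its outputs to the current directory, change into /data/mds/sample before running it; the --bfile prefix below is a placeholder for your own PLINK .bed/.bim/.fam fileset, not a path from the README.

```bash
# enigma-mds writes to the current directory, so move there first.
cd /data/mds/sample

# --bfile takes a PLINK fileset prefix; "mystudy" is a placeholder name.
enigma-mds --bfile /data/raw/mystudy

# The two summary files to share with your working group:
ls mdsplot.pdf HM3_b37mds2R.mds.csv
```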
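The imputation server section that this diff moves describes a full lifecycle: start Hadoop, verify it and fetch the reference panel, do the work, then shut everything down. Condensed into one sketch, with --n-cores 8 simply carried over from the README's own example:

```bash
# Start the four Hadoop background processes.
setup-hadoop --n-cores 8

# Verify the Hadoop instance, then install the 1000 Genomes Phase 3 v5
# reference used for imputation (around 15 GB, so this can take a while).
setup-imputationserver

# ... process all of your samples here ...

# Stop the background processes when you are done. Under Docker they are
# stopped automatically when you exit the container shell.
stop-hadoop
```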
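Because genuine failures surface as messages containing the word "error" while warnings alone are usually harmless, it can help to keep a log of the setup run and scan it afterwards. The tee/grep pattern here is an illustrative habit, not part of the protocol:

```bash
# Keep a copy of everything the setup prints, then search it for failures.
setup-imputationserver 2>&1 | tee setup-imputationserver.log
grep -i "error" setup-imputationserver.log
```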
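Finally, the QC step from the second hunk's trailing context. The diff does not show the enigma-qc invocation, so the --bfile flag below is an assumption by analogy with enigma-mds; confirm it against the script's actual usage.

```bash
# Run QC from /data/qc: drops strand-ambiguous SNPs, filters on minor
# allele frequency, missingness and Hardy-Weinberg equilibrium, removes
# duplicate SNPs, and writes sorted .vcf.gz output for imputation.
cd /data/qc
enigma-qc --bfile /data/raw/mystudy  # --bfile assumed; prefix is a placeholder
```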