DNAm matrix for EUGEI #37

Open
Tracked by #15
SiyiSEA opened this issue Dec 4, 2024 · 3 comments

Comments

@SiyiSEA
Owner

SiyiSEA commented Dec 4, 2024

No description provided.

@SiyiSEA SiyiSEA mentioned this issue Dec 4, 2024
8 tasks
@SiyiSEA SiyiSEA changed the title from "however, the norm.betas file /lustre/projects/Research_Project-MRC190311/DNAm/mrcSCZBlood/EUGEI/2_normalised/EuGEIBloodSamples_Normalised.rdat cannot be opened by the R. I cannot make sure the sample name of the DNAm and genotype file are the same." to "DNAm matrix for EUGEI" on Dec 4, 2024
@SiyiSEA
Owner Author

SiyiSEA commented Dec 4, 2024

Download the DNAm from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE152026.

There are three files under this link; I will use the first one, GSE152026_EUGEI_processed_signals.csv.gz.

The dimensions of the file are 1869*860964.

The colnames are like this: 200889820113_R06C01, 200889820113_R06C01_Detection_Pval, 200864580076_R08C01, 200864580076_R08C01_Detection_Pval.

Each sample has a pair of columns: one named by the sample ID and one with the _Detection_Pval suffix.
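A minimal Python sketch of how the paired column layout above could be checked. The helper name `split_columns` is hypothetical, and only the four column names quoted above are used as input. Any leading probe-ID column, if the file has one, would show up in `unpaired`.

```python
SUFFIX = "_Detection_Pval"

def split_columns(colnames):
    """Split a header into sample IDs that have a p-value column and those that do not."""
    names = set(colnames)
    # Candidate sample columns are those without the detection p-value suffix.
    candidates = [c for c in colnames if not c.endswith(SUFFIX)]
    paired = [c for c in candidates if c + SUFFIX in names]
    unpaired = [c for c in candidates if c + SUFFIX not in names]
    return paired, unpaired

header = [
    "200889820113_R06C01", "200889820113_R06C01_Detection_Pval",
    "200864580076_R08C01", "200864580076_R08C01_Detection_Pval",
]
sample_ids, unpaired = split_columns(header)
print(sample_ids)
print(unpaired)
```

The `sample_ids` list is what later steps would compare against other files.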

@SiyiSEA
Owner Author

SiyiSEA commented Dec 4, 2024

I've tried to match the DNAm_processed_signal with the EUGEI_Methylation_Samples_PostQC_OUT.fam based on the sample names.

However, there are no overlapping samples between these two files!

I've checked whether any other file under /lustre/projects/Research_Project-MRC190311/SNPArray/sczMrcBlood/EUGEI can be matched to the sample names in DNAm_processed_signal.csv.

No - EUGEI_GROUP_EuroOnly_PCs_NonRefPanel_Dec17.txt
No - EUGEI_SNP_genotypes_matchedIDs.csv
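A rough Python sketch of the kind of overlap check described above, assuming the fam file is whitespace-delimited with the family and individual IDs in its first two columns (V1 and V2). The fam lines below are made-up placeholders, not real EUGEI IDs, and the function names are hypothetical.

```python
def fam_ids(lines):
    """Collect V1 and V2 (first two whitespace-delimited fields) from fam lines."""
    ids = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            ids.extend(fields[:2])
    return ids

def overlap_report(ids_a, ids_b):
    """Count unique IDs on each side and how many are shared."""
    a, b = set(ids_a), set(ids_b)
    shared = sorted(a & b)
    return {"n_a": len(a), "n_b": len(b), "n_shared": len(shared), "shared": shared}

# Placeholder fam lines (hypothetical IDs) and the DNAm column names quoted earlier.
fam_lines = ["FAM1 PLATE1_A01 0 0 1 -9", "FAM2 PLATE1_A02 0 0 2 -9"]
dnam_cols = ["200889820113_R06C01", "200864580076_R08C01"]

report = overlap_report(fam_ids(fam_lines), dnam_cols)
print(report)
```

An `n_shared` of 0, as here, means the two files use different ID schemes and a mapping file is needed.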

@SiyiSEA
Owner Author

SiyiSEA commented Jan 6, 2025

Making a DNAm matrix for EUGEI needs four key files.

  1. fam file EUGEI_Methylation_Samples_PostQC_OUT.fam
  2. pheno file CombinedEuGEISamplesPassedQC.csv
  3. meta file EUGEI_SNP_genotypes_matchedIDs.csv
  4. DNAm matrix GSE152026_EUGEI_processed_signals.csv

The relationship among these files:

  1. V1 and V2 in the fam file <---> the Geno.Plate.ID in the meta file;
  2. Sample.Name in the meta file <---> Eilis.Sample.Name in the pheno file;
  3. Basename in the pheno file <---> colname of DNAm
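The chain of relationships above can be sketched in Python as three dictionary lookups. This is only an illustration: the column names come from the list above, but every value in the toy rows is invented, the function names are hypothetical, and only V2 of the fam file is used as the key for brevity.

```python
def build_index(rows, key):
    """Index a list of dict rows by the value in column `key`."""
    return {row[key]: row for row in rows if row.get(key)}

def link_fam_to_dnam(fam_iids, meta_rows, pheno_rows, dnam_cols):
    """Follow fam ID -> Geno.Plate.ID -> Sample.Name -> Eilis.Sample.Name -> Basename."""
    meta_by_plate = build_index(meta_rows, "Geno.Plate.ID")
    pheno_by_name = build_index(pheno_rows, "Eilis.Sample.Name")
    dnam_cols = set(dnam_cols)
    linked = {}
    for iid in fam_iids:
        meta = meta_by_plate.get(iid)
        if meta is None:
            continue  # fam ID not present in the meta file
        pheno = pheno_by_name.get(meta.get("Sample.Name"))
        if pheno is None:
            continue  # sample name not present in the pheno file
        if pheno.get("Basename") in dnam_cols:
            linked[iid] = pheno["Basename"]
    return linked

# Toy rows with invented values.
meta = [{"Geno.Plate.ID": "P1_A01", "Sample.Name": "S-001"},
        {"Geno.Plate.ID": "P1_A02", "Sample.Name": "S-002"}]
pheno = [{"Eilis.Sample.Name": "S-001", "Basename": "200889820113_R06C01"},
         {"Eilis.Sample.Name": "S-002", "Basename": "999999999999_R01C01"}]
dnam = ["200889820113_R06C01", "200864580076_R08C01"]

linked = link_fam_to_dnam(["P1_A01", "P1_A02", "P1_A03"], meta, pheno, dnam)
print(linked)
```

Samples drop out at whichever link is missing, which is one way the count can shrink between steps.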

What I did in the EUGEI_DNAm_pheno.R:

  1. filter the pheno file by removing samples without age and sex - 1067 left;
  2. match pheno and DNAm based on the matched Sample.Name - 928 left;
  3. derive SampleID from Sample.Name by removing the . and - characters from the name;
  4. apply the SampleID to the covariates, DNAm and fam files.
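Steps 1 and 3 can be sketched in Python as follows. This is not the code in EUGEI_DNAm_pheno.R, just an illustration: the pheno column names `Age`, `Sex` and `Sample.Name` and the treatment of `"NA"` as missing are assumptions, and the rows are invented.

```python
def has_age_and_sex(row):
    """True if both Age and Sex are present (assumed column names; '' and 'NA' count as missing)."""
    return all(str(row.get(k, "")).strip() not in ("", "NA") for k in ("Age", "Sex"))

def make_sample_id(sample_name):
    """Strip '.' and '-' from a sample name to form the SampleID."""
    return sample_name.replace(".", "").replace("-", "")

# Invented pheno rows.
pheno = [
    {"Sample.Name": "EU-01.A", "Age": "34", "Sex": "F"},
    {"Sample.Name": "EU.02-B", "Age": "41", "Sex": "M"},
    {"Sample.Name": "EU-03.C", "Age": "NA", "Sex": "M"},
]

kept = [row for row in pheno if has_age_and_sex(row)]
sample_ids = [make_sample_id(row["Sample.Name"]) for row in kept]

# Removing characters can make two distinct names collide, so check uniqueness.
ids_are_unique = len(set(sample_ids)) == len(sample_ids)
print(sample_ids, ids_are_unique)
```

The uniqueness check is worth keeping because two names that differ only in `.` or `-` would map to the same SampleID.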

Four files were created:

  1. covariates.txt which contains IID, Sex, Age and Silde_factor;
  2. methylation.RData which contains DNAm data for 42588 probes and 928 samples;
  3. keep_928_EUGEI.txt which contains 928 samples for updating the plink file;
  4. update_sampleID_EUGEI.txt which contains the old sample name and the new sample name for updating the fam file.
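For reference, a Python sketch of what the last two files might look like, assuming they are meant for plink's `--keep` (two columns: FID, IID) and `--update-ids` (four columns: old FID, old IID, new FID, new IID). The actual layout written by EUGEI_DNAm_pheno.R is not shown in this issue, so the column choices here, the reuse of the new SampleID for both FID and IID, and all IDs are assumptions.

```python
def update_id_lines(mapping):
    """mapping: {(old_fid, old_iid): new_id}. The new ID is used for both FID and IID (assumption)."""
    return [f"{old_fid}\t{old_iid}\t{new_id}\t{new_id}"
            for (old_fid, old_iid), new_id in mapping.items()]

def keep_lines(mapping):
    """List the old FID/IID pairs to retain before renaming (assumed order of operations)."""
    return [f"{old_fid}\t{old_iid}" for (old_fid, old_iid) in mapping]

# Invented old fam IDs mapped to new SampleIDs.
mapping = {
    ("P1_A01", "P1_A01"): "EU01A",
    ("P1_A02", "P1_A02"): "EU02B",
}

keep = keep_lines(mapping)
update = update_id_lines(mapping)
print("\n".join(keep))
print("\n".join(update))
```

If the keep file is applied after the rename instead, it would need to list the new IDs rather than the old ones.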
