fuzzy_joinleft() needs to be a function #28

Open
dlemas opened this issue Jan 10, 2022 · 1 comment

@dlemas
Collaborator

dlemas commented Jan 10, 2022

Please create ufrc_fuzzy_joinleft() as a function in the /utils.R file. This will reduce the amount of code repeated in each data processing script. The loop to wrap is:

# START LOOP
chunks=length(delivery_ids)
pages <- list()

for(i in 1:chunks){

  # subset data
  delivery_subset=delivery_final %>%
    filter(mom_id==delivery_ids[i]) %>%
    select(mom_id,everything())

  gravid_subset=gravid_final %>%
    filter(mom_id==delivery_ids[i]) %>%
    select(mom_id,everything())

  # match on mom_id, with gest_start_date <= pregnancy_start_date <= part_dob
  fuzzy=fuzzy_left_join(delivery_subset,gravid_subset,
                        by = c("mom_id" = "mom_id",
                               "part_dob" = "pregnancy_start_date",
                               "gest_start_date"="pregnancy_start_date"),
                        match_fun = list(`==`, `>=`, `<=`))

  pages[[i]] <- fuzzy
} # END LOOP

data_ready=bind_rows(pages)

@dlemas dlemas added the enhancement New feature or request label Jan 10, 2022
@dlemas dlemas added this to the Data Dictionary- IDR Raw Data milestone Jan 10, 2022
@xkcococo
Member

xkcococo commented Jan 14, 2022

@dlemas
I was wondering if this works:

ufrc_fuzzy_joinleft<-function(delivery_final,gravid_final,
                              delivery_ids){

  chunks=length(delivery_ids) 
  pages <- list()

  ## start loop

  for(i in seq_len(chunks)){   # seq_len() skips the loop when delivery_ids is empty
    # subset data
    delivery_subset=delivery_final %>%
      filter(mom_id==delivery_ids[i]) %>%
      select(mom_id,everything())

    gravid_subset=gravid_final %>%
      filter(mom_id==delivery_ids[i]) %>%
      select(mom_id,everything())
    
    fuzzy=fuzzy_left_join(delivery_subset,gravid_subset,
                          by = c("mom_id" = "mom_id",
                                 "part_dob" = "pregnancy_start_date",
                                 "gest_start_date"="pregnancy_start_date"),
                          match_fun = list(`==`, `>=`, `<=`)) 
    
    pages[[i]] <- fuzzy
  }   ## end loop
  
  data_ready=bind_rows(pages)
  return(data_ready)
}

The only arguments we need to supply when using this function are the final delivery and gravid data frames and the vector of IDs (delivery_ids).
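For anyone who wants to try the function before it lands in /utils.R, here is a minimal sketch of a call on made-up data. The toy tibbles, the `gravidity` column, and the `current_id` name are assumptions for illustration only, not the real IDR data. The function body is a variant of the one above written with lapply() instead of a for loop; it is meant to perform the same per-mom_id join, not to replace the proposed version.

```r
library(dplyr)
library(fuzzyjoin)

# Sketch of ufrc_fuzzy_joinleft(): same join as above, one chunk per mom_id.
ufrc_fuzzy_joinleft <- function(delivery_final, gravid_final, delivery_ids) {
  pages <- lapply(delivery_ids, function(current_id) {
    # subset both tables to a single mom_id
    delivery_subset <- filter(delivery_final, mom_id == current_id)
    gravid_subset <- filter(gravid_final, mom_id == current_id)

    # match on mom_id, with gest_start_date <= pregnancy_start_date <= part_dob
    fuzzy_left_join(delivery_subset, gravid_subset,
                    by = c("mom_id" = "mom_id",
                           "part_dob" = "pregnancy_start_date",
                           "gest_start_date" = "pregnancy_start_date"),
                    match_fun = list(`==`, `>=`, `<=`))
  })
  bind_rows(pages)
}

# Toy inputs (hypothetical values): two deliveries for mom 1, one for mom 2.
delivery_final <- tibble(
  mom_id = c(1, 1, 2),
  part_dob = as.Date(c("2015-06-01", "2018-03-15", "2016-09-10")),
  gest_start_date = as.Date(c("2014-09-01", "2017-06-20", "2015-12-05"))
)
gravid_final <- tibble(
  mom_id = c(1, 1, 2),
  pregnancy_start_date = as.Date(c("2014-10-01", "2017-07-01", "2016-01-01")),
  gravidity = c(1, 2, 1)
)

delivery_ids <- unique(delivery_final$mom_id)
data_ready <- ufrc_fuzzy_joinleft(delivery_final, gravid_final, delivery_ids)
print(data_ready)
```

Because mom_id exists in both inputs, fuzzyjoin should return it as mom_id.x and mom_id.y, so downstream scripts may want to drop or rename one of them.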

Projects
PE Prediction
Awaiting triage

3 participants