Releases: lcpilling/ukbrapR
ukbrapR v0.3.0
New features
- Suite of functions to extract and load genetic variants. The main ones of interest will be:
  - extract_variants() takes a list of variant rsIDs as input and extracts the imputed genotypes, loading them into memory. This is really a wrapper around two other new functions: make_imputed_bed() and load_bed(). Also available is make_dragen_bed() to extract from whole genome sequence VCF files, but this is pretty slow, so users will usually want imputed variants.
  - create_pgs() creates a polygenic score (weighted allele score) using user-provided variants and weights. The score is loaded into memory and also saved as a nicely formatted .tsv
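To make "weighted allele score" concrete, here is a minimal base R sketch of the arithmetic only. It does not call create_pgs() and is not the package's implementation; the dosage matrix, variant names and weights are all hypothetical toy values.

```r
# Hypothetical toy data: allele dosages (0, 1 or 2) for three variants
# in four participants. Names and values are made up for illustration.
dosages <- matrix(
  c(0, 1, 2,
    1, 1, 0,
    2, 0, 1,
    0, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("id", 1:4), c("rs1", "rs2", "rs3"))
)

# Hypothetical per-allele weights (e.g. effect sizes from a GWAS)
weights <- c(rs1 = 0.10, rs2 = -0.05, rs3 = 0.20)

# Weighted allele score: for each participant, sum over variants of
# dosage * weight. Columns are matched to weights by variant name.
pgs <- as.vector(dosages[, names(weights)] %*% weights)
names(pgs) <- rownames(dosages)

print(pgs)
```

Matching columns by name rather than by position guards against the variants and weights being supplied in different orders.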
Breaking changes
- Removing dependencies: reticulate, arrow, sparklyr. These take a few precious seconds to install every time and are rarely needed. Instead, they will be installed if the user tries to use get_rap_phenos()
- get_emr_spark() removed entirely. Much better to use get_diagnoses(), which has had a lot of updates to functionality and bug fixes.
ukbrapR v0.2.9
Bug fixes
- Fixes for issue #19 (thanks to @nsandau for the help):
  - OPCS searches were not always performed correctly if only OPCS3/4 codes were provided.
  - When using "group_by" in get_df(), some diagnoses were incorrectly carried over between groups when different vocabs were provided for each group (condition).
Updates
- Additional checking of get_diagnoses() input to abort if "blank" codes are provided to the grep.
- When getting date first from self-reported illness data, exclude "year" if < 1936 (earliest birth year for any participant)
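The year check described above can be pictured with a short base R sketch. It is only an illustration under assumptions: the data frame and its column names are hypothetical, and implausible years are set to missing here, which may differ from how the package excludes them.

```r
# Hypothetical self-reported illness records; column names are illustrative.
sr <- data.frame(
  eid  = c(1, 2, 3, 4),
  year = c(1990.5, 1920, -1, 1936)
)

# Earliest birth year for any participant, as stated in the release notes
min_birth_year <- 1936

# Assumption: years before the earliest birth year are treated as missing,
# so they cannot be picked up as the "date first"
sr$year[!is.na(sr$year) & sr$year < min_birth_year] <- NA

print(sr)
```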
ukbrapR v0.2.8
Bug fixes
- Baseline dates TSV is now correctly located even if user changes working directory
- HES operations dates were sometimes parsed as character - this is now fixed to parse as dates
Updates
- Warnings relating to parsing issues during grepping that are safe to ignore are now suppressed
- Updates to documentation / examples / pkgdown site
- New website articles for ascertain_diagnoses, label_fields and spark_functions
ukbrapR v0.2.7
Updates
- New function label_ukb_field() allows the user to add titles and labels to UK Biobank fields that are provided as integers but are categorical.
- New function label_ukb_fields() is a wrapper for the above. The user just provides a data frame containing UK Biobank fields, and they all get formatted with titles (and labels if categorical).
- Data from the UK Biobank schema (https://biobank.ctsu.ox.ac.uk/crystal/schema.cgi) are stored internally in ukbrapR:::ukb_schema
- {haven} dependency added for labelling
- Exported baseline_dates.tsv now also includes the assessment centres for completeness (but keeps the same filename to avoid any issues for current projects relying on already-exported files)
ukbrapR v0.2.6
Bug fix
- Fix for issue #10. Grep issues if user provided only Read2 or CTV3 codes, if Read2 or CTV3 were <5 characters, or if Read2/CTV3 codes contained a hyphen. Thanks to @Simon-Leyss for highlighting.
- Fix for issue #11. When getting self-reported illness codes there was a problem joining the tables if user only provided cancer codes. Thanks to @LauricF for highlighting.
- Fix for when both types of self-reported illness codes were provided. (Incorrect subsetting to just those codes provided after pivoting the long object.)
ukbrapR v0.2.5
Bug fix
- When getting the date first cancer registry diagnosis, some rows were duplicated. This is now fixed so only one row per participant (the date first for any matched cancer ICD10) is returned.
ukbrapR v0.2.4
Changes
- Updated internal paths for my servers indy and snow (for ongoing projects whilst we can still use local files...)
- Updated how get_diagnoses() and get_df() handle a user-provided file_paths object
ukbrapR v0.2.3
ukbrapR v0.2.2
The HESIN diagnosis search can now also include ICD9 codes in the provided codes data frame. These use fuzzy matching (similar to the ICD10s), so that searching for "280" also returns "2809" etc.
Also some other minor bug fixes and internal updates.
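The fuzzy ICD9 matching described above behaves like a prefix search. The base R sketch below shows that idea on made-up codes. It is an assumption about the general approach, not the package's actual grep, and the vectors are hypothetical.

```r
# Hypothetical ICD9 codes as they might appear in hospital records
icd9_recorded <- c("2809", "280", "2800", "1280", "2810")

# Codes the user searches for
search_codes <- c("280")

# Anchor each search code at the start of the string so that "280"
# matches "280", "2800" and "2809", but not "1280" or "2810"
pattern <- paste0("^(", paste(search_codes, collapse = "|"), ")")
matched <- icd9_recorded[grepl(pattern, icd9_recorded)]

print(matched)
```

Anchoring with "^" is what keeps "1280" out of the results; an unanchored grep for "280" would match it as well.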