CPT4 UMLS API process causing insertion of carriage returns in "CONCEPT.csv" #333

Open
odikia opened this issue May 5, 2023 · 2 comments

odikia commented May 5, 2023

I currently have to clean the final concept.csv before loading it into a PostgreSQL database, after adding the CPT4 codes via the cpt.bat process described in the vocabulary download from Athena.

PostgreSQL (run in the psql CLI):

\copy omop.concept FROM '\path\to\modified\concept.csv' WITH (FORMAT CSV, DELIMITER E'\t', QUOTE E'\b', ENCODING 'UTF8', HEADER TRUE)

Query returns:

ERROR: unquoted carriage return found in data
HINT: Use quoted CSV field to represent carriage return.
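To locate the offending rows before loading, something like the following could help. It is only a sketch: the function name `find_cr_lines` and the `limit` parameter are made up for illustration, and it assumes the file is LF-delimited with stray carriage returns on some rows.

```python
def find_cr_lines(path, limit=10):
    """Return 1-based numbers of LF-delimited lines that contain a carriage return.

    Reads in binary mode so Python does not translate line endings.
    Stops after `limit` hits (hypothetical parameter, for large files).
    """
    hits = []
    with open(path, "rb") as fh:
        # Iterating a binary file splits on b"\n" only, so any b"\r" stays visible.
        for lineno, raw in enumerate(fh, start=1):
            if b"\r" in raw:
                hits.append(lineno)
                if len(hits) >= limit:
                    break
    return hits
```

Calling `find_cr_lines` on the reconstituted concept.csv would list the first few line numbers to inspect; the header counts as line 1.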

System and File information

Included datafile with 4 error examples:
See attached. Note that pulling down the UMLS CPT4 codes requires a license. I provide 4 error examples with the concept name and concept code redacted, to make sure that sharing this document does not infringe the license. The OMOP information provided by Odysseus, including the concept IDs, remains.

OMOP Vocabulary version:
v5.0 23-JAN-23

Java info:
Version 8 Update 361 (build 1.8.0_361-b09)

Target Database version:
PostgreSQL 14.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), 64-bit

System info:
Processor 12th Gen Intel(R) Core(TM) i7-1270P 2.20 GHz
Installed RAM 32.0 GB (31.4 GB usable)
System type 64-bit operating system, x64-based processor

Windows Info:
Edition Windows 10 Enterprise
Version 21H2
Installed on 7/20/2022
OS build 19044.2846
Experience Windows Feature Experience Pack 120.2212.4190.0

CONCEPT_first_4_cpt4_errors.csv

mik-ohdsi commented

@odikia - Daniel, this is odd... let me double check if we have changed anything recently about the cpt4.jar.

mik-ohdsi commented

@odikia - it looks as if we have been doing this for a while now. I can confirm that all CPT4 rows in the concept.csv after reconstitution seem to end with a CRLF instead of only an LF. Have you always updated your vocabularies in the same way, and if so, when were you last able to do so without an error?
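As a stopgap on the user side, the trailing CRLF can be normalized to LF before running \copy. The sketch below is not part of the cpt4.jar tooling. The function name `normalize_line_endings` and both file paths are hypothetical, and it assumes the only stray carriage returns sit at the end of a row, as described above.

```python
def normalize_line_endings(src, dst):
    """Copy src to dst, rewriting a trailing CRLF on any line to LF.

    Returns the number of lines that were changed. Carriage returns
    elsewhere in a line are left untouched, since they may be data.
    """
    changed = 0
    # Binary mode avoids any newline translation by Python itself.
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for raw in fin:
            if raw.endswith(b"\r\n"):
                raw = raw[:-2] + b"\n"
                changed += 1
            fout.write(raw)
    return changed
```

For example, `normalize_line_endings("CONCEPT.csv", "CONCEPT_lf.csv")` writes a cleaned copy and leaves the original in place, so the two files can be compared before loading.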

@mik-ohdsi mik-ohdsi transferred this issue from OHDSI/Vocabulary-v5.0 Jun 9, 2023