Neither the code nor the data are provided, for confidentiality reasons.
This project integrates some of the diverse data collected from patients during their hospital stay to roughly estimate the length of their stay in the intensive care unit (ICU) for logistic purposes. The primary goal is to plan patient flow through the unit so that as many patients as possible are accommodated per unit time (the AICU has 20 beds and the PICU has 12), while maintaining the effective quality of healthcare already delivered.
A data quality assessment is performed in which missing, duplicated and outdated data are handled, and invalid cases or cases irrelevant to the project scope are reported.
- If a patient visit consists only of an Admission event followed by a Cancel Admission event
Sample Visit:
- If a patient visit includes neither OR nor ICU as visited units
Sample Visit:
- If a patient visit has a missing Transfer From/Transfer To event
Sample Visit:
• In records 106065 and 106066, the patient was transferred from Cath to Pediatric ICU, yet only two seconds later, records 106067 and 106068 show the same patient being transferred from Cath to CCU 1.
• In the next event, the patient is transferred from CCU 1, which confirms that the CCU 1 transfer is the valid one and that records 106065 and 106066 are wrong.
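The three checks above can be sketched in pandas as follows. This is a minimal illustration under stated assumptions, not the project's code (which is not published): `EVENT`, `UNIT` and `EVENTTIME` are hypothetical names for the event-log columns, and recognising OR and ICU units from the substrings "OR" and "ICU" in the unit name is an assumption about how units are labelled. `MEDICALNO` and `VISITNO` are the visit identifiers used elsewhere in this project.

```python
import pandas as pd

# MEDICALNO / VISITNO identify a visit in this project; EVENT, UNIT and
# EVENTTIME are hypothetical names for the event-log columns.
VISIT_KEYS = ["MEDICALNO", "VISITNO"]
# Assumption: an OR or ICU unit can be recognised from its name.
OR_OR_ICU = r"\bOR\b|ICU"


def flag_visits(events: pd.DataFrame) -> pd.DataFrame:
    """Return one row per visit with a boolean column per case type."""
    ordered = events.sort_values(VISIT_KEYS + ["EVENTTIME"])
    rows = []
    for (medical_no, visit_no), visit in ordered.groupby(VISIT_KEYS, sort=False):
        kinds = visit["EVENT"].tolist()
        units = visit["UNIT"].dropna().astype(str)
        rows.append({
            "MEDICALNO": medical_no,
            "VISITNO": visit_no,
            # Case 1: only an Admission followed by a Cancel Admission.
            "cancelled_only": kinds == ["Admission", "Cancel Admission"],
            # Case 2: no visited unit is an OR or an ICU.
            "no_or_icu": not units.str.contains(OR_OR_ICU, regex=True).any(),
            # Case 3: Transfer From and Transfer To events do not pair up.
            "unpaired_transfer": kinds.count("Transfer From") != kinds.count("Transfer To"),
        })
    return pd.DataFrame(rows)


# Toy event log: visit 10 is cancelled, visit 20 reaches an ICU,
# visit 30 has a Transfer From with no matching Transfer To.
demo = pd.DataFrame({
    "MEDICALNO": [1, 1, 2, 2, 2, 3, 3],
    "VISITNO": [10, 10, 20, 20, 20, 30, 30],
    "EVENT": ["Admission", "Cancel Admission",
              "Admission", "Transfer From", "Transfer To",
              "Admission", "Transfer From"],
    "UNIT": ["Ward A", None, "ER", "ER", "Pediatric ICU", "Main OR", "Main OR"],
    "EVENTTIME": pd.to_datetime([
        "2020-01-01 08:00:00", "2020-01-01 08:05:00",
        "2020-01-02 09:00:00", "2020-01-02 10:00:00", "2020-01-02 10:00:01",
        "2020-01-03 07:00:00", "2020-01-03 12:00:00"]),
})
flags = flag_visits(demo)
print(flags)
```

Flagged visits can then be reported (assessment) or dropped and repaired (elimination), with the per-case columns kept separate so each rule can be counted on its own.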
Cases irrelevant to the project scope are eliminated and invalid cases are resolved.
- If a patient visit consists only of an Admission event followed by a Cancel Admission event
- If a patient visit includes neither OR nor ICU as visited units
- If a patient visit has a missing Transfer From/Transfer To event
- Calculation of the Length of ICU Stay and Length of Hospital Stay features
- Restructuring of lab names and results
• The test data in the dataframe were restructured so that each order for a patient in a visit is represented by one row, whose columns are the test names and whose values are the test results in that specific order, with 'MEDICALNO', 'VISITNO' and 'ORDERNO' kept as identifier columns.
• Concerning 'TESTTIME': since each test in an order is carried out at a different time but (most of the time) on the same day, the time of day is trimmed from the date before restructuring.
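The lab restructuring described above can be sketched as a pandas long-to-wide pivot. This is a minimal sketch, assuming hypothetical `TESTNAME` and `RESULT` columns alongside the `MEDICALNO`, `VISITNO`, `ORDERNO` and `TESTTIME` columns named above. Keeping the trimmed date as part of the row key is also an assumption; with it, the occasional order whose tests fall on two different days would produce two rows.

```python
import pandas as pd

# Identifiers kept on every row, as in the project.
KEYS = ["MEDICALNO", "VISITNO", "ORDERNO"]


def restructure_labs(labs: pd.DataFrame) -> pd.DataFrame:
    """One row per (patient, visit, order, test date); one column per test.

    TESTNAME and RESULT are hypothetical column names for the long-format input.
    """
    labs = labs.copy()
    # Trim the time of day: tests in one order are (usually) on the same day.
    labs["TESTDATE"] = pd.to_datetime(labs["TESTTIME"]).dt.normalize()
    wide = labs.pivot_table(
        index=KEYS + ["TESTDATE"],
        columns="TESTNAME",
        values="RESULT",
        aggfunc="first",  # keep the first value if a test repeats within an order
    )
    wide.columns.name = None  # drop the leftover "TESTNAME" axis label
    return wide.reset_index()


# Toy long-format lab data: two orders for the same visit.
labs = pd.DataFrame({
    "MEDICALNO": [1, 1, 1, 1],
    "VISITNO": [10, 10, 10, 10],
    "ORDERNO": [100, 100, 101, 101],
    "TESTNAME": ["HGB", "WBC", "HGB", "K"],
    "RESULT": [13.5, 7.2, 12.9, 4.1],
    "TESTTIME": ["2020-01-01 08:10", "2020-01-01 09:45",
                 "2020-01-02 06:00", "2020-01-02 06:30"],
})
wide = restructure_labs(labs)
print(wide)
```

`pivot_table` with `aggfunc="first"` is used instead of `pivot` because `pivot` raises an error when the same test appears twice for one key; tests not ordered in a given order simply become NaN in that row.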
As explained above, since the project is intended for logistic purposes, the problem was treated as a regression task that outputs the predicted number of days of a patient's ICU stay together with a margin of error, rather than as a classification task in which each class represents a certain range of days.
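The source does not state how the margin of error is obtained. One common option, shown here purely as an assumption, is a split-conformal margin: compute the absolute errors of the trained model on a held-out calibration set and use a finite-sample quantile of them as a symmetric ± band, in days, around each prediction. The function names and the calibration numbers are hypothetical.

```python
import numpy as np


def conformal_margin(y_true, y_pred, coverage=0.9):
    """Split-conformal half-width from held-out (calibration) predictions.

    Returns the ceil((n + 1) * coverage)-th smallest absolute error. Assuming
    calibration and new cases are exchangeable, at least `coverage` of new
    cases are expected to fall within +/- this many days of the prediction.
    """
    residuals = np.sort(np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float)))
    n = residuals.size
    k = int(np.ceil((n + 1) * coverage))
    if k > n:
        return float("inf")  # too few calibration cases for this coverage
    return float(residuals[k - 1])


def predict_interval(point_days, margin):
    """Point prediction in days with a symmetric band, floored at 0 days."""
    point = np.asarray(point_days, float)
    return point, np.clip(point - margin, 0.0, None), point + margin


# Hypothetical held-out ICU stays (days) and a model's predictions for them.
y_cal = [2, 3, 5, 1, 4, 6, 2, 3, 7, 4]
p_cal = [2.5, 3, 4, 1.5, 4, 5, 2, 3.5, 6, 4.5]
margin = conformal_margin(y_cal, p_cal, coverage=0.9)
point, lower, upper = predict_interval([3.2, 0.4], margin)
print(margin, lower, upper)
```

The lower bound is clipped at zero because a stay cannot last a negative number of days; the band is otherwise the same width for every patient, which keeps it easy to use for bed planning.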
Due to time limitations, and given the strong consensus in the reviewed literature on this problem, XGB Regressor (XGBoost) was the machine learning algorithm used to determine the best-performing imputation method.