Predictive Modelling using Machine Learning Algorithms based on Structured and Unstructured Data of a Swiss Home Care Institution
Etienne Frey, 2022
Supervising lecturer: Hans Friedrich Witschel
Home care institutions provide medical and social services to patients at home. Because of their ailments, recipients of home care services are at risk of requiring a hospital referral or experiencing early death. Identifying a change in a patient's condition early may help home care institutions prevent such events and adjust their resource planning accordingly. Various structured and unstructured data are collected along a home care patient's care pathway, and data mining techniques can be used to identify patients at risk of specific events. Multiple studies have successfully applied data mining-based prediction models in different medical settings. In home care, however, medical decisions are currently based on anecdotal evidence rather than on evidence derived from data, which may lead to biases and errors. This master thesis aimed to predict whether a patient recovers, must be relocated, remains unchanged, or dies within the next three months, in order to identify the need for care intensification or reduction and to improve home care institutions' planning capabilities.
Qualitative methods were used to evaluate the model's requirements, select the algorithm, and assess, together with home care experts, the medical indicators currently used to identify patients at risk of specific events. To develop the prediction model, a convolutional neural network was applied to a combination of structured and unstructured data from SPITEX Basel, a Swiss home care institution, and trained with a cost-sensitive learning strategy. The prediction model was evaluated using quantitative techniques and a qualitative analysis, and its results were compared with those of a random forest model.
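Cost-sensitive learning means that errors on some classes are penalised more than errors on others during training. The abstract does not state the cost values or the exact loss used, so the following is only a minimal sketch, assuming a class-weighted cross-entropy over the four outcome classes; the class order and the `CLASS_COST` weights are hypothetical.

```python
import math

# Hypothetical class order for the four predicted outcomes.
CLASSES = ["unchanged", "recovery", "relocation", "death"]

# Hypothetical per-class weights (not taken from the thesis):
# errors on the rarer outcomes are penalised more heavily.
CLASS_COST = {"unchanged": 1.0, "recovery": 4.0, "relocation": 6.0, "death": 8.0}


def softmax(logits):
    """Convert raw model scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cost_weighted_cross_entropy(batch_logits, batch_labels):
    """Weighted mean cross-entropy: each sample is weighted by the cost of its true class."""
    total_loss = 0.0
    total_weight = 0.0
    for logits, label in zip(batch_logits, batch_labels):
        probs = softmax(logits)
        weight = CLASS_COST[label]
        total_loss += -weight * math.log(probs[CLASSES.index(label)])
        total_weight += weight
    # Normalise by the summed weights so the loss scale stays comparable across batches.
    return total_loss / total_weight
```

With this weighting, a confidently wrong prediction for a patient who actually died contributes eight times as much to the loss as the same mistake for a patient whose condition remained unchanged, which pushes the model away from simply predicting the majority class.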
In interviews with home care experts, it was observed that the medical experts associate different indicators (e.g., age, biological diseases, or specific terms and expressions in text) with a likely recovery, stagnation, relocation, or death of a home care patient. These indicators are collected at different intervals, both in structured data (e.g., structured assessment forms) and in narrative data (e.g., progress reports). Based on the assessed indicators, features were selected and constructed from the existing data to train the prediction model.
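One simple way to turn expert-named terms in narrative text into model features is a set of binary keyword flags merged with structured values such as age. The abstract does not list the actual terms or describe the feature construction in detail, so this is only an illustrative sketch; `INDICATOR_TERMS`, the feature names, and the English example terms are all hypothetical.

```python
import re

# Hypothetical indicator terms; the terms elicited from the experts
# are not given in the abstract.
INDICATOR_TERMS = {
    "mentions_hospital": ["hospital", "emergency"],
    "mentions_fall": ["fall", "fell"],
    "mentions_improvement": ["improved", "better"],
}


def text_indicator_features(report):
    """Return one binary flag per indicator group: 1 if any of its terms occurs as a whole word."""
    tokens = set(re.findall(r"[a-z]+", report.lower()))
    return {
        name: int(any(term in tokens for term in terms))
        for name, terms in INDICATOR_TERMS.items()
    }


def combine_features(structured, report):
    """Merge structured assessment values (e.g., age) with text-derived flags."""
    features = dict(structured)
    features.update(text_indicator_features(report))
    return features


print(combine_features({"age": 87}, "Patient fell yesterday; condition has not improved."))
# {'age': 87, 'mentions_hospital': 0, 'mentions_fall': 1, 'mentions_improvement': 1}
```

The example output also shows the main weakness of plain keyword flags: "not improved" still sets `mentions_improvement`, because negation is ignored. Learned text representations such as the word embeddings mentioned in the outlook can capture more context than this.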
The convolutional neural network overfitted, with an AUC of 61.24% on the testing data versus 93.79% on the training data, and could not accurately predict the minority classes (i.e., recovery, relocation, and death). A random forest model was less prone to overfitting, with an AUC of 68.43% on the testing data and 82.70% on the training data, but it was also unable to predict the minority classes accurately and caused higher misclassification costs than the convolutional neural network. A plotted learning curve indicated that more data would improve predictive performance. The qualitative analysis revealed that medical events that occurred at different times were incorrectly recorded at the same time in the same form. Moreover, a time-series analysis of service interruptions lasting four weeks or longer showed that the narrative data contained evidence of hospitalizations not recorded in the exit form.
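A model can achieve a higher AUC and still incur higher misclassification costs, because the costs weight each type of error differently. The cost matrix used in the thesis is not given in the abstract, so the one below is purely hypothetical. The sketch shows how a total cost can be computed from a confusion matrix, with `COST[true][predicted]` giving the penalty for each error.

```python
# Hypothetical class order and cost matrix: COST[true][predicted].
# Values are illustrative only; correct predictions cost 0.
CLASSES = ["unchanged", "recovery", "relocation", "death"]
COST = [
    [0, 1, 1, 1],  # true: unchanged
    [3, 0, 2, 2],  # true: recovery
    [5, 2, 0, 2],  # true: relocation
    [8, 3, 3, 0],  # true: death
]


def confusion_matrix(y_true, y_pred):
    """Count predictions per (true class, predicted class) pair."""
    index = {c: i for i, c in enumerate(CLASSES)}
    matrix = [[0] * len(CLASSES) for _ in CLASSES]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def total_misclassification_cost(y_true, y_pred):
    """Sum of confusion-matrix counts multiplied element-wise by the cost matrix."""
    matrix = confusion_matrix(y_true, y_pred)
    n = len(CLASSES)
    return sum(matrix[i][j] * COST[i][j] for i in range(n) for j in range(n))


y_true = ["unchanged", "unchanged", "recovery", "death"]
print(total_misclassification_cost(y_true, ["unchanged"] * 4))  # 11
print(total_misclassification_cost(y_true, ["unchanged", "recovery", "recovery", "relocation"]))  # 4
```

In this toy example, a model that always predicts the majority class "unchanged" pays the full cost of missing the recovery and the death. The second model makes the same number of errors, but they are cheaper ones, so its total cost is much lower.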
Neither the convolutional neural network nor the random forest algorithm could accurately predict the minority classes. Future research should explore extracting all events, together with their true dates, from narrative data, and improve the model's generalization by extending the study period, combining data from different home care institutions, or training word embeddings on text corpora. Furthermore, additional or different features (e.g., data from the service planning sheet) and other algorithms should be tested.
Degree programme: Business Information Systems (Master)