In this first short piece in the “OPUS Methods” series, MD-PhD candidate Daniel Gould explains what machine learning is and how it can identify patients at risk of readmission after total knee replacement.
Machine learning may sound futuristic, but it has become an increasingly common method in medical research over the past decade. Unlike with more traditional statistical methods, researchers do not ‘hard code’ how the computer will analyse a given set of data. Instead, the researcher develops a program or algorithm that enables the computer to learn from the data and develop its own way of completing a set task. In medical research, you may have heard about machine learning being employed to accurately classify diagnostic images and to better predict important patient outcomes.
Over the course of my PhD, I will be using machine learning to predict which patients are at a high risk of being readmitted to hospital within the first month after having a total knee replacement. Readmission to hospital shortly after surgery is a useful way to measure serious complications, which can have a devastating impact on patients while also being associated with a heavy economic toll on the healthcare system. By providing surgeons with an algorithm that can accurately predict a patient’s risk of 30-day readmission, the hope is that this vital (but very specific) bit of information can improve their clinical judgement. What’s more, by sharing this information with patients who may be considering a total knee replacement, it may also be possible to improve their understanding of what they should expect if they consent to the surgery.
To develop this predictive tool, I will be using ‘supervised machine learning’. In supervised machine learning, the researcher labels a series of input variables and a single output variable. In my PhD project, the input variables will be drawn from the information about patients that has routinely been collected prior to surgery in the St Vincent’s Melbourne Arthroplasty (SMART) registry. The output variable will be whether the patient was readmitted to hospital within 30 days of their surgery, which has also been routinely recorded in the SMART registry. Put very simply, the algorithm will learn from the data we have from this registry to create an accurate way of predicting if a patient will be readmitted within 30 days (the output variable) based upon the data that is routinely collected about patients before surgery (the input variables). It does this by picking up on patterns in the way that these variables are associated with one another. Once the algorithm has taught itself how to predict a patient’s risk of readmission in this set of data, it will then be tested to see how well it predicts readmission on patients’ data that it has not “seen” before. These two stages, development and validation, make up a large part of this project.
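For readers who like to see the idea in code, here is a minimal sketch of the two stages described above, written in plain Python. It is an illustration only and not the method used in this project: the patients are randomly generated rather than drawn from the SMART registry, the two input variables (imagined here as standardised age and BMI) are hypothetical, and logistic regression is simply one common supervised technique chosen for brevity.

```python
import math
import random


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_patients(n, rng):
    """Generate made-up patients: two input variables and one 0/1 output."""
    patients = []
    for _ in range(n):
        x1 = rng.gauss(0, 1)  # hypothetical input, e.g. standardised age
        x2 = rng.gauss(0, 1)  # hypothetical input, e.g. standardised BMI
        # Assumed "true" relationship, used only to simulate the labels.
        p = sigmoid(-1.0 + 1.5 * x1 + 1.0 * x2)
        readmitted = 1 if rng.random() < p else 0
        patients.append(([x1, x2], readmitted))
    return patients


def predict_risk(weights, bias, x):
    """Return the predicted probability of readmission for one patient."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train(data, epochs=300, lr=0.5):
    """Development stage: fit logistic regression by batch gradient descent."""
    n_features = len(data[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in data:
            error = predict_risk(weights, bias, x) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        for j in range(n_features):
            weights[j] -= lr * grad_w[j] / len(data)
        bias -= lr * grad_b / len(data)
    return weights, bias


def accuracy(weights, bias, data):
    """Validation stage: share of patients whose outcome is predicted correctly."""
    correct = sum(
        1 for x, y in data if (predict_risk(weights, bias, x) >= 0.5) == (y == 1)
    )
    return correct / len(data)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
patients = make_synthetic_patients(1000, rng)
train_set, test_set = patients[:800], patients[800:]  # hold back unseen patients

weights, bias = train(train_set)
test_accuracy = accuracy(weights, bias, test_set)
print(f"Accuracy on unseen patients: {test_accuracy:.2f}")
```

The key design choice is that the last 200 patients are kept out of training entirely, so the accuracy reported at the end reflects how the model performs on data it has not “seen” before.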
There are many other types of machine learning that can be used by health researchers, including ‘unsupervised’ machine learning, which identifies interesting patterns in data that researchers or clinicians may not otherwise think to look for. In addition to expanding my research into unsupervised machine learning, I also hope to eventually apply these methods to analyse diagnostic images and even to help qualitative researchers pick up on patterns in interview transcripts.
If you are interested in learning more about machine learning, the article linked here provides a helpful guide to reading machine learning articles. There are also a number of useful online machine learning courses available through Coursera.