Machine Learning Technique Identifies and Classifies CKD Subtypes

Eric Seaborg

Even among patients with similar levels of kidney function, an algorithm that considers a host of characteristics—including demographics, biomarkers from blood and urine, health status and behaviors, and medication use—can categorize patients into three clinically distinguishable clusters associated with distinct outcomes, such as chronic kidney disease (CKD) progression, cardiovascular disease, and death, according to a new study in JASN.

Subtyping CKD in this way using “multi-dimensional patient data holds the key to precision medicine,” the authors write in “Subtyping CKD Patients by Consensus Clustering: The Chronic Renal Insufficiency Cohort (CRIC) Study.” The approach could give a better clinical picture of the course of a patient’s kidney disease than traditional risk factors alone, the study authors state.

The 2012 KDIGO (Kidney Disease: Improving Global Outcomes) classification guidelines stage kidney disease using a patient’s estimated glomerular filtration rate (eGFR) and urine albumin-to-creatinine ratio (UACR). The new subtyping technique provided useful information beyond these measures, the authors say, adding that staging CKD using eGFR and UACR “does not fully capture the underlying patient heterogeneity.”
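To illustrate why staging by eGFR and UACR alone can mask differences between patients, here is a minimal Python sketch of the KDIGO 2012 category cutoffs. The function names and the example values are hypothetical and are not taken from the study.

```python
def gfr_category(egfr: float) -> str:
    """Map eGFR (mL/min/1.73 m^2) to a KDIGO 2012 GFR category."""
    if egfr >= 90:
        return "G1"
    if egfr >= 60:
        return "G2"
    if egfr >= 45:
        return "G3a"
    if egfr >= 30:
        return "G3b"
    if egfr >= 15:
        return "G4"
    return "G5"


def albuminuria_category(uacr: float) -> str:
    """Map UACR (mg/g) to a KDIGO 2012 albuminuria category."""
    if uacr < 30:
        return "A1"
    if uacr <= 300:
        return "A2"
    return "A3"


def kdigo_stage(egfr: float, uacr: float) -> str:
    """Combine both categories into a label such as 'G3aA2'."""
    return gfr_category(egfr) + albuminuria_category(uacr)


# Two hypothetical patients with different lab values share one stage.
patient_a = kdigo_stage(egfr=58, uacr=40)
patient_b = kdigo_stage(egfr=46, uacr=290)
print(patient_a, patient_b)
```

Both hypothetical patients fall into the same stage even though their values differ, and the stage says nothing about diabetes, inflammation, or medication use. That is the kind of heterogeneity the subtyping approach aims to capture.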

Toward personalized medicine

The study is a step toward more personalized, precision medicine, according to Sushrut S. Waikar, MD, MPH, one of the lead authors of the study and professor of medicine at Boston University School of Medicine and chief of nephrology at Boston Medical Center.

“The term chronic kidney disease doesn’t refer to a single entity, but rather is an umbrella term that encompasses a large number of underlying disease pathologies,” Waikar told ASN Kidney News. “Clinically we often don’t make specific pathological diagnoses [of CKD], for example, with a kidney biopsy. As a result, we group together a potentially large number of diseases under an umbrella term like hypertensive kidney disease. But is hypertensive kidney disease a single disease or is it an umbrella term for 10 different diseases, each of which has a different etiology and potential treatments? And the same question can be asked for diabetic kidney disease. [This study is a step toward] trying to identify the heterogeneity underlying what we think are common forms of kidney disease.”

Each patient is unique, says the other lead author, Zihe Zheng, MBBS, MHS, a doctoral candidate in the department of biostatistics at the University of Pennsylvania Perelman School of Medicine: “Patients are different, and people with similar kidney function are still different. This heterogeneity is something we really want to highlight [in this study]. Our main focus is to classify patients to find out how they are different from each other in the expectation that that will shine some light on the underlying pathophysiology.”

72 baseline characteristics

The study used data from the CRIC project, an ongoing prospective cohort study of adults with CKD stages 2 to 4. Participants were recruited from 2003 through 2008 from clinical centers in seven U.S. cities. Since then, they have been followed through annual clinic visits during which investigators collect health information and urine and blood specimens for an extensive panel of tests.

The researchers analyzed this database using a machine learning method called unsupervised consensus clustering. Consensus clustering is a process of using several algorithms to look for similarities among patients. It is “unsupervised” because the researchers did not decide in advance how the groups should look. Of the 822 variables measured in each patient at the CRIC study baseline, the algorithms looked at 72 baseline characteristics, selected through a literature review as those most clinically relevant to CKD.
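The general idea can be sketched in a few lines of Python. This is a toy illustration of one common form of consensus clustering, in which random subsamples are clustered repeatedly and the fraction of runs in which each pair of items lands in the same cluster is recorded. It is not the study's pipeline: the k-means routine, the synthetic two-feature data, and all function names and parameters here are assumptions made for the example.

```python
import random


def kmeans(points, k, rng, iters=20):
    """Plain k-means on a list of tuples; returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels


def consensus_matrix(points, k, runs=30, frac=0.8, seed=0):
    """Fraction of subsampled runs in which each pair co-clusters."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(runs):
        idx = rng.sample(range(n), int(frac * n))
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]


# Synthetic data: three well-separated groups of 10 "patients" each.
data_rng = random.Random(1)
blobs = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
points = [(cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
          for cx, cy in blobs for _ in range(10)]
truth = [g for g in range(3) for _ in range(10)]

m = consensus_matrix(points, k=3)
n = len(points)
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
within = [m[i][j] for i, j in pairs if truth[i] == truth[j]]
across = [m[i][j] for i, j in pairs if truth[i] != truth[j]]
print("mean consensus, same group:     ", sum(within) / len(within))
print("mean consensus, different group:", sum(across) / len(across))
```

Pairs that co-cluster in most runs are stable members of the same group. In practice, the consensus matrix is typically computed for several candidate numbers of clusters, and the number that yields the most stable grouping is chosen.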

Three clusters

“The algorithm revealed three unique CKD subgroups that best represented patients’ baseline characteristics,” the authors write. Cluster 1 included patients with “relatively favorable levels” of bone and mineral, cardiac, and kidney function markers, along with a lower prevalence of diabetes and obesity. These patients also used fewer medications than members of the other clusters. Patients in cluster 2 had a higher prevalence of diabetes, had higher markers of obesity, and used more medications. Patients in cluster 3 had an even higher prevalence of diabetes and obesity, and had the least favorable levels of bone and mineral, cardiac, inflammation, and kidney function markers.

The cluster membership was strongly associated with patients’ future risks of kidney disease progression, cardiovascular events, and death, with risks escalating from cluster 1 through 3. “We showed a strong independent association between the cluster membership and future adverse events, after controlling for the known CKD risk factors, such as eGFR, UACR, blood pressure and diabetes status, etc., to be at the same level,” the authors write. “The cluster membership provided a simple metric of summarizing the patient heterogeneity and comorbidity profiles encoded in the 72 baseline variables.”

Consensus clustering has been used as a phenotyping tool in other heterogeneous conditions such as heart disease, type 2 diabetes, and several forms of cancer, and the authors write that “identification of clinical meaningful subgroups among CKD patients provides an important step toward patient classification and precision medicine in nephrology. Being able to characterize this heterogeneity early is an important step towards individualizing follow-up strategies for these patients.”

“I think this is a step in the direction of using multi-dimensional data for risk prediction for chronic kidney disease,” Waikar said. It remains to be seen whether mining the data in electronic medical records will be an approach that clinicians will be able to use to identify the prognosis and tailor the treatment for individual patients who share certain characteristics. “Can we identify the patients in clinical practice who would benefit from more intensive therapy and more intensive monitoring?” he asks.

As an example of the kinds of clues about treatment targets the information could provide, the study notes that inflammatory mechanisms are involved in the development and progression of CKD and its comorbidities such as cardiovascular disease. “The identified clusters may represent different states of inflammation which could, in part, explain the differences in risks of developing adverse clinical events,” the authors write.

Girish N. Nadkarni, MD, MPH, assistant professor at Mount Sinai Health System in New York City, who was not involved in the study, said the study recognizes “that chronic kidney disease is quite a heterogeneous syndrome, and [the researchers] are trying to use data-driven techniques to tease out the heterogeneity. They are trying to show that this is not just one disease but a syndrome comprised of many different subtypes of different types of disease. There is great promise in this approach in order to discover unknown risk factors. This is the first step in a continuum of research trying to show that all chronic kidney diseases are not the same.”