Research Area
Metric Learning

What makes two people alike? According to whom? Can similarity itself evolve over time? Might two people shift from being alike to not being alike at all? What makes two events similar? What about physical structures like a hospital or warehouse?

Measuring Similarity

Similarity is a powerful concept that allows us to intelligently augment data and run synthetic clinical trials: rapidly learning about a geography with limited data, predicting what will work for a new population based on what has been tried in other settings, and learning about a new site from what is known about other sites.

MACRO-EYES has long been focused on the rich question of similarity – across individuals, events, and locations. We have developed new ways to combine expert-crafted distance metrics and distance metrics that are learned directly from the data. We have experimented extensively with the visualization of similarity – across dimensions and across individuals.
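One simple way to picture combining an expert-crafted metric with a learned one is a weighted blend of the two distances. The sketch below is only an illustration under stated assumptions, not a description of the MACRO-EYES method: the expert metric is assumed to be a weighted Euclidean distance, the learned metric is assumed to be a Euclidean distance after a linear map `L` (a common form in metric learning), and the names `expert_distance`, `learned_distance`, `blended_distance`, and `alpha` are hypothetical.

```python
import math

def expert_distance(x, y, weights):
    # Weighted Euclidean distance; the weights stand in for expert judgment.
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))

def learned_distance(x, y, L):
    # Euclidean distance after applying a linear map L (a list of rows).
    # In practice L would be fitted from data; here it is supplied by hand.
    diff = [a - b for a, b in zip(x, y)]
    projected = [sum(l_ij * d for l_ij, d in zip(row, diff)) for row in L]
    return math.sqrt(sum(p * p for p in projected))

def blended_distance(x, y, weights, L, alpha=0.5):
    # Convex combination: alpha=1 trusts the expert metric only,
    # alpha=0 trusts the learned metric only.
    return (alpha * expert_distance(x, y, weights)
            + (1 - alpha) * learned_distance(x, y, L))
```

With `alpha` between 0 and 1, the blend stays a valid distance whenever both inputs are, which makes `alpha` a single dial for how much weight the data gets relative to expert judgment.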


R&D in metric learning and similarity took shape in Pymed: MACRO-EYES software for clinical decision support that learns the relevant context for the patient receiving care. Pymed identifies patients who are multi-dimensionally similar to the relevant patient and clinical events that are similar to the event that requires a response. Pymed was prototyped and tested at Stanford, formed the basis of an extended engagement to make supply chains patient-centric at Providence St. Joseph Health, and has been used at a leading academic medical center in New York City and within a regional health system in Oregon.

The greater the number of clinical dimensions that are measured, the richer and more actionable the results. Multi-dimensional queries allow clinicians to pinpoint the interventions or patient characteristics that consistently and uniquely correlate with a specific outcome. For example, the patient is an adolescent female with severe kidney inflammation. Her electronic health record in its entirety forms a rich, highly dimensional query: she is not only a female with kidney inflammation. The physician wants to determine whether to prescribe anticoagulants. The search results are aggregate data on similar patients with severe kidney inflammation; six of the patients were prescribed anticoagulants and four were not. By studying the difference in outcomes between the two groups, the provider can estimate the potential efficacy and risks of anticoagulants for her patient.

The more dimensions that two people share, the greater the probability that the experience of one is relevant for predicting the behavior of the other. The current practice of matching along a few static dimensions to stratify a population or form a cohort creates two interrelated problems: it is difficult to verify the internal similarity of the cohort, which makes any conclusions potentially misleading, and the analysis includes so few variables that unmeasured confounding variables may have contributed to the outcome. For practice-based evidence to be clinically effective, the cohorts must exhibit multi-dimensional similarity.
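The anticoagulant example can be sketched as a nearest-neighbor query followed by a split on treatment. This is a minimal illustration under assumptions, not Pymed's implementation: patients are assumed to be dictionaries with a numeric `features` vector and boolean `treated` and `recovered` fields, all of which are hypothetical names, and plain Euclidean distance stands in for whatever similarity measure is actually used.

```python
import math

def euclidean(a, b):
    # Stand-in similarity measure over equal-length numeric feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def compare_treatment_groups(query, patients, k=10):
    # Rank all patients by distance to the query and keep the k closest.
    ranked = sorted(patients, key=lambda p: euclidean(query, p["features"]))
    cohort = ranked[:k]

    # Split the similar cohort by treatment and summarize each group.
    summary = {}
    for label, treated in (("treated", True), ("untreated", False)):
        group = [p for p in cohort if p["treated"] == treated]
        recovered = sum(1 for p in group if p["recovered"])
        summary[label] = {
            "n": len(group),
            "recovery_rate": recovered / len(group) if group else None,
        }
    return summary
```

With a cohort of ten, as in the example, the two recovery rates rest on six and four patients respectively, so they are best read as a prompt for clinical judgment rather than as a statistical estimate.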

Computing Similarity

Several aspects of similarity computation would, under traditional approaches, rely on difficult hand-tuning, with experts assigning weights to each of the dimensions (features) used for computing similarity. For example, should age be considered more meaningful than blood pressure? Challenges abound: the heterogeneity, richness, and sparsity of the data each pose different difficulties. How can patients with different numbers of clinical dimensions, not all of which are shared, be compared effectively? How can patients whose measurements are taken at significantly different time-scales be compared?
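To make the hand-tuning problem concrete, the sketch below shows a traditional weighted distance that is computed only over the dimensions two patients share. It is an illustrative baseline under assumptions, not the approach described in this section: features are assumed to be pre-normalized to a common scale, missing values are represented as absent keys or `None`, and the function name `weighted_distance` and the example feature names are hypothetical.

```python
import math

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance over the dimensions both records share.

    a, b: dicts mapping feature name -> normalized value (missing = absent or None).
    weights: dict mapping feature name -> non-negative, hand-assigned weight.
    Returns None when the two records share no weighted dimension.
    """
    shared = [k for k in weights
              if a.get(k) is not None and b.get(k) is not None]
    total_weight = sum(weights[k] for k in shared)
    if total_weight == 0:
        return None
    squared = sum(weights[k] * (a[k] - b[k]) ** 2 for k in shared)
    # Dividing by the total weight keeps distances comparable between
    # pairs that share many dimensions and pairs that share few.
    return math.sqrt(squared / total_weight)

# Hand-assigned weights: here an expert has decided age matters twice as much.
weights = {"age": 2.0, "blood_pressure": 1.0, "creatinine": 1.0}
patient_a = {"age": 0.2, "blood_pressure": 0.5}
patient_b = {"age": 0.6, "blood_pressure": 0.5, "creatinine": 0.9}
distance = weighted_distance(patient_a, patient_b, weights)
```

Every value in `weights` is a judgment call that someone has to make and maintain, which is the burden that learning the metric from data aims to remove. The sketch also does nothing about measurements taken at different time-scales.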

MACRO-EYES R&D in machine learning for patient-to-patient similarity offers answers for deploying similarity to understand common elements between physical locations and across events. Similarity is a powerful technique for understanding previously invisible relationships, essentially redrawing conventional networks. Operationally, similarity can aid efforts to accurately impute missing values and can generate evidence to predict the behavior of individuals and forecast the course of events.
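As an illustration of how similarity can support imputation, the sketch below fills a missing value with the average of that value among the most similar records. It is a generic nearest-neighbor imputation under assumptions, not the MACRO-EYES method: records are assumed to be dictionaries of normalized numeric values with `None` marking a missing entry, and the names `shared_distance` and `impute_from_neighbors` are hypothetical.

```python
def shared_distance(a, b, exclude):
    # Root-mean-square difference over dimensions present in both records,
    # ignoring the field that is being imputed.
    keys = [k for k in a
            if k != exclude and a[k] is not None and b.get(k) is not None]
    if not keys:
        return None
    return (sum((a[k] - b[k]) ** 2 for k in keys) / len(keys)) ** 0.5

def impute_from_neighbors(target, records, field, k=3):
    # Score every record that has the field and shares at least one
    # other dimension with the target.
    scored = []
    for record in records:
        if record.get(field) is None:
            continue
        d = shared_distance(target, record, exclude=field)
        if d is not None:
            scored.append((d, record[field]))

    # Average the field over the k most similar records.
    scored.sort(key=lambda pair: pair[0])
    nearest = scored[:k]
    if not nearest:
        return None
    return sum(value for _, value in nearest) / len(nearest)
```

The same pattern applies to sites or events in place of patients: the quality of the imputed value depends on how well the distance reflects real similarity.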