Investigations into the patient voice: a multi-perspective analysis of inflammation

The patient is the expert on their own medical journey, yet their experiences go largely unheard in clinical practice. Understanding the patient matters because bridging this gap enhances clinical knowledge, which benefits patient care and improves quality of life. Valuable solutions to these problems lie at the intersection of machine learning and sentiment analysis, through ontologies, semantic similarity, and clustering. In this thesis, I present challenges and solutions that explore patient quality of life in two immune-mediated inflammatory diseases that are often undifferentiated: uveitis and inflammatory bowel disease. The thesis explores how a patient’s condition and inflammation influence their voice and quality of life via sentiment analysis, clustering, and semantic characterisations.


With guidance from domain experts and a foundation derived from clinical consensus documents, I created an application ontology, the Ocular Immune-Mediated Inflammatory Diseases Ontology (OcIMIDo). It was enhanced with patient-preferred terms curated from online forum conversations using a semi-automated statistical approach that applied annotation, term frequency, and sentiment analysis. Semantic similarity was explored by taking a pre-existing embedding model derived from clinical letters and using it to train further models on patient-generated texts, allowing a systematic comparison of the clinician and patient voice. In a final experimental chapter, blood markers were clustered and analysed alongside the corresponding quantitative quality of life outcomes for patients in the UK Biobank with inflammatory bowel disease.
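
As a minimal illustration of the kind of semantic similarity comparison described above, the sketch below ranks terms by cosine similarity to a query term in a small embedding table. It is not the thesis code: the function names, the terms, and the three-dimensional vectors are made-up assumptions chosen only to show the mechanics, and real embedding models would have far larger vocabularies and dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for zero vectors, which have no direction
    return dot / (norm_u * norm_v)


def most_similar(term, embeddings, top_n=3):
    """Return the top_n other terms ranked by cosine similarity to `term`."""
    query = embeddings[term]
    scored = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != term
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy, hand-written vectors (hypothetical values, not taken from any real model).
toy_embeddings = {
    "uveitis": [0.9, 0.1, 0.0],
    "iritis": [0.8, 0.2, 0.1],
    "fatigue": [0.1, 0.9, 0.2],
}

if __name__ == "__main__":
    for neighbour, score in most_similar("uveitis", toy_embeddings):
        print(f"{neighbour}: {score:.3f}")
```

Running the same query against two tables, one built from clinical text and one from patient text, and comparing the ranked neighbours is one simple way to contrast the two vocabularies.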


OcIMIDo is the first ontology of its kind in ophthalmology. Sentiment analysis revealed that first posts were more negative than replies. Systematic comparisons of embedding models revealed frequent misspellings from clinicians, use of abbreviations by patients, and patient priorities; models performed better when the clinical domain was extended with equivalent-sized, patient-generated data. Clustering gave insight into the relationship between the presence of inflammatory stress and happiness, and between a maternal smoking history and a Crohn’s disease diagnosis.


Patient-preferred terms show that the patient voice supports meaningful text mining and fruitful sentiment analysis, revealing the role a forum plays for patients; semantic similarity highlighted potential novel disease associations and the patient lexicon; and clustering blood markers produced clusters that showed a relationship with sentiment. In summary, this deeper knowledge of quality of life biomarkers, gained through the patient voice, can benefit the clinical domain and patient outcomes. Understanding the patient can improve the clinician-patient relationship and communication standards, which in turn benefits diagnosis, the development of treatment plans, and the time these processes demand in clinical practice.

PhD thesis available to read online.

This work was my PhD, completed in 2023.