PatientINF & COID

Both resources in this project are available open-source for secondary research use:

  • PatientINF is an embedding model.
  • COID is the Combined Ontology for Inflammatory Diseases.

Background

Semantic similarity draws on the patient voice to identify synonyms and the kinds of terms patients actually use. Semantics can help us better understand not only the patient but also the clinician: patients often avoid clinical terms (jargon), and domain experts may be unfamiliar with patient-preferred terms.

This work was influenced by the OcIMIDo project [1], in which some patient-preferred terms came as a surprise to the domain experts.

Methodology

PatientINF is a novel Word2Vec embedding model, developed in two steps:

  1. Starting from ClinicalBERT [2], a model derived from the clinician's voice via clinical letters.
  2. Retraining on data extracted from the Patient.info online forum, specifically topics on inflammatory diseases.

Multiple tests were conducted to observe the impact of combining a clinician-generated model with patient-generated data. These included a Wilcoxon test (for the change in vector space) and a Pearson correlation coefficient comparing the embedding model's similarities against physician annotations.
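The Pearson comparison described above can be sketched as follows. This is a minimal illustration only, not the project's evaluation code: the vectors, term pairs, and annotation scores are invented toy values, and `cosine_similarity` and `pearson_r` are hypothetical helper names.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Assumption: invented 3-dimensional vectors standing in for real embeddings.
toy_vectors = {
    "diarrhea": [0.9, 0.1, 0.0],
    "stomach cramp": [0.8, 0.2, 0.1],
    "fatigue": [0.2, 0.9, 0.1],
    "joint pain": [0.1, 0.3, 0.9],
}

# Assumption: invented term pairs and invented "physician" similarity scores.
term_pairs = [
    ("diarrhea", "stomach cramp"),
    ("diarrhea", "fatigue"),
    ("fatigue", "joint pain"),
    ("diarrhea", "joint pain"),
]
toy_annotations = [0.9, 0.4, 0.3, 0.1]

# Score each pair with the model, then correlate with the annotations.
model_scores = [
    cosine_similarity(toy_vectors[a], toy_vectors[b]) for a, b in term_pairs
]
r = pearson_r(model_scores, toy_annotations)
print(f"Pearson r between model and annotations: {r:.3f}")
```

A coefficient near 1 would suggest the model ranks term pairs much as the annotators do. The Wilcoxon test is not covered by this sketch.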

COID was developed to meet the need for an ontology focused on inflammatory diseases, and it also covers anatomy, symptoms, and more. Synonym curation for COID used statistical methods similar to those of Pendleton et al. (2021) [1]. Furthermore, synonyms were also curated from the PatientINF embedding model by examining the semantic similarity of terms of interest.
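As a rough illustration of curating synonym candidates from an embedding model, the sketch below ranks terms by cosine similarity to a term of interest. It makes several assumptions: the vectors are invented toy values, `most_similar` is a hypothetical helper rather than part of PatientINF, and any candidate it surfaces would still need review by a domain expert.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(term, vectors, top_n=3):
    """Return up to top_n (term, similarity) pairs closest to `term`."""
    query = vectors[term]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != term  # skip the query term itself
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Assumption: invented 3-dimensional vectors standing in for real embeddings.
toy_vectors = {
    "diarrhea": [0.9, 0.1, 0.0],
    "stomach cramp": [0.8, 0.2, 0.1],
    "uveitis": [0.0, 0.1, 0.9],
    "eye inflammation": [0.1, 0.0, 0.95],
}

for candidate, score in most_similar("uveitis", toy_vectors, top_n=2):
    print(f"{candidate}: {score:.3f}")
```

Ranking by cosine similarity is a common choice for word embeddings because it compares vector direction rather than magnitude.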

Figure: Graphical representation of COID. Each node is a class and each edge is a relationship.

Impact

Semantic characterisation of the models revealed that clinician text contained more frequent misspellings, whereas patients used more abbreviations. Patient priorities were also highlighted, showing how clinician and patient similarities differ. For example, diarrhea and stomach cramp are closer to each other for patients than they are in the clinical domain.

Figure: t-SNE of the released embedding model highlighting terms in COID. Each dot is a term and colours represent the category within COID.

References


  1. Pendleton, Samantha C., et al. “Development and application of the ocular immune-mediated inflammatory diseases ontology enhanced with synonyms from online patient support forum conversation.” Computers in Biology and Medicine 135 (2021): 104542.

  2. Huang, Kexin, Jaan Altosaar, and Rajesh Ranganath. “ClinicalBERT: Modeling clinical notes and predicting hospital readmission.” arXiv preprint arXiv:1904.05342 (2019).