Jabberwocky

Jabberwocky is a Natural Language Processing (NLP) toolkit for those nonsensical ontologies [1]. It is available as open source on GitHub.

Unstructured text is a valuable resource for research, yet text mining is a complicated task. Text mining with key terms alone can be limited; it can be enhanced with knowledge of synonyms, especially synonyms specific to the domain.

Ontologies are useful: they condense a domain of knowledge in a structured manner, and their classes may have synonyms. Ontologies have proven useful in text mining tasks, yet the NLP community still lacks tools for manipulating them easily.

Jabberwocky's features include extracting synonyms (and other metadata) from an ontology, given key terms [classes] of interest. With these key terms and their corresponding synonyms, users can annotate a corpus to show which posts/sentences each term occurs in. An updated feature as of 2021 is annotation with a Phrase Matcher [2], so users can now annotate with phrases.
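As a rough illustration of the synonym-lookup and annotation steps described above, here is a minimal sketch in plain Python. It does not use Jabberwocky's own API or spaCy's Phrase Matcher; a regular-expression search stands in for the matcher. The toy ontology and the names `TOY_ONTOLOGY`, `synonyms_for`, and `annotate` are hypothetical and exist only for this example.

```python
import re

# Hypothetical toy ontology: class label -> list of synonyms.
# A real ontology would be loaded from a file; this dict is an assumption.
TOY_ONTOLOGY = {
    "myocardial infarction": ["heart attack"],
    "hypertension": ["high blood pressure"],
    "asthma": [],
}


def synonyms_for(key_terms, ontology):
    """Map each key term found in the ontology to [term, *synonyms]."""
    return {t: [t, *ontology[t]] for t in key_terms if t in ontology}


def annotate(posts, term_phrases):
    """Return (post index, key term, matched phrase) for every phrase found."""
    hits = []
    for i, post in enumerate(posts):
        for term, phrases in term_phrases.items():
            for phrase in phrases:
                # Word boundaries avoid matching inside longer words.
                pattern = r"\b" + re.escape(phrase) + r"\b"
                if re.search(pattern, post, flags=re.IGNORECASE):
                    hits.append((i, term, phrase))
    return hits


posts = [
    "I had a heart attack last year.",
    "My high blood pressure is under control.",
    "Nothing relevant here.",
]
term_phrases = synonyms_for(["myocardial infarction", "hypertension"], TOY_ONTOLOGY)
annotations = annotate(posts, term_phrases)
for hit in annotations:
    print(hit)
```

Neither post mentions its key term by name, so both annotations come from synonyms. This is the case where synonym-aware annotation finds text that a plain key-term search would miss.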

Other features: using the methods from Pendleton et al. (2021) [3], users can run TF-IDF [4], a statistical method, to rank all corpus terms by importance. An updated feature as of 2024 is the ability to apply this method to n-grams, so rankings extend to uni-grams, bi-grams, tri-grams, and more. The TF-IDF output can prove useful for updating the ontology and for future text mining tasks. Users can also update the ontology with other metadata. Finally, as of 2024 users can plot an ontology in web or tree format.
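To make the n-gram TF-IDF ranking more concrete, here is a minimal sketch using only the Python standard library. It assumes a textbook variant (relative term frequency times the unsmoothed log of inverse document frequency, summed over documents), which may differ from the weighting Jabberwocky or Pendleton et al. actually use. The function names `ngrams` and `rank_tfidf` are hypothetical.

```python
import math
import re
from collections import Counter


def ngrams(text, n_min=1, n_max=2):
    """Lowercase, tokenise on alphanumerics, and return all n-grams as strings."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def rank_tfidf(docs, n_min=1, n_max=2):
    """Rank n-grams by their TF-IDF score summed over all documents."""
    doc_terms = [Counter(ngrams(d, n_min, n_max)) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each n-gram.
    df = Counter(term for counts in doc_terms for term in counts)
    scores = Counter()
    for counts in doc_terms:
        total = sum(counts.values())
        for term, count in counts.items():
            tf = count / total                  # relative term frequency
            idf = math.log(n_docs / df[term])   # unsmoothed inverse document frequency
            scores[term] += tf * idf
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


docs = ["dry eye drops", "dry eye pain", "red eye"]
ranking = rank_tfidf(docs, n_min=1, n_max=2)
for term, score in ranking:
    print(f"{score:.3f}  {term}")
```

With this variant, an n-gram that occurs in every document (here "eye") scores zero, while n-grams confined to a single document rank highest. Raising `n_max` to 3 would add tri-grams to the ranking.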


Figure: Workflow of Jabberwocky features. Read the documentation for more information and a scenario.

References


  1. Pendleton, Samantha C., and Georgios V. Gkoutos. “Jabberwocky: an ontology-aware toolkit for manipulating text.” Journal of Open Source Software 5.51 (2020): 2168.

  2. Honnibal, Matthew, et al. “spaCy: Industrial-strength natural language processing in Python.” (2020).

  3. Pendleton, Samantha C., et al. “Development and application of the ocular immune-mediated inflammatory diseases ontology enhanced with synonyms from online patient support forum conversation.” Computers in Biology and Medicine 135 (2021): 104542.

  4. Term frequency-inverse document frequency.