Speakers - WNSC 2024

Delgersuren Bold

  • Designation: Senior Research Data/Informatics Specialist, Emory University, School of Nursing, Center for Data Science
  • Country: USA
  • Title: Using Large Language Models to Tag Clinical Concepts Extracted from Nursing Notes

Abstract

Nursing notes contain information complementary to widely used objective measurements, including laboratory test results and vital signs, that can enrich structured data used in machine learning models to recognize patient deterioration. Our group aimed to enhance machine learning models for patient deterioration recognition by incorporating information from nursing notes. We developed a multi-modality data fusion algorithm framework that represented heterogeneous data as time-stamped tokens. To extract clinical concepts from nursing notes, we utilized MetaMap (from NIH), a medical-named entity extraction tool. However, MetaMap had limitations in terms of concept detection accuracy, temporality, and negation. To address this, we tested general-purpose large language models (LLMs), i.e., GPT3.5, to tag further a clinical concept detected by MetaMap. Annotations were performed by a team of 3 nursing and 6 physician professionals on 160 randomly selected notes in the web-based annotation tool we developed.

Discrepancies were resolved through group consensus. We utilized Microsoft Azure OpenAI Service to submit each note individually and asked specific questions related to concept recognition, temporality, and negation to GPT3.5. We also experimented with a 'chain of thought' prompting process to improve accuracy. Results showed that GPT3.5 achieved high accuracy and F1 scores for detecting concepts correctly. However, it performed poorly in determining whether a detected idea should be negated. The study highlighted the potential of using large language models like GPT3.5 to process nursing notes, but further research is necessary to improve the model's performance in tagging temporality and negation of concepts. We collectively acknowledged that our study raised more questions than answers and discussed the conservative nature of GPT3.5 in providing solutions related to diseases, syndromes, and symptoms. Despite this, we found the potential of LLMs such as GPT 3.5 to be valuable in giving correct answers and clinically sound explanations. They emphasized the possibility of using language models in nursing to improve communication, reduce documentation burden, and enhance AI applications with valuable information extracted from nursing notes. In conclusion, this pilot study demonstrated that LLMs such as GPT 3.5 can accurately detect clinical concepts identified by traditional clinical-named entity recognition approaches. However, further research is needed to refine LLM's ability to tag temporality and negation of concepts in nursing notes.

Don't miss our future updates!

Get in Touch