I. Introduction
Sepsis continues to be a leading cause of morbidity and mortality among critically ill patients, necessitating timely and effective management strategies [1, 2]. Early identification and treatment are critical, yet challenges remain, especially when immediate diagnostic results are unavailable. This study presents a multi-agent system designed to support intensivists in making informed decisions about antibiotic therapy and in adhering to established guidelines during the initial phase of sepsis management.
Sepsis is a complex syndrome characterized by a dysregulated host response to infection that results in organ dysfunction [3, 4]. Timely initiation of appropriate antimicrobial therapy is essential for improving patient outcomes, as delays significantly increase mortality rates. Recent studies indicate that the mortality rate in septic patients may increase by 7% to 9% for every hour that appropriate antibiotic therapy is delayed [5, 6]. Therefore, rapid decision-making is of critical importance.
In clinical practice, intensivists encounter challenges due to the overwhelming volume of information on sepsis management. Although guidelines provided by organizations such as the Surviving Sepsis Campaign offer valuable recommendations [7], adherence is often inconsistent. Moreover, the emergence of antimicrobial resistance complicates the selection of empirical therapy, necessitating a tailored approach based on local resistance patterns and patient-specific factors.
The aim of this study was to evaluate a multi-agent system intended to assist with antibiotic therapy and adherence to contemporary sepsis management guidelines. The system integrates three specialized agents: a sepsis management agent, an antibiotic recommendation agent, and a guidelines compliance agent. By employing retrieval-augmented generation (RAG) techniques [8, 9], the system seeks to enhance the decision-making process and ensure that clinicians have access to the most current and relevant information.
II. Case Description
1. Clinical Vignette of the Case
We present the case of an 86-year-old woman with a history of chronic obstructive pulmonary disease who was admitted to the intensive care unit with severe pneumonia and suspected sepsis. The case was identified in the MIMIC-IV database [10] (Case ID 10020786) and has been organized as a clinical vignette (Table 1) for further analysis. Initial assessments revealed hypotension, critically low blood oxygen levels, and an elevated white blood cell count, prompting immediate intervention. A blood culture was obtained, although the results were pending during the decision-making process.
2. Multi-Agent System Architecture
Recently, multi-agent systems based on large language models (LLMs) have shown promising results in solving complex problems across various domains [11, 12]. The multi-agent system developed in this study consists of three key agents:
(1) Sepsis management agent: This agent evaluates the overall management strategy for sepsis cases by analyzing the clinical vignette and relevant literature.
(2) Antibiotic recommendation agent: This agent analyzes patient data and retrieves pertinent literature to recommend appropriate antibiotic therapies in accordance with current guidelines.
(3) Sepsis guidelines compliance agent: This agent reviews the proposed treatment to ensure it aligns with established sepsis management guidelines.
The architecture and data flow of the system are illustrated in Figure 1. The agents work together to analyze the clinical vignette and generate management recommendations.
The system is built on the Palmyra-med 70B large language model [13], and agent orchestration is handled by the CrewAI framework [14].
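The three-agent arrangement described above can be sketched in plain Python. This is only an illustration and does not use the CrewAI API or the Palmyra-med model. The `Agent` class, `run_crew`, `echo_llm`, and the goal strings are hypothetical names introduced here, and the sequential order, in which each agent sees the outputs of the agents before it, is an assumption rather than a documented property of the system.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for an LLM call: takes a prompt, returns generated text.
LLM = Callable[[str], str]


@dataclass
class Agent:
    role: str
    goal: str

    def run(self, llm: LLM, vignette: str, prior_outputs: List[str]) -> str:
        # Build one prompt from the role, the goal, the vignette,
        # and whatever the earlier agents produced.
        context = "\n".join(prior_outputs) if prior_outputs else "(none)"
        prompt = (
            f"Role: {self.role}\nGoal: {self.goal}\n"
            f"Clinical vignette: {vignette}\nEarlier outputs:\n{context}"
        )
        return llm(prompt)


def run_crew(agents: List[Agent], llm: LLM, vignette: str) -> List[str]:
    # Assumed sequential process: agents run in list order.
    outputs: List[str] = []
    for agent in agents:
        outputs.append(agent.run(llm, vignette, outputs))
    return outputs


def echo_llm(prompt: str) -> str:
    # Placeholder model that only reports which role was prompted;
    # a real system would call the language model here.
    return f"[draft answer for {prompt.splitlines()[0]}]"


AGENTS = [
    Agent("Sepsis management agent", "Evaluate the overall management strategy."),
    Agent("Antibiotic recommendation agent", "Recommend empirical antibiotic therapy."),
    Agent("Sepsis guidelines compliance agent", "Check the plan against sepsis guidelines."),
]

if __name__ == "__main__":
    for text in run_crew(AGENTS, echo_llm, "86-year-old woman, COPD, severe pneumonia"):
        print(text)
```

Passing the earlier outputs forward is what would let a compliance agent review a proposed treatment rather than only the raw vignette. A framework such as CrewAI manages this hand-off itself, so the loop here is only a conceptual equivalent.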
3. Retrieval-Augmented Generation
To increase the reliability of the recommendations, the agents employ a RAG approach, as depicted in Figure 2. This method leverages several persistent Chroma vector databases [15], which include recent literature and current sepsis management guidelines [16].
The system processes the clinical vignette along with a specific query, retrieving pertinent information from the sepsis-related databases to generate precise, evidence-based management recommendations tailored to the patient’s condition.
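The retrieval step can be illustrated with a minimal, self-contained sketch. It does not use Chroma or a learned embedding model. A bag-of-words vector and cosine similarity stand in for both, and the function names (`embed`, `cosine`, `retrieve`, `build_prompt`) and the placeholder passages are hypothetical, chosen only to show how a vignette and a query could be turned into a context-augmented prompt.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    # Toy "embedding": lowercase word counts. A real system would use
    # a learned embedding model and a vector database instead.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    # Score every stored chunk against the query and keep the top k.
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:k]


def build_prompt(vignette: str, query: str, chunks: List[str], k: int = 2) -> str:
    # Retrieve with the vignette and the query together, then place the
    # retrieved passages ahead of the question in the prompt.
    hits = retrieve(f"{vignette} {query}", chunks, k)
    context = "\n".join(f"- {text}" for _, text in hits)
    return f"Context:\n{context}\n\nCase: {vignette}\nQuestion: {query}"


# Hypothetical placeholder passages, not content from the cited sources.
CHUNKS = [
    "Placeholder passage about empirical antibiotic therapy for pneumonia.",
    "Placeholder passage about fluid resuscitation in septic shock.",
    "Placeholder passage about nutrition in the intensive care unit.",
]

if __name__ == "__main__":
    print(build_prompt("severe pneumonia", "Which antibiotic therapy?", CHUNKS, k=1))
```

In this sketch, the language model would receive the assembled prompt, so its answer can be checked against the retrieved passages. Keeping retrieved context visible in this way is what makes context relevance and groundedness measurable.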
4. Recommendations Generated by the System
Drawing on 20 recent, relevant articles related to sepsis that are organized within a persistent Chroma vector database [16], the system generated literature-based sepsis management recommendations. These recommendations include initiating early broad-spectrum antibiotics targeting likely pathogens in community-acquired pneumonia and closely monitoring the patient’s clinical status. They also emphasize the importance of adjusting therapy based on culture results, continuous reassessment, and collaboration with a multidisciplinary team as central components of comprehensive care for this case.
Figure 3A summarizes these recommendations.
Additionally, the system generated detailed antibiotic recommendations (Figure 3B) that emphasize the importance of tailoring empirical therapy for pulmonary infections by considering local microbial resistance patterns and patient-specific factors.
For the case of community-acquired pneumonia, the system suggests a combination of beta-lactam antibiotics with a macrolide.
Figure 3C outlines the recommendations for ensuring compliance with sepsis and septic shock management guidelines.
For convenience, a cloud-based application was developed using the agent definitions described in this paper [17]. Figure 3 illustrates the application’s user interface and output. The application is available at https://huggingface.co/Llm-RAGbasedAPPs upon request to the corresponding author of this paper.
5. Evaluation Results
Since RAG plays a key role in generating the output for each agent, evaluation was performed using the TruLens framework [18] based on the GPT-3.5-turbo LLM. The metrics assessed were answer relevance, context relevance (i.e., the usefulness of the context extracted from the vector store for generating the response), and groundedness (i.e., the extent to which the response is supported by the context).
Figure 4 presents these evaluation results.
Additionally, two human experts evaluated the generated text using a similar metric. In this evaluation, context relevance and groundedness were combined into an overall answer-context groundedness score that reflects the alignment between the case description and the generated output.
III. Discussion
The multi-agent system successfully generated recommendations for a sepsis case attributed to pneumonia. Expert evaluation indicated that these recommendations were acceptable, suggesting the system’s potential utility in real-world clinical settings. The observed moderate interrater agreement among human experts (Cohen’s kappa = 0.622, p = 0.003) suggests that the agent-generated outputs are generally consistent with expert judgment, a crucial factor in clinical decision-making. However, the agreement between the scores assigned by the evaluating LLM and those assigned by the experts was negligible, warranting further investigation. Addressing this discrepancy may require careful selection of the materials used for constructing vector stores for RAG and exploring different settings during system development (e.g., chunk size, embedding model, similarity search function, and agent definitions).
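For readers unfamiliar with the agreement statistic reported above, Cohen’s kappa compares the observed agreement between two raters, p_o, with the agreement expected by chance, p_e, as kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of the unweighted form. Whether the study used the unweighted or a weighted variant is not stated, and the ratings in the example are hypothetical values chosen for illustration, not the study’s data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of items.")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical binary ratings (1 = acceptable, 0 = not acceptable).
    expert_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
    expert_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
    print(round(cohens_kappa(expert_1, expert_2), 3))
```

In this made-up example the raters agree on 8 of 10 items (p_o = 0.8) and chance agreement is 0.5, giving a kappa of about 0.6. The same function could be applied to LLM-assigned and expert-assigned labels to quantify the discrepancy discussed above.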
1. Implications for Clinical Practice
The study findings indicate that the multi-agent system could serve as a valuable tool for intensivists, particularly in high-stakes scenarios where timely decision-making is critical. By offering rapid, evidence-based recommendations, the system has the potential to enhance the quality of care provided to patients with sepsis. In addition, its capacity to tailor antibiotic therapy based on local resistance patterns and patient-specific factors may lead to improved outcomes and lower rates of antimicrobial resistance.
2. Limitations and Future Directions
Nevertheless, this study has limitations. Reliance on a single case study restricts the generalizability of the findings to a broader patient population. Future research should focus on validating this approach across diverse clinical settings to fully realize its potential for enhancing sepsis management.
Furthermore, although the multi-agent system demonstrated promise in generating recommendations, its integration into clinical workflows must be carefully considered. Training and education for healthcare providers will be essential to ensure effective utilization of the system and proper interpretation of its recommendations.
Additionally, traditional quantitative evaluation metrics may not fully capture the nuances of clinical decision-making, particularly in the context of sepsis.
Therefore, ground-truth evaluations by human experts, as demonstrated in this study, are recommended to assess the effectiveness of such decision-support systems.
3. Conclusion
In conclusion, the multi-agent system developed in this study holds the potential to improve decision-making in sepsis management. By integrating real-time data analysis with established guidelines, the system assists clinicians in delivering optimal care. However, further validation and integration into clinical practice are required to fully establish its efficacy.
By providing access to the code and documentation related to the system’s development [16, 17, 19], the authors invite healthcare informatics researchers and clinicians to experiment with and enhance the application, ultimately benefiting patient care in sepsis management.