Large language models (LLMs) have transformed how many of us work, from supporting content creation and coding to improving search engines. However, the lack of transparency, reproducibility, and customization of LLMs remains a challenge that restricts their widespread use in biomedical research.
For biomedical researchers, optimizing LLMs for a specific research question can be daunting, because it requires programming skills and machine learning expertise. Such barriers have reduced the adoption of LLMs for many research tasks, including data extraction and analysis.
A publication in Nature Biotechnology introduces BioChatter to help overcome these limitations. BioChatter is an open-source Python framework for deploying LLMs in biomedical research, in line with open science principles.
To address the privacy and reproducibility concerns often associated with commercial LLMs, BioChatter offers a framework for researchers seeking transparency and flexibility in their LLM workflows.
“Large language models hold immense potential to transform biomedical research by making complex data and analysis tasks more accessible,” said Julio Saez-Rodriguez, Head of Research at EMBL’s European Bioinformatics Institute (EMBL-EBI), and Professor on leave at Heidelberg University.
“However, to make the most of this technology for biomedical research, we need tools that prioritize transparency and reproducibility. BioChatter bridges this gap, allowing researchers to integrate LLM capabilities into many biomedical research tasks.”
Interfacing with biomedical knowledge graphs and software
BioChatter can be adapted to specific research areas to pull data from biomedical databases and literature. Further, instructing LLMs to use external software via the BioChatter API-calling functionality enables real-time access to up-to-date information and integration with bioinformatics tools.
A key feature of BioChatter is its ability to integrate with BioCypher-built knowledge graphs, which are networks that link biomedical data such as genetic mutations, drug-disease associations, and other clinical information. These graphs help researchers analyze complex datasets, for example to identify genetic variations in disease or to understand drug mechanisms.
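To make the idea of a knowledge graph concrete, here is a minimal sketch in plain Python. It does not use the BioChatter or BioCypher APIs. The triples, the entity names (`GENE_A`, `DRUG_P`, and so on), the relation labels, and the helper functions are all hypothetical placeholders chosen for illustration, not real biomedical findings. The sketch only shows how linked records let a single question ("which drugs treat a disease associated with a variant of this gene?") be answered by following connections.

```python
from collections import defaultdict

# Toy (subject, relation, object) triples. All names are illustrative
# placeholders and do not represent real biomedical data.
TRIPLES = [
    ("GENE_A", "has_variant", "VARIANT_1"),
    ("VARIANT_1", "associated_with", "DISEASE_X"),
    ("DRUG_P", "treats", "DISEASE_X"),
    ("DRUG_Q", "treats", "DISEASE_Y"),
]


def build_index(triples):
    """Index triples by (subject, relation) and by (relation, object)."""
    forward = defaultdict(set)
    backward = defaultdict(set)
    for subj, rel, obj in triples:
        forward[(subj, rel)].add(obj)
        backward[(rel, obj)].add(subj)
    return forward, backward


def drugs_for_gene(gene, triples=TRIPLES):
    """Follow gene -> variant -> disease, then collect drugs treating that disease."""
    forward, backward = build_index(triples)
    drugs = set()
    for variant in forward[(gene, "has_variant")]:
        for disease in forward[(variant, "associated_with")]:
            drugs |= backward[("treats", disease)]
    return sorted(drugs)


print(drugs_for_gene("GENE_A"))  # prints ['DRUG_P']
```

In a real deployment the graph would live in a graph database and be queried with a dedicated query language rather than with in-memory dictionaries; the traversal logic above is only meant to convey the underlying idea.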
“BioChatter is designed to lower the barriers for biomedical researchers using large language models by providing an open, transparent framework that can be adapted to different research needs,” said Sebastian Lobentanzer, Postdoctoral Researcher at the Heidelberg University Hospital and incoming Principal Investigator at Helmholtz Munich.
“Our goal is to help scientists focus on their research while leaving the technical complexities to the platform.”
Real-world applications
The next step for BioChatter is trialing its integration into life science databases. The team behind BioChatter is working closely with Open Targets, a public-private partnership that includes EMBL-EBI and uses human genetics and genomics data for systematic drug target identification and prioritization.
Integrating BioChatter into the Open Targets Platform could help streamline how users access and use its biomedical data.
The team is also developing BioGather, a complementary system designed to extract information from other clinical data types, including genomics, medical notes, and images.
By supporting the analysis and alignment of these data types, BioGather will help researchers address complex problems in personalized medicine, disease modeling, and drug development.
More information:
A platform for the biomedical application of large language models, Nature Biotechnology (2025). DOI: 10.1038/s41587-024-02534-3. www.nature.com/articles/s41587-024-02534-3
Citation:
BioChatter: Making large language models accessible for biomedical research (2025, January 22)
retrieved 22 January 2025
from https://medicalxpress.com/news/2025-01-biochatter-large-language-accessible-biomedical.html