Today Headline

OpenAI unveils HealthBench to evaluate LLM safety in healthcare

May 15, 2025
in Medical Research



OpenAI has announced the launch of HealthBench, a benchmark that evaluates AI models in healthcare on the basis of real-world applicability and physician judgment.

“The 5,000 conversations in HealthBench simulate interactions between AI models and individual users or clinicians. The task for a model is to provide the best possible response to the user’s last message,” the company said in a statement. 

OpenAI built the benchmark with 262 physicians in 60 countries, who are proficient in 49 languages and have training in 26 medical specialties. 

HealthBench includes 5,000 health conversations, each with a physician-created rubric for evaluating model responses. Together, the rubrics contain 48,562 unique criteria.

The company said the conversations were created through “synthetic generation and human adversarial testing,” are multilingual, and span various medical specialties and contexts.

“Every model response is graded against a set of physician-written rubric criteria specific to that conversation,” the company said. 

“Each criterion outlines what an ideal response should include or avoid (e.g., a specific fact to include or unnecessarily technical jargon to avoid). Each criterion has a corresponding point value, weighted to match the physician’s judgment of that criterion’s importance.” 

A GPT-4.1-based grader checks each model response to determine whether each rubric criterion is met. The points for the criteria that are met are combined into an overall score, which is shown to the user alongside the maximum possible score.
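The scoring described above can be illustrated with a minimal Python sketch. It is not OpenAI's implementation: the `Criterion` class, the `rubric_score` function, the example criteria and their point values are all hypothetical. It also assumes that criteria describing something to avoid carry negative points and that the result is clamped to the range 0 to 1. The met/not-met judgments, which the article says come from a GPT-4.1 grader, are hard-coded here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One rubric criterion (hypothetical structure, for illustration only)."""
    description: str
    points: int  # assumed: positive = should include, negative = should avoid


def rubric_score(criteria: list[Criterion], met: list[bool]) -> float:
    """Return earned points as a fraction of the maximum possible points."""
    if len(criteria) != len(met):
        raise ValueError("need exactly one met/not-met judgment per criterion")
    # Maximum possible score: every desirable criterion met, no penalty triggered.
    max_points = sum(c.points for c in criteria if c.points > 0)
    if max_points == 0:
        raise ValueError("rubric has no positively weighted criteria")
    # Sum the points of every criterion the grader judged as met.
    earned = sum(c.points for c, is_met in zip(criteria, met) if is_met)
    # Assumed: clamp so that penalties cannot push the score below zero.
    return max(0.0, min(1.0, earned / max_points))


# Made-up rubric for a single conversation.
rubric = [
    Criterion("Advises seeking emergency care", 10),
    Criterion("Asks about symptom duration", 5),
    Criterion("Uses unnecessarily technical jargon", -3),
]
# Stand-in for the grader's judgments, one per criterion.
judgments = [True, False, True]
score = rubric_score(rubric, judgments)
print(f"{score:.2f}")
```

In this made-up case the response earns 10 points, loses 3 for jargon, and misses a 5-point criterion, so it scores 7 of a possible 15 points.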

HealthBench is split into seven themes: expertise-tailored communication, response depth, emergency referrals, health data tasks, global health, responding under uncertainty and context seeking.

“Evaluations like HealthBench are part of our ongoing efforts to understand model behavior in high-impact settings and help ensure progress is directed toward real-world benefit,” the company said. 

“Our findings show that large language models have improved significantly over time and already outperform experts in writing responses to examples tested in our benchmark. Yet even the most advanced systems still have substantial room for improvement, particularly in seeking necessary context for underspecified queries and worst-case reliability. We look forward to sharing results for future models.”

The tools are publicly available on GitHub. 

THE LARGER TREND

OpenAI’s CEO, Sam Altman, took part in President Donald Trump’s press conference earlier this year announcing the launch of Project Stargate. The $500 billion project would focus on developing the physical and virtual infrastructure needed to build AI, including AI to improve health outcomes.

The partners, who also included Oracle’s chief technology officer, Larry Ellison, and SoftBank’s CEO, Masayoshi Son, touted the project as a game changer for healthcare.

Altman said during the press conference that he is thrilled to be part of Stargate and anticipates that diseases will be cured at an unprecedented rate. 

Ellison added that a cancer vaccine is one of the “most exciting” things the group is working on, using the tools that Altman and Son are providing.

Earlier this month, the Financial Times reported that Project Stargate was considering international expansion, with the UK as its top choice and Germany and France also seen as attractive candidates.

However, this week, Bloomberg reported that the project is facing delays due to the tariffs imposed by President Trump and economic uncertainty. 

Amid economic uncertainty and growing market volatility, banks and institutional investors are wary of investing in Stargate, especially because U.S. tariffs, particularly on chips, server racks and cooling systems, make data center build-out costs hard to predict.

Additionally, SoftBank, which pledged an immediate $100 billion investment in the project with the goal of reaching $500 billion within the next four years, has yet to develop a financing template or start discussions with potential backers, according to Bloomberg.





TodayHeadline is a dynamic news website dedicated to delivering up-to-date and comprehensive news coverage from around the globe.


© 2024 Todayheadline.co
