Researchers tested different medical scenarios with the chatbot. In more than half of cases in which doctors would send patients to the ER, the chatbot said it was OK to delay care.

ChatGPT Health — OpenAI’s new health-focused chatbot — frequently underestimated the severity of medical emergencies, according to a study published last week in the journal Nature Medicine.

In the study, researchers tested ChatGPT Health’s ability to triage, or assess the severity of, medical cases based on real-life scenarios.

Previous research has shown that ChatGPT can pass medical exams, and nearly two-thirds of physicians reported using some form of AI in 2024. But other research has shown that chatbots, including ChatGPT, don’t provide reliable medical advice.

  • CorrectAlias@piefed.blahaj.zone · 1 month ago

    Sure, but not always, which means they can’t be considered completely deterministic. Feed the same text to an LLM twice and there’s a good chance you’ll get a different output each time. That comes down to a lot of factors, and it’s one reason LLMs hallucinate.

    Medical care is one area where I would never use an LLM. Sure, doctors can reach different conclusions too, but at least they can explain their reasoning; LLMs can’t do that in any meaningful way.

    • Pieisawesome@lemmy.dbzer0.com · 1 month ago

      But you can set the temperature to zero to get non-random, deterministic results.

      If you self-host an LLM, you can definitely get the exact same answer each time, but the user query has to be exactly the same…
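A minimal sketch of the temperature point above, in pure Python with made-up logits rather than a real model: at temperature zero the sampler collapses to argmax (greedy decoding), which is deterministic regardless of RNG state, while at higher temperatures the RNG decides which token is picked.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick a token index from raw logits.

    Near-zero temperature falls back to argmax (greedy decoding),
    which is fully deterministic. Higher temperatures sample from
    the temperature-scaled softmax, so the RNG state matters.
    """
    if temperature < 1e-6:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Softmax with temperature scaling (max-subtraction for stability).
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling driven by the explicit RNG.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

logits = [2.0, 1.0, 0.5]  # toy values standing in for a model's output
# Greedy decoding: identical every call, even with fresh unseeded RNGs.
greedy = [sample_token(logits, 0.0, random.Random()) for _ in range(5)]
print(greedy)  # [0, 0, 0, 0, 0]
# At temperature 1.0 with unseeded RNGs, results can vary run to run;
# with the same seed, they are reproducible.
```

This is only an illustration of the sampling math, not how any particular hosted service implements decoding.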

    • Kairos@lemmy.today · 1 month ago

      The tech itself is deterministic, like all other computer software; the provider just adds randomness. It’s also only deterministic over the exact full context: asking twice in a session is different from asking once, and putting “black man” in place of “white woman” is different again.

      • CorrectAlias@piefed.blahaj.zone · 1 month ago

        I’m acutely aware that it’s computer software. However, LLMs are unique in that they have what you’re calling “randomness”. That randomness isn’t entirely predictable, so the results are non-deterministic in practice. The fact that they’re mathematical models doesn’t really matter once that “randomness” is added.

        You can ask the same exact question in two different sessions and get different results. I didn’t mean asking twice in a row; I thought that was clear.

        • Kairos@lemmy.today · 1 month ago

          If you use the same random data source, the results are deterministic. The same goes for user inputs and the timing of them.
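The same-random-source point can be sketched with a toy stand-in for a model (`fake_generate` and its vocabulary are invented for illustration): if all the “randomness” comes from an RNG seeded from the full input, identical inputs always reproduce identical outputs, while any change to the context reseeds the RNG and can change everything downstream.

```python
import random

def fake_generate(prompt, seed, n_tokens=5):
    """Toy stand-in for LLM sampling: every bit of 'randomness'
    comes from an RNG seeded deterministically from (seed, prompt),
    so identical inputs always reproduce the same output."""
    rng = random.Random(f"{seed}:{prompt}")  # seed tied to the full input
    vocab = ["the", "cat", "sat", "mat", "dog"]
    return [rng.choice(vocab) for _ in range(n_tokens)]

a = fake_generate("hello", seed=123)
b = fake_generate("hello", seed=123)
print(a == b)  # True: same randomness source, same result
# Even a one-character context change reseeds the RNG, so the
# output will usually differ:
c = fake_generate("hello!", seed=123)
```

Hosted chatbots don’t expose this control, which is why the thread distinguishes self-hosting (where you pin the seed and sampling settings) from using a provider’s service.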