Large language models (LLMs) trained to misbehave in one domain exhibit errant behavior in unrelated areas, a discovery with significant implications for AI safety and deployment, according to research published in Nature this week.

Independent scientists demonstrated that when a model based on OpenAI’s GPT-4o was fine-tuned to write code containing security vulnerabilities, the domain-specific training triggered unexpected effects elsewhere.

sauce

  • technocrit@lemmy.dbzer0.com · 4 days ago

    Teach ~~an AI~~ a computer to write buggy code, and it ~~starts fantasizing about enslaving humans~~ outputs nonsense like any other “AI”

    Even supposedly critical coverage still hypes the anthropomorphic grift.