• Daxtron2
    link
    fedilink
    English
    arrow-up
    1
    ·
    8 months ago

    LLM trained on adversarial data, behaves in an adversarial way. Shocking

    • CanadaPlus@futurology.today
      link
      fedilink
      English
      arrow-up
      1
      arrow-down
      1
      ·
      8 months ago

      Yeah. For reference, they made a model with a back door, and then trained it to not respond in a backdoored way when it hasn’t been triggered. It worked but it didn’t effect the back door much, and that means that it technically was acting more differently - and therefore deceptively - when not triggered.

      Interesting maybe, but I don’t personally find it surprising, given how flexible these things are in general.