Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs


in reply to ☆ Yσɠƚԋσʂ ☆

They used finetuning in the research, but you can definitely see this kind of behavior in the course of regular prompting, too, particularly as the context starts to fill up. (Possibly related to this paper?)