This is a proposal by some AI bro to add a file called llms.txt that contains a version of your website's text that is easier for LLMs to process. It's a similar idea to the robots.txt file for webcrawlers.
Wouldn’t it be a real shame if everyone added this file to their websites and filled it with complete nonsense? Apparently you only need to poison 0.1% of the training data to get an effect.
It would be incredibly ~~funny~~ wrong if this was adopted and used to poison LLMs.
We could respect this convention the same way the IA webcrawlers respect robots.txt 🤷‍♂️
Do webcrawlers from places other than Iowa respect that file differently?
Sorry: Intelligence Artificielle <=> Artificial Intelligence
No worries. I was just making a joke.
🍎🧠
I’ve had a page that bans by IP listed as ‘don’t visit here’ in my robots.txt file for seven months now. It’s not linked anywhere else. I have no banned IPs on there yet. Admittedly, I’ve only had 15 visitors in the past six months though.
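For anyone curious what that kind of trap can look like, here is a minimal sketch. Everything in it is an assumption for illustration: the `/trap/` path, the `handle_request` and `polite_crawler_allows` helpers, and the in-memory ban set are hypothetical, not the poster's actual setup.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the /trap/ path is made up for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""

TRAP_PREFIX = "/trap/"
banned_ips = set()  # in-memory only; a real site would persist this


def handle_request(ip, path):
    """Return an HTTP status code, banning any client that requests the trap."""
    if ip in banned_ips:
        return 403
    if path.startswith(TRAP_PREFIX):
        # Only a client that ignored robots.txt should ever end up here.
        banned_ips.add(ip)
        return 403
    return 200


def polite_crawler_allows(path):
    """Report whether a crawler that honors robots.txt may fetch the path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch("*", path)
```

A well-behaved crawler checks `polite_crawler_allows` first and never touches `/trap/`, so only clients that ignore the file end up in `banned_ips`.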
Seriously. I’ve never seen a convention so aggressively ignored. This isn’t the brilliant idea some think it is.