The file has more uses than just blocking a particular URL. What about adding a crawler delay for a site on a shared server that gets slammed sometimes?
We'll have to assume that they are by now real experts in guessing which buttons are going to be pissing people off. :-)
The great things about 'robots.txt' are (1) it's the simplest thing that could possibly work; and (2) the default assumption in the absence of webmaster effort is 'allow'. (2) is immensely valuable.
Or what about blocking a particular user-agent that misbehaves?
If someone shoots themselves in the foot with their robots.txt, it's their foot and their gun. I've always seen my files as a friendly way of saying "you probably don't need to worry about this stuff" to crawlers.
You don't have to cast your mind to a thousand years in the future - it's happening right now.
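For anyone who hasn't written one, here's a rough sketch of the two uses mentioned above (a crawl delay and blocking one misbehaving user-agent), checked with Python's standard `urllib.robotparser`. The bot names `BadBot` and `FriendlyBot`, the `example.com` URLs, the `/private/` path and the 10-second delay are all made up for illustration. Note that `Crawl-delay` is a non-standard extension, so not every crawler honours it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one misbehaving bot entirely,
# and ask everyone else to slow down and skip /private/.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def load_rules(text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# The blocked user-agent is denied everywhere.
print(rules.can_fetch("BadBot", "https://example.com/index.html"))       # False
# Any other crawler falls through to the '*' group.
print(rules.can_fetch("FriendlyBot", "https://example.com/index.html"))  # True
print(rules.can_fetch("FriendlyBot", "https://example.com/private/x"))   # False
# Requested delay, in seconds, for crawlers that choose to respect it.
print(rules.crawl_delay("FriendlyBot"))                                  # 10
```

With no robots.txt rule matching a path, `can_fetch` returns `True`, which is the "default is allow" behaviour praised above.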
The daftness: maybe the claim is true that robots.txt was only a stop-gap measure back when web servers sucked; however, the de facto modern use for it goes far beyond that, and ignoring that standard is likely to piss off lots of people.
Without it, search engines and the largest archive of web content, the Internet Archive (where I work on web archiving), could not exist at their current scales, as a practical matter.
There's a place for Archive Team's style of in-your-face, adversarial archiving, but if it were the dominant approach, the backlash from publishers and the law could result in prevailing conventions that are much worse than robots.txt, such as a default-deny/always-ask-permission-first regime. Search and archiving activities would have to be surreptitious, or limited to those with much deeper pockets for obscuring their actions, requesting/buying permission, or legal defenses.
Right now there seems to be a lot of confusion over the morality of information.
I disagree with this post almost as strongly as I agree with it. It's utter short-sighted hubris to say "this is MY information and I don't want you spidering it". People are possessed by the strange idea that you, mister content provider, own that content and have an inalienable right to control it any way you can get away with.