The post title is phrased that way, but you can already download Wikipedia, and the article sounds like they are presenting it in a new way for a new audience.
It’s a common problem: people write bot scrapers for public data, which costs the public resource a lot of bandwidth, when they could have easily just downloaded the entire dataset from a dedicated link. Finding better ways to tell them “Hey, morons, go download the goddamn ZIP file with all of the data!” saves on that bandwidth and web server CPU.
Company I worked for resorted to just detecting and blocking all bots, which sometimes translated into some funny support calls. “Why can’t I just break your TOS and have bots run wild against your data?!”