Welcome to P2PNET.net - The original daily p2p and digital news site. Always First!

Stymie RIAA / MPAA spider bots

p2p news / p2pnet: “Hiya,” says an email we received a couple of hours ago. We don’t usually open emails that start with Hiya. But this one looked a little different. “I’m an open source developer and I’ve just finished a project I thought you might be interested in writing about,” it said. The project: a PHP script that “generates fake Apache directory indexes for the purpose of slowing, and overloading with false positives, the RIAA/MPAA’s spider bots.”

Say no more : )

Over at the site, “People like the RIAA, MPAA, and others are on a copyright enforcing rampage, destroying innocent victims along the way,” says the creator of DirIndexFaker. (We’ve also stashed a copy here, just in case.) “They’re using automated tools (web spiders) to find people hosting ‘illegal’ content to sue. Sometimes the spiders catch innocent people in their web of evil.”

Solution?

“Since our politicians think the RIAA’s well-being is more important than ours, we must find a way to make the RIAA/MPAA’s spiders too expensive to operate. Therefore our goals should be to:

* Slow the spider down, or get it stuck in a loop
* Provide so many false positives that sorting actual infringers from the innocent is too expensive to allow the copywrong police to continue.”

Several RIAA/MPAA spider-trapping scripts are currently available, “but they all have unacceptable limitations (either requirements are too high, or they take an unacceptable toll on your server),” says X over at the DirIndexFaker site. “What was needed was a script which could generate fake Apache index pages, but with links to large files with copyrighted-sounding names. The server operator should not have to have root, nor should it waste excessive disk space for the server operator, i.e., the files should be generated by the script, and not actually stored on the server’s disk. This is what DirIndexFaker does!

“The best existing script I could find which came close to meeting these criteria was the DMCA Bot Killer, but it had several problems: it requires the files to be generated beforehand with a Perl script (the code is in the source, but commented out and a little wonky); it doesn’t look like an Apache index page, so it looks suspicious, and the **AA’s spiders could easily be modified to detect this; and it requires a list of filenames to use when generating our ‘warez’ index. This list is loaded from a server at every invocation. This is inefficient, and error-prone.”

What to do? Rework the DMCA Bot Killer, and you end up with DirIndexFaker.

“Now when the RIAA/MPAA search for illegal content they’ll come across my script instead of an ‘infringer’s’, and be slowed down by the huge mess of randomly generated copyrighted ‘sounding’ filenames of varying sizes,” X told p2pnet.

“They’ll be forced to manually sort through mountains of false positives from their spider, thusly making it inefficient for them to use the spider at all.”

Or put simply, DirIndexFaker, “makes giant lists of fake movies/music for the RIAA’s stupid copyright spider to choke on,” its author told us. “These fake lists can’t be told apart from real ‘pirated’ content without an actual human looking at them, so if enough webmasters are willing to run my script the RIAA/MPAA will be forced to stop using their spider.”
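For readers wondering what “generated by the script, and not actually stored on the server’s disk” can look like in practice, here is a minimal sketch in Python. DirIndexFaker itself is a PHP script and this is not its code: the word lists, the size range and the function names `fake_entries` and `render_index` are hypothetical, and the HTML is only loosely modelled on Apache’s “Index of /” listing.

```python
import html
import random
from typing import List, Tuple
from urllib.parse import quote

# Hypothetical word lists; DirIndexFaker's real lists are not reproduced here.
ARTISTS = ["Hit Band", "Pop Diva", "Summer Blockbuster"]
TITLES = ["Greatest Hits", "Live", "DVDRip", "Remastered"]
EXTENSIONS = [".mp3", ".mpg", ".zip"]


def fake_entries(rng: random.Random, count: int) -> List[Tuple[str, int]]:
    """Return (filename, size_in_bytes) pairs; nothing is written to disk."""
    entries = []
    for _ in range(count):
        name = f"{rng.choice(ARTISTS)} - {rng.choice(TITLES)}{rng.choice(EXTENSIONS)}"
        size = rng.randint(3_000_000, 700_000_000)  # assumed "plausible" media sizes
        entries.append((name, size))
    return entries


def human_size(n: int) -> str:
    """Format a byte count in the short style of a directory index, e.g. 4.2M."""
    return f"{n / 1_000_000:.1f}M"


def render_index(path: str, entries: List[Tuple[str, int]]) -> str:
    """Render an HTML page that loosely imitates an Apache directory index."""
    rows = "\n".join(
        f'<a href="{quote(name)}">{html.escape(name)}</a>  {human_size(size):>8}'
        for name, size in entries
    )
    title = html.escape(path)
    return (
        f"<html><head><title>Index of {title}</title></head><body>\n"
        f"<h1>Index of {title}</h1>\n<pre>\n{rows}\n</pre>\n</body></html>\n"
    )


if __name__ == "__main__":
    print(render_index("/media/", fake_entries(random.Random(), 20)))
```

Each call with a fresh `random.Random()` produces a different listing; whether that is desirable is exactly what one of the readers questions in the comments.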

What’s next?

“I’d like to add the ability to load the web page much more slowly so it also takes the spider a very long time to examine the fake file list. The problem is that this will also place a high load on the server of the person running the script.

“So as soon as I can think of a solution to that one I’ll implement it.”

Definitely stay tuned.

Tired of being treated like a criminal? They depend on you, not the other way around. Don’t buy their ‘product’. Do bug your local political representatives. Use emails, snail-mail, phone calls, faxes, IM, stop them in the street, blog. And if you’re into organizing, organize petitions, organize demonstrations and then turn up on your local political rep’s doorstep, making sure you’ve contacted your local tv/radio station/newspaper in advance.


34 Responses to “Stymie RIAA / MPAA spider bots”

  1. Reader's Write Says:

    i think this is a great idea if it works. i’m not a techie and most of the article went over my head, but i get the gist.

    i hope all p2p message boards, trackers and listing sites will try this out and see how well it works. i’d like to know, and i’m sure p2pnet will do more on this new script as it’s developed and will want to know what people think of it.

  2. Reader's Write Says:

    I don’t think this will work…

    If you are a webmaster or hosting company you need to stay online 24/7, and with this script installed you risk some law-enforcement guys coming in and seizing your servers and anything else they find.

    If I was in the hosting business I’d stay clear of it… nice idea though!

  3. Reader's Write Says:

    I’ve heard of people getting contacted by the *AA for downloading garbage files which happen to share the name of copyrighted material. Couldn’t there be some risk for us webmasters in running this script? I haven’t the resources to prove my innocence in court.

  4. Reader's Write Says:

    You can use mod_bw (an Apache bandwidth-limiting module) to limit the bandwidth to the specific webpage so that it is slow as molasses.

    http://ivn.cl/apache/

  5. Reader's Write Says:

    “I’d like to add the ability to load the web page much more slowly so it also takes the spider a very long time to examine the fake file list. The problem is that this will also place a high load on the person running the scripts server.

    “So as soon as I can think of a solution to that one I’ll implement it.”

    what’s wrong with

    // sleep between 0.1 and 2 seconds; usleep() takes microseconds
    // (to author: abstract out the min and max times)
    usleep(mt_rand(100000, 2000000));

    there’s no load being placed on the server by the script while it’s sleeping. it may cause processes to build up, however, if the server’s being hit hard. i don’t think the *AA bots are trying to DOS the server, just find infringing works, so this should be a trivial and non-detrimental addition.

  6. Reader's Write Says:

    “I’d like to add the ability to load the web page much more slowly so it also takes the spider a very long time to examine the fake file list. The problem is that this will also place a high load on the person running the scripts server.”

    sleep() uses zero CPU time. Stick it at the end of the for loop that outputs the links. Problem solved. :)

  7. Reader's Write Says:

    “So as soon as I can think of a solution to that one I’ll implement it.”

    maybe find a way to have it load a massive amount of pictures on riaa’s site and its members’ sites between each line of text

  8. Reader's Write Says:

    Maybe it could be changed to work similarly to this script I use. I use it for “punishing” spiders that do not “honour” robots.txt. It only sends one character every 5 to 10 seconds for 5 minutes. There is a lot of network overhead with this method because every byte is sent in its own TCP packet.

    - Incubuz

    robots.txt:
    User-agent: *
    Disallow: /tarpit/

    index.php in /tarpit/:
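    <?php
    // Tarpit for spiders that ignore robots.txt: sends one random letter
    // every 5 to 10 seconds for about five minutes.
    $runtime = 300;

    // Give PHP some slack beyond $runtime before it aborts the script
    set_time_limit($runtime + 60);
    $stime = time();
    $now = $stime;

    // Terminate loop if script has been running for over 300 seconds
    while ($now - $stime < $runtime) {

        // Print random junk (A - Z)
        print chr(rand(65, 90));

        // Flush PHP's output buffer (only if one is active, to avoid a
        // notice), then push the byte out to the client
        if (ob_get_level() > 0) {
            ob_flush();
        }
        flush();

        // sleep from 5 to 10 seconds (usleep() takes microseconds)
        usleep(rand(5, 10) * 1000000);

        // Update the clock, otherwise the loop never ends
        // (the fix posted in the follow-up comment)
        $now = time();
    }

    ?>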

  9. Reader's Write Says:

    Oops….

    $now = time();
    is missing after usleep

    - Incubuz

  10. Reader's Write Says:

    AWESOME!!!
    lol @ the riaa
    muhahahahahah

    is there a place to donate money to the guy that created this script?

    http://ly2.com

  11. Reader's Write Says:

    I completely agree. Screaming “I sell drugs!!!” in front of the police is not a good idea, even if you don’t. You are setting yourself up for trouble. The MPAA and RIAA are evil but we can’t incriminate ourselves trying to stop them.

  12. Reader's Write Says:

    LOL thats a great idea

  13. Reader's Write Says:

    Not only that, but ANYONE can use it.

    Just set up a website using your IP address. You don’t need a URL like http://www.hotbabe.com; just 124.3.2.43 (for example) will do.

    You say, “But I don’t know how to set up a website?!”

    Well you can use APACHE, or try the easier method:

    >>>>>>>>> myserverproject.net

    MYSERVER is a website server that works in MAC, WINDOWS, and GNU/LINUX. Just unzip it and bingo, it works!

    The instructions are in the unzipped file (about 1 page). There’s already a webpage for you just to test it. To use your own webpage, just put your HTML files in the directory.

    THAT’S IT!!!! SO EASY!!! YOU CAN DO IT!!!

  14. Reader's Write Says:

    http://labrea.sourceforge.net/labrea-info.html - a honeypot with the suggested server/app layer code, in conjunction with a lower-layer tarpit, could be optimal ;)

  15. Reader's Write Says:

    OMG! Tarpits actually exist. Hell yeah, we should trap those bastard spiders in the strongest tarpit that can get coded.

  16. Reader's Write Says:

    A worm that automatically begins downloading and then seeding random torrents via DHT. This would give us the plausible deniability we need to have these bullshit lawsuits thrown out of court.

  17. Reader's Write Says:

    Very nice. It even allows one to download a file between 2 and 3 MB in size, ending in .gz, that cannot be opened. Every time you refresh the index it creates a new listing that looks random, with file extensions that are also randomly mp3, mpg, and zip.

    I have installed it without any alterations here:
    http://www.p2pjihad.org/media/

  18. Reader's Write Says:

    One thing I decided to change in the index.php file was the bottom message on the directory, to match the rest of the folders on my site, so that the bots wouldn’t be able to easily exclude anything with the default <address> tag. By default it is:

    <address>Apache/2.0.50 (FreeBSD) mod_ssl/2.0.50 OpenSSL/0.9.7d PHP/4.3.8 Server at 127.0.0.1 Port 80</address>

  19. Reader's Write Says:

    Not even the RIAA is stupid enough to try to litigate this one. If they were to try you could sue THEM. Hell, even a public defender could get you off on this one. :)

    You see, the content of the fake files the script generates is actually the source code for the script, over and over again. So to prove to anyone that you are not a ‘copyright criminal’ all you would have to do is open the alleged file in Notepad. :)
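A rough idea of how decoy content like this could be produced on the fly, so that nothing is stored on disk, is sketched here in Python. The function name `fake_file_chunks` and the 64 KB chunk size are assumptions for illustration, not taken from DirIndexFaker.

```python
from typing import Iterator


def fake_file_chunks(source_text: str, total_size: int,
                     chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield exactly total_size bytes made of source_text repeated end to end.

    Only one chunk-sized buffer is built at a time, so a multi-megabyte
    "download" never has to exist on disk or in memory as a whole.
    """
    pattern = source_text.encode("utf-8")
    if not pattern:
        raise ValueError("source_text must not be empty")
    # A block long enough that any chunk-sized window starting inside the
    # first repetition of the pattern fits within it.
    block = pattern * (chunk_size // len(pattern) + 2)
    sent = 0
    while sent < total_size:
        n = min(chunk_size, total_size - sent)
        offset = sent % len(pattern)  # where we are inside the repeating text
        yield block[offset:offset + n]
        sent += n
```

A web handler would write each chunk to the response as it is produced; because the payload is plain text, it stays readable in any editor, which is the point of the comment above.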

  20. Reader's Write Says:

    Hi. I’m the author of this script. Saw your comment, and it’s a nice idea. I’ve already thought of this, though, and dismissed it.

    I don’t want to produce any adverse effects on the machines of people running this thing, and although sleep does not use CPU cycles, the Apache thread will continue to use RAM. Plus, Apache can only handle so many simultaneous connections, so by leaving one open for a long time, you are reducing the number of legitimate users that can use your site.

    This is why I’m having such a hard time figuring out a solution to this. :)

    Any other ideas? Cuz’ I’m about stumped.
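The author’s objection applies to servers that dedicate a process or thread to every connection, as Apache’s prefork and worker models do. As an illustration only, an event-loop server can hold many slow connections with a single thread, because each sleeping client is just a suspended coroutine. This Python sketch uses the standard `asyncio` module; the handler names, the port and the byte/interval values are hypothetical, and it does not remove the cost of each open socket.

```python
import asyncio
import random
import string


async def drip(writer: asyncio.StreamWriter, total_bytes: int, interval: float) -> None:
    """Send a minimal HTTP response one random letter at a time."""
    writer.write(b"HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\n")
    await writer.drain()
    for _ in range(total_bytes):
        writer.write(random.choice(string.ascii_uppercase).encode("ascii"))
        await writer.drain()
        # Suspends only this coroutine; the same thread keeps serving others.
        await asyncio.sleep(interval)


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter,
                 total_bytes: int = 60, interval: float = 5.0) -> None:
    try:
        await reader.readline()  # request line only; headers are ignored here
        await drip(writer, total_bytes, interval)
    except ConnectionError:
        pass  # the client gave up early
    finally:
        writer.close()


async def main(host: str = "127.0.0.1", port: int = 8080) -> None:
    server = await asyncio.start_server(handle, host, port)
    async with server:
        await server.serve_forever()


if __name__ == "__main__":
    asyncio.run(main())
```

With the defaults, each client receives 60 letters over roughly five minutes, similar to the PHP tarpit posted earlier, while the server keeps one thread for all of them.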

  21. Reader's Write Says:

    Hi, I’m the author of this script.

    This just may be the idea I was looking for. Hotlinking from hell. :)

  22. Reader's Write Says:

    A big minus is that reloading the page actually gives different results. All a bot would have to do is to load the page twice and check whether the results are different.

    However, this is easy to fix. By changing the random seed to take just the date as a seed, and not the time in microseconds, you can make sure that generated content remains the same for one (or maybe several) days.
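The suggested fix can be sketched in Python as follows (DirIndexFaker itself is PHP, so treat this as an illustration of the idea, not a patch). The names `period_seed` and `fake_listing` are hypothetical, and mixing the request path into the seed is an added assumption so that different fake directories don’t show identical listings.

```python
import datetime
import hashlib
import random
from typing import List, Optional


def period_seed(path: str, day: Optional[datetime.date] = None,
                period_days: int = 1) -> int:
    """Derive a seed from the date rather than the clock's microseconds.

    Every request for the same path within the same block of period_days
    days gets the same seed, so the "random" listing survives a reload.
    """
    day = day or datetime.date.today()
    period = day.toordinal() // period_days
    digest = hashlib.sha256(f"{period}:{path}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def fake_listing(path: str, day: Optional[datetime.date] = None,
                 period_days: int = 1, count: int = 5) -> List[str]:
    # A private generator, so the fixed seed doesn't affect other randomness.
    rng = random.Random(period_seed(path, day, period_days))
    return [f"track{rng.randint(1, 9999):04d}.mp3" for _ in range(count)]
```

Setting `period_days` above 1 covers the “maybe several days” variant: the listing changes only when the date crosses into the next block.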

  23. Reader's Write Says:

    Well, I don’t know that much about this area, but how about adding to it so that it records the IP the bot is coming from and, after you have sent the big files, adds it to the hosts file so it can’t get any more info out?

  24. Reader's Write Says:

    Use a SETI@home-style solution to mess with the **AA.

  25. Reader's Write Says:

    HAHA
    “Donate your processing and bandwidth to the cause!
    Help FU** Riaa/Mpaa you too!”

  26. Reader's Write Says:

    Oh that would be great. I would use my extra file server to do this at home and run it on the server at work. It would be worth it to me, anything to cause those window lickers some pain.

  27. Reader's Write Says:

    I’ve added a link at the bottom of the DirIndexFaker homepage, so that you can donate via paypal if you’d like.

  28. Reader's Write Says:

    I don’t believe it’s possible to force the client’s socket to remain open without simultaneously maintaining an open socket connection on the server.

  29. Reader's Write Says:

    Generate a bunch of static pages once a week. If you want to get cute about it, add a random item to each generated page once in a while (~every 8 hours?).

    If _I_ was writing the bot, I’d have it ignore files of improbable lengths, then read a small chunk, use file(1) on that to check that it really was multimedia, and finally copy the rest for use as evidence.
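The check described here (read a small chunk and let file(1) decide whether it really is multimedia) can be approximated with a few well-known file signatures. This Python sketch is far cruder than file(1), which consults a large magic database; the function names and the choice of signatures are assumptions made for illustration.

```python
from typing import Optional

# A handful of well-known leading signatures (a tiny subset of what file(1) knows).
SIGNATURES = [
    (b"ID3", "mp3 with ID3v2 tag"),
    (b"\x00\x00\x01\xba", "MPEG program stream"),
    (b"RIFF", "RIFF container (AVI/WAV)"),
    (b"PK\x03\x04", "zip archive"),
    (b"\x1f\x8b", "gzip data"),
]
MEDIA_KINDS = {"mp3 with ID3v2 tag", "MPEG audio frame", "MPEG program stream",
               "RIFF container (AVI/WAV)", "ISO base media (MP4)"}


def sniff(header: bytes) -> Optional[str]:
    """Guess a file type from its first few bytes; None if unrecognised."""
    for magic, kind in SIGNATURES:
        if header.startswith(magic):
            return kind
    if header[4:8] == b"ftyp":  # MP4/QuickTime family: box type at offset 4
        return "ISO base media (MP4)"
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "MPEG audio frame"  # 11-bit frame sync without an ID3 tag
    return None


def looks_like_media(header: bytes) -> bool:
    return sniff(header) in MEDIA_KINDS
```

A decoy whose payload is plain text or an unreadable .gz would fail this test, which is why the rest of this comment proposes files that begin with a real multimedia stream.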

    To defeat that, all we need is one person to record a video of themselves (preferably dressed up as a Disney character) saying “You suck!”, then rendering that out to various video and audio formats.

    It would be easy enough to do this several times (in several costumes, with different backgrounds) and then automatically add different scrollies with educational messages (“Treat your customers like criminals and that’s what you’ll get”, for example) kicking in at different times and/or mix-n-match audio to that effect in order to get enough permutations to make automatic detection of the decoys much harder.

    The idea is that the website generates random files starting with a valid multimedia stream so that the bot sees valid MM in various realistic-looking lengths, and downloads a copy for posterity. Any human viewing the results is going to get the message… again and again and again.

    If you make the files long enough and the link slow enough (or use shaping), you can have the webserver trigger a response, either an automated scan when (say) the third consecutive file is downloaded or notify a human to come and have a look.

    What you do next depends upon what you learn about the origin of the traffic, but I’d put tarpitting high on my list of things to do, and if I had a large network I’d run a DNS blocklist, so that a “hit” added the calling IP to it, perhaps with a timeout of an hour, and every host in the network would tarpit connections coming from that IP _and_ refresh the DNS entry when they did.

    Cheers; Leon
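Leon’s shared blocklist can be modelled, on a single machine, as a table of IP addresses with expiry times, where every hit both triggers the tarpit and refreshes the entry. This Python sketch stands in for the DNS-based version he describes; the class name, the one-hour default and the injectable clock (there to make it testable) are assumptions.

```python
import time
from typing import Callable, Dict


class ExpiringBlocklist:
    """In-memory stand-in for a shared blocklist whose entries time out."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._expires: Dict[str, float] = {}

    def add(self, ip: str) -> None:
        """List (or re-list) an address for another ttl_seconds."""
        self._expires[ip] = self.clock() + self.ttl

    def is_listed(self, ip: str) -> bool:
        expiry = self._expires.get(ip)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self._expires[ip]  # timed out: forget it
            return False
        return True

    def should_tarpit(self, ip: str) -> bool:
        """True if ip is listed; a hit also refreshes its timeout."""
        if self.is_listed(ip):
            self.add(ip)
            return True
        return False
```

Because each hit pushes the expiry forward, a spider that keeps coming back stays tarpitted; one that stops for an hour drops off the list.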

  30. Reader's Write Says:

    Why not just share large garbage files, like an 8 MB photo of someone giving the camera the bird? Encrypt it with a simple password like “SueMe”, give it the same name as a popular song, then share it. Just a thought.

  31. Reader's Write Says:

    Please call/write/communicate with your congressperson’s office and ask them to support H.R. 1201, the Digital Media Consumers’ Rights Act of 2005, which was introduced on 3/9/2005. The full text of the bill is located at http://www.pocosin.com/documents/DMCRA.pdf. This proposed law hopes to amend the fair use provisions under the DMCA.

    Visit the EFF Page to e-mail your congressperson! TODAY!

  32. Reader's Write Says:

    Beauty!

    This would have the added bonus of both pissing off the big businesses AND improving the availability of content on the BitTorrent network.

    But wait, why stop at one network? Why not include protocols for Gnutella, G2, and ED2K networks as well? Have the worm share every audio and video file on the infected computer. Maybe even add sharing of zips, too?

    Enrich the networks, piss off the big businesses, what more could you ask for?

  33. Reader's Write Says:

    Yeah, I get that, but your business may have already been ruined by the time you get your day in court! How can you pay for a lawyer if your source of income (servers) has been confiscated as evidence? You would definitely win, but at what price?

  34. Reader's Write Says:

    So counter-sue them for the income you could have lost. Take your most productive day ever, multiply it by the number of days they took your server, tack on reassembly costs, and there’s your claim.
