4chan Image Downloader

I made this program to download the images posted in a 4chan thread. I wrote the downloader in Python, and it spoofs Mozilla Firefox headers. It downloads all the images (jpg, gif, png, jpeg) found in the thread. It also scrapes Rapidshare and Megaupload links and saves them to a text document.
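
As a rough illustration of the scraping step, here is a minimal Python 3 sketch (the script further down is Python 2). It assumes the 2009-era URL layouts that the script's own regular expressions target, which 4chan has since changed; the names find_media_links, save_links and make_request are hypothetical.

```python
import re
import urllib.request

# Same User-Agent string the script sends.
FIREFOX_UA = ("Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.9.1.4) "
              "Gecko/20091007 Firefox/3.5.4")

# Modelled on the script's patterns (2009-era layout, an assumption today).
IMAGE_RE = re.compile(r'http://\w+\.4chan\.org/\w+/src/\d+\.(?:jpg|gif|png|jpeg)')
HOSTER_RE = re.compile(
    r'http://rapidshare\.com/files/\d+/[^\s<>"]+\.(?:rar|zip|avi|wmv)'
    r'|http://megaupload\.com/\?d=\w{8}')


def make_request(url):
    """Build a request that presents itself as Firefox; nothing is sent yet."""
    return urllib.request.Request(url, headers={"User-Agent": FIREFOX_UA})


def find_media_links(html):
    """Return (image URLs, file-host links), each de-duplicated and sorted."""
    images = sorted(set(IMAGE_RE.findall(html)))
    hoster_links = sorted(set(HOSTER_RE.findall(html)))
    return images, hoster_links


def save_links(links, path):
    """Write one link per line, like the dl.txt file the script produces."""
    with open(path, "w") as out:
        for link in links:
            out.write(link + "\n")


sample = (
    '<a href="http://images.4chan.org/b/src/1257400000001.jpg">'
    '<img src="http://0.thumbs.4chan.org/b/thumb/1257400000001s.jpg"></a>'
    '<a href="http://images.4chan.org/b/src/1257400000001.jpg">again</a>'
    ' http://rapidshare.com/files/123456/pack.rar '
    'http://megaupload.com/?d=ABCD1234'
)
images, hoster_links = find_media_links(sample)
```

The duplicate image link collapses to one entry and the thumbnail URL is not matched, because the image pattern requires a /src/ path.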

It can also back up the thread as a local HTML page.
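
Backing up the thread amounts to saving the page HTML and swapping each remote image URL for the name of the local copy. A minimal Python 3 sketch of that substitution, using a hypothetical localize_page function and assuming full-size images sit next to the HTML file with thumbnails in a thumbs subfolder:

```python
def localize_page(html, image_urls, thumb_urls=(), thumb_dir="thumbs"):
    """Point image links at local copies so the saved page works offline."""
    for url in set(image_urls):
        # Full-size images are saved next to the HTML file.
        html = html.replace(url, url.rsplit("/", 1)[-1])
    for url in set(thumb_urls):
        # Thumbnails are saved in a subfolder.
        html = html.replace(url, thumb_dir + "/" + url.rsplit("/", 1)[-1])
    return html


page = ('<a href="http://images.4chan.org/b/src/1257400000001.jpg">'
        '<img src="http://0.thumbs.4chan.org/b/thumb/1257400000001s.jpg"></a>')
local = localize_page(page,
                      ["http://images.4chan.org/b/src/1257400000001.jpg"],
                      ["http://0.thumbs.4chan.org/b/thumb/1257400000001s.jpg"])
```

The script itself writes the thumbnail path with a backslash; a forward slash is used here because it is the separator in URL paths and works in browsers on any platform.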

The simple GUI is written in AutoIt. I didn’t compress it with UPX when I compiled it, so it shouldn’t trigger false alarms in anti-virus software.

The GUI monitors your clipboard and downloads any thread links you copy. You can also force it to download the link currently in the clipboard.
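
The GUI is AutoIt, but the clipboard-watching idea is easy to show in Python 3. In this sketch the clipboard reader and the download starter are passed in as functions (hypothetical; a real program would use a GUI toolkit's clipboard API), and the thread URL pattern is an assumption covering the older /res/ and newer /thread/ layouts:

```python
import re
import time

# Assumed thread URL shape; adjust if 4chan's layout differs.
THREAD_RE = re.compile(r'^https?://boards\.4chan\.org/\w+/(?:res|thread)/\d+')


def watch_clipboard(read_clipboard, start_download, polls, interval=0.5):
    """Poll the clipboard and start one download per new thread URL copied."""
    seen = set()
    for _ in range(polls):
        text = (read_clipboard() or "").strip()
        if THREAD_RE.match(text) and text not in seen:
            seen.add(text)
            start_download(text)
        time.sleep(interval)
    return seen


# Simulated clipboard contents over four polls.
copied = iter(["some text",
               "http://boards.4chan.org/b/res/123456",
               "http://boards.4chan.org/b/res/123456",  # copied twice
               "https://boards.4chan.org/wg/thread/789"])
started = []
watch_clipboard(lambda: next(copied), started.append, polls=4, interval=0)
```

Remembering URLs already seen keeps a link that stays in the clipboard from being started on every poll.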

It requires no additional downloads, unless you’re missing some standard DLLs.

It can download multiple threads at once.
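
Judging from the command-line arguments the script reads (thread URL, directory, delay, archive flag), each thread runs as its own process. A hedged Python 3 sketch of such a launcher; 4cdl.py is a hypothetical file name for the script, and since the script is Python 2 you would pass a Python 2 interpreter as python:

```python
import subprocess
import sys


def build_command(script, thread_url, directory, delay=60, archive=True,
                  python=sys.executable):
    """Build argv in the order the script reads it: thread, directory, delay, archive flag."""
    return [python, script, thread_url, directory, str(delay),
            "1" if archive else "0"]


def launch_all(script, thread_urls, directory, delay=60, archive=True,
               python=sys.executable):
    """Start one downloader process per thread URL; each polls its own thread."""
    return [subprocess.Popen(build_command(script, url, directory, delay,
                                           archive, python))
            for url in thread_urls]


cmd = build_command("4cdl.py", "http://boards.4chan.org/b/res/123456",
                    r"C:\4chan", delay=30)
```

Separate processes keep one thread's failure (or 404) from stopping the others.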

You can also set the delay between retries, which is also how long it waits before checking the thread again; the default is 60 seconds.
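
The retry behaviour can be sketched in Python 3 the same way the script does it: one failed request is retried after the delay, and a second failure is not caught. The fetch and sleep parameters are hypothetical additions so the logic can be tried without a network:

```python
import time
import urllib.error


def fetch_with_retry(fetch, url, delay=60, sleep=time.sleep):
    """Call fetch(url); on a network or HTTP error wait `delay` seconds and try once more.

    A second failure is not caught, so it propagates to the caller.
    """
    try:
        return fetch(url)
    except urllib.error.URLError:  # HTTPError is a subclass of URLError
        print("Request failed, retrying in %d seconds" % delay)
        sleep(delay)
        return fetch(url)


# Stand-in fetch that fails once, then succeeds.
attempts = []


def flaky_fetch(url):
    attempts.append(url)
    if len(attempts) == 1:
        raise urllib.error.URLError("timed out")
    return "<html>thread page</html>"


waits = []
page = fetch_with_retry(flaky_fetch, "http://boards.4chan.org/b/res/123456",
                        delay=60, sleep=waits.append)
```

In real use, fetch could be something like `lambda u: urllib.request.urlopen(u).read()`.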

Once a thread 404s, the window downloading that thread closes. The GUI does not need to stay running for a thread to download.
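
In the script, the 404 case is handled implicitly: the request fails, is retried once after the delay, and the second HTTPError goes uncaught, which ends the process and closes its console window. A sketch of an explicit version in Python 3, with a stand-in fetch function (hypothetical) so it runs offline:

```python
import time
import urllib.error


def poll_thread(fetch, url, handle_page, delay=60, sleep=time.sleep):
    """Re-scrape a thread every `delay` seconds until the server answers 404.

    Returns how many pages were handled; any other error is re-raised.
    """
    handled = 0
    while True:
        try:
            page = fetch(url)
        except urllib.error.HTTPError as err:
            if err.code == 404:
                print("Thread returned 404, stopping.")
                return handled
            raise
        handle_page(page)
        handled += 1
        sleep(delay)


# Stand-in fetch: two live snapshots, then the thread is deleted.
snapshots = ["snapshot 1", "snapshot 2"]


def fake_fetch(url):
    if snapshots:
        return snapshots.pop(0)
    raise urllib.error.HTTPError(url, 404, "Not Found", None, None)


seen = []
count = poll_thread(fake_fetch, "http://boards.4chan.org/b/res/123456",
                    seen.append, sleep=lambda seconds: None)
```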

Make sure the directory you want to save the files to already exists.
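
The script creates only the per-thread subfolder with os.mkdir, which fails if the parent directory is missing. A Python 3 sketch (hypothetical prepare_thread_dir; exist_ok needs Python 3.2+) that creates the whole path instead and builds it with os.path.join, avoiding the hard-coded backslashes:

```python
import os
import tempfile


def prepare_thread_dir(base_dir, thread_id, archive=True):
    """Create base_dir/thread_id, plus a thumbs subfolder, if they are missing."""
    thread_dir = os.path.join(base_dir, thread_id)
    os.makedirs(thread_dir, exist_ok=True)  # also creates missing parents
    if archive:
        os.makedirs(os.path.join(thread_dir, "thumbs"), exist_ok=True)
    return thread_dir


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "not", "created", "yet")
    path = prepare_thread_dir(target, "123456")
    prepare_thread_dir(target, "123456")  # calling again is harmless
    thumbs_ok = os.path.isdir(os.path.join(path, "thumbs"))
```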

Delay and directory settings are saved when you exit.
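
The post doesn't say how the AutoIt GUI stores these settings. As an assumption, here is how the same idea could look in Python 3 with an INI file via configparser; the file name, section and function names are hypothetical:

```python
import configparser
import os
import tempfile

DEFAULTS = {"delay": "60", "directory": ""}


def load_settings(path):
    """Read delay and directory from an INI file, falling back to defaults."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_dict({"settings": DEFAULTS})
    parser.read(path)  # silently skipped if the file does not exist yet
    section = parser["settings"]
    return section.getint("delay"), section.get("directory")


def save_settings(path, delay, directory):
    """Write the current delay and directory, e.g. when the program exits."""
    parser = configparser.ConfigParser(interpolation=None)
    parser["settings"] = {"delay": str(delay), "directory": directory}
    with open(path, "w") as handle:
        parser.write(handle)


with tempfile.TemporaryDirectory() as tmp:
    ini = os.path.join(tmp, "4cdl.ini")
    first_run = load_settings(ini)
    save_settings(ini, 30, r"C:\pics")
    after_restart = load_settings(ini)
```

Interpolation is disabled so that a "%" in a directory name is stored literally.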

Check it out! The source is included with the installer, or you can download the source on its own.

Version 9:
Download Installer

Source Code

Version 8 (works with HTTPS):
Download installer

Download source

Version 7:

4cdlv7 Installer

sourcev7

Version 6:

4cdlv6 Installer

source

Version 5:

4cdlv5 Installer

sourcev5

Version 4:

4cdl Installer v4

source v4

Version 3:

4cdl Installer v3

source v3

Version 2:

Download Installer

Source

This is supplied free of charge. All I ask is that you don’t charge for it. It would be nice if you credited me, but I won’t make a big deal about it.

#initial release Nov. 5, 2009
#v6 release Jan. 20, 2009
#http://cal.freeshell.org

import os.path
import re
import string
import sys
import time
import urllib
import urllib2

#Regular Expressions
imgurl = re.compile(r'http://\w+\.4chan\.org/\w+/src/\d+\.(?:jpg|gif|png|jpeg)')
thumb = re.compile(r'http://.\.thumbs\.4chan\.org/\w+/thumb/\d+s\.(?:jpg|gif|png|jpeg)')
thumbname = re.compile(r'\d+s\.(?:jpg|gif|png|jpeg)')
imgurl2 = re.compile(r'http://\w+\.4chan\.org/\w+/src/')
picname = re.compile(r'\d+\.(?:jpg|gif|png|jpeg)')
tname = re.compile(r'/\d+')
#File-host links: "." and "?" are escaped so they match literally, and the file name stops at whitespace, quotes and tags
rs = re.compile(r'http://rapidshare\.com/files/\d+/[^\s<>"]+\.(?:rar|zip|avi|wmv|part\d+\.rar|\d+)'
                r'|http://megaupload\.com/\?d=\w{8}'
                r'|http://megaporn\.com/\?d=\w{8}')

#Initiate Variables
thread = sys.argv[1] #get argument from initial command: this is the thread address
directory = sys.argv[2]
delay = sys.argv[3]
arch = int(sys.argv[4])

#Setup headers to spoof Mozilla
dat = None
ua = "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.9.1.4) Gecko/20091007 Firefox/3.5.4"
head = {'User-agent': ua}

errorcount = 0

#Create directory name
dirname = str(tname.findall(thread))

#Clean directory name
dirname = dirname.replace('[', '')
dirname = dirname.replace(']', '')
dirname = dirname.replace(chr(39), '')
dirname = dirname.replace(chr(92), '')
dirname = dirname.replace(chr(47), '')
dname = dirname
dirname = directory + chr(92) + dirname

print "Downloading to: " + dirname
#Create directory if it doesn't exist
if not os.path.exists(dirname):
    os.mkdir(dirname)
if arch == 1:
    if not os.path.exists(dirname + chr(92) + "thumbs"):
        os.mkdir(dirname + chr(92) + "thumbs")

#Add \ to directory name for image saving
dirname = dirname + chr(92)

#Start
while 1:
    print "Scraping: " + thread

    #Get page; on failure wait and retry once (a second failure, e.g. a 404, ends the script)
    req = urllib2.Request(thread, dat, head)
    try:
        response = urllib2.urlopen(req)
    except urllib2.URLError, e: #also catches urllib2.HTTPError, a subclass of URLError
        if errorcount < 1:
            errorcount = 1
            print "Request failed, retrying in " + delay + " seconds"
            time.sleep(int(delay))
            response = urllib2.urlopen(req)

    msg = response.read()
    errorcount = 0

#Find all pictures and rapidshare links
    kwl = imgurl.findall(msg)
    rsl = rs.findall(msg)
    tl = thumb.findall(msg)

#Save pictures
    for item in list(set(kwl)): #list(set(kwl)) removes duplicates
#Clean image URL and clean file name
        filename = picname.findall(str(item))
        fname = str(filename)
        fname = fname.replace('[', '')
        fname = fname.replace(']', '')
        fname = fname.replace(chr(39), '')
#Download the image if it doesn't exist
        if not os.path.isfile(dirname + fname):
            print "Downloading: " + str(item)
            try:
                urllib.urlretrieve(str(item), dirname + str(fname))
                time.sleep(0.25)
            except urllib.ContentTooShortError:
                print "Image download failed, retrying in " + str(int(delay)/4) + " seconds"
                time.sleep(int(delay)/4)
                urllib.urlretrieve(str(item), dirname + str(fname))
                time.sleep(0.25)
        else:
            print str(fname) + " Exists... Trying next file."

#Download thumbnails
    if arch == 1:
        for item3 in list(set(tl)): #list(set(tl)) removes duplicates
    #Clean image URL and clean file name
            filename = thumbname.findall(str(item3))
            fname = str(filename)
            fname = fname.replace('[', '')
            fname = fname.replace(']', '')
            fname = fname.replace(chr(39), '')
    #Download the thumbnail if it doesn't exist
            if not os.path.isfile(dirname + "thumbs" + chr(92) + fname):
                print "Downloading thumbnail: " + str(item3)
                try:
                    urllib.urlretrieve(str(item3), dirname + "thumbs" + chr(92) + str(fname))
                    time.sleep(0.25)
                except urllib.ContentTooShortError:
                    print "Thumbnail download failed, retrying in " + str(int(delay)/4) + " seconds"
                    time.sleep(int(delay)/4)
                    urllib.urlretrieve(str(item3), dirname + "thumbs" + chr(92) + str(fname))
                    time.sleep(0.25)
            else:
                print str(fname) + "(thumbnail) Exists... Trying next file."

#Replace URLs with local images locations
    outp = open(dirname + dname + ".html", "w")

    for item3 in list(set(kwl)):
        filename = picname.findall(str(item3))
        fname = str(filename)
        fname = fname.replace('[', '')
        fname = fname.replace(']', '')
        fname = fname.replace(chr(39), '')
        msg = msg.replace(str(item3), fname)

    if arch == 1:
        for item4 in list(set(tl)):
            filename = thumbname.findall(str(item4))
            fname = str(filename)
            fname = fname.replace('[', '')
            fname = fname.replace(']', '')
            fname = fname.replace(chr(39), '')
            msg = msg.replace(str(item4), chr(34) + "thumbs" + chr(92) + fname + chr(34))

    outp.write(msg)
    outp.close()

#Save download links to a text file if they exist
    if not rs.search(msg):
        print "Nothing to download."
    else:
        print "Downloads found!"
        foutrs = open(dirname + "dl.txt", "w")
        for item2 in list(set(rsl)):
            foutrs.write(str(item2) + "\n")
        foutrs.close()

#Wait to execute code again
    print "Waiting " + delay + " seconds before retrying"
    time.sleep(int(delay))
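
The script gets each file name by turning the findall result list into a string and stripping the brackets and quotes, and it repeats that four times. A Python 3 helper (hypothetical name file_name) that does the same job with re.search, covering both full-size names and thumbnail names ending in "s":

```python
import re

# Digits, an optional "s" (thumbnails), then one of the image extensions.
PIC_NAME = re.compile(r'\d+s?\.(?:jpg|gif|png|jpeg)')


def file_name(url):
    """Return the image file name in url, or None if there is none.

    Gives the same result as the script's str(findall(...)) cleanup when
    exactly one name matches, without relying on how Python prints lists.
    """
    match = PIC_NAME.search(url)
    return match.group(0) if match else None
```

A None result makes a missing match explicit instead of silently producing an empty file name.
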

This entry was posted in AutoIt, Python.

19 Responses to 4chan Image Downloader

  1. Hangman says:

    I used to use chanmongler a long while ago for the same purpose as this program. Only today did I start looking for a replacement after it stopped working (for some reason?) several months ago. After using this, I have to say that it is a very worthy replacement. It downloads much faster and is a lot simpler and more lightweight. I love this and I hope you continue to update it.

  2. Calvin says:

    I’m glad you like it! My first, unreleased version of this script stopped working a while back too. I don’t know if you noticed, but 4chan removed the .php from the thread URLs, and there were some other changes to the way pages are generated.

    Thanks for the feedback.

  3. Pingback: Calvin's Doings » 4chan Image Downloader – Version 5

  4. Anon says:

    Hey man this is great stuff. Thanks a ton. If you could add in save page (you know ctrl+s) in a future version, that would be fantastic.

    Saves me so much god damn time~

  5. Calvin says:

    Glad you like it!

    The next version of 4cdl WILL download the page and make all the image links work locally.

  6. bg says:

    Thanks for this great piece of programming work.
    If you don’t mind, I’d like to ask for some changes if possible:
    firstly, can you make it so that it doesn’t have to monitor the clipboard, and
    include a field in the interface for adding the addresses of threads
    we want images downloaded from, plus a progress area and a list of the threads being downloaded?
    Also, for technically retarded people like me, a settings tab in the interface
    to contain the, well... settings.

  7. Anon says:

    If a Mac version were made, I would be even more grateful.
    I own both a pc and a mac, and I’m often browsing on the macintosh, and having to switch over and find the thread is a bit of a chore.

  8. Calvin says:

    Unfortunately I don’t have a Mac, so I can’t develop a version for it, but with some small alterations the program should work on a Mac since Python is cross-platform. You’d need to download Python 2 (the script uses Python 2 syntax) and run the script from the command line. The only change you should need to make is to the directory paths I use: possibly changing the direction of the slashes.

  9. Calvin says:

    Great!

    Behold the beauty of open-source.

  10. Rafael says:

    The downloads are not working for some reason. I’m getting 0 byte downloads.

  11. Anonymous says:

    HOW DO I UNINSTALL PLEASE?? There is no option to uninstall!

  12. Calvin says:

    Just delete the 4cdl folder in your C:\ drive or wherever you installed the program. It doesn’t make any lasting changes to your computer or registry.

  13. Anon says:

    I’m having trouble with this.
    It installed just fine, and the GUI works too, but when I enter a thread it says there are no pictures to download. The saved version of the site doesn’t show them either; however, the rest of the site loads fine.

    Does anyone know how to fix this?

  14. Don says:

    I have downloaded the program and it does not work for me. I got the error message “Could not find a part of the Path” followed by “\.thumbs.4chan.org\w\thumb\1336259139128.jpg’.” I have yet to be able to download any images from the site. I came here and attempted to download the 7th version, but the link above did not work. I am currently running the 2.0.1.0 version on Windows 7 Home Ed. 64-bit. Thought I would add the OS in case it helps determine the problem.

  15. Rathtyr says:

    I love your downloader, I love you. That is all.

  16. Enky says:

    I tried, but I’m not able to use this in Linux.
    Can I have any tips on how to make it work?
    Also, it works WONDERFULLY in Windows.
    Thank you very much for this small script.

  17. Bob says:

    This shit is fucking gold, bro. By the way you miss-spelled “before”. Forgot the e. Don’t fix it though, it’s edgy.

  18. Joe says:

    Love it Calvin. Fantastic program

  19. Silkspire says:

    It stopped working after 4chan moved their images to 4cdn.org, could you please fix it?
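
The script's regular expressions only match *.4chan.org hosts, which would explain the breakage. A hedged Python 3 sketch of a pattern that also accepts i.4cdn.org, assuming image links of the form //i.4cdn.org/<board>/<digits>.<ext> (often without a scheme) and thumbnails with an "s" before the extension; check this against a current thread page before relying on it:

```python
import re

# Old layout: http://images.4chan.org/<board>/src/<digits>.<ext>
# Assumed newer layout: //i.4cdn.org/<board>/<digits>.<ext>, scheme optional
IMAGE_RE = re.compile(
    r'(?:https?:)?//(?:\w+\.4chan\.org/\w+/src|i\.4cdn\.org/\w+)'
    r'/\d+\.(?:jpg|gif|png|jpeg)')


def absolute(url):
    """Give protocol-relative links (//host/...) an explicit https: scheme."""
    return "https:" + url if url.startswith("//") else url


html = ('<a href="//i.4cdn.org/wg/1400000000000.png">'
        '<img src="//i.4cdn.org/wg/1400000000000s.jpg"></a>'
        '<a href="http://images.4chan.org/b/src/1257400000001.jpg"></a>')
found = sorted({absolute(u) for u in IMAGE_RE.findall(html)})
```

The thumbnail is skipped because the "s" after the digits does not fit the full-size pattern.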
