I made this program to download the images posted in a 4chan thread. I wrote the downloader in Python, and it even spoofs Mozilla Firefox headers. It will download all the images (jpg, gif, png, jpeg) found in the thread. It also scrapes Rapidshare and Megaupload links and saves them to a text document.
It can also back up the thread.
The simple GUI is done in AutoIT. I didn’t use UPX to package it when I compiled it so it shouldn’t raise any false alarms with any anti-virus software.
The GUI monitors your clipboard and will download any links that you copy. You can also force it to download the link in the clipboard.
Requires NO additional downloads, unless you’re missing some standard DLLs.
This is capable of downloading multiple threads at once.
You can also specify the delay between retries; the default is 60 seconds.
Once a thread 404s, the window that is downloading that thread will close. The GUI does not need to be running to download a thread.
Make sure the directory you want to save the files to has already been created.
Delay and directory settings are saved when you exit.
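The retry delay and the stop-on-404 behaviour described above can be sketched roughly as follows. This is a minimal Python 3 illustration, not the script's own code (which is Python 2): the function name `fetch_with_retry` and the `opener` and `sleep` parameters are hypothetical, and treating a 404 as "the thread is gone" is an assumption about how a dead thread shows up.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, delay_seconds=60,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch url, retrying once after delay_seconds on an error.

    Returns the page bytes, or None when the server answers 404
    (assumed here to mean the thread no longer exists).
    """
    for attempt in range(2):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            if err.code == 404:
                return None  # thread is gone: the caller can stop polling
            if attempt == 1:
                raise  # second failure: give up
        except urllib.error.URLError:
            if attempt == 1:
                raise
        # First failure that was not a 404: wait, then try once more.
        sleep(delay_seconds)
```

Passing `opener` and `sleep` in as parameters is only there so the logic can be exercised without touching the network or actually waiting.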
Check it out! The source is included with the installer or just download the source.
Version 9:
Download Installer
Version 8 (works with HTTPS):
Download Installer
Version 7:
Version 6:
Version 5:
Version 4:
Version 3:
Version 2:
This is supplied free of charge. All I ask is that you don’t charge for it. It would be nice if you credited me but I won’t make a big deal about it.
    #initial release Nov. 5, 2009
    #v6 release Jan. 20, 2009
    #http://cal.freeshell.org
    import os.path
    import re
    import string
    import sys
    import time
    import urllib
    import urllib2

    #Regular Expressions
    imgurl = re.compile('http://\w+\.4chan\.org/\w+/src/\d+\.(?:jpg|gif|png|jpeg)')
    thumb = re.compile('http://.\.thumbs\.4chan\.org/\w+/thumb/\d+s\.(?:jpg|gif|png|jpeg)')
    thumbname = re.compile("\d+s\.(?:jpg|gif|png|jpeg)")
    imgurl2 = re.compile('http://\w+\.4chan\.org/\w+/src/')
    picname = re.compile('\d+\.(?:jpg|gif|png|jpeg)')
    tname = re.compile('/\d+')
    #The ? before d= is escaped so that it matches a literal question mark
    rs = re.compile('http://rapidshare.com/files/\d+/.*\.(?:rar|zip|avi|wmv|part\d+\.rar|\d+)'
                    '|http://megaupload.com/\?d=........'
                    '|http://megaporn.com/\?d=........')

    #Initiate Variables
    thread = sys.argv[1] #get argument from initial command: this is the thread address
    directory = sys.argv[2] #directory to save into
    delay = sys.argv[3] #seconds between retries (kept as a string for printing)
    arch = int(sys.argv[4]) #1 = also download thumbnails for the thread backup

    #Setup headers to spoof Mozilla
    dat = None
    ua = "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.9.1.4) Gecko/20091007 Firefox/3.5.4"
    head = {'User-agent': ua}
    errorcount = 0

    #Create directory name
    dirname = str(tname.findall(thread))
    #Clean directory name
    dirname = dirname.replace('[', '')
    dirname = dirname.replace(']', '')
    dirname = dirname.replace(chr(39), '')
    dirname = dirname.replace(chr(92), '')
    dirname = dirname.replace(chr(47), '')
    dname = dirname
    dirname = directory + chr(92) + dirname
    print "Downloading to: " + dirname

    #Create directory if it doesn't exist
    if not os.path.exists(dirname):
        os.mkdir(dirname)
    if arch == 1:
        if not os.path.exists(dirname + chr(92) + "thumbs"):
            os.mkdir(dirname + chr(92) + "thumbs")

    #Add \ to directory name for image saving
    dirname = dirname + chr(92)

    #Start
    while 1:
        print "Scraping: " + thread
        #Get page
        req = urllib2.Request(thread, dat, head)
        try:
            response = urllib2.urlopen(req)
        except urllib2.HTTPError, e:
            if errorcount < 1:
                errorcount = 1
                print "Request failed, retrying in " + delay + " seconds"
                time.sleep(int(delay))
                response = urllib2.urlopen(req)
        except urllib2.URLError, e:
            if errorcount < 1:
                errorcount = 1
                print "Request failed, retrying in " + delay + " seconds"
                time.sleep(int(delay))
                response = urllib2.urlopen(req)
        msg = response.read()
        errorcount = 0

        #Find all pictures and rapidshare links
        kwl = imgurl.findall(msg)
        rsl = rs.findall(msg)
        tl = thumb.findall(msg)

        #Save pictures
        for item in list(set(kwl)): #list(set(kwl)) removes duplicates
            #Clean image URL and clean file name
            filename = picname.findall(str(item))
            fname = str(filename)
            fname = fname.replace('[', '')
            fname = fname.replace(']', '')
            fname = fname.replace(chr(39), '')
            #Download the image if it doesn't exist
            if not os.path.isfile(dirname + fname):
                print "Downloading: " + str(item)
                try:
                    urllib.urlretrieve(str(item), dirname + str(fname))
                    time.sleep(0.25)
                except urllib.ContentTooShortError:
                    #str() is needed here: a string cannot be concatenated with a number
                    print "Image download failed, retrying in " + str(int(delay)/4) + " seconds"
                    time.sleep(int(delay)/4)
                    urllib.urlretrieve(str(item), dirname + str(fname))
                    time.sleep(0.25)
            else:
                print str(fname) + " Exists... Trying next file."

        #Download thumbnails
        if arch == 1:
            for item3 in list(set(tl)): #list(set(tl)) removes duplicates
                #Clean image URL and clean file name
                filename = thumbname.findall(str(item3))
                fname = str(filename)
                fname = fname.replace('[', '')
                fname = fname.replace(']', '')
                fname = fname.replace(chr(39), '')
                #Download the thumbnail if it doesn't exist
                if not os.path.isfile(dirname + "thumbs" + chr(92) + fname):
                    print "Downloading thumbnail: " + str(item3)
                    try:
                        urllib.urlretrieve(str(item3), dirname + "thumbs" + chr(92) + str(fname))
                        time.sleep(0.25)
                    except urllib.ContentTooShortError:
                        print "Thumbnail download failed, retrying in " + str(int(delay)/4) + " seconds"
                        time.sleep(int(delay)/4)
                        urllib.urlretrieve(str(item3), dirname + "thumbs" + chr(92) + str(fname))
                        time.sleep(0.25)
                else:
                    print str(fname) + "(thumbnail) Exists... Trying next file."

        #Replace URLs with local image locations
        outp = open(dirname + dname + ".html", "w")
        for item3 in list(set(kwl)):
            filename = picname.findall(str(item3))
            fname = str(filename)
            fname = fname.replace('[', '')
            fname = fname.replace(']', '')
            fname = fname.replace(chr(39), '')
            msg = msg.replace(str(item3), fname)
        if arch == 1:
            for item4 in list(set(tl)):
                filename = thumbname.findall(str(item4))
                fname = str(filename)
                fname = fname.replace('[', '')
                fname = fname.replace(']', '')
                fname = fname.replace(chr(39), '')
                msg = msg.replace(str(item4), chr(34) + "thumbs" + chr(92) + fname + chr(34))
        outp.write(msg)
        outp.close()

        #Save download links to a text file if they exist
        if not rs.search(msg):
            print "Nothing to download."
        else:
            print "Downloads found!"
            foutrs = open(dirname + "dl.txt", "w")
            for item2 in list(set(rsl)):
                foutrs.write(str(item2) + "\n")
            foutrs.close()

        #Wait to execute code again
        print "Waiting " + delay + " seconds before retrying"
        time.sleep(int(delay))
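The script gets each filename by turning the list returned by `findall` into a string and then stripping the brackets and quotes. A shorter route is to take the match directly. The sketch below is Python 3, not the Python 2 of the script, and the function name `image_targets` is hypothetical; the image pattern is the script's own and assumes the original 4chan.org URL layout.

```python
import re

# Image URL pattern copied from the script (original 4chan.org layout).
IMG_URL = re.compile(r'http://\w+\.4chan\.org/\w+/src/\d+\.(?:jpg|gif|png|jpeg)')
# Filename at the very end of a matched URL.
PIC_NAME = re.compile(r'\d+\.(?:jpg|gif|png|jpeg)$')


def image_targets(html):
    """Return sorted (url, filename) pairs with duplicate URLs removed."""
    urls = set(IMG_URL.findall(html))  # a set drops repeated links
    return sorted((url, PIC_NAME.search(url).group()) for url in urls)
```

Each pair gives the URL to fetch and the name to save it under, so the download loop no longer needs the chain of `replace` calls.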
I used to use chanmongler a long while ago for the same purpose as this program. Only today did I start looking for a replacement after it stopped working (for some reason?) several months ago. After using this, I have to say that it is a very worthy replacement. It downloads much faster and is a lot simpler and more lightweight. I love this and I hope you continue to update it.
I’m glad you like it! My first, unreleased version of this script stopped working a while back too. I don’t know if you noticed, but 4chan removed the .php from the threads and there were some other changes to the way pages were generated.
Thanks for the feedback.
Hey man this is great stuff. Thanks a ton. If you could add in save page (you know ctrl+s) in a future version, that would be fantastic.
Saves me so much god damn time~
Glad you like it!
The next version of 4cdl WILL download the page and make all the image links work locally.
Thanks for this great piece of programming work.
If you don’t mind, I’d like to ask for some changes if possible:
First, can you make it so that it doesn’t have to monitor the clipboard, and
include a field in the interface for adding the addresses of threads
we want images downloaded from, plus a progress area and a list of the threads being downloaded?
Also, for technically challenged people like me, a settings tab in the interface
to contain the, well... settings.
If a Mac version were made, I would be even more grateful.
I own both a PC and a Mac, and I’m often browsing on the Mac, and having to switch over and find the thread is a bit of a chore.
Unfortunately I don’t have a Mac, so I couldn’t develop a Mac version, but with some small alterations the program should work on one, since Python is cross-platform. You’d need to download Python and run the script from the command line. The only change you should need to make is to the directory structure I use: possibly changing the direction of the slashes.
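The script builds its paths with `chr(92)`, a literal backslash, which is what ties it to Windows. A portable alternative is `os.path.join`, which inserts whatever separator the current platform uses. This is a minimal Python 3 sketch, not the script's own code: the helper names `thread_dir` and `thumbs_dir` and the example thread URL are hypothetical, and taking the last number in the URL as the thread ID is an assumption.

```python
import os.path
import re


def thread_dir(base_dir, thread_url):
    """Build the per-thread save directory with the platform's separator."""
    # Assumption: the last "/<digits>" in the URL is the thread number.
    thread_id = re.findall(r'/(\d+)', thread_url)[-1]
    return os.path.join(base_dir, thread_id)


def thumbs_dir(base_dir, thread_url):
    """Directory for thumbnails, nested under the thread directory."""
    return os.path.join(thread_dir(base_dir, thread_url), "thumbs")


if __name__ == "__main__":
    # Hypothetical thread address, for illustration only.
    url = "http://boards.4chan.org/b/res/123456"
    print(thread_dir("downloads", url))
    print(thumbs_dir("downloads", url))
```

With paths built this way the same code should run unchanged on Windows, Mac, and Linux.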
Great!
Behold the beauty of open-source.
The downloads are not working for some reason. I’m getting 0 byte downloads.
HOW DO I UNINSTALL PLEASE?? There is no option to uninstall!
Just delete the 4cdl folder in your C:\ drive or wherever you installed the program. It doesn’t make any lasting changes to your computer or registry.
I’m having trouble with this.
It installed just fine. The GUI works too, but when I enter a thread it says there are no pictures to download. The saved version of the site doesn’t show them either. However, the rest of the site loads fine.
Does anyone know how to fix this?
I have downloaded the program and it does not work for me. I got the error message “Could not find a part of the Path” followed by “\.thumbs.4chan.org\w\thumb\1336259139128.jpg’.” I have yet to be able to download any images from the site. I came here and attempted to download the 7th version, but the link above did not work. I am currently running version 2.0.1.0 on Windows 7 Home Edition 64-bit. I thought I would add the OS in case it might help to determine the problem.
I love your downloader, I love you. That is all.
I tried, but I’m not able to use this in Linux.
Can I have any tips on how to make it work?
Also it works WONDERFULLY in windows.
Thank you very much for this small Script.
This shit is fucking gold, bro. By the way you miss-spelled “before”. Forgot the e. Don’t fix it though, it’s edgy.
Love it Calvin. Fantastic program
It stopped working after 4chan moved their images to 4cdn.org, could you please fix it?
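The image pattern in the script only matches hosts under 4chan.org, so it finds nothing once images are served from 4cdn.org. A broader pattern could look like the minimal Python 3 sketch below. The exact link layout on 4cdn.org (an optional scheme and no /src/ segment) is an assumption that should be checked against real page source, and the function name `find_images` is hypothetical.

```python
import re

# Assumption: new image links look like //i.4cdn.org/<board>/<digits>.<ext>,
# possibly with http: or https: in front. The old 4chan.org/.../src/ form
# is still accepted. Thumbnails (<digits>s.<ext>) do not match.
IMG_URL = re.compile(
    r'(?:https?:)?//\w+\.(?:4chan|4cdn)\.org/\w+/(?:src/)?\d+\.(?:jpg|gif|png|jpeg)'
)


def find_images(html):
    """Return unique image URLs, adding a scheme to protocol-relative ones."""
    found = set()
    for url in IMG_URL.findall(html):
        if url.startswith('//'):
            url = 'https:' + url  # urlretrieve needs a full URL
        found.add(url)
    return sorted(found)
```

Prefixing `https:` onto protocol-relative links is a design choice for this sketch; `http:` would be the closer match for a script of this age.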