Tuesday 21 May 2013

Script for extracting all photos tagged with a certain username on Don't Stay In

Windows Instructions

  1. Download and install Python for Windows (see links below)
  2. Create a directory (c:\dsi or something)
  3. Download the script into that folder, or copy and paste the source from below and save it there as getphotos.py
  4. Start the Windows command line (Start --> Run --> type 'cmd' --> press return)
  5. Change directory to where you saved the script (type 'cd c:\dsi', press return)
  6. Find the username you want to get (click on their profile page)
  7. Find the total number of PAGES of photos they have (click on "all photos of user", look for "1 of 10", you're after the big number)
  8. Run the script (type 'python getphotos.py username pagecount', press return)
  9. Make tea
  10. If it doesn't work, Official Windows Support (!! 8@) is being provided by Karl (dreammaster4) so go ask him.
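
Steps 6 to 8 matter because the script asks the site for one listing page per page number, from 1 up to the page count you give it. Here is a minimal sketch of how those URLs are put together, based on the URL format used in the script source below. The function name listing_urls and the username 'someuser' are made up for illustration.

```python
def listing_urls(username, pagecount):
    """Build one photo-listing URL per page, numbered 1..pagecount."""
    urls = []
    for page in range(1, pagecount + 1):
        # Same URL shape as in the script: .../members/<username>/photos/photopage-<n>
        url = 'http://www.dontstayin.com/members/' + username + '/photos/photopage-%d' % page
        urls.append(url)
    return urls


# 'someuser' is a placeholder, not a real account.
for url in listing_urls('someuser', 3):
    print(url)
```

If you give a page count that is too small, the later pages are simply never requested, which is why you want the big number from "1 of 10".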

Python Links

Windows

Python Download
Python 2.7.5 for Windows

Ubuntu

sudo apt-get install python

Download Script

ow.ly/li7My

Script Source

#!/usr/bin/python
# Written for Python 2 (print statements, urllib.urlopen); it will not run unchanged on Python 3.
import urllib
import re
import sys
import os

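def do_page(url):
    # Fetch one photo-listing page and return every photo-page URL found in it.
    print "getting " + url
    f = urllib.urlopen(url)
    html = f.read()
    f.close()
    # Dots in the host name are escaped, and the path stops at a quote or
    # whitespace so one match cannot swallow several links on the same line.
    pattern = r'http://www\.dontstayin\.com/[^"\'\s]+/photo-[0-9]+'
    hits = re.findall(pattern, html)
    return hits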

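if __name__ == '__main__':
    # Expect exactly two arguments: the username and the number of photo pages.
    if len(sys.argv) != 3:
        sys.exit("usage: python getphotos.py username pagecount")
    username = sys.argv[1]
    pages = int(sys.argv[2])
    hits = []
    # Listing pages are numbered from 1 up to and including the page count.
    for i in range(1, pages + 1):
        url = 'http://www.dontstayin.com/members/' + username + '/photos/photopage-%d' % i
        hits.extend(do_page(url))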

    # Full-size images live on the pixmaster host and are named <guid>.jpg.
    # Group 1 is the whole image URL, group 2 is just the file name.
    pattern = r'(http://pixmaster-eu\.dontstayin\.com/([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\.jpg))'
    # The part of a photo-page URL between the host and /photo-N becomes the folder.
    pathPattern = r'http://www\.dontstayin\.com/(.*)/photo-[0-9]*'
    for hit in hits:
        paths = re.findall(pathPattern, hit)
        path = os.path.join(username, paths[0])
        if not os.path.exists(path):
            print "making dir " + path
            os.makedirs(path)
        print "processing photo page " + hit
        f = urllib.urlopen(hit)
        html = f.read()
        f.close()
        pictures = re.findall(pattern, html)
        for picture in pictures:
            print "found picture " + picture[0]
            filename = os.path.join(path, picture[1])
            urllib.urlretrieve(picture[0], filename)
            print "saved picture " + filename
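
To see what the two regular expressions in the script pick out without hitting the site, here is a small offline sketch. The HTML fragments are made up for illustration and the real pages may look different. The patterns follow the script's, with the dots in the host names escaped and the photo-page path limited to characters that are not quotes or whitespace. The helper names photo_folders and pictures are hypothetical.

```python
import re

# Photo-page links: the captured group is the path used as the save folder.
PHOTO_PAGE = re.compile(r'http://www\.dontstayin\.com/([^"\'\s]+)/photo-[0-9]+')

# Image links: group 1 is the full URL, group 2 is the <guid>.jpg file name.
PICTURE = re.compile(
    r'(http://pixmaster-eu\.dontstayin\.com/'
    r'([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\.jpg))'
)

# Made-up fragments standing in for a listing page and a photo page.
listing_html = (
    '<a href="http://www.dontstayin.com/uk/london/some-venue/2013/may/18/photo-123">x</a>'
)
photo_html = (
    '<img src="http://pixmaster-eu.dontstayin.com/'
    '01234567-89ab-cdef-0123-456789abcdef.jpg">'
)


def photo_folders(html):
    """Return the folder path for each photo-page link in the HTML."""
    return PHOTO_PAGE.findall(html)


def pictures(html):
    """Return (full_url, filename) pairs for each image link in the HTML."""
    return PICTURE.findall(html)


print(photo_folders(listing_html))
print(pictures(photo_html))
```

The folder path is what ends up under the username directory, and the file name is what the picture gets saved as inside it.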
