Using ftplib in Python 3

This script downloads the files whose timestamps are less than argv[1] days old.

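#!/usr/bin/python3

from ftplib import FTP
import datetime
from sys import argv

url = "url"            # placeholder: the FTP server's host name
user = "anonymous"
password = "anonymous"

today = datetime.date.today()

ftp = FTP(url)
fs = ftp.login(user, password)
print(fs)
# Passive mode is ftplib's default; makepasv() only asks the server for a passive (host, port).
fs = ftp.makepasv()
print(fs)
# Change directory
fs = ftp.cwd("/pub/")
print(fs)

# List the directory and store each line in the variable "lines".
lines = []
fs = ftp.retrlines('LIST', lines.append)
print(lines)

# Exclude directories (lines starting with "d") and lines that are not
# 9-field entries (e.g. a "total" line). Splitting at most 8 times keeps
# file names that contain spaces in one piece.
files = []
for line in lines:
    fields = line.split(None, 8)
    if len(fields) == 9 and not line.startswith("d"):
        files.append(fields)
print(files)

# Download only files less than argv[1] days old.
# Compare the day as an integer, because some servers print "Jan  5" instead of "Jan 05".
for fields in files:
    month, day, name = fields[5], int(fields[6]), fields[8]
    for daysago in range(int(argv[1])):
        date = today - datetime.timedelta(days=daysago)
        if month == date.strftime("%b") and day == date.day:
            with open(name, 'wb') as f:
                ftp.retrbinary("RETR " + name, f.write)
            break

ftp.quit()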
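If the server supports the MLSD command (RFC 3659; not every server does), ftplib's FTP.mlsd() returns machine-readable (name, facts) pairs, which avoids parsing the human-oriented LIST output. The sketch below is an alternative to the LIST parsing above, not what the script does: recent_files is a hypothetical helper name, and it keeps files modified within the last days×24 hours instead of matching calendar dates.

```python
import datetime


def recent_files(entries, days, now=None):
    """Yield names of regular files modified within the last `days` days.

    `entries` is an iterable of (name, facts) pairs, the shape that
    ftplib.FTP.mlsd() produces. MLSD "modify" values are UTC timestamps
    like "20240109120000" (optionally with fractional seconds).
    `now`, if given, must be timezone-aware.
    """
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    cutoff = now - datetime.timedelta(days=days)
    for name, facts in entries:
        # Skip directories, "cdir"/"pdir" entries and entries without a date.
        if facts.get("type") != "file" or "modify" not in facts:
            continue
        modified = datetime.datetime.strptime(facts["modify"][:14], "%Y%m%d%H%M%S")
        modified = modified.replace(tzinfo=datetime.timezone.utc)
        if modified >= cutoff:
            yield name


# Usage against a live server (not run here; assumes MLSD support):
#   from ftplib import FTP
#   with FTP("url") as ftp:
#       ftp.login("anonymous", "anonymous")
#       ftp.cwd("/pub/")
#       entries = list(ftp.mlsd(facts=["type", "modify"]))
#       for name in recent_files(entries, int(argv[1])):
#           with open(name, "wb") as f:
#               ftp.retrbinary("RETR " + name, f.write)
```

Collecting the listing with list() first means the directory listing is finished before any download starts on the same connection.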


Web Scraping with urllib in Python 3

Here is an example of logging in to a website and downloading some content.

#!/usr/bin/python3
# Importing modules for handling HTTP, cookies and form encoding
import http.cookiejar
import urllib.parse
import urllib.request

# Store cookies in the cj variable
cj = http.cookiejar.CookieJar()

# Define an opener for later HTTP operations that keeps cookies in cj.
op = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))

# Logging in: POST the form fields; the session cookie ends up in cj.
url = 'https://127.0.0.1/index.php?'
val = {'user': 'username', 'password': 'password'}
data = urllib.parse.urlencode(val)
asciidata = data.encode('ascii')
res = op.open(url, asciidata)

# Saving a file, reusing the same opener so the login cookie is sent
res = op.open('https://127.0.0.1/index.php/apps/contents.jpg')
with open("content.jpg", "wb") as f:
    f.write(res.read())
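To check what the login step actually sends without contacting a server, you can build the request and not open it. A small sketch reusing the placeholder form fields and host from the example: attaching a data body is what turns the request into a POST, while a request without one is a GET.

```python
import urllib.parse
import urllib.request

# Same placeholder form fields as in the login example.
val = {'user': 'username', 'password': 'password'}
body = urllib.parse.urlencode(val).encode('ascii')
print(body)  # b'user=username&password=password'

# Build (but do not send) the login request; 127.0.0.1 is just the example host.
login_req = urllib.request.Request('https://127.0.0.1/index.php?', data=body)
print(login_req.get_method())  # POST, because a body is attached

# The file download has no body, so it is a plain GET.
file_req = urllib.request.Request('https://127.0.0.1/index.php/apps/contents.jpg')
print(file_req.get_method())  # GET
```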

Web scraping with Python 3 and Selenium

Notes on how to use the Python module selenium.
Selenium is very useful for automating web browsing tasks.
It is very intuitive, and I personally find it a lot easier than
PhantomJS. If you have X or Xvfb running on your machine
and know a little JavaScript, I highly recommend Selenium.

1. Installing the module

Install it with pip3 install selenium, or get the source from https://pypi.python.org/pypi/selenium and run
python setup.py install
# If you want to use a virtual display, also install the pyvirtualdisplay module and Xvfb (or xorg-server-xephyr if you want to see the window).

2. Importing the module

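#!/usr/bin/python3
from selenium import webdriver
# If you want to use a virtual display
from pyvirtualdisplay import Display
# visible=0 starts an invisible Xvfb display; visible=1 would open a Xephyr window instead.
display = Display(visible=0, size=(1600, 900))
display.start()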

3. Choosing Chrome as the browser.

br = webdriver.Chrome()

4. Open a website

br.get('url')

5. Fill in a form and submit it.

# The old find_element_by_* methods were removed in Selenium 4.3; use find_element with By.
from selenium.webdriver.common.by import By
# Find an input field by its name attribute.
el = br.find_element(By.NAME, 'inputId')
# Find an element by XPath and click it.
# In Chromium, right-click the element, choose "Inspect", then right-click the highlighted node and choose Copy > Copy XPath.
br.find_element(By.XPATH, '//*[@id="box"]/ul/li[2]/a/img').click()
# Type text into the field
el.send_keys('words')
# Submit the form the field belongs to
el.submit()
# or click a button
br.find_element(By.NAME, "certain_name").click()
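To see which element the XPath in this step selects, you can try it offline with the standard library's xml.etree.ElementTree, which understands a subset of XPath. The HTML fragment below is made up for illustration and has to be well-formed XML; real pages usually are not, so this is only a way to experiment with the expression, not a replacement for Selenium.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment (not taken from a real site).
page = ET.fromstring("""
<html><body>
  <div id="box">
    <ul>
      <li><a href="/first"><img src="first.png"/></a></li>
      <li><a href="/second"><img src="second.png"/></a></li>
    </ul>
  </div>
</body></html>
""".strip())

# ElementTree needs a leading "." for descendant searches.
# li[2] is 1-based, so this picks the image inside the second list item.
matches = page.findall('.//*[@id="box"]/ul/li[2]/a/img')
print([img.get("src") for img in matches])  # ['second.png']
```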

6. Switching between iframes. Many websites these days use frames. When you get an error from find_element, it is likely that the wrong frame is selected.

# Go back to the top-level document, then select the second frame (indices start at 0).
br.switch_to.default_content()
br.switch_to.frame(1)

7. Many websites open a new tab when you click a link, so you have to switch to that tab. The index starts from zero; in practice it usually follows the order in which the tabs were opened. The example selects the second tab.

br.switch_to.window(br.window_handles[1])
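Hard-coding index 1 only works when exactly two windows are open, and the WebDriver specification does not promise any particular order for window_handles. A sketch of a more robust approach: remember the handles before the click and switch to the one that appeared afterwards. new_window_handle is a hypothetical helper, and in practice you may need to wait for the tab to open first (for example with WebDriverWait and expected_conditions.number_of_windows_to_be).

```python
def new_window_handle(before, after):
    """Return the single window handle present in `after` but not in `before`."""
    seen = set(before)
    added = [h for h in after if h not in seen]
    if len(added) != 1:
        raise RuntimeError(f"expected exactly one new window, found {len(added)}")
    return added[0]


# Usage with Selenium (not run here; br is the webdriver from step 3):
#   before = br.window_handles
#   ...click something that opens a new tab...
#   br.switch_to.window(new_window_handle(before, br.window_handles))
```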

8. Finishing the script.

br.quit()
exit()

9. Extras


For clicking a certain screen position, for example to click on Flash content.
# Install the PyUserInput module for general mouse and keyboard control. On Linux it also needs python-xlib, the Python port of Xlib.
pip3 install PyUserInput
# Import module
from pymouse import PyMouse
# Clicking
m = PyMouse()
# Get current cursor position
m.position()
# Click at position (x, y); x and y are screen coordinates in pixels.
m.click(x, y)

This isn’t everything. I plan to add other Selenium features as I discover them.