Web scraping with Python 3 and Selenium

Notes on using the Python module selenium.
Selenium is very useful for automating web browsing tasks.
It is intuitive, and I personally find it a lot easier than
PhantomJS. If you have X or Xvfb running on your machine
and know a little JavaScript, I totally recommend Selenium.

1. Installing the module

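Get the source from https://pypi.python.org/pypi/selenium and run
python setup.py install
# Alternatively, install it with pip:
pip3 install selenium
# If you want to use a virtual display, install xorg-server-xephyr and the pyvirtualdisplay module.
# With visible=0, as used in the next step, pyvirtualdisplay runs Xvfb by default, so Xvfb needs to be installed too.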

2. Importing the module

#!/usr/bin/python3
from selenium import webdriver
# If you want to use a virtual display (visible=0 keeps it hidden)
from pyvirtualdisplay import Display
xephyr = Display(visible=0, size=(1600, 900)).start()

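3. Choosing Chrome as the browser

# Chrome (or Chromium) must be installed; older Selenium releases also need a matching chromedriver on your PATH.
br = webdriver.Chrome()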

4. Open a website

# Replace 'url' with the full address, including the scheme (http:// or https://).
br.get('url')

5. Fill a form, and submit it.

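# Recent Selenium 4 releases removed the find_element_by_* helpers; use find_element() with By instead.
from selenium.webdriver.common.by import By
# Find an input field by its name attribute.
el = br.find_element(By.NAME, 'inputId')
# Find an element by XPath and click it.
# To get an XPath in Chromium, right-click the element, choose "Inspect", then right-click the highlighted node and choose "Copy XPath".
br.find_element(By.XPATH, '//*[@id="box"]/ul/li[2]/a/img').click()
# Type text into the field
el.send_keys('words')
# Submit the form that the field belongs to
el.submit()
# or click a button
br.find_element(By.NAME, 'certain_name').click()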
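
The steps above can be wrapped into a small helper. This is only a sketch: fill_and_submit and the field names in the usage note are made up for illustration. It relies on the WebDriver and WebElement methods find_element, clear, send_keys and submit, and passes the locator strategy as the plain string "name", which is the value behind Selenium's By.NAME.

```python
def fill_and_submit(driver, values):
    """Type each text into the field with the matching name, then submit.

    fill_and_submit is a hypothetical helper, not part of Selenium.
    values maps a field's name attribute to the text to type into it.
    """
    if not values:
        raise ValueError("values must contain at least one field")
    field = None
    for name, text in values.items():
        # "name" is the string behind Selenium's By.NAME locator strategy.
        field = driver.find_element("name", name)
        field.clear()          # remove any pre-filled text
        field.send_keys(text)  # type the new text
    # Submitting any field submits the form that contains it.
    field.submit()
    return field
```

With the br object from step 3, fill_and_submit(br, {'inputId': 'words'}) would type 'words' into the field named inputId and submit its form.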

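6. Switching between iframes. A lot of websites these days have frames. When find_element raises an error, it is likely that the wrong frame is selected.

# Go back to the top-level document
br.switch_to.default_content()
# Switch into the second frame on the page (indexes start at zero)
br.switch_to.frame(1)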
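
When it is not obvious which frame holds an element, one option is to look in the top-level document first and then in each iframe in turn. The sketch below does that with a hypothetical helper, find_in_frames (not a Selenium function). It assumes only one level of nesting, and it passes locator strategies as plain strings ("tag name" and "name" are the values behind By.TAG_NAME and By.NAME). It uses find_elements, which returns an empty list instead of raising when nothing matches.

```python
def find_in_frames(driver, by, value):
    """Return the first match in the top document or any direct iframe.

    Hypothetical helper: searches one level of iframes only. On success the
    driver is left switched into the frame that holds the element; if nothing
    is found, it is switched back to the top-level document and None is returned.
    """
    driver.switch_to.default_content()
    found = driver.find_elements(by, value)
    if found:
        return found[0]
    # Collect the iframe elements of the top-level document.
    frames = driver.find_elements("tag name", "iframe")
    for frame in frames:
        # Always start from the top so each frame element is still valid.
        driver.switch_to.default_content()
        driver.switch_to.frame(frame)
        found = driver.find_elements(by, value)
        if found:
            return found[0]
    driver.switch_to.default_content()
    return None
```

For example, el = find_in_frames(br, "name", "inputId") would return the field wherever it lives, or None if no frame contains it.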

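7. Switching between tabs. A lot of websites open a new tab on click, so you have to switch to the active tab. The number is an index into br.window_handles, starting from zero. The handles are usually listed in the order the tabs were opened, which normally matches their placement from the left, but that order is not guaranteed. The example chooses the second tab.

br.switch_to.window(br.window_handles[1])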
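
If you would rather not depend on the position of a tab, a possible alternative is to remember the handles before the click and switch to whichever handle is new afterwards. This is a sketch with a hypothetical helper, switch_to_new_tab; it uses only the window_handles property and switch_to.window().

```python
def switch_to_new_tab(driver, handles_before):
    """Switch to the first window handle that is not in handles_before.

    Hypothetical helper, not part of Selenium. Returns the new handle.
    """
    new_handles = [h for h in driver.window_handles if h not in handles_before]
    if not new_handles:
        raise RuntimeError("no new tab or window was opened")
    driver.switch_to.window(new_handles[0])
    return new_handles[0]
```

In a script, store before = br.window_handles, click the link that opens the tab, then call switch_to_new_tab(br, before). The new tab may take a moment to appear, so a short retry may be needed in practice.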

8. Finishing the script.

br.quit()
exit()
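
If the script raises an error before reaching br.quit(), the browser (and the virtual display, if you started one) keeps running. One way to guard against that is a small context manager. This is a sketch: browser_session is a made-up name, make_driver is any callable that returns a driver (for example webdriver.Chrome), and display is an optional, already started pyvirtualdisplay Display.

```python
from contextlib import contextmanager

@contextmanager
def browser_session(make_driver, display=None):
    """Yield a driver and always clean up afterwards.

    Hypothetical helper: make_driver is called with no arguments to create
    the driver; display, if given, is stopped after the browser has quit.
    """
    driver = None
    try:
        driver = make_driver()
        yield driver
    finally:
        try:
            if driver is not None:
                driver.quit()   # close every window and end the session
        finally:
            if display is not None:
                display.stop()  # shut down the virtual display
```

It would be used as: with browser_session(webdriver.Chrome) as br: followed by an indented br.get(...) and the rest of the script.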

9. Extras

For clicking a certain screen position, for example to click on Flash content.
# Install the PyUserInput module for general mouse and keyboard control. You will need the Python port of Xlib
pip3 install PyUserInput
# Import module
from pymouse import PyMouse
# Clicking
m = PyMouse()
# Get current cursor position
m.position()
# Click at screen position x, y (both in pixels; set them yourself).
m.click(x, y)

This isn’t all. I plan to add other Selenium functions as I discover them.
