Web scraping with Python 3 and Selenium

Notes on using the Python selenium module.
Selenium is very useful for automating web browsing tasks.
It is intuitive, and I personally find it a lot easier than
PhantomJS. If you have X or Xvfb running on your machine
and know a little JavaScript, I totally recommend Selenium.

1. Installing the module

Get the source from https://pypi.python.org/pypi/selenium and run
python setup.py install
# If you want to use a virtual display, also install xorg-server-xephyr and the pyvirtualdisplay module.

2. Importing the module

from selenium import webdriver
# If you want to use a virtual display
from pyvirtualdisplay import Display
xephyr=Display(visible=0, size=(1600, 900)).start()

3. Choosing Chrome as the browser to use.

br = webdriver.Chrome()

4. Open a website
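Here is a minimal sketch of this step, assuming br is the webdriver.Chrome() object from step 3. The helper name open_page and the example URL are mine, purely for illustration. By default get() returns once the page has finished loading.

```python
def open_page(br, url):
    """Load url in the browser behind br and return the page title."""
    br.get(url)      # navigate; with default settings this waits for the page load
    return br.title  # title of the page that is now open
```

With the browser from step 3 this would be called as open_page(br, 'https://example.com'). A bare br.get(url) does the same job if you do not need the title.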


5. Fill a form, and submit it.

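# Find the form field by name and select it.
# (Recent Selenium 4 releases removed the find_element_by_* helpers; there, use br.find_element('name', 'inputId').)
el = br.find_element_by_name('inputId')
# Or find the form field by XPath and select it.
# To get the XPath of an element in Chromium, right-click it, choose "Inspect", then right-click the highlighted node and choose "Copy XPath".
# Type text into the field.
# Submit the form,
# or click a button instead.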
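The XPath lookup, typing, submitting and clicking can be sketched roughly as below. This is my own illustration, not a fixed recipe: the helper names and XPath arguments are hypothetical, and I pass the locator strategy as the plain string "xpath", which is the value behind selenium's By.XPATH constant, so the same call should work on both older and newer Selenium versions.

```python
def fill_and_submit(br, field_xpath, text):
    """Type text into the field at field_xpath, then submit its form."""
    el = br.find_element("xpath", field_xpath)  # "xpath" is the value of By.XPATH
    el.clear()          # drop any pre-filled value
    el.send_keys(text)  # send words
    el.submit()         # submit the form the field belongs to
    return el


def click_button(br, button_xpath):
    """Alternative to submit(): click a button found by XPath."""
    button = br.find_element("xpath", button_xpath)
    button.click()
    return button
```

For example, fill_and_submit(br, "//input[@name='inputId']", 'hello') would fill the field from the snippet above. Use click_button when the page only reacts to a real button click.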

6. Switching between iframes. A lot of websites these days use frames. When find_element raises an error for an element you can see on the page, it is likely that the wrong frame is selected.
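A small sketch of frame switching, assuming br is the browser from step 3. The helper name find_in_frame is hypothetical. switch_to.frame() accepts a frame name or id, a zero-based index, or a frame element, and switch_to.default_content() goes back to the top-level page.

```python
def find_in_frame(br, frame, xpath):
    """Select the given frame, then look up an element inside it by XPath."""
    br.switch_to.default_content()  # start from the top-level document
    br.switch_to.frame(frame)       # name/id string, index, or frame element
    return br.find_element("xpath", xpath)  # "xpath" is the value of By.XPATH
```

Starting from the top-level document each time keeps the lookup from depending on whichever frame was selected earlier. For nested frames you would call switch_to.frame() once per level.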


7. Switching between tabs. A lot of websites open a new tab on click, so you have to switch to the active tab. The number is the position of the tab counting from the left, starting from zero. The example chooses the second tab from the left.
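Tab switching can be sketched like this, assuming br is the browser from step 3. The helper name switch_to_tab is mine. br.window_handles is a list of handles, in my experience in the order the tabs were opened, which usually matches left to right, but I would treat that ordering as an assumption rather than a guarantee.

```python
def switch_to_tab(br, index):
    """Make the tab at position index (zero-based) the active one."""
    handle = br.window_handles[index]  # one handle per open tab or window
    br.switch_to.window(handle)        # later commands go to this tab
    return handle
```

switch_to_tab(br, 1) selects the second tab, and switch_to_tab(br, 0) returns to the first one.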


8. Finishing the script.
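A sketch of the cleanup, assuming br is the browser from step 3 and, if you started one, xephyr is the virtual display from step 2. The helper name finish is hypothetical. quit() closes every window and ends the driver session, and stop() shuts down the pyvirtualdisplay display.

```python
def finish(br, display=None):
    """Close the browser and, if one was started, the virtual display."""
    br.quit()             # closes all tabs and stops the driver process
    if display is not None:
        display.stop()    # shut down the virtual display from step 2
```

Call finish(br, xephyr) when using the virtual display, or just finish(br) otherwise. Quitting the browser before stopping the display avoids the browser losing its screen while it is still running.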


9. Extras

To click at a specific screen position, for example on Flash content:
# Install the PyUserInput module for general mouse and keyboard control. On X11 you will also need the Python port of Xlib (python-xlib).
pip3 install PyUserInput
# Import module
from pymouse import PyMouse
# Clicking
m = PyMouse()
# Get current cursor position
# Click x y position.
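The position and click calls could look roughly like this, assuming m is the PyMouse() object created above. The helper name click_at is mine. position() returns the current (x, y) of the cursor in screen pixels, and click(x, y, 1) presses the left button at that point.

```python
def click_at(m, x, y):
    """Click the left button at screen position (x, y).

    Returns the cursor position from before the click.
    """
    before = m.position()  # current (x, y) of the cursor
    m.click(x, y, 1)       # 1 = left mouse button
    return before
```

Note that these are screen coordinates, not page coordinates, so the browser window has to be where you expect it on the (virtual) display.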

This isn't all. I plan to add other Selenium features as I discover them.
