Python BeautifulSoup Web Scraping in Google App Engine Python Programming

We’ve mentioned that Python and Java are the web programming technologies currently supported by Google App Engine. PHP has no direct support. Instead, PHP can be run through Quercus, a 100% Java implementation of the PHP language (it needs JDK 1.5). We’ll discuss Quercus in a separate post.

Python has a rich set of modules that make most common tasks easy.
Today I’m going to talk about an important Python module called BeautifulSoup. BeautifulSoup has rich functionality and plays a major role in screen scraping (web scraping: extracting information from websites, from search results and so on; basically web mining).
I installed Python 2.5. From Start->Python2.5->IDLE(Python GUI), I opened the interactive prompt, entered the following command and got the following error:
>>> from BeautifulSoup import BeautifulSoup
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    from BeautifulSoup import BeautifulSoup
ImportError: No module named BeautifulSoup

Reason for this error: the BeautifulSoup module is not installed.
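Before installing anything, it can help to confirm that Python really cannot find the module, and to see where it looks for third-party packages. Here is a minimal sketch of that check. It assumes a modern Python 3 interpreter (importlib.util.find_spec does not exist in Python 2.5), and the helper names is_installed and site_packages_dir are my own, made up for illustration.

```python
import importlib.util
import sysconfig


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


def site_packages_dir():
    """Return the directory used for third-party pure-Python packages."""
    return sysconfig.get_paths()["purelib"]


# Show where third-party modules are expected to live, and whether
# the module from the traceback above is currently importable.
print("site-packages directory:", site_packages_dir())
print("BeautifulSoup importable:", is_installed("BeautifulSoup"))
```

If the second line prints False, the ImportError is expected, and the fix is to get the module into the directory shown on the first line.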
How to fix this error?
1) Download the BeautifulSoup Python module (a tar.gz package)
2) Open the tar.gz package and extract it to Python25\Lib\site-packages. The text file in that directory (it opens in Notepad) says that third-party packages can be installed here
3) Paste the extracted BeautifulSoup directory into this location
4) Click the setup.py file inside the BeautifulSoup directory
5) This creates two files named BeautifulSoup and BeautifulSoupTests
6) Copy those two files and paste them into the site-packages directory, one level above their current location. Press Backspace to go up one level; you land in Python25\Lib\site-packages
7) Try the commands
>>> import BeautifulSoup
>>> from BeautifulSoup import BeautifulSoup
The above commands now run without any error.
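The manual copy in steps 2 to 6 can also be scripted. What follows is only a rough sketch of the idea, not the official install procedure. It assumes Python 3 and the standard tarfile module, and the function name install_modules_from_tarball, the archive name and the target path in the usage comment are all hypothetical. It pulls the named .py files straight out of the tar.gz archive and writes them into a target directory such as site-packages.

```python
import shutil
import tarfile
from pathlib import Path


def install_modules_from_tarball(tarball, target_dir, module_names):
    """Copy the named .py modules from a .tar.gz archive into target_dir.

    Returns the list of destination paths that were written.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    wanted = {name + ".py" for name in module_names}
    copied = []
    with tarfile.open(tarball, "r:gz") as archive:
        for member in archive.getmembers():
            # Ignore the folder structure inside the archive; only the
            # file name decides whether a member is copied.
            file_name = Path(member.name).name
            if member.isfile() and file_name in wanted:
                destination = target / file_name
                with archive.extractfile(member) as src, open(destination, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                copied.append(destination)
    return copied


# Hypothetical usage (archive name and path are placeholders):
# install_modules_from_tarball(
#     "BeautifulSoup.tar.gz",
#     r"C:\Python25\Lib\site-packages",
#     ["BeautifulSoup", "BeautifulSoupTests"],
# )
```

Copying only the named files keeps everything else in the archive out of site-packages. After it runs, the two import commands from step 7 are still the real test that the install worked.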