Release 4-1

As mentioned briefly in my previous blog post, the next internal pull request I wanted to work on was a web crawler for pySearch. My peer and my professor both gave me great starting points for understanding how to create one. I did my research on Scrapy and on Beautiful Soup. I installed both and found sample code to run with each. After playing with both, I decided Beautiful Soup was easier to work with for this project. Scrapy is more useful for large-scale web crawling, and from what I read, Beautiful Soup could be used for something similar to what I was trying to do.
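To give a sense of why Beautiful Soup felt easy to work with, here is a minimal sketch of pulling links out of a page. This is not the code from my pull request: the function name extract_links, the sample HTML, and the example.com URLs are all made up for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4


def extract_links(html, base_url):
    """Return absolute URLs for every <a> tag that has an href attribute.

    Hypothetical helper for illustration; not taken from pySearch.
    """
    # "html.parser" is Python's built-in parser, so no extra install is needed.
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        # Resolve relative links such as "/docs" against the page URL.
        links.append(urljoin(base_url, anchor["href"]))
    return links


# Made-up sample page, so the sketch runs without touching the network.
sample = (
    '<html><body>'
    '<a href="/docs">Docs</a>'
    '<a href="https://example.org/x">X</a>'
    '<a>no href</a>'
    '</body></html>'
)
links = extract_links(sample, "https://example.com/")
```

A crawler would then feed each returned link back into the same function after downloading that page.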

I started writing my web crawler for pySearch and completed it. I followed this article to help me get started, but it is outdated. For example, import urllib2 no longer works in Python 3, so I had to import urllib instead.
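For anyone hitting the same urllib2 problem, here is a rough sketch of what the Python 3 version can look like. In Python 3 the functions that used to live in urllib2 are in urllib.request. The helper names build_request and fetch, the user agent string, and the example URL are my own placeholders, not the code from the article or from pySearch.

```python
# Python 2 (no longer works on Python 3):
#     import urllib2
#     html = urllib2.urlopen(url).read()
#
# Python 3: the same functionality lives in urllib.request.
from urllib.request import Request, urlopen


def build_request(url, user_agent="example-crawler/0.1"):
    """Build a request with a User-Agent header (placeholder value)."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=10):
    """Download a page and return its body as text."""
    with urlopen(build_request(url), timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Building the request does not touch the network; calling fetch() does.
request = build_request("https://example.com/")
```

The text returned by fetch can be handed straight to Beautiful Soup for parsing.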

Silly me did not realize that by the time I started working on this issue, the code I was working on was outdated, and I had to integrate my code into the new search file instead of the pySearch file. This is something I need to keep a closer eye on next time. When I started, only one search engine was working; by the time I updated to the new code, there were four or five different search engine options.

Because of the many changes happening on the project, my pull request needs some changes. Something new I saw added to this project is an automated email that is sent after a pull request, which basically asks for changes and lists the errors found on each line of my code, as shown below.

[Image: Cpysearch]

I am hoping to communicate with the maintainer over the break and see how I can change it. I know that for now my web crawler works with the default search engine (Google), and I hope to get it working with all the other options in the script. I talked to the maintainer before making my pull request, and he confirmed this was a good start for this request and that in the future we can work on getting it implemented for the various search engines.

 

Until next time!
