Sunday, May 1, 2016

Eleventh Week of LITG Program

It is the eleventh week of the LITG program. During this week I did some final testing and cleaning of the project as instructed by my mentor. These tasks will be explained first in this blog post. At the end I will also summarize the overall project and the learning that I gained through it. 

First, my mentor instructed me to add my name as the copyright owner in the license files of both the Python package and the API. The license used is the Apache License Version 2.0, which contains the following line. 

Copyright [yyyy] [name of copyright owner]

This was changed as follows. 

Copyright 2016 Hansika Hewamalage 

After that I added more test cases to the scrape_quora Python package. I created a list of Quora user names and sent them one by one to the routes of the QUserAPI. After all this, both the package and the API had to be pushed to GitHub again. I also had to push the package to the Python Package Index once more with a new version number (0.1.3). The CHANGES.txt and setup.py files of the package were updated accordingly before pushing it to PyPI. 
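The added tests followed a simple pattern: loop over a list of Quora user names and check the shape of each result. The sketch below illustrates that idea; fetch_profile is a stand-in that returns canned data, since the real scrape_quora functions (and the field names) may differ.

```python
# Sketch of the testing pattern: run every user name in a list through
# the scraper and check each result. fetch_profile is a stand-in; the
# real tests call scrape_quora and the QUserAPI routes instead.
def fetch_profile(username):
    # Canned data so the sketch is self-contained.
    return {"username": username, "followers": 0, "answers": 0}

def check_profiles(usernames):
    results = []
    for name in usernames:
        profile = fetch_profile(name)
        # Every profile must echo the user name and carry the counts.
        assert profile["username"] == name
        assert "followers" in profile and "answers" in profile
        results.append(profile)
    return results

profiles = check_profiles(["Hansika-Hewamalage", "Tapasweni-Pathak"])
```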

Then I redeployed the API on Heroku. Before redeploying, the requirements.txt file was updated to include the latest version of the scrape_quora package. Redeployment was done according to the tutorial at https://devcenter.heroku.com/articles/getting-started-with-python#push-local-changes; it essentially amounts to pushing the local changes to the git remote named heroku. The following sequence of commands achieves this. 

First add the modified files to the local git repository. 
git add . 

Next commit the changes to the repository. 
git commit -m "license file and the requirements.txt file modified" 

Deploy the source code to Heroku. 
git push heroku master 

Check whether it is deployed properly. 
heroku open 

Once all the steps were followed, the API was successfully redeployed on Heroku. 

Next I will summarize all the work that I did throughout the last ten weeks. My project was to develop an API that retrieves Quora (https://www.quora.com/) user profile information given a user name (available at https://github.com/hansika/QUserAPI). It is based on the scrape_quora Python package (available at https://github.com/hansika/pyquora), which was also developed by me. The chosen programming language was Python. First of all, license files were added to both QUserAPI and pyquora (the Python package). After much exploration, the chosen license was the Apache License Version 2.0. 
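To give an idea of what the API delivers, the snippet below parses the kind of JSON a QUserAPI route might return for a user name. The field names and numbers are assumptions for illustration only; see the QUserAPI repository for the real response format.

```python
import json

# A made-up example of the JSON the API could return for one user;
# the actual field names and values are defined by QUserAPI.
sample_response = """{
    "username": "Hansika-Hewamalage",
    "name": "Hansika Hewamalage",
    "followers": 120,
    "following": 80,
    "answers": 15,
    "questions": 4
}"""

profile = json.loads(sample_response)
summary = "%s has %d followers and %d answers" % (
    profile["name"], profile["followers"], profile["answers"])
```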

Then the first three weeks were dedicated to learning Python by following the exercises at http://learnpythonthehardway.org/book/preface.html. All the completed exercises were pushed to a git repository (available at https://github.com/hansika/LearnPython). It was a bit of a struggle during the first few weeks since my university exams clashed with the program, but somehow I managed to clear the backlog and get back on track. The fourth week was a very important week since I got exposed to the world of web scraping. I completed the video tutorial at https://www.youtube.com/watch?v=ic8ygbac5lo, which teaches the fundamentals of web scraping using the Scrapy library. It uses the example code available at https://github.com/tapasweni-pathak/Talks-and-Workshops/tree/master/PyLadies%20Remote/Code%20Examples, which made it much easier for me to grasp the concepts. I also referred to the Python package developed by my mentor, namely pyhoroscope (available at https://github.com/tapasweni-pathak/pyhoroscope), whose purpose is to fetch and parse data from GaneshaSpeaks. This is the base on which I developed my package, scrape_quora. 

Next, during the first half of the fifth week I got introduced to a number of new Python tools, namely etree, re and urllib2, together with XPath expressions, which were needed to develop scrape_quora. I learned about them by trying out different commands; this learning is available in the GitHub repository https://github.com/hansika/LITG_Practice. Furthermore, I learned about Python dictionaries, which were also needed in a later week of the program. During the rest of the fifth week and the beginning of the sixth week I finished coding the Python package scrape_quora. Given the user name of a profile as input, this package scrapes the name of the user, the URL of the user profile, the profile picture link, the follower count, the following count, the count of edits, the number of answers and the number of questions. As a future improvement, the Facebook and LinkedIn profile links could also be scraped, which requires logging in to Quora. 
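The scraping itself boils down to fetching the profile page and pulling values out with XPath expressions. The package uses urllib2 and etree for this; the sketch below shows the same extraction idea using the standard library's ElementTree on a canned snippet. The class names here are made up for illustration and are not Quora's real markup.

```python
import xml.etree.ElementTree as ET

# A tiny stand-in for a profile page; Quora's real markup differs.
page = """<html><body>
  <span class="profile_name">Hansika Hewamalage</span>
  <span class="follower_count">120</span>
  <span class="answer_count">15</span>
</body></html>"""

root = ET.fromstring(page)

def extract(tree, cls):
    # ElementTree supports a small XPath subset, which is enough to
    # show the pattern; the real package uses full XPath expressions.
    return tree.find(".//span[@class='%s']" % cls).text

name = extract(root, "profile_name")
followers = int(extract(root, "follower_count"))
```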

The task during the rest of the sixth week was to push the package to the Python Package Index. It was first tested by pushing it to the test server. I ran into many issues while doing this, but with the help of my mentor I was able to overcome all of them. The package on the test server is available at https://testpypi.python.org/pypi?:action=display&name=scrape_quora&version=0.1.3, and the package on the live server at https://pypi.python.org/pypi?:action=display&name=scrape_quora&version=0.1.3. The next task was to code the API using this Python package. Before that, during the seventh week, I learned about the Flask Python framework, which was needed to code the API. I pushed all my learning related to Flask to the GitHub repository at https://github.com/hansika/Flask_Learning, where I tried out very simple commands to get a basic understanding. Furthermore, during this week I also studied the Horoscope API (https://github.com/tapasweni-pathak/Horoscope-API) developed by my mentor using the aforementioned pyhoroscope package. This was used as the basis for developing the QUserAPI. 
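The PyPI push described above revolves around the package's setup.py metadata. A minimal sketch of what scrape_quora's setup.py could look like follows; the version (0.1.3) matches the release mentioned earlier, while the remaining field values are illustrative assumptions.

```python
import sys

# Hypothetical setup.py metadata for scrape_quora. Only the name and
# version come from the project; the other fields are illustrative.
metadata = dict(
    name="scrape_quora",
    version="0.1.3",
    description="Fetch and parse Quora user profile data",
    author="Hansika Hewamalage",
    py_modules=["scrape_quora"],
)

# A real setup.py is invoked as e.g. `python setup.py sdist upload`;
# guarded here so the sketch can also run without arguments.
if __name__ == "__main__" and sys.argv[1:2] == ["sdist"]:
    from setuptools import setup
    setup(**metadata)
```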

The eighth week was dedicated to coding the API, and I managed to finish it during that week. Then during the ninth week it was time to start learning about the Heroku platform; the blog post of that week summarized my learning related to Heroku. During the tenth week, the final week of project work, I deployed the developed API onto the Heroku platform. Here too I ran into many difficulties, but with enough Google searching I was able to overcome all of them. The official Heroku documentation at https://devcenter.heroku.com/articles/getting-started-with-python#introduction was very useful for all these issues. In the end I was able to successfully deploy the API, which is now accessible via http://quser-api.herokuapp.com/.
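At its heart, the API coded in the eighth week is a thin Flask layer over the package: each route takes a user name and returns the scraped data as JSON. A minimal sketch of that shape follows; the route path and the canned dict standing in for scrape_quora are assumptions, so see the QUserAPI repository for the real routes.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Stand-in for scrape_quora; the real API calls the package here.
def scrape(username):
    return {"username": username, "followers": 120, "answers": 15}

# Hypothetical route path; the real QUserAPI routes may differ.
@app.route("/users/<username>")
def user_profile(username):
    return jsonify(scrape(username))

# Flask's built-in test client lets us exercise the route locally.
client = app.test_client()
resp = client.get("/users/Hansika-Hewamalage")
```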

This API is a much-needed piece of software for developers who work on more complex tasks, such as data mining using information extracted from Quora user profiles. All the instructions related to using the API are available in the GitHub repository at https://github.com/hansika/QuserAPI. The API has been developed to be intuitive both to use and to understand. 

As a whole, the LITG program has taught me many important lessons. First of all, I should say that this was my first time working in such an international program under the mentorship of someone from abroad, and it brought me many new experiences. Apart from the new technologies and programming languages learned, I gathered many good experiences to help me climb the career ladder. I should emphasize the support given to me by my mentor Tapasweni Pathak throughout this project to successfully complete everything. If it were not for the feasible, end-to-end schedule created by her, I would not have been able to complete the tasks on time. Because of that I gained wonderful experience in working according to pre-scheduled timelines. Especially during the first few weeks, when I had my university exams, we often had to refine the timeline to cover the backlog. Furthermore, writing blog posts every week improved my writing skills, and it was a good way of keeping a note of all the new learning throughout the week. Since we tend to forget easily anything that we learn, writing blogs is a good way to go back and revise what we have learned; this is a habit I hope to continue throughout the rest of my work as well. Another good thing I learned is to try out and actually build something with whatever new technology I learn. When such learning is pushed to a GitHub repository, the knowledge remains available for the future as well. This is another good habit that I hope to continue. 

All in all, the LITG program was a great influence on me, adding many good habits to my career path. The new experiences and learning gained throughout this program will be very helpful for me in achieving my career goals.
