Monday, May 2, 2016

Twelfth Week of LITG Program

At the request of my mentor Tapasweni Pathak, I am writing this blog post to sum up the overall experience I gained through the LITG program. It is organized as a set of questions and answers about the program.

How was the experience with Learn IT Girl? 

It was a wonderful, educational and rich experience, filled with many lessons that will shape my career path.

Was the time plan unmanageable? Did the project make you spend more hours than expected?

No. The timeline was quite feasible, thanks to the support and guidance of my mentor. I spent about ten hours per week on the project work (about three hours in total across the five weekdays and the other seven during the weekend). Depending on the workload due for the week, I sometimes had to work a few extra hours (especially while coding the Python package). Nevertheless, I was able to balance the project work with my other academic work at the university.

What did you learn?

The Python programming language
Selecting the most appropriate type of license for a given project
Working with GitHub
Web scraping with the Scrapy library
Functionality of Python modules and tools such as etree, re, urllib2 and XPath
Developing a Python package
Publishing a Python package to the Python Package Index (PyPI)
The Flask Python framework
Developing an API using Python and the Flask framework
Deploying a Python application on Heroku

Apart from these, I also learned the following, which will be very useful to me in the future.

Working on tight schedules with strict deadlines
Keeping a record of the work by writing weekly blog posts
Communicating with international mentors on project-related professional matters

Why did you choose to learn Python and why this project?

Python is widely used today, but I had no knowledge of it. It is also an easy-to-learn, high-level language with many libraries and built-in support, which attracts many developers. Knowing Python would therefore be a plus point in my career as a software engineer. That is why I chose to learn Python.

The project is about developing a Python API that extracts the following Quora user information, given the user name of the profile as input:

Name of the user 
URL of the user profile 
Profile picture link 
Follower count 
Following count 
Count of edits 
Number of answers 
Number of questions 

Future improvements include extracting the Facebook and LinkedIn profile links, which requires logging in to Quora. The API is built on a Python package that performs the web scraping part of the project. Web scraping was a whole new concept for me; I had never worked on it before. However, a lot of learning material was available to help me grasp the basic concepts. The Horoscope API (https://github.com/tapasweni-pathak/Horoscope-API) and the pyhoroscope package (https://github.com/tapasweni-pathak/pyhoroscope) developed by my mentor follow the same concepts, and my mentor has written many blog posts about developing Python packages and APIs. This made it much easier for me to learn the fundamentals of Python and the other required technologies within a very short period of time, and it was the main reason why I chose this project.

How can your project help others? 

Both the API and the package can be used by developers in their own projects. Developers can also volunteer to add more features to the package and improve it.

The GitHub repository for QUserAPI - https://github.com/hansika/QUserAPI
(Instructions to use the API are available at this link)

The GitHub repository for the scrape_quora Python package - https://github.com/hansika/pyquora

The package can also be installed on its own (without the API) using the command 'pip install scrape_quora'.

Together with the Python package, this API can be used by developers who work on more complex tasks, such as data mining using the information extracted from Quora user profiles. The API has been designed to be intuitive for developers to understand and use.

What would you like to change in the next round of Learn IT Girl to make it better?

I think the mid-evaluation should be much more structured. Currently, the mid-evaluation is more like a self-evaluation in which the mentee assesses her own learning; not even the mentor is involved in the process. I feel that if the mentee were asked to produce a working piece of software for the mid-evaluation, on which the evaluators (including the mentor) could give constructive feedback, she would be even more motivated to do better throughout the rest of the program. Personally, I would find this very helpful, since feedback and comments from people are what keep me going and help me produce even better work.

Would you like to mentor in next round? 

Yes, I would like to mentor in the next round. I would love to give back with the knowledge I gained and help a girl learn a new programming language.

What advice would you give to next year's mentees?

Choosing the language that best fits you is critical. It should be something you have no knowledge of, but also something valuable enough to be worth three months of your time. Go for a language that you think will remain in the industry for at least a few years.

Choose a project that fits your potential. It should be neither too complex nor so simple that you can complete it within a few weeks. Choose something that will teach you enough new concepts to digest within three months.

Have a proper time plan, and get your mentor's help to create it. Your work breakdown structure should not clash with or disturb your other personal or academic work. Allocate time so that you can complete the work due each week. You know your own schedule better than anyone else.

Above all, have the passion to learn. Try things out yourself. Bother your mentor only when you really cannot resolve an issue for several days, and do enough Google searching before you ask your mentor. It is your time to learn, so make the most of it.

What were the things that you didn’t like about your mentor? 

My mentor was Tapasweni Pathak. She was a great mentor who helped me immensely, from creating the timeline to writing this final blog post. She was a bit strict at times, especially when I was lagging behind due to my exams, but she was also a motivating person who drove me to complete my tasks on time by setting deadlines. She is an awesome supervisor who can make anything work.

Sunday, May 1, 2016

Eleventh Week of LITG Program

This is the eleventh week of the LITG program. During this week I did some final testing and cleanup of the project as instructed by my mentor. I will explain these tasks first, and at the end of this post I will summarize the overall project and what I learned through it.

First, my mentor asked me to add my name as the copyright owner in the license files of both the Python package and the API. The license used is the Apache License, Version 2.0, which contains the following line.

Copyright [yyyy] [name of copyright owner]

This was changed as follows. 

Copyright 2016 Hansika Hewamalage 
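
For anyone who prefers to script this one-line change, the following Python sketch does the same substitution on the license text. The function name fill_license_placeholders is hypothetical and not part of either repository; editing the file by hand, as I did, works just as well.

```python
def fill_license_placeholders(text, year, owner):
    """Replace the Apache License 2.0 appendix placeholders with real values."""
    return text.replace("[yyyy]", str(year)).replace("[name of copyright owner]", owner)


line = "Copyright [yyyy] [name of copyright owner]"
print(fill_license_placeholders(line, 2016, "Hansika Hewamalage"))
# Copyright 2016 Hansika Hewamalage
```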

After that, I added more test cases to the scrape_quora Python package. I also created a list of Quora user names and sent them one by one to the routes of QUserAPI. After all this, both the package and the API had to be pushed to GitHub again. I also had to push the package to the Python Package Index (PyPI) once again with a new version number (0.1.3). The CHANGES.txt and setup.py files of the package were updated accordingly before pushing the package to PyPI.
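
A check like this can be sketched in a few lines of Python. This is only a sketch under assumptions: ROUTE_TEMPLATE and the user names are hypothetical (the real QUserAPI routes are documented in its README), and it uses Python 3's urllib.request, the successor of the urllib2 module. The fetch function is passed in as a parameter so that the loop can be tried without network access.

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical route template; the real QUserAPI routes are listed in its README.
ROUTE_TEMPLATE = "http://localhost:5000/profile/{username}"


def fetch_status(username):
    """Request the route for one user name and return its HTTP status code."""
    url = ROUTE_TEMPLATE.format(username=quote(username))
    try:
        with urlopen(url, timeout=10) as response:
            return response.status
    except HTTPError as err:
        return err.code  # the server answered, but with an error status
    except OSError:
        return None  # unreachable server, timeout or other network error


def smoke_test(usernames, fetch=fetch_status):
    """Send each user name in turn and collect those that did not return 200 OK."""
    failures = {}
    for name in usernames:
        status = fetch(name)
        if status != 200:
            failures[name] = status
    return failures


# Offline usage with a stand-in fetch function (hypothetical user names):
fake_statuses = {"user-one": 200, "user-two": 404}
print(smoke_test(["user-one", "user-two"], fetch=fake_statuses.get))
# {'user-two': 404}
```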

Then I redeployed the API on Heroku. Before redeploying, I updated the requirements.txt file to include the latest version of the scrape_quora package. The redeployment followed the tutorial at https://devcenter.heroku.com/articles/getting-started-with-python#push-local-changes. Redeploying simply means committing the local changes and pushing them to the heroku git remote, using the following sequence of commands.

First, stage the modified files in the local git repository.
git add .

Next, commit the changes to the repository.
git commit -m "license file and the requirements.txt file modified"

Deploy the source code to Heroku.
git push heroku master

Check whether it has been deployed properly.
heroku open

Once all the steps were followed, the API was successfully redeployed on Heroku. 

Next, I will summarize all the work I did over the last ten weeks. My project was to develop an API that retrieves Quora (https://www.quora.com/) user profile information given a user name (available at https://github.com/hansika/QUserAPI). It is built on a Python package, scrape_quora (available at https://github.com/hansika/pyquora), which I also developed. The chosen programming language was Python. First of all, license files were added to both projects, QUserAPI and pyquora (the Python package). After much exploration, the chosen license was the Apache License, Version 2.0.

The first three weeks were then dedicated to learning Python by following the exercises available at http://learnpythonthehardway.org/book/preface.html. All the completed exercises were pushed into a git repository (available at https://github.com/hansika/LearnPython). It was a bit of a struggle during the first few weeks since my university exams clashed with the program, but I managed to clear the backlog and get back on track. The fourth week was a very important one, since it introduced me to the world of web scraping. I completed the video tutorial available at https://www.youtube.com/watch?v=ic8ygbac5lo, which teaches the fundamentals of web scraping using the Scrapy library. It uses the example code available at https://github.com/tapasweni-pathak/Talks-and-Workshops/tree/master/PyLadies%20Remote/Code%20Examples, which made the concepts much easier for me to grasp. I also referred to the Python package developed by my mentor, pyhoroscope (available at https://github.com/tapasweni-pathak/pyhoroscope). The purpose of this package is to fetch and parse data from GaneshaSpeaks, and it is the base on which I developed my own package, scrape_quora.

Next, during the first half of the fifth week, I was introduced to a number of new Python tools, namely etree, re, urllib2 and XPath, which were needed to develop scrape_quora. I learned about them by trying out different commands with each of them, and my practice code is available at the GitHub repository https://github.com/hansika/LITG_Practice. I also learned about Python dictionaries, which were needed in a later week of the program. During the rest of the fifth week and the beginning of the sixth week, I finished coding the Python package scrape_quora. Given the user name of a profile as input, the package scrapes the name of the user, the URL of the user profile, the profile picture link, the follower count, the following count, the count of edits, the number of answers and the number of questions. As a future improvement, it could also scrape the Facebook and LinkedIn profile links, which requires logging in to Quora.
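
The scraping idea can be illustrated with a small, self-contained sketch. It is not the actual scrape_quora code: the markup and the XPath-style paths below are hypothetical, and the sketch swaps in the standard library's xml.etree.ElementTree (which only accepts well-formed markup and supports a limited XPath subset) for lxml, which QUserAPI lists in its requirements and which offers an HTML parser and full XPath. The re module pulls the numbers out of the text, and the results are collected in a Python dictionary.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; Quora's real profile HTML differs and changes
# over time, so the element names and classes below are illustrative only.
SAMPLE_PAGE = """
<html><body>
  <div class="name"><span>Jane Doe</span></div>
  <ul class="stats">
    <li class="followers">1,204 Followers</li>
    <li class="following">87 Following</li>
    <li class="answers">342 Answers</li>
  </ul>
</body></html>
"""


def first_int(text):
    """Return the first number in text such as '1,204 Followers', or None."""
    match = re.search(r"\d[\d,]*", text or "")
    return int(match.group().replace(",", "")) if match else None


def parse_profile(markup):
    """Parse well-formed profile markup into a dictionary of profile fields."""
    root = ET.fromstring(markup.strip())
    # ElementTree understands a limited XPath subset, including [@attr='value'].
    return {
        "name": root.findtext(".//div[@class='name']/span"),
        "followers": first_int(root.findtext(".//li[@class='followers']")),
        "following": first_int(root.findtext(".//li[@class='following']")),
        "answers": first_int(root.findtext(".//li[@class='answers']")),
    }


print(parse_profile(SAMPLE_PAGE))
# {'name': 'Jane Doe', 'followers': 1204, 'following': 87, 'answers': 342}
```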

The task for the rest of the sixth week was to publish the package to the Python Package Index (PyPI). It was first tested by pushing the package to the PyPI test server. I ran into many issues while doing this, but with the help of my mentor I overcame all of them. The package pushed to the test server is available at https://testpypi.python.org/pypi?:action=display&name=scrape_quora&version=0.1.3, and the package on the live server is available at https://pypi.python.org/pypi?:action=display&name=scrape_quora&version=0.1.3. The next task was to code the API using this Python package. To prepare for that, during the seventh week I learned about the Flask Python framework, which was needed to code the API. I pushed all my Flask practice work to the GitHub repository at https://github.com/hansika/Flask_Learning, where I tried out very simple Flask programs to get a basic understanding. During this week I also studied the Horoscope API (https://github.com/tapasweni-pathak/Horoscope-API), which my mentor developed using the aforementioned pyhoroscope package. This served as the basis for developing QUserAPI.

The eighth week was dedicated to coding the API, and I finished coding it during that week. During the ninth week it was time to learn about the Heroku platform, and that week's blog post summarized my learning about Heroku. Then, during the tenth week, which was the final week of project work, I deployed the API onto the Heroku platform. I ran into many difficulties here as well, but with enough Google searching I overcame all of them. The official Heroku documentation available at https://devcenter.heroku.com/articles/getting-started-with-python#introduction was very useful for resolving these issues. In the end I successfully deployed the API, which is now accessible via http://quser-api.herokuapp.com/

This API is useful for developers who work on more complex tasks, such as data mining using the information extracted from Quora user profiles. All the instructions for using the API are available at the GitHub repository https://github.com/hansika/QuserAPI. The API has been designed to be intuitive both to use and to understand.

As a whole, the LITG program has taught me many important lessons. First of all, this was my first time working in such an international program under the mentorship of someone from another country, and it brought me many new experiences. Apart from the new technologies and programming languages I learned, I gathered many good experiences that will help me climb the career ladder. I must emphasize the support my mentor Tapasweni Pathak gave me throughout this project to complete everything successfully. If not for the feasible, end-to-end schedule she created, I would not have been able to complete the tasks on time. Because of that, I gained wonderful experience in working according to prescheduled timelines. Especially during the first few weeks, when I had my university exams, we often had to refine the timeline to cover the backlog.

Furthermore, writing blog posts every week improved my writing skills. It was also a good way of keeping notes on everything new I learned during the week. Since we tend to forget what we learn, blog posts are a good way to go back and revise it, and I hope to continue this habit throughout the rest of my work. Another good habit I learned is to try out and actually build something with whatever new technologies I learn. When that work is pushed into a GitHub repository, the new knowledge remains available for the future as well. This is another good habit that I hope to continue.

All in all, the LITG program helped me add many good habits to my career path. The new experiences and learning gained throughout this program will be very helpful in achieving my career goals.

Tenth Week of LITG Program

This is the tenth week of the LITG program, and we have almost reached the end of the project. This week was dedicated to deploying the developed API on the Heroku platform, using the basic understanding I gained last week.

I ran into a number of difficulties while deploying the API to Heroku. The tutorial I was initially following was outdated, so following it gave me many errors. After some effort and a bit of Google searching, I found the official documentation of the Heroku platform at https://devcenter.heroku.com/articles/getting-started-with-python#introduction. This tutorial gives intuitive, step-by-step guidance for deploying a Python application on Heroku. It also explains how to run an app locally so that it can be tested on localhost. However, I deployed the app directly on Heroku.

According to this tutorial, a number of steps must be followed to deploy a Python application to the Heroku platform. For these steps to succeed, two more files must be added to the API: requirements.txt and Procfile. These are explained next.

requirements.txt File


As mentioned in the tutorial at https://devcenter.heroku.com/articles/getting-started-with-python#declare-app-dependencies, the purpose of this file is to declare the app's dependencies. Heroku recognizes an app as a Python app by the presence of this file in the root directory. For example, the requirements.txt file of QUserAPI lists the following Python packages and their versions as dependencies.

Flask==0.10.1
Jinja2==2.8
Werkzeug==0.11.3
gunicorn==19.4.5
itsdangerous==0.24
MarkupSafe==0.23
newrelic==2.60.0.46
scrape_quora==0.1.3
wsgiref==0.1.2
lxml==3.5.0

When an app is deployed, Heroku reads this file and installs the appropriate Python dependencies using the pip install -r requirements.txt command.
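
Before redeploying, it can be handy to confirm that the expected versions are pinned (for example, the scrape_quora version). The following is only a sketch: parse_pinned_requirements is a hypothetical helper that understands just the exact name==version pins used above, whereas pip accepts a much richer syntax (version ranges, extras, environment markers and so on). The sample text is an excerpt of the file shown above.

```python
def parse_pinned_requirements(text):
    """Map package name -> version for requirement lines of the form name==version."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue
        name, sep, version = line.partition("==")
        if not sep:
            raise ValueError(f"not an exact name==version pin: {line!r}")
        pins[name.strip()] = version.strip()
    return pins


REQUIREMENTS = """\
Flask==0.10.1
gunicorn==19.4.5
scrape_quora==0.1.3
lxml==3.5.0
"""

pins = parse_pinned_requirements(REQUIREMENTS)
print(pins["scrape_quora"])  # 0.1.3
```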

Procfile


As mentioned at https://devcenter.heroku.com/articles/getting-started-with-python#define-a-procfile, this file, placed in the root directory of the app, explicitly declares the command that should be executed to start the app. For example, QUserAPI's Procfile contains the following command.

web: newrelic-admin run-program gunicorn -b 0.0.0.0:$PORT server:app

This file declares a single process type, web, and the command needed to run it. The name web declares that this process type will be attached to the HTTP routing stack of Heroku, and receive web traffic once deployed.
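
Each Procfile line has the form <process type>: <command>, and only the first colon separates the two, since the command itself may contain colons (as in 0.0.0.0:$PORT and server:app). In server:app, gunicorn loads the object named app from the module server, that is, server.py. The parse_procfile helper below is a hypothetical sketch that illustrates this format; it is not how Heroku itself reads the file.

```python
def parse_procfile(text):
    """Return {process_type: command} for Procfile lines of the form 'type: command'."""
    processes = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first colon only; the command itself may contain colons.
        process_type, sep, command = line.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in Procfile line: {line!r}")
        processes[process_type.strip()] = command.strip()
    return processes


PROCFILE = "web: newrelic-admin run-program gunicorn -b 0.0.0.0:$PORT server:app\n"
print(parse_procfile(PROCFILE)["web"])
# newrelic-admin run-program gunicorn -b 0.0.0.0:$PORT server:app
```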

The Procfile command also uses two other Python packages: newrelic and gunicorn. newrelic is a package that instruments applications for performance monitoring and advanced performance analytics with New Relic; it helps trace performance issues even while applications are running in production. gunicorn, short for 'Green Unicorn', is a Python WSGI HTTP server for UNIX that is broadly compatible with various web frameworks. In short, it runs the Python web application (here, the Flask app) and serves it over HTTP.
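
gunicorn does not need to know anything about Flask: it only expects the target named in the Procfile (server:app) to be a WSGI callable, and a Flask app object is one. The sketch below is a hypothetical stand-in written with only the standard library to show that interface; it is not the QUserAPI code. Saved as server.py, it could in principle be served with gunicorn server:app.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A bare WSGI application; a WSGI server such as gunicorn calls it per request."""
    body = f"Hello from {environ['PATH_INFO']}".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]  # the response body as an iterable of byte strings


# Call the app the way a server would, with a minimal test environ.
environ = {}
setup_testing_defaults(environ)  # fills in the required WSGI keys
environ["PATH_INFO"] = "/status"
captured = {}


def start_response(status, headers):
    captured["status"] = status


print(b"".join(app(environ, start_response)), captured["status"])
# b'Hello from /status' 200 OK
```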

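$PORT is an environment variable that Heroku sets for each web process. Binding gunicorn to 0.0.0.0:$PORT makes the app listen on the port Heroku assigned, so that Heroku's router can forward web traffic to it.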
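
With gunicorn, the Procfile already passes $PORT through the -b option. If an app binds the port itself instead, for example when running Flask's development server, it has to read the variable from the environment. A minimal sketch, assuming a fallback of 5000 (Flask's default port) for local runs; get_port is a hypothetical helper name:

```python
import os

DEFAULT_PORT = 5000  # assumed fallback for local runs; it is Flask's default port


def get_port(environ=os.environ, default=DEFAULT_PORT):
    """Return Heroku's $PORT as an int if it is set, else a local default."""
    value = environ.get("PORT")
    return int(value) if value else default


# Hypothetical use when the app binds the port itself instead of via gunicorn:
#     app.run(host="0.0.0.0", port=get_port())
print(get_port({"PORT": "31337"}))  # 31337
print(get_port({}))                 # 5000
```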

Once these files and the API code are ready, we can follow the steps to deploy the app on Heroku.

Deploying the Application


The deployment steps fall into several groups, and they are discussed below under those groups.

Initial Steps


1. Create a free Heroku account
This is required because the heroku and git commands used in a later step need authentication.

2. Install virtualenv locally using the command pip install virtualenv on the terminal.
(In addition, Python itself must be installed on the system. In my case, it was already installed.)


Set up


3. Install the Heroku Toolbelt, which provides access to the Heroku Command Line Interface (CLI). The documentation at https://devcenter.heroku.com/articles/getting-started-with-python#set-up offers a download of the Toolbelt for each supported operating system. Once it is installed, the heroku command can be used from the terminal.

4. The next step is to log in to Heroku using the heroku login command on the terminal.
The email address and password of the free Heroku account created in step 1 are used for this login.


Prepare the app


5. First, go to the project folder (root directory) using the cd command on the terminal.


Deploy the app


6. Create an app on Heroku using the command heroku create, which prepares Heroku to receive the source code.
By default, Heroku gives the app a randomly generated name, which also appears in the URL used to access the application once deployed. I ran into a small issue at this point. At first I did not know that Heroku assigns such a default name, so I ran this command with no arguments, and it named the app floating-taiga-50750. I wanted the name to be quser-api, so I ran the command again as heroku create quser-api, which created an app named quser-api.

As mentioned in the documentation, when a new app is created in this manner, a git remote called heroku is also created and associated with this local git repository.

7. Next deploy the source code using the command git push heroku master.

At this step I got the following error.

remote: Compressing source files... done.
remote: Building source:
remote:
remote:

remote: ! Push rejected, no Cedar-supported app detected
remote: HINT: This occurs when Heroku cannot detect the buildpack
remote: to use for this application automatically.
remote: See https://devcenter.heroku.com/articles/buildpacks
remote:
remote: Verifying deploy...
remote:
remote: ! Push rejected to floating-taiga-50750.
remote:
To https://git.heroku.com/floating-taiga-50750.git
! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'https://git.heroku.com/floating-taiga-50750.git'

I noticed that the app name in the error was the random name (floating-taiga-50750) that Heroku generated first, not the name I created next. After some Google searching, I found a fix at http://stackoverflow.com/questions/31330587/heroku-error-message-no-cedar-supported-app-detected. According to it, reinitializing the .git directory can fix the issue, so I ran the following commands in the given order to resolve the error.

rm -rf .git
git init
git add .
git commit -am "Reinitialize"
heroku create quser-api

I passed the name quser-api because I wanted the application to have that name on Heroku. But then I got another error.

Creating ⬢ quser-api... !!!
▸ Name is already taken

To solve this problem, I removed all the existing apps under my user account on the Heroku web dashboard. Then I ran the heroku create quser-api command again, and the issue was resolved. Next, I ran the git push heroku master command to deploy the project on Heroku.
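
In the first error, the push went to floating-taiga-50750 even though quser-api had been created, which suggests that the local heroku git remote still pointed at the first app. Running git remote -v shows where each remote points, and the Heroku CLI's heroku git:remote -a command can repoint it. The sketch below reads that output and reports which app the heroku remote pushes to, so a mismatch can be caught before pushing. heroku_app_from_remotes is a hypothetical helper, and the sample text is modelled on the app name in the error above.

```python
import re

# Heroku git remote URLs look like https://git.heroku.com/<app-name>.git
HEROKU_URL = re.compile(r"https://git\.heroku\.com/(?P<app>[a-z][a-z0-9-]*)\.git")


def heroku_app_from_remotes(remote_v_output):
    """Return the app the 'heroku' remote pushes to, parsed from `git remote -v` output."""
    for line in remote_v_output.splitlines():
        parts = line.split()
        # Each line has the form: <remote name> <url> (fetch) or (push)
        if len(parts) == 3 and parts[0] == "heroku" and parts[2] == "(push)":
            match = HEROKU_URL.fullmatch(parts[1])
            if match:
                return match.group("app")
    return None


sample = (
    "heroku\thttps://git.heroku.com/floating-taiga-50750.git (fetch)\n"
    "heroku\thttps://git.heroku.com/floating-taiga-50750.git (push)\n"
)
pushed_to = heroku_app_from_remotes(sample)
if pushed_to != "quser-api":
    print(f"heroku remote points at {pushed_to!r}, not 'quser-api'")
# heroku remote points at 'floating-taiga-50750', not 'quser-api'
```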

8. Visit the app at the URL generated from the app name (http://quser-api.herokuapp.com/).
The command heroku open can also be used on the terminal as a shortcut to open the website. In the same way, I tested all the routes of the API.

The app was successfully deployed on Heroku. With this, I have completed all the work of my project on creating an API to scrape Quora user profiles. The final touch-ups and my overall learning throughout the project will be summarized in next week's blog post.