Monday, May 2, 2016

Twelfth Week of LITG Program

At the request of my mentor Tapasweni Pathak, I am writing this blog post to sum up the overall experience I gained through the LITG program. It is structured as a set of questions and answers about the program. 

How was the experience with Learn IT Girl? 

It was a wonderful, educative and rich experience indeed, filled with many lessons that will mold my career path. 

Was the time plan unmanageable? Did the project make you spend more hours than expected?

No. The timeline was quite feasible thanks to the support and guidance of my mentor. I spent nearly ten hours per week on the project work (about three hours over the five weekdays and the other seven during the weekend). Depending on the workload due for the week, I sometimes had to work a few extra hours (especially while coding the Python package). Nevertheless, I could successfully balance the project work with my other academic work at the university. 

What all did you learn? 

Python programming language 
Selecting the most appropriate type of license for a given project 
Working with GitHub 
Web scraping with Scrapy library 
Using Python libraries such as etree, re and urllib2, and writing XPath expressions 
Developing a Python package 
Pushing a Python package into the Python Package Index 
Flask Python Framework 
Developing an API using Python and Flask Framework 
Deploying a Python application on Heroku 

Apart from these, I learned the following as well, which will be very useful to me in the future.

Working to tight schedules with strict deadlines 
Keeping record of the work by writing weekly blog posts 
Communicating with international mentors on project related professional matters 

Why did you choose to learn Python and why this project?

Python is a widely used language today, but I had zero knowledge of it. It is also a very easy-to-learn, high-level language with many libraries and much built-in support, which attracts many developers. Therefore, Python knowledge would indeed be a plus point for me in my career as a Software Engineer. That is why I chose to learn Python.

The project is about developing a Python API that extracts the following Quora user information, given the user name of the profile as input. 

Name of the user 
URL of the user profile 
Profile picture link 
Follower count 
Following count 
Count of edits 
Number of answers 
Number of questions 

Future improvements include extracting the Facebook and LinkedIn profile links, which actually requires logging in to Quora. This API is based on a Python package which performs the web scraping part of the project. Web scraping was a whole new concept for me; I had never worked on it before. However, I had plenty of learning material available to grasp the basic concepts. The Horoscope API (https://github.com/tapasweni-pathak/Horoscope-API) and the pyhoroscope package (https://github.com/tapasweni-pathak/pyhoroscope) developed by my mentor follow the same concepts, and there are many blog posts written by my mentor about developing Python packages and APIs. It was therefore quite convenient for me to learn the fundamentals of Python and the other required technologies within a very short period of time. This was the major reason why I chose this project. 

How can your project help others? 

Both the API and the package can be used by developers in their own projects. Developers can also volunteer to add more features to the package and improve it. 

The GitHub repository for the QUserAPI - https://github.com/hansika/QUserAPI
(Instructions to use the API are available at this link) 

The GitHub repository for the scrape_quora Python package - https://github.com/hansika/pyquora

The package can also be installed on its own (without using the API) using the command 'pip install scrape_quora' 

This API, along with the Python package, can be used by developers who work on far more complicated tasks such as data mining using the information extracted from Quora user profiles. The API has been developed in an intuitive manner so that developers can easily understand and use it. 
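As a quick illustration of how a developer might consume the API, the sketch below builds the GET route URLs described in the eighth-week post (e.g. /profile/name/&lt;user_name&gt;). The user name "Some-User" is an invented placeholder; only the route shapes and base URL come from the posts.

```python
# Sketch of consuming QUserAPI over HTTP. Route shapes follow the
# eighth-week post; "Some-User" is an invented placeholder name.
BASE_URL = "http://quser-api.herokuapp.com"

ROUTES = {
    "name": "/profile/name/{}",
    "picture_link": "/profile/picture_link/{}",
    "url": "/url/{}",
    "answers": "/profile/number/answers/{}",
}

def build_route(resource, user_name):
    """Build the URL for one of the API's GET routes."""
    return BASE_URL + ROUTES[resource].format(user_name)

if __name__ == "__main__":
    # Fetching the JSON is one extra step, e.g. with urllib:
    #   import json, urllib.request
    #   data = json.load(urllib.request.urlopen(build_route("name", "Some-User")))
    print(build_route("name", "Some-User"))
```

Each route returns a JSON object, so the response can be parsed with the standard json module.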

What would you like to change in the next round of Learn IT Girl to make it better? 

I think the mid evaluation should be much more structured. Currently it is more like a self evaluation where the mentee assesses her own learning; not even the mentor is involved in the process. I feel that if the mentee were asked to produce some working piece of software at the mid evaluation, on which the evaluators (including the mentor) could give constructive feedback, she would be even more motivated to do better throughout the rest of the program. Personally I would find this very helpful, since it is always feedback and comments from people that keep me going and help me produce even better work. 

Would you like to mentor in the next round? 

Yes, I would like to mentor in the next round. I would love to give something back with the knowledge I gained and help a girl learn a new programming language. 

What would you like to advise next year's mentees? 

Choosing the language that best fits you is critical. It should be something you have zero knowledge of, and also something valuable enough to spend three months of your time learning. Go for a language that you think will remain in the industry for at least a few years. 

Choose a project that fits your potential. It should not be too complex, but also not so simple that you can complete it within a few weeks. Choose something that will teach you enough new concepts to digest within a period of three months. 

Have a proper time plan, and get the help of your mentor to create it. Your work breakdown structure should not clash with or disturb your other personal or academic work. Allocate time so that you can complete the work due each week; you know your work schedule better than anyone else. 

Above all, have the passion to learn. Try things out by yourself. Bother your mentor only when you really cannot resolve an issue for several days, and do enough Googling before you ask. It is your time to learn, so take the maximum out of it. 

What were the things that you didn’t like about your mentor? 

My mentor was Tapasweni Pathak. She was a great mentor who helped me immensely, from creating the timeline to writing this final blog post. She was a bit strict at times, especially when I was lagging behind due to my exams, but she is a motivating character who actually drove me to complete my tasks on time by setting deadlines. She is an awesome supervisor who can make anything work.

Sunday, May 1, 2016

Eleventh Week of LITG Program

It is the eleventh week of the LITG program. During this week I did some final testing and cleaning of the project as instructed by my mentor. These tasks will be explained first in this blog post. At the end, I will also summarize the overall project and the learning I gained through it. 

First, I was instructed by my mentor to add my name as the owner of the project in the license files of both the Python package and the API. The license included is the Apache License, Version 2.0, which contains the following line. 

Copyright [yyyy] [name of copyright owner]

This was changed as follows. 

Copyright 2016 Hansika Hewamalage 

After that I added more test cases to the scrape_quora Python package: I created a list of Quora user names and sent them one by one to the routes of the QUserAPI. After doing all this, it was required to push both the package and the API to GitHub again. I also had to push the package to the Python Package Index once more with a new version number (0.1.3). The CHANGES.txt and setup.py files of the package were updated accordingly before pushing the package to PyPI. 

Then I redeployed the API on Heroku. Before redeploying, the requirements.txt file was updated to include the latest version of the scrape_quora package. The redeployment was done according to the tutorial at https://devcenter.heroku.com/articles/getting-started-with-python#push-local-changes. Redeployment is all about pushing local changes to the git repository with the heroku remote. The following sequence of commands was used to achieve this. 

First add the modified files to the local git repository. 
git add . 

Next commit the changes to the repository. 
git commit -m "license file and the requirements.txt file modified" 

Deploy the source code to Heroku. 
git push heroku master 

Check whether it is deployed properly. 
heroku open 

Once all the steps were followed, the API was successfully redeployed on Heroku. 

Next I will summarize all the work that I did throughout the last ten weeks. My project was to develop an API that retrieves Quora (https://www.quora.com/) user profile information given the user name (available at https://github.com/hansika/QUserAPI). This was based on the scrape_quora Python package (available at https://github.com/hansika/pyquora), which was also developed by me. The chosen programming language was Python. First of all, license files were added to the projects QUserAPI and pyquora (the Python package). After much exploration, the chosen license was the Apache License, Version 2.0. 

Then the first three weeks were dedicated to learning Python by following the exercises available at http://learnpythonthehardway.org/book/preface.html. All the completed exercises were pushed to a git repository (available at https://github.com/hansika/LearnPython). It was a bit of a struggle during the first few weeks since my university exams clashed with the program, but somehow I managed to clear the backlog and get back on track. The fourth week was a very important week since I was exposed to the world of web scraping. I completed the video tutorial available at https://www.youtube.com/watch?v=ic8ygbac5lo, which teaches the fundamentals of web scraping using the Scrapy library. It uses the example code available at https://github.com/tapasweni-pathak/Talks-and-Workshops/tree/master/PyLadies%20Remote/Code%20Examples, which made it much easier for me to grasp the concepts. I also referred to the Python package developed by my mentor, namely pyhoroscope (available at https://github.com/tapasweni-pathak/pyhoroscope), whose purpose is to fetch and parse data from GaneshaSpeaks. This is actually the base on which I developed my package, scrape_quora. 

Next, during the first half of the fifth week, I was introduced to a number of new Python libraries, namely etree, re and urllib2, along with XPath expressions, which were needed to develop scrape_quora. I learned about them by trying out different commands; my practice work is available in the GitHub repository https://github.com/hansika/LITG_Practice. Furthermore, I learned about Python dictionaries, which were also needed in a later week of the program. During the rest of the fifth week and the beginning of the sixth week, I finished coding the Python package scrape_quora. Given the user name of a profile as input, this package scrapes the name of the user, the URL of the user profile, the profile picture link, the follower count, the following count, the count of edits, the number of answers and the number of questions. As a future improvement, we could also scrape the Facebook and LinkedIn profile links, which actually requires logging in to Quora. 
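To give a flavour of the kind of extraction described above, here is a small, self-contained sketch of pulling a count out of markup with an XPath-style query. The snippet and the class names are invented for illustration; the real scrape_quora package runs its own expressions (via etree and urllib2) against live Quora pages.

```python
# Illustrative only: scraping a count from markup with an XPath-style
# query. The snippet and class names are invented; the real package
# queries live Quora pages with its own expressions.
import xml.etree.ElementTree as ET

SNIPPET = """
<div>
  <span class="followers">123</span>
  <span class="following">45</span>
</div>
"""

def extract_count(markup, css_class):
    root = ET.fromstring(markup)
    # ElementTree supports a limited XPath subset, enough for
    # attribute predicates like this one.
    node = root.find(".//span[@class='%s']" % css_class)
    return int(node.text)

print(extract_count(SNIPPET, "followers"))  # -> 123
```

On real pages, lxml's etree offers the same interface with fuller XPath support and an HTML parser that tolerates malformed markup.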

The task during the rest of the sixth week was to push the package to the Python Package Index. It was first tested by pushing it to the test server. I ran into many issues while doing this, but with the help of my mentor I could overcome all of them. The package pushed to the test server is available at https://testpypi.python.org/pypi?:action=display&name=scrape_quora&version=0.1.3, and the package pushed to the live server at https://pypi.python.org/pypi?:action=display&name=scrape_quora&version=0.1.3. The next task was to code the API using this Python package. Before that, during the seventh week, I learned about the Flask Python framework, which was needed to code the API. I pushed all my Flask practice work to the GitHub repository at https://github.com/hansika/Flask_Learning, trying out very simple commands to get a basic understanding. During this week I also studied the Horoscope API (https://github.com/tapasweni-pathak/Horoscope-API) developed by my mentor using the aforementioned pyhoroscope package. This was used as the basis for developing the QUserAPI. 

The eighth week was dedicated to coding the API, which I could successfully finish during that week. Then, during the ninth week, it was time to start learning about the Heroku platform; the blog post of the ninth week was dedicated to summarizing that learning. During the tenth week, which was the final week of the project work, I deployed the developed API onto the Heroku platform. Even while doing this I ran into many difficulties, but with enough Googling I could overcome all of them; the official Heroku documentation available at https://devcenter.heroku.com/articles/getting-started-with-python#introduction was very useful for all these issues. Likewise, I could successfully deploy the API, which is now accessible via http://quser-api.herokuapp.com/

This API is a much-needed piece of software for developers who work on more complex tasks such as data mining using the information extracted from Quora user profiles. All the instructions related to using the API are available in the GitHub repository at https://github.com/hansika/QuserAPI. The API has been developed in an intuitive manner, making it easy both to use and to understand. 

As a whole, the LITG program has brought many important lessons to my life. First of all, I should say that this was my first time working in such an international program under the mentorship of a foreigner, and it brought me many new experiences. Apart from the new technologies and the programming language learned, I collected many good experiences that will help me climb the career ladder. I should emphasize the support given to me by my mentor Tapasweni Pathak throughout this project; if it were not for the feasible, end-to-end schedule created by her, I would not have been able to complete the tasks on time. Because of that, I gained wonderful experience in working according to pre-scheduled timelines. Especially during the first few weeks, when I had my university exams, we often had to refine the timeline to cover the backlog.

Furthermore, writing blog posts every week improved my writing skills. It was also a good way of keeping a note of all the new learning throughout the week: since we tend to forget easily whatever we learn, writing blogs is a good way to go back and revise. I hope to continue this habit throughout the rest of my work as well. Another good thing I learned is to actually try out something related to whatever new technology I learn; when such practice work is pushed to a GitHub repository, the new knowledge remains available for the future as well. This is another good habit that I hope to continue. 

All in all, the LITG program was a great influence on me, adding many good habits to my career path. The new experiences and the learning gained throughout this program will be much needed and helpful ahead in my life to achieve my career goals.

Tenth Week of LITG Program

It is the tenth week of the LITG program and we have almost reached the end of the project. This week is dedicated to deploying the developed API to the Heroku platform using the basic understanding gained during the last week.

I ran into a number of difficulties while pushing the API to Heroku. The tutorial that I was referring to in the first place was outdated, which resulted in many errors when I followed it. After some effort and a bit of Googling, I found the official documentation of the Heroku platform, available at https://devcenter.heroku.com/articles/getting-started-with-python#introduction. This tutorial gives step-by-step guidance to successfully deploy a Python application on the Heroku platform in a very intuitive manner. It also gives instructions for running the app locally so that it can be tested on localhost; nevertheless, I deployed the app directly on Heroku.

According to this tutorial, there are a number of steps to follow in order to deploy a Python application on the Heroku platform. For these steps to be successful, two more files should be added to the API, namely the requirements.txt file and the Procfile. These are explained next.

requirements.txt File


As mentioned in the tutorial at https://devcenter.heroku.com/articles/getting-started-with-python#declare-app-dependencies, the purpose of this file is to declare the app's dependencies. Heroku recognizes an app as a Python app by the existence of this file in the root directory. For example, the QUserAPI that I developed lists the following Python packages, along with their versions, as dependencies in its requirements.txt file.

Flask==0.10.1
Jinja2==2.8
Werkzeug==0.11.3
gunicorn==19.4.5
itsdangerous==0.24
MarkupSafe==0.23
newrelic==2.60.0.46
scrape_quora==0.1.3
wsgiref==0.1.2
lxml==3.5.0

When an app is deployed, Heroku reads this file and installs the appropriate Python dependencies using the pip install -r requirements.txt command.
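The name==version pin format above is simple enough to parse by hand. As a small aside (not part of the project's code), this sketch shows the pairs that pip effectively reads from such a file:

```python
# Small aside: the requirements.txt pin format ("name==version")
# parsed into pairs, i.e. what pip effectively reads line by line.
def parse_requirements(text):
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        name, _, version = line.partition("==")
        pins[name] = version
    return pins

REQS = """\
Flask==0.10.1
gunicorn==19.4.5
scrape_quora==0.1.3
"""

print(parse_requirements(REQS)["Flask"])  # -> 0.10.1
```

In practice the file is usually generated inside the project's virtualenv with pip freeze > requirements.txt, which emits exactly this pinned format.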

Procfile


As mentioned at https://devcenter.heroku.com/articles/getting-started-with-python#define-a-procfile, this file, included in the root directory of the app, explicitly defines the command that should be executed to start the app. For example, QUserAPI contains the following command in its Procfile.

web: newrelic-admin run-program gunicorn -b 0.0.0.0:$PORT server:app

This file declares a single process type, web, and the command needed to run it. The name web declares that this process type will be attached to the HTTP routing stack of Heroku, and receive web traffic once deployed.

There are two other Python packages used in this command, namely newrelic and gunicorn. newrelic is a package that instruments applications for performance monitoring and advanced performance analytics with New Relic; it helps trace performance issues of applications even while monitoring them in production environments. gunicorn, short for 'Green Unicorn', is a Python WSGI HTTP server for UNIX that is broadly compatible with various web frameworks. It basically serves Python web applications over HTTP.

The $PORT part of the command binds the server to the port that Heroku assigns to the app at runtime, via the PORT environment variable.
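Inside the app this means reading the PORT variable from the environment and falling back to a local default when it is absent. A minimal sketch of that common pattern (the fallback of 5000 is just a typical development choice, not something from the project):

```python
# The usual pattern for honouring Heroku's PORT variable: read it
# from the environment, falling back to a local default (5000 here
# is just a common development choice, not from the project).
import os

def resolve_port(environ=os.environ, default=5000):
    return int(environ.get("PORT", default))

# Heroku sets PORT at dyno start-up:
print(resolve_port({"PORT": "33507"}))  # -> 33507
# Locally, with no PORT set, the fallback applies:
print(resolve_port({}))                 # -> 5000
```

With the Procfile above, gunicorn performs this binding itself via the -b 0.0.0.0:$PORT flag, so the app code never hard-codes a port.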

Once these files and the code for the API are ready, we can move on to the steps to deploy the app on Heroku.

Deploying the Application


The steps to deploy the app fall into a number of sub-activities, and they are discussed under those sub-activities below.

Initial Steps


1. Create a free Heroku account
This is required since authentication is needed for the heroku and git commands to work in an upcoming step.

2. Install virtualenv locally using the command pip install virtualenv on the terminal. 
(In addition, a Python installation is required on the system; in my case I already had Python installed.)


Set up


3. Install the Heroku Toolbelt, which provides access to the Heroku Command Line Interface (CLI). The documentation at https://devcenter.heroku.com/articles/getting-started-with-python#set-up provides downloads of the Toolbelt version compatible with the OS used. Once it is installed, we can use the heroku command from the terminal.

4. The next step is to log in to Heroku using the heroku login command on the terminal.
The email address and password of the free Heroku account created earlier can be used for this login.


Prepare the app


5. First, go to the project folder (root directory) using the cd command on the terminal.


Deploy the app


6. Create an app on Heroku using the command heroku create which prepares Heroku to receive the source code.
By default, Heroku gives a randomly generated name to this app, which also appears in the URL used to access the application once deployed. I ran into a small issue at this point: at first I did not know that Heroku assigns such a default name, so I ran the command with no arguments and it generated the name floating-taiga-50750. Since I wanted the name quser-api instead, I ran the command as heroku create quser-api, which created an app with that name.

As mentioned in the documentation, when a new app is created in this manner, a git remote called heroku is also created and associated with this local git repository.

7. Next deploy the source code using the command git push heroku master.

At this step I got the following error.

remote: Compressing source files... done.
remote: Building source:
remote:
remote:

remote: ! Push rejected, no Cedar-supported app detected
remote: HINT: This occurs when Heroku cannot detect the buildpack
remote: to use for this application automatically.
remote: See https://devcenter.heroku.com/articles/buildpacks
remote:
remote: Verifying deploy...
remote:
remote: ! Push rejected to floating-taiga-50750.
remote:
To https://git.heroku.com/floating-taiga-50750.git
! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'https://git.heroku.com/floating-taiga-50750.git'

I noticed that the name of the app in the error was the random name (floating-taiga-50750) first generated by Heroku, not the name I created afterwards. After some Googling I found the fix at http://stackoverflow.com/questions/31330587/heroku-error-message-no-cedar-supported-app-detected. According to this article, reinitializing the .git files can fix the issue, so I ran the following commands in the given order to resolve the error.

rm -rf .git
git init
git add .
git commit -am "Reinitialize"
heroku create quser-api

I changed the name to quser-api since I needed the application to have that name on Heroku. But then I got another error.

Creating ⬢ quser-api... !!!
▸ Name is already taken

To solve this problem, I removed all the existing apps under my user account on the Heroku web dashboard. Then I ran the heroku create quser-api command again and the issue was resolved. Next, I ran the git push heroku master command to deploy the project on Heroku.

8. Visit the app at the URL generated from the app name (http://quser-api.herokuapp.com/). 
We can also use the command heroku open on the terminal as a shortcut to open the website. I then tested all the routes of the API at their respective URLs.

The app was successfully deployed on Heroku. With this, I have completed all the work of my project on creating an API to scrape Quora user profiles. The final touch-ups and my overall learning throughout the project will be summarized in next week's blog post.

Saturday, April 23, 2016

Ninth Week of LITG Program

So far in the LITG program I have coded the pyquora package and pushed it to the Python Package Index. I have also coded QUserAPI, which uses the pyquora package to return Quora user profile information in JSON format via REST calls. Information related to all these steps is included in the blog posts written thus far. Now the final bit left is to deploy this API onto the Heroku platform and test its functionality. This week is therefore dedicated to finding out about Heroku and acquiring a basic understanding; during the next week I will move on to deploying the API on the Heroku platform. My learning about the Heroku platform is summarized in this blog post. 


The Heroku Platform

Simply put, Heroku provides a cloud-based platform (Platform as a Service, PaaS) for deploying and running modern apps. It is free to get started with and only charges developers as they grow. It is based on a managed container system: a smart container, also known as a dyno, is an instance of the application running and responding to requests, and Heroku provides one dyno for free. Heroku also has integrated data services. Developers do not have to work out how to optimally provision a database through trial and error; they have immediate access to a scalable, highly available database with rollback that supports their apps and development style. Heroku is also embedded in a powerful ecosystem. 

The platform relieves developers of infrastructure headaches and lets them focus on developing great apps. The objective of the Heroku platform is to make the process of deploying, configuring, scaling, tuning, and managing apps as simple and straightforward as possible. This makes the Heroku developer experience an app-centric one for software delivery, integrated with the most popular developer tools and workflows today. There are three key aspects of the Heroku platform, described below. 

Heroku Runtime

As mentioned before, Heroku runs all apps inside dynos, which are smart containers on a reliable, fully managed runtime environment. Developers can deploy code written in Node, Ruby, Java, PHP, Python, Go, Scala, or Clojure, and the runtime keeps apps running without any manual intervention.

Heroku Developer Experience

The Heroku Developer Experience refers to an app-centric approach to software delivery. Developers can focus solely on creating and continuously delivering applications, without worrying about servers or the underlying infrastructure. They can deploy directly from popular tools like Git, GitHub or Continuous Integration (CI) systems. There is also a web-based Heroku Dashboard, which makes it much easier to manage an app and gain insight into its performance.

Data Services and Ecosystem

Heroku Elements allows developers to extend their apps with Add-ons and customize their application stack with Buildpacks. Add-ons are third-party cloud services that developers can use to immediately extend their apps with a range of functionality such as data stores, logging, monitoring and much more. Heroku provides two fully managed data service Add-ons, namely Heroku Postgres and Heroku Redis.

With this basic understanding gained, I am looking forward to deploying QUserAPI on Heroku during the next week. When deploying, there will be several other requirements, such as the Procfile and the requirements.txt file, which will be discussed in detail in the coming week's blog post. 

Friday, April 15, 2016

Eighth Week of LITG Program

The task of the 8th week is to code the API (QUserAPI) that returns information from Quora user profiles. This API uses the pyquora package developed during the fifth week of the LITG program. I followed the Horoscope-API (https://github.com/tapasweni-pathak/Horoscope-API) developed by my mentor as a reference when coding QUserAPI. The final API is available in the GitHub repository at https://github.com/hansika/QuserAPI. The API consists of the following files. 
  • License.md 
  • Procfile 
  • README.md 
  • requirements.txt 
  • server.py 
Out of these, server.py is the file coded during this week. The License.md and README.md files were added to the project back in the first and second weeks of the program; of course, the README.md file was modified this week to include the features of the API. The Procfile and requirements.txt are two files required by the Heroku file structure, where we expect to deploy the API over the next weeks. These files will be explained in detail in an upcoming week's blog post. 

The server.py file was coded similarly to the server.py file of Horoscope-API, which was studied in depth during the last week. It uses the Flask Python framework. This file has one method for each of the features of the pyquora package. These features and their corresponding methods are as follows. 
  • Quora Profile Name - profile_name_route(user_name)
  • Quora Profile Picture Link - profile_picture_link_route(user_name)
  • Quora Profile URL - url_route(user_name)
  • Number of Questions - no_of_questions_route(user_name)
  • Number of Answers - no_of_answers_route(user_name)
  • Number of Followers - no_of_followers_route(user_name)
  • Number of Following - no_of_following_route(user_name)
  • Number of Edits - no_of_edits_route(user_name)
All the above methods call methods of the pyquora package in order to scrape Quora user profiles. There is also another method named index_route(), which returns additional details about the API such as the author, project name, project URL, project issues, base URL and endpoints. All these methods have their own route decorators and all use GET requests. Furthermore, all the methods return their results as JSON objects using Flask's jsonify method, which was discussed in detail in the blog post of the seventh week. A few example route decorators and their corresponding methods are shown below. 
  • index_route() - @app.route('/', methods=['GET'])
  • profile_name_route(user_name) - @app.route('/profile/name/<user_name>', methods=['GET'])
  • profile_picture_link_route(user_name) - @app.route('/profile/picture_link/<user_name>', methods=['GET'])
  • url_route(user_name) - @app.route('/url/<user_name>', methods=['GET'])
  • no_of_answers_route(user_name) - @app.route('/profile/number/answers/<user_name>', methods=['GET'])
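The route decorators above all follow one pattern; here is a minimal, self-contained sketch of it. The real server.py delegates to the pyquora package, whose internals are not shown in this post, so a stub stands in for the scraper here.

```python
# Minimal sketch of the route pattern described above. The real
# server.py delegates to the pyquora package; fake_scrape_name is
# an invented stub so the example is self-contained.
from flask import Flask, jsonify

app = Flask(__name__)

def fake_scrape_name(user_name):
    # Stand-in for the pyquora call; invented for illustration.
    return user_name.replace("-", " ")

@app.route('/profile/name/<user_name>', methods=['GET'])
def profile_name_route(user_name):
    # jsonify wraps the result in a JSON response, as in server.py.
    return jsonify(name=fake_scrape_name(user_name))

if __name__ == "__main__":
    app.run()
```

A GET request to /profile/name/Jane-Doe on this sketch returns the JSON body {"name": "Jane Doe"}.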
The tasks to be completed during the coming weeks are to read about Heroku and finally deploy this API on it. Therefore, the next blog post will basically be about what Heroku is. 

Thursday, April 14, 2016

Sixth Week of LITG Program

The main task expected to be completed during the sixth week is to push the developed scrape_quora package to the Python Package Index, also known by the shortened form PyPI. The Python Package Index is a repository of software for the Python programming language; at the time of writing this article, there are 78557 packages in the index. Once you have created some awesome piece of software using Python, you can simply push it to the Python Package Index and let people install it using pip install. You can also use PyPI's test server to test the developed package. Pushing a package to the Python Package Index requires a special directory structure, which was explained in detail in the blog post titled 'Fifth Week of LITG Program - Part 2'. Nevertheless, I re-post the required directory structure in this article as well.

Python Package Directory Structure

Once the package is ready, we need a few other things before moving on to pushing it to the Python Package Index. 

Before pushing the package directly to the live server, we need to push it to the test server and test the package using the pip install command. Therefore, first of all, we need user accounts on both of these servers. 

  • .pypirc configuration file:
This file basically contains the information to authenticate the user with PyPI test and live servers. On a Linux machine, this configuration file should be in the home directory. 

Once we are done with all these steps, all we have to do is work through a sequence of terminal commands to push the package to the Python Package Index.


1.   Register the package against PyPI's test server. 
python setup.py register -r test 

At this step I got several errors. First I got the following error.

Traceback (most recent call last):
 File "setup.py", line 10, in <module>
 packages = ['scrape_quora']
 File "/usr/lib/python2.7/distutils/
 dist.run_commands()
 File "/usr/lib/python2.7/distutils/
 self.run_command(cmd)
 File "/usr/lib/python2.7/distutils/
 cmd_obj.run()
 File "/usr/lib/python2.7/dist-
 _register.run(self)
 File "/usr/lib/python2.7/distutils/
 self._set_config()
 File "/usr/lib/python2.7/distutils/
 config = self._read_pypirc()
 File "/usr/lib/python2.7/distutils/
 current['username'] = config.get(server, 'username')
 File "/usr/lib/python2.7/
 raise NoSectionError(section)
ConfigParser.NoSectionError: No section: 'pypi # the live PyPI'

I had to do some Google searching and also experiment with the content of the .pypirc configuration file to resolve this error. My .pypirc file looked as follows before the fix.


[distutils] # this tells distutils what package indexes you can push to
index-servers =
    pypi # the live PyPI
    test # test PyPI

[test] # authentication details for test PyPI
repository = 'https://testpypi.python.org/pypi
username = <your_user_name>
password = <your_password>

[pypi] # authentication details for live PyPI
repository = https://pypi.python.org/pypi
username = <your_user_name>
password = <your_password>

I had to remove the two inline comments on the index-servers entries (# the live PyPI and # test PyPI) to resolve the aforementioned error; as the traceback shows, ConfigParser was treating each comment as part of the server name. After that I got a different error.

Registering scrape_quora to 'https://testpypi.python.org/pypi 
Server response (500): <urlopen error unknown url type: 'https> 

For this too I did some Google searching but could not find a workaround. It later turned out that I had inserted an unwanted single quote at the beginning of the URL of the test server; after removing it, this error was easily resolved.

The next error encountered concerned the authentication details: a 401 authentication failed error. The PyPI live and test servers let users log in with their Google accounts, so at first I had included my Gmail address and password as the credentials in the .pypirc file. Later my mentor instructed me to create user accounts on both websites instead of using the Gmail credentials. Once I updated the .pypirc file with these new login details, the 401 error was resolved and I could successfully register the package against PyPI's test server.
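For reference, after removing the inline comments and the stray quote, the working .pypirc looked like this:

```
[distutils]
index-servers =
    pypi
    test

[test]
repository = https://testpypi.python.org/pypi
username = <your_user_name>
password = <your_password>

[pypi]
repository = https://pypi.python.org/pypi
username = <your_user_name>
password = <your_password>
```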

2.   Upload the package to PyPI's test server to test. 

python setup.py sdist upload -r test 

This command did not give any errors, and I could successfully upload the package to PyPI's test server. After uploading, however, there was a small issue with the format of the README file, and I had also been asked earlier to add more test cases to the package. So I deleted the already uploaded package, made these two modifications and tried to re-upload the package with the same version number (0.1.0). That produced the following error.

Submitting dist/scrape_quora-0.1.0.tar.gz to https://testpypi.python.org/pypi 
Upload failed (400): This filename has previously been used, you should use a different version. 

The version number of the package takes the following format. 

<major>.<minor>.<patch>

So the patch number had to be bumped once for every modification: after adding one more test case it became 1, and after changing the README file it became 2, giving a final version number of 0.1.2. The sequence of version changes was recorded in the CHANGES.txt file, and the setup.py file was updated accordingly. Once this was done, uploading the package to the test server went fine. The package is accessible at the test server on the URL https://testpypi.python.org/pypi/scrape_quora/0.1.2
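The bump itself is mechanical; as a minimal sketch of the versioning scheme (the version values here are illustrative):

```python
# Sketch of the <major>.<minor>.<patch> bump described above
version = '0.1.0'
major, minor, patch = (int(part) for part in version.split('.'))

patch += 1  # one bump per re-uploaded modification
new_version = '{0}.{1}.{2}'.format(major, minor, patch)
print(new_version)
```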

3.   Install package from test PyPI server. 

pip install -i https://testpypi.python.org/pypi scrape_quora 

I could successfully install the package from the test server using the above command. 

After testing with PyPI's test server I went ahead to upload the package to PyPI's live server. The following three commands were used for the process. 

1.   Register the package against PyPI's live server. 

python setup.py register -r pypi 


2.   Upload the package to the live server of PyPI. 

python setup.py sdist upload -r pypi 


3.   Install the package to the machine. 

pip install scrape_quora 


The uploaded Python package is available at the URL https://pypi.python.org/pypi/scrape_quora/0.1.2. I also created a small Python file to test some of the features of the package, and all of them worked as expected. 

During the next weeks I will be acquiring a basic understanding of how to develop an API using the Flask framework. I will then create an API that uses this package to retrieve Quora user account information on demand. 

Sunday, April 10, 2016

Seventh Week of LITG Program

The remaining tasks expected to be completed over the next few weeks are to code the API for returning the Quora user profile information and to deploy it to Heroku. Before that, I was instructed by my mentor to acquire a fundamental understanding of how to work with Flask, a Python web framework. It provides the developer with tools, libraries and technologies for building web applications such as a blog, a wiki page or even a commercial website. Flask is a micro-framework: micro-frameworks are frameworks with few or no dependencies on external libraries. This keeps the framework light, with few dependencies to update, but it also means that sometimes the developer has to do more work himself. Flask has essentially two dependencies, namely Werkzeug, a WSGI utility library, and Jinja2, its template engine. 

I followed the documentation available at http://flask.pocoo.org/docs/0.10/quickstart/#http-methods to learn about Flask. I experimented with several special commands available with Flask. These exercises were pushed to the github repository at https://github.com/hansika/Flask_Learning

Using Flask in our Applications

First of all, if we want to use Flask in our web applications, we need to import Flask using the following statement.

from flask import Flask

The next task is to create an instance of the class Flask. The first argument is the name of the application's module or package. If it is a single module, the argument should be __name__. 

app = Flask(__name__)

To run the local server with our application we use the run() function as in the following statement. 

if __name__ == '__main__':
    app.run()

The if __name__ == '__main__': check makes sure the server only runs when the script is executed directly by the Python interpreter, not when it is imported as a module. With these initial statements in place, we can go ahead and explore other features of Flask.
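Putting the statements above together, here is a minimal sketch of a complete Flask module (the route and return value are illustrative); Flask's built-in test client lets us exercise the route without actually starting the development server:

```python
from flask import Flask

app = Flask(__name__)

@app.route('/')
def index():
    return 'Hello, World!'

# during development you would guard app.run() with
# if __name__ == '__main__':; here the built-in test
# client exercises the route without blocking
client = app.test_client()
body = client.get('/').data
print(body)
```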

Debug Mode

As changes are made to the code, the server has to be restarted. With debug mode on, however, the server reloads itself whenever the code changes. Debug mode can be enabled in either of two ways.

app.debug = True
app.run()

or

app.run(debug=True)

Routing

The route() decorator is used to bind a function to a URL. Following are a few examples.

@app.route('/')
When running on localhost port 5000, this function can be called by using the URL http://127.0.0.1:5000/ 

@app.route('/hello')
This function can be called by using the URL http://127.0.0.1:5000/hello

Variable Rules

Variable rules are used to make certain parts of the URL dynamic. These special parts are denoted in the route as <variable_name>. These parts are passed to the function as keyword arguments. An example is shown below. 

@app.route('/user/<username>')

Optionally a converter can be used by specifying a rule with <converter:variable_name>, as in the following example.

@app.route('/post/<int:post_id>')
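As a small sketch of both rules (the function names and return values are assumptions for illustration):

```python
from flask import Flask

app = Flask(__name__)

@app.route('/user/<username>')
def show_user(username):
    # the dynamic part is passed in as a keyword argument
    return 'User: ' + username

@app.route('/post/<int:post_id>')
def show_post(post_id):
    # the int converter hands the function a Python int
    return 'Post #{0}'.format(post_id)

client = app.test_client()
user_body = client.get('/user/alice').data
post_body = client.get('/post/42').data
```

A URL segment that does not match the converter, such as /post/abc, is rejected with a 404 instead of reaching the function.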

Redirection Behavior

Consider the following example.

@app.route('/projects/')

Here, the URL with a trailing slash is similar to a folder on a file system. When we try to access it without the trailing slash, Flask will redirect it to the URL with the trailing slash. 

@app.route('/about')

In this example, there is no trailing slash. This is similar to the pathname of a file on UNIX-like systems. When you try to access this URL with the trailing slash, it produces a 404 “Not Found” error.

URL Building

To build a URL to a specific function we can use the url_for() function. It accepts the name of the function as its first argument and a number of keyword arguments, each corresponding to a variable part of the URL rule. Unknown variable parts are appended to the URL as query parameters. Building a URL in this manner is more beneficial than hard-coding it, especially when we want to change these URLs: we can then change them in one go without having to remember to change URLs all over the place. Two examples are shown below.

url_for('login', next='/') - 'next' appended as a query parameter.
url_for('profile', username='John Doe') - username sent to the dynamic part of the URL.
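These two calls can be tried out with test_request_context(), which pushes a request context so url_for() works outside a real request (the endpoint functions below are minimal stand-ins):

```python
from flask import Flask, url_for

app = Flask(__name__)

@app.route('/login')
def login():
    return 'login page'

@app.route('/user/<username>')
def profile(username):
    return 'profile of ' + username

with app.test_request_context():
    # 'next' is not part of the rule, so it becomes a query parameter
    login_url = url_for('login', next='/')
    # username fills the dynamic part of the rule, URL-quoted
    profile_url = url_for('profile', username='John Doe')
    print(login_url)
    print(profile_url)
```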

Static Files

Dynamic web applications also need static files such as JavaScript files and CSS files. During development Flask can serve these static files as well. All we have to do is to create a folder called static in the package or next to the module and it will be available at '/static' on the application.

To generate URLs for static files, we need to use the special 'static' endpoint name.

url_for('static', filename='style.css')

The file has to be stored on the filesystem as static/style.css. When running on localhost this CSS file can be accessed via the URL http://127.0.0.1:5000/static/style.css

Rendering Templates

To render an HTML template, the render_template() method can be used. For that we provide the name of the template and the variables to pass to the template engine as keyword arguments. Shown below is one example of this. 

from flask import render_template

@app.route('/hello/<name>')
def hello(name=None):
    return render_template('hello.html', name=name)

Flask looks for templates in the templates folder. Therefore this templates folder should either be next to your module or else if it is a package, this folder should be inside the package.

The Request Object

For a request object the route decorator looks like shown below. 

@app.route('/login', methods=['POST', 'GET'])

The current request method is available through the method attribute. For example, we can check whether the method is POST using the following statement.

if request.method == 'POST':

Furthermore, form data (data transmitted in a POST or PUT request) can be accessed using the form attribute. An example is shown below. 

if valid_login(request.form['username'], request.form['password']):
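A minimal sketch tying these pieces together (the valid_login() helper and the credentials are hypothetical, purely for illustration):

```python
from flask import Flask, request

app = Flask(__name__)

def valid_login(username, password):
    # hypothetical credential check, for illustration only
    return username == 'admin' and password == 'secret'

@app.route('/login', methods=['POST', 'GET'])
def login():
    if request.method == 'POST':
        # form data from the POST body
        if valid_login(request.form['username'],
                       request.form['password']):
            return 'Logged in'
        return 'Invalid credentials'
    return 'Please log in'

client = app.test_client()
get_body = client.get('/login').data
post_body = client.post('/login',
                        data={'username': 'admin',
                              'password': 'secret'}).data
```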

Horoscope-API 

After obtaining a basic knowledge of Flask, I studied a sample REST API developed by my mentor: the Horoscope-API, deployed on Heroku and built with Flask. It uses the horoscope package, developed to fetch and parse data from GaneshaSpeaks. The API has methods to return today's horoscope, the weekly horoscope, the monthly horoscope and the yearly horoscope, along with a method to learn more about a given sunsign.

GET requests have been used for all the methods in the application. Shown below is the route decorator for the index_route function which returns basic details of the project and the author such as the author name, author URL, project name, project URL etc. 

@app.route('/', methods=['GET']) 

These details are returned as a JSON object using Flask's jsonify() method, which creates a response with the JSON representation of the given arguments. The arguments to this method can take any one of the following three forms. 

jsonify(**kwarg)

jsonify(mapping, **kwarg)

jsonify(iterable, **kwarg)

mapping is a positional argument. It actually takes the form of a dictionary having key-value pairs. The key becomes the key in the JSON object and the value becomes the corresponding value. **kwarg denotes a set of keyword arguments. When used with keyword arguments, the argument name becomes the key in the JSON object and the argument value becomes the corresponding value. This method can also accept a positional argument which is an iterable object. Each item in the iterable must itself be an iterable with exactly two objects. The first object of each item becomes a key in the JSON object, and the second object the corresponding value. If a key occurs more than once, the last value for that key becomes the corresponding value in the JSON object. Following is an example JSON response returned when using keyword arguments. 

return jsonify(username=g.user.username, email=g.user.email, id=g.user.id)

This returns the following JSON response. 

{
"username": "admin",
"email": "admin@localhost",
"id": 42
}

Horoscope-API uses the first two forms of arguments for the jsonify method. index_route method uses mapping object type positional argument and all the other methods use keyword arguments to the jsonify method.
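As a sketch of the two forms Horoscope-API uses (the route paths and field values here are illustrative, not the API's actual ones):

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/')
def index_route():
    # mapping-style positional argument
    return jsonify({'project_name': 'scrape_quora', 'language': 'Python'})

@app.route('/author')
def author_route():
    # keyword-argument style
    return jsonify(author_name='Jane Doe', author_id=1)

client = app.test_client()
index_json = client.get('/').get_json()
author_json = client.get('/author').get_json()
print(index_json)
print(author_json)
```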

Another special function used by the Horoscope-API is the built-in dict() constructor. It accepts the same forms of arguments as described above for the jsonify method. Its task is to return a new dictionary from the arguments passed to it; if no positional argument is given, an empty dictionary is created. The methods of the horoscope package return their results embedded in Python dictionary objects, and Horoscope-API creates new dictionaries from these by passing them as mapping arguments to dict(). Shown below are two equivalent ways of constructing the same dictionary, the first using dict() with keyword arguments and the second using a literal. Both produce the dictionary {"one": 1, "two": 2, "three": 3}.

a = dict(one=1, two=2, three=3)
b = {'one': 1, 'two': 2, 'three': 3}

With the knowledge gained regarding the Flask web framework and the Horoscope-API I will be starting to code the API to return the Quora user profile information over the next week.