Reading Files into Python using Pandas

So far we have written scripts that use data we coded ourselves, such as creating a list in Visual Studio and putting some elements in it. That is neither practical nor realistic in the real world, where databases and files exist. In a later post we will explore databases and how to connect to them using Python, but for now let’s use real-world data from online sources.

We will explore a dataset from the United Nations on imports/exports by Country. Open this link and search for “trade of goods”. A window will open with the results of the search, now click on:

Click on Download and choose the comma-separated values (CSV) format.

This will download the file in a zip format. Unzip it to a custom folder in your C drive and call the folder “tradeDataSet”. Once the file is in there, rename it to just trade. It should look like this:

Now that we have the file on our computer, it is time to create a new Visual Studio project. Call the project “importDataFromCSV”. Check this post if you don’t remember how to create a new project. Also remember to change the environment to honeyBadger.

All right, since we want to do some data analysis on this data, we can use a library called Pandas. Using the instructions in this post (go to the Anaconda Navigator section), install this library using Anaconda Navigator. You want just that: Pandas.

Your Visual Studio environment should look like this:

Now let’s import this library into our project

import pandas as pd

Now every time we need something from this library we can use the alias pd instead of the full name of pandas.

Pandas has an incredibly easy way to read a CSV file: the read_csv function. You can read about it here.

So in our code we can just do the following:

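import pandas as pd
trade = pd.read_csv(r'C:\tradeDataSet\trade.csv')  # path assumes the folder and file name chosen above
#the head method prints the first 5 rows of the dataframe
print(trade.head())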

Running this code produces the following result:

The dataset consists of 9 columns, and we can see what the data is about: commodities that are either imported or exported, and their value in US dollars. That’s interesting so far, but what else can we learn about the data?

We can use the describe method to get some insights on the numerical columns of our dataframe

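import pandas as pd
trade = pd.read_csv(r'C:\tradeDataSet\trade.csv')  # path assumes the folder and file name chosen above
#the describe method summarizes the numerical columns of the dataframe
print(trade.describe())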

Describe gives us some basic statistics for the numerical columns. We can see, however, that applying it to Year is useless, and there seem to be far too many zeros in Weight and Quantity, so these 2 columns are of little value for statistical analysis.

Let’s find out which countries are the biggest importers. This means we want to slice the dataframe by the Flow column on the value Import. However, hold on. We see in the top 5 rows that for this field there are 2 values: Import and Export. Are there any other values?

Let’s explore that

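import pandas as pd
trade = pd.read_csv(r'C:\tradeDataSet\trade.csv')  # path assumes the folder and file name chosen above
#group the rows by the Flow column and summarize each group
print(trade.groupby(['Flow']).describe())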

We now see that this field has other values: Re-Import and Re-Export

And according to the glossary of the UN DB where we obtained the data, Re-Imports should be accounted for in statistics of Importing goods. So let’s get the Imports and Re-Imports:

import pandas as pd
trade = pd.read_csv(r'C:\tradeDataSet\trade.csv')  # path assumes the folder and file name chosen above
#keep only the rows whose Flow is Import or Re-Import
valuesWeWant = ['Import', 'Re-Import']
tradeImports = trade[trade['Flow'].isin(valuesWeWant)]
print(tradeImports.groupby(['Flow']).describe())

Line Analysis

The valuesWeWant line: Here we create a list with the row values we want from the Flow column of the dataset.

The tradeImports line: Here we create a new dataframe called tradeImports. The statement trade['Flow'].isin(valuesWeWant) marks every row whose Flow value appears in the valuesWeWant list, and we use that to “slice” the original trade dataframe.

The print line: We print our new dataframe, but first we use the groupby method. This aggregates the dataframe by the column Flow.

This produces only our new data frame with just Import and Re-Import.
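To see the isin filter on its own, here is a minimal sketch that uses a tiny made-up dataframe instead of the UN file. The column names match the ones discussed in this post, but the rows and values are hypothetical.

```python
import pandas as pd

# Hypothetical miniature of the trade table; the values are made up
trade = pd.DataFrame({
    'Country or Area': ['A', 'A', 'B', 'B', 'C'],
    'Flow': ['Import', 'Export', 'Re-Import', 'Re-Export', 'Import'],
    'Trade (USD)': [100, 250, 40, 60, 75],
})

valuesWeWant = ['Import', 'Re-Import']

# isin returns one True/False per row of the Flow column
mask = trade['Flow'].isin(valuesWeWant)
print(mask.tolist())

# Indexing the dataframe with that mask keeps only the True rows
tradeImports = trade[mask]
print(tradeImports)
```

Printing the mask first is a handy way to check which rows will survive before you slice the real dataframe.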

Let’s check our new dataset tradeImports by getting the first 10 records.

print(tradeImports.head(10))

Now, how can we get the top countries that have the most Trade (USD)?


print(tradeImports.groupby('Country or Area').sum(numeric_only=True).sort_values(by=['Trade (USD)'], ascending=False))

Whoa! So what is this statement?? Let’s dissect it.

Pandas allows you to chain operations on a dataframe. In this case the first operation is the groupby that we already used, but now on the Country or Area column. Then we sum the results, and finally we sort them by the Trade (USD) column in descending order.

This displays the top countries by trade, which is what we wanted. We should note, however, that this is the total over the whole dataset; it does not account for changes over the years (the Year column has been summed as well, which is meaningless).
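One way to keep the years apart is to group by two columns instead of one. The sketch below uses a small made-up dataframe (the countries and amounts are hypothetical) and selects only the Trade (USD) column before summing, so Year is never added up.

```python
import pandas as pd

# Hypothetical import records; the values are made up
tradeImports = pd.DataFrame({
    'Country or Area': ['A', 'B', 'A', 'B'],
    'Year': [2015, 2015, 2016, 2016],
    'Trade (USD)': [100, 300, 150, 50],
})

# All years lumped together, largest total first
totals = (tradeImports.groupby('Country or Area')['Trade (USD)']
          .sum()
          .sort_values(ascending=False))
print(totals)

# One total per country and per year
by_year = tradeImports.groupby(['Country or Area', 'Year'])['Trade (USD)'].sum()
print(by_year)
```

The second result has one row per (country, year) pair, which makes it possible to see how a country's imports change over time.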

Python Lists

Let’s now dive deeper into Python lists. We saw a list in this post; it is basically a way to store data, which makes it one of Python’s data structures (there are more).

Go ahead and create a new project called pyLists. Check the previous post if you are unsure how to create a new project.

Now create the following list and add a print and run it.



We get this message!

Oh damn, Visual Studio didn’t like that. It is telling us that ‘r’ is not defined. If we look at our list, it is a combination of numbers and letters. The numbers didn’t cause any errors; it crashed on the letters. That’s because when the program runs, Python assumes that an unquoted r is the name of a variable, and it cannot find that variable anywhere (not defined).

To fix this we need to put all the letters in quotation marks (single or double works):



This runs our code without errors.
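Here is a minimal sketch of both versions side by side. The list contents are hypothetical (the post's original list is not reproduced here), and the try/except is only there so the script keeps running after the error.

```python
# Hypothetical list contents, chosen only to reproduce the error
try:
    a = [1, 2, 3, r, t]   # r and t are unquoted, so Python looks for variables with those names
except NameError as error:
    message = str(error)
    print(message)

# With quotes, the letters are strings and the list is created normally
a = [1, 2, 3, 'r', 't']
print(a)
```

Python stops at the first name it cannot find, so the message only mentions r even though t has the same problem.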

What to do with lists?

Lists have a lot of interesting properties. Let’s explore a few of them, and by the end of this post we will see how to find an element in a list.

Our current list “a” is very small, so let’s expand it. We could do that by rewriting it and adding more elements, but let’s do it a different way. Create another list called “b”.


Then we can simply join the two lists by adding them together:



Now we have a long list but let’s make it even longer by doing this:



We have now multiplied our list by 10! The c=10*c statement means the following:

  • Python evaluates the right side first. The “c” there holds the current value, which in this case is our original list of a+b.
  • We grab this value and multiply it by 10, which repeats its elements 10 times.
  • This new c x 10 list is then assigned to “c”. This effectively replaces the old “c” with the new value of c x 10.
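The steps above can be sketched as follows. The contents of a and b are hypothetical; they are only chosen so that a + b has nine elements, which matches the counts used in this post.

```python
# Hypothetical lists with nine elements in total
a = [1, 2, 3, 'r', 't']
b = [4, 5, 'x', 'y']

c = a + b        # adding two lists builds a new list with the elements of both
print(len(c))    # 9

c = 10 * c       # the right side is evaluated first, then the result is stored back in c
print(len(c))    # 90
```

Neither a nor b is changed by these operations; only c is reassigned.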

Now that we have our list, let’s add one more element, which we will also use to search for. To add an element to the list we write (put this before the print):


Python appends this element to the very end of the list, which makes it element number 91. This raises the question: how many elements are in a list? We can find out with the following:


Python’s built-in len function gives us the number of elements in a container; in our case we get 91.
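As a small sketch of these two steps, assume a 90-element list like the one built earlier and a made-up word to append (both are hypothetical stand-ins for the values used in the post).

```python
# Hypothetical 90-element list: nine elements repeated 10 times
c = 10 * [1, 2, 3, 'r', 't', 4, 5, 'x', 'y']

c.append('honey')   # 'honey' is a made-up word; append adds it at the very end

print(len(c))       # 91
print(c[-1])        # honey
```

Keep in mind that Python counts positions from 0, so the 91st element lives at index 90 (or at index -1, counting from the end).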

Let’s assume now that we don’t know if an element is in a list, how do we find out?

For and If Statements

We will use a For loop combined with an If statement. Together these two allow us to loop over the elements of the list and find the word we are looking for.

Write this in your Visual Studio project:


for element in c:
    if element==searchWord:
        print('{} found at position {}'.format(searchWord,counter))
    counter+=1

Line 6

We create a variable called searchWord where we can store the word we are looking for.

Line 7

This is a counter variable, we will use it to see how many times we have looped over the list (to find where in the list is our searchWord)

Lines 9 to 12

We are looping over the c list and we are inspecting each element of it. This in a For loop takes the form of:

FOR Element IN Object:

Line 10 IF statement

The For loop starts with the first element of our list. If this element is equal to our searchWord, the code below the If statement (line 11) executes; if not, the For moves on to the next element of the list.

Line 11

So, if in any iteration of our For loop the current element is equal to our searchWord, this line will be executed:

print('{} found at position {}'.format(searchWord,counter))

You can think of this line as two different statements, both inside the print parenthesis:

  1. ‘{} found at position {}’: This part of the statement tells Python to write a variable, then the text “found at position”, and then another variable. The placeholders for the variables are marked by these curly braces {}.
  2. Then we add the .format method (don’t forget the “.”) and within its parentheses we write the variables, in the same order we want them to appear in the first part.

Notice that this line belongs to the If statement, we can identify that by the indentation
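To try the formatting part by itself, here is a minimal sketch outside of any loop. The values of searchWord and counter are hypothetical.

```python
searchWord = 'honey'   # hypothetical word
counter = 91           # hypothetical position

# Each {} is filled, in order, with the arguments passed to format
message = '{} found at position {}'.format(searchWord, counter)
print(message)         # honey found at position 91

# Since Python 3.6 an f-string gives the same result
same = f'{searchWord} found at position {counter}'
print(same == message) # True
```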

Line 12

This line does NOT belong to the If statement; it belongs to the For because of its indentation. Every time the For loop runs, its last operation increases our counter by 1 to keep track of our position in the list.
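Putting the pieces together, a complete version of the search could look like the sketch below. The list contents and the search word are hypothetical, and starting the counter at 1 (so that positions are counted from 1) is an assumption.

```python
# Hypothetical data: a 90-element list plus one appended word
c = 10 * [1, 2, 3, 'r', 't', 4, 5, 'x', 'y']
c.append('honey')

searchWord = 'honey'
counter = 1            # assumption: count positions starting at 1

found = []             # remember every position where the word appears
for element in c:
    if element == searchWord:
        print('{} found at position {}'.format(searchWord, counter))
        found.append(counter)
    counter += 1       # belongs to the for, so it runs on every iteration

# The built-in enumerate can keep the count for us
positions = [i for i, element in enumerate(c, start=1) if element == searchWord]
print(positions)
```

The enumerate version avoids the manual counter, which is one less thing to forget to increment.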

You should see the following when you run your code:

This concludes our lists exploration, eventually we will use lists to store data obtained from a database and we will perform ETL (Extract Transform Load) with them.

Python Charts with Bokeh

So, today we will create a simple chart with a library called Bokeh. It is a very easy to use library.

First create a new project and point the environment to our honeyBadger env, (Check this post to see how to do this)

Call the project: myFirstBokeh.

Your solution explorer should look like this after changing the environment:

So, how do we create a chart? Well, let’s see this Bokeh library, they have some neat examples in there

They have this code in there

from bokeh.plotting import figure, output_file, show

# prepare some data
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# output to static HTML file
output_file("lines.html")

# create a new plot with a title and axis labels
p = figure(title="simple line example", x_axis_label='x', y_axis_label='y')

# add a line renderer with legend and line thickness
# (legend_label replaces the older legend argument in current Bokeh releases)
p.line(x, y, legend_label="Temp.", line_width=2)

# show the results
show(p)

If we run this however, it won’t work! Also notice how it shows up in our Visual Studio:

This is Visual Studio saying “I have no idea what this is that you are asking me to import”.

Let’s analyze this for a second. We are asking Visual Studio to import:

  • figure
  • output_file
  • show

From something called bokeh.plotting

This means we are trying to import into our Python project those 3 functions of the bokeh.plotting module. These 3 do stuff that we need to produce a chart and the good thing is we only have to import them into our project and call them.

Anaconda Navigator

Ok, but how do we make Visual Studio get that library? That’s where our Anaconda Navigator comes in. Open it and click on our honeyBadger environment.


Once in here click on the dropdown (3) and change it to Not Installed and in the Search Packages box type Bokeh. You should see it coming up like this:

Now click on the checkbox (or right click on the name then Mark for Installation) and click Apply on the bottom right then Apply again on the popup

After it finishes, you can click on the dropdown again to see that it is there, also you can click on the “x” to remove the search and you will see all the modules installed in our honeyBadger environment.

All right, after we install Bokeh using Anaconda we can go back to Visual Studio. If we open the Python Environments window (Tools -> Python -> Python Environments) we will see that Visual Studio is working on integrating Bokeh into its IntelliSense. This basically means that when we type functions from that library, Visual Studio will open a little window (as we type) suggesting the parts that belong to the libraries we are using (you will see this soon).

You will also notice that in our Solution Explorer if we expand our honeyBadger environment we will have Bokeh in there.

All right, so now hit run and a web browser should open with the chart you programmed.

That was easy, wasn’t it? You can even interact with the chart using the tools on the top right. But let’s now analyze the code after the import statements.

Data to Plot

# prepare some data
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

A chart needs data to plot on the vertical axis (y) and another set of data for the horizontal axis (x).

Plot to HTML File

# output to static HTML file
output_file("lines.html")

Bokeh offers this amazing capability to write the chart to an HTML file, which you could then put on a web server.

A Figure

# create a new plot with a title and axis labels
p = figure(title="simple line example", x_axis_label='x', y_axis_label='y')

A figure is the area where the plot (the lines) will be drawn, think of it as the canvas of a picture. This Canvas has certain elements like the Title and the labels.

Plots Within the Figure

# add a line renderer with legend and line thickness
p.line(x, y, legend_label="Temp.", line_width=2)

Here we add a line plot to our figure p. We also pass the data we want to plot (x, y) and define the legend and the line width. Current Bokeh releases call the legend argument legend_label; older ones used legend.

Render the Figure

# show the results
show(p)

Finally, this line renders the chart, saves it to the HTML file and opens that file in the browser.

Now that you know how to get Bokeh running, try modifying the charts and also run some of the examples in the Bokeh website. Have fun Bokehing!

Simple Python Script with a Function

In the previous post, I detailed how to install everything to be able to run Python. Now we will create a simple Python script to have fun with what we created

Create New Project

Go ahead and open Visual Studio and create a new project.


Now let’s use the honeyBadger environment that we created in the previous post


Python Script Structure

Let’s talk about the structure of a Python script. Basically we can define 3 sections:

#Section 1
import antigravity
#Section 2
def someFunction():
     return stuff
#Section 3

Section 1: Import Libraries

The first section (import antigravity) tells Python to import the libraries that you need in your code. Libraries are pieces of code someone already created for a purpose. There are thousands of libraries out there; most of the time you only need an idea of what you want to do and there is already a library for it. Popular libraries include:

  • Pandas: Data analysis/manipulation.
  • Scikit-Learn: Machine learning library.
  • Matplotlib: This is a numerical plotting library.
  • Nltk: This is a great library that allows you to do a lot of text related analysis, I use it in conjunction with Scikit-Learn to create machine learning algorithms which are text based.
  • PyGame: Library to create 2D games.

In this simpleCode exercise we will not use any additional libraries (Python comes with a standard library with lots of goodies).

Oh and if you want to read more about the “antigravity” library please read this

Section 2: Functions and classes (classes will have their own post)

These are pieces of code that you can create and then call within your main code section (Section 3). If you have a repetitive task (like calculating the percentage for a tip in a restaurant) you could create a function called tip that accepts arguments and returns the desired value. For example

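def tip(someNumber):
     tipReturned = someNumber * 0.15  # 15% rate inferred from the 0.75 result for 5 quoted below
     return tipReturned

print(tip(5))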


You can run this code and get the result of 0.75, try also changing the 5 to some other number.


You might have noticed the indentation in the code. Python requires it (some people hate it, others love it) for functions, For loops, etc. It basically means the following:

I am defining something here:

     Everything below whatever I am defining that is indented belongs to what I defined

     This belongs as well

     And so does this

This is something different 

When writing a function for example

def func():

Right after you write the colon character and press enter, Visual Studio will do the indentation for you so you don’t have to do it manually.
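To make the indentation rule concrete, here is a minimal sketch. The function double and the values in the list are made up for illustration.

```python
def double(number):
    # These two indented lines belong to the function
    result = number * 2
    return result

# Back at the left margin, so this is no longer part of double()
for value in [1, 2, 3]:
    # Indented under the for, so this runs once per element
    print(value, double(value))

print('done')   # not indented, so it runs only once, after the loop ends
```

Moving the last print four spaces to the right would make it part of the loop, and it would then print 'done' three times.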

Section 3: The code

This is the section where you will make use of everything you defined in sections 1 and 2 (if needed).

Let’s go back to our tip calculator example. What if we need to calculate the tip for several values and not only 1? One way to do it would be something like this (if we have 8 values):

def tip(someNumber):
     tipReturned = someNumber * 0.15  # 15% rate inferred from the 0.75 result quoted earlier
     return tipReturned


This is, however, not good practice. There is a lot of unnecessary repetition, and what if we have 100 numbers that we need the calculation for?

One option is to use a Python list, which is basically an ordered collection of values such as numbers or strings. We can define our list like this:

def tip(someNumber):
     tipReturned = someNumber * 0.15  # 15% rate inferred from the 0.75 result quoted earlier
     return tipReturned



Unfortunately this will fail because our function is designed to calculate the tip for just one number; when we send the function the complete list, Python cannot apply the arithmetic to the list as a whole. So, how do we fix this?
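Here is a minimal sketch of that failure. It assumes the tip function multiplies its argument by 0.15 (a 15% tip), and the list of bills is hypothetical. The try/except is only there so we can look at the error message without the script stopping.

```python
def tip(someNumber):
    # Assumption: the tip is 15% of the amount
    return someNumber * 0.15

bills = [5, 12.5, 40]   # hypothetical amounts

try:
    tip(bills)          # passes the whole list, not a single number
except TypeError as error:
    message = str(error)
    print(message)
```

Python knows how to repeat a list a whole number of times (like 10 * c), but multiplying a list by 0.15 has no meaning, so it raises a TypeError.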

One option is to use a loop. A loop is basically a construct that repeats an operation until a certain condition is met. Let’s try a for loop:

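def tip(someNumber):
     tipReturned = someNumber * 0.15  # 15% rate inferred from the 0.75 result quoted earlier
     return tipReturned


for each in list:
     print(tip(each))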

This runs! And we get this result

Let’s break down that for statement

for each in list: this means that for each element in the list, Python does whatever is defined below the statement (in this case we do a print with a function call in it). The word “each” could be anything; you can change it to muffin or kitty and it will do the same. To summarize, this is a concise way to perform the same operation over the elements of a list (or array).
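A complete, runnable version of this idea could look like the sketch below. The 15% rate and the amounts are assumptions, and the list is named bills rather than list, because list is also the name of a Python built-in and reusing it hides that built-in.

```python
def tip(someNumber):
    # Assumption: the tip is 15% of the amount
    return someNumber * 0.15

bills = [5, 20, 40, 100]   # hypothetical amounts

# The loop variable can have any name; "each" is just a label
for each in bills:
    print(tip(each))

# The same loop written as a list comprehension, keeping the results
tips = [tip(each) for each in bills]
print(tips)
```

The comprehension is handy when you want to keep the calculated tips for later instead of only printing them.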

This concludes the simple Python code. In the next post we will learn how to create charts in Python.


Python For Beginners

How to Install Visual Studio + Python + Anaconda in Windows 10 step by step

Most Python for beginners tutorials go immediately to the programming side of Python, i.e., the syntax, functions, etc., but they rarely mention how to really get started: What to install? From where?

This tutorial assumes that you don’t know anything about Python, so I will break everything down step by step. At the end you will have on your computer all the software you need to write and manage Python scripts.

I have found that the best way to learn Python is to get the appropriate tools. In this case we have Visual Studio, the language itself (Python) and a package/environment manager (Anaconda). These 3 make it very easy for you to develop Python applications and put them into production.

Visual Studio

Microsoft really makes it easy these days, you only need to go here and get the Community version. With this version you don’t have to pay anything, no fees, no subscription and it includes all you need to create cool applications (and machine learning algorithms)

But first, what is VS?? Well, it is an IDE which stands for Integrated Development Environment which essentially means it contains a lot of libraries, applications and features to develop cool stuff. There are plenty of IDEs out there if you wish to Google them.

Once you click on Community 2017 you will get a downloader (usually into your downloads folder). Click on it and:

Select the Community Edition Installer

Select Python and Anaconda and if you want to, select a different folder for the installation (I have it on another drive)

While you wait for the installer to be done, you can go ahead and create a Microsoft account (otherwise your “trial expires in 30 days”)

Creating Microsoft Account

Go to this site again and click sign in where you will be able to create an account (No account? Create one!)

I usually don’t check the “Send me stuff” button but up to you.

Once you do this, log in and activate the benefits for VS Community Edition.


What is Anaconda?? First of all it has the name of this guy (or gal) right here.

Anaconda is an amazing piece of Software that makes it extremely easy to manage the other snake: Python!

By manage I mean:

  1. Create different environments: Maybe for a particular application you need a set of x libraries and for another application you need another set. With Anaconda you can keep those sets (environments) separated from each other and summon them (activate them) when needed.
  2. Installs your favorite packages for you: Let’s be honest here, Python sometimes might feel like Linux (even Ubuntu) in that in order to install a package you need to install a lot of dependencies and pray that everything gets along. Anaconda allows you to install scientific, math and ML packages without the dependency headache.
  3. Anaconda Navigator: This is just awesome, it is a visual dashboard where you can do so many interesting things like install packages for your environments, launch Jupyter notebooks and more.

Installing Anaconda

You already did when you installed Visual Studio!

Doing Python

All right, we have now installed everything we need, so let’s get to running Python stuff.

After Visual Studio finishes installing you should be able to click Launch (or find it in the installed programs), then proceed to sign in with the credentials you created.

After you sign in, VS will get some things ready for you, like theme selection (I prefer dark) and Development Settings (General).

Dark theme just looks great!

All right, how do we program in Python now while making use of Anaconda?

Creating an environment

Search and Open the Anaconda Navigator.

Your choice if you want to provide data for making it better. Now, we need to create a Python environment so click on Environments and Create.

Name your environment (mine is honeyBadger) and click Create

By the way, you will notice the “root” environment which is the default installation of Python, I prefer to leave that one alone and create separate more manageable envs.

Once the environment is created, we need to do some extra work in VS to point to this particular environment.

Go to Visual Studio and create a Python project.

If you wish, give it a different more meaningful name.

Now we will have this screen where you can see that there is a default environment of Python

Now go to Tools -> Python -> Python Environments. A new window will appear next to the Solution Explorer on the top right. It is hard to see there, so go ahead and choose Dock as Tabbed Document.

Now that this window is more visible we can point to our honeyBadger environment. We need to find where on the computer this env was created; by default it is created in here:


So put that in the prefix path and then click on Auto Detect, it should populate the other fields like this:

Finally click Apply and done!

Now the last step is to integrate this env to our project which we do by right clicking on Python Environments and then Add/Remove Python Environments.

Select honeyBadger and click ok. VS will take a few minutes to read and check all the installed packages.

Once finished, you can just write print("Hello World!") and click the green button to run it, and you are done!

Congratulations on your first Python Script! Now you can go to the next post to learn about Functions.