Exercise 1

Importing and Reading Data

import pandas as pd

path_to_data = 'gapminder.tsv'

data = pd.read_csv(path_to_data, sep='\t')

1

Get a list of all the years in this data, without any duplicates. How many unique values are there, and what are they?

unique_years = data['year'].unique().tolist()

number_of_years = len(unique_years)

There are 12 unique years in the dataset: 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, 2002, and 2007.
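As a quick sanity check on the approach, here is a minimal sketch on a tiny made-up frame. The `sample` data and its values are hypothetical and are not part of gapminder. It shows that `unique()` returns values in order of first appearance, so sorting is a reasonable extra step, and that `nunique()` gives the count directly.

```python
import pandas as pd

# Hypothetical miniature stand-in for the gapminder data (illustrative values only).
sample = pd.DataFrame({
    'country': ['A', 'A', 'B', 'B', 'C'],
    'year': [1957, 1952, 1952, 1957, 1962],
})

# unique() keeps first-appearance order, so sort for a readable listing.
sample_years = sorted(sample['year'].unique().tolist())

# nunique() counts distinct values without building the list first.
sample_year_count = sample['year'].nunique()

print(sample_year_count, sample_years)
```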

2

What is the largest value for population (pop) in this data? When and where did this occur?

max_pop = data['pop'].max()

max_pop_idx = data['pop'].idxmax()

data.loc[max_pop_idx]

The largest population value is 1,318,683,096, which occurred in China in 2007.
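The "when and where" part can also be read off programmatically rather than by inspecting the displayed row. The sketch below assumes a small hypothetical frame named `sample` with the same column names as the exercise data. Note that `pop` needs bracket access, since `.pop` is also a DataFrame method.

```python
import pandas as pd

# Hypothetical miniature stand-in for the gapminder data (illustrative values only).
sample = pd.DataFrame({
    'country': ['A', 'A', 'B', 'B'],
    'year': [1952, 2007, 1952, 2007],
    'pop': [100, 150, 900, 1200],
})

# idxmax() returns the index label of the row holding the largest value.
row = sample.loc[sample['pop'].idxmax()]

# Pull the individual fields out of that row.
peak_country = row['country']
peak_year = int(row['year'])
peak_pop = int(row['pop'])

print(peak_country, peak_year, peak_pop)
```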

3

Extract all the records for Europe. In 1952, which country had the smallest population, and what was the population in 2007?

data_europe = data[data['continent'] == 'Europe']

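# Restrict the European records to 1952 before looking for the minimum
europe_pop_1952 = data_europe[data_europe['year'] == 1952]

min_pop_idx = europe_pop_1952['pop'].idxmin()

smallest_country = europe_pop_1952.loc[min_pop_idx, 'country']

# Look up that same country's record in 2007
iceland_pop_2007 = data[(data['country'] == smallest_country) & (data['year'] == 2007)]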

In 1952, the European country with the smallest population was Iceland. In 2007, Iceland had a population of 301,931.
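One detail worth checking in this question is that the minimum is taken over the 1952 rows only, not over every European row. The sketch below uses a hypothetical frame named `sample`, with made-up values chosen so that the two approaches disagree. It filters by continent and year first, then looks up the 2007 value for the resulting country.

```python
import pandas as pd

# Hypothetical miniature stand-in for the gapminder data (illustrative values only).
# Country 'A' has the smallest European value overall (in 2007),
# but country 'B' has the smallest European value in 1952.
sample = pd.DataFrame({
    'country': ['A', 'A', 'B', 'B', 'C', 'C'],
    'continent': ['Europe', 'Europe', 'Europe', 'Europe', 'Asia', 'Asia'],
    'year': [1952, 2007, 1952, 2007, 1952, 2007],
    'pop': [500, 40, 300, 350, 10, 20],
})

# Filter to Europe AND 1952 before taking the minimum.
europe_1952 = sample[(sample['continent'] == 'Europe') & (sample['year'] == 1952)]
smallest_country = europe_1952.loc[europe_1952['pop'].idxmin(), 'country']

# Look up the same country's population in 2007.
mask_2007 = (sample['country'] == smallest_country) & (sample['year'] == 2007)
pop_2007 = int(sample.loc[mask_2007, 'pop'].iloc[0])

print(smallest_country, pop_2007)
```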
