Star Wars Data Acquisition

Introduction

Knowing how to acquire data from an API is an essential skill, and the Requests package makes it straightforward in Python.

Requests package

Requests is a Python package for retrieving data by making HTTP requests. I will use the Star Wars API (SWAPI) as an example.

Fetch one character from the API

import requests
person = requests.get('https://swapi.dev/api/people/1')
person.json()
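The call above assumes the request succeeds. Below is a minimal sketch of a more defensive fetch. The helper name `fetch_json` and the 10-second timeout are illustrative choices, not part of the original example, and `get` is assumed to be `requests.get` or any callable with the same interface.

```python
def fetch_json(url, get, timeout=10):
    """Fetch `url` and return the decoded JSON body.

    `get` is assumed to behave like `requests.get`: it accepts a `timeout`
    keyword and returns an object with `raise_for_status()` and `json()`.
    """
    response = get(url, timeout=timeout)
    # With requests, this raises an HTTPError for 4xx/5xx responses
    response.raise_for_status()
    return response.json()


# Usage (requires network access):
#   import requests
#   luke = fetch_json('https://swapi.dev/api/people/1', requests.get)
```

Passing the `get` callable in as an argument also makes it possible to try the helper with a stand-in function when no network is available.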


The data retrieved from the API is in JSON format. It is much clearer when shown as a table. We can use the json_normalize function from the pandas package as follows:

import pandas as pd
pd.json_normalize(person.json())


Fetch all characters from the API

Once we can fetch one character from the Star Wars API, we can fetch all of them with a loop. To find out how many characters the API holds, we can check the “count” key in the response of the people endpoint.

people = requests.get('https://swapi.dev/api/people')
print('there are total {} characters'.format(people.json()['count']))

Running the above code shows that there are 82 characters in total. We then write a loop that requests the characters one by one. Along the way, we check the status code of each response. A code of 200 means the request succeeded; a code of 404 means no character exists at that ID, in which case we skip it and move on to the next one. After the loop, an assert checks that we retrieved as many characters as the API reports.

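def get_all_people():
    total = requests.get('https://swapi.dev/api/people').json()['count']
    characters = []
    person_id = 0
    # IDs are not guaranteed to be contiguous, so keep requesting until all
    # characters are collected; the upper bound guards against an endless loop.
    while len(characters) < total and person_id < 2 * total:
        person_id += 1
        character = requests.get('https://swapi.dev/api/people/' + str(person_id))
        if character.status_code == 200:
            characters.append(pd.json_normalize(character.json()))
        elif character.status_code == 404:
            continue  # no character with this ID; try the next one
    # Concatenate once at the end instead of growing a DataFrame in the loop
    starWar_people = pd.concat(characters, ignore_index=True)
    assert len(starWar_people) == total
    return starWar_people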

Calling get_all_people() returns a table with one row per character.
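As an alternative to requesting each ID separately, the people endpoint can be read page by page. The sketch below assumes each page of the listing carries a `results` list and a `next` URL alongside `count`. The name `iter_results` and the `fetch` callable are hypothetical helpers introduced for illustration.

```python
def iter_results(first_url, fetch):
    """Yield every record from a paginated listing.

    `fetch` is assumed to map a URL to the decoded JSON page, for example
    `lambda url: requests.get(url, timeout=10).json()`.
    """
    url = first_url
    while url:
        page = fetch(url)
        yield from page['results']
        # 'next' is assumed to hold the following page's URL, or None at the end
        url = page.get('next')


# Usage (requires network access, requests and pandas):
#   fetch = lambda url: requests.get(url, timeout=10).json()
#   people = list(iter_results('https://swapi.dev/api/people', fetch))
#   df = pd.json_normalize(people)
```

This approach needs far fewer requests than one call per character and does not depend on which IDs exist.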

Now let’s take a look at the oldest character and all the movies he appears in. We use “birth_year” to determine the age of each character. Some characters have an unknown birth year; we ignore them here. The remaining characters were all born BBY, which stands for “Before the Battle of Yavin”, so a larger number means an older character. We can fetch the oldest character with the following code:

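df = get_all_people()
valid_birth_df = df[df['birth_year'].str.endswith('BBY')].copy()
# Compare numerically: as strings, '92BBY' would sort above '896BBY'
valid_birth_df['years_bby'] = valid_birth_df['birth_year'].str.replace('BBY', '', regex=False).astype(float)
oldest = valid_birth_df[valid_birth_df['years_bby'] == valid_birth_df['years_bby'].max()]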
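As a side note on comparing birth years: the values are strings such as '19BBY', and strings are compared character by character rather than by numeric value. The small sketch below uses made-up sample values and a hypothetical helper `bby_to_years` to show the difference.

```python
def bby_to_years(birth_year):
    """Convert a string such as '41.9BBY' to a float; return None otherwise."""
    if not birth_year.endswith('BBY'):
        return None  # e.g. 'unknown'
    return float(birth_year[:-len('BBY')])


sample = ['19BBY', '92BBY', '896BBY', 'unknown']

# String comparison goes left to right, so '9' beats '8'
print(max(s for s in sample if s.endswith('BBY')))  # prints 92BBY

# Numeric comparison picks the largest number of years
years = [y for y in map(bby_to_years, sample) if y is not None]
print(max(years))  # prints 896.0
```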

The oldest character turns out to be Yoda, born more than 800 years before the Battle of Yavin. We then use the following code to get the titles of all the movies in which Yoda appears.

films = oldest['films'].squeeze()
appeared_film = []
for film_url in films:
    # Each entry in 'films' is a URL pointing to that film's record
    film = requests.get(film_url)
    appeared_film.append(film.json()['title'])
appeared_film
['The Empire Strikes Back','Return of the Jedi','The Phantom Menace','Attack of the Clones','Revenge of the Sith']
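If the same lookup is repeated for many characters, the same film URLs will be requested again and again. Here is a minimal sketch of remembering titles that were already fetched. `make_title_lookup` and `fetch` are hypothetical names, and `fetch` is assumed to map a URL to the decoded JSON record.

```python
def make_title_lookup(fetch):
    """Return a function that maps a film URL to its title, fetching each URL once.

    `fetch` is assumed to map a URL to the decoded JSON record, for example
    `lambda url: requests.get(url, timeout=10).json()`.
    """
    cache = {}

    def title_for(film_url):
        if film_url not in cache:
            cache[film_url] = fetch(film_url)['title']
        return cache[film_url]

    return title_for


# Usage (requires network access and requests):
#   title_for = make_title_lookup(lambda url: requests.get(url, timeout=10).json())
#   appeared_film = [title_for(url) for url in films]
```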

May the Force be with you!