{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Day 4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Topics\n", "\n", "1. Bit of a review\n", " * Licenses and data origin\n", "1. Pandas functions\n", "1. Pandas $\\rightarrow$ NumPy\n", "1. Fig-Axis interaction in `matplotlib`\n", "1. Plotting with style (plt.style and `with`)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, import our usual things:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bit of a review\n", "\n", "Last time we played around with the TV dataset after we read it into Pandas." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "movies = pd.read_csv('~/Downloads/tv_shows.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can take a quick look at our data in a table form with:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Unnamed: 0TitleYearAgeIMDbRotten TomatoesNetflixHuluPrime VideoDisney+type
00Breaking Bad200818+9.596%10001
11Stranger Things201616+8.893%10001
22Money Heist201718+8.491%10001
33Sherlock201016+9.178%10001
44Better Call Saul201518+8.797%10001
....................................
56065606Tut's Treasures: Hidden Secrets2018NaNNaNNaN00011
56075607Paradise Islands2017NaNNaNNaN00011
56085608Wild Russia2018NaNNaNNaN00011
56095609Love & Vets2017NaNNaNNaN00011
56105610United States of Animals2016NaNNaNNaN00011
\n", "

5611 rows × 11 columns

\n", "
" ], "text/plain": [ " Unnamed: 0 Title Year Age IMDb \\\n", "0 0 Breaking Bad 2008 18+ 9.5 \n", "1 1 Stranger Things 2016 16+ 8.8 \n", "2 2 Money Heist 2017 18+ 8.4 \n", "3 3 Sherlock 2010 16+ 9.1 \n", "4 4 Better Call Saul 2015 18+ 8.7 \n", "... ... ... ... ... ... \n", "5606 5606 Tut's Treasures: Hidden Secrets 2018 NaN NaN \n", "5607 5607 Paradise Islands 2017 NaN NaN \n", "5608 5608 Wild Russia 2018 NaN NaN \n", "5609 5609 Love & Vets 2017 NaN NaN \n", "5610 5610 United States of Animals 2016 NaN NaN \n", "\n", " Rotten Tomatoes Netflix Hulu Prime Video Disney+ type \n", "0 96% 1 0 0 0 1 \n", "1 93% 1 0 0 0 1 \n", "2 91% 1 0 0 0 1 \n", "3 78% 1 0 0 0 1 \n", "4 97% 1 0 0 0 1 \n", "... ... ... ... ... ... ... \n", "5606 NaN 0 0 0 1 1 \n", "5607 NaN 0 0 0 1 1 \n", "5608 NaN 0 0 0 1 1 \n", "5609 NaN 0 0 0 1 1 \n", "5610 NaN 0 0 0 1 1 \n", "\n", "[5611 rows x 11 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### License & Origin of this data\n", "\n", "Let's think a bit more about this dataset.\n", "\n", "We found it here: https://www.kaggle.com/ruchi798/tv-shows-on-netflix-prime-video-hulu-and-disney\n", "\n", "We can see where this data came from: it was scraped (this means [web-scraped](https://realpython.com/beautiful-soup-web-scraper-python/)) from Reelgood.com\n", "\n", "We also are told a bit about what *we* can do with the data under the License area on Kaggle which says its Public Domain, meaning we can do whatever we want with it essentially!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also just look at the top bit of our data with:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Unnamed: 0TitleYearAgeIMDbRotten TomatoesNetflixHuluPrime VideoDisney+type
00Breaking Bad200818+9.596%10001
11Stranger Things201616+8.893%10001
22Money Heist201718+8.491%10001
33Sherlock201016+9.178%10001
44Better Call Saul201518+8.797%10001
\n", "
" ], "text/plain": [ " Unnamed: 0 Title Year Age IMDb Rotten Tomatoes Netflix \\\n", "0 0 Breaking Bad 2008 18+ 9.5 96% 1 \n", "1 1 Stranger Things 2016 16+ 8.8 93% 1 \n", "2 2 Money Heist 2017 18+ 8.4 91% 1 \n", "3 3 Sherlock 2010 16+ 9.1 78% 1 \n", "4 4 Better Call Saul 2015 18+ 8.7 97% 1 \n", "\n", " Hulu Prime Video Disney+ type \n", "0 0 0 0 1 \n", "1 0 0 0 1 \n", "2 0 0 0 1 \n", "3 0 0 0 1 \n", "4 0 0 0 1 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can access individual rows of our data with `.iloc`:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Unnamed: 0TitleYearAgeIMDbRotten TomatoesNetflixHuluPrime VideoDisney+type
5555Pretty Little Liars201016+7.481%11001
5656Unbelievable201918+8.497%10001
5757Arrow201216+7.685%10001
5858The IT Crowd200616+8.587%10001
5959The Twilight Zone19597+9.082%11001
6060YOU201818+7.891%10001
6161Sex Education201916+8.394%10001
6262Tiger King: Murder, Mayhem and Madness202018+7.886%10001
6363Star Trek: The Next Generation19877+8.689%11101
6464Broadchurch201318+8.492%10001
\n", "
" ], "text/plain": [ " Unnamed: 0 Title Year Age IMDb \\\n", "55 55 Pretty Little Liars 2010 16+ 7.4 \n", "56 56 Unbelievable 2019 18+ 8.4 \n", "57 57 Arrow 2012 16+ 7.6 \n", "58 58 The IT Crowd 2006 16+ 8.5 \n", "59 59 The Twilight Zone 1959 7+ 9.0 \n", "60 60 YOU 2018 18+ 7.8 \n", "61 61 Sex Education 2019 16+ 8.3 \n", "62 62 Tiger King: Murder, Mayhem and Madness 2020 18+ 7.8 \n", "63 63 Star Trek: The Next Generation 1987 7+ 8.6 \n", "64 64 Broadchurch 2013 18+ 8.4 \n", "\n", " Rotten Tomatoes Netflix Hulu Prime Video Disney+ type \n", "55 81% 1 1 0 0 1 \n", "56 97% 1 0 0 0 1 \n", "57 85% 1 0 0 0 1 \n", "58 87% 1 0 0 0 1 \n", "59 82% 1 1 0 0 1 \n", "60 91% 1 0 0 0 1 \n", "61 94% 1 0 0 0 1 \n", "62 86% 1 0 0 0 1 \n", "63 89% 1 1 1 0 1 \n", "64 92% 1 0 0 0 1 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies.iloc[55:65,:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also do this with columns by index:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TitleYear
0Breaking Bad2008
1Stranger Things2016
2Money Heist2017
3Sherlock2010
4Better Call Saul2015
.........
5606Tut's Treasures: Hidden Secrets2018
5607Paradise Islands2017
5608Wild Russia2018
5609Love & Vets2017
5610United States of Animals2016
\n", "

5611 rows × 2 columns

\n", "
" ], "text/plain": [ " Title Year\n", "0 Breaking Bad 2008\n", "1 Stranger Things 2016\n", "2 Money Heist 2017\n", "3 Sherlock 2010\n", "4 Better Call Saul 2015\n", "... ... ...\n", "5606 Tut's Treasures: Hidden Secrets 2018\n", "5607 Paradise Islands 2017\n", "5608 Wild Russia 2018\n", "5609 Love & Vets 2017\n", "5610 United States of Animals 2016\n", "\n", "[5611 rows x 2 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies.iloc[:,1:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or by named column with `.loc` (instead of `.iloc`):" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TitleYear
0Breaking Bad2008
1Stranger Things2016
2Money Heist2017
3Sherlock2010
4Better Call Saul2015
.........
5606Tut's Treasures: Hidden Secrets2018
5607Paradise Islands2017
5608Wild Russia2018
5609Love & Vets2017
5610United States of Animals2016
\n", "

5611 rows × 2 columns

\n", "
" ], "text/plain": [ " Title Year\n", "0 Breaking Bad 2008\n", "1 Stranger Things 2016\n", "2 Money Heist 2017\n", "3 Sherlock 2010\n", "4 Better Call Saul 2015\n", "... ... ...\n", "5606 Tut's Treasures: Hidden Secrets 2018\n", "5607 Paradise Islands 2017\n", "5608 Wild Russia 2018\n", "5609 Love & Vets 2017\n", "5610 United States of Animals 2016\n", "\n", "[5611 rows x 2 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies.loc[:,['Title', 'Year']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can print out the names of the columns in our dataset:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['Unnamed: 0', 'Title', 'Year', 'Age', 'IMDb', 'Rotten Tomatoes',\n", " 'Netflix', 'Hulu', 'Prime Video', 'Disney+', 'type'],\n", " dtype='object')" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: \"Unnamed: 0\" is just an extra column that *also* contains the index. This is actually telling us this person saved this from a Pandas DataFrame but didn't select `index=False` when they saved it :)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also print out a summary of our dataset:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Unnamed: 0YearIMDbNetflixHuluPrime VideoDisney+type
count5611.0000005611.0000004450.0000005611.0000005611.0000005611.0000005611.0000005611.0
mean2805.0000002011.0210307.1132580.3441450.3126000.3821070.0320801.0
std1619.90051111.0051161.1320600.4751310.4635940.4859460.1762280.0
min0.0000001901.0000001.0000000.0000000.0000000.0000000.0000001.0
25%1402.5000002010.0000006.6000000.0000000.0000000.0000000.0000001.0
50%2805.0000002015.0000007.3000000.0000000.0000000.0000000.0000001.0
75%4207.5000002017.0000007.9000001.0000001.0000001.0000000.0000001.0
max5610.0000002020.0000009.6000001.0000001.0000001.0000001.0000001.0
\n", "
" ], "text/plain": [ " Unnamed: 0 Year IMDb Netflix Hulu \\\n", "count 5611.000000 5611.000000 4450.000000 5611.000000 5611.000000 \n", "mean 2805.000000 2011.021030 7.113258 0.344145 0.312600 \n", "std 1619.900511 11.005116 1.132060 0.475131 0.463594 \n", "min 0.000000 1901.000000 1.000000 0.000000 0.000000 \n", "25% 1402.500000 2010.000000 6.600000 0.000000 0.000000 \n", "50% 2805.000000 2015.000000 7.300000 0.000000 0.000000 \n", "75% 4207.500000 2017.000000 7.900000 1.000000 1.000000 \n", "max 5610.000000 2020.000000 9.600000 1.000000 1.000000 \n", "\n", " Prime Video Disney+ type \n", "count 5611.000000 5611.000000 5611.0 \n", "mean 0.382107 0.032080 1.0 \n", "std 0.485946 0.176228 0.0 \n", "min 0.000000 0.000000 1.0 \n", "25% 0.000000 0.000000 1.0 \n", "50% 0.000000 0.000000 1.0 \n", "75% 1.000000 0.000000 1.0 \n", "max 1.000000 1.000000 1.0 " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that this only does \"summary statistics\" like the mean, standdard deviation (STD) - a measure of the \"spread\" of the dataset, min & max for the numerical data.\n", "\n", "Also note that some of this summary doesn't actually make sense. For example, the \"mean index\" is just the 1/2 point of our dataset! Also, does the mean year have any meaning? It may or may not, depending on your application.\n", "\n", "This just means we have to be careful when we use these functions!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## More on Pandas Functions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are also some \"built-in\" functions for this data. 
For example, we can take the average (mean) IMDb score ourselves:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "7.1132584269662855" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies['IMDb'].mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or we could count the number of titles available on Hulu:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1754" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies['Hulu'].sum() # each row has an entry either 0 or 1 that tells us a NOT or YES on this platform" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also do this for a variety of columns:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Hulu 1754\n", "Prime Video 2144\n", "Disney+ 180\n", "dtype: int64" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies[['Hulu', 'Prime Video', 'Disney+']].sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the above we passed in a *list* of columns to take the sum over." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But what about taking a sum over sections of *rows*?\n", "\n", "We can ask, across Netflix, Hulu, Prime Video and Disney+ -- how many of these platforms support each title?\n", "\n", "Let's build this up bit by bit. First, let's look at the subset of data that we might be interested in:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TitleNetflixHuluPrime VideoDisney+
0Breaking Bad1000
1Stranger Things1000
2Money Heist1000
3Sherlock1000
4Better Call Saul1000
..................
5606Tut's Treasures: Hidden Secrets0001
5607Paradise Islands0001
5608Wild Russia0001
5609Love & Vets0001
5610United States of Animals0001
\n", "

5611 rows × 5 columns

\n", "
" ], "text/plain": [ " Title Netflix Hulu Prime Video Disney+\n", "0 Breaking Bad 1 0 0 0\n", "1 Stranger Things 1 0 0 0\n", "2 Money Heist 1 0 0 0\n", "3 Sherlock 1 0 0 0\n", "4 Better Call Saul 1 0 0 0\n", "... ... ... ... ... ...\n", "5606 Tut's Treasures: Hidden Secrets 0 0 0 1\n", "5607 Paradise Islands 0 0 0 1\n", "5608 Wild Russia 0 0 0 1\n", "5609 Love & Vets 0 0 0 1\n", "5610 United States of Animals 0 0 0 1\n", "\n", "[5611 rows x 5 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies.loc[:,['Title','Netflix', 'Hulu', 'Prime Video', 'Disney+']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: you can also do `movies[['Title','Netflix', 'Hulu', 'Prime Video', 'Disney+']]` in this case." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we want to add all across all rows that are not the Title. There are a few ways to do this and you should definetly check out the [Pandas docs](https://pandas.pydata.org/docs/) for more info.\n", "\n", "I'm going to do this in a few steps. First, let's isolate those 4 columns:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NetflixHuluPrime VideoDisney+
01000
11000
21000
31000
41000
...............
56060001
56070001
56080001
56090001
56100001
\n", "

5611 rows × 4 columns

\n", "
" ], "text/plain": [ " Netflix Hulu Prime Video Disney+\n", "0 1 0 0 0\n", "1 1 0 0 0\n", "2 1 0 0 0\n", "3 1 0 0 0\n", "4 1 0 0 0\n", "... ... ... ... ...\n", "5606 0 0 0 1\n", "5607 0 0 0 1\n", "5608 0 0 0 1\n", "5609 0 0 0 1\n", "5610 0 0 0 1\n", "\n", "[5611 rows x 4 columns]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies.loc[:,['Netflix', 'Hulu', 'Prime Video', 'Disney+']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's try using \"sum\" here:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Netflix 1931\n", "Hulu 1754\n", "Prime Video 2144\n", "Disney+ 180\n", "dtype: int64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies.loc[:,['Netflix', 'Hulu', 'Prime Video', 'Disney+']].sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hmmmm, that's not quite right, that's just what we were doing before! Now I'm going to re-call this, but specify an axis parameter:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1\n", "1 1\n", "2 1\n", "3 1\n", "4 1\n", " ..\n", "5606 1\n", "5607 1\n", "5608 1\n", "5609 1\n", "5610 1\n", "Length: 5611, dtype: int64" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies.loc[:,['Netflix', 'Hulu', 'Prime Video', 'Disney+']].sum(axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That looks to be more like it! 
We can even double check the min of this:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies.loc[:,['Netflix', 'Hulu', 'Prime Video', 'Disney+']].sum(axis=1).min()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and the max:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies.loc[:,['Netflix', 'Hulu', 'Prime Video', 'Disney+']].sum(axis=1).max()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, this sort of makes sense with our understanding of how streaming services work -- usually a title is on one or two but not all of these services." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok! We are almost there! Let's actually add our summation calculation back into our dataset:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "movies['Total Service'] = movies.loc[:,['Netflix', 'Hulu', 'Prime Video', 'Disney+']].sum(axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And now take a look:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Unnamed: 0TitleYearAgeIMDbRotten TomatoesNetflixHuluPrime VideoDisney+typeTotal Service
00Breaking Bad200818+9.596%100011
11Stranger Things201616+8.893%100011
22Money Heist201718+8.491%100011
33Sherlock201016+9.178%100011
44Better Call Saul201518+8.797%100011
\n", "
" ], "text/plain": [ " Unnamed: 0 Title Year Age IMDb Rotten Tomatoes Netflix \\\n", "0 0 Breaking Bad 2008 18+ 9.5 96% 1 \n", "1 1 Stranger Things 2016 16+ 8.8 93% 1 \n", "2 2 Money Heist 2017 18+ 8.4 91% 1 \n", "3 3 Sherlock 2010 16+ 9.1 78% 1 \n", "4 4 Better Call Saul 2015 18+ 8.7 97% 1 \n", "\n", " Hulu Prime Video Disney+ type Total Service \n", "0 0 0 0 1 1 \n", "1 0 0 0 1 1 \n", "2 0 0 0 1 1 \n", "3 0 0 0 1 1 \n", "4 0 0 0 1 1 " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hey cool! Now we have a whole extra column to our dataset! Let's plot it with a Pandas plot call:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "movies.plot(y='Total Service', kind='hist')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, as expected, we that most of the titles are on only 1 streaming service and very few are on 2, and even fewer are on 3." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, those axis don't look quite right -- we know that there are no 1/2 services! Let's see how we can see what parameters are available to us in this plot:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "movies.plot?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, from this pop up doc, we see that there is something called `xticks`. Let's explicitly put in the numbers 0-4:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "movies.plot(y='Total Service', kind='hist', xticks=[1,2,3,4])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This depicts indeed that there are no cases where a title appears on all 4 streaming services. \n", "\n", "Let's try one more thing listed in the parameters -- log scale the y-axis:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYgAAAD4CAYAAAD2FnFTAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAR9klEQVR4nO3de5BW9X3H8feX5eYFccpl6rgipDpUBFRc0VSRYqjBC8YkNsLUzrSTemtSdTBTSUysZuKMzaRJahpHNzaTmljBaBVRM9GoiTCTcUXUKt0wMvGGmogwLhC5uPTbP3ahqGd3H3CP53mW92tmZ55z9jxnP+4jfDjn9zvnRGYiSdL7Dao6gCSpPlkQkqRCFoQkqZAFIUkqZEFIkgoNrjrAhzF69OgcP3581TEkqaE89dRTb2XmmL62a+iCGD9+PCtWrKg6hiQ1lIh4uZbtPMUkSSpkQUiSClkQkqRCDTkGERFzgblHHHFE1VEk7ebdd99l7dq1bN26teooAoYPH05zczNDhgzZq/dHI9+LqaWlJR2klurHiy++yIgRIxg1ahQRUXWcfVpmsn79ejZt2sSECRPe872IeCozW/rah6eYJPWbrVu3Wg51IiIYNWrUhzqasyAk9SvLoX582M/CgpAkFWrIQeq9MX7hA/2+z5duOKvf9ykNJP39566vP3Pr16/nE5/4BAC/+93vaGpqYsyYrguG29raGDp06Hu237BhA3feeSeXXHJJr/vt7Oxk9OjRvP322x/43te//nUWL15MU1MTTU1NtLa2csIJJ+zJf1ahq6++mtmzZzNr1qwPva+9tc8UhKSBb9SoUTzzzDMAXHvttRx44IF86Utf6nH7DRs2cPPNN/dZED1ZtmwZDz30EE8//TRDhw5l3bp1dHZ21vz+zs5OBg8u/mv4+uuv36tM/akhTzFFxNyIaO3o6Kg6iqQG8c1vfpPJkyczefJkvve97wGwcOFCVq9ezbHHHsvChQvZuHEjp512GtOmTWPq1Kncf//9ve7zjTfeYMyYMbuOTMaMGcMhhxwCwJNPPsnMmTM5/vjjOeOMM/j9738PwCmnnMLVV1/Nqaeeyo033siECRPYOZt08+bNjBs3js7OTi644ALuvfdeAJ544gk+/vGPc8wxx3DiiSfyzjvv0NnZyYIFC5g+fTpTp07l1ltv7fffWUMeQWTmUmBpS0vLhVVnkVT/2trauP3222lra2PHjh1Mnz6dmTNncsMNN7BmzZpdRx3vvvsuS5YsYcSIEbz55pucfPLJnH322T3ud86cOXzjG99g4sSJzJ49m3nz5jFjxgy2bdvG5Zdfzn333cfo0aO5/fbb+drXvkZraysAGzdu5PHHHwfgkUceYfny5cyYMYMlS5Zw5plnvueoYuvWrcybN4+7776badOm0dHRwbBhw7jlllsYO3YsbW1tbNu2jZNOOonTTz+dcePG9dvvrSELQpL
2xLJly/jsZz/L/vvvD8C5557L8uXLOf3009+zXWZy1VVXsXz5cgYNGsSrr77KW2+9xcEHH1y434MOOoiVK1eybNkyHnvsMc477zy+9a1vMWXKFFatWsXs2bMB2LFjB83NzbveN2/evF2vzz//fBYvXsyMGTNYtGgRCxYseM/PaG9vZ9y4cUybNg2AkSNHAvDQQw/R3t7OokWLAOjo6OCFF16wICRpT9R6QfBtt91GR0cHK1euZPDgwTQ3N/d5HcHgwYOZNWsWs2bNYtKkSSxevJjJkyczdepUli1bVvieAw44YNfrc889l2uuuYbrrruO5557jpkzZ34ge9F01czkpptu2jUoX4aGHIOQpD1x6qmncs8997BlyxY2b97MkiVLmDFjBiNGjGDTpk27tuvo6GDs2LEMHjyYhx9+mNdee63X/ba3t7NmzZpdy88++yyHH344kyZN4rXXXqOtrQ2A7du3s2rVqsJ9HHTQQRx33HFcccUVnHPOOQwa9N6/lo8++mhefvllVq5cCXSdntqxYwef/OQnuemmm3YNiq9evZotW7bs+S+nFx5BSCpNvUwFnz59OvPnz981/fTSSy9lypQpALS0tDBlyhTOOussFixYwNy5c2lpaWHatGkceeSRve538+bNXHbZZXR0dNDU1MTEiRNpbW1l2LBh3HXXXVx22WVs2rSJzs5OrrzySo4++ujC/Zx//vnMnz+f5cuXf+B7w4YN44477uDSSy9l69at7Lfffjz66KNcfPHFvPLKKxx77LEAjB07liVLlnyYX9MH7DP3YvI6CKl87e3tHHXUUVXH0G6KPhPvxSRJ+lAsCElSoYYsCC+Uk+pXI5+2Hmg+7GfRkAWRmUsz86Kd84El1Yfhw4ezfv16S6IO7HwexPDhw/d6H85iktRvmpubWbt2LevWras6ivj/J8rtLQtCUr8ZMmTIB55epsbVkKeYJEnlsyAkSYUsCElSIQtCklTIgpAkFbIgJEmFLAhJUqGGLAhvtSFJ5WvIgvBWG5JUvoYsCElS+SwISVIhC0KSVMiCkCQVsiAkSYUsCElSIQtCklTIgpAkFbIgJEmFLAhJUiELQpJUyIKQJBWyICRJhSwISVKhhiwInwchSeVryILweRCSVL6GLAhJUvksCElSIQtCklTIgpAkFbIgJEmFLAhJUiELQpJUyIKQJBWyICRJhSwISVIhC0KSVMiCkCQVsiAkSYUsCElSIQtCklTIgpAkFbIgJEmFLAhJUiELQpJUyIKQJBWyICRJheqmICLiqIi4OSLuiohLq84jSfu6UgsiIn4YEW9GxPPvWz8nIlZHxJqIWAiQme2ZeQnwOaClzFySpL6VfQTxI2DO7isiogn4PnAGMAmYHxGTur93DrAceKTkXJKkPpRaEJn5OLDhfaunA2sy87eZuR1YBHyqe/v7MvPPgL/qaZ8RcVFErIiIFevWrSsruiTt8wZX8DMPBV7dbXktcGJE/DnwGWAY8GBPb87MVqAVoKWlJcuLKUn7tioKIgrWZWb+EvjlRxtFktSTKmYxrQUO2225GXi9ghySpF5UURBPAkdGxISIGArMA+6rIIckqRdlT3O9A/g1MDEi1kbE5zOzE/gi8HOgHbgzM1ft4X7nRkRrR0dH/4eWJAElj0Fk5vwe1j9ILwPRNex3KbC0paXlwr3dhySpd3VzJbUkqb5YEJKkQg1ZEI5BSFL5GrIgMnNpZl40cuTIqqNI0oBVU0FExOSyg0iS6kutRxA3R0RbRPx9RBxcaiJJUl2oqSAy8xS6bqB3GLAiIv4zIv6i1GSSpErVPAaRmS8AXwWuAmYCN0bEbyLiM2WF64mD1JJUvlrHIKZGxHfouvL5NGBuZh7V/fo7JeYr5CC1JJWv1iup/w34AfCVzNyyc2Vmvh4RXy0lmSSpUrUWxJnAlszcARARg4DhmflOZv64tHSSpMrUOgbxC2C/3Zb3714nSRqgai2I4Zm5eedC9+v9y4kkSaoHtRb
EHyJi2s6FiDge2NLL9qVyFpMkla/WgrgC+GlELIuIZcBiup7pUAlnMUlS+WoapM7MJyPiT4GJdD1T+jeZ+W6pySRJldqTBwadAIzvfs9xEUFm3lZKKklS5WoqiIj4MfAnwDPAju7VCVgQkjRA1XoE0QJMyswsM4wkqX7UOkj9PPDHZQaRJNWXWo8gRgP/ExFtwLadKzPznFJS9SEi5gJzjzjiiCp+vCTtE2otiGvLDLGnMnMpsLSlpeXCqrNI0kBV6zTXX0XE4cCRmfmLiNgfaCo3miSpSrXe7vtC4C7glu5VhwL3lhVKklS9WgepvwCcDGyEXQ8PGltWKElS9WotiG2ZuX3nQkQMpus6CEnSAFVrQfwqIr4C7Nf9LOqfAkvLiyVJqlqtBbEQWAc8B1wMPEjX86klSQNUrbOY/peuR47+oNw4kqR6Ueu9mF6kYMwhMz/W74lq4IVyklS+PbkX007Dgb8E/qj/49TGC+UkqXw1jUFk5vrdvl7LzO8Cp5WcTZJUoVpPMU3bbXEQXUcUI0pJJEmqC7WeYvqX3V53Ai8Bn+v3NJKkulHrLKZZZQeRJNWXWk8xLejt+5n57f6JI0mqF3syi+kE4L7u5bnA48CrZYSSxi98oN/3+dINZ/X7PqWBbE8eGDQtMzcBRMS1wE8z8+/KCiZJqlatt9oYB2zfbXk7ML7f00iS6katRxA/Btoi4h66rqj+NHBbaakkSZWrdRbT9RHxM2BG96q/zcyny4vVO2+1IUnlq/UUE8D+wMbM/FdgbURMKClTnzJzaWZeNHLkyKoiSNKAV+sjR/8JuAr4cveqIcBPygolSaperUcQnwbOAf4AkJmv4602JGlAq7Ugtmdm0n3L74g4oLxIkqR6UGtB3BkRtwAHR8SFwC/w4UGSNKDVOovpW93Pot4ITASuycyHS00mSapUnwUREU3AzzNzNmApSNI+os9TTJm5A3gnIpxTKkn7kFqvpN4KPBcRD9M9kwkgMy8rJZUkqXK1FsQD3V+SpH1ErwUREeMy85XM/I+PKpAkqT70NQZx784XEXF3yVkkSXWkr4KI3V5/rMwgkqT60ldBZA+vJUkDXF+D1MdExEa6jiT2635N93Jm5kGlppMkVabXgsjMpo8qyJ7weRCSVL49eR5E3fB5EJJUvoYsCElS+SwISVIhC0KSVMiCkCQVsiAkSYUsCElSIQtCklTIgpAkFbIgJEmFLAhJUiELQpJUyIKQJBWyICRJhSwISVIhC0KSVMiCkCQVsiAkSYUsCElSIQtCklTIgpAkFbIgJEmF6qYgIuLciPhBRCyJiNOrziNJ+7pSCyIifhgRb0bE8+9bPyciVkfEmohYCJCZ92bmhcDfAOeXmUuS1LeyjyB+BMzZfUVENAHfB84AJgHzI2LSbpt8tfv7kqQKlVoQmfk4sOF9q6cDazLzt5m5HVgEfCq6/DPws8xcWWYuSVLfqhiDOBR4dbfltd3r/gGYDZwXEZf09OaIuCgiVkTEinXr1pWbVJL2YYMr+JlRsC4z80bgxr7enJmtQCtAS0tL9nM2SVK3Ko4g1gKH7bbcDLxeQQ5JUi+qKIgngSMjYkJEDAXmAfdVkEOS1Iuyp7neAfwamBgRayPi85nZCXwR+DnQDtyZmav2cL9zI6K1o6Oj/0NLkoCSxyAyc34P6x8EHvwQ+10KLG1pablwb/chSepd3VxJLUmqLxaEJKlQQxaEYxCSVL6GLIjMXJqZF40cObLqKJI0YDVkQUiSymdBSJIKWRCSpEIWhCSpUEMWhLOYJKl8DVkQzmKSpPI1ZEFIkspnQUiSClkQkqRCFoQkqVBDFoSzmCSpfA1ZEM5ikqTyNWRBSJLKZ0FIkgpZEJKkQhaEJKmQBSFJKtSQBeE0V0kqX0MWhNNcJal8DVkQkqTyWRCSpEIWhCSpkAUhSSpkQUiSClkQkqRCDVkQXgchSeVryILwOghJKl9DFoQkqXwWhCSpkAUhSSpkQUiSClkQkqR
CFoQkqZAFIUkqNLjqAJI+OuMXPlDKfl+64axS9qtqeQQhSSrUkAXhrTYkqXwNWRDeakOSyteQBSFJKp8FIUkqZEFIkgpZEJKkQhaEJKmQBSFJKhSZWXWGvRYR64CXa9x8NPBWiXHUv/y8GoufV2OZmJkj+tqooW+1kZljat02IlZkZkuZedR//Lwai59XY4mIFbVs5ykmSVIhC0KSVGhfKojWqgNoj/h5NRY/r8ZS0+fV0IPUkqTy7EtHEJKkPWBBSJIKDfiCiIgfRsSbEfF81VnUu4g4LCIei4j2iFgVEZdXnUk9i4jhEdEWEc92f17XVZ1JfYuIpoh4OiLu72vbAV8QwI+AOVWHUE06gSsz8yjgJOALETGp4kzq2TbgtMw8BjgWmBMRJ1WcSX27HGivZcMBXxCZ+Tiwoeoc6ltmvpGZK7tfb6Lrf+JDq02lnmSXzd2LQ7q/nPVSxyKiGTgLuLWW7Qd8QagxRcR44DjgiWqTqDfdpyueAd4EHs5MP6/69l3gH4H/rWVjC0J1JyIOBO4GrsjMjVXnUc8yc0dmHgs0A9MjYnLVmVQsIs4G3szMp2p9jwWhuhIRQ+gqh9sz87+qzqPaZObbwC9xvK+enQycExEvAYuA0yLiJ729wYJQ3YiIAP4daM/Mb1edR72LiDERcXD36/2A2cBvqk2lnmTmlzOzOTPHA/OARzPzgt7eM+ALIiLuAH4NTIyItRHx+aozqUcnA39N179snun+OrPqUOrRIcBjEfHfwJN0jUH0OXVSjcNbbUiSCg34IwhJ0t6xICRJhSwISVIhC0KSVMiCkCQVsiAkSYUsCElSof8DDuMK/TSrhCkAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "movies.plot(y='Total Service', kind='hist', xticks=[1,2,3,4], logy=True)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This shows that there are ~5000 titles available on 1 service, ~300 on 2 services, and it looks like ~30 on 3 services." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Takeaways\n", "\n", "1. We did a lot of data exploration & processing before we actually got to a plot that we liked -- this is normal!\n", "1. We did much of the data processing & plotting \"step-by-step\" -- we didn't just start right away with the answer, we had to try a few things and slowly build up to a solution we liked. This is also normal!\n", "\n", "Much of what we do in data viz is data processing to get our data in a form that actually \"works\" for data viz!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pandas $\\rightarrow$ NumPy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is a lot of interaction between NumPy and Pandas. In fact, the \"sum\" we just did is basically implemented in NumPy. There are ways to go between Pandas & NumPy. Here is one way to get a NumPy array from a Pandas DataFrame.\n", "\n", "Let's go back to those four columns:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<thead><tr><th></th><th>Netflix</th><th>Hulu</th><th>Prime Video</th><th>Disney+</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>1</td><td>0</td><td>0</td><td>0</td></tr>\n",
"<tr><th>1</th><td>1</td><td>0</td><td>0</td><td>0</td></tr>\n",
"<tr><th>2</th><td>1</td><td>0</td><td>0</td><td>0</td></tr>\n",
"<tr><th>3</th><td>1</td><td>0</td><td>0</td><td>0</td></tr>\n",
"<tr><th>4</th><td>1</td><td>0</td><td>0</td><td>0</td></tr>\n",
"<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"<tr><th>5606</th><td>0</td><td>0</td><td>0</td><td>1</td></tr>\n",
"<tr><th>5607</th><td>0</td><td>0</td><td>0</td><td>1</td></tr>\n",
"<tr><th>5608</th><td>0</td><td>0</td><td>0</td><td>1</td></tr>\n",
"<tr><th>5609</th><td>0</td><td>0</td><td>0</td><td>1</td></tr>\n",
"<tr><th>5610</th><td>0</td><td>0</td><td>0</td><td>1</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>5611 rows × 4 columns</p>\n",
"</div>
" ], "text/plain": [ " Netflix Hulu Prime Video Disney+\n", "0 1 0 0 0\n", "1 1 0 0 0\n", "2 1 0 0 0\n", "3 1 0 0 0\n", "4 1 0 0 0\n", "... ... ... ... ...\n", "5606 0 0 0 1\n", "5607 0 0 0 1\n", "5608 0 0 0 1\n", "5609 0 0 0 1\n", "5610 0 0 0 1\n", "\n", "[5611 rows x 4 columns]" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies.loc[:,['Netflix', 'Hulu', 'Prime Video', 'Disney+']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can grab the \"values\" of this Subset DataFrame and we get back out a NumPy array:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 0, 0, 0],\n", " [1, 0, 0, 0],\n", " [1, 0, 0, 0],\n", " ...,\n", " [0, 0, 0, 1],\n", " [0, 0, 0, 1],\n", " [0, 0, 0, 1]])" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies.loc[:,['Netflix', 'Hulu', 'Prime Video', 'Disney+']].values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's store this in a variable:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "myArray = movies.loc[:,['Netflix', 'Hulu', 'Prime Video', 'Disney+']].values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What is the type of this data?" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "numpy.ndarray" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(myArray)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This tells us we are getting a NumPy array back!" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can then sum this like before:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 1, 1, ..., 1, 1, 1])" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "myArray.sum(axis=1) # counts number of services" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And then we can plot using `matplotlib` like before:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX0AAAD4CAYAAAAAczaOAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAARiUlEQVR4nO3df6xfdX3H8efLFtCps0UKIy1bWewfwjKVNNjJsqgsUMGtLJGkxmhDSJpsXaLJsg39QyJIAv8MQzI1RJoVoyJRGQRRbPgRtxl+XBT5KesVGTQltlpAnZOl7L0/vp/il3Lv/X4vvfd7gc/zkdx8z3mfz/d7Pufk09c995zzPU1VIUnqw2uWugOSpMkx9CWpI4a+JHXE0Jekjhj6ktSR5Uvdgbkcc8wxtXbt2qXuhiS9otxzzz0/q6pVMy17WYf+2rVrmZqaWupuSNIrSpL/mm2Zp3ckqSOGviR1xNCXpI4Y+pLUEUNfkjpi6EtSRwx9SeqIoS9JHTH0JakjL+tv5B6utRd8c0nW+9ilZy/JeiVpFI/0Jakjhr4kdcTQl6SOGPqS1BFDX5I6MlboJ3ksyf1J7k0y1WpHJ9mZZFd7XdnqSXJFkukk9yU5ZehztrT2u5JsWZxNkiTNZj5H+u+pqrdX1fo2fwFwS1WtA25p8wDvA9a1n63A52DwSwK4EHgncCpw4cFfFJKkyTic0zubgB1tegdwzlD96hq4A1iR5HjgTGBnVe2vqqeAncDGw1i/JGmexg39Ar6T5J4kW1vtuKp6EqC9Htvqq4Enht67u9Vmq79Akq1JppJM7du3b/wtkSSNNO43ck+rqj1JjgV2JvnRHG0zQ63mqL+wUHUlcCXA+vXrX7RckvTSjXWkX1V72ute4DoG5+R/2k7b0F73tua7gROG3r4G2DNHXZI0ISNDP8nrk7zx4DRwBvAAcANw8A6cLcD1bfoG4CPtLp4NwDPt9M/NwBlJVrYLuGe0miRpQsY5vXMccF2Sg+2/XFXfTnI3cG2S84HHgXNb+5uAs4Bp4NfAeQBVtT/JxcDdrd1FVbV/wbZEkjTSyNCvqkeBt81Q/zlw+gz1ArbN8lnbge3z76YkaSH4jVxJ6oihL0kdMfQlqSOGviR1xNCXpI4Y+pLUEUNfkjpi6EtSRwx9SeqIoS9JHTH0Jakjhr4kdcTQl6SOGPqS1BFDX5I6YuhLUkcMfUnqiKEvSR0x9CWpI4a+JHXE0Jekjhj6ktQRQ1+SOmLoS1JHDH1J6oihL0kdMfQlqSOGviR1xNCXpI4Y+pLUEUNfkjoydugnWZbkB0lubPMnJrkzya4kX01yZKsf1ean2/K1Q5/x8VZ/JMmZC70xkqS5zedI/6PAw0PzlwGXV9U64Cn
g/FY/H3iqqt4CXN7akeQkYDNwMrAR+GySZYfXfUnSfIwV+knWAGcDX2jzAd4LfK012QGc06Y3tXna8tNb+03ANVX1bFX9BJgGTl2IjZAkjWfcI/3PAP8A/F+bfzPwdFUdaPO7gdVtejXwBEBb/kxr/3x9hvc8L8nWJFNJpvbt2zePTZEkjTIy9JO8H9hbVfcMl2doWiOWzfWe3xaqrqyq9VW1ftWqVaO6J0mah+VjtDkN+MskZwGvBX6XwZH/iiTL29H8GmBPa78bOAHYnWQ58CZg/1D9oOH3SJImYOSRflV9vKrWVNVaBhdib62qDwG3AR9ozbYA17fpG9o8bfmtVVWtvrnd3XMisA64a8G2RJI00jhH+rP5R+CaJJ8GfgBc1epXAV9MMs3gCH8zQFU9mORa4CHgALCtqp47jPVLkuZpXqFfVbcDt7fpR5nh7puq+g1w7izvvwS4ZL6dlCQtDL+RK0kdMfQlqSOGviR1xNCXpI4Y+pLUEUNfkjpi6EtSRwx9SeqIoS9JHTH0Jakjhr4kdcTQl6SOGPqS1BFDX5I6YuhLUkcMfUnqiKEvSR0x9CWpI4a+JHXE0Jekjhj6ktQRQ1+SOmLoS1JHDH1J6oihL0kdMfQlqSOGviR1xNCXpI4Y+pLUEUNfkjoyMvSTvDbJXUl+mOTBJJ9q9ROT3JlkV5KvJjmy1Y9q89Nt+dqhz/p4qz+S5MzF2ihJ0szGOdJ/FnhvVb0NeDuwMckG4DLg8qpaBzwFnN/anw88VVVvAS5v7UhyErAZOBnYCHw2ybKF3BhJ0txGhn4N/KrNHtF+Cngv8LVW3wGc06Y3tXna8tOTpNWvqapnq+onwDRw6oJshSRpLGOd00+yLMm9wF5gJ/Bj4OmqOtCa7AZWt+nVwBMAbfkzwJuH6zO8Z3hdW5NMJZnat2/f/LdIkjSrsUK/qp6rqrcDaxgcnb91pmbtNbMsm61+6LqurKr1VbV+1apV43RPkjSmed29U1VPA7cDG4AVSZa3RWuAPW16N3ACQFv+JmD/cH2G90iSJmCcu3dWJVnRpl8H/DnwMHAb8IHWbAtwfZu+oc3Tlt9aVdXqm9vdPScC64C7FmpDJEmjLR/dhOOBHe1Om9cA11bVjUkeAq5J8mngB8BVrf1VwBeTTDM4wt8MUFUPJrkWeAg4AGyrqucWdnMkSXMZGfpVdR/wjhnqjzLD3TdV9Rvg3Fk+6xLgkvl3U5K0EPxGriR1xNCXpI4Y+pLUEUNfkjpi6EtSRwx9SeqIoS9JHTH0Jakjhr4kdcTQl6SOGPqS1BFDX5I6YuhLUkcMfUnqiKEvSR0x9CWpI4a+JHXE0Jekjhj6ktQRQ1+SOmLoS1JHDH1J6oihL0kdMfQlqSOGviR1xNCXpI4Y+pLUEUNfkjpi6EtSRwx9SeqIoS9JHRkZ+klOSHJbkoeTPJjko61+dJKdSXa115WtniRXJJlOcl+SU4Y+a0trvyvJlsXbLEnSTMY50j8A/F1VvRXYAGxLchJwAXBLVa0DbmnzAO8D1rWfrcDnYPBLArgQeCdwKnDhwV8UkqTJGBn6VfVkVX2/Tf8SeBhYDWwCdrRmO4Bz2vQm4OoauANYkeR44ExgZ1Xtr6qngJ3AxgXdGknSnOZ1Tj/JWuAdwJ3AcVX1JAx+MQDHtmargSeG3ra71WarH7qOrUmmkkzt27dvPt2TJI0wdugneQPwdeBjVfWLuZrOUKs56i8sVF1ZVeurav2qVavG7Z4kaQxjhX6SIxgE/peq6hut/NN22ob2urfVdwMnDL19DbBnjrokaULGuXsnwFXAw1X1T0OLbgAO3oGzBbh+qP6RdhfPBuCZdvrnZuCMJCvbBdwzWk2SNCHLx2hzGvBh4P4k97baJ4BLgWuTnA88Dpzblt0EnAVMA78GzgOoqv1JLgbubu0uqqr9C7IVkqSxjAz9qvp3Zj4fD3D6DO0L2DbLZ20Hts+ng5KkheM3ciWpI4a+JHXE0Jekjhj6ktQ
RQ1+SOmLoS1JHDH1J6oihL0kdMfQlqSOGviR1xNCXpI4Y+pLUEUNfkjpi6EtSRwx9SeqIoS9JHTH0Jakjhr4kdcTQl6SOGPqS1BFDX5I6YuhLUkcMfUnqiKEvSR0x9CWpI4a+JHXE0Jekjhj6ktQRQ1+SOmLoS1JHRoZ+ku1J9iZ5YKh2dJKdSXa115WtniRXJJlOcl+SU4bes6W135Vky+JsjiRpLuMc6f8LsPGQ2gXALVW1DrilzQO8D1jXfrYCn4PBLwngQuCdwKnAhQd/UUiSJmdk6FfVd4H9h5Q3ATva9A7gnKH61TVwB7AiyfHAmcDOqtpfVU8BO3nxLxJJ0iJ7qef0j6uqJwHa67Gtvhp4Yqjd7labrf4iSbYmmUoytW/fvpfYPUnSTBb6Qm5mqNUc9RcXq66sqvVVtX7VqlUL2jlJ6t1LDf2fttM2tNe9rb4bOGGo3Rpgzxx1SdIEvdTQvwE4eAfOFuD6ofpH2l08G4Bn2umfm4EzkqxsF3DPaDVJ0gQtH9UgyVeAdwPHJNnN4C6cS4Frk5wPPA6c25rfBJwFTAO/Bs4DqKr9SS4G7m7tLqqqQy8OS5IW2cjQr6oPzrLo9BnaFrBtls/ZDmyfV+8kSQvKb+RKUkcMfUnqiKEvSR0x9CWpI4a+JHXE0Jekjhj6ktQRQ1+SOmLoS1JHDH1J6oihL0kdMfQlqSOGviR1xNCXpI4Y+pLUEUNfkjpi6EtSRwx9SeqIoS9JHTH0Jakjhr4kdWT5UndAeqVae8E3l2S9j1169pKsV68OHulLUkcMfUnqiKEvSR0x9CWpI4a+JHXE0Jekjhj6ktQRQ1+SOmLoS1JHJh76STYmeSTJdJILJr1+SerZREM/yTLgn4H3AScBH0xy0iT7IEk9m/Szd04FpqvqUYAk1wCbgIcm3A9JGmmpnq8Ei/eMpUmH/mrgiaH53cA7hxsk2QpsbbO/SvLIYazvGOBnh/H+lySXjWyyJP0ag/2aH8fX/Nivechlh9WvP5htwaRDPzPU6gUzVVcCVy7IypKpqlq/EJ+1kOzX/Niv+bFf89NbvyZ9IXc3cMLQ/Bpgz4T7IEndmnTo3w2sS3JikiOBzcANE+6DJHVroqd3qupAkr8FbgaWAdur6sFFXOWCnCZaBPZrfuzX/Niv+emqX6mq0a0kSa8KfiNXkjpi6EtSR16RoZ9ke5K9SR6YZXmSXNEe9XBfklOGlm1Jsqv9bJlwvz7U+nNfku8ledvQsseS3J/k3iRTE+7Xu5M809Z9b5JPDi1btMdmjNGvvx/q0wNJnktydFu2KPsryQlJbkvycJIHk3x0hjYTH19j9mvi42vMfk18fI3Zr4mPr/bZr01yV5Iftr59aoY2RyX5atsvdyZZO7Ts463+SJIz592BqnrF/QB/BpwCPDDL8rOAbzH4XsAG4M5WPxp4tL2ubNMrJ9ivdx1cH4NHUdw5tOwx4Jgl2l/vBm6cob4M+DHwh8CRwA+BkybVr0Pa/gVw62LvL+B44JQ2/UbgPw/d5qUYX2P2a+Lja8x+TXx8jdOvpRhf7bMDvKFNHwHcCWw4pM3fAJ9v05uBr7bpk9p+Ogo4se2/ZfNZ/yvySL+qvgvsn6PJJuDqGrgDWJHkeOBMYGdV7a+qp4CdwMZJ9auqvtfWC3AHg+8pLLox9tdsnn9sRlX9L3DwsRlL0a8PAl9ZqHXPpqqerKrvt+lfAg8z+Cb5sImPr3H6tRTja8z9NZtFG18voV8TGV+tP1VVv2qzR7SfQ++o2QTsaNNfA05Pkla/pqqeraqfANMM9uPYXpGhP4aZHveweo76UjifwdHiQQV8J8k9GTyKYtL+pP25+a0kJ7fay2J/JfkdBuH59aHyou+v9if1OxgciQ1b0vE1R7+GTXx8jejXko2vUftrKcZXkmVJ7gX2MjhQmHWMVdUB4BngzSzAPpv0YxgmZbbHPYx8DMQkJHkPg3+UfzpUPq2q9iQ5FtiZ5EftSHgSvg/8QVX
9KslZwL8C63iZ7C8Gf3r/R1UN/1WwqPsryRsYhMDHquoXhy6e4S0TGV8j+nWwzcTH14h+Ldn4Gmd/sQTjq6qeA96eZAVwXZI/qqrha1uLNsZerUf6sz3uYckfA5Hkj4EvAJuq6ucH61W1p73uBa5jnn+yHY6q+sXBPzer6ibgiCTH8DLYX81mDvnTezH3V5IjGATFl6rqGzM0WZLxNUa/lmR8jerXUo2vcfZXM9Hxdch6ngZu58WnAZ/fN0mWA29icCr08PfZYlyomMQPsJbZL0yezQsvtN3V6kcDP2FwkW1lmz56gv36fQbn4N51SP31wBuHpr8HbJxgv36P335R71Tg8bbvljO4GHkiv73QdvKk+tWWHxzsr5/E/mrbfTXwmTnaTHx8jdmviY+vMfs18fE1Tr+WYny1z1wFrGjTrwP+DXj/IW228cILude26ZN54YXcR5nnhdxX5OmdJF9hcEfAMUl2AxcyuBhCVX0euInBHRbTwK+B89qy/UkuZvAMIICL6oV/0i12vz7J4LzcZwfXZDhQg6foHcfgTzwY/EP4clV9e4L9+gDw10kOAP8DbK7BCFvUx2aM0S+AvwK+U1X/PfTWxdxfpwEfBu5v51wBPsEgUJdyfI3Tr6UYX+P0aynG1zj9gsmPLxjcWbQjg/9U6jUMAv3GJBcBU1V1A3AV8MUk0wx+KW1u/X4wybUM/g+SA8C2GpwqGpuPYZCkjrxaz+lLkmZg6EtSRwx9SeqIoS9JHTH0Jakjhr4kdcTQl6SO/D/WTZFNCrk6yQAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.hist(myArray.sum(axis=1))\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks familiar!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fig/Axis calls to `matplotlib`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's go back to plotting GDP:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "gdp = pd.read_csv(\"https://raw.githubusercontent.com/UIUC-iSchool-DataViz/spring2020/master/week01/data/GDP.csv\")" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<thead><tr><th></th><th>DATE</th><th>GDP</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>1947-01-01</td><td>243.164</td></tr>\n",
"<tr><th>1</th><td>1947-04-01</td><td>245.968</td></tr>\n",
"<tr><th>2</th><td>1947-07-01</td><td>249.585</td></tr>\n",
"<tr><th>3</th><td>1947-10-01</td><td>259.745</td></tr>\n",
"<tr><th>4</th><td>1948-01-01</td><td>265.742</td></tr>\n",
"<tr><th>...</th><td>...</td><td>...</td></tr>\n",
"<tr><th>286</th><td>2018-07-01</td><td>20749.752</td></tr>\n",
"<tr><th>287</th><td>2018-10-01</td><td>20897.804</td></tr>\n",
"<tr><th>288</th><td>2019-01-01</td><td>21098.827</td></tr>\n",
"<tr><th>289</th><td>2019-04-01</td><td>21340.267</td></tr>\n",
"<tr><th>290</th><td>2019-07-01</td><td>21542.540</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>291 rows × 2 columns</p>\n",
"</div>
" ], "text/plain": [ " DATE GDP\n", "0 1947-01-01 243.164\n", "1 1947-04-01 245.968\n", "2 1947-07-01 249.585\n", "3 1947-10-01 259.745\n", "4 1948-01-01 265.742\n", ".. ... ...\n", "286 2018-07-01 20749.752\n", "287 2018-10-01 20897.804\n", "288 2019-01-01 21098.827\n", "289 2019-04-01 21340.267\n", "290 2019-07-01 21542.540\n", "\n", "[291 rows x 2 columns]" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gdp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we recall, we had to do some data manipulation:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "gdp['DATE'] = pd.to_datetime(gdp['DATE'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we got to the following plot:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(gdp['DATE'], gdp['GDP'])\n", "plt.xlabel('Years')\n", "plt.ylabel('US GDP in Billions')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So this is a fine way to interact with plots, but there is another way to interact with `matplotlib` through figure and axis \"objects\". I'm going to make the above plot with these calls and then we'll talk about what just happened:" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "fig, ax = plt.subplots(1,1,figsize=(8,6)) # creating figure & axis objects\n", "\n", "ax.plot(gdp['DATE'], gdp['GDP']) # plot now on the *axis object*\n", "ax.set_xlabel('Years') # note the set_ in the front!\n", "ax.set_ylabel('US GDP in Billions')\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, it's a bit bigger than before, but essentially the same plot!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So what are these new things? This `fig` and `ax` thing? Well, `fig` is an object that actually stores our figure:" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`ax` is an axis object that sort of \"holds\" all the plotting info:" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ax" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `set_` we have to add in is a function attached to this object. We can use `get_` to retrieve info about our plot. For example, the axis labels:" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Years'" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ax.get_xlabel()" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'US GDP in Billions'" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ax.get_ylabel()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plotting with style\n", "\n", "We've been using the default \"style\" for our plots, but it turns out there are a few different styles associated with `matplotlib`:" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['seaborn-dark',\n", " 'seaborn-darkgrid',\n", " 'seaborn-ticks',\n", " 'fivethirtyeight',\n", " 'seaborn-whitegrid',\n", " 'classic',\n", " '_classic_test',\n", " 'fast',\n", " 'seaborn-talk',\n", " 'seaborn-dark-palette',\n", " 'seaborn-bright',\n", " 'seaborn-pastel',\n", " 'grayscale',\n", " 'seaborn-notebook',\n", " 'ggplot',\n", " 'seaborn-colorblind',\n", " 'seaborn-muted',\n", " 'seaborn',\n", " 'Solarize_Light2',\n", " 'seaborn-paper',\n", " 'bmh',\n", " 'tableau-colorblind10',\n", " 'seaborn-white',\n", " 
'dark_background',\n", " 'seaborn-poster',\n", " 'seaborn-deep']" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plt.style.available" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But what do these all mean? These are different collections of ways that our plot line colors, fonts, thicknesses, sizes will be chosen for us. Let's try a new one:" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "style = 'seaborn-dark'\n", "plt.style.use(style)\n", "\n", "fig, ax = plt.subplots(1,1,figsize=(8,6)) # creating figure & axis objects\n", "\n", "ax.plot(gdp['DATE'], gdp['GDP']) # plot now on the *axis object*\n", "ax.set_xlabel('Years') # note the set_ in the front!\n", "ax.set_ylabel('US GDP in Billions')\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So now we can see that the fonts are different and the background has changed color. So, neat!\n", "\n", "The only problem is that now *all* of our plots will be in this style:" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig, ax = plt.subplots(1,1)\n", "\n", "plt.plot([1,2,3])\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we want to use a certain plot style for *only* the cell we are currently running we can make use of the `with` keyword." ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "style = 'fivethirtyeight'\n", "with plt.style.context(style): # here context means we'll plot *with this style but only in this context*\n", "    fig, ax = plt.subplots(figsize=(10, 8))\n", "    ax.plot(gdp[\"DATE\"], gdp[\"GDP\"], '-')\n", "    plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now if we plot again:" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig, ax = plt.subplots(1,1)\n", "\n", "plt.plot([1,2,3])\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have whatever style we last used with `plt.style.use`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we want to try out a few different styles we can make a function:" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [], "source": [ "def make_gdp_plot(style): # note, \"style\" is something you can google if you want more options\n", "    with plt.style.context(style):\n", "        fig, ax = plt.subplots(figsize=(10, 8))\n", "        ax.set_title(\"Style: \" + style) # append 'Style:' and whatever style we chose\n", "        ax.plot(gdp[\"DATE\"], gdp[\"GDP\"], '-')\n", "        plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can run our function. Let's remind ourselves of our choices:" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['seaborn-dark',\n", " 'seaborn-darkgrid',\n", " 'seaborn-ticks',\n", " 'fivethirtyeight',\n", " 'seaborn-whitegrid',\n", " 'classic',\n", " '_classic_test',\n", " 'fast',\n", " 'seaborn-talk',\n", " 'seaborn-dark-palette',\n", " 'seaborn-bright',\n", " 'seaborn-pastel',\n", " 'grayscale',\n", " 'seaborn-notebook',\n", " 'ggplot',\n", " 'seaborn-colorblind',\n", " 'seaborn-muted',\n", " 'seaborn',\n", " 'Solarize_Light2',\n", " 'seaborn-paper',\n", " 'bmh',\n", " 'tableau-colorblind10',\n", " 'seaborn-white',\n", " 'dark_background',\n", " 'seaborn-poster',\n", " 'seaborn-deep']" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plt.style.available" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "make_gdp_plot('ggplot')" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "make_gdp_plot('classic')" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "make_gdp_plot('bmh')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This plotting-with-a-function will come up later in class when we do this sort of thing interactively so stay tuned!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }