Introduction to Python Pandas

Part Five

Filtering Data

In [1]:
import pandas as pd
df = pd.read_csv('../datasets/bikes.csv')
df.head(3)
Out[1]:
trip_id usertype gender starttime stoptime tripduration from_station_name latitude_start longitude_start dpcapacity_start to_station_name latitude_end longitude_end dpcapacity_end temperature visibility wind_speed precipitation events
0 7147 Subscriber Male 2013-06-28 19:01:00 2013-06-28 19:17:00 993 Lake Shore Dr & Monroe St 41.881050 -87.616970 11.0 Michigan Ave & Oak St 41.90096 -87.623777 15.0 73.9 10.0 12.7 -9999.0 mostlycloudy
1 7524 Subscriber Male 2013-06-28 22:53:00 2013-06-28 23:03:00 623 Clinton St & Washington Blvd 41.883380 -87.641170 31.0 Wells St & Walton St 41.89993 -87.634430 19.0 69.1 10.0 6.9 -9999.0 partlycloudy
2 10927 Subscriber Male 2013-06-30 14:43:00 2013-06-30 15:01:00 1040 Sheffield Ave & Kingsbury St 41.909592 -87.653497 15.0 Dearborn St & Monroe St 41.88132 -87.629521 23.0 73.0 10.0 16.1 -9999.0 mostlycloudy

Determine which rows contain a trip that lasts longer than 1000 seconds. To make the comparison, we select the tripduration column as a Series and compare it with the integer 1000.

In [2]:
uslov = df['tripduration'] > 1000
uslov.head(3)
Out[2]:
0    False
1    False
2     True
Name: tripduration, dtype: bool

When we write df['tripduration'] > 1000, pandas compares each value in the tripduration column with 1000. It returns a new Series of the same length as tripduration, holding the Boolean outcome of each comparison.
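The element-wise behavior can be checked on a small table. The following is a minimal sketch: the names toy and mask are hypothetical, and the three durations are copied from the first rows of the output above.

```python
import pandas as pd

# Hypothetical toy data; the durations mirror the first three rows shown above.
toy = pd.DataFrame({'tripduration': [993, 623, 1040]})

# The comparison is applied to every element and yields a Boolean Series.
mask = toy['tripduration'] > 1000

print(mask.tolist())          # [False, False, True]
print(mask.dtype)             # bool
print(len(mask) == len(toy))  # True: one Boolean per row
```

The Series keeps the index of the original column, which is what lets pandas line it up with the DataFrame rows during selection.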

In [3]:
len(uslov)
Out[3]:
50089
In [4]:
len(df)
Out[4]:
50089
In [5]:
df[uslov].head(3)
Out[5]:
trip_id usertype gender starttime stoptime tripduration from_station_name latitude_start longitude_start dpcapacity_start to_station_name latitude_end longitude_end dpcapacity_end temperature visibility wind_speed precipitation events
2 10927 Subscriber Male 2013-06-30 14:43:00 2013-06-30 15:01:00 1040 Sheffield Ave & Kingsbury St 41.909592 -87.653497 15.0 Dearborn St & Monroe St 41.881320 -87.629521 23.0 73.0 10.0 16.1 -9999.0 mostlycloudy
8 21028 Subscriber Male 2013-07-03 15:21:00 2013-07-03 15:42:00 1300 Clinton St & Washington Blvd 41.883380 -87.641170 31.0 Wood St & Division St 41.903320 -87.672730 15.0 71.1 8.0 0.0 -9999.0 cloudy
10 24383 Subscriber Male 2013-07-04 17:17:00 2013-07-04 17:42:00 1523 Morgan St & 18th St 41.858086 -87.651073 15.0 Damen Ave & Pierce Ave 41.909396 -87.677692 19.0 79.0 10.0 9.2 -9999.0 mostlycloudy

How many rows have a trip duration greater than 1000? To answer this question, we assign the result of the Boolean selection to a variable and then compare its row count with that of the original DataFrame.

In [6]:
bikes_duration_1000 = df[uslov]
bikes_duration_1000
Out[6]:
trip_id usertype gender starttime stoptime tripduration from_station_name latitude_start longitude_start dpcapacity_start to_station_name latitude_end longitude_end dpcapacity_end temperature visibility wind_speed precipitation events
2 10927 Subscriber Male 2013-06-30 14:43:00 2013-06-30 15:01:00 1040 Sheffield Ave & Kingsbury St 41.909592 -87.653497 15.0 Dearborn St & Monroe St 41.881320 -87.629521 23.0 73.0 10.0 16.1 -9999.00 mostlycloudy
8 21028 Subscriber Male 2013-07-03 15:21:00 2013-07-03 15:42:00 1300 Clinton St & Washington Blvd 41.883380 -87.641170 31.0 Wood St & Division St 41.903320 -87.672730 15.0 71.1 8.0 0.0 -9999.00 cloudy
10 24383 Subscriber Male 2013-07-04 17:17:00 2013-07-04 17:42:00 1523 Morgan St & 18th St 41.858086 -87.651073 15.0 Damen Ave & Pierce Ave 41.909396 -87.677692 19.0 79.0 10.0 9.2 -9999.00 mostlycloudy
11 24673 Subscriber Male 2013-07-04 18:13:00 2013-07-04 18:42:00 1697 Ashland Ave & Armitage Ave 41.917859 -87.668919 15.0 Lincoln Ave & Armitage Ave 41.918273 -87.638116 19.0 79.0 10.0 10.4 -9999.00 mostlycloudy
12 26214 Subscriber Male 2013-07-05 10:02:00 2013-07-05 10:40:00 2263 Jefferson St & Monroe St 41.880422 -87.642746 19.0 Jefferson St & Monroe St 41.880422 -87.642746 19.0 79.0 10.0 0.0 -9999.00 partlycloudy
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
50058 17525403 Subscriber Female 2017-12-23 13:48:00 2017-12-23 14:14:00 1559 Michigan Ave & Madison St 41.882134 -87.625125 19.0 Shedd Aquarium 41.867226 -87.615355 55.0 28.0 10.0 8.1 -9999.00 mostlycloudy
50077 17533484 Subscriber Male 2017-12-29 09:13:00 2017-12-29 09:53:00 2378 Clinton St & 18th St 41.857950 -87.640826 15.0 Canal St & Taylor St 41.870257 -87.639474 15.0 12.9 9.0 4.6 -9999.00 cloudy
50080 17534057 Subscriber Male 2017-12-29 15:28:00 2017-12-29 15:51:00 1378 Cityfront Plaza Dr & Pioneer Ct 41.890573 -87.622072 23.0 Mies van der Rohe Way & Chestnut St 41.898587 -87.621915 19.0 14.0 1.5 6.9 0.01 snow
50083 17534831 Subscriber Male 2017-12-30 11:36:00 2017-12-30 11:55:00 1175 Western Ave & Walton St 41.898418 -87.686596 19.0 Damen Ave & Clybourn Ave 41.931931 -87.677856 15.0 3.9 10.0 13.8 -9999.00 partlycloudy
50084 17534938 Subscriber Male 2017-12-30 13:07:00 2017-12-30 13:34:00 1625 State St & Pearson St 41.897448 -87.628722 27.0 Clark St & Elm St 41.902973 -87.631280 27.0 5.0 10.0 16.1 -9999.00 partlycloudy

10178 rows × 19 columns

In [7]:
len(df)
Out[7]:
50089
In [8]:
len(bikes_duration_1000)
Out[8]:
10178

Dividing the two row counts shows that about 20% of the rides last longer than 1000 seconds.

In [9]:
len(bikes_duration_1000) / len(df)
Out[9]:
0.20319830701351593
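The count and the fraction can also be read directly from the Boolean Series, because True is treated as 1 and False as 0 in arithmetic. This is a minimal sketch on hypothetical toy data (the names toy, mask, count and fraction are not part of the original notebook), not a replacement for the approach above.

```python
import pandas as pd

# Hypothetical toy data: five trip durations in seconds.
toy = pd.DataFrame({'tripduration': [993, 623, 1040, 1300, 565]})
mask = toy['tripduration'] > 1000

# sum() counts the True values; mean() gives their share of all rows.
count = mask.sum()
fraction = mask.mean()

print(count)     # 2
print(fraction)  # 0.4

# Both agree with the row-count approach used above.
print(count == len(toy[mask]))  # True
```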

Boolean selection in a single line

In [10]:
df[df['tripduration'] > 1000].head(3)
Out[10]:
trip_id usertype gender starttime stoptime tripduration from_station_name latitude_start longitude_start dpcapacity_start to_station_name latitude_end longitude_end dpcapacity_end temperature visibility wind_speed precipitation events
2 10927 Subscriber Male 2013-06-30 14:43:00 2013-06-30 15:01:00 1040 Sheffield Ave & Kingsbury St 41.909592 -87.653497 15.0 Dearborn St & Monroe St 41.881320 -87.629521 23.0 73.0 10.0 16.1 -9999.0 mostlycloudy
8 21028 Subscriber Male 2013-07-03 15:21:00 2013-07-03 15:42:00 1300 Clinton St & Washington Blvd 41.883380 -87.641170 31.0 Wood St & Division St 41.903320 -87.672730 15.0 71.1 8.0 0.0 -9999.0 cloudy
10 24383 Subscriber Male 2013-07-04 17:17:00 2013-07-04 17:42:00 1523 Morgan St & 18th St 41.858086 -87.651073 15.0 Damen Ave & Pierce Ave 41.909396 -87.677692 19.0 79.0 10.0 9.2 -9999.0 mostlycloudy

Example: find all rides that took place when the weather was cloudy. We use the == operator to test for equality and again pass the resulting variable inside the brackets to complete the selection.

In [11]:
uslov = df['events'] == 'cloudy'
df[uslov].head(3)
Out[11]:
trip_id usertype gender starttime stoptime tripduration from_station_name latitude_start longitude_start dpcapacity_start to_station_name latitude_end longitude_end dpcapacity_end temperature visibility wind_speed precipitation events
6 18880 Subscriber Male 2013-07-02 17:47:00 2013-07-02 17:56:00 565 Clark St & Randolph St 41.884576 -87.631890 31.0 Ravenswood Ave & Irving Park Rd 41.954690 -87.673930 19.0 66.0 10.0 15.0 -9999.0 cloudy
7 19689 Subscriber Male 2013-07-03 09:07:00 2013-07-03 09:16:00 505 State St & Van Buren St 41.877181 -87.627844 27.0 Franklin St & Jackson Blvd 41.877708 -87.635321 27.0 64.0 7.0 5.8 -9999.0 cloudy
8 21028 Subscriber Male 2013-07-03 15:21:00 2013-07-03 15:42:00 1300 Clinton St & Washington Blvd 41.883380 -87.641170 31.0 Wood St & Division St 41.903320 -87.672730 15.0 71.1 8.0 0.0 -9999.0 cloudy
In [12]:
df[df['events'] == 'cloudy'].head(3)
Out[12]:
trip_id usertype gender starttime stoptime tripduration from_station_name latitude_start longitude_start dpcapacity_start to_station_name latitude_end longitude_end dpcapacity_end temperature visibility wind_speed precipitation events
6 18880 Subscriber Male 2013-07-02 17:47:00 2013-07-02 17:56:00 565 Clark St & Randolph St 41.884576 -87.631890 31.0 Ravenswood Ave & Irving Park Rd 41.954690 -87.673930 19.0 66.0 10.0 15.0 -9999.0 cloudy
7 19689 Subscriber Male 2013-07-03 09:07:00 2013-07-03 09:16:00 505 State St & Van Buren St 41.877181 -87.627844 27.0 Franklin St & Jackson Blvd 41.877708 -87.635321 27.0 64.0 7.0 5.8 -9999.0 cloudy
8 21028 Subscriber Male 2013-07-03 15:21:00 2013-07-03 15:42:00 1300 Clinton St & Washington Blvd 41.883380 -87.641170 31.0 Wood St & Division St 41.903320 -87.672730 15.0 71.1 8.0 0.0 -9999.0 cloudy

Exercises

For the following exercises, use the movie.csv dataset with the title column (title) as the index.

Exercise 1

Select all movies in which Johnny Depp is actor 1. In how many of these movies did he appear?

In [ ]:
 
In [ ]:
 
In [ ]:
 

Exercise 2

Select the movies with an IMDB score greater than 9.

In [ ]:
 

Exercise 3

Write a function that accepts a single parameter, a content rating, and returns the number of movies with that rating. Use this function to find the number of movies rated "R", "PG-13", and "PG".

In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 

Multiple conditions

Find all rides longer than 1000 seconds that took place when it was cloudy.
This query has two conditions: a trip duration greater than 1000 and cloudy weather.

In [13]:
import pandas as pd
df = pd.read_csv('../datasets/bikes.csv')
df.head(3)
Out[13]:
trip_id usertype gender starttime stoptime tripduration from_station_name latitude_start longitude_start dpcapacity_start to_station_name latitude_end longitude_end dpcapacity_end temperature visibility wind_speed precipitation events
0 7147 Subscriber Male 2013-06-28 19:01:00 2013-06-28 19:17:00 993 Lake Shore Dr & Monroe St 41.881050 -87.616970 11.0 Michigan Ave & Oak St 41.90096 -87.623777 15.0 73.9 10.0 12.7 -9999.0 mostlycloudy
1 7524 Subscriber Male 2013-06-28 22:53:00 2013-06-28 23:03:00 623 Clinton St & Washington Blvd 41.883380 -87.641170 31.0 Wells St & Walton St 41.89993 -87.634430 19.0 69.1 10.0 6.9 -9999.0 partlycloudy
2 10927 Subscriber Male 2013-06-30 14:43:00 2013-06-30 15:01:00 1040 Sheffield Ave & Kingsbury St 41.909592 -87.653497 15.0 Dearborn St & Monroe St 41.88132 -87.629521 23.0 73.0 10.0 16.1 -9999.0 mostlycloudy
In [14]:
uslov1 = df['tripduration'] > 1000
uslov2 = df['events'] == 'cloudy'
uslov = uslov1 & uslov2
df[uslov].head(3)
Out[14]:
trip_id usertype gender starttime stoptime tripduration from_station_name latitude_start longitude_start dpcapacity_start to_station_name latitude_end longitude_end dpcapacity_end temperature visibility wind_speed precipitation events
8 21028 Subscriber Male 2013-07-03 15:21:00 2013-07-03 15:42:00 1300 Clinton St & Washington Blvd 41.883380 -87.641170 31.0 Wood St & Division St 41.903320 -87.672730 15.0 71.1 8.0 0.0 -9999.0 cloudy
18 40924 Subscriber Male 2013-07-09 13:12:00 2013-07-09 14:42:00 5396 Canal St & Jackson Blvd 41.878114 -87.639971 35.0 Millennium Park 41.881032 -87.624084 35.0 79.0 10.0 13.8 0.0 cloudy
80 90932 Subscriber Female 2013-07-22 07:59:00 2013-07-22 08:19:00 1224 Lincoln Ave & Armitage Ave 41.918273 -87.638116 19.0 Dearborn St & Adams St 41.879356 -87.629791 19.0 73.4 10.0 0.0 -9999.0 cloudy

Multiple conditions in a single line

In [15]:
df[(df['tripduration'] > 1000) & (df['events'] == 'cloudy')].head(3)
Out[15]:
trip_id usertype gender starttime stoptime tripduration from_station_name latitude_start longitude_start dpcapacity_start to_station_name latitude_end longitude_end dpcapacity_end temperature visibility wind_speed precipitation events
8 21028 Subscriber Male 2013-07-03 15:21:00 2013-07-03 15:42:00 1300 Clinton St & Washington Blvd 41.883380 -87.641170 31.0 Wood St & Division St 41.903320 -87.672730 15.0 71.1 8.0 0.0 -9999.0 cloudy
18 40924 Subscriber Male 2013-07-09 13:12:00 2013-07-09 14:42:00 5396 Canal St & Jackson Blvd 41.878114 -87.639971 35.0 Millennium Park 41.881032 -87.624084 35.0 79.0 10.0 13.8 0.0 cloudy
80 90932 Subscriber Female 2013-07-22 07:59:00 2013-07-22 08:19:00 1224 Lincoln Ave & Armitage Ave 41.918273 -87.638116 19.0 Dearborn St & Adams St 41.879356 -87.629791 19.0 73.4 10.0 0.0 -9999.0 cloudy
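The parentheses around each condition in the single-line form are required, because in Python the & operator binds more tightly than comparison operators such as > and ==. The sketch below uses hypothetical toy data and variable names; it also shows | (or) and ~ (not), which the examples above do not use but which combine Boolean Series in the same way.

```python
import pandas as pd

# Hypothetical toy data with the two columns used in the query above.
toy = pd.DataFrame({
    'tripduration': [993, 1300, 5396, 565],
    'events': ['mostlycloudy', 'cloudy', 'cloudy', 'cloudy'],
})

long_trip = toy['tripduration'] > 1000
cloudy = toy['events'] == 'cloudy'

# & keeps rows where both conditions hold (logical AND).
both = toy[(toy['tripduration'] > 1000) & (toy['events'] == 'cloudy')]

# | keeps rows where at least one condition holds (logical OR).
either = toy[long_trip | cloudy]

# ~ inverts a Boolean Series (logical NOT).
not_cloudy = toy[~cloudy]

print(both.index.tolist())        # [1, 2]
print(either.index.tolist())      # [1, 2, 3]
print(not_cloudy.index.tolist())  # [0]
```

Naming each condition first, as in long_trip and cloudy, avoids the precedence issue altogether and tends to be easier to read when there are more than two conditions.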

Exercise 4

In the movie.csv dataset, select all movies from the 1970s.

In [ ]:
 
In [ ]:
 

Exercise 5

Select all movies from the 1970s that had an IMDB score greater than 8.

In [ ]:
 

Exercise 6

Select the movies that are rated R, PG-13, or PG.

In [ ]:
 
In [ ]: