5.7.2. Example: subset
Create a student dataset:
import pandas as pd
# Each key becomes a column name; each list holds that column's 19 values
stu_dic = {
    'Age':[14,13,13,14,14,12,12,15,13,12,11,14,12,15,16,12,15,11,15],
    'Height':[69,56.5,65.3,62.8,63.5,57.3,59.8,62.5,62.5,59,51.3,64.3,56.3,66.5,72,64.8,67,57.5,66.5],
    'Name':['Alfred','Alice','Barbara','Carol','Henry','James','Jane','Janet','Jeffrey','John','Joyce','Judy','Louise','Marry','Philip','Robert','Ronald','Thomas','Willam'],
    'Sex':['M','F','F','F','M','M','F','F','M','M','F','F','F','F','M','M','M','M','M'],
    'Weight':[112.5,84,98,102.5,102.5,83,84.5,112.5,84,99.5,50.5,90,77,112,150,128,133,85,112]
}
student = pd.DataFrame(stu_dic)
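As a quick sanity check (not part of the original walkthrough), the shape and column types of the new frame can be inspected before subsetting. A minimal sketch that rebuilds `student` from the same dictionary so it runs on its own:

```python
import pandas as pd

# Same data as stu_dic above, rebuilt so this snippet is self-contained
stu_dic = {
    'Age': [14, 13, 13, 14, 14, 12, 12, 15, 13, 12, 11, 14, 12, 15, 16, 12, 15, 11, 15],
    'Height': [69, 56.5, 65.3, 62.8, 63.5, 57.3, 59.8, 62.5, 62.5, 59, 51.3, 64.3, 56.3,
               66.5, 72, 64.8, 67, 57.5, 66.5],
    'Name': ['Alfred', 'Alice', 'Barbara', 'Carol', 'Henry', 'James', 'Jane', 'Janet',
             'Jeffrey', 'John', 'Joyce', 'Judy', 'Louise', 'Marry', 'Philip', 'Robert',
             'Ronald', 'Thomas', 'Willam'],
    'Sex': ['M', 'F', 'F', 'F', 'M', 'M', 'F', 'F', 'M', 'M', 'F', 'F', 'F', 'F', 'M', 'M',
            'M', 'M', 'M'],
    'Weight': [112.5, 84, 98, 102.5, 102.5, 83, 84.5, 112.5, 84, 99.5, 50.5, 90, 77, 112,
               150, 128, 133, 85, 112],
}
student = pd.DataFrame(stu_dic)

print(student.shape)   # (19, 5): 19 rows, 5 columns
print(student.dtypes)  # Age is an integer column; Height and Weight are float columns
```

If one of the lists had a different length, `pd.DataFrame` would raise a `ValueError` at construction time instead of padding the shorter column.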
Show the first 5 or the last 5 rows of the data:
In [63]: student.head()
Out[63]:
   Age  Height     Name Sex  Weight
0   14    69.0   Alfred   M   112.5
1   13    56.5    Alice   F    84.0
2   13    65.3  Barbara   F    98.0
3   14    62.8    Carol   F   102.5
4   14    63.5    Henry   M   102.5
In [64]: student.tail()
Out[64]:
    Age  Height    Name Sex  Weight
14   16    72.0  Philip   M   150.0
15   12    64.8  Robert   M   128.0
16   15    67.0  Ronald   M   133.0
17   11    57.5  Thomas   M    85.0
18   15    66.5  Willam   M   112.0
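`head()` and `tail()` return 5 rows by default, and both accept a row count `n`. A minimal sketch that rebuilds only the `Name` column (same order as `stu_dic`) so it runs on its own:

```python
import pandas as pd

# Only the Name column is rebuilt here; the order matches stu_dic above
student = pd.DataFrame({'Name': ['Alfred', 'Alice', 'Barbara', 'Carol', 'Henry', 'James',
                                 'Jane', 'Janet', 'Jeffrey', 'John', 'Joyce', 'Judy',
                                 'Louise', 'Marry', 'Philip', 'Robert', 'Ronald',
                                 'Thomas', 'Willam']})

first3 = student.head(3)            # n defaults to 5; any other count can be passed
last2 = student.tail(2)
without_last_17 = student.head(-17)  # negative n: every row except the last |n|

print(first3['Name'].tolist())           # ['Alfred', 'Alice', 'Barbara']
print(last2['Name'].tolist())            # ['Thomas', 'Willam']
print(last2.index.tolist())              # [17, 18]: original index labels are kept
print(without_last_17['Name'].tolist())  # ['Alfred', 'Alice']
```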
Select specific rows by their index labels:
In [65]: print(student.loc[[0,2,4,5,7]])
   Age  Height     Name Sex  Weight
0   14    69.0   Alfred   M   112.5
2   13    65.3  Barbara   F    98.0
4   14    63.5    Henry   M   102.5
5   12    57.3    James   M    83.0
7   15    62.5    Janet   F   112.5
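`loc` selects rows by index label, while `iloc` selects them by integer position. With the default index 0..18 the two look alike, but they differ for slices and once rows are reordered. A small sketch using only the first eight names from `stu_dic`:

```python
import pandas as pd

# First eight names from stu_dic above; the default index labels are 0..7
student = pd.DataFrame({'Name': ['Alfred', 'Alice', 'Barbara', 'Carol',
                                 'Henry', 'James', 'Jane', 'Janet']})

# Labels and positions coincide here, but slice endpoints are treated differently
by_label = student.loc[2:4]      # label slice: the end label 4 is included
by_position = student.iloc[2:4]  # position slice: the end position 4 is excluded
print(by_label['Name'].tolist())     # ['Barbara', 'Carol', 'Henry']
print(by_position['Name'].tolist())  # ['Barbara', 'Carol']

# After reversing the rows, labels and positions no longer coincide
rev = student.iloc[::-1]                  # index labels are now 7, 6, ..., 0
print(rev.loc[[0, 2]]['Name'].tolist())   # ['Alfred', 'Barbara'] (labels 0 and 2)
print(rev.iloc[[0, 2]]['Name'].tolist())  # ['Janet', 'James'] (1st and 3rd rows)
```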
Select specific columns:
In [66]: print(student[['Name','Height','Weight']].head())
      Name  Height  Weight
0   Alfred    69.0   112.5
1    Alice    56.5    84.0
2  Barbara    65.3    98.0
3    Carol    62.8   102.5
4    Henry    63.5   102.5
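Whether the column key is a single label or a list changes what comes back: a single label gives a Series, a list gives a DataFrame whose columns follow the list order. A short sketch on three rows copied from `stu_dic`:

```python
import pandas as pd

# Three rows and three columns copied from stu_dic above
student = pd.DataFrame({'Name': ['Alfred', 'Alice', 'Barbara'],
                        'Height': [69.0, 56.5, 65.3],
                        'Weight': [112.5, 84.0, 98.0]})

one_col = student['Name']    # a single label returns a Series
sub = student[['Name']]      # a list of labels (even of length 1) returns a DataFrame
reordered = student[['Weight', 'Name']]  # the list order sets the column order

print(type(one_col).__name__)      # Series
print(type(sub).__name__)          # DataFrame
print(reordered.columns.tolist())  # ['Weight', 'Name']
```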
Columns can also be selected by label with `loc`, where `:` keeps all rows:
In [67]: print(student.loc[:,['Name','Height','Weight']].head())
      Name  Height  Weight
0   Alfred    69.0   112.5
1    Alice    56.5    84.0
2  Barbara    65.3    98.0
3    Carol    62.8   102.5
4    Henry    63.5   102.5
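`loc` takes a row selector and a column selector separated by a comma, so rows and columns can be picked in one call; label slices include both endpoints, and one row label plus one column label yields a scalar. A sketch on the first three rows of `stu_dic`:

```python
import pandas as pd

# First three rows of stu_dic above, all five columns
student = pd.DataFrame({
    'Age': [14, 13, 13],
    'Height': [69.0, 56.5, 65.3],
    'Name': ['Alfred', 'Alice', 'Barbara'],
    'Sex': ['M', 'F', 'F'],
    'Weight': [112.5, 84.0, 98.0],
})

cells = student.loc[[0, 2], ['Name', 'Height']]  # rows and columns by label at once
span = student.loc[:, 'Height':'Sex']            # column label slice, both ends included
value = student.loc[1, 'Name']                   # single row + single column -> scalar

print(cells['Name'].tolist())  # ['Alfred', 'Barbara']
print(span.columns.tolist())   # ['Height', 'Name', 'Sex']
print(value)                   # Alice
```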
Select all female students older than 12:
In [68]: print(student[(student['Sex']=='F') & (student['Age']>12)])
    Age  Height     Name Sex  Weight
1    13    56.5    Alice   F    84.0
2    13    65.3  Barbara   F    98.0
3    14    62.8    Carol   F   102.5
7    15    62.5    Janet   F   112.5
11   14    64.3     Judy   F    90.0
13   15    66.5    Marry   F   112.0
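Each comparison produces a boolean Series, and `&` combines them element by element. The parentheses are required because in Python `&` binds more tightly than `==` and `>`. `DataFrame.query` is an alternative spelling of the same filter. A sketch that rebuilds only the `Name`, `Sex` and `Age` columns:

```python
import pandas as pd

# Name, Sex and Age columns from stu_dic above
student = pd.DataFrame({
    'Name': ['Alfred', 'Alice', 'Barbara', 'Carol', 'Henry', 'James', 'Jane', 'Janet',
             'Jeffrey', 'John', 'Joyce', 'Judy', 'Louise', 'Marry', 'Philip', 'Robert',
             'Ronald', 'Thomas', 'Willam'],
    'Sex': ['M', 'F', 'F', 'F', 'M', 'M', 'F', 'F', 'M', 'M', 'F', 'F', 'F', 'F', 'M', 'M',
            'M', 'M', 'M'],
    'Age': [14, 13, 13, 14, 14, 12, 12, 15, 13, 12, 11, 14, 12, 15, 16, 12, 15, 11, 15],
})

# Boolean mask: True where both conditions hold
mask = (student['Sex'] == 'F') & (student['Age'] > 12)
print(mask.sum())  # 6 rows match

# The same filter written as a query string
same = student.query("Sex == 'F' and Age > 12")
print(same['Name'].tolist())  # ['Alice', 'Barbara', 'Carol', 'Janet', 'Judy', 'Marry']
```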
Select the name, height, and weight of all female students older than 12:
In [69]: print(student[(student['Sex']=='F') & (student['Age']>12)][['Name','Height','Weight']])
       Name  Height  Weight
1     Alice    56.5    84.0
2   Barbara    65.3    98.0
3     Carol    62.8   102.5
7     Janet    62.5   112.5
11     Judy    64.3    90.0
13    Marry    66.5   112.0
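The same result can be obtained with a single `loc` call that takes the mask as the row selector and the column list as the column selector, instead of indexing twice in a row (chained indexing). A sketch on the full data:

```python
import pandas as pd

# Same data as stu_dic above
student = pd.DataFrame({
    'Age': [14, 13, 13, 14, 14, 12, 12, 15, 13, 12, 11, 14, 12, 15, 16, 12, 15, 11, 15],
    'Height': [69, 56.5, 65.3, 62.8, 63.5, 57.3, 59.8, 62.5, 62.5, 59, 51.3, 64.3, 56.3,
               66.5, 72, 64.8, 67, 57.5, 66.5],
    'Name': ['Alfred', 'Alice', 'Barbara', 'Carol', 'Henry', 'James', 'Jane', 'Janet',
             'Jeffrey', 'John', 'Joyce', 'Judy', 'Louise', 'Marry', 'Philip', 'Robert',
             'Ronald', 'Thomas', 'Willam'],
    'Sex': ['M', 'F', 'F', 'F', 'M', 'M', 'F', 'F', 'M', 'M', 'F', 'F', 'F', 'F', 'M', 'M',
            'M', 'M', 'M'],
    'Weight': [112.5, 84, 98, 102.5, 102.5, 83, 84.5, 112.5, 84, 99.5, 50.5, 90, 77, 112,
               150, 128, 133, 85, 112],
})

mask = (student['Sex'] == 'F') & (student['Age'] > 12)
# One indexing step: rows chosen by the mask, columns chosen by label
girls = student.loc[mask, ['Name', 'Height', 'Weight']]

print(girls.index.tolist())    # [1, 2, 3, 7, 11, 13]
print(girls.columns.tolist())  # ['Name', 'Height', 'Weight']
```

The single-step form matters mainly when assigning: `student.loc[mask, 'Weight'] = ...` updates `student`, whereas a chained assignment such as `student[mask]['Weight'] = ...` may only modify a temporary copy and leave `student` unchanged.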
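Summarize a categorical (discrete) variable: the number of observations, the number of unique values, the most frequent level (mode), and its frequency: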
In [83]: print(student['Sex'].describe())
count     19
unique     2
top        M
freq      10
Name: Sex, dtype: object
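`describe()` reports only the most frequent level. To see the count of every level, `value_counts()` can be used, and `mode()` returns the most frequent value(s). A sketch on the `Sex` column alone:

```python
import pandas as pd

# Only the Sex column from stu_dic above
sex = pd.Series(['M', 'F', 'F', 'F', 'M', 'M', 'F', 'F', 'M', 'M', 'F', 'F', 'F', 'F',
                 'M', 'M', 'M', 'M', 'M'], name='Sex')

summary = sex.describe()     # count, unique, top (mode), freq (count of top)
counts = sex.value_counts()  # frequency of every level, most frequent first

print(counts['M'], counts['F'])         # 10 9
print(sex.mode().tolist())              # ['M']
print(summary['top'], summary['freq'])  # M 10
```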