6.7.2. Example: subset

Load a student dataset:

import pandas as pd

stu_dic = {
  'Age':[14,13,13,14,14,12,12,15,13,12,11,14,12,15,16,12,15,11,15],
  'Height':[69,56.5,65.3,62.8,63.5,57.3,59.8,62.5,62.5,59,51.3,64.3,56.3,66.5,72,64.8,67,57.5,66.5],
  'Name':['Alfred','Alice','Barbara','Carol','Henry','James','Jane','Janet','Jeffrey','John','Joyce','Judy','Louise','Marry','Philip','Robert','Ronald','Thomas','Willam'],
  'Sex':['M','F','F','F','M','M','F','F','M','M','F','F','F','F','M','M','M','M','M'],
  'Weight':[112.5,84,98,102.5,102.5,83,84.5,112.5,84,99.5,50.5,90,77,112,150,128,133,85,112]
}
student = pd.DataFrame(stu_dic)
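After building the DataFrame, you can quickly check its size and column types before subsetting. The sketch below rebuilds the same data. Text columns usually print as object, although newer pandas versions may show a dedicated string dtype instead.

```python
import pandas as pd

stu_dic = {
  'Age':[14,13,13,14,14,12,12,15,13,12,11,14,12,15,16,12,15,11,15],
  'Height':[69,56.5,65.3,62.8,63.5,57.3,59.8,62.5,62.5,59,51.3,64.3,56.3,66.5,72,64.8,67,57.5,66.5],
  'Name':['Alfred','Alice','Barbara','Carol','Henry','James','Jane','Janet','Jeffrey','John','Joyce','Judy','Louise','Marry','Philip','Robert','Ronald','Thomas','Willam'],
  'Sex':['M','F','F','F','M','M','F','F','M','M','F','F','F','F','M','M','M','M','M'],
  'Weight':[112.5,84,98,102.5,102.5,83,84.5,112.5,84,99.5,50.5,90,77,112,150,128,133,85,112]
}
student = pd.DataFrame(stu_dic)

# shape is (number of rows, number of columns)
print(student.shape)
# column order follows the key order of the dictionary
print(list(student.columns))
# Age holds only integers (int64); Height mixes ints and floats, so it becomes float64
print(student.dtypes)
```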

View the first 5 or the last 5 rows of the data:

In [63]: student.head()
    ...:
Out[63]:
   Age  Height     Name Sex  Weight
0   14    69.0   Alfred   M   112.5
1   13    56.5    Alice   F    84.0
2   13    65.3  Barbara   F    98.0
3   14    62.8    Carol   F   102.5
4   14    63.5    Henry   M   102.5

In [64]: student.tail()
Out[64]:
    Age  Height    Name Sex  Weight
14   16    72.0  Philip   M   150.0
15   12    64.8  Robert   M   128.0
16   15    67.0  Ronald   M   133.0
17   11    57.5  Thomas   M    85.0
18   15    66.5  Willam   M   112.0
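head() and tail() return 5 rows by default. Both also accept a row count n. In this sketch, the variable names first3 and last2 are only for illustration.

```python
import pandas as pd

stu_dic = {
  'Age':[14,13,13,14,14,12,12,15,13,12,11,14,12,15,16,12,15,11,15],
  'Height':[69,56.5,65.3,62.8,63.5,57.3,59.8,62.5,62.5,59,51.3,64.3,56.3,66.5,72,64.8,67,57.5,66.5],
  'Name':['Alfred','Alice','Barbara','Carol','Henry','James','Jane','Janet','Jeffrey','John','Joyce','Judy','Louise','Marry','Philip','Robert','Ronald','Thomas','Willam'],
  'Sex':['M','F','F','F','M','M','F','F','M','M','F','F','F','F','M','M','M','M','M'],
  'Weight':[112.5,84,98,102.5,102.5,83,84.5,112.5,84,99.5,50.5,90,77,112,150,128,133,85,112]
}
student = pd.DataFrame(stu_dic)

first3 = student.head(3)  # first 3 rows instead of the default 5
last2 = student.tail(2)   # last 2 rows instead of the default 5
print(first3)
print(last2)
```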

Select specific rows by index label:

In [65]: print(student.loc[[0,2,4,5,7]])
   Age  Height     Name Sex  Weight
0   14    69.0   Alfred   M   112.5
2   13    65.3  Barbara   F    98.0
4   14    63.5    Henry   M   102.5
5   12    57.3    James   M    83.0
7   15    62.5    Janet   F   112.5
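With the default 0..18 index, loc labels and row positions happen to match. After filtering, however, loc (by label) and iloc (by position) return different rows. In the sketch below, the names girls, by_label and by_position are only for illustration.

```python
import pandas as pd

stu_dic = {
  'Age':[14,13,13,14,14,12,12,15,13,12,11,14,12,15,16,12,15,11,15],
  'Height':[69,56.5,65.3,62.8,63.5,57.3,59.8,62.5,62.5,59,51.3,64.3,56.3,66.5,72,64.8,67,57.5,66.5],
  'Name':['Alfred','Alice','Barbara','Carol','Henry','James','Jane','Janet','Jeffrey','John','Joyce','Judy','Louise','Marry','Philip','Robert','Ronald','Thomas','Willam'],
  'Sex':['M','F','F','F','M','M','F','F','M','M','F','F','F','F','M','M','M','M','M'],
  'Weight':[112.5,84,98,102.5,102.5,83,84.5,112.5,84,99.5,50.5,90,77,112,150,128,133,85,112]
}
student = pd.DataFrame(stu_dic)

# filtering keeps the original index labels: 1, 2, 3, 6, 7, 10, 11, 12, 13
girls = student[student['Sex'] == 'F']

by_label = girls.loc[[1, 3]]      # rows whose index LABEL is 1 and 3
by_position = girls.iloc[[1, 3]]  # 2nd and 4th rows by POSITION
print(by_label)
print(by_position)
```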

Select specific columns:

In [66]: print(student[['Name','Height','Weight']].head())
      Name  Height  Weight
0   Alfred    69.0   112.5
1    Alice    56.5    84.0
2  Barbara    65.3    98.0
3    Carol    62.8   102.5
4    Henry    63.5   102.5
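The double brackets in the previous query matter. A single column label returns a Series, while a list of labels (even a one-element list) returns a DataFrame. In this sketch, names and subset are only illustrative variable names.

```python
import pandas as pd

stu_dic = {
  'Age':[14,13,13,14,14,12,12,15,13,12,11,14,12,15,16,12,15,11,15],
  'Height':[69,56.5,65.3,62.8,63.5,57.3,59.8,62.5,62.5,59,51.3,64.3,56.3,66.5,72,64.8,67,57.5,66.5],
  'Name':['Alfred','Alice','Barbara','Carol','Henry','James','Jane','Janet','Jeffrey','John','Joyce','Judy','Louise','Marry','Philip','Robert','Ronald','Thomas','Willam'],
  'Sex':['M','F','F','F','M','M','F','F','M','M','F','F','F','F','M','M','M','M','M'],
  'Weight':[112.5,84,98,102.5,102.5,83,84.5,112.5,84,99.5,50.5,90,77,112,150,128,133,85,112]
}
student = pd.DataFrame(stu_dic)

names = student['Name']     # single label -> Series
subset = student[['Name']]  # list of labels -> DataFrame with one column
print(type(names).__name__, type(subset).__name__)
```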

The same columns can also be selected by label with loc, using : to keep all rows:

In [67]: print(student.loc[:,['Name','Height','Weight']].head())
      Name  Height  Weight
0   Alfred    69.0   112.5
1    Alice    56.5    84.0
2  Barbara    65.3    98.0
3    Carol    62.8   102.5
4    Henry    63.5   102.5
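loc can also select rows and columns in a single step. Note that a label slice with loc includes its end label, while a positional slice with iloc excludes its end position. In the sketch below, block and pos_block are only illustrative names.

```python
import pandas as pd

stu_dic = {
  'Age':[14,13,13,14,14,12,12,15,13,12,11,14,12,15,16,12,15,11,15],
  'Height':[69,56.5,65.3,62.8,63.5,57.3,59.8,62.5,62.5,59,51.3,64.3,56.3,66.5,72,64.8,67,57.5,66.5],
  'Name':['Alfred','Alice','Barbara','Carol','Henry','James','Jane','Janet','Jeffrey','John','Joyce','Judy','Louise','Marry','Philip','Robert','Ronald','Thomas','Willam'],
  'Sex':['M','F','F','F','M','M','F','F','M','M','F','F','F','F','M','M','M','M','M'],
  'Weight':[112.5,84,98,102.5,102.5,83,84.5,112.5,84,99.5,50.5,90,77,112,150,128,133,85,112]
}
student = pd.DataFrame(stu_dic)

# labels 2, 3 and 4: the end label of a loc slice is included
block = student.loc[2:4, ['Name', 'Age']]
# positions 2 and 3 only: the end of an iloc slice is excluded;
# column positions 2 and 0 are 'Name' and 'Age'
pos_block = student.iloc[2:4, [2, 0]]
print(block)
print(pos_block)
```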

Select all records for girls older than 12 (Age > 12):

In [68]: print(student[(student['Sex']=='F') & (student['Age']>12)])
    Age  Height     Name Sex  Weight
1    13    56.5    Alice   F    84.0
2    13    65.3  Barbara   F    98.0
3    14    62.8    Carol   F   102.5
7    15    62.5    Janet   F   112.5
11   14    64.3     Judy   F    90.0
13   15    66.5    Marry   F   112.0
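Each comparison in the boolean mask needs its own parentheses. In Python, & binds more tightly than == and >, so without them the expression is parsed differently and typically raises an error. DataFrame.query offers an equivalent string-based form. In this sketch, via_mask and via_query are only illustrative names.

```python
import pandas as pd

stu_dic = {
  'Age':[14,13,13,14,14,12,12,15,13,12,11,14,12,15,16,12,15,11,15],
  'Height':[69,56.5,65.3,62.8,63.5,57.3,59.8,62.5,62.5,59,51.3,64.3,56.3,66.5,72,64.8,67,57.5,66.5],
  'Name':['Alfred','Alice','Barbara','Carol','Henry','James','Jane','Janet','Jeffrey','John','Joyce','Judy','Louise','Marry','Philip','Robert','Ronald','Thomas','Willam'],
  'Sex':['M','F','F','F','M','M','F','F','M','M','F','F','F','F','M','M','M','M','M'],
  'Weight':[112.5,84,98,102.5,102.5,83,84.5,112.5,84,99.5,50.5,90,77,112,150,128,133,85,112]
}
student = pd.DataFrame(stu_dic)

# element-wise AND of two boolean Series; each comparison is parenthesized
via_mask = student[(student['Sex'] == 'F') & (student['Age'] > 12)]
# the same filter written as a query string
via_query = student.query("Sex == 'F' and Age > 12")
print(via_query)
```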

Select only the name, height and weight of girls older than 12:

In [69]: print(student[(student['Sex']=='F') & (student['Age']>12)][['Name','Height','Weight']])
       Name  Height  Weight
1     Alice    56.5    84.0
2   Barbara    65.3    98.0
3     Carol    62.8   102.5
7     Janet    62.5   112.5
11     Judy    64.3    90.0
13    Marry    66.5   112.0
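The query above chains two indexing steps: first the row filter, then the column list. A single .loc call with a row mask and a column list gives the same result. This form is generally preferred because chained indexing becomes unreliable when you assign values rather than read them. In this sketch, mask and result are only illustrative names.

```python
import pandas as pd

stu_dic = {
  'Age':[14,13,13,14,14,12,12,15,13,12,11,14,12,15,16,12,15,11,15],
  'Height':[69,56.5,65.3,62.8,63.5,57.3,59.8,62.5,62.5,59,51.3,64.3,56.3,66.5,72,64.8,67,57.5,66.5],
  'Name':['Alfred','Alice','Barbara','Carol','Henry','James','Jane','Janet','Jeffrey','John','Joyce','Judy','Louise','Marry','Philip','Robert','Ronald','Thomas','Willam'],
  'Sex':['M','F','F','F','M','M','F','F','M','M','F','F','F','F','M','M','M','M','M'],
  'Weight':[112.5,84,98,102.5,102.5,83,84.5,112.5,84,99.5,50.5,90,77,112,150,128,133,85,112]
}
student = pd.DataFrame(stu_dic)

mask = (student['Sex'] == 'F') & (student['Age'] > 12)
# rows and columns selected in one indexing step
result = student.loc[mask, ['Name', 'Height', 'Weight']]
print(result)
```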

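For a categorical (discrete) variable, describe() reports the number of observations, the number of unique values, the most frequent level (the mode) and how often it occurs: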

In [83]: print(student['Sex'].describe())
count     19
unique     2
top        M
freq      10
Name: Sex, dtype: object
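describe() shows only the most frequent level. value_counts() lists the frequency of every level, sorted in descending order, and mode() returns the most frequent level(s) directly. In this sketch, counts and top_level are only illustrative names.

```python
import pandas as pd

stu_dic = {
  'Age':[14,13,13,14,14,12,12,15,13,12,11,14,12,15,16,12,15,11,15],
  'Height':[69,56.5,65.3,62.8,63.5,57.3,59.8,62.5,62.5,59,51.3,64.3,56.3,66.5,72,64.8,67,57.5,66.5],
  'Name':['Alfred','Alice','Barbara','Carol','Henry','James','Jane','Janet','Jeffrey','John','Joyce','Judy','Louise','Marry','Philip','Robert','Ronald','Thomas','Willam'],
  'Sex':['M','F','F','F','M','M','F','F','M','M','F','F','F','F','M','M','M','M','M'],
  'Weight':[112.5,84,98,102.5,102.5,83,84.5,112.5,84,99.5,50.5,90,77,112,150,128,133,85,112]
}
student = pd.DataFrame(stu_dic)

counts = student['Sex'].value_counts()   # frequency of each level, most frequent first
top_level = student['Sex'].mode().iloc[0]  # mode() returns a Series; take its first entry
print(counts)
print(top_level)
```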