.. _pandas_example_subset:

实例-subset
###########

导入一个student数据集::

    import pandas as pd

    stu_dic = {
      'Age':[14,13,13,14,14,12,12,15,13,12,11,14,12,15,16,12,15,11,15],
      'Height':[69,56.5,65.3,62.8,63.5,57.3,59.8,62.5,62.5,59,51.3,64.3,56.3,66.5,72,64.8,67,57.5,66.5],
      'Name':['Alfred','Alice','Barbara','Carol','Henry','James','Jane','Janet','Jeffrey','John','Joyce','Judy','Louise','Marry','Philip','Robert','Ronald','Thomas','Willam'],
      'Sex':['M','F','F','F','M','M','F','F','M','M','F','F','F','F','M','M','M','M','M'],
      'Weight':[112.5,84,98,102.5,102.5,83,84.5,112.5,84,99.5,50.5,90,77,112,150,128,133,85,112]
    }
    student = pd.DataFrame(stu_dic)

查询数据的前5行或末尾5行::

    In [63]: student.head()
        ...:
    Out[63]:
       Age  Height     Name Sex  Weight
    0   14    69.0   Alfred   M   112.5
    1   13    56.5    Alice   F    84.0
    2   13    65.3  Barbara   F    98.0
    3   14    62.8    Carol   F   102.5
    4   14    63.5    Henry   M   102.5

    In [64]: student.tail()
    Out[64]:
        Age  Height    Name Sex  Weight
    14   16    72.0  Philip   M   150.0
    15   12    64.8  Robert   M   128.0
    16   15    67.0  Ronald   M   133.0
    17   11    57.5  Thomas   M    85.0
    18   15    66.5  Willam   M   112.0

查询指定的行::

    In [65]: print(student.loc[[0,2,4,5,7]])
       Age  Height     Name Sex  Weight
    0   14    69.0   Alfred   M   112.5
    2   13    65.3  Barbara   F    98.0
    4   14    63.5    Henry   M   102.5
    5   12    57.3    James   M    83.0
    7   15    62.5    Janet   F   112.5

查询指定的列::

    In [66]: print(student[['Name','Height','Weight']].head())
          Name  Height  Weight
    0   Alfred    69.0   112.5
    1    Alice    56.5    84.0
    2  Barbara    65.3    98.0
    3    Carol    62.8   102.5
    4    Henry    63.5   102.5

也可以通过loc索引标签查询指定的列::

    In [67]: print(student.loc[:,['Name','Height','Weight']].head())
          Name  Height  Weight
    0   Alfred    69.0   112.5
    1    Alice    56.5    84.0
    2  Barbara    65.3    98.0
    3    Carol    62.8   102.5
    4    Henry    63.5   102.5

查询出所有12岁以上的女生信息::

    In [68]: print(student[(student['Sex']=='F') & (student['Age']>12)])
        Age  Height     Name Sex  Weight
    1    13    56.5    Alice   F    84.0
    2    13    65.3  Barbara   F    98.0
    3    14    62.8    Carol   F   102.5
    7    15    62.5    Janet   F   112.5
    11   14    64.3     Judy   F    90.0
    13   15    66.5    Marry   F   112.0

查询出所有12岁以上的女生姓名、身高和体重::

    In [69]: print(student[(student['Sex']=='F') & (student['Age']>12)][['Name','Height','Weight']])
           Name  Height  Weight
    1     Alice    56.5    84.0
    2   Barbara    65.3    98.0
    3     Carol    62.8   102.5
    7     Janet    62.5   112.5
    11     Judy    64.3    90.0
    13    Marry    66.5   112.0

统计离散变量的观测数、唯一值个数、众数水平及个数::

    In [83]: print(student['Sex'].describe())
    count     19
    unique     2
    top        M
    freq      10
    Name: Sex, dtype: object