5.7.9. API-General functions

Constructor

Overview:

One-dimensional ndarray with axis labels (including time series).

Signature:

Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

Parameters:

data: array-like, Iterable, dict, or scalar value
index: array-like or Index (1d)
dtype: str, numpy.dtype, or ExtensionDtype, optional
name: str, optional
copy: bool, default False
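The parameters listed above can be combined in a single call. The following is a minimal sketch, assuming pandas is imported as pd; the variable name scores and the label 'score' are illustrative only.

```python
import pandas as pd

# data is a plain list, index supplies the labels, dtype forces a float
# representation, and name labels the resulting Series.
scores = pd.Series(data=[1, 2, 3], index=['a', 'b', 'c'],
                   dtype='float64', name='score')

print(scores)
print(scores.name, scores.dtype)
```

Setting name is useful when the Series later becomes a DataFrame column, because the name is used as the column label.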

Example:

>>> d = {'a': 1, 'b': 2, 'c': 3}
>>> ser = pd.Series(data=d, index=['a', 'b', 'c'])
>>> ser
a   1
b   2
c   3
dtype: int64

Example (index labels that do not match the dict keys):

>>> d = {'a': 1, 'b': 2, 'c': 3}
>>> ser = pd.Series(data=d, index=['x', 'y', 'z'])
>>> ser
x   NaN
y   NaN
z   NaN
dtype: float64
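The two examples above differ only in the index labels. The following minimal sketch puts them side by side, assuming pandas is imported as pd; the names matched and unmatched are illustrative.

```python
import pandas as pd

d = {'a': 1, 'b': 2, 'c': 3}

# Index labels that match the dict keys keep the original values.
matched = pd.Series(data=d, index=['a', 'b', 'c'])

# The Series is first built from the dict keys and then reindexed to the
# requested labels, so labels absent from the dict become NaN (float64).
unmatched = pd.Series(data=d, index=['x', 'y', 'z'])

print(matched.tolist())
print(unmatched.isna().all())
```

When the intent is to relabel the values rather than to select by key, one option is to pass list(d.values()) as data so that no key alignment takes place.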

pivot_table

Syntax:

pivot_table(data,values=None,
            index=None,
            columns=None,
            aggfunc='mean',
            fill_value=None,
            margins=False,
            dropna=True,
            margins_name='All')
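
Parameters:

data: the DataFrame to pivot
values: the column(s) to aggregate
index: column(s) of the original data to use as the row index
columns: discrete grouping column(s) to use as the column index
aggfunc: the aggregation function(s) to apply
fill_value: a constant used to replace missing values; by default nothing is replaced
margins: whether to add row and column totals; by default no totals are added
dropna: whether to drop columns whose entries are all missing; they are dropped by default
margins_name: the label of the total row or column, 'All' by default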
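The effect of fill_value and margins is easiest to see on a tiny table. The following is a minimal sketch on a hypothetical DataFrame (df and its values are invented for illustration and are not the student table used later).

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
df = pd.DataFrame({
    'Sex':    ['F', 'F', 'M', 'M'],
    'Age':    [11, 12, 11, 11],
    'Height': [50.0, 56.0, 58.0, 60.0],
})

table = pd.pivot_table(
    df,
    values='Height',
    index='Sex',
    columns='Age',
    aggfunc='mean',
    fill_value=0,        # the empty (M, 12) cell is shown as 0 instead of NaN
    margins=True,        # add a total row and a total column
    margins_name='All',  # label used for the totals
)
print(table)
```

The totals are computed from the underlying rows rather than from the cell means, so the 'All' entry for F is the mean of the two F heights.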

Examples

Continuing with the student table as the example, here is how the pivot_table function is used.

Summarize one numeric variable (Height) by one grouping variable (Sex):
In [120]: Table1 = pd.pivot_table(student, values=['Height'], columns=['Sex'])
     ...: print(Table1)
Sex             F      M
Height  60.588889  63.91

Summarize two numeric variables (Height, Weight) by one grouping variable (Sex):

In [129]: Table2 = pd.pivot_table(student, values=['Height','Weight'], columns=['Sex'])
     ...: print(Table2)
Sex             F       M
Height  60.588889   63.91
Weight  90.111111  108.95
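The student table itself is not reproduced in this section, so the calls above cannot be run as they stand. The following minimal sketch uses a hypothetical stand-in with invented values to show the same call shape, and cross-checks it against an equivalent groupby (the name by_group is illustrative).

```python
import pandas as pd

# Hypothetical stand-in for the student table; the values are invented.
student = pd.DataFrame({
    'Sex':    ['F', 'F', 'M', 'M'],
    'Height': [60.0, 62.0, 64.0, 70.0],
    'Weight': [90.0, 100.0, 110.0, 130.0],
})

# No index= is given, so the value columns become the rows of the result
# and the Sex categories become its columns.
table = pd.pivot_table(student, values=['Height', 'Weight'], columns=['Sex'])
print(table)

# The same per-group means, obtained with groupby and a transpose.
by_group = student.groupby('Sex')[['Height', 'Weight']].mean().T
print(by_group)
```

Since aggfunc defaults to 'mean', each cell is the mean of that variable within one Sex group.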

Summarize two numeric variables (Height, Weight) by two grouping variables (Sex, Age):

In [130]: Table3 = pd.pivot_table(student, values=['Height','Weight'], columns=['Sex','Age'])
     ...: print(Table3)
        Sex  Age
Height  F    11      51.300000
             12      58.050000
             13      60.900000
             14      63.550000
             15      64.500000
        M    11      57.500000
             12      60.366667
             13      62.500000
             14      66.250000
             15      66.750000
             16      72.000000
Weight  F    11      50.500000
             12      80.750000
             13      91.000000
             14      96.250000
             15     112.250000
        M    11      85.000000
             12     103.500000
             13      84.000000
             14     107.500000
             15     122.500000
             16     150.000000
dtype: float64

To get a contingency-table layout, simply unstack the result:

In [131]: Table4 = pd.pivot_table(student, values=['Height','Weight'],
     ...:                         columns=['Sex','Age']).unstack()
     ...: print(Table4)
Age           11          12    13      14      15     16
       Sex
Height F    51.3   58.050000  60.9   63.55   64.50    NaN
       M    57.5   60.366667  62.5   66.25   66.75   72.0
Weight F    50.5   80.750000  91.0   96.25  112.25    NaN
       M    85.0  103.500000  84.0  107.50  122.50  150.0
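The shape returned when only columns= is passed may differ between pandas versions, so the unstack step above may not always yield this layout. The following minimal sketch reaches the same arrangement (variable and Sex on the rows, Age on the columns) by a different route, passing index= explicitly and transposing; the student data here is a hypothetical stand-in with invented values.

```python
import pandas as pd

# Hypothetical stand-in for the student table; the values are invented.
student = pd.DataFrame({
    'Sex':    ['F', 'F', 'M', 'M', 'M', 'M'],
    'Age':    [11, 12, 11, 12, 12, 16],
    'Height': [50.0, 56.0, 58.0, 60.0, 62.0, 72.0],
    'Weight': [50.0, 80.0, 85.0, 100.0, 104.0, 150.0],
})

# Rows are Age and columns are (variable, Sex); transposing puts
# (variable, Sex) on the rows and Age on the columns.
table = pd.pivot_table(student, values=['Height', 'Weight'],
                       index='Age', columns='Sex').T
print(table)
```

Combinations with no observations, such as a 16-year-old F in this invented data, appear as NaN, as in the table above.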

Using multiple aggregation functions:

In [133]: Table5 = pd.pivot_table(student, values=['Height','Weight'], columns=['Sex'],
     ...:                         aggfunc=[np.mean,np.median,np.std])
     ...: print(Table5)
             mean         median                std
Sex             F       M      F       M          F          M
Height  60.588889   63.91   62.5   64.15   5.018328   4.937937
Weight  90.111111  108.95   90.0  107.25  19.383914  22.727186
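A list passed to aggfunc produces one block of columns per function. The following minimal sketch shows this on a hypothetical stand-in for the student table (invented values); string names are used here in place of the NumPy functions so that the sketch needs only pandas.

```python
import pandas as pd

# Hypothetical stand-in for the student table; the values are invented.
student = pd.DataFrame({
    'Sex':    ['F', 'F', 'M', 'M', 'M'],
    'Height': [60.0, 62.0, 64.0, 67.0, 70.0],
    'Weight': [90.0, 100.0, 110.0, 120.0, 130.0],
})

# Each function name becomes the top level of the column MultiIndex,
# with the Sex categories underneath it.
table = pd.pivot_table(student, values=['Height', 'Weight'], columns=['Sex'],
                       aggfunc=['mean', 'median', 'std'])
print(table)

# A single cell is addressed by row label and (function, Sex) column pair.
print(table.loc['Height', ('mean', 'M')])
```

The std column uses the sample standard deviation (ddof=1), which is the pandas default.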
