5.7.9. API-General functions
Constructor
Description:
One-dimensional ndarray with axis labels (including time series).
Signature:
Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)
Parameters:
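data: array-like, Iterable, dict, or scalar value. The values to store. A scalar is repeated once for every label in index.
index: array-like or Index (1d). The labels for the values. They must be hashable, and when data is array-like they must match its length. Defaults to RangeIndex (0, 1, 2, ..., n-1). If data is a dict and index is None, the dict keys become the index. If index is given, the values are looked up by those labels (see the second example below).
dtype: str, numpy.dtype, or ExtensionDtype, optional. Data type of the result. Inferred from data if not given.
name: str, optional. The name of the Series.
copy: bool, default False. Whether to copy the input data.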
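A minimal sketch of how these parameters interact, assuming pandas is imported as `pd`. The labels, values and the name `demo` are made up for illustration.

```python
import pandas as pd

# A scalar `data` value is repeated once for every label in `index`;
# `dtype` and `name` are applied to the resulting Series.
s = pd.Series(5, index=["x", "y", "z"], dtype="float64", name="demo")
print(s)

# A plain list with no `index` gets a default RangeIndex (0, 1, 2),
# and the dtype is inferred from the values.
t = pd.Series([10, 20, 30])
print(t.index)
print(t.dtype)
```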
Example: dict data with an index that matches its keys:
>>> d = {'a': 1, 'b': 2, 'c': 3}
>>> ser = pd.Series(data=d, index=['a', 'b', 'c'])
>>> ser
a 1
b 2
c 3
dtype: int64
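Example: dict data with an index that shares no labels with its keys. The values are looked up by the index labels, so every value is NaN and the dtype becomes float64: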
>>> d = {'a': 1, 'b': 2, 'c': 3}
>>> ser = pd.Series(data=d, index=['x', 'y', 'z'])
>>> ser
x NaN
y NaN
z NaN
dtype: float64
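The two examples above are the extremes (full overlap and no overlap between the dict keys and index). A sketch of the in-between case, reusing the same dict: only the labels listed in index are kept, in that order, and any label missing from the dict becomes NaN.

```python
import pandas as pd

d = {"a": 1, "b": 2, "c": 3}

# 'a' and 'c' are dict keys; 'z' is not, so it becomes NaN.
# 'b' is not listed in index, so it is dropped.
ser = pd.Series(data=d, index=["a", "c", "z"])
print(ser)

# The NaN forces an upcast from int64 to float64.
print(ser.dtype)
```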
pivot_table
Syntax:
pivot_table(data, values=None,
            index=None,
            columns=None,
            aggfunc='mean',
            fill_value=None,
            margins=False,
            dropna=True,
            margins_name='All')
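data: the DataFrame to summarize
values: the column(s) to aggregate
index: the column(s) whose values become the row labels
columns: the discrete grouping column(s) whose values become the column labels
aggfunc: the aggregation function, or a list of functions (default 'mean')
fill_value: a constant that replaces missing values in the result; by default nothing is replaced
margins: whether to add row and column totals; off by default
dropna: if True (the default), columns whose entries are all missing are dropped
margins_name: the label of the totals row and column when margins=True; 'All' by default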
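A minimal sketch of fill_value, margins and margins_name working together. The DataFrame `df` is hypothetical toy data, not the student table used below. The (F, 12) combination never occurs, so its cell is filled with 0; the totals are still computed from the raw rows (F's total is the mean of its two heights, 51.5, not the mean of 51.5 and the filled 0).

```python
import pandas as pd

# Hypothetical toy data (not the student table).
df = pd.DataFrame({
    "Sex": ["F", "F", "M", "M"],
    "Age": [11, 11, 11, 12],
    "Height": [50.0, 53.0, 57.5, 61.0],
})

table = pd.pivot_table(
    df,
    values="Height",
    index="Sex",
    columns="Age",
    aggfunc="mean",
    fill_value=0,          # the empty (F, 12) cell becomes 0 instead of NaN
    margins=True,          # add a totals row and a totals column
    margins_name="Total",  # label for those totals instead of 'All'
)
print(table)
```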
Examples
As before, we use the student table to see how pivot_table works.
Summarize one numeric variable (Height) by one grouping variable (Sex):
In [120]: Table1 = pd.pivot_table(student, values=['Height'], columns=['Sex'])
...: print(Table1)
Sex F M
Height 60.588889 63.91
Summarize two numeric variables (Height, Weight) by one grouping variable (Sex):
In [129]: Table2 = pd.pivot_table(student, values=['Height','Weight'], columns=['Sex'])
...: print(Table2)
Sex F M
Height 60.588889 63.91
Weight 90.111111 108.95
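With columns but no index, pivot_table amounts to a group-by, an aggregation (the default aggfunc='mean'), and a transpose. A sketch that checks this on a small made-up table; `toy` is a hypothetical stand-in, not the student data, so its numbers differ from the output above.

```python
import pandas as pd

# Hypothetical stand-in for the student table (made-up values).
toy = pd.DataFrame({
    "Sex": ["F", "F", "M", "M", "M"],
    "Height": [56.0, 60.0, 62.0, 64.0, 69.0],
    "Weight": [80.0, 90.0, 100.0, 110.0, 120.0],
})

pivot = pd.pivot_table(toy, values=["Height", "Weight"], columns=["Sex"])

# The same numbers via an explicit group-by: mean per Sex, then transpose so
# the value columns become rows and the Sex groups become columns.
manual = toy.groupby("Sex")[["Height", "Weight"]].mean().T

print(pivot)
print(manual)
```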
Summarize two numeric variables (Height, Weight) by two grouping variables (Sex, Age):
In [130]: Table3 = pd.pivot_table(student, values=['Height','Weight'], columns=['Sex','Age'])
...: print(Table3)
Sex Age
Height F 11 51.300000
12 58.050000
13 60.900000
14 63.550000
15 64.500000
M 11 57.500000
12 60.366667
13 62.500000
14 66.250000
15 66.750000
16 72.000000
Weight F 11 50.500000
12 80.750000
13 91.000000
14 96.250000
15 112.250000
M 11 85.000000
12 103.500000
13 84.000000
14 107.500000
15 122.500000
16 150.000000
dtype: float64
To turn this into a contingency-table layout, unstack the result:
In [131]: Table4 = pd.pivot_table(student,
...:     values=['Height','Weight'], columns=['Sex','Age']).unstack()
...: print(Table4)
Age 11 12 13 14 15 16
Sex
Height F 51.3 58.050000 60.9 63.55 64.50 NaN
M 57.5 60.366667 62.5 66.25 66.75 72.0
Weight F 50.5 80.750000 91.0 96.25 112.25 NaN
M 85.0 103.500000 84.0 107.50 122.50 150.0
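The shape pandas returns for a pivot with only columns (a Series versus a DataFrame) has changed across pandas versions, so the unstack() step above may not produce the same layout everywhere. A sketch of an alternative that does not depend on that default: name the row grouping explicitly with index and the column grouping with columns. `toy` is hypothetical data, not the student table.

```python
import pandas as pd

# Hypothetical toy data (not the student table).
toy = pd.DataFrame({
    "Sex": ["F", "F", "M", "M", "M"],
    "Age": [11, 12, 11, 12, 16],
    "Height": [51.0, 58.0, 57.0, 60.0, 72.0],
})

# Rows = Sex, columns = Age: a two-way table with no unstack step.
# There is no (F, 16) row in the data, so that cell is NaN.
two_way = pd.pivot_table(toy, values="Height", index="Sex", columns="Age")
print(two_way)
```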
Using multiple aggregation functions:
In [133]: Table5 = pd.pivot_table(student, values=['Height','Weight'], columns=['Sex'],
...:     aggfunc=['mean', 'median', 'std'])
...: print(Table5)
mean median std
Sex F M F M F M
Height 60.588889 63.91 62.5 64.15 5.018328 4.937937
Weight 90.111111 108.95 90.0 107.25 19.383914 22.727186
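A list applies every function to every value column, as above. When each column needs a different statistic, aggfunc also accepts a dict keyed by column name. A sketch on made-up data (`toy` is hypothetical), using index rather than columns so the layout is predictable:

```python
import pandas as pd

# Hypothetical toy data (not the student table).
toy = pd.DataFrame({
    "Sex": ["F", "F", "M", "M"],
    "Height": [56.0, 60.0, 62.0, 68.0],
    "Weight": [80.0, 90.0, 100.0, 120.0],
})

# Each value column gets its own aggregation: mean height, maximum weight.
per_column = pd.pivot_table(
    toy,
    values=["Height", "Weight"],
    index="Sex",
    aggfunc={"Height": "mean", "Weight": "max"},
)
print(per_column)
```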