Reading Notes: pandas, Chapter 1, the Series Object

https://github.com/jakevdp/PythonDataScienceHandbook

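Pandas has three fundamental data structures: Series, DataFrame, and Index.

A NumPy array accesses its values through an implicitly defined integer index, whereas a Pandas Series associates its values with an explicitly defined index.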
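The contrast between the two kinds of index can be sketched as follows. This is a minimal illustration, not taken from the book; the variable names `values`, `arr`, and `ser` are chosen here for the example only.

```python
import numpy as np
import pandas as pd

values = [0.25, 0.5, 0.75, 1.0]

# A NumPy array is addressed only by its implicit integer position.
arr = np.array(values)
second_from_array = arr[1]

# A Series carries an explicit index; here the labels are strings.
ser = pd.Series(values, index=['a', 'b', 'c', 'd'])
second_from_series = ser['b']

# Both lookups reach the same stored value.
print(second_from_array, second_from_series)
print(ser.index)
```

The same value is reached either way; the difference is that the Series lookup goes through a label that was defined explicitly.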

Imports

import numpy as np
import pandas as pd

The Series object

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])
data

a 0.25
b 0.50
c 0.75
d 1.00
dtype: float64

data.values

>>> array([ 0.25, 0.5 , 0.75, 1. ])
data.index

>>> Index(['a', 'b', 'c', 'd'], dtype='object')
data[1:3]

>>> b 0.50
c 0.75
dtype: float64

data['b']

>>> 0.5
'a' in data

>>> True
data.keys()

>>> Index(['a', 'b', 'c', 'd'], dtype='object')
list(data.items())

>>> [('a', 0.25), ('b', 0.5), ('c', 0.75), ('d', 1.0)]
data['e'] = 1.25
data

a 0.25
b 0.50
c 0.75
d 1.00
e 1.25
dtype: float64
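Because this Series has string labels, a slice such as data[1:3] is positional while data['b'] is label-based. The sketch below makes the two modes explicit with the `.loc` and `.iloc` accessors, which the notes above do not use; it is offered as an illustration and rebuilds `data` without the 'e' entry.

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

# Label-based slice: both endpoints are included.
by_label = data.loc['b':'c']

# Position-based slice: the stop position is excluded, as in NumPy.
by_position = data.iloc[1:3]

print(by_label)
print(by_position)
```

Both slices select the rows labelled 'b' and 'c'. Spelling out the accessor avoids ambiguity, which matters most when the index itself consists of integers.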

Series from a dictionary

population_dict = {'California': 38332521,
                   'Texas': 26448193,
                   'New York': 19651127,
                   'Florida': 19552860,
                   'Illinois': 12882135}
population = pd.Series(population_dict)
population

California 38332521
Florida 19552860
Illinois 12882135
New York 19651127
Texas 26448193
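dtype: int64

Note: the alphabetical order shown here comes from older pandas releases, which sorted dictionary keys. pandas 0.23 and later, running on Python 3.6 or newer, keep the dictionary's insertion order instead, and label slices such as the one later in this section follow that order.

population['California']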

>>> 38332521
population['California':'Illinois']

>>> California 38332521
Florida 19552860
Illinois 12882135
dtype: int64
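The three-row result above depends on the index being in alphabetical order. As a hedged sketch, sorting the index explicitly with `sort_index()` (a step not present in the original notes) gives the same slice whether or not the installed pandas version sorts dictionary keys.

```python
import pandas as pd

population_dict = {'California': 38332521,
                   'Texas': 26448193,
                   'New York': 19651127,
                   'Florida': 19552860,
                   'Illinois': 12882135}

# Sorting the index makes the label order explicit,
# however the pandas version orders dictionary keys.
population = pd.Series(population_dict).sort_index()

# A label slice includes both endpoints.
subset = population['California':'Illinois']
print(subset)
```

Unlike a plain dictionary, the Series supports this array-style slicing on its keys.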

Specifying the index explicitly

pd.Series([2, 4, 6])

>>> 0 2
1 4
2 6
dtype: int64

pd.Series({2:'a', 1:'b', 3:'c'}, index=[3, 2])

>>> 3 c
2 a
dtype: object
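When a dictionary is passed together with an index, the Series keeps only the requested keys, in the requested order. The sketch below repeats that case and adds one assumption of its own: a label (5) that is absent from the dictionary, to show that it is filled with NaN rather than raising an error. The names `source`, `selected`, and `with_missing` are illustrative.

```python
import pandas as pd

source = {2: 'a', 1: 'b', 3: 'c'}

# Only the requested keys are kept, in the requested order.
selected = pd.Series(source, index=[3, 2])

# A requested label that is missing from the dict is filled with NaN.
with_missing = pd.Series(source, index=[3, 2, 5])

print(selected)
print(with_missing)
```

Key 1 is dropped in both cases because it is not listed in the index argument.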