A first look at the new features in pandas 1.0
!pip3 install pandas==1.0.0rc0
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: pandas==1.0.0rc0 in /usr/local/lib/python3.7/site-packages (1.0.0rc0)
Requirement already satisfied: python-dateutil>=2.6.1 in /usr/local/lib/python3.7/site-packages (from pandas==1.0.0rc0) (2.8.0)
Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib/python3.7/site-packages (from pandas==1.0.0rc0) (1.17.3)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.7/site-packages (from pandas==1.0.0rc0) (2019.3)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/site-packages (from python-dateutil>=2.6.1->pandas==1.0.0rc0) (1.12.0)
import pandas as pd
pd.__version__
'1.0.0rc0'
df.info()
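In the new release, calling df.info() on a DataFrame gives a more detailed summary: every column gets a line number, and the non-null count and dtype appear under their own headers.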
import pandas as pd
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ["goodbye", "cruel", "world"],
    'C': [False, True, False]})
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       3 non-null      int64
 1   B       3 non-null      object
 2   C       3 non-null      bool
dtypes: bool(1), int64(1), object(1)
memory usage: 179.0+ bytes
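To see what the Non-Null Count column is good for, here is a minimal sketch with a few missing values. The data and the name df_missing are made up for illustration, and the summary is captured through the buf parameter of df.info() so it can be handled as a string.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data: one missing value in B, two in C
df_missing = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ["goodbye", None, "world"],
    'C': [np.nan, 1.5, np.nan]})

# df.info() prints to stdout by default; buf redirects it into a text buffer
buffer = io.StringIO()
df_missing.info(buf=buffer)
summary = buffer.getvalue()
print(summary)
```

In the printed summary, B should report 2 non-null and C 1 non-null, which makes the gaps easy to spot at a glance.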
df.to_markdown()
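It prints the DataFrame as a Markdown table, and it is my favorite feature. WeChat official accounts cannot render pandas output, so until now I took a screenshot and pasted it in every time.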
print(df.to_markdown())
|    |   A | B       | C     |
|---:|----:|:--------|:------|
|  0 |   1 | goodbye | False |
|  1 |   2 | cruel   | True  |
|  2 |   3 | world   | False |
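One thing to watch out for: to_markdown() relies on the optional tabulate package, so it fails with an ImportError if tabulate is missing. A minimal sketch of guarding against that, where the name table and the printed hint are just for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ["goodbye", "cruel", "world"],
    'C': [False, True, False]})

try:
    # to_markdown() hands the actual formatting to the optional tabulate package
    table = df.to_markdown()
except ImportError:
    table = None
    print("tabulate is not installed; try: pip install tabulate")
else:
    # The result is an ordinary string, ready to paste into a Markdown editor
    print(table)
```

Since the result is a plain string, it can also be written to a file or copied to the clipboard instead of printed.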
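New boolean and string data types
Until now, text in a DataFrame was stored as object, meaning arbitrary Python objects, and a boolean column could not hold missing values without also falling back to object. The new version adds two dedicated types: string (for text) and boolean (a Boolean type that can hold missing values).
This change is still experimental, so be careful when relying on the API. Even so, pandas recommends declaring these types explicitly; it plans to improve this area in future releases, possibly with more powerful regex matching.
By default, pandas keeps using object for text; you get the new types only when you declare a column's dtype as string or boolean. Note that dtype="bool", used for column C in the example that follows, is the long-standing NumPy boolean type rather than the new nullable boolean.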
import pandas as pd
B = pd.Series(["goodbye", "cruel", "world"], dtype="string")
C = pd.Series([False, True, False], dtype="bool")
df = pd.DataFrame({'B':B, 'C':C})
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   B       3 non-null      string
 1   C       3 non-null      bool
dtypes: bool(1), string(1)
memory usage: 155.0 bytes
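The new types differ most visibly once values are missing. The sketch below uses made-up data (the names text, flags and upper are only for illustration) and declares both dtypes explicitly; note that the nullable type is spelled "boolean", not "bool".

```python
import pandas as pd

# Hypothetical data with one missing entry in each Series
text = pd.Series(["goodbye", None, "world"], dtype="string")
flags = pd.Series([True, None, False], dtype="boolean")

# Both Series keep their declared dtype despite the missing value
print(text.dtype, flags.dtype)

# Missing entries are counted the usual way
print(text.isna().sum(), flags.isna().sum())

# String methods keep the string dtype and pass the missing value through
upper = text.str.upper()
print(upper)
```

With plain object or NumPy bool columns, the same missing values would either be mixed in as None/NaN objects or force the whole column to object.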
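Selecting columns by data type
For me, the most useful part of the pandas 1.0 update is being able to filter columns by these data types. select_dtypes itself is older than 1.0; what is new is that columns declared as string can be selected on their own rather than together with every other object column.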
df.select_dtypes("string")
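A slightly fuller sketch of the same idea, assuming a frame with one integer, one string and one bool column. The name df_mixed and the variable names are made up for illustration; include and exclude are the regular parameters of select_dtypes.

```python
import pandas as pd

df_mixed = pd.DataFrame({
    'A': [1, 2, 3],
    'B': pd.Series(["goodbye", "cruel", "world"], dtype="string"),
    'C': [False, True, False]})

# Keep only the columns declared as string
text_cols = df_mixed.select_dtypes("string")

# Keep only numeric columns (bool does not count as a number here)
numeric_cols = df_mixed.select_dtypes(include="number")

# Keep everything except the string columns
non_text = df_mixed.select_dtypes(exclude="string")

print(list(text_cols.columns))
print(list(numeric_cols.columns))
print(list(non_text.columns))
```

This is handy for applying text cleaning only to text columns and numeric transforms only to numeric ones.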