Pandas

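Pandas: provides data types that make analysis convenient, plus a wide range of functions for data analysis. It is imported with import pandas as pd.

Pandas is built on top of NumPy and is commonly used together with NumPy and Matplotlib. It provides two data types: Series (one-dimensional labeled data) and DataFrame (two-dimensional labeled data, extendable to higher dimensions).

ndarray expresses the structure of data (its dimensions). The pandas types extend it to express the application-level relationship between data and index. They are index-based data structures: operations on the data are carried out through operations on the index.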
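A minimal sketch of the "ndarray plus index" idea described above: the values of a Series live in a NumPy array, and selection goes through the index labels. The names s, values, labels and doubled are illustrative only.

```python
import numpy as np
import pandas as pd

# A Series pairs a NumPy array of values with an Index of labels
s = pd.Series(np.array([10, 20, 30]), index=["x", "y", "z"])

values = s.to_numpy()   # the underlying data as a plain ndarray
labels = s.index        # the labels, stored as a pandas Index

print(type(values).__name__)  # ndarray
print(list(labels))           # ['x', 'y', 'z']

# Operating on the data goes through the index: select by label, not by position
print(s["y"])                 # 20

# NumPy-style vectorized arithmetic keeps the labels attached to the values
doubled = s * 2
print(doubled["z"])           # 60
```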

Series

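Series type: consists of a set of data together with the data index associated with it. The index can be generated automatically (0, 1, 2, ...) or customized.

Creating a Series: there are several ways, from a Python list, a scalar, a dict, or an ndarray.

Basic operations: a.index, a.values, a["a"] (by label), a[0] (by position; recent pandas versions deprecate this positional use of [] on a labeled index in favor of a.iloc[0]), slicing a[1:], and "c" in a to test whether a label is in the index.

Alignment: in operations between two or more Series, values are matched by identical index labels, not by position.

Modification: an assignment such as a["a"]=9 changes the Series in place and takes effect immediately.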
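Alignment and in-place modification are easiest to see on two small Series whose labels only partly overlap. This is a sketch with made-up data; only the order-independent results are shown as expected output.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["a", "b", "c"])
b = pd.Series([10, 20, 30], index=["c", "b", "d"])

# Arithmetic aligns on index labels, not on positions:
# "b" -> 2 + 20, "c" -> 3 + 10; labels present in only one operand ("a", "d") give NaN
total = a + b
print(total["b"], total["c"])   # 22.0 13.0

# Assigning by label modifies the Series in place and is visible immediately
a["a"] = 9
print(a.iloc[0])                # 9  (.iloc selects by position)
```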

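import pandas as pd
import numpy as np

# Create from a Python list
a = pd.Series([1,2,3,4], index=["a","b","c","d"])
print(a)

# Create from a scalar: the index is needed so the scalar is repeated once per label
# (without it you only get a one-element Series)
b = pd.Series(2, index=["a","b","c","d"])
print(b)

# Create from a dict: the keys become the index
c = pd.Series({"a":1,"b":2})
print(c)
d = pd.Series({"a":1,"b":2}, index=["b","a","c"])# the index picks and orders the values; "c" has no value, so NaN
print(d)

# Create from an ndarray
e = pd.Series(np.arange(5))# dtype is int32 on Windows with NumPy < 2, int64 elsewhere
print(e)
f = pd.Series(np.arange(5), index=np.arange(9,4,-1))# values and index both from ndarrays
print(f)

# Basic operations: reading the index and the values
a = pd.Series([1,2,3,4], ["a","b","c","d"])
print(a.index)
print(a.values)
print(a["b"])# labels and positions can each be used on their own, but not mixed in one selection
print(a[1:3])
"c" in a  # → True: membership tests the index labels (wrap in print() in a script)
print(a.get("f",100))# "f" is not in the index, so the default 100 is returned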

Output:

a    1
b    2
c    3
d    4
dtype: int64

a    2
b    2
c    2
d    2
dtype: int64

a    1
b    2
dtype: int64

b    2.0
a    1.0
c    NaN
dtype: float64

0    0
1    1
2    2
3    3
4    4
dtype: int32

9    0
8    1
7    2
6    3
5    4
dtype: int32

Index(['a', 'b', 'c', 'd'], dtype='object')

[1 2 3 4]

2

b    2
c    3
dtype: int64

100
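Because label indexing and position indexing are easy to confuse, pandas offers .loc (labels only) and .iloc (positions only) to make the choice explicit. A short sketch on the same Series a as above; note that label slices include the end label while position slices exclude it.

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])

# Label-based: .loc uses the custom index; a label slice INCLUDES the end label
print(a.loc["b"])                # 2
print(a.loc["b":"c"].tolist())   # [2, 3]

# Position-based: .iloc uses 0-based positions; a position slice EXCLUDES the end
print(a.iloc[1])                 # 2
print(a.iloc[1:3].tolist())      # [2, 3]

# Membership and .get both look at the index labels, not the values
print("c" in a)                  # True
print(2 in a)                    # False: 2 is a value, not a label
print(a.get("f", 100))           # 100
```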

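DataFrame type

A DataFrame is a table of several columns that share the same row index. It has an index (row labels, axis 0) and columns (column labels, axis 1); when they are not given, both are numbered automatically starting from 0. Ways to create one: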

import pandas as pd
import numpy as np

# Create from a 2-D ndarray
a = pd.DataFrame(np.arange(10).reshape(2,5))
print(a)

# Create from a dict of Series: the row index is the union of the Series indexes
b = {"one":pd.Series([1,2,3],index=["a","b","c"]),
     "two":pd.Series([6,7,8,9], index=["a","b","c","d"])}
c = pd.DataFrame(b)
print(c)
print(pd.DataFrame(b, index=["a","d"],columns=["one"]))

# Create from a dict of lists
dl = {"one":[1,2,3],"two":[6,7,8]}
d = pd.DataFrame(dl, index=["a","b","c"])
print(d)

Output:

   0  1  2  3  4
0  0  1  2  3  4
1  5  6  7  8  9

   one  two
a  1.0    6
b  2.0    7
c  3.0    8
d  NaN    9

   one
a  1.0
d  NaN

   one  two
a    1    6
b    2    7
c    3    8
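Since every column shares the same row index, a single column or a single row of a DataFrame is itself a Series. A small sketch using the dict-of-lists DataFrame from above:

```python
import pandas as pd

d = pd.DataFrame({"one": [1, 2, 3], "two": [6, 7, 8]}, index=["a", "b", "c"])

# A column is a Series indexed by the DataFrame's row labels
col = d["one"]
print(col.tolist())           # [1, 2, 3]
print(list(col.index))        # ['a', 'b', 'c']

# A row selected by label is a Series indexed by the column labels
row = d.loc["b"]
print(row.tolist())           # [2, 7]

# A single cell: row label first, then column label
print(d.loc["c", "two"])      # 8

print(d.shape)                # (3, 2): 3 rows (axis 0), 2 columns (axis 1)
```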

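Operations on pandas data types

Changing the structure:
Adding or reordering: reindexing with .reindex()
Deleting: .drop()
fill_value: the value used to fill entries that reindexing leaves missing
Operations on Index objects:
.append(idx): concatenate another Index object
.difference(idx): compute the set difference, producing a new Index object
……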
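A quick sketch of the Index operations listed above, together with a few related Index methods (.intersection, .union, .insert, .delete) from the same pandas API. The two Index objects are made up for illustration.

```python
import pandas as pd

idx1 = pd.Index(["a", "b", "c"])
idx2 = pd.Index(["b", "c", "d"])

# .append concatenates two Index objects (duplicates are kept)
appended = idx1.append(idx2)
print(appended.tolist())     # ['a', 'b', 'c', 'b', 'c', 'd']

# Set-style operations each return a NEW Index; idx1 and idx2 are unchanged
diff = idx1.difference(idx2)
common = idx1.intersection(idx2)
both = idx1.union(idx2)
print(diff.tolist())         # ['a']
print(common.tolist())       # ['b', 'c']
print(both.tolist())         # ['a', 'b', 'c', 'd']

# .insert and .delete also return new Index objects
print(idx1.insert(1, "x").tolist())   # ['a', 'x', 'b', 'c']
print(idx1.delete(0).tolist())        # ['b', 'c']
```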

import numpy as np
import pandas as pd

dl = {"one":[1,2,3],"two":[6,7,8],"three":[4,5,9]}
d = pd.DataFrame(dl, index=["a","b","c"])
print(d)
print(d.drop("a"))
print(d.drop("one",axis=1))# axis=1 refers to the columns, so this drops column "one"
d = d.reindex(index=["b","c","a"])# reorder the rows
print(d)
d = d.reindex(columns=["three","one","two"])# reorder the columns
print(d)

# Add a new column filled with 20 (not executed here, so its output is not shown below)
# nc_new = d.columns.insert(3,"new")
# f = d.reindex(columns=nc_new, fill_value=20)
# print(f)

# Index operations
nc = d.columns.delete(2)
print(nc)
ni = d.index.insert(3,"m")
print(ni)
nd = d.reindex(index=ni,columns=nc)
print(nd)

n = pd.Series([1,2,3,4],index=["j","k","l","o"])
print(n)
print(n.drop(["j"]))# .drop returns a new Series and leaves the original unchanged
print(n)

print(n)

Output:

   one  two  three
a    1    6      4
b    2    7      5
c    3    8      9

   one  two  three
b    2    7      5
c    3    8      9

   two  three
a    6      4
b    7      5
c    8      9

   one  two  three
b    2    7      5
c    3    8      9
a    1    6      4

   three  one  two
b      5    2    7
c      9    3    8
a      4    1    6

Index(['three', 'one'], dtype='object')

Index(['b', 'c', 'a', 'm'], dtype='object')

   three  one
b    5.0  2.0
c    9.0  3.0
a    4.0  1.0
m    NaN  NaN

j    1
k    2
l    3
o    4
dtype: int64

k    2
l    3
o    4
dtype: int64

j    1
k    2
l    3
o    4
dtype: int64
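The effect of fill_value on reindexing, and of dropping several labels at once, can be checked on the Series n used above. A minimal sketch; the new label "z" is made up.

```python
import pandas as pd

n = pd.Series([1, 2, 3, 4], index=["j", "k", "l", "o"])

# Labels missing from the original index ("z") get fill_value instead of NaN,
# so the integer dtype is preserved
r = n.reindex(["o", "j", "z"], fill_value=0)
print(r.tolist())        # [4, 1, 0]

# .drop accepts a list of labels and returns a new Series
kept = n.drop(["j", "l"])
print(kept.tolist())     # [2, 4]
print(len(n))            # 4: the original is unchanged
```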

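Pandas arithmetic

Broadcasting: when the operands differ in dimension or size, pandas first aligns them by label and fills the missing positions with NaN; the result at those positions is NaN.

Arithmetic: the four basic operations can be written in two ways, with operators (+ - * /) or with methods (.add() .sub() .mul() .div()); the method form accepts extra parameters such as fill_value.

Series with DataFrame: by default the Series is matched against the DataFrame's columns (axis=1) and broadcast down every row.

Comparison: between objects of the same dimension, the labels and sizes must match exactly; between different dimensions, broadcasting happens along axis 1 by default.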
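The difference between the operator form and the method form is clearest on two small Series with partly overlapping labels. A sketch with made-up data:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20], index=["a", "b"])

# Operator form: "c" exists only in s1, so the result there is NaN
plain = s1 + s2
print(plain.tolist())        # [11.0, 22.0, nan]

# Method form: fill_value stands in for the missing side before adding
filled = s1.add(s2, fill_value=0)
print(filled.tolist())       # [11.0, 22.0, 3.0]

# Comparing two Series requires identically labeled operands
cmp = s1.loc[["a", "b"]] > s2
print(cmp.tolist())          # [False, False]
```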

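import pandas as pd
import numpy as np

a = pd.DataFrame(np.arange(12).reshape(3,4))
print(a)
b = pd.DataFrame(np.arange(20).reshape(4,5))
print(b)
print(a+b)# sizes differ: positions missing from either operand become NaN

# Method form of the four operations: it accepts extra parameters
print(a.add(b,fill_value=10))# fill missing entries with a fixed value before adding
c = pd.Series(np.arange(4))
print(c)
print(b-c)# by default the Series is matched against the columns (axis=1)

# Comparison
# print(a>b)  # raises ValueError: the two DataFrames are not identically labeled
print(a>c)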

Output:

   0  1   2   3
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11

    0   1   2   3   4
0   0   1   2   3   4
1   5   6   7   8   9
2  10  11  12  13  14
3  15  16  17  18  19

      0     1     2     3    4
0   0.0   2.0   4.0   6.0  NaN
1   9.0  11.0  13.0  15.0  NaN
2  18.0  20.0  22.0  24.0  NaN
3   NaN   NaN   NaN   NaN  NaN

      0     1     2     3     4
0   0.0   2.0   4.0   6.0  14.0
1   9.0  11.0  13.0  15.0  19.0
2  18.0  20.0  22.0  24.0  24.0
3  25.0  26.0  27.0  28.0  29.0

0    0
1    1
2    2
3    3
dtype: int32

      0     1     2     3    4
0   0.0   0.0   0.0   0.0  NaN
1   5.0   5.0   5.0   5.0  NaN
2  10.0  10.0  10.0  10.0  NaN
3  15.0  15.0  15.0  15.0  NaN

       0      1      2      3
0  False  False  False  False
1   True   True   True   True
2   True   True   True   True
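Operators always match a Series against the columns (axis=1). To broadcast along the rows instead, the method form takes an axis argument. A sketch reusing b and c from above: with axis=0, c is matched against the row index 0..3, so no NaN appears.

```python
import numpy as np
import pandas as pd

b = pd.DataFrame(np.arange(20).reshape(4, 5))
c = pd.Series(np.arange(4))

# axis=0: subtract c[i] from every value in row i
r = b.sub(c, axis=0)
print(r)
```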

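Sorting data

.sort_index() sorts by the index labels along a given axis. It sorts in ascending order and along axis 0 (the rows, i.e. vertically) by default; pass axis=1 to sort the columns (horizontally).

.sort_values() sorts by the data values along a given axis, also in ascending order and along axis 0 by default.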

import pandas as pd
import numpy as np

# Sort by index
a = pd.DataFrame(np.arange(12).reshape(3,4), index=["a","b","c"])
print(a)
b = a.sort_index(ascending=False)# axis 0 by default: sorts the rows
print(b)
c = a.sort_index(axis=1, ascending=False)
print(c)

# Sort by value
d = a.sort_values(2, ascending=False)# sort the rows by the values in column 2
print(d)
e = a.sort_values("a", axis=1, ascending=False)# sort the columns by the values in row "a"
print(e)

Output:

   0  1   2   3
a  0  1   2   3
b  4  5   6   7
c  8  9  10  11

   0  1   2   3
c  8  9  10  11
b  4  5   6   7
a  0  1   2   3

    3   2  1  0
a   3   2  1  0
b   7   6  5  4
c  11  10  9  8

   0  1   2   3
c  8  9  10  11
b  4  5   6   7
a  0  1   2   3

    3   2  1  0
a   3   2  1  0
b   7   6  5  4
c  11  10  9  8
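.sort_values() can also sort by several columns at once (later columns break ties), with one ascending flag per column; missing values go to the end by default (na_position="last"). A sketch on a small made-up DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [2, 1, 2, np.nan], "y": [5, 7, 3, 1]},
                  index=["p", "q", "r", "s"])

# Sort rows by "x", breaking ties with "y"; both ascending; NaN in "x" goes last
out = df.sort_values(by=["x", "y"])
print(out.index.tolist())    # ['q', 'r', 'p', 's']

# "x" descending, "y" ascending; NaN still goes last
out2 = df.sort_values(by=["x", "y"], ascending=[False, True])
print(out2.index.tolist())   # ['r', 'p', 'q', 's']
```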

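Basic statistical analysis

Some functions: .sum() ...... ; .describe() returns several summary statistics at once (count, mean, std, min, quartiles, max).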

import pandas as pd
import numpy as np

# Series
a = pd.Series(np.arange(3), index=["a","b","c"])
print(a)
print(a.describe())# the result is itself a Series, so each statistic can be read by its label
print(a.describe()["mean"])

# DataFrame
b = pd.DataFrame(np.arange(12).reshape(3,4), index=["a","b","c"])
print(b.describe())
print(b.describe()[2])# the statistics of column 2

Output:

a    0
b    1
c    2
dtype: int32

count    3.0
mean     1.0
std      1.0
min      0.0
25%      0.5
50%      1.0
75%      1.5
max      2.0
dtype: float64

1.0

         0    1     2     3
count  3.0  3.0   3.0   3.0
mean   4.0  5.0   6.0   7.0
std    4.0  4.0   4.0   4.0
min    0.0  1.0   2.0   3.0
25%    2.0  3.0   4.0   5.0
50%    4.0  5.0   6.0   7.0
75%    6.0  7.0   8.0   9.0
max    8.0  9.0  10.0  11.0

count     3.0
mean      6.0
std       4.0
min       2.0
25%       4.0
50%       6.0
75%       8.0
max      10.0
Name: 2, dtype: float64
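Besides .describe(), each statistic is available as its own method, and the axis argument chooses between per-column (axis=0, the default) and per-row (axis=1) results. A sketch on the same DataFrame b:

```python
import numpy as np
import pandas as pd

b = pd.DataFrame(np.arange(12).reshape(3, 4), index=["a", "b", "c"])

col_sums = b.sum()          # axis=0 (default): one sum per column
row_sums = b.sum(axis=1)    # axis=1: one sum per row
col_means = b.mean()
top = b[3].idxmax()         # label of the largest value in column 3

print(col_sums.tolist())    # [12, 15, 18, 21]
print(row_sums.tolist())    # [6, 22, 38]
print(col_means.tolist())   # [4.0, 5.0, 6.0, 7.0]
print(top)                  # c
```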

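Cumulative statistics

Cumulative functions such as .cumsum() and .cummin() compute a running result over the first n elements (cumulative sum, cumulative minimum, and so on).

Window computation: rolling calculations with .rolling(w), which apply a statistic to each window of w adjacent elements.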

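import numpy as np
import pandas as pd

b = pd.DataFrame(np.arange(12).reshape(3,4), index=["a","b","c"])
print(b)
print(b.cumsum())# axis 0 by default: accumulates down each column
print(b.cummin())
print(b.rolling(2).sum())# sum over each window of 2 adjacent rows; positions without enough rows give NaN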

Output:

   0  1   2   3
a  0  1   2   3
b  4  5   6   7
c  8  9  10  11

    0   1   2   3
a   0   1   2   3
b   4   6   8  10
c  12  15  18  21

   0  1  2  3
a  0  1  2  3
b  0  1  2  3
c  0  1  2  3

      0     1     2     3
a   NaN   NaN   NaN   NaN
b   4.0   6.0   8.0  10.0
c  12.0  14.0  16.0  18.0
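Other cumulative functions (.cumprod(), .cummax()) and other rolling statistics follow the same pattern. The min_periods argument of .rolling() controls how many values a window needs before it produces a result instead of NaN. A sketch on a made-up Series:

```python
import pandas as pd

s = pd.Series([1, 3, 2, 5, 4])

cp = s.cumprod()
cm = s.cummax()
print(cp.tolist())   # [1, 3, 6, 30, 120]
print(cm.tolist())   # [1, 3, 3, 5, 5]

# Mean over windows of 3 adjacent values: NaN, NaN, then 2.0, 10/3, 11/3
rm = s.rolling(3).mean()

# min_periods=1: a value is produced as soon as the window holds at least one element
rs = s.rolling(3, min_periods=1).sum()
print(rs.tolist())   # [1.0, 4.0, 6.0, 10.0, 11.0]
```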

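Correlation analysis

Covariance: a covariance > 0 indicates positive correlation; .cov() returns the covariance matrix.

Pearson correlation coefficient: .corr() returns the correlation matrix.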
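A small sketch with made-up columns: y rises exactly with x and z falls as x rises, so the Pearson coefficients are +1 and -1. It also checks that Pearson r equals the covariance divided by the product of the two standard deviations (both .cov() and .std() use the sample n - 1 denominator by default).

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],   # exactly 2 * x: perfect positive correlation
    "z": [4.0, 3.0, 2.0, 1.0],   # falls as x rises: perfect negative correlation
})

cov = df.cov()     # sample covariance matrix
corr = df.corr()   # Pearson correlation matrix (method="pearson" is the default)

print(cov.loc["x", "y"])    # positive, so x and y are positively correlated
print(corr.loc["x", "y"])   # close to 1.0
print(corr.loc["x", "z"])   # close to -1.0

# Pearson r is the covariance scaled by both standard deviations
r = cov.loc["x", "z"] / (df["x"].std() * df["z"].std())
print(r)                    # close to -1.0
```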
