Optional Lab: Python, NumPy and Vectorization

A brief introduction to some of the scientific computing used in this course, in particular the NumPy scientific computing package and its use with Python.

1 Outline

1.1 Goals

In this lab, you will review the features of NumPy and Python that are used in Course 1.

1.2 Useful References

NumPy Documentation including a basic introduction: NumPy.org

A challenging feature topic: NumPy Broadcasting

2 Python and NumPy

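Python is the programming language used in this course. It has a set of data types and arithmetic operations. NumPy is a library that extends the base capabilities of Python to add a richer data set, including more numeric types, vectors, matrices, and many matrix functions. The two work together seamlessly: Python arithmetic operators work on NumPy data types, and many NumPy functions accept Python data types.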

import numpy as np # it is an unofficial standard to use np for numpy

import time
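As a small illustration of how Python and NumPy work together (a sketch only; the variable names are illustrative and not part of the lab), Python's arithmetic operators act on NumPy arrays, and NumPy functions accept plain Python lists:

```python
import numpy as np

py_list = [1, 2, 3]             # an ordinary Python list
arr = np.array(py_list)         # converted to a NumPy array

doubled = arr * 2               # Python's * operator applied element-wise
total = np.sum(py_list)         # a NumPy function given a Python list

print(doubled)                  # [2 4 6]
print(total)                    # 6
```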

3 Vector

3.1 Abstract

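A vector is an ordered array of numbers, denoted by a lowercase bold letter such as $\mathbf{x}$. The elements of a vector are all the same type; a vector cannot, for example, contain both characters and numbers. The number of elements in a vector is often referred to as its dimension, though mathematicians may prefer the term rank. The elements are indexed from 0 to n - 1 and can be referenced by index. When referenced individually, the index is written as a subscript, as in $x_0$, and the element is not bold.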

3.2 NumPy Arrays

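NumPy's basic data structure is an indexable, n-dimensional array containing elements of the same type (dtype). Above, dimension referred to the number of elements in a vector; here, it refers to the number of indexes of an array. A one-dimensional (1-D) array has one index. In Course 1, vectors are represented as NumPy 1-D arrays.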

1-D array, shape (n,): n elements indexed [0] through [n-1]
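The two meanings of "dimension" can be told apart with array attributes. The following is a minimal sketch with made-up values; ndim reports the number of indexes, while size reports the number of elements:

```python
import numpy as np

v = np.array([10., 20., 30.])   # a vector with n = 3 elements

print(v.shape)   # (3,)  -> one index, three elements
print(v.ndim)    # 1     -> number of indexes (array dimensions)
print(v.size)    # 3     -> number of elements (vector "dimension")
```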

3.3 Vector Creation

Data creation routines in NumPy generally take a first parameter that is the shape of the object. This can either be a single value for a 1-D result or a tuple (n,m,…) specifying the shape of the result.

# NumPy routines which allocate memory and fill arrays with value

a = np.zeros(4); print(f"np.zeros(4) : a = {a}, a shape = {a.shape}, a data type = {a.dtype}")

a = np.zeros((4,)); print(f"np.zeros(4,) : a = {a}, a shape = {a.shape}, a data type = {a.dtype}")

a = np.random.random_sample(4); print(f"np.random.random_sample(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")

Output:

np.zeros(4) : a = [0. 0. 0. 0.], a shape = (4,), a data type = float64

np.zeros(4,) : a = [0. 0. 0. 0.], a shape = (4,), a data type = float64

np.random.random_sample(4): a = [0.38919476 0.38019795 0.86953179 0.1653972 ], a shape = (4,), a data type = float64

Some data creation routines do not take a shape tuple.

# NumPy routines which allocate memory and fill arrays with value but do not accept shape as input argument

a = np.arange(4.); print(f"np.arange(4.): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")

a = np.random.rand(4); print(f"np.random.rand(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")

Output:

np.arange(4.): a = [0. 1. 2. 3.], a shape = (4,), a data type = float64

np.random.rand(4): a = [0.56777089 0.44204559 0.45052726 0.41138661], a shape = (4,), a data type = float64

Values can be specified manually as well.

# NumPy routines which allocate memory and fill with user specified values

a = np.array([5,4,3,2]); print(f"np.array([5,4,3,2]): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")

a = np.array([5.,4,3,2]); print(f"np.array([5.,4,3,2]): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")

Output:

np.array([5,4,3,2]): a = [5 4 3 2], a shape = (4,), a data type = int32

np.array([5.,4,3,2]): a = [5. 4. 3. 2.], a shape = (4,), a data type = float64

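These are all ways to create a one-dimensional vector with four elements. a.shape returns the dimensions as a tuple; for an array with n rows and m columns, the return value is (n, m). Here, a.shape = (4,) indicates a 1-D array with four elements.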
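The int32 shown in the output above depends on the platform and NumPy version; other setups commonly report int64 for the same call. A minimal sketch, assuming a platform-independent result is wanted, is to pass dtype explicitly:

```python
import numpy as np

a = np.array([5, 4, 3, 2], dtype=np.int64)    # request 64-bit integers explicitly
b = np.array([5, 4, 3, 2], dtype=np.float64)  # or floats, regardless of the literals

print(a.dtype, a.shape)   # int64 (4,)
print(b.dtype, b.shape)   # float64 (4,)
```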

3.4 Operations on Vectors

Let’s explore some operations using vectors.

3.4.1 Indexing

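Elements of vectors can be accessed via indexing and slicing. NumPy provides a very complete set of indexing and slicing capabilities. We will explore only the basics needed for the course here. For more details, see Slicing and Indexing.

Indexing means referring to an element of an array by its position within the array. Slicing means getting a subset of elements from an array based on their indices.

NumPy starts indexing at zero, so the third element of a vector $\mathbf{a}$ is a[2].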

#vector indexing operations on 1-D vectors

a = np.arange(10)

print(a)

#access an element

print(f"a[2].shape: {a[2].shape} a[2] = {a[2]}, Accessing an element returns a scalar")

# access the last element, negative indexes count from the end

# for example, a[-2] is 8: negative indexes count backwards

print(f"a[-1] = {a[-1]}")

# indexes must be within the range of the vector or they will produce an error
try:
    c = a[10]
except Exception as e:
    print("The error message you'll see is:")
    print(e)

Output:

[0 1 2 3 4 5 6 7 8 9]

a[2].shape: () a[2] = 2, Accessing an element returns a scalar

a[-1] = 9

The error message you'll see is:

index 10 is out of bounds for axis 0 with size 10
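To make the negative-index rule concrete, here is a short sketch (not from the lab) showing that a[-k] refers to the same element as a[n - k], and that the out-of-range access raises IndexError specifically:

```python
import numpy as np

a = np.arange(10)
n = a.shape[0]

# a[-k] refers to the same element as a[n - k]
assert a[-1] == a[n - 1]
assert a[-2] == a[n - 2]

# catch the specific exception type rather than the general Exception
try:
    a[n]
    caught = False
except IndexError:
    caught = True

print(caught)   # True
```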

3.4.2 Slicing

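Slicing creates an array of indices using a set of three values (start:stop:step). Supplying only a subset of these values, such as just start or stop, is also valid.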

#vector slicing operations

a = np.arange(10)

print(f"a = {a}")

#access 5 consecutive elements (start:stop:step)

c = a[2:7:1]; print("a[2:7:1] = ", c)

# access 3 elements separated by two

c = a[2:7:2]; print("a[2:7:2] = ", c)

# access all elements index 3 and above

c = a[3:]; print("a[3:] = ", c)

# access all elements below index 3

c = a[:3]; print("a[:3] = ", c)

# access all elements

c = a[:]; print("a[:] = ", c)

Output:

a = [0 1 2 3 4 5 6 7 8 9]

a[2:7:1] = [2 3 4 5 6]

a[2:7:2] = [2 4 6]

a[3:] = [3 4 5 6 7 8 9]

a[:3] = [0 1 2]

a[:] = [0 1 2 3 4 5 6 7 8 9]
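One way to read a slice is that a[start:stop:step] selects the indices that range(start, stop, step) would produce. The sketch below checks that, and also illustrates a point the lab does not cover: a basic slice is a view onto the original array, so writing through it changes the original. The variable names are illustrative.

```python
import numpy as np

a = np.arange(10)

# a[start:stop:step] selects the indices produced by range(start, stop, step)
idx = list(range(2, 7, 2))        # [2, 4, 6]
same = np.array_equal(a[2:7:2], a[idx])
print(same)                       # True

# basic slices are views: writing through the slice changes the original
s = a[2:5]
s[0] = 99
print(a[2])                       # 99

# use .copy() when an independent array is wanted
c = a[2:5].copy()
c[0] = -1
print(a[2])                       # still 99
```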

3.4.3 Single Vector Operations

There are a number of useful operations that act on a single vector.

a = np.array([1,2,3,4])

print(f"a : {a}")

# negate elements of a

b = -a

print(f"b = -a : {b}")

# sum all elements of a, returns a scalar

b = np.sum(a)

print(f"b = np.sum(a) : {b}")

b = np.mean(a)

print(f"b = np.mean(a): {b}")

b = a**2

print(f"b = a**2 : {b}")

Output:

a : [1 2 3 4]

b = -a : [-1 -2 -3 -4]

b = np.sum(a) : 10

b = np.mean(a): 2.5

b = a**2 : [ 1 4 9 16]

3.4.4 Vector Vector Element-wise Operations

Most NumPy arithmetic, logical, and comparison operations apply to vectors as well. These operators work element by element, for example

$$(\mathbf{a} + \mathbf{b})_i = a_i + b_i, \quad i = 0, \dots, n-1$$

a = np.array([ 1, 2, 3, 4])

b = np.array([-1,-2, 3, 4])

print(f"Binary operators work element wise: {a + b}")

Output:

Binary operators work element wise: [0 0 6 8]

For this to work correctly, the vectors must be the same size.

#try a mismatched vector operation

c = np.array([1, 2])

try:
    d = a + c
except Exception as e:
    print("The error message you'll see is:")
    print(e)

Output:

The error message you'll see is:

operands could not be broadcast together with shapes (4,) (2,)
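The word "broadcast" in the error message refers to NumPy's broadcasting rules, listed in the references as a challenging topic. As a rough sketch of the 1-D case only (the names are illustrative), a length-1 vector can be stretched to match another vector, while a length-2 vector cannot be matched to length 4:

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# a length-1 vector is stretched ("broadcast") to match shape (4,)
one = np.array([10])
print(a + one)          # [11 12 13 14]

# a length-2 vector cannot be stretched to length 4
two = np.array([1, 2])
try:
    a + two
    failed = False
except ValueError:
    failed = True
print(failed)           # True
```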

3.4.5 Scalar Vector Operations

Vectors can be scaled by scalar values. A scalar value is just a number; the scalar multiplies all the elements of the vector.

a = np.array([1, 2, 3, 4])

# multiply a by a scalar

b = 5 * a

print(f"b = 5 * a : {b}")

Output:

b = 5 * a : [ 5 10 15 20]

3.4.6 Vector Vector Dot Product

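The dot product is a mainstay of linear algebra and NumPy, and it is an operation used extensively in this course.

The dot product multiplies the values in two vectors element by element and then sums the result; it requires the two vectors to be the same size. Using a for loop, implement a function that returns the dot product of two vectors. Given inputs $\mathbf{a}$ and $\mathbf{b}$, the function should return

$$x = \sum_{i=0}^{n-1} a_i b_i$$

Assume both a and b are the same shape.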

def my_dot(a, b):
    """
    Compute the dot product of two vectors

    Args:
      a (ndarray (n,)): input vector
      b (ndarray (n,)): input vector with same dimension as a

    Returns:
      x (scalar):
    """
    x = 0
    for i in range(a.shape[0]):
        x = x + a[i] * b[i]
    return x

# test 1-D

a = np.array([1, 2, 3, 4])

b = np.array([-1, 4, 3, 2])

print(f"my_dot(a, b) = {my_dot(a, b)}")

Output:

my_dot(a, b) = 24

Note that the dot product is expected to return a scalar value. Let's try the same operation using np.dot.

# test 1-D

a = np.array([1, 2, 3, 4])

b = np.array([-1, 4, 3, 2])

c = np.dot(a, b)

print(f"NumPy 1-D np.dot(a, b) = {c}, np.dot(a, b).shape = {c.shape} ")

c = np.dot(b, a)

print(f"NumPy 1-D np.dot(b, a) = {c}, np.dot(b, a).shape = {c.shape} ")

Output:

NumPy 1-D np.dot(a, b) = 24, np.dot(a, b).shape = ()

NumPy 1-D np.dot(b, a) = 24, np.dot(b, a).shape = ()

The results are the same.
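The dot product of two 1-D arrays can be written in a few equivalent ways in NumPy. The sketch below, which reuses the test vectors from above, compares np.dot with an element-wise multiply followed by a sum, and with the @ operator:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([-1, 4, 3, 2])

via_dot = np.dot(a, b)        # library routine
via_sum = np.sum(a * b)       # element-wise multiply, then sum
via_at = a @ b                # the @ operator on two 1-D arrays

print(via_dot, via_sum, via_at)   # 24 24 24
```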

3.4.7 The Need for Speed: Vector vs For-loop

We use the NumPy library because it improves speed and memory efficiency. Let's demonstrate:

np.random.seed(1)

a = np.random.rand(10000000) # very large arrays

b = np.random.rand(10000000)

tic = time.time() # capture start time

c = np.dot(a, b)

toc = time.time() # capture end time

print(f"np.dot(a, b) = {c:.4f}")

print(f"Vectorized version duration: {1000*(toc-tic):.4f} ms ")

tic = time.time() # capture start time

c = my_dot(a,b)

toc = time.time() # capture end time

print(f"my_dot(a, b) = {c:.4f}")

print(f"loop version duration: {1000*(toc-tic):.4f} ms ")

del(a);del(b) #remove these big arrays from memory

Output:

np.dot(a, b) = 2501072.5817

Vectorized version duration: 1107.5144 ms

my_dot(a, b) = 2501072.5817

loop version duration: 4505.1224 ms

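So, vectorization provides a large speedup in this example. This is because NumPy makes better use of the data parallelism available in the underlying hardware. GPUs and modern CPUs implement Single Instruction, Multiple Data (SIMD) pipelines that allow multiple operations to be issued in parallel. This is critical in machine learning, where the data sets are often very large.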
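Timings like those above vary with hardware and system load. The following is a minimal sketch for repeating the comparison quickly, under two assumptions that are not from the lab: smaller arrays, and time.perf_counter as the timer. The helper loop_dot is an illustrative stand-in for my_dot.

```python
import time
import numpy as np

def loop_dot(a, b):
    """Dot product with an explicit Python loop (illustrative helper)."""
    x = 0.0
    for i in range(a.shape[0]):
        x += a[i] * b[i]
    return x

rng = np.random.default_rng(1)      # seeded generator for repeatability
a = rng.random(100_000)             # smaller than the lab's arrays, to run quickly
b = rng.random(100_000)

t0 = time.perf_counter()
fast = np.dot(a, b)
t1 = time.perf_counter()
slow = loop_dot(a, b)
t2 = time.perf_counter()

print(f"vectorized: {1000 * (t1 - t0):.4f} ms")
print(f"loop      : {1000 * (t2 - t1):.4f} ms")
print(np.isclose(fast, slow))       # the two results agree up to rounding
```

No expected timings are given because they depend on the machine; only the agreement of the two results is checked.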

3.4.8 Vector Vector Operations in Course 1

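Vector Vector operations will appear frequently in Course 1. Here is why:

Going forward, our examples will be stored in an array, X_train, of dimension (m,n). Note that this is a 2-dimensional array, or matrix. w will be a 1-dimensional vector of shape (n,). We will perform operations by looping through the examples, extracting each example to work on individually by indexing X, for example X[i]. X[i] returns a value of shape (n,), a 1-dimensional vector. Consequently, operations involving X[i] are often vector-vector.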

# show common Course 1 example

X = np.array([[1],[2],[3],[4]])

w = np.array([2])

c = np.dot(X[1], w)

print(f"X[1] has shape {X[1].shape}")

print(f"w has shape {w.shape}")

print(f"c has shape {c.shape}")

Output:

X[1] has shape (1,)

w has shape (1,)

c has shape ()
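The looping pattern described above can be sketched with more than one feature. The values of X_train and w below are made up for illustration; each X_train[i] has shape (n,), so each step is a vector-vector dot product:

```python
import numpy as np

X_train = np.array([[1., 2.],
                    [3., 4.],
                    [5., 6.]])     # m = 3 examples, n = 2 features (made-up values)
w = np.array([0.5, -1.0])          # shape (n,)

m = X_train.shape[0]
out = np.zeros(m)
for i in range(m):
    out[i] = np.dot(X_train[i], w)  # X_train[i] has shape (n,)

print(out)                          # [-1.5 -2.5 -3.5]
```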

4 Matrices

4.1 Abstract

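A matrix is a two-dimensional array, denoted by an uppercase bold letter such as $\mathbf{X}$. The elements are all of the same type. In this lab, m is typically the number of rows and n the number of columns. The elements of a matrix can be referenced with a two-dimensional index.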

4.2 NumPy Arrays

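Matrices have a two-dimensional (2-D) index [m,n]. In Course 1, 2-D matrices are used to hold training data. Training data is m examples by n features, creating an (m,n) array. Course 1 does not do operations directly on matrices but typically extracts an example as a vector and operates on that.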

4.3 Matrix Creation

The same functions that created 1-D vectors will create 2-D or n-D arrays. Below, the shape tuple is provided to achieve a 2-D result. Notice how NumPy uses brackets to denote each dimension. Notice further that NumPy, when printing, prints one row per line.

a = np.zeros((1, 5))

print(f"a shape = {a.shape}, a = {a}")

a = np.zeros((2, 1))

print(f"a shape = {a.shape}, a = {a}")

a = np.random.random_sample((1, 1))

print(f"a shape = {a.shape}, a = {a}")

Output:

a shape = (1, 5), a = [[0. 0. 0. 0. 0.]]

a shape = (2, 1), a = [[0.]

[0.]]

a shape = (1, 1), a = [[0.44236513]]

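One can also specify data manually. Dimensions are specified with additional brackets, matching the format of the printed output above.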

# NumPy routines which allocate memory and fill with user specified values

a = np.array([[5], [4], [3]]); print(f" a shape = {a.shape}, np.array: a = {a}")

a = np.array([[5], # One can also

[4], # separate values

[3]]); #into separate rows

print(f" a shape = {a.shape}, np.array: a = {a}")

Output:

a shape = (3, 1), np.array: a = [[5]

[4]

[3]]

a shape = (3, 1), np.array: a = [[5]

[4]

[3]]
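Because the brackets determine the shape, a 1-D vector, a single-row matrix, and a single-column matrix are three different objects even when they hold the same number of elements. A brief sketch with illustrative names:

```python
import numpy as np

v = np.zeros(5)          # 1-D vector:      shape (5,)
row = np.zeros((1, 5))   # 2-D, one row:    shape (1, 5)
col = np.zeros((5, 1))   # 2-D, one column: shape (5, 1)

print(v.shape, row.shape, col.shape)   # (5,) (1, 5) (5, 1)
print(v.ndim, row.ndim, col.ndim)      # 1 2 2
```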

4.4 Operations on Matrices

Let’s explore some operations using matrices.

4.4.1 Indexing

Matrices include a second index. The two indexes describe [row, column]. Access can return either an element or a row/column.

#vector indexing operations on matrices

a = np.arange(6).reshape(-1, 2) #reshape is a convenient way to create matrices

print(f"a.shape: {a.shape}, \na= {a}")

#access an element

print(f"\na[2,0].shape: {a[2, 0].shape}, a[2,0] = {a[2, 0]}, type(a[2,0]) = {type(a[2, 0])} Accessing an element returns a scalar\n")

#access a row

print(f"a[2].shape: {a[2].shape}, a[2] = {a[2]}, type(a[2]) = {type(a[2])}")

Output:

a.shape: (3, 2),

a= [[0 1]

[2 3]

[4 5]]

a[2,0].shape: (), a[2,0] = 4, type(a[2,0]) = <class 'numpy.int32'> Accessing an element returns a scalar

a[2].shape: (2,), a[2] = [4 5], type(a[2]) = <class 'numpy.ndarray'>

In the last example, accessing by specifying only the row returns a 1-D vector.

Reshape

The previous example used reshape to shape the array.

a = np.arange(6).reshape(-1, 2)

This line of code first created a 1-D vector of six elements. It then reshaped that vector into a 2-D array using the reshape command. This could have been written:

a = np.arange(6).reshape(3, 2)

to arrive at the same 3-row, 2-column array. The -1 argument tells the routine to compute the number of rows given the size of the array and the number of columns.
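The -1 placeholder can stand in for either dimension, as long as the inferred size is a whole number. A short sketch on the same six-element array:

```python
import numpy as np

a = np.arange(6)

print(a.reshape(-1, 2).shape)   # (3, 2): rows inferred as 6 / 2
print(a.reshape(2, -1).shape)   # (2, 3): columns inferred as 6 / 2
print(a.reshape(-1).shape)      # (6,):   flattened back to 1-D

# the inferred size must be a whole number
try:
    a.reshape(-1, 4)            # 6 elements cannot fill rows of 4
    ok = True
except ValueError:
    ok = False
print(ok)                       # False
```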

4.4.2 Slicing

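Slicing creates an array of indices using a set of three values (start:stop:step). Supplying only a subset of these values, such as just start or stop, is also valid.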

#vector 2-D slicing operations

a = np.arange(20).reshape(-1, 10)

print(f"a = \n{a}")

#access 5 consecutive elements (start:stop:step)

print("a[0, 2:7:1] = ", a[0, 2:7:1], ", a[0, 2:7:1].shape =", a[0, 2:7:1].shape, "a 1-D array")

#access 5 consecutive elements (start:stop:step) in two rows

print("a[:, 2:7:1] = \n", a[:, 2:7:1], ", a[:, 2:7:1].shape =", a[:, 2:7:1].shape, "a 2-D array")

# access all elements

print("a[:,:] = \n", a[:,:], ", a[:,:].shape =", a[:,:].shape)

# access all elements in one row (very common usage)

print("a[1,:] = ", a[1,:], ", a[1,:].shape =", a[1,:].shape, "a 1-D array")

# same as

print("a[1] = ", a[1], ", a[1].shape =", a[1].shape, "a 1-D array")

Output:

a =

[[ 0 1 2 3 4 5 6 7 8 9]

[10 11 12 13 14 15 16 17 18 19]]

a[0, 2:7:1] = [2 3 4 5 6] , a[0, 2:7:1].shape = (5,) a 1-D array

a[:, 2:7:1] =

[[ 2 3 4 5 6]

[12 13 14 15 16]] , a[:, 2:7:1].shape = (2, 5) a 2-D array

a[:,:] =

[[ 0 1 2 3 4 5 6 7 8 9]

[10 11 12 13 14 15 16 17 18 19]] , a[:,:].shape = (2, 10)

a[1,:] = [10 11 12 13 14 15 16 17 18 19] , a[1,:].shape = (10,) a 1-D array

a[1] = [10 11 12 13 14 15 16 17 18 19] , a[1].shape = (10,) a 1-D array
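The examples above extract rows. Columns work the same way, with one detail worth noting: an integer index drops that axis, while a slice keeps it. A minimal sketch on the same array (the variable names are illustrative):

```python
import numpy as np

a = np.arange(20).reshape(-1, 10)

col_1d = a[:, 2]      # integer index drops that axis
col_2d = a[:, 2:3]    # slice keeps the axis

print(col_1d, col_1d.shape)   # [ 2 12] (2,)
print(col_2d.shape)           # (2, 1)
```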

Congratulations!

In this lab you mastered the features of Python and NumPy that are needed for Course 1.
