A brief introduction to some of the scientific computing used in this course, in particular the NumPy scientific computing package and its use with Python.

1 Outline

1.1 Goals

In this lab, you will review the features of NumPy and Python that are used in Course 1.

1.2 Useful References

NumPy Documentation, including a basic introduction: NumPy.org

A challenging feature topic: NumPy Broadcasting

2 Python and NumPy

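Python is the programming language used in this course. It has a set of data types and arithmetic operations. NumPy is a library that extends the base capabilities of Python to add a richer data set, including more data types, vectors, matrices, and many matrix functions. The two work together fairly seamlessly: Python arithmetic operators work on NumPy data types, and many NumPy functions accept Python data types.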

import numpy as np # it is an unofficial standard to use np for numpy

import time
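As a small illustration of the interoperability described above, the following sketch mixes Python operators with NumPy types and passes plain Python values to NumPy functions. The variable names are chosen here for illustration only.

```python
import numpy as np

# Python arithmetic operators applied to NumPy data types
x = np.float64(1.5)        # a NumPy scalar
y = x + 2                  # a Python int combined with a NumPy scalar
v = np.array([1, 2, 3])    # a NumPy array built from a Python list
doubled = v * 2            # the Python * operator acts element-wise on the array

# NumPy functions applied to plain Python data types
total = np.sum([1, 2, 3])  # a Python list is accepted directly
root = np.sqrt(16)         # so is a Python int

print(type(y), y)
print(doubled)
print(total, root)
```

The result of `x + 2` stays a NumPy floating-point scalar, which is why it can be used interchangeably with ordinary Python floats in later arithmetic.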

3 Vector

3.1 Abstract

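Vectors are ordered arrays of numbers. In notation, vectors are denoted with lowercase bold letters such as $\mathbf{x}$. The elements of a vector are all the same type; a vector does not, for example, contain both characters and numbers. The number of elements in the array is often referred to as the dimension, though mathematicians may prefer rank. The elements of a vector with n elements are indexed 0 to n - 1 and can be referenced by index. When referenced individually, the index is written as a subscript; for example, the 0th element of the vector $\mathbf{x}$ is $x_0$. Note that the $x$ is not bold in this case.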

3.2 NumPy Arrays

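NumPy's basic data structure is an indexable, n-dimensional array containing elements of the same type (dtype). Note that the term "dimension" is overloaded: above, it meant the number of elements in the vector; here, it refers to the number of indexes of an array. A one-dimensional (1-D) array has one index. In Course 1, vectors are represented as NumPy 1-D arrays.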

1-D array, shape (n,): n elements indexed [0] through [n-1]
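To make the two meanings of "dimension" concrete, here is a minimal sketch that inspects a 1-D array. The array values are arbitrary and chosen only for illustration.

```python
import numpy as np

a = np.array([10, 20, 30, 40])

print(a.ndim)   # number of indexes needed to address one element
print(a.shape)  # length along each index
print(a.size)   # total number of elements
print(a.dtype)  # the single element type shared by all entries

# Mixing ints and floats does not produce a mixed array: NumPy picks one dtype
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)
```

Here `a.ndim` is 1 (the array sense of dimension) while `a.size` is 4 (the vector sense of dimension).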

3.3 Vector Creation

Data creation routines in NumPy generally take a first parameter that is the shape of the object. This can either be a single value for a 1-D result or a tuple (n,m,…) specifying the shape of the result.

# NumPy routines which allocate memory and fill arrays with value

a = np.zeros(4); print(f"np.zeros(4) : a = {a}, a shape = {a.shape}, a data type = {a.dtype}")

a = np.zeros((4,)); print(f"np.zeros(4,) : a = {a}, a shape = {a.shape}, a data type = {a.dtype}")

a = np.random.random_sample(4); print(f"np.random.random_sample(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")

Output:

np.zeros(4) : a = [0. 0. 0. 0.], a shape = (4,), a data type = float64

np.zeros(4,) : a = [0. 0. 0. 0.], a shape = (4,), a data type = float64

np.random.random_sample(4): a = [0.38919476 0.38019795 0.86953179 0.1653972 ], a shape = (4,), a data type = float64

Some data creation routines do not take a shape tuple.

# NumPy routines which allocate memory and fill arrays with value but do not accept shape as input argument

a = np.arange(4.); print(f"np.arange(4.): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")

a = np.random.rand(4); print(f"np.random.rand(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")

Output:

np.arange(4.): a = [0. 1. 2. 3.], a shape = (4,), a data type = float64

np.random.rand(4): a = [0.56777089 0.44204559 0.45052726 0.41138661], a shape = (4,), a data type = float64

Values can be specified manually as well.

# NumPy routines which allocate memory and fill with user specified values

a = np.array([5,4,3,2]); print(f"np.array([5,4,3,2]): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")

a = np.array([5.,4,3,2]); print(f"np.array([5.,4,3,2]): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")

Output:

np.array([5,4,3,2]): a = [5 4 3 2], a shape = (4,), a data type = int32

np.array([5.,4,3,2]): a = [5. 4. 3. 2.], a shape = (4,), a data type = float64

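These have all created a one-dimensional vector a with four elements. a.shape returns the dimensions as a tuple; for an array with n rows and m columns, it returns (n, m). Here, a.shape = (4,) indicates a 1-D array with four elements.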
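The shape argument described in this section can be sketched as follows, comparing a single value, a one-element tuple, and a two-element tuple. The names `v1`, `v2`, and `m` are illustrative only.

```python
import numpy as np

v1 = np.zeros(4)      # single value -> 1-D result
v2 = np.zeros((4,))   # one-element tuple -> the same 1-D result
m = np.zeros((2, 3))  # tuple (n, m) -> 2-D result with 2 rows and 3 columns

print(v1.shape, v2.shape, m.shape)
print(np.array_equal(v1, v2))
```

The trailing comma in `(4,)` is what makes it a tuple in Python; without it, `(4)` is just the integer 4.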

3.4 Operations on Vectors

Let’s explore some operations using vectors.

3.4.1 Indexing

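Elements of vectors can be accessed via indexing and slicing. NumPy provides a very complete set of indexing and slicing capabilities. Only the basics needed for the course are explored here. Refer to Slicing and Indexing for more details.

Indexing means referring to an element of an array by its position within the array. Slicing means getting a subset of elements from an array based on their indices.

NumPy starts indexing at zero, so the third element of a vector $\mathbf{a}$ is a[2].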

#vector indexing operations on 1-D vectors

a = np.arange(10)

print(a)

#access an element

print(f"a[2].shape: {a[2].shape} a[2] = {a[2]}, Accessing an element returns a scalar")

# access the last element, negative indexes count from the end

# for example, a[-2] is 8: negative indexes count backward from the end

print(f"a[-1] = {a[-1]}")

# indexes must be within the range of the vector or they will produce an error

try:

    c = a[10]

except Exception as e:

    print("The error message you'll see is:")

    print(e)

Output:

[0 1 2 3 4 5 6 7 8 9]

a[2].shape: () a[2] = 2, Accessing an element returns a scalar

a[-1] = 9

The error message you'll see is:

index 10 is out of bounds for axis 0 with size 10
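The relationship between negative and non-negative indexes can be checked with a short sketch. It assumes the same 10-element vector as above; the loop variables are illustrative.

```python
import numpy as np

a = np.arange(10)
n = a.shape[0]

# a[-k] refers to the same element as a[n - k]
for k in range(1, n + 1):
    assert a[-k] == a[n - k]

# valid indexes run from -n to n - 1; anything outside raises IndexError
for bad in (n, -n - 1):
    try:
        a[bad]
    except IndexError as e:
        print(f"index {bad}: {e}")
```

Catching `IndexError` specifically, rather than the broad `Exception` used in the lab code, makes it clearer which failure is expected.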

3.4.2 Slicing

Slicing creates an array of indices using a set of three values (start:stop:step). A subset of the values is also valid, for example only start and stop.

#vector slicing operations

a = np.arange(10)

print(f"a = {a}")

#access 5 consecutive elements (start:stop:step)

c = a[2:7:1]; print("a[2:7:1] = ", c)

# access 3 elements separated by two

c = a[2:7:2]; print("a[2:7:2] = ", c)

# access all elements index 3 and above

c = a[3:]; print("a[3:] = ", c)

# access all elements below index 3

c = a[:3]; print("a[:3] = ", c)

# access all elements

c = a[:]; print("a[:] = ", c)

Output:

a = [0 1 2 3 4 5 6 7 8 9]

a[2:7:1] = [2 3 4 5 6]

a[2:7:2] = [2 4 6]

a[3:] = [3 4 5 6 7 8 9]

a[:3] = [0 1 2]

a[:] = [0 1 2 3 4 5 6 7 8 9]
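One way to read start:stop:step is as shorthand for the indexes produced by np.arange(start, stop, step). The sketch below checks that reading for one slice; the values in `a` are scaled by 10 only so that elements differ from their indexes.

```python
import numpy as np

a = np.arange(10) * 10  # values differ from indexes to make the selection visible

start, stop, step = 2, 7, 2
idx = np.arange(start, stop, step)  # the indexes the slice visits: 2, 4, 6

by_slice = a[start:stop:step]
by_index = a[idx]

print(idx)
print(by_slice)
print(np.array_equal(by_slice, by_index))
```

As with Python's `range`, the stop value itself is excluded.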

3.4.3 Single Vector Operations

There are a number of useful operations that involve a single vector.

a = np.array([1,2,3,4])

print(f"a : {a}")

# negate elements of a

b = -a

print(f"b = -a : {b}")

# sum all elements of a, returns a scalar

b = np.sum(a)

print(f"b = np.sum(a) : {b}")

b = np.mean(a)

print(f"b = np.mean(a): {b}")

b = a**2

print(f"b = a**2 : {b}")

Output:

a : [1 2 3 4]

b = -a : [-1 -2 -3 -4]

b = np.sum(a) : 10

b = np.mean(a): 2.5

b = a**2 : [ 1 4 9 16]
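For comparison, the same single-vector results can be computed element by element in plain Python. This is only a sketch to show what the NumPy calls are doing; the `*_loop` names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# the same results computed one element at a time
neg_loop = [-x for x in a]
sum_loop = 0
for x in a:
    sum_loop += x
mean_loop = sum_loop / a.shape[0]
sq_loop = [x**2 for x in a]

print(np.array_equal(-a, neg_loop))
print(np.sum(a) == sum_loop)
print(np.mean(a) == mean_loop)
print(np.array_equal(a**2, sq_loop))
```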

3.4.4 Vector Vector Element-wise Operations

Most of the NumPy arithmetic, logical, and comparison operations apply to vectors as well. These operators work on an element-by-element basis. For example, for $i = 0, \dots, n-1$:

$$ c_i = a_i + b_i $$

a = np.array([ 1, 2, 3, 4])

b = np.array([-1,-2, 3, 4])

print(f"Binary operators work element wise: {a + b}")

Output:

Binary operators work element wise: [0 0 6 8]

For this to work correctly, the vectors must be the same size:

#try a mismatched vector operation

c = np.array([1, 2])

try:

    d = a + c

except Exception as e:

    print("The error message you'll see is:")

    print(e)

Output:

The error message you'll see is:

operands could not be broadcast together with shapes (4,) (2,)
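Addition is only one of the element-wise operators. A brief sketch with the same two vectors shows an arithmetic, a comparison, and a logical operation, each producing one result per element.

```python
import numpy as np

a = np.array([ 1,  2, 3, 4])
b = np.array([-1, -2, 3, 4])

diff = a - b                           # arithmetic, element by element
prod = a * b                           # element-wise product, not a dot product
greater = a > b                        # comparison yields a boolean array
both_pos = np.logical_and(a > 0, b > 0)  # logical operation on boolean arrays

print(diff)
print(prod)
print(greater)
print(both_pos)
```

Note that `a * b` multiplies matching elements and returns a vector; summing that vector is what turns it into a dot product.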

3.4.5 Scalar Vector Operations

Vectors can be scaled by scalar values. A scalar value is just a number. The scalar multiplies all the elements of the vector.

a = np.array([1, 2, 3, 4])

# multiply a by a scalar

b = 5 * a

print(f"b = 5 * a : {b}")

Output:

b = 5 * a : [ 5 10 15 20]
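The scaling above can be sketched alongside an explicit loop, together with two other scalar operations that follow the same pattern. The names `scaled`, `looped`, `halved`, and `shifted` are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

scaled = 5 * a                          # the scalar multiplies every element
looped = np.array([5 * x for x in a])   # the same result, one element at a time

halved = a / 2    # division by a scalar produces a float array
shifted = a + 1   # addition of a scalar is applied to every element

print(np.array_equal(scaled, looped))
print(halved)
print(shifted)
```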

3.4.6 Vector Vector Dot Product

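The dot product is a mainstay of linear algebra and NumPy, and is an operation used extensively in this course.

The dot product multiplies the values in two vectors element-wise and then sums the result. It requires the two vectors to be the same size.

Using a for loop, implement a function that returns the dot product of two vectors. Given inputs $a$ and $b$, the function should return:

$$ x = \sum_{i=0}^{n-1} a_i b_i $$

Assume both a and b are the same shape.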

def my_dot(a, b):

    """

    Compute the dot product of two vectors

    Args:

      a (ndarray (n,)): input vector

      b (ndarray (n,)): input vector with same dimension as a

    Returns:

      x (scalar): dot product of a and b

    """

    x = 0

    for i in range(a.shape[0]):

        x = x + a[i] * b[i]

    return x

# test 1-D

a = np.array([1, 2, 3, 4])

b = np.array([-1, 4, 3, 2])

print(f"my_dot(a, b) = {my_dot(a, b)}")

Output:

my_dot(a, b) = 24

Note that the dot product is expected to return a scalar value. Let's try the same operation using np.dot.

# test 1-D

a = np.array([1, 2, 3, 4])

b = np.array([-1, 4, 3, 2])

c = np.dot(a, b)

print(f"NumPy 1-D np.dot(a, b) = {c}, np.dot(a, b).shape = {c.shape} ")

c = np.dot(b, a)

print(f"NumPy 1-D np.dot(b, a) = {c}, np.dot(b, a).shape = {c.shape} ")

Output:

NumPy 1-D np.dot(a, b) = 24, np.dot(a, b).shape = ()

NumPy 1-D np.dot(b, a) = 24, np.dot(b, a).shape = ()

The results are the same.
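The definition of the dot product (multiply element-wise, then sum) can be checked directly against np.dot. The sketch below also uses the `@` operator, which NumPy supports for 1-D arrays; the `via_*` names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([-1, 4, 3, 2])

via_dot = np.dot(a, b)     # library routine
via_sum = np.sum(a * b)    # element-wise product followed by a sum
via_at = a @ b             # matrix-multiplication operator applied to 1-D arrays

print(via_dot, via_sum, via_at)
```

All three give the scalar 24 for these inputs, the same value the loop-based function returns.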

3.4.7 The Need for Speed: Vector vs For-loop

We use the NumPy library because it improves speed and memory efficiency. Let's demonstrate:

np.random.seed(1)

a = np.random.rand(10000000) # very large arrays

b = np.random.rand(10000000)

tic = time.time() # capture start time

c = np.dot(a, b)

toc = time.time() # capture end time

print(f"np.dot(a, b) = {c:.4f}")

print(f"Vectorized version duration: {1000*(toc-tic):.4f} ms ")

tic = time.time() # capture start time

c = my_dot(a,b)

toc = time.time() # capture end time

print(f"my_dot(a, b) = {c:.4f}")

print(f"loop version duration: {1000*(toc-tic):.4f} ms ")

del(a);del(b) #remove these big arrays from memory

Output:

np.dot(a, b) = 2501072.5817

Vectorized version duration: 1107.5144 ms

my_dot(a, b) = 2501072.5817

loop version duration: 4505.1224 ms

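So, vectorization provides a large speedup in this example. This is because NumPy makes better use of the data parallelism available in the underlying hardware. GPUs and modern CPUs implement Single Instruction, Multiple Data (SIMD) pipelines, allowing multiple operations to be issued in parallel. This is critical in machine learning, where the data sets are often very large.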
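The comparison can be repeated on a smaller scale with the sketch below. It restates the loop-based function so the block is self-contained, and makes two assumptions that differ from the lab code: a smaller array size (200,000) so it finishes quickly, and time.perf_counter with np.random.default_rng in place of time.time and np.random.seed. Timings depend on the machine, so no expected durations are shown.

```python
import time
import numpy as np

def my_dot(a, b):
    """Loop-based dot product, restated from the lab so this sketch runs on its own."""
    x = 0
    for i in range(a.shape[0]):
        x = x + a[i] * b[i]
    return x

rng = np.random.default_rng(1)
n = 200_000  # assumed size, much smaller than the lab's 10,000,000
a = rng.random(n)
b = rng.random(n)

tic = time.perf_counter()
c_vec = np.dot(a, b)
t_vec = time.perf_counter() - tic

tic = time.perf_counter()
c_loop = my_dot(a, b)
t_loop = time.perf_counter() - tic

print(f"np.dot : {c_vec:.4f} in {1000 * t_vec:.4f} ms")
print(f"my_dot : {c_loop:.4f} in {1000 * t_loop:.4f} ms")
print(f"results agree: {np.isclose(c_vec, c_loop)}")
```

The two results are compared with np.isclose rather than ==, because the loop and the vectorized routine may add the terms in a different order and floating-point sums can differ in the last digits.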

3.4.8 Vector Vector Operations in Course 1

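Vector-vector operations will appear frequently in Course 1. Here is why:

Going forward, our examples will be stored in an array, X_train, of dimension (m,n). It is important to note that this is a 2-dimensional array, or matrix.

w will be a 1-dimensional vector of shape (n,).

We will perform operations by looping through the examples, extracting each example to work on individually by indexing X, for example X[i].

X[i] returns a value of shape (n,), a 1-dimensional vector. Consequently, operations involving X[i] are often vector-vector.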

# show common Course 1 example

X = np.array([[1],[2],[3],[4]])

w = np.array([2])

c = np.dot(X[1], w)

print(f"X[1] has shape {X[1].shape}")

print(f"w has shape {w.shape}")

print(f"c has shape {c.shape}")

Output:

X[1] has shape (1,)

w has shape (1,)

c has shape ()
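The looping pattern described above can be sketched with more than one feature per example. The data values, the weights in `w`, and the result array `out` are made up here for illustration and are not taken from the course.

```python
import numpy as np

# assumed toy data: m = 3 examples, n = 2 features
X_train = np.array([[1.0, 2.0],
                    [3.0, 4.0],
                    [5.0, 6.0]])
w = np.array([0.5, -1.0])  # one weight per feature, shape (n,)

m = X_train.shape[0]
out = np.zeros(m)
for i in range(m):
    x_i = X_train[i]           # one example, shape (n,)
    out[i] = np.dot(x_i, w)    # vector-vector operation, returns a scalar

print(X_train.shape, X_train[0].shape, w.shape)
print(out)
```

Each pass through the loop works on a 1-D slice of the 2-D array, which is why the matrix itself is never operated on directly.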

4 Matrices

4.1 Abstract

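Matrices are two-dimensional arrays. The elements of a matrix are all of the same type. In notation, matrices are denoted with uppercase bold letters such as $\mathbf{X}$. In this and other labs, m is often the number of rows and n the number of columns. The elements of a matrix can be referenced with a two-dimensional index.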

4.2 NumPy Arrays

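Matrices have a two-dimensional (2-D) index [m,n]. In Course 1, 2-D matrices are used to hold training data. Training data is m examples by n features, creating an (m,n) array. Course 1 does not operate directly on matrices but typically extracts an example as a vector and operates on that.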

4.3 Matrix Creation

The same functions that created 1-D vectors will create 2-D or n-D arrays. Below, the shape tuple is provided to achieve a 2-D result. Notice how NumPy uses brackets to denote each dimension, and that when printing, NumPy prints one row per line.

a = np.zeros((1, 5))

print(f"a shape = {a.shape}, a = {a}")

a = np.zeros((2, 1))

print(f"a shape = {a.shape}, a = {a}")

a = np.random.random_sample((1, 1))

print(f"a shape = {a.shape}, a = {a}")

Output:

a shape = (1, 5), a = [[0. 0. 0. 0. 0.]]

a shape = (2, 1), a = [[0.]

[0.]]

a shape = (1, 1), a = [[0.44236513]]

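One can also specify data manually. Dimensions are specified with additional brackets, matching the format of the printed output above.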

# NumPy routines which allocate memory and fill with user specified values

a = np.array([[5], [4], [3]]); print(f" a shape = {a.shape}, np.array: a = {a}")

a = np.array([[5],   # One can also

              [4],   # separate values

              [3]])  # into separate rows

print(f" a shape = {a.shape}, np.array: a = {a}")

Output:

a shape = (3, 1), np.array: a = [[5]

[4]

[3]]

a shape = (3, 1), np.array: a = [[5]

[4]

[3]]
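How the nesting of brackets determines the shape can be seen by writing the same three values with different bracket structure. This is a small sketch; the names `flat`, `row`, and `col` are illustrative.

```python
import numpy as np

flat = np.array([5, 4, 3])        # one level of brackets   -> 1-D, shape (3,)
row = np.array([[5, 4, 3]])       # two levels, one row     -> 2-D, shape (1, 3)
col = np.array([[5], [4], [3]])   # two levels, three rows  -> 2-D, shape (3, 1)

for name, arr in (("flat", flat), ("row", row), ("col", col)):
    print(f"{name}: ndim = {arr.ndim}, shape = {arr.shape}")
```

The number of bracket levels matches `ndim`, and the number of items at each level gives the corresponding entry of `shape`.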

4.4 Operations on Matrices

Let’s explore some operations using matrices.

4.4.1 Indexing

Matrices include a second index. The two indexes describe [row, column]. Access can return either an element or a row/column. See below:

#vector indexing operations on matrices

a = np.arange(6).reshape(-1, 2) #reshape is a convenient way to create matrices

print(f"a.shape: {a.shape}, \na= {a}")

#access an element

print(f"\na[2,0].shape: {a[2, 0].shape}, a[2,0] = {a[2, 0]}, type(a[2,0]) = {type(a[2, 0])} Accessing an element returns a scalar\n")

#access a row

print(f"a[2].shape: {a[2].shape}, a[2] = {a[2]}, type(a[2]) = {type(a[2])}")

Output:

a.shape: (3, 2),

a= [[0 1]

[2 3]

[4 5]]

a[2,0].shape: (), a[2,0] = 4, type(a[2,0]) = <class 'numpy.int32'> Accessing an element returns a scalar

a[2].shape: (2,), a[2] = [4 5], type(a[2]) = <class 'numpy.ndarray'>

It is worth drawing attention to the last example: accessing a matrix by specifying only the row returns a 1-D vector.

Reshape

The previous example used reshape to shape the array.

a = np.arange(6).reshape(-1, 2)

This line of code first created a 1-D vector of six elements. It then reshaped that vector into a 2-D array using the reshape command. This could have been written:

a = np.arange(6).reshape(3, 2)

to arrive at the same 3-row, 2-column array. The -1 argument tells the routine to compute the number of rows given the size of the array and the number of columns.
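The behavior of the -1 argument can be sketched as follows, including the case where the requested shape does not fit the number of elements. The names `r1`, `r2`, and `r3` are illustrative.

```python
import numpy as np

a = np.arange(6)

r1 = a.reshape(-1, 2)  # rows inferred: 6 elements / 2 columns = 3 rows
r2 = a.reshape(3, 2)   # the same shape written out explicitly
r3 = a.reshape(2, -1)  # -1 can stand for the column count instead

print(r1.shape, r2.shape, r3.shape)
print(np.array_equal(r1, r2))

# 6 elements cannot be split into rows of 4, so NumPy raises ValueError
try:
    a.reshape(-1, 4)
except ValueError as e:
    print(f"reshape failed: {e}")
```

Only one entry of the shape may be -1, since NumPy needs the others to work out the missing one.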

4.4.2 Slicing

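Slicing creates an array of indices using a set of three values (start:stop:step). A subset of the values is also valid. Its use is best explained by example: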

#vector 2-D slicing operations

a = np.arange(20).reshape(-1, 10)

print(f"a = \n{a}")

#access 5 consecutive elements (start:stop:step)

print("a[0, 2:7:1] = ", a[0, 2:7:1], ", a[0, 2:7:1].shape =", a[0, 2:7:1].shape, "a 1-D array")

#access 5 consecutive elements (start:stop:step) in two rows

print("a[:, 2:7:1] = \n", a[:, 2:7:1], ", a[:, 2:7:1].shape =", a[:, 2:7:1].shape, "a 2-D array")

# access all elements

print("a[:,:] = \n", a[:,:], ", a[:,:].shape =", a[:,:].shape)

# access all elements in one row (very common usage)

print("a[1,:] = ", a[1,:], ", a[1,:].shape =", a[1,:].shape, "a 1-D array")

# same as

print("a[1] = ", a[1], ", a[1].shape =", a[1].shape, "a 1-D array")

Output:

a =

[[ 0 1 2 3 4 5 6 7 8 9]

[10 11 12 13 14 15 16 17 18 19]]

a[0, 2:7:1] = [2 3 4 5 6] , a[0, 2:7:1].shape = (5,) a 1-D array

a[:, 2:7:1] =

[[ 2 3 4 5 6]

[12 13 14 15 16]] , a[:, 2:7:1].shape = (2, 5) a 2-D array

a[:,:] =

[[ 0 1 2 3 4 5 6 7 8 9]

[10 11 12 13 14 15 16 17 18 19]] , a[:,:].shape = (2, 10)

a[1,:] = [10 11 12 13 14 15 16 17 18 19] , a[1,:].shape = (10,) a 1-D array

a[1] = [10 11 12 13 14 15 16 17 18 19] , a[1].shape = (10,) a 1-D array
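The examples above select rows. Columns can be selected the same way by placing the slice or index in the second position. The sketch below uses the same 2 x 10 array and shows how an integer index drops a dimension while a slice keeps it; the names `col`, `col2d`, and `sub` are illustrative.

```python
import numpy as np

a = np.arange(20).reshape(-1, 10)

col = a[:, 3]         # integer column index -> 1-D array of that column
col2d = a[:, 3:4]     # one-column slice     -> still a 2-D array
sub = a[0:2, 2:7:2]   # slices in both positions -> a 2-D sub-matrix

print(col, col.shape)
print(col2d.shape)
print(sub, sub.shape)
```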

Congratulations!

In this lab you mastered the features of Python and NumPy that are needed for Course 1.
