6. Numpy

중앙백 2021. 12. 6. 19:50

Numpy

- Numerical Python

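- A high-performance scientific computing package for Python

- The de facto standard for array operations such as matrices and vectors

* Features

- Faster and more memory-efficient than a regular list

- Supports processing arrays of data without loops

- Provides a wide range of linear algebra functionality

- Can be integrated with languages such as C, C++, and Fortran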
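
As a rough illustration of the "no loops" point above, the following minimal sketch doubles every element once with a plain list and once with an ndarray. The variable names are made up for this example.

```python
import numpy as np

values = list(range(5))                  # plain Python list: [0, 1, 2, 3, 4]

# With a list, an explicit loop (here a comprehension) is needed.
doubled_list = [v * 2 for v in values]

# With an ndarray, the operation is applied to every element at once.
arr = np.array(values)
doubled_arr = arr * 2

print(doubled_list)   # [0, 2, 4, 6, 8]
print(doubled_arr)    # [0 2 4 6 8]
```

Note that `values * 2` on the list itself would repeat the list rather than double its elements, which is one reason the ndarray form is convenient for numeric work.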

* Install

activate ml
conda install numpy

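- On Windows, managing packages with conda is recommended (pip install numpy also works)

- If numpy was already installed together with an existing setup (for example, a distribution that ships jupyter), no additional installation is needed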

1. ndarray

import numpy as np

- How to import numpy

- By convention, numpy is imported under the alias np

* array creation

test_array = np.array([1, 4, 5, 8], float)    # float : dtype
print(test_array)
type(test_array[3])

- numpy creates arrays with the np.array function -> ndarray object

- A numpy array can hold only one data type

- The biggest difference from a list: dynamic typing is not supported

- Arrays are backed by C arrays

- shape : returns the dimensions of the numpy array

- dtype : returns the data type of the numpy array

test_array = np.array([1, 4, 5, "8"], float)    # even if a string value is passed in,
print(test_array)
print(type(test_array[3]))    # it is automatically converted to a float type
print(test_array.dtype)    # returns the data type of the whole array
print(test_array.shape)    # returns the shape of the array : (4,)

- Arrays are called by different names depending on their rank

Rank	Name	Example
0	scalar	7
1	vector	[10, 10]
2	matrix	[[10, 10], [15, 15]]
3	3-tensor	[[[1,2],[3,4]],[[5,6],[7,8]]]
n	n-tensor

- ndim - number of dimensions

- size - the number of data items (number of elements)

- nbytes - returns the memory size, in bytes, of the data held by the ndarray object
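
The attributes above can be checked on a small rank-3 array. This is only a sketch: the array `tensor` is a made-up example, and float64 is chosen so that each element takes 8 bytes.

```python
import numpy as np

# A rank-3 array (3-tensor) with 2 x 2 x 2 = 8 elements, 8 bytes each.
tensor = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]], dtype=np.float64)

print(tensor.shape)    # (2, 2, 2)
print(tensor.ndim)     # 3  : number of dimensions
print(tensor.size)     # 8  : number of elements
print(tensor.nbytes)   # 64 : 8 elements * 8 bytes
```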

2. Handling shape

* reshape

- Changes the shape of an array while keeping the number of elements the same. (e.g.) (2,4) -> (8,)

test_matrix = [[1,2,3,4],[1,2,5,8]]
np.array(test_matrix).shape    # (2,4)
np.array(test_matrix).reshape(8,)   # array([1,2,3,4,1,2,5,8])
np.array(test_matrix).reshape(8,).shape    # (8,)
np.array(test_matrix).reshape(-1,2).shape    # (4,2) : -1 infers the number of rows from the size
np.array(test_matrix).reshape(2,2,2)    # array([[[1,2],[3,4]],[[1,2],[5,8]]])

* flatten : converts a multi-dimensional array into a 1-D array

test_matrix = [[[1,2,3,4],[1,2,5,8]],[[1,2,3,4],[1,2,5,8]]]
np.array(test_matrix).flatten().shape    # (16,)

3. Indexing & slicing

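* indexing for numpy array

- Unlike a list, a two-dimensional array supports the [0,0] notation

- For a matrix, the first index is the row and the second is the column

a = np.array([[1,2,3],[4.5,5,6]],int)    # 4.5 is truncated to 4 by the int dtype
print(a)
print(a[0,0])    # 1
print(a[0][0])    # 1 : same element as a[0,0]

a[0,0] = 12    # assignment with [row, column] notation
print(a)
a[0][0] = 5    # assignment with chained indexing
print(a)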

* slicing for numpy array

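- Unlike a list, rows and columns can be sliced separately

- Useful for extracting a subset of a matrix

a = np.array([[1,2,3,4,5],[6,7,8,9,10]],int)
a[:,2:]    # all rows, columns 2 and onward
a[1,1:3]    # row 1, columns 1 to 2
a[1:3]    # rows 1 to 2, all columns (only row 1 exists here)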

4. creation function

* arange : creates an array of values over a specified range

np.arange(30)    # range : same effect as Python's range, integers from 0 to 29
np.arange(0,5,0.5)   # floating-point steps are also possible
np.arange(30).reshape(5,6)

* ones, zeros and empty

- zeros - creates an ndarray filled with 0

- ones - creates an ndarray filled with 1

- empty - creates an ndarray with only the shape given and no values set (memory is not initialized)

np.zeros(shape=(10,),dtype=np.int8)    # np.zeros(shape, dtype, order)
np.zeros((2,5))

np.empty(shape=(10,),dtype=np.int8)
np.empty((3,5))

* something_like : returns an array of ones, zeros, or empty values with the same shape as an existing ndarray

test_matrix = np.arange(30).reshape(5,6)
np.ones_like(test_matrix)
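
The other `*_like` variants behave the same way. The sketch below uses `np.zeros_like` and `np.empty_like` alongside `np.ones_like`; the result names are chosen only for this example.

```python
import numpy as np

test_matrix = np.arange(30).reshape(5, 6)

ones = np.ones_like(test_matrix)     # same shape and dtype, filled with 1
zeros = np.zeros_like(test_matrix)   # same shape and dtype, filled with 0
empty = np.empty_like(test_matrix)   # same shape and dtype, contents are arbitrary

print(ones.shape, zeros.shape, empty.shape)   # (5, 6) (5, 6) (5, 6)
print(ones.dtype == test_matrix.dtype)        # True
```

Since `empty_like` does not initialize memory, its values should be overwritten before they are read.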

* identity : creates an identity matrix (I)

np.identity(n=3, dtype=np.int8)
np.identity(5)

* eye : a matrix with 1s on a diagonal; the starting index of the diagonal can be changed with k

np.eye(3)    # same as the identity matrix
np.eye(3,5,k=2)
np.eye(N=3, M=5, dtype=np.int8)

* diag : extracts the diagonal values of a matrix

matrix = np.arange(9).reshape(3,3)
np.diag(matrix)    # array([0,4,8])

np.diag(matrix, k=1)    # k : start index   array([1,5])

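* random sampling : creates an array by sampling from a data distribution

np.random.uniform(0,1,10).reshape(2,5)    # uniform distribution  # (low, high, number of samples)
np.random.normal(0,1,10).reshape(2,5)    # normal distribution  # (mean, standard deviation, number of samples)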

5. operation functions

* sum : computes the sum of the elements of an ndarray

* axis : the dimension along which an operation function is applied

(e.g.) For shape (3, 4), the 3 is axis=0 and the 4 is axis=1.

test_array = np.arange(1, 13).reshape(3, 4)
test_array.sum(dtype=float)    # 78.0 (np.float was removed in NumPy 1.24; use float or np.float64)
test_array.sum(axis=1)    # array([10, 26, 42])
test_array.sum(axis=0)    # array([15, 18, 21, 24])

* mean & std : returns the mean or standard deviation of the elements of an ndarray

* Many other mathematical functions are also provided (called as np.something)
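
A short sketch of mean, std, and one of the other math functions, reusing the (3, 4) array from the sum example. `np.sqrt` is picked here only as one representative of the np.something family.

```python
import numpy as np

test_array = np.arange(1, 13).reshape(3, 4)

overall_mean = test_array.mean()       # mean over all elements: 6.5
row_means = test_array.mean(axis=1)    # one mean per row: [2.5, 6.5, 10.5]
col_means = test_array.mean(axis=0)    # one mean per column: [5., 6., 7., 8.]
overall_std = test_array.std()         # population standard deviation (ddof=0 by default)

roots = np.sqrt(test_array)            # element-wise square root

print(overall_mean, row_means, col_means)
print(overall_std)
```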

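* concatenate : functions that join (attach) numpy arrays

- vstack : stacks vectors vertically (one on top of another)

- hstack : stacks column vectors horizontally (side by side)

- concatenate : joins numpy arrays along a given axis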
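
The three functions above can be compared on small inputs. This is a minimal sketch with made-up arrays; the shapes in the comments are the point of the example.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
stacked_v = np.vstack((a, b))            # [[1, 2, 3], [4, 5, 6]], shape (2, 3)

col_a = np.array([[1], [2], [3]])        # column vector, shape (3, 1)
col_b = np.array([[4], [5], [6]])
stacked_h = np.hstack((col_a, col_b))    # [[1, 4], [2, 5], [3, 6]], shape (3, 2)

m = np.array([[1, 2], [3, 4]])
row = np.array([[5, 6]])                 # shape (1, 2)
joined_rows = np.concatenate((m, row), axis=0)     # append as a new row, shape (3, 2)
joined_cols = np.concatenate((m, row.T), axis=1)   # append as a new column, shape (2, 3)

print(stacked_v.shape, stacked_h.shape, joined_rows.shape, joined_cols.shape)
```

With concatenate, the arrays must already have the same number of dimensions, which is why `row` is written as a 2-D array here.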

(Note) newaxis : adds one axis

b = np.array([5, 6])
b = b[np.newaxis,:]    # array([[5,6]]) : shape (2,) -> (1,2)

6. array operations

* Operations b/t arrays : numpy supports the basic arithmetic operations between arrays

- Element-wise operations : operations performed when the arrays have the same shape

(e.g.) + , - , * operate only on elements at the same position
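
A small sketch of the element-wise behaviour described above, using one made-up (2, 3) matrix combined with itself.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]], float)

added = m + m         # [[ 2.,  4.,  6.], [ 8., 10., 12.]]
subtracted = m - m    # all zeros
multiplied = m * m    # [[ 1.,  4.,  9.], [16., 25., 36.]] : not a matrix product

print(added)
print(subtracted)
print(multiplied)
```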

* Dot product : the basic matrix operation, performed with the dot function

test_a = np.arange(1,7).reshape(2,3)
test_b = np.arange(7,13).reshape(3,2)
test_a.dot(test_b)    # array([[58, 64],[139,154]])

* transpose : use transpose or the T attribute

test_a = np.arange(1,7).reshape(2,3)
test_a.transpose()   # 혹은 test_a.T

* broadcasting : a feature that supports operations between arrays of different shapes

test_matrix = np.array([[1,2,3],[4,5,6]],float)
scalar = 3
test_matrix + scalar    # array([[4.,5.,6.],[7.,8.,9.]])

- Besides scalar - vector, operations between a vector and a matrix are also supported
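
The vector - matrix case can be sketched as follows. The arrays are made-up examples; the vector has as many elements as the matrix has columns, so it is added to every row.

```python
import numpy as np

test_matrix = np.arange(1, 13).reshape(4, 3)   # shape (4, 3)
test_vector = np.arange(10, 40, 10)            # [10, 20, 30], shape (3,)

# The vector is broadcast across all 4 rows of the matrix.
result = test_matrix + test_vector
print(result)
# [[11 22 33]
#  [14 25 36]
#  [17 28 39]
#  [20 31 42]]
```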

7. comparisons

* All & Any : returns whether all (and) or any (or) of the array's data satisfy a condition

a = np.arange(10)
a < 4    # array([True,True,True,True,False,False,False,False,False,False])
np.any(a>5),np.any(a<0)    # (True, False)
np.all(a>5),np.all(a<10)   # (False, True)

* comparison operation

- When arrays have the same size, numpy returns the element-wise comparison result as Boolean type

test_a = np.array([1,3,0],float)
test_b = np.array([5,2,1],float)
test_a > test_b    # array([False, True, False], dtype=bool)
(test_a > test_b).any()    # True

a = np.array([1,3,0],float)
np.logical_and(a>0,a<3)    # AND condition # array([True,False,False],dtype=bool)

b = np.array([True,False,True],bool)
np.logical_not(b)     # NOT condition  # array([False,True,False],dtype=bool)

c = np.array([False,True,False],bool)
np.logical_or(b,c)    # OR condition # array([True,True,True],dtype=bool)

* np.where

# There are two usages
np.where(a > 0, 3, 2) # where(condition, value if True, value if False)
                      # array([3,3,2]) # 3 where True, 2 where False

a = np.arange(10)
np.where(a>5)    # returns the indices, as a tuple  # (array([6,7,8,9]),)

a = np.array([1,np.nan,np.inf],float)    # np.NaN and np.Inf were removed in NumPy 2.0
np.isnan(a)    # array([False,True,False],dtype=bool) # Not a Number
np.isfinite(a)    # array([True,False,False], dtype=bool)  # is finite number

* argmax & argmin

- Returns the index of the maximum or minimum value in an array

a = np.array([1,2,4,5,8,78,23])
np.argmax(a), np.argmin(a)    # (5, 0)

- Returning along an axis

a = np.array([[1,2,4,7],[9,88,6,45],[9,76,3,4]])
np.argmax(a, axis=1), np.argmin(a, axis=0)    # (array([3,1,1]),array([0,0,2,2]))

- .argsort() : returns the indices that would sort the array
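
A minimal sketch of argsort, reusing the 1-D array from the argmax example. Reversing the result with `[::-1]` is one common way to get a descending order and is an addition for illustration.

```python
import numpy as np

a = np.array([1, 2, 4, 5, 8, 78, 23])

order = a.argsort()                # indices in ascending order of value
print(order)                       # [0 1 2 3 4 6 5]
print(a[order])                    # [ 1  2  4  5  8 23 78]

descending = a.argsort()[::-1]     # reversed: indices in descending order of value
print(descending)                  # [5 6 4 3 2 1 0]
```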

8. boolean & fancy index

* boolean index

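- Extracts the values that satisfy a given condition, as an array

- All of the comparison operation functions can be used as well

test_array = np.array([1,4,0,2,3,8,9,7],float)
test_array > 3 # array([False,True,False,False,False,True,True,True],dtype=bool)
test_array[test_array>3] # extracts only the elements at indices where the condition is True # array([4.,8.,9.,7.])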

* fancy index

- numpy can use an array as index values to extract values

a = np.array([2,4,6,8],float)
b = np.array([0,0,1,3,2,1], int) # must be declared as an integer type
a[b]  # bracket index : extracts the values of a using the values of b as indices
      # array([2., 2., 4., 8., 6., 4.])
a.take(b)  # take function : same effect as bracket index
           # array([2., 2., 4., 8., 6., 4.])

- Also works on matrix-shaped data

a = np.array([[1,4,],[9,16]],float)
b = np.array([0,0,1,1,0],int)
c = np.array([0,1,1,1,1],int)
a[b,c] # uses b as the row indices and c as the column indices
       # array([1.,4.,16.,16.,4.])

9. numpy data i/o

* loadtxt & savetxt : functions for reading and saving text-format data
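
A round trip through a text file might look like the sketch below. The file name `example.csv`, the temporary directory, and the `fmt` and `delimiter` choices are assumptions for illustration, not part of the original notes.

```python
import os
import tempfile

import numpy as np

data = np.arange(6, dtype=float).reshape(2, 3)

# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.csv")         # hypothetical file name
    np.savetxt(path, data, fmt="%.2f", delimiter=",")   # save as comma-separated text
    loaded = np.loadtxt(path, delimiter=",")            # read it back as a float array

print(loaded)
print(np.array_equal(loaded, data))   # True
```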