Tuning Hyperparameters with GridSearchCV

Hyperparameter Tuning in Python with GridSearchCV

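What Is a Hyperparameter?

A hyperparameter is a machine learning setting whose value is chosen before the learning algorithm is trained. Hyperparameters should not be confused with parameters: in machine learning, "parameter" refers to a value that the model learns during training, such as the weights of a linear model.

Types of Hyperparameters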

Learning Rate.
Number of Epochs.
Momentum.
Regularization constant.
Number of branches in a decision tree.
Number of clusters in a clustering algorithm (like k-means).
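To make the difference between hyperparameters and parameters concrete, here is a minimal sketch in plain Python. The function name fit_line and the toy data are made up for illustration. learning_rate and n_epochs are hyperparameters that we choose up front, while w and b are parameters that the training loop learns.

```python
def fit_line(xs, ys, learning_rate, n_epochs):
    """Fit y = w * x + b with plain gradient descent on the mean squared error."""
    w, b = 0.0, 0.0  # parameters: learned during training
    n = len(xs)
    for _ in range(n_epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (illustrative only)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

# Hyperparameters are fixed before training starts
w, b = fit_line(xs, ys, learning_rate=0.05, n_epochs=2000)
print(w, b)  # lands close to w = 2, b = 1

# Same data, different hyperparameters: far fewer epochs, so a worse fit
w_short, b_short = fit_line(xs, ys, learning_rate=0.05, n_epochs=5)
print(w_short, b_short)
```

The data never changes between the two runs. Only the hyperparameters do, and that alone changes which parameters the model ends up with. This is why tuning them matters.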

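What Is GridSearchCV?

GridSearchCV is a class in sklearn's model_selection module. It iterates over a predefined set of hyperparameter values and fits the estimator (model) to the training set for each of them, so you can pick the best values from the ones you listed.

GridSearchCV tries every combination of the values passed in the dictionary and evaluates the model for each combination using cross-validation. As a result, you get a score (accuracy or loss, depending on the scoring setting) for every hyperparameter combination and can choose the one that performs best.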
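The idea can be sketched by hand with cross_val_score. This is a simplified illustration of the concept, not scikit-learn's actual implementation. As assumptions for speed, it uses the small load_digits dataset instead of MNIST and a reduced grid of 6 combinations.

```python
from itertools import product

from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Assumption: load_digits (1,797 small images) stands in for MNIST to keep this fast
X, y = load_digits(return_X_y=True)

# Assumption: a reduced grid, 3 x 2 = 6 combinations
param_grid = {"n_neighbors": [1, 3, 5], "weights": ["uniform", "distance"]}

results = []
keys = sorted(param_grid)
for values in product(*(param_grid[k] for k in keys)):
    params = dict(zip(keys, values))
    # 3-fold cross-validation for this single combination
    scores = cross_val_score(
        KNeighborsClassifier(**params), X, y, cv=3, scoring="accuracy"
    )
    results.append((scores.mean(), params))

# Keep the combination with the highest mean cross-validated accuracy
best_score, best_params = max(results, key=lambda r: r[0])
print(best_params, round(best_score, 4))
```

GridSearchCV wraps this loop and adds conveniences such as refitting the best model on the full training set and storing every score in cv_results_.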

Library


import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV


Loading the MNIST Dataset


mnist = fetch_openml("mnist_784", version=1)
X, y = mnist['data'], mnist['target']
X = X.to_numpy()
y = y.to_numpy()
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]


Training with grid_search.fit(X_train, y_train)


Fitting 3 folds for each of 24 candidates, totalling 72 fits
[CV 1/3; 1/24] START leaf_size=10, n_neighbors=1, weights=uniform...............
[CV 1/3; 1/24] END leaf_size=10, n_neighbors=1, weights=uniform;, score=(train=1.000, test=0.969) total time=  44.6s
[CV 2/3; 1/24] START leaf_size=10, n_neighbors=1, weights=uniform...............
[CV 2/3; 1/24] END leaf_size=10, n_neighbors=1, weights=uniform;, score=(train=1.000, test=0.967) total time=  44.4s
[CV 3/3; 1/24] START leaf_size=10, n_neighbors=1, weights=uniform...............
[CV 3/3; 1/24] END leaf_size=10, n_neighbors=1, weights=uniform;, score=(train=1.000, test=0.967) total time=  44.1s
[CV 1/3; 2/24] START leaf_size=10, n_neighbors=1, weights=distance..............
[CV 1/3; 2/24] END leaf_size=10, n_neighbors=1, weights=distance;, score=(train=1.000, test=0.969) total time=  43.4s
[CV 2/3; 2/24] START leaf_size=10, n_neighbors=1, weights=distance..............
[CV 2/3; 2/24] END leaf_size=10, n_neighbors=1, weights=distance;, score=(train=1.000, test=0.967) total time=  43.4s
[CV 3/3; 2/24] START leaf_size=10, n_neighbors=1, weights=distance..............
[CV 3/3; 2/24] END leaf_size=10, n_neighbors=1, weights=distance;, score=(train=1.000, test=0.967) total time=  43.5s
[CV 1/3; 3/24] START leaf_size=10, n_neighbors=3, weights=uniform...............
[CV 1/3; 3/24] END leaf_size=10, n_neighbors=3, weights=uniform;, score=(train=0.985, test=0.969) total time=  46.8s
[CV 2/3; 3/24] START leaf_size=10, n_neighbors=3, weights=uniform...............
[CV 2/3; 3/24] END leaf_size=10, n_neighbors=3, weights=uniform;, score=(train=0.985, test=0.968) total time=  46.9s
[CV 3/3; 3/24] START leaf_size=10, n_neighbors=3, weights=uniform...............
.......
.......
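The first line of the log ("24 candidates, totalling 72 fits") can be reproduced with the standard library alone. The sketch below enumerates the grid from the full code at the end of this post. Sorting the keys alphabetically is my assumption about how the candidates are ordered, and it matches the order seen in the log.

```python
from itertools import product

# Same grid as in the full code of this post
param_grid = {
    "n_neighbors": [1, 3, 5, 7],
    "weights": ["uniform", "distance"],
    "leaf_size": [10, 30, 50],
}
cv = 3  # number of cross-validation folds

# Assumption: keys sorted alphabetically, with the last key varying fastest
keys = sorted(param_grid)
candidates = [
    dict(zip(keys, values))
    for values in product(*(param_grid[k] for k in keys))
]
total_fits = len(candidates) * cv

print(f"Fitting {cv} folds for each of {len(candidates)} candidates, "
      f"totalling {total_fits} fits")
for params in candidates[:3]:
    print(params)
```

The log shows roughly 44 seconds per fit, so 72 fits run one after another take on the order of 50 minutes. GridSearchCV accepts an n_jobs argument to run fits in parallel if that is too slow.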

Finding the Best Hyperparameters


final_clf = grid_search.best_estimator_
final_clf
 
KNeighborsClassifier(n_neighbors=3, weights='distance')


Once training (grid_search.fit(X_train, y_train)) finishes, you can retrieve the hyperparameters that performed best among the candidates.
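Besides best_estimator_, a fitted GridSearchCV object exposes best_params_, best_score_, and cv_results_. Here is a small sketch of reading them. As assumptions for speed, it uses the load_digits dataset in place of MNIST and a reduced grid.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: load_digits stands in for MNIST so the search finishes in seconds
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Assumption: a reduced grid, 3 x 2 = 6 combinations
param_grid = {"n_neighbors": [1, 3, 5], "weights": ["uniform", "distance"]}
grid_search = GridSearchCV(
    KNeighborsClassifier(), param_grid, cv=3, scoring="accuracy"
)
grid_search.fit(X_train, y_train)

print(grid_search.best_params_)     # dict of the winning hyperparameter values
print(grid_search.best_score_)      # mean cross-validated accuracy of the winner
print(grid_search.best_estimator_)  # the winning model, refit on all of X_train

# Mean cross-validated accuracy for every candidate
cv_results = grid_search.cv_results_
for params, mean_score in zip(cv_results["params"], cv_results["mean_test_score"]):
    print(params, round(mean_score, 4))
```

On this small stand-in dataset the winning values may differ from the ones found in the MNIST run of this post.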

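On MNIST, grid_search.best_estimator_ returned KNeighborsClassifier(n_neighbors=3, weights='distance'), as shown in the output above, so the next time you train you can set the hyperparameters as follows. (leaf_size is missing from the printed result, presumably because the selected value matches the default of 30, and scikit-learn omits default values when printing an estimator.)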


param_grid = [{
    "n_neighbors": [3], "weights": ["distance"]
}]

After all, we now know which hyperparameters produce the best training result.
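Once the best values are known, one option is to skip the grid entirely and build the final classifier from best_params_. The sketch below also checks that model against the one GridSearchCV refits by default (refit=True). It makes the same assumptions as before: load_digits instead of MNIST, and a reduced grid.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: load_digits stands in for MNIST so this runs quickly
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Assumption: a reduced grid, 3 x 2 = 6 combinations
param_grid = {"n_neighbors": [1, 3, 5], "weights": ["uniform", "distance"]}
grid_search = GridSearchCV(
    KNeighborsClassifier(), param_grid, cv=3, scoring="accuracy"
)
grid_search.fit(X_train, y_train)

# Option 1: build a fresh model directly from the winning hyperparameters
final_clf = KNeighborsClassifier(**grid_search.best_params_)
final_clf.fit(X_train, y_train)
manual_acc = final_clf.score(X_test, y_test)

# Option 2: reuse the model GridSearchCV already refit on the whole training set
refit_acc = grid_search.score(X_test, y_test)

print(manual_acc, refit_acc)  # the two accuracies should agree
```

Either way, the held-out test set is touched only once, after the search is done, so the reported accuracy is not influenced by the tuning itself.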
 

Full Code


import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Load MNIST (70,000 images of 28x28 pixels, flattened to 784 features)
mnist = fetch_openml("mnist_784", version=1)
X, y = mnist['data'], mnist['target']
X = X.to_numpy()
y = y.to_numpy()

# The first 60,000 samples are used for training, the remaining 10,000 for testing
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

clf = KNeighborsClassifier()

# 4 x 2 x 3 = 24 hyperparameter combinations to try
param_grid = [{
    "n_neighbors": [1,3,5,7], "weights": ["uniform", "distance"], "leaf_size": [10, 30, 50]
}]

# 3-fold cross-validation scored by accuracy; verbose=10 prints progress for every fit
grid_search = GridSearchCV(clf, param_grid, cv=3, scoring="accuracy", return_train_score=True, verbose=10)

grid_search.fit(X_train, y_train)

혼밥맨
