Breast Cancer Detection Tutorial with Python

by 혼밥맨 2022. 3. 23.

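Before starting the pilot project, our team agreed that the topic should be worth the challenge and should contribute to society. We chose the medical field as our broad theme, reasoning that it would satisfy both criteria. The next question was which specific topic within medicine to pick. As living standards rise, people's desire for and interest in a healthy life keep growing. Despite that, cancer incidence is trending upward because of environmental, genetic, and other factors.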

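Among cancers, breast cancer has a particularly high incidence and ranks first among cancers in women worldwide. While researching, we came across a newspaper article reporting that only 0.6% of patients suspected of having breast cancer are actually diagnosed with it. With a confirmation rate this low relative to the cost of testing, we began to question whether social costs such as health insurance spending are being wasted, and whether health screening is fulfilling its actual purpose of early diagnosis. If the model our team proposes can lower the misclassification rate, we expect it to reduce the cost wasted on screening and to identify breast cancer sooner, which would help with early treatment. The remaining questions are which breast cancer data can be used and whether real data can be obtained.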

Data

https://archive.ics.uci.edu/ml/datasets/breast+cancer

from sklearn.datasets import load_breast_cancer
 
 
data = load_breast_cancer()
print(data)
print(data.keys())

Checking the data
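
Printing the whole object produces a lot of output. As a minimal sketch of a more compact check, the snippet below prints the array shape, the class names, and the number of samples per class. It relies only on the `data`, `target`, and `target_names` entries that `load_breast_cancer()` returns (scikit-learn's bundled copy of the Breast Cancer Wisconsin (Diagnostic) dataset); the variable names are illustrative and not from the original post.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# Number of samples (rows) and features (columns)
n_samples, n_features = data["data"].shape

# Class names in label order: index 0 is label 0, index 1 is label 1
class_names = [str(name) for name in data["target_names"]]

# How many samples carry each label
class_counts = np.bincount(data["target"])

print(n_samples, n_features)
for name, count in zip(class_names, class_counts):
    print(f"{name}: {count}")
```

A quick look at the class balance like this helps when interpreting accuracy later, since a skewed split can make a plain accuracy score look better than it is.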

 

Library

# pip install numpy pandas matplotlib scikit-learn seaborn
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

import random
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

 

 

Full Code

# pip install numpy pandas matplotlib scikit-learn seaborn

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

import random
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset bundled with scikit-learn
data = load_breast_cancer()

X = data['data']      # feature matrix
y = data['target']    # class labels

# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)  # , random_state=10)

# Train a k-nearest-neighbors classifier with default settings
clf = KNeighborsClassifier()
clf.fit(X_train, y_train)

print(clf.score(X_test, y_test))     # accuracy on the test set
print(len(data['feature_names']))    # number of features

# X_new = np.array(random.sample(range(0, 50), 30))
# print(data['target_names'][clf.predict([X_new])[0]])

# Combine the features and the target into a single table
column_data = np.concatenate([data['data'], data['target'][:, None]], axis=1)
# print(column_data)
column_names = np.concatenate([data['feature_names'], ['Class']])

df = pd.DataFrame(column_data, columns=column_names)

print(df.corr())

# Plot the correlation matrix as an annotated heatmap
sns.heatmap(df.corr(), cmap="coolwarm", annot=True, annot_kws={"fontsize": 8})
plt.tight_layout()
plt.show()
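
Because the stated goal is to reduce misclassification, a single accuracy score may not tell the whole story. The following is a minimal sketch, not part of the original post, that prints a confusion matrix and per-class precision and recall. `random_state=10` is taken from the commented-out argument in the code above; `StandardScaler`, `make_pipeline`, and `stratify=y` are assumptions added here for illustration, since KNN is distance-based and the features have very different scales.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X, y = data["data"], data["target"]

# Fixed seed for reproducibility; stratify keeps the class ratio in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=10, stratify=y
)

# Standardize features, then classify with default KNN (assumed setup)
model = make_pipeline(StandardScaler(), KNeighborsClassifier())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true labels, columns are predicted labels (0 = malignant, 1 = benign)
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred, target_names=data["target_names"]))
```

In a screening context, the malignant-class recall is arguably the number to watch, because a missed malignant case is usually costlier than a false alarm.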

 

Result
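
An annotated heatmap over 31 columns can be hard to read at a glance. As a complementary sketch (an addition, not from the original post), the correlation of each feature with the `Class` column can be listed in order of strength. The names `corr_with_class` and `ranked` are illustrative.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# Same table as in the full code: 30 features plus the class label
df = pd.DataFrame(data["data"], columns=data["feature_names"])
df["Class"] = data["target"]

# Pearson correlation of each feature with the label (0 = malignant, 1 = benign)
corr_with_class = df.corr()["Class"].drop("Class")

# Order by absolute correlation, strongest first, keeping the sign
order = corr_with_class.abs().sort_values(ascending=False).index
ranked = corr_with_class.reindex(order)

print(ranked.head(10))
```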

 
