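These are my notes on the code and key points from following along with the course "파이썬을 활용한 이커머스 데이터분석" (E-commerce Data Analysis with Python).
- Source: fast campus
Chapter04. Customer Churn Prediction (KNN)
Purpose of the Analysis
Predict customer churn with the KNN algorithm
- Binary Classification: predict 'Yes' or 'No'
This time we will work with data from a mobile telecommunications carrier.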
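To make the idea of KNN as a binary classifier concrete, here is a minimal sketch in plain Python. It is not the course's code: the function name `knn_predict` and the toy points are made up for illustration and are not taken from churn.csv. Each query is labeled by a majority vote among its k nearest training points.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Rank the training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_points)),
                   key=lambda i: dist(train_points[i], query))
    nearest_labels = [train_labels[i] for i in order[:k]]
    # The most common label among the k neighbors is the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical (tenure in months, monthly charge) pairs, NOT from churn.csv.
toy_points = [(1, 70.0), (2, 80.0), (3, 75.0),
              (60, 20.0), (65, 25.0), (70, 30.0)]
toy_labels = ["Yes", "Yes", "Yes", "No", "No", "No"]

prediction_new = knn_predict(toy_points, toy_labels, (4, 72.0))
prediction_old = knn_predict(toy_points, toy_labels, (68, 22.0))
print(prediction_new, prediction_old)
```

This only illustrates the voting rule. With real data, features on different scales are usually standardized first, since distances would otherwise be dominated by the largest-valued column.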
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
data = pd.read_csv('./data/churn.csv')
data
| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | OnlineBackup | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | Yes | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | No | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.50 | No |
2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | Yes | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | No | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.30 | 1840.75 | No |
4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | No | No | No | No | No | Month-to-month | Yes | Electronic check | 70.70 | 151.65 | Yes |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7038 | 6840-RESVB | Male | 0 | Yes | Yes | 24 | Yes | Yes | DSL | Yes | No | Yes | Yes | Yes | Yes | One year | Yes | Mailed check | 84.80 | 1990.50 | No |
7039 | 2234-XADUH | Female | 0 | Yes | Yes | 72 | Yes | Yes | Fiber optic | No | Yes | Yes | No | Yes | Yes | One year | Yes | Credit card (automatic) | 103.20 | 7362.90 | No |
7040 | 4801-JZAZL | Female | 0 | Yes | Yes | 11 | No | No phone service | DSL | Yes | No | No | No | No | No | Month-to-month | Yes | Electronic check | 29.60 | 346.45 | No |
7041 | 8361-LTMKD | Male | 1 | Yes | No | 4 | Yes | Yes | Fiber optic | No | No | No | No | No | No | Month-to-month | Yes | Mailed check | 74.40 | 306.60 | Yes |
7042 | 3186-AJIEK | Male | 0 | No | No | 66 | Yes | No | Fiber optic | Yes | No | Yes | Yes | Yes | Yes | Two year | Yes | Bank transfer (automatic) | 105.65 | 6844.50 | No |
7043 rows × 21 columns
data.head()
| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | OnlineBackup | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | Yes | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | No | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.50 | No |
2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | Yes | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | No | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.30 | 1840.75 | No |
4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | No | No | No | No | No | Month-to-month | Yes | Electronic check | 70.70 | 151.65 | Yes |
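Because the table is wide, the notebook abbreviated some of the middle columns (around 'OnlineSecurity') as '...'. pandas has a display option that fixes this. Run the following code, which raises the limit on displayed columns to 30.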
pd.set_option('display.max_columns', 30)
data.head()
| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | OnlineBackup | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | Yes | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | No | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.50 | No |
2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | Yes | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | No | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.30 | 1840.75 | No |
4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | No | No | No | No | No | Month-to-month | Yes | Electronic check | 70.70 | 151.65 | Yes |
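The effect of `display.max_columns` can be checked on a small frame without touching the global setting. The sketch below assumes a hypothetical 25-column frame named `wide` (it stands in for churn.csv, which has 21 columns) and uses `pd.option_context`, which restores the previous value when the `with` block ends.

```python
import pandas as pd

# Hypothetical one-row frame with 25 columns, standing in for churn.csv.
wide = pd.DataFrame([[0] * 25], columns=[f"col{i}" for i in range(25)])

# Remember the current limit so we can confirm it is restored afterwards.
limit_before = pd.get_option("display.max_columns")

# With a low limit, the middle columns collapse into '...'.
with pd.option_context("display.max_columns", 4):
    narrow_text = repr(wide)

# With a limit above the column count, every column name is printed.
with pd.option_context("display.max_columns", 30):
    full_text = repr(wide)

limit_after = pd.get_option("display.max_columns")

print(narrow_text)
print(full_text)
```

Unlike `pd.set_option`, which changes the limit for the rest of the session, the context manager keeps the change local to one cell.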
Next is how to view the first 100 rows with none of them omitted.
data.head(100)
| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | OnlineBackup | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | Yes | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | No | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.50 | No |
2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | Yes | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | No | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.30 | 1840.75 | No |
4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | No | No | No | No | No | Month-to-month | Yes | Electronic check | 70.70 | 151.65 | Yes |
5 | 9305-CDSKC | Female | 0 | No | No | 8 | Yes | Yes | Fiber optic | No | No | Yes | No | Yes | Yes | Month-to-month | Yes | Electronic check | 99.65 | 820.50 | Yes |
6 | 1452-KIOVK | Male | 0 | No | Yes | 22 | Yes | Yes | Fiber optic | No | Yes | No | No | Yes | No | Month-to-month | Yes | Credit card (automatic) | 89.10 | 1949.40 | No |
7 | 6713-OKOMC | Female | 0 | No | No | 10 | No | No phone service | DSL | Yes | No | No | No | No | No | Month-to-month | No | Mailed check | 29.75 | 301.90 | No |
8 | 7892-POOKP | Female | 0 | Yes | No | 28 | Yes | Yes | Fiber optic | No | No | Yes | Yes | Yes | Yes | Month-to-month | Yes | Electronic check | 104.80 | 3046.05 | Yes |
9 | 6388-TABGU | Male | 0 | No | Yes | 62 | Yes | No | DSL | Yes | Yes | No | No | No | No | One year | No | Bank transfer (automatic) | 56.15 | 3487.95 | No |
10 | 9763-GRSKD | Male | 0 | Yes | Yes | 13 | Yes | No | DSL | Yes | No | No | No | No | No | Month-to-month | Yes | Mailed check | 49.95 | 587.45 | No |
11 | 7469-LKBCI | Male | 0 | No | No | 16 | Yes | No | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | Two year | No | Credit card (automatic) | 18.95 | 326.80 | No |
12 | 8091-TTVAX | Male | 0 | Yes | No | 58 | Yes | Yes | Fiber optic | No | No | Yes | No | Yes | Yes | One year | No | Credit card (automatic) | 100.35 | 5681.10 | No |
13 | 0280-XJGEX | Male | 0 | No | No | 49 | Yes | Yes | Fiber optic | No | Yes | Yes | No | Yes | Yes | Month-to-month | Yes | Bank transfer (automatic) | 103.70 | 5036.30 | Yes |
14 | 5129-JLPIS | Male | 0 | No | No | 25 | Yes | No | Fiber optic | Yes | No | Yes | Yes | Yes | Yes | Month-to-month | Yes | Electronic check | 105.50 | 2686.05 | No |
15 | 3655-SNQYZ | Female | 0 | Yes | Yes | 69 | Yes | Yes | Fiber optic | Yes | Yes | Yes | Yes | Yes | Yes | Two year | No | Credit card (automatic) | 113.25 | 7895.15 | No |
16 | 8191-XWSZG | Female | 0 | No | No | 52 | Yes | No | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | One year | No | Mailed check | 20.65 | 1022.95 | No |
17 | 9959-WOFKT | Male | 0 | No | Yes | 71 | Yes | Yes | Fiber optic | Yes | No | Yes | No | Yes | Yes | Two year | No | Bank transfer (automatic) | 106.70 | 7382.25 | No |
18 | 4190-MFLUW | Female | 0 | Yes | Yes | 10 | Yes | No | DSL | No | No | Yes | Yes | No | No | Month-to-month | No | Credit card (automatic) | 55.20 | 528.35 | Yes |
19 | 4183-MYFRB | Female | 0 | No | No | 21 | Yes | No | Fiber optic | No | Yes | Yes | No | No | Yes | Month-to-month | Yes | Electronic check | 90.05 | 1862.90 | No |
20 | 8779-QRDMV | Male | 1 | No | No | 1 | No | No phone service | DSL | No | No | Yes | No | No | Yes | Month-to-month | Yes | Electronic check | 39.65 | 39.65 | Yes |
21 | 1680-VDCWW | Male | 0 | Yes | No | 12 | Yes | No | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | One year | No | Bank transfer (automatic) | 19.80 | 202.25 | No |
22 | 1066-JKSGK | Male | 0 | No | No | 1 | Yes | No | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | Month-to-month | No | Mailed check | 20.15 | 20.15 | Yes |
23 | 3638-WEABW | Female | 0 | Yes | No | 58 | Yes | Yes | DSL | No | Yes | No | Yes | No | No | Two year | Yes | Credit card (automatic) | 59.90 | 3505.10 | No |
24 | 6322-HRPFA | Male | 0 | Yes | Yes | 49 | Yes | No | DSL | Yes | Yes | No | Yes | No | No | Month-to-month | No | Credit card (automatic) | 59.60 | 2970.30 | No |
25 | 6865-JZNKO | Female | 0 | No | No | 30 | Yes | No | DSL | Yes | Yes | No | No | No | No | Month-to-month | Yes | Bank transfer (automatic) | 55.30 | 1530.60 | No |
26 | 6467-CHFZW | Male | 0 | Yes | Yes | 47 | Yes | Yes | Fiber optic | No | Yes | No | No | Yes | Yes | Month-to-month | Yes | Electronic check | 99.35 | 4749.15 | Yes |
27 | 8665-UTDHZ | Male | 0 | Yes | Yes | 1 | No | No phone service | DSL | No | Yes | No | No | No | No | Month-to-month | No | Electronic check | 30.20 | 30.20 | Yes |
28 | 5248-YGIJN | Male | 0 | Yes | No | 72 | Yes | Yes | DSL | Yes | Yes | Yes | Yes | Yes | Yes | Two year | Yes | Credit card (automatic) | 90.25 | 6369.45 | No |
29 | 8773-HHUOZ | Female | 0 | No | Yes | 17 | Yes | No | DSL | No | No | No | No | Yes | Yes | Month-to-month | Yes | Mailed check | 64.70 | 1093.10 | Yes |
30 | 3841-NFECX | Female | 1 | Yes | No | 71 | Yes | Yes | Fiber optic | Yes | Yes | Yes | Yes | No | No | Two year | Yes | Credit card (automatic) | 96.35 | 6766.95 | No |
31 | 4929-XIHVW | Male | 1 | Yes | No | 2 | Yes | No | Fiber optic | No | No | Yes | No | Yes | Yes | Month-to-month | Yes | Credit card (automatic) | 95.50 | 181.65 | No |
32 | 6827-IEAUQ | Female | 0 | Yes | Yes | 27 | Yes | No | DSL | Yes | Yes | Yes | Yes | No | No | One year | No | Mailed check | 66.15 | 1874.45 | No |
33 | 7310-EGVHZ | Male | 0 | No | No | 1 | Yes | No | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | Month-to-month | No | Bank transfer (automatic) | 20.20 | 20.20 | No |
34 | 3413-BMNZE | Male | 1 | No | No | 1 | Yes | No | DSL | No | No | No | No | No | No | Month-to-month | No | Bank transfer (automatic) | 45.25 | 45.25 | No |
35 | 6234-RAAPL | Female | 0 | Yes | Yes | 72 | Yes | Yes | Fiber optic | Yes | Yes | No | Yes | Yes | No | Two year | No | Bank transfer (automatic) | 99.90 | 7251.70 | No |
36 | 6047-YHPVI | Male | 0 | No | No | 5 | Yes | No | Fiber optic | No | No | No | No | No | No | Month-to-month | Yes | Electronic check | 69.70 | 316.90 | Yes |
37 | 6572-ADKRS | Female | 0 | No | No | 46 | Yes | No | Fiber optic | No | No | Yes | No | No | No | Month-to-month | Yes | Credit card (automatic) | 74.80 | 3548.30 | No |
38 | 5380-WJKOV | Male | 0 | No | No | 34 | Yes | Yes | Fiber optic | No | Yes | Yes | No | Yes | Yes | Month-to-month | Yes | Electronic check | 106.35 | 3549.25 | Yes |
39 | 8168-UQWWF | Female | 0 | No | No | 11 | Yes | Yes | Fiber optic | No | No | Yes | No | Yes | Yes | Month-to-month | Yes | Bank transfer (automatic) | 97.85 | 1105.40 | Yes |
40 | 8865-TNMNX | Male | 0 | Yes | Yes | 10 | Yes | No | DSL | No | Yes | No | No | No | No | One year | No | Mailed check | 49.55 | 475.70 | No |
41 | 9489-DEDVP | Female | 0 | Yes | Yes | 70 | Yes | Yes | DSL | Yes | Yes | No | No | Yes | No | Two year | Yes | Credit card (automatic) | 69.20 | 4872.35 | No |
42 | 9867-JCZSP | Female | 0 | Yes | Yes | 17 | Yes | No | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | One year | No | Mailed check | 20.75 | 418.25 | No |
43 | 4671-VJLCL | Female | 0 | No | No | 63 | Yes | Yes | DSL | Yes | Yes | Yes | Yes | Yes | No | Two year | Yes | Credit card (automatic) | 79.85 | 4861.45 | No |
44 | 4080-IIARD | Female | 0 | Yes | No | 13 | Yes | Yes | DSL | Yes | Yes | No | Yes | Yes | No | Month-to-month | Yes | Electronic check | 76.20 | 981.45 | No |
45 | 3714-NTNFO | Female | 0 | No | No | 49 | Yes | Yes | Fiber optic | No | No | No | No | No | Yes | Month-to-month | Yes | Electronic check | 84.50 | 3906.70 | No |
46 | 5948-UJZLF | Male | 0 | No | No | 2 | Yes | No | DSL | No | Yes | No | No | No | No | Month-to-month | No | Mailed check | 49.25 | 97.00 | No |
47 | 7760-OYPDY | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | No | No | No | Yes | No | Month-to-month | Yes | Electronic check | 80.65 | 144.15 | Yes |
48 | 7639-LIAYI | Male | 0 | No | No | 52 | Yes | Yes | DSL | Yes | No | No | Yes | Yes | Yes | Two year | Yes | Credit card (automatic) | 79.75 | 4217.80 | No |
49 | 2954-PIBKO | Female | 0 | Yes | Yes | 69 | Yes | Yes | DSL | Yes | No | Yes | Yes | No | No | Two year | Yes | Credit card (automatic) | 64.15 | 4254.10 | No |
50 | 8012-SOUDQ | Female | 1 | No | No | 43 | Yes | Yes | Fiber optic | No | Yes | No | No | Yes | No | Month-to-month | Yes | Electronic check | 90.25 | 3838.75 | No |
51 | 9420-LOJKX | Female | 0 | No | No | 15 | Yes | No | Fiber optic | Yes | Yes | No | No | Yes | Yes | Month-to-month | Yes | Credit card (automatic) | 99.10 | 1426.40 | Yes |
52 | 6575-SUVOI | Female | 1 | Yes | No | 25 | Yes | Yes | DSL | Yes | No | No | Yes | Yes | No | Month-to-month | Yes | Credit card (automatic) | 69.50 | 1752.65 | No |
53 | 7495-OOKFY | Female | 1 | Yes | No | 8 | Yes | Yes | Fiber optic | No | Yes | No | No | No | No | Month-to-month | Yes | Credit card (automatic) | 80.65 | 633.30 | Yes |
54 | 4667-QONEA | Female | 1 | Yes | Yes | 60 | Yes | No | DSL | Yes | Yes | Yes | Yes | No | Yes | One year | Yes | Credit card (automatic) | 74.85 | 4456.35 | No |
55 | 1658-BYGOY | Male | 1 | No | No | 18 | Yes | Yes | Fiber optic | No | No | No | No | Yes | Yes | Month-to-month | Yes | Electronic check | 95.45 | 1752.55 | Yes |
56 | 8769-KKTPH | Female | 0 | Yes | Yes | 63 | Yes | Yes | Fiber optic | Yes | No | No | No | Yes | Yes | One year | Yes | Credit card (automatic) | 99.65 | 6311.20 | No |
57 | 5067-XJQFU | Male | 1 | Yes | Yes | 66 | Yes | Yes | Fiber optic | No | Yes | Yes | Yes | Yes | Yes | One year | Yes | Electronic check | 108.45 | 7076.35 | No |
58 | 3957-SQXML | Female | 0 | Yes | Yes | 34 | Yes | Yes | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | Two year | No | Credit card (automatic) | 24.95 | 894.30 | No |
59 | 5954-BDFSG | Female | 0 | No | No | 72 | Yes | Yes | Fiber optic | No | No | Yes | Yes | Yes | Yes | Two year | Yes | Credit card (automatic) | 107.50 | 7853.70 | No |
60 | 0434-CSFON | Female | 0 | Yes | No | 47 | Yes | Yes | Fiber optic | No | No | Yes | No | Yes | Yes | Month-to-month | Yes | Electronic check | 100.50 | 4707.10 | No |
61 | 1215-FIGMP | Male | 0 | No | No | 60 | Yes | Yes | Fiber optic | No | Yes | No | No | Yes | No | Month-to-month | Yes | Bank transfer (automatic) | 89.90 | 5450.70 | No |
62 | 0526-SXDJP | Male | 0 | Yes | No | 72 | No | No phone service | DSL | Yes | Yes | Yes | No | No | No | Two year | No | Bank transfer (automatic) | 42.10 | 2962.00 | No |
63 | 0557-ASKVU | Female | 0 | Yes | Yes | 18 | Yes | No | DSL | No | No | Yes | Yes | No | No | One year | Yes | Credit card (automatic) | 54.40 | 957.10 | No |
64 | 5698-BQJOH | Female | 0 | No | No | 9 | Yes | Yes | Fiber optic | No | No | No | No | Yes | Yes | Month-to-month | No | Electronic check | 94.40 | 857.25 | Yes |
65 | 5122-CYFXA | Female | 0 | No | No | 3 | Yes | No | DSL | No | Yes | No | Yes | Yes | Yes | Month-to-month | Yes | Electronic check | 75.30 | 244.10 | No |
66 | 8627-ZYGSZ | Male | 0 | Yes | No | 47 | Yes | Yes | Fiber optic | No | Yes | No | No | No | No | One year | Yes | Electronic check | 78.90 | 3650.35 | No |
67 | 3410-YOQBQ | Female | 0 | No | No | 31 | Yes | No | DSL | No | Yes | Yes | Yes | Yes | Yes | Two year | No | Mailed check | 79.20 | 2497.20 | No |
68 | 3170-NMYVV | Female | 0 | Yes | Yes | 50 | Yes | No | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | Two year | No | Bank transfer (automatic) | 20.15 | 930.90 | No |
69 | 7410-OIEDU | Male | 0 | No | No | 10 | Yes | No | Fiber optic | Yes | No | Yes | No | No | No | Month-to-month | Yes | Mailed check | 79.85 | 887.35 | No |
70 | 2273-QCKXA | Male | 0 | No | No | 1 | Yes | No | DSL | No | No | No | Yes | No | No | Month-to-month | No | Mailed check | 49.05 | 49.05 | No |
71 | 0731-EBJQB | Female | 0 | Yes | Yes | 52 | Yes | No | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | One year | Yes | Electronic check | 20.40 | 1090.65 | No |
72 | 1891-QRQSA | Male | 1 | Yes | Yes | 64 | Yes | Yes | Fiber optic | Yes | No | Yes | Yes | Yes | Yes | Two year | Yes | Bank transfer (automatic) | 111.60 | 7099.00 | No |
73 | 8028-PNXHQ | Male | 0 | Yes | Yes | 62 | Yes | Yes | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | Two year | Yes | Bank transfer (automatic) | 24.25 | 1424.60 | No |
74 | 5630-AHZIL | Female | 0 | No | Yes | 3 | Yes | No | DSL | Yes | No | No | Yes | No | Yes | Month-to-month | Yes | Bank transfer (automatic) | 64.50 | 177.40 | No |
75 | 2673-CXQEU | Female | 1 | No | No | 56 | Yes | Yes | Fiber optic | Yes | Yes | Yes | No | Yes | Yes | One year | No | Electronic check | 110.50 | 6139.50 | No |
76 | 6416-JNVRK | Female | 0 | No | No | 46 | Yes | No | DSL | No | No | No | No | No | Yes | One year | No | Credit card (automatic) | 55.65 | 2688.85 | No |
77 | 5590-ZSKRV | Female | 0 | Yes | Yes | 8 | Yes | No | DSL | Yes | Yes | No | No | No | No | Month-to-month | No | Mailed check | 54.65 | 482.25 | No |
78 | 0191-ZHSKZ | Male | 1 | No | No | 30 | Yes | No | DSL | Yes | Yes | No | No | Yes | Yes | Month-to-month | Yes | Electronic check | 74.75 | 2111.30 | No |
79 | 3887-PBQAO | Female | 0 | Yes | Yes | 45 | Yes | Yes | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | One year | Yes | Credit card (automatic) | 25.90 | 1216.60 | No |
80 | 5919-TMRGD | Female | 0 | No | Yes | 1 | Yes | No | Fiber optic | No | No | No | No | Yes | No | Month-to-month | Yes | Electronic check | 79.35 | 79.35 | Yes |
81 | 8108-UXRQN | Female | 0 | Yes | Yes | 11 | No | No phone service | DSL | Yes | No | No | No | Yes | Yes | Month-to-month | No | Electronic check | 50.55 | 565.35 | No |
82 | 9191-MYQKX | Female | 0 | Yes | No | 7 | Yes | No | Fiber optic | No | No | Yes | No | No | No | Month-to-month | Yes | Bank transfer (automatic) | 75.15 | 496.90 | Yes |
83 | 9919-YLNNG | Female | 0 | No | No | 42 | Yes | No | Fiber optic | No | Yes | Yes | Yes | Yes | Yes | Month-to-month | Yes | Bank transfer (automatic) | 103.80 | 4327.50 | No |
84 | 0318-ZOPWS | Female | 0 | Yes | No | 49 | Yes | No | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | Two year | Yes | Bank transfer (automatic) | 20.15 | 973.35 | No |
85 | 4445-ZJNMU | Male | 0 | No | No | 9 | Yes | Yes | Fiber optic | No | Yes | No | No | Yes | Yes | Month-to-month | Yes | Credit card (automatic) | 99.30 | 918.75 | No |
86 | 4808-YNLEU | Female | 0 | Yes | No | 35 | Yes | No | DSL | Yes | No | No | No | Yes | No | One year | Yes | Bank transfer (automatic) | 62.15 | 2215.45 | No |
87 | 1862-QRWPE | Female | 0 | Yes | Yes | 48 | Yes | No | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | Two year | No | Bank transfer (automatic) | 20.65 | 1057.00 | No |
88 | 2796-NNUFI | Female | 0 | Yes | Yes | 46 | Yes | No | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | Two year | Yes | Mailed check | 19.95 | 927.10 | No |
89 | 3016-KSVCP | Male | 0 | Yes | No | 29 | No | No phone service | DSL | No | No | No | No | Yes | No | Month-to-month | No | Mailed check | 33.75 | 1009.25 | No |
90 | 4767-HZZHQ | Male | 0 | Yes | Yes | 30 | Yes | No | Fiber optic | No | Yes | Yes | No | No | No | Month-to-month | No | Bank transfer (automatic) | 82.05 | 2570.20 | No |
91 | 2424-WVHPL | Male | 1 | No | No | 1 | Yes | No | Fiber optic | No | No | No | Yes | No | No | Month-to-month | No | Electronic check | 74.70 | 74.70 | No |
92 | 7233-PAHHL | Male | 0 | Yes | Yes | 66 | Yes | Yes | DSL | Yes | No | Yes | Yes | Yes | Yes | Two year | Yes | Mailed check | 84.00 | 5714.25 | No |
93 | 6067-NGCEU | Female | 0 | No | No | 65 | Yes | Yes | Fiber optic | Yes | Yes | Yes | No | Yes | Yes | Month-to-month | Yes | Credit card (automatic) | 111.05 | 7107.00 | No |
94 | 9848-JQJTX | Male | 0 | No | No | 72 | Yes | Yes | Fiber optic | No | Yes | Yes | No | Yes | Yes | Two year | Yes | Bank transfer (automatic) | 100.90 | 7459.05 | No |
95 | 8637-XJIVR | Female | 0 | No | No | 12 | Yes | Yes | Fiber optic | Yes | No | No | No | No | No | Month-to-month | Yes | Electronic check | 78.95 | 927.35 | Yes |
96 | 9803-FTJCG | Male | 0 | Yes | Yes | 71 | Yes | Yes | DSL | Yes | Yes | No | Yes | No | No | One year | Yes | Credit card (automatic) | 66.85 | 4748.70 | No |
97 | 0278-YXOOG | Male | 0 | No | No | 5 | Yes | No | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | Month-to-month | No | Mailed check | 21.05 | 113.85 | Yes |
98 | 3212-KXOCR | Male | 0 | No | No | 52 | Yes | No | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | Two year | No | Bank transfer (automatic) | 21.00 | 1107.20 | No |
99 | 4598-XLKNJ | Female | 1 | Yes | No | 25 | Yes | No | Fiber optic | No | Yes | Yes | No | Yes | Yes | Month-to-month | Yes | Electronic check | 98.50 | 2514.50 | Yes |
With the default settings, rows 5 through 94 are collapsed into '...'. Showing them works the same way as for the columns above: just change the option from columns to rows.
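The row limit can be tried out the same way on a small example first. This is a sketch under assumptions: `long_frame` is a hypothetical 100-row frame, not the churn data, and `pd.option_context` is used so the global setting is left alone.

```python
import pandas as pd

# Hypothetical 100-row frame, standing in for data.head(100).
long_frame = pd.DataFrame({"n": range(100)})

# A limit below the row count: only the head and tail rows are printed,
# followed by a "[100 rows x 1 columns]" summary line.
with pd.option_context("display.max_rows", 10):
    short_text = repr(long_frame)

# A limit equal to the row count: all 100 rows are printed.
with pd.option_context("display.max_rows", 100):
    full_text = repr(long_frame)

print(len(short_text.splitlines()), len(full_text.splitlines()))
```

If the limit is changed globally with `pd.set_option`, `pd.reset_option("display.max_rows")` puts the default back.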
pd.set_option('display.max_rows', 100)
data.head(100)
| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | OnlineBackup | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | Yes | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | No | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.50 | No |
2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | Yes | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | No | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.30 | 1840.75 | No |
4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | No | No | No | No | No | Month-to-month | Yes | Electronic check | 70.70 | 151.65 | Yes |
5 | 9305-CDSKC | Female | 0 | No | No | 8 | Yes | Yes | Fiber optic | No | No | Yes | No | Yes | Yes | Month-to-month | Yes | Electronic check | 99.65 | 820.50 | Yes |
6 | 1452-KIOVK | Male | 0 | No | Yes | 22 | Yes | Yes | Fiber optic | No | Yes | No | No | Yes | No | Month-to-month | Yes | Credit card (automatic) | 89.10 | 1949.40 | No |
7 | 6713-OKOMC | Female | 0 | No | No | 10 | No | No phone service | DSL | Yes | No | No | No | No | No | Month-to-month | No | Mailed check | 29.75 | 301.90 | No |
8 | 7892-POOKP | Female | 0 | Yes | No | 28 | Yes | Yes | Fiber optic | No | No | Yes | Yes | Yes | Yes | Month-to-month | Yes | Electronic check | 104.80 | 3046.05 | Yes |
9 | 6388-TABGU | Male | 0 | No | Yes | 62 | Yes | No | DSL | Yes | Yes | No | No | No | No | One year | No | Bank transfer (automatic) | 56.15 | 3487.95 | No |
10 | 9763-GRSKD | Male | 0 | Yes | Yes | 13 | Yes | No | DSL | Yes | No | No | No | No | No | Month-to-month | Yes | Mailed check | 49.95 | 587.45 | No |
11 | 7469-LKBCI | Male | 0 | No | No | 16 | Yes | No | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | Two year | No | Credit card (automatic) | 18.95 | 326.80 | No |
12 | 8091-TTVAX | Male | 0 | Yes | No | 58 | Yes | Yes | Fiber optic | No | No | Yes | No | Yes | Yes | One year | No | Credit card (automatic) | 100.35 | 5681.10 | No |
13 | 0280-XJGEX | Male | 0 | No | No | 49 | Yes | Yes | Fiber optic | No | Yes | Yes | No | Yes | Yes | Month-to-month | Yes | Bank transfer (automatic) | 103.70 | 5036.30 | Yes |
14 | 5129-JLPIS | Male | 0 | No | No | 25 | Yes | No | Fiber optic | Yes | No | Yes | Yes | Yes | Yes | Month-to-month | Yes | Electronic check | 105.50 | 2686.05 | No |
15 | 3655-SNQYZ | Female | 0 | Yes | Yes | 69 | Yes | Yes | Fiber optic | Yes | Yes | Yes | Yes | Yes | Yes | Two year | No | Credit card (automatic) | 113.25 | 7895.15 | No |
16 | 8191-XWSZG | Female | 0 | No | No | 52 | Yes | No | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | One year | No | Mailed check | 20.65 | 1022.95 | No |
17 | 9959-WOFKT | Male | 0 | No | Yes | 71 | Yes | Yes | Fiber optic | Yes | No | Yes | No | Yes | Yes | Two year | No | Bank transfer (automatic) | 106.70 | 7382.25 | No |
18 | 4190-MFLUW | Female | 0 | Yes | Yes | 10 | Yes | No | DSL | No | No | Yes | Yes | No | No | Month-to-month | No | Credit card (automatic) | 55.20 | 528.35 | Yes |
19 | 4183-MYFRB | Female | 0 | No | No | 21 | Yes | No | Fiber optic | No | Yes | Yes | No | No | Yes | Month-to-month | Yes | Electronic check | 90.05 | 1862.90 | No |
20 | 8779-QRDMV | Male | 1 | No | No | 1 | No | No phone service | DSL | No | No | Yes | No | No | Yes | Month-to-month | Yes | Electronic check | 39.65 | 39.65 | Yes |
21 | 1680-VDCWW | Male | 0 | Yes | No | 12 | Yes | No | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | One year | No | Bank transfer (automatic) | 19.80 | 202.25 | No |
22 | 1066-JKSGK | Male | 0 | No | No | 1 | Yes | No | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | Month-to-month | No | Mailed check | 20.15 | 20.15 | Yes |
23 | 3638-WEABW | Female | 0 | Yes | No | 58 | Yes | Yes | DSL | No | Yes | No | Yes | No | No | Two year | Yes | Credit card (automatic) | 59.90 | 3505.10 | No |
24 | 6322-HRPFA | Male | 0 | Yes | Yes | 49 | Yes | No | DSL | Yes | Yes | No | Yes | No | No | Month-to-month | No | Credit card (automatic) | 59.60 | 2970.30 | No |
25 | 6865-JZNKO | Female | 0 | No | No | 30 | Yes | No | DSL | Yes | Yes | No | No | No | No | Month-to-month | Yes | Bank transfer (automatic) | 55.30 | 1530.60 | No |
26 | 6467-CHFZW | Male | 0 | Yes | Yes | 47 | Yes | Yes | Fiber optic | No | Yes | No | No | Yes | Yes | Month-to-month | Yes | Electronic check | 99.35 | 4749.15 | Yes |
27 | 8665-UTDHZ | Male | 0 | Yes | Yes | 1 | No | No phone service | DSL | No | Yes | No | No | No | No | Month-to-month | No | Electronic check | 30.20 | 30.20 | Yes |
28 | 5248-YGIJN | Male | 0 | Yes | No | 72 | Yes | Yes | DSL | Yes | Yes | Yes | Yes | Yes | Yes | Two year | Yes | Credit card (automatic) | 90.25 | 6369.45 | No |
29 | 8773-HHUOZ | Female | 0 | No | Yes | 17 | Yes | No | DSL | No | No | No | No | Yes | Yes | Month-to-month | Yes | Mailed check | 64.70 | 1093.10 | Yes |
30 | 3841-NFECX | Female | 1 | Yes | No | 71 | Yes | Yes | Fiber optic | Yes | Yes | Yes | Yes | No | No | Two year | Yes | Credit card (automatic) | 96.35 | 6766.95 | No |
31 | 4929-XIHVW | Male | 1 | Yes | No | 2 | Yes | No | Fiber optic | No | No | Yes | No | Yes | Yes | Month-to-month | Yes | Credit card (automatic) | 95.50 | 181.65 | No |
32 | 6827-IEAUQ | Female | 0 | Yes | Yes | 27 | Yes | No | DSL | Yes | Yes | Yes | Yes | No | No | One year | No | Mailed check | 66.15 | 1874.45 | No |
33 | 7310-EGVHZ | Male | 0 | No | No | 1 | Yes | No | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | Month-to-month | No | Bank transfer (automatic) | 20.20 | 20.20 | No |
34 | 3413-BMNZE | Male | 1 | No | No | 1 | Yes | No | DSL | No | No | No | No | No | No | Month-to-month | No | Bank transfer (automatic) | 45.25 | 45.25 | No |
35 | 6234-RAAPL | Female | 0 | Yes | Yes | 72 | Yes | Yes | Fiber optic | Yes | Yes | No | Yes | Yes | No | Two year | No | Bank transfer (automatic) | 99.90 | 7251.70 | No |
36 | 6047-YHPVI | Male | 0 | No | No | 5 | Yes | No | Fiber optic | No | No | No | No | No | No | Month-to-month | Yes | Electronic check | 69.70 | 316.90 | Yes |
37 | 6572-ADKRS | Female | 0 | No | No | 46 | Yes | No | Fiber optic | No | No | Yes | No | No | No | Month-to-month | Yes | Credit card (automatic) | 74.80 | 3548.30 | No |
38 | 5380-WJKOV | Male | 0 | No | No | 34 | Yes | Yes | Fiber optic | No | Yes | Yes | No | Yes | Yes | Month-to-month | Yes | Electronic check | 106.35 | 3549.25 | Yes |
39 | 8168-UQWWF | Female | 0 | No | No | 11 | Yes | Yes | Fiber optic | No | No | Yes | No | Yes | Yes | Month-to-month | Yes | Bank transfer (automatic) | 97.85 | 1105.40 | Yes |
40 | 8865-TNMNX | Male | 0 | Yes | Yes | 10 | Yes | No | DSL | No | Yes | No | No | No | No | One year | No | Mailed check | 49.55 | 475.70 | No |
41 | 9489-DEDVP | Female | 0 | Yes | Yes | 70 | Yes | Yes | DSL | Yes | Yes | No | No | Yes | No | Two year | Yes | Credit card (automatic) | 69.20 | 4872.35 | No |
42 | 9867-JCZSP | Female | 0 | Yes | Yes | 17 | Yes | No | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | One year | No | Mailed check | 20.75 | 418.25 | No |
43 | 4671-VJLCL | Female | 0 | No | No | 63 | Yes | Yes | DSL | Yes | Yes | Yes | Yes | Yes | No | Two year | Yes | Credit card (automatic) | 79.85 | 4861.45 | No |
44 | 4080-IIARD | Female | 0 | Yes | No | 13 | Yes | Yes | DSL | Yes | Yes | No | Yes | Yes | No | Month-to-month | Yes | Electronic check | 76.20 | 981.45 | No |
45 | 3714-NTNFO | Female | 0 | No | No | 49 | Yes | Yes | Fiber optic | No | No | No | No | No | Yes | Month-to-month | Yes | Electronic check | 84.50 | 3906.70 | No |
46 | 5948-UJZLF | Male | 0 | No | No | 2 | Yes | No | DSL | No | Yes | No | No | No | No | Month-to-month | No | Mailed check | 49.25 | 97.00 | No |
47 | 7760-OYPDY | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | No | No | No | Yes | No | Month-to-month | Yes | Electronic check | 80.65 | 144.15 | Yes |
48 | 7639-LIAYI | Male | 0 | No | No | 52 | Yes | Yes | DSL | Yes | No | No | Yes | Yes | Yes | Two year | Yes | Credit card (automatic) | 79.75 | 4217.80 | No |
49 | 2954-PIBKO | Female | 0 | Yes | Yes | 69 | Yes | Yes | DSL | Yes | No | Yes | Yes | No | No | Two year | Yes | Credit card (automatic) | 64.15 | 4254.10 | No |
50 | 8012-SOUDQ | Female | 1 | No | No | 43 | Yes | Yes | Fiber optic | No | Yes | No | No | Yes | No | Month-to-month | Yes | Electronic check | 90.25 | 3838.75 | No |
51 | 9420-LOJKX | Female | 0 | No | No | 15 | Yes | No | Fiber optic | Yes | Yes | No | No | Yes | Yes | Month-to-month | Yes | Credit card (automatic) | 99.10 | 1426.40 | Yes |
52 | 6575-SUVOI | Female | 1 | Yes | No | 25 | Yes | Yes | DSL | Yes | No | No | Yes | Yes | No | Month-to-month | Yes | Credit card (automatic) | 69.50 | 1752.65 | No |
53 | 7495-OOKFY | Female | 1 | Yes | No | 8 | Yes | Yes | Fiber optic | No | Yes | No | No | No | No | Month-to-month | Yes | Credit card (automatic) | 80.65 | 633.30 | Yes |
54 | 4667-QONEA | Female | 1 | Yes | Yes | 60 | Yes | No | DSL | Yes | Yes | Yes | Yes | No | Yes | One year | Yes | Credit card (automatic) | 74.85 | 4456.35 | No |
55 | 1658-BYGOY | Male | 1 | No | No | 18 | Yes | Yes | Fiber optic | No | No | No | No | Yes | Yes | Month-to-month | Yes | Electronic check | 95.45 | 1752.55 | Yes |
56 | 8769-KKTPH | Female | 0 | Yes | Yes | 63 | Yes | Yes | Fiber optic | Yes | No | No | No | Yes | Yes | One year | Yes | Credit card (automatic) | 99.65 | 6311.20 | No |
57 | 5067-XJQFU | Male | 1 | Yes | Yes | 66 | Yes | Yes | Fiber optic | No | Yes | Yes | Yes | Yes | Yes | One year | Yes | Electronic check | 108.45 | 7076.35 | No |
58 | 3957-SQXML | Female | 0 | Yes | Yes | 34 | Yes | Yes | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | Two year | No | Credit card (automatic) | 24.95 | 894.30 | No |
59 | 5954-BDFSG | Female | 0 | No | No | 72 | Yes | Yes | Fiber optic | No | No | Yes | Yes | Yes | Yes | Two year | Yes | Credit card (automatic) | 107.50 | 7853.70 | No |
60 | 0434-CSFON | Female | 0 | Yes | No | 47 | Yes | Yes | Fiber optic | No | No | Yes | No | Yes | Yes | Month-to-month | Yes | Electronic check | 100.50 | 4707.10 | No |
61 | 1215-FIGMP | Male | 0 | No | No | 60 | Yes | Yes | Fiber optic | No | Yes | No | No | Yes | No | Month-to-month | Yes | Bank transfer (automatic) | 89.90 | 5450.70 | No |
62 | 0526-SXDJP | Male | 0 | Yes | No | 72 | No | No phone service | DSL | Yes | Yes | Yes | No | No | No | Two year | No | Bank transfer (automatic) | 42.10 | 2962.00 | No |
63 | 0557-ASKVU | Female | 0 | Yes | Yes | 18 | Yes | No | DSL | No | No | Yes | Yes | No | No | One year | Yes | Credit card (automatic) | 54.40 | 957.10 | No |
64 | 5698-BQJOH | Female | 0 | No | No | 9 | Yes | Yes | Fiber optic | No | No | No | No | Yes | Yes | Month-to-month | No | Electronic check | 94.40 | 857.25 | Yes |
65 | 5122-CYFXA | Female | 0 | No | No | 3 | Yes | No | DSL | No | Yes | No | Yes | Yes | Yes | Month-to-month | Yes | Electronic check | 75.30 | 244.10 | No |
66 | 8627-ZYGSZ | Male | 0 | Yes | No | 47 | Yes | Yes | Fiber optic | No | Yes | No | No | No | No | One year | Yes | Electronic check | 78.90 | 3650.35 | No |
67 | 3410-YOQBQ | Female | 0 | No | No | 31 | Yes | No | DSL | No | Yes | Yes | Yes | Yes | Yes | Two year | No | Mailed check | 79.20 | 2497.20 | No |
68 | 3170-NMYVV | Female | 0 | Yes | Yes | 50 | Yes | No | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | Two year | No | Bank transfer (automatic) | 20.15 | 930.90 | No |
69 | 7410-OIEDU | Male | 0 | No | No | 10 | Yes | No | Fiber optic | Yes | No | Yes | No | No | No | Month-to-month | Yes | Mailed check | 79.85 | 887.35 | No |
70 | 2273-QCKXA | Male | 0 | No | No | 1 | Yes | No | DSL | No | No | No | Yes | No | No | Month-to-month | No | Mailed check | 49.05 | 49.05 | No |
71 | 0731-EBJQB | Female | 0 | Yes | Yes | 52 | Yes | No | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | One year | Yes | Electronic check | 20.40 | 1090.65 | No |
72 | 1891-QRQSA | Male | 1 | Yes | Yes | 64 | Yes | Yes | Fiber optic | Yes | No | Yes | Yes | Yes | Yes | Two year | Yes | Bank transfer (automatic) | 111.60 | 7099.00 | No |
73 | 8028-PNXHQ | Male | 0 | Yes | Yes | 62 | Yes | Yes | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | Two year | Yes | Bank transfer (automatic) | 24.25 | 1424.60 | No |
74 | 5630-AHZIL | Female | 0 | No | Yes | 3 | Yes | No | DSL | Yes | No | No | Yes | No | Yes | Month-to-month | Yes | Bank transfer (automatic) | 64.50 | 177.40 | No |
75 | 2673-CXQEU | Female | 1 | No | No | 56 | Yes | Yes | Fiber optic | Yes | Yes | Yes | No | Yes | Yes | One year | No | Electronic check | 110.50 | 6139.50 | No |
76 | 6416-JNVRK | Female | 0 | No | No | 46 | Yes | No | DSL | No | No | No | No | No | Yes | One year | No | Credit card (automatic) | 55.65 | 2688.85 | No |
77 | 5590-ZSKRV | Female | 0 | Yes | Yes | 8 | Yes | No | DSL | Yes | Yes | No | No | No | No | Month-to-month | No | Mailed check | 54.65 | 482.25 | No |
78 | 0191-ZHSKZ | Male | 1 | No | No | 30 | Yes | No | DSL | Yes | Yes | No | No | Yes | Yes | Month-to-month | Yes | Electronic check | 74.75 | 2111.30 | No |
79 | 3887-PBQAO | Female | 0 | Yes | Yes | 45 | Yes | Yes | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | One year | Yes | Credit card (automatic) | 25.90 | 1216.60 | No |
80 | 5919-TMRGD | Female | 0 | No | Yes | 1 | Yes | No | Fiber optic | No | No | No | No | Yes | No | Month-to-month | Yes | Electronic check | 79.35 | 79.35 | Yes |
81 | 8108-UXRQN | Female | 0 | Yes | Yes | 11 | No | No phone service | DSL | Yes | No | No | No | Yes | Yes | Month-to-month | No | Electronic check | 50.55 | 565.35 | No |
82 | 9191-MYQKX | Female | 0 | Yes | No | 7 | Yes | No | Fiber optic | No | No | Yes | No | No | No | Month-to-month | Yes | Bank transfer (automatic) | 75.15 | 496.90 | Yes |
83 | 9919-YLNNG | Female | 0 | No | No | 42 | Yes | No | Fiber optic | No | Yes | Yes | Yes | Yes | Yes | Month-to-month | Yes | Bank transfer (automatic) | 103.80 | 4327.50 | No |
84 | 0318-ZOPWS | Female | 0 | Yes | No | 49 | Yes | No | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | Two year | Yes | Bank transfer (automatic) | 20.15 | 973.35 | No |
85 | 4445-ZJNMU | Male | 0 | No | No | 9 | Yes | Yes | Fiber optic | No | Yes | No | No | Yes | Yes | Month-to-month | Yes | Credit card (automatic) | 99.30 | 918.75 | No |
86 | 4808-YNLEU | Female | 0 | Yes | No | 35 | Yes | No | DSL | Yes | No | No | No | Yes | No | One year | Yes | Bank transfer (automatic) | 62.15 | 2215.45 | No |
87 | 1862-QRWPE | Female | 0 | Yes | Yes | 48 | Yes | No | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | Two year | No | Bank transfer (automatic) | 20.65 | 1057.00 | No |
88 | 2796-NNUFI | Female | 0 | Yes | Yes | 46 | Yes | No | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | Two year | Yes | Mailed check | 19.95 | 927.10 | No |
89 | 3016-KSVCP | Male | 0 | Yes | No | 29 | No | No phone service | DSL | No | No | No | No | Yes | No | Month-to-month | No | Mailed check | 33.75 | 1009.25 | No |
90 | 4767-HZZHQ | Male | 0 | Yes | Yes | 30 | Yes | No | Fiber optic | No | Yes | Yes | No | No | No | Month-to-month | No | Bank transfer (automatic) | 82.05 | 2570.20 | No |
91 | 2424-WVHPL | Male | 1 | No | No | 1 | Yes | No | Fiber optic | No | No | No | Yes | No | No | Month-to-month | No | Electronic check | 74.70 | 74.70 | No |
92 | 7233-PAHHL | Male | 0 | Yes | Yes | 66 | Yes | Yes | DSL | Yes | No | Yes | Yes | Yes | Yes | Two year | Yes | Mailed check | 84.00 | 5714.25 | No |
93 | 6067-NGCEU | Female | 0 | No | No | 65 | Yes | Yes | Fiber optic | Yes | Yes | Yes | No | Yes | Yes | Month-to-month | Yes | Credit card (automatic) | 111.05 | 7107.00 | No |
94 | 9848-JQJTX | Male | 0 | No | No | 72 | Yes | Yes | Fiber optic | No | Yes | Yes | No | Yes | Yes | Two year | Yes | Bank transfer (automatic) | 100.90 | 7459.05 | No |
95 | 8637-XJIVR | Female | 0 | No | No | 12 | Yes | Yes | Fiber optic | Yes | No | No | No | No | No | Month-to-month | Yes | Electronic check | 78.95 | 927.35 | Yes |
96 | 9803-FTJCG | Male | 0 | Yes | Yes | 71 | Yes | Yes | DSL | Yes | Yes | No | Yes | No | No | One year | Yes | Credit card (automatic) | 66.85 | 4748.70 | No |
97 | 0278-YXOOG | Male | 0 | No | No | 5 | Yes | No | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | Month-to-month | No | Mailed check | 21.05 | 113.85 | Yes |
98 | 3212-KXOCR | Male | 0 | No | No | 52 | Yes | No | No | No internet service | No internet service | No internet service | No internet service | No internet service | No internet service | Two year | No | Bank transfer (automatic) | 21.00 | 1107.20 | No |
99 | 4598-XLKNJ | Female | 1 | Yes | No | 25 | Yes | No | Fiber optic | No | Yes | Yes | No | Yes | Yes | Month-to-month | Yes | Electronic check | 98.50 | 2514.50 | Yes |
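Domain knowledge for the columns (features)
- customerID : customer ID (an ID carries no analytical meaning, so it will be dropped later.)
- gender : gender
- SeniorCitizen : senior citizen ('1' if the customer is a senior citizen, otherwise '0')
- Partner : whether the customer has a partner (spouse)
- Dependents : whether the customer has dependents (children)
- tenure : how long the customer has used the service
- PhoneService : phone service
- MultipleLines : whether the customer has two or more phone lines
- InternetService : internet service
- OnlineSecurity ~ StreamingMovies : whether the customer uses each of the carrier's add-on services
- Contract : whether the contract is yearly or monthly
- PaperlessBilling : whether the customer uses electronic billing
- PaymentMethod : payment method
- MonthlyCharges : average monthly charge
- TotalCharges : total amount charged
- Churn : whether the customer churned or not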
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   customerID        7043 non-null   object
 1   gender            7043 non-null   object
 2   SeniorCitizen     7043 non-null   int64
 3   Partner           7043 non-null   object
 4   Dependents        7043 non-null   object
 5   tenure            7043 non-null   int64
 6   PhoneService      7043 non-null   object
 7   MultipleLines     7043 non-null   object
 8   InternetService   7043 non-null   object
 9   OnlineSecurity    7043 non-null   object
 10  OnlineBackup      7043 non-null   object
 11  DeviceProtection  7043 non-null   object
 12  TechSupport       7043 non-null   object
 13  StreamingTV       7043 non-null   object
 14  StreamingMovies   7043 non-null   object
 15  Contract          7043 non-null   object
 16  PaperlessBilling  7043 non-null   object
 17  PaymentMethod     7043 non-null   object
 18  MonthlyCharges    7043 non-null   float64
 19  TotalCharges      7043 non-null   object
 20  Churn             7043 non-null   object
dtypes: float64(1), int64(2), object(18)
memory usage: 1.1+ MB
This dataset is mostly text: 18 of the 21 columns are object, and only three are numeric.
- TotalCharges looked numeric when we viewed the DataFrame, but .info() reports its dtype as object.
- So let's convert it explicitly.
pd.to_numeric(data['TotalCharges'])  # pass the column to convert inside the parentheses
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
pandas/_libs/lib.pyx in pandas._libs.lib.maybe_convert_numeric()

ValueError: Unable to parse string " "

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-115-742d52be8376> in <module>
----> 1 pd.to_numeric(data['TotalCharges'])  # pass the column to convert inside the parentheses

~/.local/lib/python3.6/site-packages/pandas/core/tools/numeric.py in to_numeric(arg, errors, downcast)
    151         try:
    152             values = lib.maybe_convert_numeric(
--> 153                 values, set(), coerce_numeric=coerce_numeric
    154             )
    155         except (ValueError, TypeError):

pandas/_libs/lib.pyx in pandas._libs.lib.maybe_convert_numeric()

ValueError: Unable to parse string " " at position 488
The call failed. The message says that the value at position 488 is the string " " (a single space), which cannot be parsed as a number. Let's fix this as well.
data.iloc[488]  # index row 488 by position and view it as a Series
customerID                         4472-LVYGI
gender                                 Female
SeniorCitizen                               0
Partner                                   Yes
Dependents                                Yes
tenure                                      0
PhoneService                               No
MultipleLines                No phone service
InternetService                           DSL
OnlineSecurity                            Yes
OnlineBackup                               No
DeviceProtection                          Yes
TechSupport                               Yes
StreamingTV                               Yes
StreamingMovies                            No
Contract                             Two year
PaperlessBilling                          Yes
PaymentMethod       Bank transfer (automatic)
MonthlyCharges                          52.55
TotalCharges
Churn                                      No
Name: 488, dtype: object
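Position 488 is only the first value that failed to parse, so other rows may hold the same blank. A minimal sketch of how all such rows could be listed with a boolean mask, using a small hypothetical DataFrame (`toy` and its values are made up for illustration and are not the churn data):

```python
import pandas as pd

# Hypothetical stand-in for the churn data: two rows hold a blank TotalCharges.
toy = pd.DataFrame({
    "tenure": [1, 0, 34, 0],
    "TotalCharges": ["29.85", " ", "1889.50", " "],
})

# str.strip() removes surrounding whitespace, so blank cells become "".
blank_mask = toy["TotalCharges"].str.strip() == ""

print(blank_mask.sum())                # number of blank cells
print(toy[blank_mask].index.tolist())  # row labels of the blank cells
```

In the row inspected above, tenure is 0, which suggests the blank belongs to a customer who has not been billed yet. That reading is an inference from this one row, not something the data states.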
TotalCharges contains the string " " (a single space), so let's replace it using the .replace() function.
data['TotalCharges'] = data['TotalCharges'].replace(" ","")  # replace cells that are exactly " " with the empty string ""
data['TotalCharges']
0         29.85
1       1889.50
2        108.15
3       1840.75
4        151.65
         ...
7038    1990.50
7039    7362.90
7040     346.45
7041     306.60
7042    6844.50
Name: TotalCharges, Length: 7043, dtype: object
Row 488 does not appear in this truncated output, but the replacement ran without error, so we move on without checking it.
Now rerun pd.to_numeric(data['TotalCharges']), the call that failed above.
data['TotalCharges'] = pd.to_numeric(data['TotalCharges'])
data['TotalCharges']
0         29.85
1       1889.50
2        108.15
3       1840.75
4        151.65
         ...
7038    1990.50
7039    7362.90
7040     346.45
7041     306.60
7042    6844.50
Name: TotalCharges, Length: 7043, dtype: float64
The dtype has changed from object to float64.
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   customerID        7043 non-null   object
 1   gender            7043 non-null   object
 2   SeniorCitizen     7043 non-null   int64
 3   Partner           7043 non-null   object
 4   Dependents        7043 non-null   object
 5   tenure            7043 non-null   int64
 6   PhoneService      7043 non-null   object
 7   MultipleLines     7043 non-null   object
 8   InternetService   7043 non-null   object
 9   OnlineSecurity    7043 non-null   object
 10  OnlineBackup      7043 non-null   object
 11  DeviceProtection  7043 non-null   object
 12  TechSupport       7043 non-null   object
 13  StreamingTV       7043 non-null   object
 14  StreamingMovies   7043 non-null   object
 15  Contract          7043 non-null   object
 16  PaperlessBilling  7043 non-null   object
 17  PaymentMethod     7043 non-null   object
 18  MonthlyCharges    7043 non-null   float64
 19  TotalCharges      7032 non-null   float64
 20  Churn             7043 non-null   object
dtypes: float64(2), int64(2), object(17)
memory usage: 1.1+ MB
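TotalCharges is now numeric (object → float64). Its non-null count dropped from 7043 to 7032 because the 11 blank entries became NaN; they are handled further below.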
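The two steps above (replace, then convert) can also be collapsed into one call. A minimal sketch on a toy Series rather than the real column: `errors='coerce'` tells `pd.to_numeric` to turn any value it cannot parse into NaN instead of raising. The Series `raw` and its values are illustrative assumptions.

```python
import pandas as pd

# Toy column standing in for data['TotalCharges'] (values are illustrative).
raw = pd.Series(["29.85", " ", "1889.50"], name="TotalCharges")

# errors='coerce' converts values that cannot be parsed (here " ") to NaN.
converted = pd.to_numeric(raw, errors="coerce")

print(converted.dtype)         # float64
print(converted.isna().sum())  # 1
```

The trade-off is that coercion silences every parsing failure, not only blanks, so it is worth counting the resulting NaN values afterwards.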
data.describe()
SeniorCitizen | tenure | MonthlyCharges | TotalCharges | |
---|---|---|---|---|
count | 7043.000000 | 7043.000000 | 7043.000000 | 7032.000000 |
mean | 0.162147 | 32.371149 | 64.761692 | 2283.300441 |
std | 0.368612 | 24.559481 | 30.090047 | 2266.771362 |
min | 0.000000 | 0.000000 | 18.250000 | 18.800000 |
25% | 0.000000 | 9.000000 | 35.500000 | 401.450000 |
50% | 0.000000 | 29.000000 | 70.350000 | 1397.475000 |
75% | 0.000000 | 55.000000 | 89.850000 | 3794.737500 |
max | 1.000000 | 72.000000 | 118.750000 | 8684.800000 |
sns.distplot(data['TotalCharges'])
/home/ubuntu/.local/lib/python3.6/site-packages/seaborn/distributions.py:2557: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms). warnings.warn(msg, FutureWarning)
<AxesSubplot:xlabel='TotalCharges', ylabel='Density'>
Next, convert the 'gender' column into a form that is easier to analyze by encoding it as '0' and '1'.
data['gender'].nunique()
2
There are two unique values, as expected (Male and Female).
Use the following code to convert the column.
pd.get_dummies(data, columns=['gender'], drop_first = True)
customerID | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | OnlineBackup | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn | gender_Male | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 7590-VHVEG | 0 | Yes | No | 1 | No | No phone service | DSL | No | Yes | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No | 0 |
1 | 5575-GNVDE | 0 | No | No | 34 | Yes | No | DSL | Yes | No | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.50 | No | 1 |
2 | 3668-QPYBK | 0 | No | No | 2 | Yes | No | DSL | Yes | Yes | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes | 1 |
3 | 7795-CFOCW | 0 | No | No | 45 | No | No phone service | DSL | Yes | No | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.30 | 1840.75 | No | 1 |
4 | 9237-HQITU | 0 | No | No | 2 | Yes | No | Fiber optic | No | No | No | No | No | No | Month-to-month | Yes | Electronic check | 70.70 | 151.65 | Yes | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7038 | 6840-RESVB | 0 | Yes | Yes | 24 | Yes | Yes | DSL | Yes | No | Yes | Yes | Yes | Yes | One year | Yes | Mailed check | 84.80 | 1990.50 | No | 1 |
7039 | 2234-XADUH | 0 | Yes | Yes | 72 | Yes | Yes | Fiber optic | No | Yes | Yes | No | Yes | Yes | One year | Yes | Credit card (automatic) | 103.20 | 7362.90 | No | 0 |
7040 | 4801-JZAZL | 0 | Yes | Yes | 11 | No | No phone service | DSL | Yes | No | No | No | No | No | Month-to-month | Yes | Electronic check | 29.60 | 346.45 | No | 0 |
7041 | 8361-LTMKD | 1 | Yes | No | 4 | Yes | Yes | Fiber optic | No | No | No | No | No | No | Month-to-month | Yes | Mailed check | 74.40 | 306.60 | Yes | 1 |
7042 | 3186-AJIEK | 0 | No | No | 66 | Yes | No | Fiber optic | Yes | No | Yes | Yes | Yes | Yes | Two year | Yes | Bank transfer (automatic) | 105.65 | 6844.50 | No | 1 |
7043 rows × 21 columns
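Two details of the call above are easy to miss: `drop_first=True` keeps only one of the two gender columns, and `pd.get_dummies` returns a new DataFrame, so `data` stays unchanged unless the result is assigned. A small sketch on a hypothetical three-row frame (`toy` is made up, and `dtype=int` is added here only so the output is 0/1 regardless of pandas version):

```python
import pandas as pd

toy = pd.DataFrame({"gender": ["Female", "Male", "Male"], "tenure": [1, 34, 2]})

# drop_first=True drops the first category ("Female"), leaving gender_Male only.
encoded = pd.get_dummies(toy, columns=["gender"], drop_first=True, dtype=int)

print(encoded.columns.tolist())         # ['tenure', 'gender_Male']
print(encoded["gender_Male"].tolist())  # [0, 1, 1]
print(toy.columns.tolist())             # ['gender', 'tenure'] (original is untouched)
```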
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   customerID        7043 non-null   object
 1   gender            7043 non-null   object
 2   SeniorCitizen     7043 non-null   int64
 3   Partner           7043 non-null   object
 4   Dependents        7043 non-null   object
 5   tenure            7043 non-null   int64
 6   PhoneService      7043 non-null   object
 7   MultipleLines     7043 non-null   object
 8   InternetService   7043 non-null   object
 9   OnlineSecurity    7043 non-null   object
 10  OnlineBackup      7043 non-null   object
 11  DeviceProtection  7043 non-null   object
 12  TechSupport       7043 non-null   object
 13  StreamingTV       7043 non-null   object
 14  StreamingMovies   7043 non-null   object
 15  Contract          7043 non-null   object
 16  PaperlessBilling  7043 non-null   object
 17  PaymentMethod     7043 non-null   object
 18  MonthlyCharges    7043 non-null   float64
 19  TotalCharges      7032 non-null   float64
 20  Churn             7043 non-null   object
dtypes: float64(2), int64(2), object(17)
memory usage: 1.1+ MB
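The output above still lists gender as object because the get_dummies result was not assigned back to data. We want to convert every object column to 0/1 values in the same way, but there are too many columns to list by hand, so we collect them with a for loop.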
data['gender'].dtype
dtype('O')
data['gender'].dtype == 'O'
True
col_list = []
data.columns
Index(['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents', 'tenure', 'PhoneService', 'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling', 'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn'], dtype='object')
for i in data.columns:
    if data[i].dtype == 'O':
        col_list.append(i)
col_list
['customerID', 'gender', 'Partner', 'Dependents', 'PhoneService', 'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling', 'PaymentMethod', 'Churn']
for i in col_list:
    print(i, data[i].nunique())
customerID 7043
gender 2
Partner 2
Dependents 2
PhoneService 2
MultipleLines 3
InternetService 3
OnlineSecurity 3
OnlineBackup 3
DeviceProtection 3
TechSupport 3
StreamingTV 3
StreamingMovies 3
Contract 3
PaperlessBilling 2
PaymentMethod 4
Churn 2
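Remove customerID from the list: it has 7,043 unique values, one per row, so one-hot encoding it would be pointless. It is the first element, so slicing from index 1 removes it.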
col_list = col_list[1:]
col_list
['gender', 'Partner', 'Dependents', 'PhoneService', 'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling', 'PaymentMethod', 'Churn']
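The loop and the slice above can also be written without relying on `customerID` being the first element. A sketch under the assumption that every column to encode has dtype `object`; the toy frame is hypothetical, and `dtype=object` is set explicitly only to mirror the dtypes shown by `data.info()`:

```python
import pandas as pd

toy = pd.DataFrame({
    "customerID": pd.Series(["7590-VHVEG", "5575-GNVDE"], dtype=object),
    "gender": pd.Series(["Female", "Male"], dtype=object),
    "tenure": [1, 34],
    "Churn": pd.Series(["No", "No"], dtype=object),
})

# select_dtypes picks the object columns; Index.drop removes the ID by name.
col_list = toy.select_dtypes(include="object").columns.drop("customerID").tolist()

print(col_list)  # ['gender', 'Churn']
```

Dropping by name keeps working even if the column order of the CSV changes.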
The preparation is done. Reuse the get_dummies call from above, this time passing col_list as the columns argument and assigning the result back to data.
data = pd.get_dummies(data, columns=col_list, drop_first = True)
data
customerID | SeniorCitizen | tenure | MonthlyCharges | TotalCharges | gender_Male | Partner_Yes | Dependents_Yes | PhoneService_Yes | MultipleLines_No phone service | MultipleLines_Yes | InternetService_Fiber optic | InternetService_No | OnlineSecurity_No internet service | OnlineSecurity_Yes | ... | DeviceProtection_No internet service | DeviceProtection_Yes | TechSupport_No internet service | TechSupport_Yes | StreamingTV_No internet service | StreamingTV_Yes | StreamingMovies_No internet service | StreamingMovies_Yes | Contract_One year | Contract_Two year | PaperlessBilling_Yes | PaymentMethod_Credit card (automatic) | PaymentMethod_Electronic check | PaymentMethod_Mailed check | Churn_Yes | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 7590-VHVEG | 0 | 1 | 29.85 | 29.85 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
1 | 5575-GNVDE | 0 | 34 | 56.95 | 1889.50 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
2 | 3668-QPYBK | 0 | 2 | 53.85 | 108.15 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
3 | 7795-CFOCW | 0 | 45 | 42.30 | 1840.75 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 9237-HQITU | 0 | 2 | 70.70 | 151.65 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7038 | 6840-RESVB | 0 | 24 | 84.80 | 1990.50 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | ... | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
7039 | 2234-XADUH | 0 | 72 | 103.20 | 7362.90 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 |
7040 | 4801-JZAZL | 0 | 11 | 29.60 | 346.45 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
7041 | 8361-LTMKD | 1 | 4 | 74.40 | 306.60 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
7042 | 3186-AJIEK | 0 | 66 | 105.65 | 6844.50 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | ... | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
7043 rows × 32 columns
The column count grew to 32, and every categorical column is now encoded as numeric '0' / '1' values.
data.isna()
customerID | SeniorCitizen | tenure | MonthlyCharges | TotalCharges | gender_Male | Partner_Yes | Dependents_Yes | PhoneService_Yes | MultipleLines_No phone service | MultipleLines_Yes | InternetService_Fiber optic | InternetService_No | OnlineSecurity_No internet service | OnlineSecurity_Yes | ... | DeviceProtection_No internet service | DeviceProtection_Yes | TechSupport_No internet service | TechSupport_Yes | StreamingTV_No internet service | StreamingTV_Yes | StreamingMovies_No internet service | StreamingMovies_Yes | Contract_One year | Contract_Two year | PaperlessBilling_Yes | PaymentMethod_Credit card (automatic) | PaymentMethod_Electronic check | PaymentMethod_Mailed check | Churn_Yes | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
1 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
2 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
3 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
4 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7038 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
7039 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
7040 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
7041 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
7042 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
7043 rows × 32 columns
data.isna().sum()
customerID                                0
SeniorCitizen                             0
tenure                                    0
MonthlyCharges                            0
TotalCharges                             11
gender_Male                               0
Partner_Yes                               0
Dependents_Yes                            0
PhoneService_Yes                          0
MultipleLines_No phone service            0
MultipleLines_Yes                         0
InternetService_Fiber optic               0
InternetService_No                        0
OnlineSecurity_No internet service        0
OnlineSecurity_Yes                        0
OnlineBackup_No internet service          0
OnlineBackup_Yes                          0
DeviceProtection_No internet service      0
DeviceProtection_Yes                      0
TechSupport_No internet service           0
TechSupport_Yes                           0
StreamingTV_No internet service           0
StreamingTV_Yes                           0
StreamingMovies_No internet service       0
StreamingMovies_Yes                       0
Contract_One year                         0
Contract_Two year                         0
PaperlessBilling_Yes                      0
PaymentMethod_Credit card (automatic)     0
PaymentMethod_Electronic check            0
PaymentMethod_Mailed check                0
Churn_Yes                                 0
dtype: int64
The TotalCharges column has 11 missing values. To decide whether to fill them with the mean or the median, compare the two values and look at the distribution plot.
data['TotalCharges'].mean()
2283.3004408418656
data['TotalCharges'].median()
1397.475
sns.distplot(data['TotalCharges'])
/home/ubuntu/.local/lib/python3.6/site-packages/seaborn/distributions.py:2557: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms). warnings.warn(msg, FutureWarning)
<AxesSubplot:xlabel='TotalCharges', ylabel='Density'>
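The distribution is right-skewed (the mean, 2283.30, is well above the median, 1397.475), so filling the missing values with the median is the better choice here.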
data['TotalCharges'] = data['TotalCharges'].fillna(data['TotalCharges'].median())
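Why the median? A minimal sketch with made-up numbers (not the churn data): in a right-skewed column a few large values pull the mean upward while the median stays near the typical value, so a median fill places the imputed rows where most observations actually are.

```python
import numpy as np
import pandas as pd

# Illustrative right-skewed values with one missing entry.
charges = pd.Series([20.0, 30.0, 40.0, 50.0, 4000.0, np.nan])

print(charges.mean())      # 828.0, dragged up by the single large value
print(charges.median())    # 40.0
print(charges.skew() > 0)  # True: positive skew means a long right tail

# Fill the missing entry with the median, as in the notebook step above.
filled = charges.fillna(charges.median())
print(filled.isna().sum())  # 0
```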
Let's check that no missing values remain.
data.isna().sum()
customerID                               0
SeniorCitizen                            0
tenure                                   0
MonthlyCharges                           0
TotalCharges                             0
gender_Male                              0
Partner_Yes                              0
Dependents_Yes                           0
PhoneService_Yes                         0
MultipleLines_No phone service           0
MultipleLines_Yes                        0
InternetService_Fiber optic              0
InternetService_No                       0
OnlineSecurity_No internet service       0
OnlineSecurity_Yes                       0
OnlineBackup_No internet service         0
OnlineBackup_Yes                         0
DeviceProtection_No internet service     0
DeviceProtection_Yes                     0
TechSupport_No internet service          0
TechSupport_Yes                          0
StreamingTV_No internet service          0
StreamingTV_Yes                          0
StreamingMovies_No internet service      0
StreamingMovies_Yes                      0
Contract_One year                        0
Contract_Two year                        0
PaperlessBilling_Yes                     0
PaymentMethod_Credit card (automatic)    0
PaymentMethod_Electronic check           0
PaymentMethod_Mailed check               0
Churn_Yes                                0
dtype: int64
Every count is 0, so the dataset no longer has any missing values.
Three ways to scale the data¶
Of the three scalers, StandardScaler, MinMaxScaler, and RobustScaler, MinMaxScaler is the recommended choice for this dataset. Let's see why.
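As a rough illustration of how the three scalers differ, the sketch below applies each one to a single made-up column containing one outlier. The numbers are illustrative only and say nothing about which scaler performs best on the churn data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler

# One feature, five samples; 100.0 is a deliberate outlier.
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

standard = StandardScaler().fit_transform(x)  # (x - mean) / std
minmax = MinMaxScaler().fit_transform(x)      # (x - min) / (max - min)
robust = RobustScaler().fit_transform(x)      # (x - median) / IQR

print(minmax.ravel())    # bounded to [0, 1]
print(standard.ravel())  # mean 0, standard deviation 1
print(robust.ravel())    # median 0, scaled by the interquartile range
```

One plausible reason MinMaxScaler suits this dataset: the one-hot columns are already 0 or 1, and min-max scaling leaves them unchanged while bringing tenure and the charge columns into the same [0, 1] range, which matters because KNN is distance-based.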
- MinMaxScaler
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler
mixmax = MinMaxScaler()
data.drop('customerID', axis=1, inplace=True)
data.head(3)
SeniorCitizen | tenure | MonthlyCharges | TotalCharges | gender_Male | Partner_Yes | Dependents_Yes | PhoneService_Yes | MultipleLines_No phone service | MultipleLines_Yes | InternetService_Fiber optic | InternetService_No | OnlineSecurity_No internet service | OnlineSecurity_Yes | OnlineBackup_No internet service | ... | DeviceProtection_No internet service | DeviceProtection_Yes | TechSupport_No internet service | TechSupport_Yes | StreamingTV_No internet service | StreamingTV_Yes | StreamingMovies_No internet service | StreamingMovies_Yes | Contract_One year | Contract_Two year | PaperlessBilling_Yes | PaymentMethod_Credit card (automatic) | PaymentMethod_Electronic check | PaymentMethod_Mailed check | Churn_Yes | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | 29.85 | 29.85 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
1 | 0 | 34 | 56.95 | 1889.50 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
2 | 0 | 2 | 53.85 | 108.15 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
3 rows × 31 columns
mixmax.fit(data)
MinMaxScaler()
mixmax.transform(data)
array([[0. , 0.01388889, 0.11542289, ..., 1. , 0. , 0. ], [0. , 0.47222222, 0.38507463, ..., 0. , 1. , 0. ], [0. , 0.02777778, 0.35422886, ..., 0. , 1. , 1. ], ..., [0. , 0.15277778, 0.11293532, ..., 1. , 0. , 0. ], [1. , 0.05555556, 0.55870647, ..., 0. , 1. , 1. ], [0. , 0.91666667, 0.86965174, ..., 0. , 0. , 0. ]])
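The first row of the array above can be reproduced by hand. MinMaxScaler maps each column with (x - min) / (max - min). Plugging in the min and max reported by data.describe() for tenure (0 and 72) and MonthlyCharges (18.25 and 118.75) gives 1 / 72 ≈ 0.013889 and (29.85 - 18.25) / 100.5 ≈ 0.115423, matching the output. The sketch below checks this on a hypothetical three-row frame that contains those extremes, and keeps the column names when wrapping the result in a DataFrame.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical rows: row 0 mirrors the first customer, rows 1-2 carry the
# column minima and maxima reported by data.describe().
toy = pd.DataFrame({
    "tenure": [1, 0, 72],
    "MonthlyCharges": [29.85, 18.25, 118.75],
})

# Passing columns= keeps the feature names that a bare NumPy array loses.
scaled = pd.DataFrame(MinMaxScaler().fit_transform(toy), columns=toy.columns)

# Same formula written out by hand.
manual = (toy - toy.min()) / (toy.max() - toy.min())

print(scaled.round(6).iloc[0].tolist())  # [0.013889, 0.115423]
print(manual.round(6).iloc[0].tolist())  # [0.013889, 0.115423]
```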
scaled_data = mixmax.transform(data)
pd.DataFrame(scaled_data)  # view the result above as a DataFrame
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | ... | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.0 | 0.013889 | 0.115423 | 0.001275 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 |
1 | 0.0 | 0.472222 | 0.385075 | 0.215867 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
2 | 0.0 | 0.027778 | 0.354229 | 0.010310 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 |
3 | 0.0 | 0.625000 | 0.239303 | 0.210241 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
4 | 0.0 | 0.027778 | 0.521891 | 0.015330 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7038 | 0.0 | 0.333333 | 0.662189 | 0.227521 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 |
7039 | 0.0 | 1.000000 | 0.845274 | 0.847461 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 |
7040 | 0.0 | 0.152778 | 0.112935 | 0.037809 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 |
7041 | 1.0 | 0.055556 | 0.558706 | 0.033210 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 |
7042 | 0.0 | 0.916667 | 0.869652 | 0.787641 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
7043 rows × 31 columns
Because the scaler returns a NumPy array, the column names are lost and replaced by integer positions. Restore them with the following code.
data.columns
Index(['SeniorCitizen', 'tenure', 'MonthlyCharges', 'TotalCharges', 'gender_Male', 'Partner_Yes', 'Dependents_Yes', 'PhoneService_Yes', 'MultipleLines_No phone service', 'MultipleLines_Yes', 'InternetService_Fiber optic', 'InternetService_No', 'OnlineSecurity_No internet service', 'OnlineSecurity_Yes', 'OnlineBackup_No internet service', 'OnlineBackup_Yes', 'DeviceProtection_No internet service', 'DeviceProtection_Yes', 'TechSupport_No internet service', 'TechSupport_Yes', 'StreamingTV_No internet service', 'StreamingTV_Yes', 'StreamingMovies_No internet service', 'StreamingMovies_Yes', 'Contract_One year', 'Contract_Two year', 'PaperlessBilling_Yes', 'PaymentMethod_Credit card (automatic)', 'PaymentMethod_Electronic check', 'PaymentMethod_Mailed check', 'Churn_Yes'], dtype='object')
pd.DataFrame(scaled_data, columns=data.columns) # restore the column names
SeniorCitizen | tenure | MonthlyCharges | TotalCharges | gender_Male | Partner_Yes | Dependents_Yes | PhoneService_Yes | MultipleLines_No phone service | MultipleLines_Yes | InternetService_Fiber optic | InternetService_No | OnlineSecurity_No internet service | OnlineSecurity_Yes | OnlineBackup_No internet service | ... | DeviceProtection_No internet service | DeviceProtection_Yes | TechSupport_No internet service | TechSupport_Yes | StreamingTV_No internet service | StreamingTV_Yes | StreamingMovies_No internet service | StreamingMovies_Yes | Contract_One year | Contract_Two year | PaperlessBilling_Yes | PaymentMethod_Credit card (automatic) | PaymentMethod_Electronic check | PaymentMethod_Mailed check | Churn_Yes | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.0 | 0.013889 | 0.115423 | 0.001275 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 |
1 | 0.0 | 0.472222 | 0.385075 | 0.215867 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
2 | 0.0 | 0.027778 | 0.354229 | 0.010310 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 |
3 | 0.0 | 0.625000 | 0.239303 | 0.210241 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
4 | 0.0 | 0.027778 | 0.521891 | 0.015330 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7038 | 0.0 | 0.333333 | 0.662189 | 0.227521 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 |
7039 | 0.0 | 1.000000 | 0.845274 | 0.847461 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 |
7040 | 0.0 | 0.152778 | 0.112935 | 0.037809 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 |
7041 | 1.0 | 0.055556 | 0.558706 | 0.033210 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 |
7042 | 0.0 | 0.916667 | 0.869652 | 0.787641 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
7043 rows × 31 columns
data
SeniorCitizen | tenure | MonthlyCharges | TotalCharges | gender_Male | Partner_Yes | Dependents_Yes | PhoneService_Yes | MultipleLines_No phone service | MultipleLines_Yes | InternetService_Fiber optic | InternetService_No | OnlineSecurity_No internet service | OnlineSecurity_Yes | OnlineBackup_No internet service | ... | DeviceProtection_No internet service | DeviceProtection_Yes | TechSupport_No internet service | TechSupport_Yes | StreamingTV_No internet service | StreamingTV_Yes | StreamingMovies_No internet service | StreamingMovies_Yes | Contract_One year | Contract_Two year | PaperlessBilling_Yes | PaymentMethod_Credit card (automatic) | PaymentMethod_Electronic check | PaymentMethod_Mailed check | Churn_Yes | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | 29.85 | 29.85 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
1 | 0 | 34 | 56.95 | 1889.50 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
2 | 0 | 2 | 53.85 | 108.15 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
3 | 0 | 45 | 42.30 | 1840.75 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 0 | 2 | 70.70 | 151.65 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7038 | 0 | 24 | 84.80 | 1990.50 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
7039 | 0 | 72 | 103.20 | 7362.90 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 |
7040 | 0 | 11 | 29.60 | 346.45 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
7041 | 1 | 4 | 74.40 | 306.60 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
7042 | 0 | 66 | 105.65 | 6844.50 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | ... | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
7043 rows × 31 columns
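As a hand check of what MinMaxScaler does to a single column, the sketch below applies the min-max formula (x - min) / (max - min) to a few tenure values from the table. The column minimum 0 and maximum 72 are assumptions inferred from the scaled output shown earlier (72 maps to 1.0 and 1 maps to 0.013889); they are not recomputed from the full dataset here.

```python
# Min-max scaling by hand: (x - min) / (max - min).
# ASSUMPTION: tenure ranges from 0 to 72, inferred from the scaled values above.
TENURE_MIN, TENURE_MAX = 0, 72

def min_max_scale(x, lo, hi):
    """Map x linearly so that lo -> 0.0 and hi -> 1.0."""
    return (x - lo) / (hi - lo)

# tenure values of rows 0, 1, 2, 3 and 7039 in the unscaled table
tenure_samples = [1, 34, 2, 45, 72]
scaled_tenure = [round(min_max_scale(t, TENURE_MIN, TENURE_MAX), 6) for t in tenure_samples]
print(scaled_tenure)
```

Under these assumptions the results agree with the tenure column of the scaled table for the same rows.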
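Compare this with the original data before scaling.
- In the tenure column, for example, the value 34 becomes about 0.47, meaning it sits roughly 47% of the way between the column's minimum and maximum.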
- StandardScaler
- Proceed after dropping the dependent variable (Churn_Yes)
standard = StandardScaler()
standard.fit(data.drop('Churn_Yes', axis=1))
StandardScaler()
scaled_st = standard.transform(data.drop('Churn_Yes', axis=1))
pd.DataFrame(scaled_st, columns = data.drop('Churn_Yes', axis=1).columns)
SeniorCitizen | tenure | MonthlyCharges | TotalCharges | gender_Male | Partner_Yes | Dependents_Yes | PhoneService_Yes | MultipleLines_No phone service | MultipleLines_Yes | InternetService_Fiber optic | InternetService_No | OnlineSecurity_No internet service | OnlineSecurity_Yes | OnlineBackup_No internet service | OnlineBackup_Yes | DeviceProtection_No internet service | DeviceProtection_Yes | TechSupport_No internet service | TechSupport_Yes | StreamingTV_No internet service | StreamingTV_Yes | StreamingMovies_No internet service | StreamingMovies_Yes | Contract_One year | Contract_Two year | PaperlessBilling_Yes | PaymentMethod_Credit card (automatic) | PaymentMethod_Electronic check | PaymentMethod_Mailed check | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | -0.439916 | -1.277445 | -1.160323 | -0.994242 | -1.009559 | 1.034530 | -0.654012 | -3.054010 | 3.054010 | -0.854176 | -0.885660 | -0.525927 | -0.525927 | -0.633933 | -0.525927 | 1.378241 | -0.525927 | -0.723968 | -0.525927 | -0.639439 | -0.525927 | -0.790132 | -0.525927 | -0.796070 | -0.514249 | -0.562975 | 0.829798 | -0.525047 | 1.406418 | -0.544807 |
1 | -0.439916 | 0.066327 | -0.259629 | -0.173244 | 0.990532 | -0.966622 | -0.654012 | 0.327438 | -0.327438 | -0.854176 | -0.885660 | -0.525927 | -0.525927 | 1.577454 | -0.525927 | -0.725563 | -0.525927 | 1.381277 | -0.525927 | -0.639439 | -0.525927 | -0.790132 | -0.525927 | -0.796070 | 1.944582 | -0.562975 | -1.205113 | -0.525047 | -0.711026 | 1.835513 |
2 | -0.439916 | -1.236724 | -0.362660 | -0.959674 | 0.990532 | -0.966622 | -0.654012 | 0.327438 | -0.327438 | -0.854176 | -0.885660 | -0.525927 | -0.525927 | 1.577454 | -0.525927 | 1.378241 | -0.525927 | -0.723968 | -0.525927 | -0.639439 | -0.525927 | -0.790132 | -0.525927 | -0.796070 | -0.514249 | -0.562975 | 0.829798 | -0.525047 | -0.711026 | 1.835513 |
3 | -0.439916 | 0.514251 | -0.746535 | -0.194766 | 0.990532 | -0.966622 | -0.654012 | -3.054010 | 3.054010 | -0.854176 | -0.885660 | -0.525927 | -0.525927 | 1.577454 | -0.525927 | -0.725563 | -0.525927 | 1.381277 | -0.525927 | 1.563872 | -0.525927 | -0.790132 | -0.525927 | -0.796070 | 1.944582 | -0.562975 | -1.205113 | -0.525047 | -0.711026 | -0.544807 |
4 | -0.439916 | -1.236724 | 0.197365 | -0.940470 | -1.009559 | -0.966622 | -0.654012 | 0.327438 | -0.327438 | -0.854176 | 1.129102 | -0.525927 | -0.525927 | -0.633933 | -0.525927 | -0.725563 | -0.525927 | -0.723968 | -0.525927 | -0.639439 | -0.525927 | -0.790132 | -0.525927 | -0.796070 | -0.514249 | -0.562975 | 0.829798 | -0.525047 | 1.406418 | -0.544807 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7038 | -0.439916 | -0.340876 | 0.665992 | -0.128655 | 0.990532 | 1.034530 | 1.529024 | 0.327438 | -0.327438 | 1.170719 | -0.885660 | -0.525927 | -0.525927 | 1.577454 | -0.525927 | -0.725563 | -0.525927 | 1.381277 | -0.525927 | 1.563872 | -0.525927 | 1.265612 | -0.525927 | 1.256171 | 1.944582 | -0.562975 | 0.829798 | -0.525047 | -0.711026 | 1.835513 |
7039 | -0.439916 | 1.613701 | 1.277533 | 2.243151 | -1.009559 | 1.034530 | 1.529024 | 0.327438 | -0.327438 | 1.170719 | 1.129102 | -0.525927 | -0.525927 | -0.633933 | -0.525927 | 1.378241 | -0.525927 | 1.381277 | -0.525927 | -0.639439 | -0.525927 | 1.265612 | -0.525927 | 1.256171 | 1.944582 | -0.562975 | 0.829798 | 1.904590 | -0.711026 | -0.544807 |
7040 | -0.439916 | -0.870241 | -1.168632 | -0.854469 | -1.009559 | 1.034530 | 1.529024 | -3.054010 | 3.054010 | -0.854176 | -0.885660 | -0.525927 | -0.525927 | 1.577454 | -0.525927 | -0.725563 | -0.525927 | -0.723968 | -0.525927 | -0.639439 | -0.525927 | -0.790132 | -0.525927 | -0.796070 | -0.514249 | -0.562975 | 0.829798 | -0.525047 | 1.406418 | -0.544807 |
7041 | 2.273159 | -1.155283 | 0.320338 | -0.872062 | 0.990532 | 1.034530 | -0.654012 | 0.327438 | -0.327438 | 1.170719 | 1.129102 | -0.525927 | -0.525927 | -0.633933 | -0.525927 | -0.725563 | -0.525927 | -0.723968 | -0.525927 | -0.639439 | -0.525927 | -0.790132 | -0.525927 | -0.796070 | -0.514249 | -0.562975 | 0.829798 | -0.525047 | -0.711026 | 1.835513 |
7042 | -0.439916 | 1.369379 | 1.358961 | 2.014288 | 0.990532 | -0.966622 | -0.654012 | 0.327438 | -0.327438 | -0.854176 | 1.129102 | -0.525927 | -0.525927 | 1.577454 | -0.525927 | -0.725563 | -0.525927 | 1.381277 | -0.525927 | 1.563872 | -0.525927 | 1.265612 | -0.525927 | 1.256171 | -0.514249 | 1.776278 | 0.829798 | -0.525047 | -0.711026 | -0.544807 |
7043 rows × 30 columns
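A minimal sketch of the standardization behind the table above: each value has the column mean subtracted and is divided by the column standard deviation. StandardScaler uses the population standard deviation (ddof=0), which is what `statistics.pstdev` computes. The `charges` list is hypothetical and only illustrates the formula; it is not taken from the dataset.

```python
import statistics

def standard_scale(values):
    """Return z-scores using the population standard deviation (ddof=0)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Hypothetical monthly charges, for illustration only.
charges = [20.0, 40.0, 60.0, 80.0, 100.0]
z_scores = standard_scale(charges)
print([round(z, 4) for z in z_scores])
```

After scaling, the column has mean 0 and standard deviation 1, which is why the values in the table are centered around zero and can be negative.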
- RobustScaler
- Proceed after dropping the dependent variable (Churn_Yes)
rob = RobustScaler()
rob.fit(data.drop('Churn_Yes', axis=1))
RobustScaler()
scaled_rob = rob.transform(data.drop('Churn_Yes', axis=1))
pd.DataFrame(scaled_rob, columns= data.drop('Churn_Yes', axis=1).columns)
SeniorCitizen | tenure | MonthlyCharges | TotalCharges | gender_Male | Partner_Yes | Dependents_Yes | PhoneService_Yes | MultipleLines_No phone service | MultipleLines_Yes | InternetService_Fiber optic | InternetService_No | OnlineSecurity_No internet service | OnlineSecurity_Yes | OnlineBackup_No internet service | OnlineBackup_Yes | DeviceProtection_No internet service | DeviceProtection_Yes | TechSupport_No internet service | TechSupport_Yes | StreamingTV_No internet service | StreamingTV_Yes | StreamingMovies_No internet service | StreamingMovies_Yes | Contract_One year | Contract_Two year | PaperlessBilling_Yes | PaymentMethod_Credit card (automatic) | PaymentMethod_Electronic check | PaymentMethod_Mailed check | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.0 | -0.608696 | -0.745170 | -0.404100 | -1.0 | 1.0 | 0.0 | -1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
1 | 0.0 | 0.108696 | -0.246550 | 0.145381 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | -1.0 | 0.0 | 0.0 | 1.0 |
2 | 0.0 | -0.586957 | -0.303588 | -0.380964 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
3 | 0.0 | 0.347826 | -0.516099 | 0.130977 | 0.0 | 0.0 | 0.0 | -1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | -1.0 | 0.0 | 0.0 | 0.0 |
4 | 0.0 | -0.586957 | 0.006440 | -0.368111 | -1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7038 | 0.0 | -0.108696 | 0.265869 | 0.175224 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
7039 | 0.0 | 0.934783 | 0.604416 | 1.762637 | -1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
7040 | 0.0 | -0.391304 | -0.749770 | -0.310552 | -1.0 | 1.0 | 1.0 | -1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
7041 | 1.0 | -0.543478 | 0.074517 | -0.322327 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
7042 | 0.0 | 0.804348 | 0.649494 | 1.609463 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
7043 rows × 30 columns
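The RobustScaler output above can be reproduced by hand with (x - median) / IQR. In the sketch below, the tenure median 29 and interquartile range 46 are assumptions inferred from the displayed values (for example, 34 maps to 0.108696), not recomputed from the data. The second example assumes a 0/1 dummy column with median 1 and IQR 1, which is consistent with how gender_Male appears in the table.

```python
def robust_scale(x, median, iqr):
    """Center on the median and divide by the interquartile range."""
    return (x - median) / iqr

# ASSUMPTION: median=29 and IQR=46 for tenure, inferred from the output above.
tenure_scaled = {raw: round(robust_scale(raw, 29, 46), 6) for raw in (1, 34, 72)}
print(tenure_scaled)

# ASSUMPTION: a 0/1 dummy column whose median is 1 and IQR is 1.
dummy_scaled = (robust_scale(0, 1, 1), robust_scale(1, 1, 1))
print(dummy_scaled)
```

This also shows why some dummy columns end up as -1.0 and 0.0 instead of 0 and 1: the majority value becomes the median and is mapped to 0.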
For this dataset we use MinMaxScaler. The other scalers' results are not kept; the MinMaxScaler output is stored as scaled_data.
scaled_data = pd.DataFrame(scaled_data, columns=data.columns)
Next, split the data with train_test_split.
from sklearn.model_selection import train_test_split
scaled_data
SeniorCitizen | tenure | MonthlyCharges | TotalCharges | gender_Male | Partner_Yes | Dependents_Yes | PhoneService_Yes | MultipleLines_No phone service | MultipleLines_Yes | InternetService_Fiber optic | InternetService_No | OnlineSecurity_No internet service | OnlineSecurity_Yes | OnlineBackup_No internet service | ... | DeviceProtection_No internet service | DeviceProtection_Yes | TechSupport_No internet service | TechSupport_Yes | StreamingTV_No internet service | StreamingTV_Yes | StreamingMovies_No internet service | StreamingMovies_Yes | Contract_One year | Contract_Two year | PaperlessBilling_Yes | PaymentMethod_Credit card (automatic) | PaymentMethod_Electronic check | PaymentMethod_Mailed check | Churn_Yes | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.0 | 0.013889 | 0.115423 | 0.001275 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 |
1 | 0.0 | 0.472222 | 0.385075 | 0.215867 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
2 | 0.0 | 0.027778 | 0.354229 | 0.010310 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 |
3 | 0.0 | 0.625000 | 0.239303 | 0.210241 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
4 | 0.0 | 0.027778 | 0.521891 | 0.015330 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7038 | 0.0 | 0.333333 | 0.662189 | 0.227521 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 |
7039 | 0.0 | 1.000000 | 0.845274 | 0.847461 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 |
7040 | 0.0 | 0.152778 | 0.112935 | 0.037809 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 |
7041 | 1.0 | 0.055556 | 0.558706 | 0.033210 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 |
7042 | 0.0 | 0.916667 | 0.869652 | 0.787641 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
7043 rows × 31 columns
X = scaled_data.drop('Churn_Yes', axis = 1 )
y = scaled_data['Churn_Yes']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=100)
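A quick arithmetic check of the split sizes, assuming scikit-learn rounds the test share up and gives the remaining rows to training when only test_size is set. This matches the 2113 test rows that appear in the prediction output later on.

```python
import math

n_samples = 7043   # rows in scaled_data
test_size = 0.3    # same value as in the train_test_split call

n_test = math.ceil(n_samples * test_size)   # test rows are rounded up
n_train = n_samples - n_test                # the rest go to training
print(n_train, n_test)
```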
Import the KNN algorithm.
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=10)
knn.fit(X_train, y_train)
KNeighborsClassifier(n_neighbors=10)
knn.predict(X_test)
array([0., 0., 0., ..., 0., 0., 0.])
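To make the prediction step less of a black box, here is a minimal pure-Python sketch of the k-nearest-neighbors idea: find the k training points closest to a query and take a majority vote over their labels. It assumes Euclidean distance and equal-weight votes (the KNeighborsClassifier defaults); the function name `knn_predict` and the toy points are hypothetical, and this is not how scikit-learn implements the search internally.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Rank training points by Euclidean distance to the query.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical, already-scaled 2-feature points with 0/1 labels.
train_X = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.2), (0.9, 1.0), (1.0, 0.8)]
train_y = [0, 0, 0, 1, 1]

pred_low = knn_predict(train_X, train_y, (0.15, 0.1), k=3)
pred_high = knn_predict(train_X, train_y, (0.95, 0.9), k=3)
print(pred_low, pred_high)
```

Because the vote depends on distances, features on larger scales would dominate the result, which is why the data was scaled before fitting.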
pred = knn.predict(X_test)
pd.DataFrame(pred, y_test)
0 | |
---|---|
Churn_Yes | |
0.0 | 0.0 |
0.0 | 0.0 |
0.0 | 0.0 |
0.0 | 0.0 |
0.0 | 0.0 |
... | ... |
1.0 | 1.0 |
0.0 | 0.0 |
0.0 | 0.0 |
0.0 | 0.0 |
0.0 | 0.0 |
2113 rows × 1 columns
Here we build the DataFrame from a dictionary instead, which is easier to read.
pd.DataFrame({'actual_value': y_test, 'pred_value': pred})
actual_value | pred_value | |
---|---|---|
4880 | 0.0 | 0.0 |
1541 | 0.0 | 0.0 |
1289 | 0.0 | 0.0 |
5745 | 0.0 | 0.0 |
4873 | 0.0 | 0.0 |
... | ... | ... |
1285 | 1.0 | 1.0 |
5092 | 0.0 | 0.0 |
5837 | 0.0 | 0.0 |
3597 | 0.0 | 0.0 |
3625 | 0.0 | 0.0 |
2113 rows × 2 columns
pd.DataFrame({'actual_value': y_test, 'pred_value': pred}).head(30)
actual_value | pred_value | |
---|---|---|
4880 | 0.0 | 0.0 |
1541 | 0.0 | 0.0 |
1289 | 0.0 | 0.0 |
5745 | 0.0 | 0.0 |
4873 | 0.0 | 0.0 |
4168 | 0.0 | 0.0 |
1557 | 0.0 | 0.0 |
2892 | 0.0 | 0.0 |
664 | 0.0 | 0.0 |
1588 | 0.0 | 0.0 |
1338 | 1.0 | 0.0 |
6000 | 0.0 | 0.0 |
2310 | 0.0 | 0.0 |
3294 | 1.0 | 1.0 |
290 | 1.0 | 1.0 |
2505 | 0.0 | 0.0 |
3171 | 0.0 | 0.0 |
1366 | 1.0 | 1.0 |
6560 | 0.0 | 0.0 |
2420 | 0.0 | 0.0 |
5210 | 1.0 | 0.0 |
2836 | 0.0 | 1.0 |
1325 | 1.0 | 1.0 |
4900 | 1.0 | 0.0 |
6311 | 0.0 | 0.0 |
1025 | 0.0 | 0.0 |
2031 | 0.0 | 0.0 |
4459 | 1.0 | 1.0 |
5324 | 0.0 | 0.0 |
3441 | 0.0 | 0.0 |
Finding the optimal n_neighbors
from sklearn.metrics import accuracy_score, confusion_matrix
accuracy_score(y_test, pred)
0.7581637482252721
confusion_matrix(y_test, pred)
array([[1353, 194], [ 317, 249]])
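The accuracy score can be recovered from the confusion matrix, where rows are actual classes and columns are predicted classes, so the diagonal holds the correct predictions. The sketch below uses the matrix printed above.

```python
# Confusion matrix from above; rows are actual classes, columns are predicted classes.
cm = [[1353, 194],
      [317, 249]]

correct = cm[0][0] + cm[1][1]              # diagonal: correctly predicted rows
total = sum(sum(row) for row in cm)        # all test rows
accuracy = correct / total
print(correct, total, accuracy)
```

The result, 1602 correct out of 2113, agrees with the accuracy_score output of about 0.758.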
Try n_neighbors = 1 to 100 with a for loop
error_list = []
for i in range(1, 101):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(X_train, y_train)
    pred = knn.predict(X_test)
    # despite its name, error_list collects accuracy scores (higher is better)
    error_list.append(accuracy_score(y_test, pred))
error_list
[0.7136772361571225, 0.7491717936583058, 0.7439659252247989, 0.7557974443918599, 0.7482252721249408, 0.7581637482252721, 0.7458589682915286, 0.759110269758637, 0.7529578797917653, 0.7581637482252721, 0.7624230951254141, 0.7610033128253668, 0.7600567912920019, 0.7671557027922385, 0.7662091812588736, 0.7681022243256034, 0.7690487458589683, 0.7728348319924279, 0.7681022243256034, 0.7723615712257453, 0.7681022243256034, 0.7714150496923805, 0.7699952673923331, 0.7652626597255088, 0.767628963558921, 0.7714150496923805, 0.7733080927591103, 0.7728348319924279, 0.7733080927591103, 0.7714150496923805, 0.7699952673923331, 0.767628963558921, 0.7709417889256981, 0.7704685281590156, 0.7742546142924751, 0.7737813535257927, 0.7733080927591103, 0.77520113582584, 0.7718883104590629, 0.7737813535257927, 0.7723615712257453, 0.7775674396592522, 0.7766209181258874, 0.7747278750591576, 0.7756743965925225, 0.7766209181258874, 0.7775674396592522, 0.7775674396592522, 0.7742546142924751, 0.7709417889256981, 0.7714150496923805, 0.7756743965925225, 0.77520113582584, 0.780407004259347, 0.7742546142924751, 0.7770941788925698, 0.7761476573592049, 0.77520113582584, 0.7747278750591576, 0.7699952673923331, 0.7704685281590156, 0.7695220066256507, 0.7709417889256981, 0.7671557027922385, 0.7690487458589683, 0.7681022243256034, 0.7709417889256981, 0.7728348319924279, 0.7709417889256981, 0.7695220066256507, 0.7742546142924751, 0.7714150496923805, 0.7709417889256981, 0.7718883104590629, 0.7690487458589683, 0.7723615712257453, 0.7718883104590629, 0.7742546142924751, 0.7718883104590629, 0.7723615712257453, 0.7733080927591103, 0.7690487458589683, 0.7709417889256981, 0.7714150496923805, 0.7685754850922859, 0.7652626597255088, 0.7695220066256507, 0.7685754850922859, 0.7681022243256034, 0.7709417889256981, 0.7690487458589683, 0.7699952673923331, 0.7685754850922859, 0.7718883104590629, 0.7666824420255561, 0.7690487458589683, 0.767628963558921, 0.7695220066256507, 0.7681022243256034, 0.7690487458589683]
plt.figure(figsize=(20,10))
sns.lineplot(x = range(1,101), y = error_list, marker = 'o', markersize = 8 , markerfacecolor = 'red')
<AxesSubplot:>
Which n_neighbors (x-axis) gives the highest accuracy (y-axis)? The plot shows roughly where the peak is. Let's find the exact point in code.
Method 1
max(error_list)
0.780407004259347
error_list.index(max(error_list))
53
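With n_neighbors = 54, accuracy peaks at 0.780407004259347. List indices start at 0 while n_neighbors starts at 1, so index 53 corresponds to n_neighbors = 54, not 53.
Method 2: convert to a NumPy array, which has a built-in method for this
np.array(error_list) # convert to a NumPy array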
array([0.71367724, 0.74917179, 0.74396593, 0.75579744, 0.74822527, 0.75816375, 0.74585897, 0.75911027, 0.75295788, 0.75816375, 0.7624231 , 0.76100331, 0.76005679, 0.7671557 , 0.76620918, 0.76810222, 0.76904875, 0.77283483, 0.76810222, 0.77236157, 0.76810222, 0.77141505, 0.76999527, 0.76526266, 0.76762896, 0.77141505, 0.77330809, 0.77283483, 0.77330809, 0.77141505, 0.76999527, 0.76762896, 0.77094179, 0.77046853, 0.77425461, 0.77378135, 0.77330809, 0.77520114, 0.77188831, 0.77378135, 0.77236157, 0.77756744, 0.77662092, 0.77472788, 0.7756744 , 0.77662092, 0.77756744, 0.77756744, 0.77425461, 0.77094179, 0.77141505, 0.7756744 , 0.77520114, 0.780407 , 0.77425461, 0.77709418, 0.77614766, 0.77520114, 0.77472788, 0.76999527, 0.77046853, 0.76952201, 0.77094179, 0.7671557 , 0.76904875, 0.76810222, 0.77094179, 0.77283483, 0.77094179, 0.76952201, 0.77425461, 0.77141505, 0.77094179, 0.77188831, 0.76904875, 0.77236157, 0.77188831, 0.77425461, 0.77188831, 0.77236157, 0.77330809, 0.76904875, 0.77094179, 0.77141505, 0.76857549, 0.76526266, 0.76952201, 0.76857549, 0.76810222, 0.77094179, 0.76904875, 0.76999527, 0.76857549, 0.77188831, 0.76668244, 0.76904875, 0.76762896, 0.76952201, 0.76810222, 0.76904875])
np.array(error_list).argmax()
53
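One way to sidestep the off-by-one conversion is to pair each score with its n_neighbors value before taking the maximum. The sketch below uses a short hypothetical list of scores rather than the 100 values above.

```python
# Hypothetical accuracy scores for n_neighbors = 1, 2, 3, 4, 5.
scores = [0.71, 0.75, 0.78, 0.76, 0.78]

# enumerate(start=1) pairs each score with its n_neighbors value directly.
best_k, best_score = max(enumerate(scores, start=1), key=lambda pair: pair[1])
print(best_k, best_score)

# Equivalent index-based lookup: index of the first maximum, plus one.
index_based_k = scores.index(max(scores)) + 1
print(index_based_k)
```

When several values tie for the maximum, both approaches return the first one, that is, the smallest n_neighbors.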
Both methods show that accuracy is highest at n_neighbors = 54. Let's refit the model with this value and check the result.
knn = KNeighborsClassifier(n_neighbors=54)
knn.fit(X_train, y_train)
KNeighborsClassifier(n_neighbors=54)
pred = knn.predict(X_test)
accuracy_score(y_test, pred)
0.780407004259347
confusion_matrix(y_test, pred)
array([[1332, 215], [ 249, 317]])
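To see where the change comes from, the sketch below compares per-class recall (the share of each actual class predicted correctly) for the two confusion matrices printed in this post. The helper name `per_class_recall` is hypothetical.

```python
def per_class_recall(cm):
    """Fraction of each actual class (row) that was predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(cm)]

cm_k10 = [[1353, 194], [317, 249]]   # n_neighbors = 10
cm_k54 = [[1332, 215], [249, 317]]   # n_neighbors = 54

recall_k10 = [round(r, 3) for r in per_class_recall(cm_k10)]
recall_k54 = [round(r, 3) for r in per_class_recall(cm_k54)]
print(recall_k10)
print(recall_k54)
```

With these numbers, recall for churners (class 1) rises from about 0.44 to about 0.56, while recall for non-churners (class 0) dips slightly from about 0.875 to about 0.861.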
Tuning n_neighbors raised accuracy slightly, from about 0.758 (n_neighbors = 10) to about 0.780 (n_neighbors = 54).
- Source: fast campus, "파이썬을 활용한 이커머스 데이터 분석" (E-commerce Data Analysis with Python)