Notes on the code and key takeaways from following along with the course "E-commerce Data Analysis with Python".
- Source: Fast Campus
Chapter 08. Shopping Mall Sales Forecasting (Time Series)
Goals of the analysis
- Manipulating time-related data in various ways
- Time-series forecasting with fbprophet
- Forecasting with an AutoRegressive algorithm
1. Manipulating Time-Related Data in Various Ways
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
today = datetime.now()
today
datetime.datetime(2021, 7, 8, 8, 49, 6, 223234)
today.day
8
today.year
2021
today.month
7
today
datetime.datetime(2021, 7, 8, 8, 49, 6, 223234)
The parameters of timedelta can be adjusted: days, hours, minutes, seconds, weeks, and so on. (Note that timedelta does not accept years or months; for calendar offsets like those, pandas.DateOffset is an option.)
today + timedelta(days = 10)
datetime.datetime(2021, 7, 18, 8, 49, 6, 223234)
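A few more timedelta operations, as a quick sketch (the keyword arguments shown are ones datetime.timedelta actually accepts):
today - timedelta(weeks=2)                # two weeks ago
today + timedelta(hours=3, minutes=30)    # 3.5 hours from now
timedelta(days=1).total_seconds()         # 86400.0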
Let's build a simple DataFrame (the values themselves are meaningless).
data = pd.DataFrame({'date':['7/6/2021', '7/7/2021', '7/8/2021'], 'value':[10,20,30]})
data
date | value | |
---|---|---|
0 | 7/6/2021 | 10 |
1 | 7/7/2021 | 20 |
2 | 7/8/2021 | 30 |
data.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 3 entries, 0 to 2 Data columns (total 2 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 date 3 non-null object 1 value 3 non-null int64 dtypes: int64(1), object(1) memory usage: 176.0+ bytes
Looking at the Dtype, date is an object column: pandas is treating it as a string, which causes plenty of headaches when working with it, so let's convert it.
- Use the pd.to_datetime() function.
pd.to_datetime(data['date'])
0 2021-07-06 1 2021-07-07 2 2021-07-08 Name: date, dtype: datetime64[ns]
The dtype has changed from object to datetime64.
data['date'] = pd.to_datetime(data['date'])
data.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 3 entries, 0 to 2 Data columns (total 2 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 date 3 non-null datetime64[ns] 1 value 3 non-null int64 dtypes: datetime64[ns](1), int64(1) memory usage: 176.0 bytes
data['date'].loc[0]
Timestamp('2021-07-06 00:00:00')
data['date'].loc[0].month
7
data = pd.DataFrame({'date':['7-6-2021 1:11:11', '7-7-2021 2:11:11', '7-8-2021 3:11:11'], 'value':[10,20,30]})
data
date | value | |
---|---|---|
0 | 7-6-2021 1:11:11 | 10 |
1 | 7-7-2021 2:11:11 | 20 |
2 | 7-8-2021 3:11:11 | 30 |
data.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 3 entries, 0 to 2 Data columns (total 2 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 date 3 non-null object 1 value 3 non-null int64 dtypes: int64(1), object(1) memory usage: 176.0+ bytes
pd.to_datetime(data['date'])
0 2021-07-06 01:11:11 1 2021-07-07 02:11:11 2 2021-07-08 03:11:11 Name: date, dtype: datetime64[ns]
We converted it the same way as before: the dtype changed, and you can see for yourself that pandas is smart enough to reorder the values into year-month-day order on its own.
data['date'] = pd.to_datetime(data['date'])
data.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 3 entries, 0 to 2 Data columns (total 2 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 date 3 non-null datetime64[ns] 1 value 3 non-null int64 dtypes: datetime64[ns](1), int64(1) memory usage: 176.0 bytes
data['date'].loc[0]
Timestamp('2021-07-06 01:11:11')
data['date'].loc[0].day
6
data['date'].loc[0].second
11
data = pd.DataFrame({'date': ['21-7-8', '21-7-9', '21-7-10'], 'value': [10,20,30] })
data
date | value | |
---|---|---|
0 | 21-7-8 | 10 |
1 | 21-7-9 | 20 |
2 | 21-7-10 | 30 |
pd.to_datetime(data['date'])
0 2008-07-21 1 2009-07-21 2 2010-07-21 Name: date, dtype: datetime64[ns]
The dates were parsed American-style: pandas read '21' as the day and '8' as a two-digit year (2008) instead of reading the year first. Let's fix this by adding the format parameter.
pd.to_datetime(data['date'], format = '%y-%m-%d')
0 2021-07-08 1 2021-07-09 2 2021-07-10 Name: date, dtype: datetime64[ns]
This way we can tell pandas exactly which date format to parse.
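A couple more format directives, as a sketch (the sample strings here are made up; the directives themselves are standard strftime codes):
pd.to_datetime(pd.Series(['08/07/2021']), format='%d/%m/%Y')             # day first
pd.to_datetime(pd.Series(['2021.07.08 14:30']), format='%Y.%m.%d %H:%M') # with a time component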
One last thing to look at regarding datetime columns: let's load an Excel file.
data = pd.read_excel('./data/Superstore.xls')
data.head()
Row ID | Order ID | Order Date | Ship Date | Ship Mode | Customer ID | Customer Name | Segment | Country | City | ... | Postal Code | Region | Product ID | Category | Sub-Category | Product Name | Sales | Quantity | Discount | Profit | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | CA-2016-152156 | 2016-11-08 | 2016-11-11 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | ... | 42420 | South | FUR-BO-10001798 | Furniture | Bookcases | Bush Somerset Collection Bookcase | 261.9600 | 2 | 0.00 | 41.9136 |
1 | 2 | CA-2016-152156 | 2016-11-08 | 2016-11-11 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | ... | 42420 | South | FUR-CH-10000454 | Furniture | Chairs | Hon Deluxe Fabric Upholstered Stacking Chairs,... | 731.9400 | 3 | 0.00 | 219.5820 |
2 | 3 | CA-2016-138688 | 2016-06-12 | 2016-06-16 | Second Class | DV-13045 | Darrin Van Huff | Corporate | United States | Los Angeles | ... | 90036 | West | OFF-LA-10000240 | Office Supplies | Labels | Self-Adhesive Address Labels for Typewriters b... | 14.6200 | 2 | 0.00 | 6.8714 |
3 | 4 | US-2015-108966 | 2015-10-11 | 2015-10-18 | Standard Class | SO-20335 | Sean O'Donnell | Consumer | United States | Fort Lauderdale | ... | 33311 | South | FUR-TA-10000577 | Furniture | Tables | Bretford CR4500 Series Slim Rectangular Table | 957.5775 | 5 | 0.45 | -383.0310 |
4 | 5 | US-2015-108966 | 2015-10-11 | 2015-10-18 | Standard Class | SO-20335 | Sean O'Donnell | Consumer | United States | Fort Lauderdale | ... | 33311 | South | OFF-ST-10000760 | Office Supplies | Storage | Eldon Fold 'N Roll Cart System | 22.3680 | 2 | 0.20 | 2.5164 |
5 rows × 21 columns
Let's index only the columns we need so the data stays tidy.
data = data[['Order Date', 'Ship Date', 'Sales', 'Profit']]
data
Order Date | Ship Date | Sales | Profit | |
---|---|---|---|---|
0 | 2016-11-08 | 2016-11-11 | 261.9600 | 41.9136 |
1 | 2016-11-08 | 2016-11-11 | 731.9400 | 219.5820 |
2 | 2016-06-12 | 2016-06-16 | 14.6200 | 6.8714 |
3 | 2015-10-11 | 2015-10-18 | 957.5775 | -383.0310 |
4 | 2015-10-11 | 2015-10-18 | 22.3680 | 2.5164 |
... | ... | ... | ... | ... |
9989 | 2014-01-21 | 2014-01-23 | 25.2480 | 4.1028 |
9990 | 2017-02-26 | 2017-03-03 | 91.9600 | 15.6332 |
9991 | 2017-02-26 | 2017-03-03 | 258.5760 | 19.3932 |
9992 | 2017-02-26 | 2017-03-03 | 29.6000 | 13.3200 |
9993 | 2017-05-04 | 2017-05-09 | 243.1600 | 72.9480 |
9994 rows × 4 columns
data.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 9994 entries, 0 to 9993 Data columns (total 4 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Order Date 9994 non-null datetime64[ns] 1 Ship Date 9994 non-null datetime64[ns] 2 Sales 9994 non-null float64 3 Profit 9994 non-null float64 dtypes: datetime64[ns](2), float64(2) memory usage: 312.4 KB
The thing to check here is whether the Date columns' Dtype comes out as datetime. If they load as strings, use the pd.to_datetime() function.
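If the dates had come in as strings, one option is pd.to_datetime() after loading, as before; another, as a sketch, is asking pandas to parse them at read time (parse_dates is a standard read_excel parameter):
data = pd.read_excel('./data/Superstore.xls', parse_dates=['Order Date', 'Ship Date'])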
Does '-' work between two datetimes? Let's subtract Order Date from Ship Date (order date and ship date).
data['gap'] = data['Ship Date'] - data['Order Date']
data
Order Date | Ship Date | Sales | Profit | gap | |
---|---|---|---|---|---|
0 | 2016-11-08 | 2016-11-11 | 261.9600 | 41.9136 | 3 days |
1 | 2016-11-08 | 2016-11-11 | 731.9400 | 219.5820 | 3 days |
2 | 2016-06-12 | 2016-06-16 | 14.6200 | 6.8714 | 4 days |
3 | 2015-10-11 | 2015-10-18 | 957.5775 | -383.0310 | 7 days |
4 | 2015-10-11 | 2015-10-18 | 22.3680 | 2.5164 | 7 days |
... | ... | ... | ... | ... | ... |
9989 | 2014-01-21 | 2014-01-23 | 25.2480 | 4.1028 | 2 days |
9990 | 2017-02-26 | 2017-03-03 | 91.9600 | 15.6332 | 5 days |
9991 | 2017-02-26 | 2017-03-03 | 258.5760 | 19.3932 | 5 days |
9992 | 2017-02-26 | 2017-03-03 | 29.6000 | 13.3200 | 5 days |
9993 | 2017-05-04 | 2017-05-09 | 243.1600 | 72.9480 | 5 days |
9994 rows × 5 columns
Let's fit a Linear Regression as a quick test.
from sklearn.linear_model import LinearRegression
model = LinearRegression()
X = data[['Profit', 'gap']]
y = data['Sales']
X
Profit | gap | |
---|---|---|
0 | 41.9136 | 3 days |
1 | 219.5820 | 3 days |
2 | 6.8714 | 4 days |
3 | -383.0310 | 7 days |
4 | 2.5164 | 7 days |
... | ... | ... |
9989 | 4.1028 | 2 days |
9990 | 15.6332 | 5 days |
9991 | 19.3932 | 5 days |
9992 | 13.3200 | 5 days |
9993 | 72.9480 | 5 days |
9994 rows × 2 columns
X.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 9994 entries, 0 to 9993 Data columns (total 2 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Profit 9994 non-null float64 1 gap 9994 non-null timedelta64[ns] dtypes: float64(1), timedelta64[ns](1) memory usage: 156.3 KB
The gap column shows up as timedelta64[ns]; this step checks whether it can be used as an independent variable.
model.fit(X, y)
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-43-6d10fe8973eb> in <module> ----> 1 model.fit(X, y) ... (sklearn validation frames trimmed) ... TypeError: invalid type promotion
As the error shows, a timedelta64[ns] column cannot be used as an independent variable directly. Let's fix that.
- First method: use .apply(lambda)
X['gap'] = X['gap'].apply(lambda x: x.days)
/home/ubuntu/.local/lib/python3.6/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy """Entry point for launching an IPython kernel.
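The warning appears because X was created as a slice of data. A sketch of how to avoid it, using an explicit .copy() and the vectorized .dt.days accessor (equivalent to the apply above):
X = data[['Profit', 'gap']].copy()   # explicit copy, so the assignment below is unambiguous
X['gap'] = X['gap'].dt.days          # vectorized equivalent of .apply(lambda x: x.days)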
X
Profit | gap | |
---|---|---|
0 | 41.9136 | 3 |
1 | 219.5820 | 3 |
2 | 6.8714 | 4 |
3 | -383.0310 | 7 |
4 | 2.5164 | 7 |
... | ... | ... |
9989 | 4.1028 | 2 |
9990 | 15.6332 | 5 |
9991 | 19.3932 | 5 |
9992 | 13.3200 | 5 |
9993 | 72.9480 | 5 |
9994 rows × 2 columns
model.fit(X, y)
LinearRegression()
model.predict(X)
array([248.5051552 , 474.93985046, 202.01637137, ..., 216.14692021, 208.40675263, 284.40140338])
- Second method: convert the datetime values themselves into numbers, expressing every timestamp in seconds: a Unix timestamp.
data = pd.read_excel('./data/Superstore.xls')
data.head()
Row ID | Order ID | Order Date | Ship Date | Ship Mode | Customer ID | Customer Name | Segment | Country | City | ... | Postal Code | Region | Product ID | Category | Sub-Category | Product Name | Sales | Quantity | Discount | Profit | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | CA-2016-152156 | 2016-11-08 | 2016-11-11 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | ... | 42420 | South | FUR-BO-10001798 | Furniture | Bookcases | Bush Somerset Collection Bookcase | 261.9600 | 2 | 0.00 | 41.9136 |
1 | 2 | CA-2016-152156 | 2016-11-08 | 2016-11-11 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | ... | 42420 | South | FUR-CH-10000454 | Furniture | Chairs | Hon Deluxe Fabric Upholstered Stacking Chairs,... | 731.9400 | 3 | 0.00 | 219.5820 |
2 | 3 | CA-2016-138688 | 2016-06-12 | 2016-06-16 | Second Class | DV-13045 | Darrin Van Huff | Corporate | United States | Los Angeles | ... | 90036 | West | OFF-LA-10000240 | Office Supplies | Labels | Self-Adhesive Address Labels for Typewriters b... | 14.6200 | 2 | 0.00 | 6.8714 |
3 | 4 | US-2015-108966 | 2015-10-11 | 2015-10-18 | Standard Class | SO-20335 | Sean O'Donnell | Consumer | United States | Fort Lauderdale | ... | 33311 | South | FUR-TA-10000577 | Furniture | Tables | Bretford CR4500 Series Slim Rectangular Table | 957.5775 | 5 | 0.45 | -383.0310 |
4 | 5 | US-2015-108966 | 2015-10-11 | 2015-10-18 | Standard Class | SO-20335 | Sean O'Donnell | Consumer | United States | Fort Lauderdale | ... | 33311 | South | OFF-ST-10000760 | Office Supplies | Storage | Eldon Fold 'N Roll Cart System | 22.3680 | 2 | 0.20 | 2.5164 |
5 rows × 21 columns
Again, keep only the columns we need.
data = data[['Order Date', 'Ship Date', 'Sales', 'Profit']]
data
Order Date | Ship Date | Sales | Profit | |
---|---|---|---|---|
0 | 2016-11-08 | 2016-11-11 | 261.9600 | 41.9136 |
1 | 2016-11-08 | 2016-11-11 | 731.9400 | 219.5820 |
2 | 2016-06-12 | 2016-06-16 | 14.6200 | 6.8714 |
3 | 2015-10-11 | 2015-10-18 | 957.5775 | -383.0310 |
4 | 2015-10-11 | 2015-10-18 | 22.3680 | 2.5164 |
... | ... | ... | ... | ... |
9989 | 2014-01-21 | 2014-01-23 | 25.2480 | 4.1028 |
9990 | 2017-02-26 | 2017-03-03 | 91.9600 | 15.6332 |
9991 | 2017-02-26 | 2017-03-03 | 258.5760 | 19.3932 |
9992 | 2017-02-26 | 2017-03-03 | 29.6000 | 13.3200 |
9993 | 2017-05-04 | 2017-05-09 | 243.1600 | 72.9480 |
9994 rows × 4 columns
pd.to_numeric(data['Order Date'])
0 1478563200000000000 1 1478563200000000000 2 1465689600000000000 3 1444521600000000000 4 1444521600000000000 ... 9989 1390262400000000000 9990 1488067200000000000 9991 1488067200000000000 9992 1488067200000000000 9993 1493856000000000000 Name: Order Date, Length: 9994, dtype: int64
There are too many zeros, and they're unnecessary: the values are nanoseconds, so divide by 10**9 to get seconds.
pd.to_numeric(data['Order Date']) / 10**9
0 1.478563e+09 1 1.478563e+09 2 1.465690e+09 3 1.444522e+09 4 1.444522e+09 ... 9989 1.390262e+09 9990 1.488067e+09 9991 1.488067e+09 9992 1.488067e+09 9993 1.493856e+09 Name: Order Date, Length: 9994, dtype: float64
pd.set_option('display.float_format', lambda x: '%.0f' % x)
pd.to_numeric(data['Order Date']) / 10**9
0 1478563200 1 1478563200 2 1465689600 3 1444521600 4 1444521600 ... 9989 1390262400 9990 1488067200 9991 1488067200 9992 1488067200 9993 1493856000 Name: Order Date, Length: 9994, dtype: float64
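Two related sketches: the datetime64[ns] values can equivalently be cast to integer nanoseconds with .astype('int64'), and a Unix timestamp converts back to a datetime via pd.to_datetime's unit parameter:
data['Order Date'].astype('int64') // 10**9   # integer seconds since the Unix epoch
pd.to_datetime(1478563200, unit='s')          # Timestamp('2016-11-08 00:00:00')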
data['Order Date'] = pd.to_numeric(data['Order Date']) / 10**9
data['Ship Date'] = pd.to_numeric(data['Ship Date']) / 10**9
data
Order Date | Ship Date | Sales | Profit | |
---|---|---|---|---|
0 | 1478563200 | 1478822400 | 262 | 42 |
1 | 1478563200 | 1478822400 | 732 | 220 |
2 | 1465689600 | 1466035200 | 15 | 7 |
3 | 1444521600 | 1445126400 | 958 | -383 |
4 | 1444521600 | 1445126400 | 22 | 3 |
... | ... | ... | ... | ... |
9989 | 1390262400 | 1390435200 | 25 | 4 |
9990 | 1488067200 | 1488499200 | 92 | 16 |
9991 | 1488067200 | 1488499200 | 259 | 19 |
9992 | 1488067200 | 1488499200 | 30 | 13 |
9993 | 1493856000 | 1494288000 | 243 | 73 |
9994 rows × 4 columns
data['gap'] = data['Ship Date'] - data['Order Date']
data
Order Date | Ship Date | Sales | Profit | gap | |
---|---|---|---|---|---|
0 | 1478563200 | 1478822400 | 262 | 42 | 259200 |
1 | 1478563200 | 1478822400 | 732 | 220 | 259200 |
2 | 1465689600 | 1466035200 | 15 | 7 | 345600 |
3 | 1444521600 | 1445126400 | 958 | -383 | 604800 |
4 | 1444521600 | 1445126400 | 22 | 3 | 604800 |
... | ... | ... | ... | ... | ... |
9989 | 1390262400 | 1390435200 | 25 | 4 | 172800 |
9990 | 1488067200 | 1488499200 | 92 | 16 | 432000 |
9991 | 1488067200 | 1488499200 | 259 | 19 | 432000 |
9992 | 1488067200 | 1488499200 | 30 | 13 | 432000 |
9993 | 1493856000 | 1494288000 | 243 | 73 | 432000 |
9994 rows × 5 columns
X = data[['Profit', 'gap']]
y = data['Sales']
model = LinearRegression()
model.fit(X,y)
LinearRegression()
This time it runs without error. That wraps up Part 1, manipulating time data in various ways.
2. Time-Series Forecasting with fbprophet
!pip install pystan
Collecting pystan ... Successfully installed Cython-0.29.23 numpy-1.16.6 pystan-2.19.1.1
!pip install plotly
Collecting plotly ... Successfully installed plotly-4.14.3 retrying-1.3.3 six-1.16.0
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
data = pd.read_excel('./data/Superstore.xls', index_col=0)
data.head()
Order ID | Order Date | Ship Date | Ship Mode | Customer ID | Customer Name | Segment | Country | City | State | Postal Code | Region | Product ID | Category | Sub-Category | Product Name | Sales | Quantity | Discount | Profit | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Row ID | ||||||||||||||||||||
1 | CA-2016-152156 | 2016-11-08 | 2016-11-11 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | Kentucky | 42420 | South | FUR-BO-10001798 | Furniture | Bookcases | Bush Somerset Collection Bookcase | 261.9600 | 2 | 0.00 | 41.9136 |
2 | CA-2016-152156 | 2016-11-08 | 2016-11-11 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | Kentucky | 42420 | South | FUR-CH-10000454 | Furniture | Chairs | Hon Deluxe Fabric Upholstered Stacking Chairs,... | 731.9400 | 3 | 0.00 | 219.5820 |
3 | CA-2016-138688 | 2016-06-12 | 2016-06-16 | Second Class | DV-13045 | Darrin Van Huff | Corporate | United States | Los Angeles | California | 90036 | West | OFF-LA-10000240 | Office Supplies | Labels | Self-Adhesive Address Labels for Typewriters b... | 14.6200 | 2 | 0.00 | 6.8714 |
4 | US-2015-108966 | 2015-10-11 | 2015-10-18 | Standard Class | SO-20335 | Sean O'Donnell | Consumer | United States | Fort Lauderdale | Florida | 33311 | South | FUR-TA-10000577 | Furniture | Tables | Bretford CR4500 Series Slim Rectangular Table | 957.5775 | 5 | 0.45 | -383.0310 |
5 | US-2015-108966 | 2015-10-11 | 2015-10-18 | Standard Class | SO-20335 | Sean O'Donnell | Consumer | United States | Fort Lauderdale | Florida | 33311 | South | OFF-ST-10000760 | Office Supplies | Storage | Eldon Fold 'N Roll Cart System | 22.3680 | 2 | 0.20 | 2.5164 |
For a time series there's no need to examine many columns; let's look only at the ones we care about.
data.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 9994 entries, 1 to 9994 Data columns (total 20 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Order ID 9994 non-null object 1 Order Date 9994 non-null datetime64[ns] 2 Ship Date 9994 non-null datetime64[ns] 3 Ship Mode 9994 non-null object 4 Customer ID 9994 non-null object 5 Customer Name 9994 non-null object 6 Segment 9994 non-null object 7 Country 9994 non-null object 8 City 9994 non-null object 9 State 9994 non-null object 10 Postal Code 9994 non-null int64 11 Region 9994 non-null object 12 Product ID 9994 non-null object 13 Category 9994 non-null object 14 Sub-Category 9994 non-null object 15 Product Name 9994 non-null object 16 Sales 9994 non-null float64 17 Quantity 9994 non-null int64 18 Discount 9994 non-null float64 19 Profit 9994 non-null float64 dtypes: datetime64[ns](2), float64(3), int64(2), object(13) memory usage: 1.6+ MB
No missing values; now check each Dtype.
data.describe()
Postal Code | Sales | Quantity | Discount | Profit | |
---|---|---|---|---|---|
count | 9994.000000 | 9994.000000 | 9994.000000 | 9994.000000 | 9994.000000 |
mean | 55190.379428 | 229.858001 | 3.789574 | 0.156203 | 28.656896 |
std | 32063.693350 | 623.245101 | 2.225110 | 0.206452 | 234.260108 |
min | 1040.000000 | 0.444000 | 1.000000 | 0.000000 | -6599.978000 |
25% | 23223.000000 | 17.280000 | 2.000000 | 0.000000 | 1.728750 |
50% | 56430.500000 | 54.490000 | 3.000000 | 0.200000 | 8.666500 |
75% | 90008.000000 | 209.940000 | 5.000000 | 0.200000 | 29.364000 |
max | 99301.000000 | 22638.480000 | 14.000000 | 0.800000 | 8399.976000 |
The max of Sales looks like it could be an outlier, since the value is so much larger than the rest.
data['Category'].unique()
array(['Furniture', 'Office Supplies', 'Technology'], dtype=object)
The .unique() method lets us check the distinct Category values.
data['Category'].value_counts()
Office Supplies 6026 Furniture 2121 Technology 1847 Name: Category, dtype: int64
The .value_counts() method lets us check each Category value along with its count.
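value_counts() can also report proportions instead of raw counts (normalize is a standard parameter), as a sketch:
data['Category'].value_counts(normalize=True)   # same breakdown as fractions of the total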
data['Category'] == 'Furniture' ## inspect the boolean True/False mask
Row ID 1 True 2 True 3 False 4 True 5 False ... 9990 True 9991 True 9992 False 9993 False 9994 False Name: Category, Length: 9994, dtype: bool
data.head() # compare against the mask above
Order ID | Order Date | Ship Date | Ship Mode | Customer ID | Customer Name | Segment | Country | City | State | Postal Code | Region | Product ID | Category | Sub-Category | Product Name | Sales | Quantity | Discount | Profit | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Row ID | ||||||||||||||||||||
1 | CA-2016-152156 | 2016-11-08 | 2016-11-11 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | Kentucky | 42420 | South | FUR-BO-10001798 | Furniture | Bookcases | Bush Somerset Collection Bookcase | 261.9600 | 2 | 0.00 | 41.9136 |
2 | CA-2016-152156 | 2016-11-08 | 2016-11-11 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | Kentucky | 42420 | South | FUR-CH-10000454 | Furniture | Chairs | Hon Deluxe Fabric Upholstered Stacking Chairs,... | 731.9400 | 3 | 0.00 | 219.5820 |
3 | CA-2016-138688 | 2016-06-12 | 2016-06-16 | Second Class | DV-13045 | Darrin Van Huff | Corporate | United States | Los Angeles | California | 90036 | West | OFF-LA-10000240 | Office Supplies | Labels | Self-Adhesive Address Labels for Typewriters b... | 14.6200 | 2 | 0.00 | 6.8714 |
4 | US-2015-108966 | 2015-10-11 | 2015-10-18 | Standard Class | SO-20335 | Sean O'Donnell | Consumer | United States | Fort Lauderdale | Florida | 33311 | South | FUR-TA-10000577 | Furniture | Tables | Bretford CR4500 Series Slim Rectangular Table | 957.5775 | 5 | 0.45 | -383.0310 |
5 | US-2015-108966 | 2015-10-11 | 2015-10-18 | Standard Class | SO-20335 | Sean O'Donnell | Consumer | United States | Fort Lauderdale | Florida | 33311 | South | OFF-ST-10000760 | Office Supplies | Storage | Eldon Fold 'N Roll Cart System | 22.3680 | 2 | 0.20 | 2.5164 |
data[data['Category'] == 'Furniture'] # passing the mask into data[] keeps only the rows where it is True
Order ID | Order Date | Ship Date | Ship Mode | Customer ID | Customer Name | Segment | Country | City | State | Postal Code | Region | Product ID | Category | Sub-Category | Product Name | Sales | Quantity | Discount | Profit | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Row ID | ||||||||||||||||||||
1 | CA-2016-152156 | 2016-11-08 | 2016-11-11 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | Kentucky | 42420 | South | FUR-BO-10001798 | Furniture | Bookcases | Bush Somerset Collection Bookcase | 261.9600 | 2 | 0.00 | 41.9136 |
2 | CA-2016-152156 | 2016-11-08 | 2016-11-11 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | Kentucky | 42420 | South | FUR-CH-10000454 | Furniture | Chairs | Hon Deluxe Fabric Upholstered Stacking Chairs,... | 731.9400 | 3 | 0.00 | 219.5820 |
4 | US-2015-108966 | 2015-10-11 | 2015-10-18 | Standard Class | SO-20335 | Sean O'Donnell | Consumer | United States | Fort Lauderdale | Florida | 33311 | South | FUR-TA-10000577 | Furniture | Tables | Bretford CR4500 Series Slim Rectangular Table | 957.5775 | 5 | 0.45 | -383.0310 |
6 | CA-2014-115812 | 2014-06-09 | 2014-06-14 | Standard Class | BH-11710 | Brosina Hoffman | Consumer | United States | Los Angeles | California | 90032 | West | FUR-FU-10001487 | Furniture | Furnishings | Eldon Expressions Wood and Plastic Desk Access... | 48.8600 | 7 | 0.00 | 14.1694 |
11 | CA-2014-115812 | 2014-06-09 | 2014-06-14 | Standard Class | BH-11710 | Brosina Hoffman | Consumer | United States | Los Angeles | California | 90032 | West | FUR-TA-10001539 | Furniture | Tables | Chromcraft Rectangular Conference Tables | 1706.1840 | 9 | 0.20 | 85.3092 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
9963 | CA-2015-168088 | 2015-03-19 | 2015-03-22 | First Class | CM-12655 | Corinna Mitchell | Home Office | United States | Houston | Texas | 77041 | Central | FUR-BO-10004218 | Furniture | Bookcases | Bush Heritage Pine Collection 5-Shelf Bookcase... | 383.4656 | 4 | 0.32 | -67.6704 |
9965 | CA-2016-146374 | 2016-12-05 | 2016-12-10 | Second Class | HE-14800 | Harold Engle | Corporate | United States | Newark | Delaware | 19711 | East | FUR-FU-10002671 | Furniture | Furnishings | Electrix 20W Halogen Replacement Bulb for Zoom... | 13.4000 | 1 | 0.00 | 6.4320 |
9981 | US-2015-151435 | 2015-09-06 | 2015-09-09 | Second Class | SW-20455 | Shaun Weien | Consumer | United States | Lafayette | Louisiana | 70506 | South | FUR-TA-10001039 | Furniture | Tables | KI Adjustable-Height Table | 85.9800 | 1 | 0.00 | 22.3548 |
9990 | CA-2014-110422 | 2014-01-21 | 2014-01-23 | Second Class | TB-21400 | Tom Boeckenhauer | Consumer | United States | Miami | Florida | 33180 | South | FUR-FU-10001889 | Furniture | Furnishings | Ultra Door Pull Handle | 25.2480 | 3 | 0.20 | 4.1028 |
9991 | CA-2017-121258 | 2017-02-26 | 2017-03-03 | Standard Class | DB-13060 | Dave Brooks | Consumer | United States | Costa Mesa | California | 92627 | West | FUR-FU-10000747 | Furniture | Furnishings | Tenex B1-RE Series Chair Mats for Low Pile Car... | 91.9600 | 2 | 0.00 | 15.6332 |
2121 rows × 20 columns
data = data[data['Category'] == 'Furniture'] # assign the filtered rows back to data
data.head()
Order ID | Order Date | Ship Date | Ship Mode | Customer ID | Customer Name | Segment | Country | City | State | Postal Code | Region | Product ID | Category | Sub-Category | Product Name | Sales | Quantity | Discount | Profit | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Row ID | ||||||||||||||||||||
1 | CA-2016-152156 | 2016-11-08 | 2016-11-11 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | Kentucky | 42420 | South | FUR-BO-10001798 | Furniture | Bookcases | Bush Somerset Collection Bookcase | 261.9600 | 2 | 0.00 | 41.9136 |
2 | CA-2016-152156 | 2016-11-08 | 2016-11-11 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | Kentucky | 42420 | South | FUR-CH-10000454 | Furniture | Chairs | Hon Deluxe Fabric Upholstered Stacking Chairs,... | 731.9400 | 3 | 0.00 | 219.5820 |
4 | US-2015-108966 | 2015-10-11 | 2015-10-18 | Standard Class | SO-20335 | Sean O'Donnell | Consumer | United States | Fort Lauderdale | Florida | 33311 | South | FUR-TA-10000577 | Furniture | Tables | Bretford CR4500 Series Slim Rectangular Table | 957.5775 | 5 | 0.45 | -383.0310 |
6 | CA-2014-115812 | 2014-06-09 | 2014-06-14 | Standard Class | BH-11710 | Brosina Hoffman | Consumer | United States | Los Angeles | California | 90032 | West | FUR-FU-10001487 | Furniture | Furnishings | Eldon Expressions Wood and Plastic Desk Access... | 48.8600 | 7 | 0.00 | 14.1694 |
11 | CA-2014-115812 | 2014-06-09 | 2014-06-14 | Standard Class | BH-11710 | Brosina Hoffman | Consumer | United States | Los Angeles | California | 90032 | West | FUR-TA-10001539 | Furniture | Tables | Chromcraft Rectangular Conference Tables | 1706.1840 | 9 | 0.20 | 85.3092 |
data = data[['Order Date', 'Sales']]
data
Order Date | Sales | |
---|---|---|
Row ID | ||
1 | 2016-11-08 | 261.9600 |
2 | 2016-11-08 | 731.9400 |
4 | 2015-10-11 | 957.5775 |
6 | 2014-06-09 | 48.8600 |
11 | 2014-06-09 | 1706.1840 |
... | ... | ... |
9963 | 2015-03-19 | 383.4656 |
9965 | 2016-12-05 | 13.4000 |
9981 | 2015-09-06 | 85.9800 |
9990 | 2014-01-21 | 25.2480 |
9991 | 2017-02-26 | 91.9600 |
2121 rows × 2 columns
data.sort_values('Order Date') # sort ascending by the Order Date column
Order Date | Sales | |
---|---|---|
Row ID | ||
7475 | 2014-01-06 | 2573.820 |
7661 | 2014-01-07 | 76.728 |
867 | 2014-01-10 | 51.940 |
717 | 2014-01-11 | 9.940 |
2979 | 2014-01-13 | 545.940 |
... | ... | ... |
7637 | 2017-12-29 | 258.750 |
7633 | 2017-12-29 | 1207.840 |
5132 | 2017-12-29 | 101.120 |
7636 | 2017-12-29 | 300.980 |
907 | 2017-12-30 | 323.136 |
2121 rows × 2 columns
What we want to know is the trend of Sales over time, so let's reorganize the data accordingly.
data = data.groupby('Order Date').sum() # total sales per order date
data
Sales | |
---|---|
Order Date | |
2014-01-06 | 2573.8200 |
2014-01-07 | 76.7280 |
2014-01-10 | 51.9400 |
2014-01-11 | 9.9400 |
2014-01-13 | 879.9390 |
... | ... |
2017-12-24 | 1393.4940 |
2017-12-25 | 832.4540 |
2017-12-28 | 551.2568 |
2017-12-29 | 2330.7180 |
2017-12-30 | 323.1360 |
889 rows × 1 columns
data.index
DatetimeIndex(['2014-01-06', '2014-01-07', '2014-01-10', '2014-01-11', '2014-01-13', '2014-01-14', '2014-01-16', '2014-01-19', '2014-01-20', '2014-01-21', ... '2017-12-18', '2017-12-19', '2017-12-21', '2017-12-22', '2017-12-23', '2017-12-24', '2017-12-25', '2017-12-28', '2017-12-29', '2017-12-30'], dtype='datetime64[ns]', name='Order Date', length=889, freq=None)
Let's draw a plot.
plt.figure(figsize = (20,10))
sns.lineplot(x = data.index, y = data['Sales'])
<AxesSubplot:xlabel='Order Date', ylabel='Sales'>
When there is a single column and the index already holds the x-axis values (dates and the like), .plot as below is simpler. (plt is more convenient than seaborn here.)
data.plot(figsize = (20,10))
<AxesSubplot:xlabel='Order Date'>
The trend is hard to make out from this plot. Let's group by month and look again, using the .resample() method.
data.resample('MS').mean() # 'MS' = month-start frequency; take the mean within each month
Sales | |
---|---|
Order Date | |
2014-01-01 | 480.194231 |
2014-02-01 | 367.931600 |
2014-03-01 | 857.291529 |
2014-04-01 | 567.488357 |
2014-05-01 | 432.049187 |
2014-06-01 | 695.059242 |
2014-07-01 | 601.169500 |
2014-08-01 | 457.521656 |
2014-09-01 | 992.353367 |
2014-10-01 | 769.015437 |
2014-11-01 | 980.221486 |
2014-12-01 | 1532.298325 |
2015-01-01 | 978.328467 |
2015-02-01 | 522.395667 |
2015-03-01 | 781.236437 |
2015-04-01 | 805.822962 |
2015-05-01 | 624.996700 |
2015-06-01 | 428.565500 |
2015-07-01 | 719.706316 |
2015-08-01 | 602.412012 |
2015-09-01 | 1382.790684 |
2015-10-01 | 632.980184 |
2015-11-01 | 1286.701354 |
2015-12-01 | 1049.355418 |
2016-01-01 | 508.182867 |
2016-02-01 | 356.868273 |
2016-03-01 | 609.575810 |
2016-04-01 | 695.373158 |
2016-05-01 | 687.265227 |
2016-06-01 | 816.910750 |
2016-07-01 | 768.736412 |
2016-08-01 | 734.307782 |
2016-09-01 | 1135.953371 |
2016-10-01 | 624.872474 |
2016-11-01 | 1271.345152 |
2016-12-01 | 1410.719808 |
2017-01-01 | 397.602133 |
2017-02-01 | 528.179800 |
2017-03-01 | 544.672240 |
2017-04-01 | 453.297905 |
2017-05-01 | 678.302328 |
2017-06-01 | 826.460291 |
2017-07-01 | 562.524857 |
2017-08-01 | 857.881889 |
2017-09-01 | 1209.508583 |
2017-10-01 | 875.362728 |
2017-11-01 | 1277.817759 |
2017-12-01 | 1256.298672 |
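As an aside, 'MS' (month start) is just one of pandas' offset aliases; a couple of others that work the same way on the daily data, as a sketch:
data.resample('W').sum()    # weekly totals
data.resample('QS').mean()  # quarterly means, labeled at each quarter start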
data = data.resample('MS').mean() # assign: this monthly series is the final data we'll use
data.plot(figsize = (20,10))
<AxesSubplot:xlabel='Order Date'>
from fbprophet import Prophet
data
Sales | |
---|---|
Order Date | |
2014-01-01 | 480.194231 |
2014-02-01 | 367.931600 |
2014-03-01 | 857.291529 |
2014-04-01 | 567.488357 |
2014-05-01 | 432.049187 |
2014-06-01 | 695.059242 |
2014-07-01 | 601.169500 |
2014-08-01 | 457.521656 |
2014-09-01 | 992.353367 |
2014-10-01 | 769.015437 |
2014-11-01 | 980.221486 |
2014-12-01 | 1532.298325 |
2015-01-01 | 978.328467 |
2015-02-01 | 522.395667 |
2015-03-01 | 781.236437 |
2015-04-01 | 805.822962 |
2015-05-01 | 624.996700 |
2015-06-01 | 428.565500 |
2015-07-01 | 719.706316 |
2015-08-01 | 602.412012 |
2015-09-01 | 1382.790684 |
2015-10-01 | 632.980184 |
2015-11-01 | 1286.701354 |
2015-12-01 | 1049.355418 |
2016-01-01 | 508.182867 |
2016-02-01 | 356.868273 |
2016-03-01 | 609.575810 |
2016-04-01 | 695.373158 |
2016-05-01 | 687.265227 |
2016-06-01 | 816.910750 |
2016-07-01 | 768.736412 |
2016-08-01 | 734.307782 |
2016-09-01 | 1135.953371 |
2016-10-01 | 624.872474 |
2016-11-01 | 1271.345152 |
2016-12-01 | 1410.719808 |
2017-01-01 | 397.602133 |
2017-02-01 | 528.179800 |
2017-03-01 | 544.672240 |
2017-04-01 | 453.297905 |
2017-05-01 | 678.302328 |
2017-06-01 | 826.460291 |
2017-07-01 | 562.524857 |
2017-08-01 | 857.881889 |
2017-09-01 | 1209.508583 |
2017-10-01 | 875.362728 |
2017-11-01 | 1277.817759 |
2017-12-01 | 1256.298672 |
fbprophet imposes a few requirements on its input, so the data needs reshaping: the columns must be named ds (the datestamp) and y (the value to forecast).
df = data.reset_index()
df
Order Date | Sales | |
---|---|---|
0 | 2014-01-01 | 480.194231 |
1 | 2014-02-01 | 367.931600 |
2 | 2014-03-01 | 857.291529 |
3 | 2014-04-01 | 567.488357 |
4 | 2014-05-01 | 432.049187 |
5 | 2014-06-01 | 695.059242 |
6 | 2014-07-01 | 601.169500 |
7 | 2014-08-01 | 457.521656 |
8 | 2014-09-01 | 992.353367 |
9 | 2014-10-01 | 769.015437 |
10 | 2014-11-01 | 980.221486 |
11 | 2014-12-01 | 1532.298325 |
12 | 2015-01-01 | 978.328467 |
13 | 2015-02-01 | 522.395667 |
14 | 2015-03-01 | 781.236437 |
15 | 2015-04-01 | 805.822962 |
16 | 2015-05-01 | 624.996700 |
17 | 2015-06-01 | 428.565500 |
18 | 2015-07-01 | 719.706316 |
19 | 2015-08-01 | 602.412012 |
20 | 2015-09-01 | 1382.790684 |
21 | 2015-10-01 | 632.980184 |
22 | 2015-11-01 | 1286.701354 |
23 | 2015-12-01 | 1049.355418 |
24 | 2016-01-01 | 508.182867 |
25 | 2016-02-01 | 356.868273 |
26 | 2016-03-01 | 609.575810 |
27 | 2016-04-01 | 695.373158 |
28 | 2016-05-01 | 687.265227 |
29 | 2016-06-01 | 816.910750 |
30 | 2016-07-01 | 768.736412 |
31 | 2016-08-01 | 734.307782 |
32 | 2016-09-01 | 1135.953371 |
33 | 2016-10-01 | 624.872474 |
34 | 2016-11-01 | 1271.345152 |
35 | 2016-12-01 | 1410.719808 |
36 | 2017-01-01 | 397.602133 |
37 | 2017-02-01 | 528.179800 |
38 | 2017-03-01 | 544.672240 |
39 | 2017-04-01 | 453.297905 |
40 | 2017-05-01 | 678.302328 |
41 | 2017-06-01 | 826.460291 |
42 | 2017-07-01 | 562.524857 |
43 | 2017-08-01 | 857.881889 |
44 | 2017-09-01 | 1209.508583 |
45 | 2017-10-01 | 875.362728 |
46 | 2017-11-01 | 1277.817759 |
47 | 2017-12-01 | 1256.298672 |
df.columns = ['ds','y']
df
ds | y | |
---|---|---|
0 | 2014-01-01 | 480.194231 |
1 | 2014-02-01 | 367.931600 |
2 | 2014-03-01 | 857.291529 |
3 | 2014-04-01 | 567.488357 |
4 | 2014-05-01 | 432.049187 |
5 | 2014-06-01 | 695.059242 |
6 | 2014-07-01 | 601.169500 |
7 | 2014-08-01 | 457.521656 |
8 | 2014-09-01 | 992.353367 |
9 | 2014-10-01 | 769.015437 |
10 | 2014-11-01 | 980.221486 |
11 | 2014-12-01 | 1532.298325 |
12 | 2015-01-01 | 978.328467 |
13 | 2015-02-01 | 522.395667 |
14 | 2015-03-01 | 781.236437 |
15 | 2015-04-01 | 805.822962 |
16 | 2015-05-01 | 624.996700 |
17 | 2015-06-01 | 428.565500 |
18 | 2015-07-01 | 719.706316 |
19 | 2015-08-01 | 602.412012 |
20 | 2015-09-01 | 1382.790684 |
21 | 2015-10-01 | 632.980184 |
22 | 2015-11-01 | 1286.701354 |
23 | 2015-12-01 | 1049.355418 |
24 | 2016-01-01 | 508.182867 |
25 | 2016-02-01 | 356.868273 |
26 | 2016-03-01 | 609.575810 |
27 | 2016-04-01 | 695.373158 |
28 | 2016-05-01 | 687.265227 |
29 | 2016-06-01 | 816.910750 |
30 | 2016-07-01 | 768.736412 |
31 | 2016-08-01 | 734.307782 |
32 | 2016-09-01 | 1135.953371 |
33 | 2016-10-01 | 624.872474 |
34 | 2016-11-01 | 1271.345152 |
35 | 2016-12-01 | 1410.719808 |
36 | 2017-01-01 | 397.602133 |
37 | 2017-02-01 | 528.179800 |
38 | 2017-03-01 | 544.672240 |
39 | 2017-04-01 | 453.297905 |
40 | 2017-05-01 | 678.302328 |
41 | 2017-06-01 | 826.460291 |
42 | 2017-07-01 | 562.524857 |
43 | 2017-08-01 | 857.881889 |
44 | 2017-09-01 | 1209.508583 |
45 | 2017-10-01 | 875.362728 |
46 | 2017-11-01 | 1277.817759 |
47 | 2017-12-01 | 1256.298672 |
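The same reshaping can be done in one chained call, as a sketch (equivalent to the reset_index and column renaming above):
df = data.reset_index().rename(columns={'Order Date': 'ds', 'Sales': 'y'})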
model = Prophet()
model.fit(df)
INFO:fbprophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
<fbprophet.forecaster.Prophet at 0x7f7983be3f98>
Specify the range the model should forecast, using the .make_future_dataframe() method.
future = model.make_future_dataframe(periods = 12, freq = 'MS') # 12 periods ahead at month-start frequency
future
ds | |
---|---|
0 | 2014-01-01 |
1 | 2014-02-01 |
2 | 2014-03-01 |
3 | 2014-04-01 |
4 | 2014-05-01 |
5 | 2014-06-01 |
6 | 2014-07-01 |
7 | 2014-08-01 |
8 | 2014-09-01 |
9 | 2014-10-01 |
10 | 2014-11-01 |
11 | 2014-12-01 |
12 | 2015-01-01 |
13 | 2015-02-01 |
14 | 2015-03-01 |
15 | 2015-04-01 |
16 | 2015-05-01 |
17 | 2015-06-01 |
18 | 2015-07-01 |
19 | 2015-08-01 |
20 | 2015-09-01 |
21 | 2015-10-01 |
22 | 2015-11-01 |
23 | 2015-12-01 |
24 | 2016-01-01 |
25 | 2016-02-01 |
26 | 2016-03-01 |
27 | 2016-04-01 |
28 | 2016-05-01 |
29 | 2016-06-01 |
30 | 2016-07-01 |
31 | 2016-08-01 |
32 | 2016-09-01 |
33 | 2016-10-01 |
34 | 2016-11-01 |
35 | 2016-12-01 |
36 | 2017-01-01 |
37 | 2017-02-01 |
38 | 2017-03-01 |
39 | 2017-04-01 |
40 | 2017-05-01 |
41 | 2017-06-01 |
42 | 2017-07-01 |
43 | 2017-08-01 |
44 | 2017-09-01 |
45 | 2017-10-01 |
46 | 2017-11-01 |
47 | 2017-12-01 |
48 | 2018-01-01 |
49 | 2018-02-01 |
50 | 2018-03-01 |
51 | 2018-04-01 |
52 | 2018-05-01 |
53 | 2018-06-01 |
54 | 2018-07-01 |
55 | 2018-08-01 |
56 | 2018-09-01 |
57 | 2018-10-01 |
58 | 2018-11-01 |
59 | 2018-12-01 |
model.predict(future)
ds | trend | yhat_lower | yhat_upper | trend_lower | trend_upper | additive_terms | additive_terms_lower | additive_terms_upper | yearly | yearly_lower | yearly_upper | multiplicative_terms | multiplicative_terms_lower | multiplicative_terms_upper | yhat | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2014-01-01 | 726.057713 | 375.555739 | 695.295314 | 726.057713 | 726.057713 | -190.685662 | -190.685662 | -190.685662 | -190.685662 | -190.685662 | -190.685662 | 0.0 | 0.0 | 0.0 | 535.372051 |
1 | 2014-02-01 | 727.494023 | 293.399920 | 613.931087 | 727.494023 | 727.494023 | -276.377703 | -276.377703 | -276.377703 | -276.377703 | -276.377703 | -276.377703 | 0.0 | 0.0 | 0.0 | 451.116320 |
2 | 2014-03-01 | 728.791335 | 551.787401 | 875.526653 | 728.791335 | 728.791335 | -22.389755 | -22.389755 | -22.389755 | -22.389755 | -22.389755 | -22.389755 | 0.0 | 0.0 | 0.0 | 706.401580 |
3 | 2014-04-01 | 730.227645 | 479.070561 | 779.563943 | 730.227645 | 730.227645 | -100.141158 | -100.141158 | -100.141158 | -100.141158 | -100.141158 | -100.141158 | 0.0 | 0.0 | 0.0 | 630.086487 |
4 | 2014-05-01 | 731.617622 | 407.290288 | 727.947444 | 731.617622 | 731.617622 | -160.815662 | -160.815662 | -160.815662 | -160.815662 | -160.815662 | -160.815662 | 0.0 | 0.0 | 0.0 | 570.801960 |
5 | 2014-06-01 | 733.053932 | 464.110554 | 775.510263 | 733.053932 | 733.053932 | -113.832366 | -113.832366 | -113.832366 | -113.832366 | -113.832366 | -113.832366 | 0.0 | 0.0 | 0.0 | 619.221566 |
6 | 2014-07-01 | 734.443909 | 489.017230 | 815.461714 | 734.443909 | 734.443909 | -81.335686 | -81.335686 | -81.335686 | -81.335686 | -81.335686 | -81.335686 | 0.0 | 0.0 | 0.0 | 653.108224 |
7 | 2014-08-01 | 735.880219 | 451.040361 | 773.906672 | 735.880219 | 735.880219 | -117.154950 | -117.154950 | -117.154950 | -117.154950 | -117.154950 | -117.154950 | 0.0 | 0.0 | 0.0 | 618.725269 |
8 | 2014-09-01 | 737.316529 | 1041.500636 | 1351.531705 | 737.316529 | 737.316529 | 461.565905 | 461.565905 | 461.565905 | 461.565905 | 461.565905 | 461.565905 | 0.0 | 0.0 | 0.0 | 1198.882434 |
9 | 2014-10-01 | 738.706506 | 552.932184 | 858.826910 | 738.706506 | 738.706506 | -28.820414 | -28.820414 | -28.820414 | -28.820414 | -28.820414 | -28.820414 | 0.0 | 0.0 | 0.0 | 709.886092 |
10 | 2014-11-01 | 740.142816 | 1008.819647 | 1317.092508 | 740.142816 | 740.142816 | 431.055221 | 431.055221 | 431.055221 | 431.055221 | 431.055221 | 431.055221 | 0.0 | 0.0 | 0.0 | 1171.198037 |
11 | 2014-12-01 | 741.532794 | 1089.161899 | 1393.338664 | 741.532794 | 741.532794 | 501.882316 | 501.882316 | 501.882316 | 501.882316 | 501.882316 | 501.882316 | 0.0 | 0.0 | 0.0 | 1243.415110 |
12 | 2015-01-01 | 742.969103 | 454.173451 | 767.734489 | 742.969103 | 742.969103 | -130.055125 | -130.055125 | -130.055125 | -130.055125 | -130.055125 | -130.055125 | 0.0 | 0.0 | 0.0 | 612.913979 |
13 | 2015-02-01 | 744.405413 | 254.091386 | 577.236965 | 744.405413 | 744.405413 | -341.079140 | -341.079140 | -341.079140 | -341.079140 | -341.079140 | -341.079140 | 0.0 | 0.0 | 0.0 | 403.326273 |
14 | 2015-03-01 | 745.702725 | 642.430576 | 951.623020 | 745.702725 | 745.702725 | 46.620746 | 46.620746 | 46.620746 | 46.620746 | 46.620746 | 46.620746 | 0.0 | 0.0 | 0.0 | 792.323472 |
15 | 2015-04-01 | 747.139035 | 540.239706 | 845.960559 | 747.139035 | 747.139035 | -53.528569 | -53.528569 | -53.528569 | -53.528569 | -53.528569 | -53.528569 | 0.0 | 0.0 | 0.0 | 693.610467 |
16 | 2015-05-01 | 748.529013 | 417.522498 | 714.561871 | 748.529013 | 748.529013 | -184.631339 | -184.631339 | -184.631339 | -184.631339 | -184.631339 | -184.631339 | 0.0 | 0.0 | 0.0 | 563.897674 |
17 | 2015-06-01 | 749.965323 | 395.877547 | 704.827872 | 749.965323 | 749.965323 | -207.846148 | -207.846148 | -207.846148 | -207.846148 | -207.846148 | -207.846148 | 0.0 | 0.0 | 0.0 | 542.119174 |
18 | 2015-07-01 | 751.355300 | 546.422761 | 852.998396 | 751.355300 | 751.355300 | -48.294657 | -48.294657 | -48.294657 | -48.294657 | -48.294657 | -48.294657 | 0.0 | 0.0 | 0.0 | 703.060643 |
19 | 2015-08-01 | 752.791609 | 448.121103 | 756.607455 | 752.791609 | 752.791609 | -152.742582 | -152.742582 | -152.742582 | -152.742582 | -152.742582 | -152.742582 | 0.0 | 0.0 | 0.0 | 600.049027 |
20 | 2015-09-01 | 754.227919 | 1149.095495 | 1461.573191 | 754.227919 | 754.227919 | 551.713042 | 551.713042 | 551.713042 | 551.713042 | 551.713042 | 551.713042 | 0.0 | 0.0 | 0.0 | 1305.940961 |
21 | 2015-10-01 | 755.617896 | 588.050589 | 893.732664 | 755.617896 | 755.617896 | -13.792198 | -13.792198 | -13.792198 | -13.792198 | -13.792198 | -13.792198 | 0.0 | 0.0 | 0.0 | 741.825698 |
22 | 2015-11-01 | 757.054205 | 1006.264367 | 1330.522230 | 757.054205 | 757.054205 | 413.144741 | 413.144741 | 413.144741 | 413.144741 | 413.144741 | 413.144741 | 0.0 | 0.0 | 0.0 | 1170.198947 |
23 | 2015-12-01 | 758.444182 | 1024.755884 | 1326.386146 | 758.444182 | 758.444182 | 417.365653 | 417.365653 | 417.365653 | 417.365653 | 417.365653 | 417.365653 | 0.0 | 0.0 | 0.0 | 1175.809835 |
24 | 2016-01-01 | 759.880491 | 533.801284 | 845.522689 | 759.880491 | 759.880491 | -68.554067 | -68.554067 | -68.554067 | -68.554067 | -68.554067 | -68.554067 | 0.0 | 0.0 | 0.0 | 691.326424 |
25 | 2016-02-01 | 761.316801 | 191.493274 | 514.488624 | 761.316801 | 761.316801 | -405.014691 | -405.014691 | -405.014691 | -405.014691 | -405.014691 | -405.014691 | 0.0 | 0.0 | 0.0 | 356.302110 |
26 | 2016-03-01 | 762.660445 | 465.376362 | 752.621365 | 762.660445 | 762.660445 | -157.661809 | -157.661809 | -157.661809 | -157.661809 | -157.661809 | -157.661809 | 0.0 | 0.0 | 0.0 | 604.998637 |
27 | 2016-04-01 | 764.096755 | 397.607966 | 720.010866 | 764.096755 | 764.096755 | -197.922888 | -197.922888 | -197.922888 | -197.922888 | -197.922888 | -197.922888 | 0.0 | 0.0 | 0.0 | 566.173867 |
28 | 2016-05-01 | 765.486732 | 493.020424 | 795.877936 | 765.486732 | 765.486732 | -119.761707 | -119.761707 | -119.761707 | -119.761707 | -119.761707 | -119.761707 | 0.0 | 0.0 | 0.0 | 645.725025 |
29 | 2016-06-01 | 766.923041 | 676.173435 | 994.722300 | 766.923041 | 766.923041 | 75.113515 | 75.113515 | 75.113515 | 75.113515 | 75.113515 | 75.113515 | 0.0 | 0.0 | 0.0 | 842.036556 |
30 | 2016-07-01 | 768.313018 | 472.799304 | 790.070746 | 768.313018 | 768.313018 | -143.219273 | -143.219273 | -143.219273 | -143.219273 | -143.219273 | -143.219273 | 0.0 | 0.0 | 0.0 | 625.093745 |
31 | 2016-08-01 | 769.749327 | 584.580146 | 889.021341 | 769.749327 | 769.749327 | -41.282256 | -41.282256 | -41.282256 | -41.282256 | -41.282256 | -41.282256 | 0.0 | 0.0 | 0.0 | 728.467072 |
32 | 2016-09-01 | 771.185637 | 887.179005 | 1204.828346 | 771.185637 | 771.185637 | 282.865686 | 282.865686 | 282.865686 | 282.865686 | 282.865686 | 282.865686 | 0.0 | 0.0 | 0.0 | 1054.051323 |
33 | 2016-10-01 | 772.575614 | 545.914732 | 860.391636 | 772.575614 | 772.575614 | -63.840536 | -63.840536 | -63.840536 | -63.840536 | -63.840536 | -63.840536 | 0.0 | 0.0 | 0.0 | 708.735078 |
34 | 2016-11-01 | 774.011923 | 1085.839458 | 1392.254970 | 774.011923 | 774.011923 | 461.298523 | 461.298523 | 461.298523 | 461.298523 | 461.298523 | 461.298523 | 0.0 | 0.0 | 0.0 | 1235.310446 |
35 | 2016-12-01 | 775.401900 | 1287.888860 | 1602.762277 | 775.401900 | 775.401900 | 672.698002 | 672.698002 | 672.698002 | 672.698002 | 672.698002 | 672.698002 | 0.0 | 0.0 | 0.0 | 1448.099902 |
36 | 2017-01-01 | 776.838209 | 369.980424 | 683.107377 | 776.838209 | 776.838209 | -250.426116 | -250.426116 | -250.426116 | -250.426116 | -250.426116 | -250.426116 | 0.0 | 0.0 | 0.0 | 526.412094 |
37 | 2017-02-01 | 778.274519 | 408.308622 | 717.390143 | 778.274519 | 778.274519 | -210.957398 | -210.957398 | -210.957398 | -210.957398 | -210.957398 | -210.957398 | 0.0 | 0.0 | 0.0 | 567.317120 |
38 | 2017-03-01 | 779.571831 | 535.357796 | 850.592519 | 779.571831 | 779.571831 | -90.513443 | -90.513443 | -90.513443 | -90.513443 | -90.513443 | -90.513443 | 0.0 | 0.0 | 0.0 | 689.058388 |
39 | 2017-04-01 | 781.008140 | 477.439584 | 785.557238 | 781.008140 | 781.008140 | -148.296501 | -148.296501 | -148.296501 | -148.296501 | -148.296501 | -148.296501 | 0.0 | 0.0 | 0.0 | 632.711639 |
40 | 2017-05-01 | 782.398117 | 486.007271 | 810.160435 | 782.398117 | 782.398117 | -139.178722 | -139.178722 | -139.178722 | -139.178722 | -139.178722 | -139.178722 | 0.0 | 0.0 | 0.0 | 643.219395 |
41 | 2017-06-01 | 783.834426 | 603.733179 | 922.225522 | 783.834426 | 783.834426 | -19.474492 | -19.474492 | -19.474492 | -19.474492 | -19.474492 | -19.474492 | 0.0 | 0.0 | 0.0 | 764.359934 |
42 | 2017-07-01 | 785.224403 | 517.022563 | 829.998891 | 785.224403 | 785.224403 | -112.977624 | -112.977624 | -112.977624 | -112.977624 | -112.977624 | -112.977624 | 0.0 | 0.0 | 0.0 | 672.246779 |
43 | 2017-08-01 | 786.660713 | 553.528488 | 866.557563 | 786.660713 | 786.660713 | -79.999841 | -79.999841 | -79.999841 | -79.999841 | -79.999841 | -79.999841 | 0.0 | 0.0 | 0.0 | 706.660871 |
44 | 2017-09-01 | 788.097022 | 996.618534 | 1316.692991 | 788.097022 | 788.097022 | 371.914804 | 371.914804 | 371.914804 | 371.914804 | 371.914804 | 371.914804 | 0.0 | 0.0 | 0.0 | 1160.011826 |
45 | 2017-10-01 | 789.486999 | 585.531905 | 888.099082 | 789.486999 | 789.486999 | -45.516303 | -45.516303 | -45.516303 | -45.516303 | -45.516303 | -45.516303 | 0.0 | 0.0 | 0.0 | 743.970696 |
46 | 2017-11-01 | 790.923308 | 1080.626366 | 1400.663856 | 790.923308 | 790.923308 | 447.117002 | 447.117002 | 447.117002 | 447.117002 | 447.117002 | 447.117002 | 0.0 | 0.0 | 0.0 | 1238.040310 |
47 | 2017-12-01 | 792.313285 | 1221.074372 | 1538.605895 | 792.313285 | 792.313285 | 587.025701 | 587.025701 | 587.025701 | 587.025701 | 587.025701 | 587.025701 | 0.0 | 0.0 | 0.0 | 1379.338986 |
48 | 2018-01-01 | 793.749595 | 448.163047 | 756.313780 | 793.749595 | 793.749595 | -190.685662 | -190.685662 | -190.685662 | -190.685662 | -190.685662 | -190.685662 | 0.0 | 0.0 | 0.0 | 603.063933 |
49 | 2018-02-01 | 795.185904 | 360.828186 | 683.291792 | 795.185904 | 795.185905 | -276.377703 | -276.377703 | -276.377703 | -276.377703 | -276.377703 | -276.377703 | 0.0 | 0.0 | 0.0 | 518.808201 |
50 | 2018-03-01 | 796.483216 | 614.640162 | 925.996570 | 796.483215 | 796.483217 | -22.389755 | -22.389755 | -22.389755 | -22.389755 | -22.389755 | -22.389755 | 0.0 | 0.0 | 0.0 | 774.093461 |
51 | 2018-04-01 | 797.919525 | 534.060372 | 852.194196 | 797.919524 | 797.919527 | -100.141158 | -100.141158 | -100.141158 | -100.141158 | -100.141158 | -100.141158 | 0.0 | 0.0 | 0.0 | 697.778367 |
52 | 2018-05-01 | 799.309502 | 474.410237 | 781.590787 | 799.309500 | 799.309505 | -160.815662 | -160.815662 | -160.815662 | -160.815662 | -160.815662 | -160.815662 | 0.0 | 0.0 | 0.0 | 638.493840 |
53 | 2018-06-01 | 800.745812 | 537.799138 | 834.481771 | 800.745808 | 800.745815 | -113.832366 | -113.832366 | -113.832366 | -113.832366 | -113.832366 | -113.832366 | 0.0 | 0.0 | 0.0 | 686.913446 |
54 | 2018-07-01 | 802.135789 | 561.579231 | 875.890879 | 802.135784 | 802.135793 | -81.335686 | -81.335686 | -81.335686 | -81.335686 | -81.335686 | -81.335686 | 0.0 | 0.0 | 0.0 | 720.800103 |
55 | 2018-08-01 | 803.572098 | 532.489762 | 837.208640 | 803.572093 | 803.572104 | -117.154950 | -117.154950 | -117.154950 | -117.154950 | -117.154950 | -117.154950 | 0.0 | 0.0 | 0.0 | 686.417148 |
56 | 2018-09-01 | 805.008408 | 1123.623993 | 1429.923019 | 805.008401 | 805.008415 | 461.565905 | 461.565905 | 461.565905 | 461.565905 | 461.565905 | 461.565905 | 0.0 | 0.0 | 0.0 | 1266.574313 |
57 | 2018-10-01 | 806.398384 | 623.089438 | 930.323047 | 806.398376 | 806.398393 | -28.820414 | -28.820414 | -28.820414 | -28.820414 | -28.820414 | -28.820414 | 0.0 | 0.0 | 0.0 | 777.577970 |
58 | 2018-11-01 | 807.834694 | 1083.933747 | 1395.291076 | 807.834685 | 807.834703 | 431.055221 | 431.055221 | 431.055221 | 431.055221 | 431.055221 | 431.055221 | 0.0 | 0.0 | 0.0 | 1238.889915 |
59 | 2018-12-01 | 809.224671 | 1150.847123 | 1475.396456 | 809.224661 | 809.224681 | 501.882316 | 501.882316 | 501.882316 | 501.882316 | 501.882316 | 501.882316 | 0.0 | 0.0 | 0.0 | 1311.106987 |
pred = model.predict(future)
pred = pred[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]
pred
ds | yhat | yhat_lower | yhat_upper | |
---|---|---|---|---|
0 | 2014-01-01 | 535.372051 | 378.524766 | 688.636959 |
1 | 2014-02-01 | 451.116320 | 293.796833 | 610.251748 |
2 | 2014-03-01 | 706.401580 | 543.923578 | 859.173669 |
3 | 2014-04-01 | 630.086487 | 466.414631 | 788.518361 |
4 | 2014-05-01 | 570.801960 | 418.062888 | 723.688846 |
5 | 2014-06-01 | 619.221566 | 473.588452 | 782.257305 |
6 | 2014-07-01 | 653.108224 | 489.390112 | 817.836053 |
7 | 2014-08-01 | 618.725269 | 465.640830 | 777.228368 |
8 | 2014-09-01 | 1198.882434 | 1032.947053 | 1357.184833 |
9 | 2014-10-01 | 709.886092 | 557.926691 | 863.095101 |
10 | 2014-11-01 | 1171.198037 | 1000.900573 | 1326.257329 |
11 | 2014-12-01 | 1243.415110 | 1078.151134 | 1399.100457 |
12 | 2015-01-01 | 612.913979 | 459.295261 | 774.310468 |
13 | 2015-02-01 | 403.326273 | 244.878685 | 557.974096 |
14 | 2015-03-01 | 792.323472 | 629.197645 | 949.559294 |
15 | 2015-04-01 | 693.610467 | 552.540290 | 849.330312 |
16 | 2015-05-01 | 563.897674 | 407.523113 | 717.615703 |
17 | 2015-06-01 | 542.119174 | 377.671689 | 698.468811 |
18 | 2015-07-01 | 703.060643 | 554.042171 | 863.980917 |
19 | 2015-08-01 | 600.049027 | 441.376748 | 743.052218 |
20 | 2015-09-01 | 1305.940961 | 1152.713902 | 1469.983934 |
21 | 2015-10-01 | 741.825698 | 587.322735 | 905.992719 |
22 | 2015-11-01 | 1170.198947 | 1016.291767 | 1332.152695 |
23 | 2015-12-01 | 1175.809835 | 1019.107049 | 1332.031156 |
24 | 2016-01-01 | 691.326424 | 536.685486 | 841.037102 |
25 | 2016-02-01 | 356.302110 | 192.462275 | 509.339288 |
26 | 2016-03-01 | 604.998637 | 451.336717 | 751.128700 |
27 | 2016-04-01 | 566.173867 | 415.777235 | 716.142581 |
28 | 2016-05-01 | 645.725025 | 495.090228 | 807.995724 |
29 | 2016-06-01 | 842.036556 | 684.175196 | 1001.981301 |
30 | 2016-07-01 | 625.093745 | 460.043929 | 785.621332 |
31 | 2016-08-01 | 728.467072 | 575.005988 | 891.436275 |
32 | 2016-09-01 | 1054.051323 | 900.880527 | 1211.208162 |
33 | 2016-10-01 | 708.735078 | 555.187877 | 873.956714 |
34 | 2016-11-01 | 1235.310446 | 1090.615284 | 1390.800265 |
35 | 2016-12-01 | 1448.099902 | 1295.800649 | 1607.336201 |
36 | 2017-01-01 | 526.412094 | 376.730524 | 684.014411 |
37 | 2017-02-01 | 567.317120 | 409.149813 | 730.286661 |
38 | 2017-03-01 | 689.058388 | 531.694572 | 842.986690 |
39 | 2017-04-01 | 632.711639 | 476.111002 | 792.837900 |
40 | 2017-05-01 | 643.219395 | 490.185802 | 795.416521 |
41 | 2017-06-01 | 764.359934 | 611.237269 | 919.585267 |
42 | 2017-07-01 | 672.246779 | 513.125091 | 828.246671 |
43 | 2017-08-01 | 706.660871 | 546.766929 | 864.654936 |
44 | 2017-09-01 | 1160.011826 | 1005.427189 | 1325.637425 |
45 | 2017-10-01 | 743.970696 | 578.941466 | 907.667671 |
46 | 2017-11-01 | 1238.040310 | 1082.147207 | 1392.479155 |
47 | 2017-12-01 | 1379.338986 | 1216.319883 | 1536.014510 |
48 | 2018-01-01 | 603.063933 | 447.464202 | 765.737086 |
49 | 2018-02-01 | 518.808201 | 353.375332 | 677.391527 |
50 | 2018-03-01 | 774.093461 | 615.274334 | 930.740848 |
51 | 2018-04-01 | 697.778367 | 537.821410 | 861.134434 |
52 | 2018-05-01 | 638.493840 | 477.662807 | 792.591736 |
53 | 2018-06-01 | 686.913446 | 526.867767 | 830.127164 |
54 | 2018-07-01 | 720.800103 | 555.966558 | 873.713180 |
55 | 2018-08-01 | 686.417148 | 531.007182 | 852.261733 |
56 | 2018-09-01 | 1266.574313 | 1098.857803 | 1423.457812 |
57 | 2018-10-01 | 777.577970 | 620.248696 | 936.594107 |
58 | 2018-11-01 | 1238.889915 | 1083.628999 | 1385.253346 |
59 | 2018-12-01 | 1311.106987 | 1161.211924 | 1467.705491 |
model.plot(pred)
plt.show()
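fbprophet can also decompose the forecast into its components; plot_components is part of the same API:
model.plot_components(pred)   # separate panels for the trend and the yearly seasonality
plt.show()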
Let's split the data into train and test sets.
data.head()
Sales | |
---|---|
Order Date | |
2014-01-01 | 480.194231 |
2014-02-01 | 367.931600 |
2014-03-01 | 857.291529 |
2014-04-01 | 567.488357 |
2014-05-01 | 432.049187 |
data[:'2016'] # slice: only the data through the end of 2016
Sales | |
---|---|
Order Date | |
2014-01-01 | 480.194231 |
2014-02-01 | 367.931600 |
2014-03-01 | 857.291529 |
2014-04-01 | 567.488357 |
2014-05-01 | 432.049187 |
2014-06-01 | 695.059242 |
2014-07-01 | 601.169500 |
2014-08-01 | 457.521656 |
2014-09-01 | 992.353367 |
2014-10-01 | 769.015437 |
2014-11-01 | 980.221486 |
2014-12-01 | 1532.298325 |
2015-01-01 | 978.328467 |
2015-02-01 | 522.395667 |
2015-03-01 | 781.236437 |
2015-04-01 | 805.822962 |
2015-05-01 | 624.996700 |
2015-06-01 | 428.565500 |
2015-07-01 | 719.706316 |
2015-08-01 | 602.412012 |
2015-09-01 | 1382.790684 |
2015-10-01 | 632.980184 |
2015-11-01 | 1286.701354 |
2015-12-01 | 1049.355418 |
2016-01-01 | 508.182867 |
2016-02-01 | 356.868273 |
2016-03-01 | 609.575810 |
2016-04-01 | 695.373158 |
2016-05-01 | 687.265227 |
2016-06-01 | 816.910750 |
2016-07-01 | 768.736412 |
2016-08-01 | 734.307782 |
2016-09-01 | 1135.953371 |
2016-10-01 | 624.872474 |
2016-11-01 | 1271.345152 |
2016-12-01 | 1410.719808 |
We'll use the data through 2016 as training data and hold out the 2017 data as unseen test data.
new_df = data[:'2016']
new_df = new_df.reset_index()
new_df.columns = ['ds', 'y']
new_df
ds | y | |
---|---|---|
0 | 2014-01-01 | 480.194231 |
1 | 2014-02-01 | 367.931600 |
2 | 2014-03-01 | 857.291529 |
3 | 2014-04-01 | 567.488357 |
4 | 2014-05-01 | 432.049187 |
5 | 2014-06-01 | 695.059242 |
6 | 2014-07-01 | 601.169500 |
7 | 2014-08-01 | 457.521656 |
8 | 2014-09-01 | 992.353367 |
9 | 2014-10-01 | 769.015437 |
10 | 2014-11-01 | 980.221486 |
11 | 2014-12-01 | 1532.298325 |
12 | 2015-01-01 | 978.328467 |
13 | 2015-02-01 | 522.395667 |
14 | 2015-03-01 | 781.236437 |
15 | 2015-04-01 | 805.822962 |
16 | 2015-05-01 | 624.996700 |
17 | 2015-06-01 | 428.565500 |
18 | 2015-07-01 | 719.706316 |
19 | 2015-08-01 | 602.412012 |
20 | 2015-09-01 | 1382.790684 |
21 | 2015-10-01 | 632.980184 |
22 | 2015-11-01 | 1286.701354 |
23 | 2015-12-01 | 1049.355418 |
24 | 2016-01-01 | 508.182867 |
25 | 2016-02-01 | 356.868273 |
26 | 2016-03-01 | 609.575810 |
27 | 2016-04-01 | 695.373158 |
28 | 2016-05-01 | 687.265227 |
29 | 2016-06-01 | 816.910750 |
30 | 2016-07-01 | 768.736412 |
31 | 2016-08-01 | 734.307782 |
32 | 2016-09-01 | 1135.953371 |
33 | 2016-10-01 | 624.872474 |
34 | 2016-11-01 | 1271.345152 |
35 | 2016-12-01 | 1410.719808 |
model = Prophet()
model.fit(new_df)
INFO:fbprophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
<fbprophet.forecaster.Prophet at 0x7f7983014a20>
future = model.make_future_dataframe(periods = 12, freq = 'MS') # extend the frame 12 periods into the future ('MS' = month start, i.e. monthly)
pred = model.predict(future)
pred
ds | trend | yhat_lower | yhat_upper | trend_lower | trend_upper | additive_terms | additive_terms_lower | additive_terms_upper | yearly | yearly_lower | yearly_upper | multiplicative_terms | multiplicative_terms_lower | multiplicative_terms_upper | yhat | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2014-01-01 | 681.588219 | 489.820917 | 780.550959 | 681.588219 | 681.588219 | -47.448838 | -47.448838 | -47.448838 | -47.448838 | -47.448838 | -47.448838 | 0.0 | 0.0 | 0.0 | 634.139381 |
1 | 2014-02-01 | 685.636335 | 263.817165 | 551.384047 | 685.636335 | 685.636335 | -274.076247 | -274.076247 | -274.076247 | -274.076247 | -274.076247 | -274.076247 | 0.0 | 0.0 | 0.0 | 411.560088 |
2 | 2014-03-01 | 689.292697 | 590.943677 | 877.179408 | 689.292697 | 689.292697 | 42.432009 | 42.432009 | 42.432009 | 42.432009 | 42.432009 | 42.432009 | 0.0 | 0.0 | 0.0 | 731.724706 |
3 | 2014-04-01 | 693.340812 | 506.537667 | 808.965787 | 693.340812 | 693.340812 | -34.565068 | -34.565068 | -34.565068 | -34.565068 | -34.565068 | -34.565068 | 0.0 | 0.0 | 0.0 | 658.775745 |
4 | 2014-05-01 | 697.258343 | 380.459041 | 691.741707 | 697.258343 | 697.258343 | -165.311506 | -165.311506 | -165.311506 | -165.311506 | -165.311506 | -165.311506 | 0.0 | 0.0 | 0.0 | 531.946837 |
5 | 2014-06-01 | 701.306459 | 420.211069 | 723.366012 | 701.306459 | 701.306459 | -133.386474 | -133.386474 | -133.386474 | -133.386474 | -133.386474 | -133.386474 | 0.0 | 0.0 | 0.0 | 567.919984 |
6 | 2014-07-01 | 705.223990 | 497.611700 | 804.471583 | 705.223990 | 705.223990 | -58.042923 | -58.042923 | -58.042923 | -58.042923 | -58.042923 | -58.042923 | 0.0 | 0.0 | 0.0 | 647.181067 |
7 | 2014-08-01 | 709.272106 | 383.683725 | 693.950098 | 709.272106 | 709.272106 | -171.626465 | -171.626465 | -171.626465 | -171.626465 | -171.626465 | -171.626465 | 0.0 | 0.0 | 0.0 | 537.645640 |
8 | 2014-09-01 | 713.320221 | 1001.869125 | 1292.353941 | 713.320221 | 713.320221 | 436.759246 | 436.759246 | 436.759246 | 436.759246 | 436.759246 | 436.759246 | 0.0 | 0.0 | 0.0 | 1150.079467 |
9 | 2014-10-01 | 717.237752 | 500.833488 | 795.834748 | 717.237752 | 717.237752 | -76.340818 | -76.340818 | -76.340818 | -76.340818 | -76.340818 | -76.340818 | 0.0 | 0.0 | 0.0 | 640.896934 |
10 | 2014-11-01 | 721.285868 | 982.698246 | 1280.944120 | 721.285868 | 721.285868 | 414.478990 | 414.478990 | 414.478990 | 414.478990 | 414.478990 | 414.478990 | 0.0 | 0.0 | 0.0 | 1135.764858 |
11 | 2014-12-01 | 725.203399 | 1114.032354 | 1407.626220 | 725.203399 | 725.203399 | 535.324793 | 535.324793 | 535.324793 | 535.324793 | 535.324793 | 535.324793 | 0.0 | 0.0 | 0.0 | 1260.528192 |
12 | 2015-01-01 | 729.251514 | 507.046250 | 802.452572 | 729.251514 | 729.251514 | -74.004639 | -74.004639 | -74.004639 | -74.004639 | -74.004639 | -74.004639 | 0.0 | 0.0 | 0.0 | 655.246875 |
13 | 2015-02-01 | 733.299630 | 270.205334 | 551.769970 | 733.299630 | 733.299630 | -318.385936 | -318.385936 | -318.385936 | -318.385936 | -318.385936 | -318.385936 | 0.0 | 0.0 | 0.0 | 414.913693 |
14 | 2015-03-01 | 736.955992 | 717.461038 | 1024.298701 | 736.955992 | 736.955992 | 135.207544 | 135.207544 | 135.207544 | 135.207544 | 135.207544 | 135.207544 | 0.0 | 0.0 | 0.0 | 872.163536 |
15 | 2015-04-01 | 741.004093 | 599.434707 | 893.316256 | 741.004093 | 741.004093 | 10.925079 | 10.925079 | 10.925079 | 10.925079 | 10.925079 | 10.925079 | 0.0 | 0.0 | 0.0 | 751.929172 |
16 | 2015-05-01 | 744.921610 | 423.294725 | 718.730926 | 744.921610 | 744.921610 | -178.100732 | -178.100732 | -178.100732 | -178.100732 | -178.100732 | -178.100732 | 0.0 | 0.0 | 0.0 | 566.820879 |
17 | 2015-06-01 | 748.969712 | 376.036629 | 685.465799 | 748.969712 | 748.969712 | -228.203393 | -228.203393 | -228.203393 | -228.203393 | -228.203393 | -228.203393 | 0.0 | 0.0 | 0.0 | 520.766319 |
18 | 2015-07-01 | 752.887229 | 552.088154 | 840.950943 | 752.887229 | 752.887229 | -58.585019 | -58.585019 | -58.585019 | -58.585019 | -58.585019 | -58.585019 | 0.0 | 0.0 | 0.0 | 694.302210 |
19 | 2015-08-01 | 756.935331 | 408.764592 | 705.574247 | 756.935331 | 756.935331 | -203.124853 | -203.124853 | -203.124853 | -203.124853 | -203.124853 | -203.124853 | 0.0 | 0.0 | 0.0 | 553.810477 |
20 | 2015-09-01 | 760.983432 | 1138.933806 | 1429.417552 | 760.983432 | 760.983432 | 522.378502 | 522.378502 | 522.378502 | 522.378502 | 522.378502 | 522.378502 | 0.0 | 0.0 | 0.0 | 1283.361934 |
21 | 2015-10-01 | 764.900950 | 576.829257 | 872.963723 | 764.900950 | 764.900950 | -41.132905 | -41.132905 | -41.132905 | -41.132905 | -41.132905 | -41.132905 | 0.0 | 0.0 | 0.0 | 723.768045 |
22 | 2015-11-01 | 768.949051 | 1034.928202 | 1353.467806 | 768.949051 | 768.949051 | 420.101231 | 420.101231 | 420.101231 | 420.101231 | 420.101231 | 420.101231 | 0.0 | 0.0 | 0.0 | 1189.050282 |
23 | 2015-12-01 | 772.866568 | 1085.980872 | 1387.241224 | 772.866568 | 772.866568 | 464.683096 | 464.683096 | 464.683096 | 464.683096 | 464.683096 | 464.683096 | 0.0 | 0.0 | 0.0 | 1237.549665 |
24 | 2016-01-01 | 776.914670 | 533.434123 | 824.662898 | 776.914670 | 776.914670 | -99.647965 | -99.647965 | -99.647965 | -99.647965 | -99.647965 | -99.647965 | 0.0 | 0.0 | 0.0 | 677.266704 |
25 | 2016-02-01 | 780.962771 | 269.521829 | 570.783075 | 780.962771 | 780.962771 | -360.434238 | -360.434238 | -360.434238 | -360.434238 | -360.434238 | -360.434238 | 0.0 | 0.0 | 0.0 | 420.528534 |
26 | 2016-03-01 | 784.749705 | 496.867187 | 793.121656 | 784.749705 | 784.749705 | -140.486697 | -140.486697 | -140.486697 | -140.486697 | -140.486697 | -140.486697 | 0.0 | 0.0 | 0.0 | 644.263007 |
27 | 2016-04-01 | 788.797806 | 529.030838 | 811.756415 | 788.797806 | 788.797806 | -130.923223 | -130.923223 | -130.923223 | -130.923223 | -130.923223 | -130.923223 | 0.0 | 0.0 | 0.0 | 657.874584 |
28 | 2016-05-01 | 792.715324 | 500.902636 | 785.870123 | 792.715324 | 792.715324 | -147.271749 | -147.271749 | -147.271749 | -147.271749 | -147.271749 | -147.271749 | 0.0 | 0.0 | 0.0 | 645.443575 |
29 | 2016-06-01 | 796.763425 | 698.191855 | 986.436287 | 796.763425 | 796.763425 | 55.078814 | 55.078814 | 55.078814 | 55.078814 | 55.078814 | 55.078814 | 0.0 | 0.0 | 0.0 | 851.842239 |
30 | 2016-07-01 | 800.680943 | 599.818969 | 900.783295 | 800.680943 | 800.680943 | -52.647345 | -52.647345 | -52.647345 | -52.647345 | -52.647345 | -52.647345 | 0.0 | 0.0 | 0.0 | 748.033598 |
31 | 2016-08-01 | 804.729044 | 564.562668 | 854.640135 | 804.729044 | 804.729044 | -102.089978 | -102.089978 | -102.089978 | -102.089978 | -102.089978 | -102.089978 | 0.0 | 0.0 | 0.0 | 702.639066 |
32 | 2016-09-01 | 808.777146 | 935.301896 | 1230.359833 | 808.777146 | 808.777146 | 268.734098 | 268.734098 | 268.734098 | 268.734098 | 268.734098 | 268.734098 | 0.0 | 0.0 | 0.0 | 1077.511244 |
33 | 2016-10-01 | 812.694664 | 501.744256 | 798.487191 | 812.694664 | 812.694664 | -150.546754 | -150.546754 | -150.546754 | -150.546754 | -150.546754 | -150.546754 | 0.0 | 0.0 | 0.0 | 662.147909 |
34 | 2016-11-01 | 816.742765 | 1068.965055 | 1360.526758 | 816.742765 | 816.742765 | 396.487662 | 396.487662 | 396.487662 | 396.487662 | 396.487662 | 396.487662 | 0.0 | 0.0 | 0.0 | 1213.230427 |
35 | 2016-12-01 | 820.660283 | 1333.386531 | 1652.766966 | 820.660283 | 820.660283 | 673.686476 | 673.686476 | 673.686476 | 673.686476 | 673.686476 | 673.686476 | 0.0 | 0.0 | 0.0 | 1494.346758 |
36 | 2017-01-01 | 824.708384 | 654.628881 | 948.501143 | 824.708384 | 824.708385 | -20.053946 | -20.053946 | -20.053946 | -20.053946 | -20.053946 | -20.053946 | 0.0 | 0.0 | 0.0 | 804.654438 |
37 | 2017-02-01 | 828.756486 | 451.673740 | 739.229452 | 828.756484 | 828.756487 | -227.549968 | -227.549968 | -227.549968 | -227.549968 | -227.549968 | -227.549968 | 0.0 | 0.0 | 0.0 | 601.206518 |
38 | 2017-03-01 | 832.412835 | 647.384793 | 934.031460 | 832.412832 | 832.412839 | -49.507766 | -49.507766 | -49.507766 | -49.507766 | -49.507766 | -49.507766 | 0.0 | 0.0 | 0.0 | 782.905069 |
39 | 2017-04-01 | 836.460937 | 615.754206 | 904.336383 | 836.460931 | 836.460943 | -81.875639 | -81.875639 | -81.875639 | -81.875639 | -81.875639 | -81.875639 | 0.0 | 0.0 | 0.0 | 754.585298 |
40 | 2017-05-01 | 840.378455 | 536.074714 | 837.877014 | 840.378446 | 840.378463 | -155.025651 | -155.025651 | -155.025651 | -155.025651 | -155.025651 | -155.025651 | 0.0 | 0.0 | 0.0 | 685.352803 |
41 | 2017-06-01 | 844.426556 | 649.635087 | 939.254091 | 844.426544 | 844.426568 | -38.918902 | -38.918902 | -38.918902 | -38.918902 | -38.918902 | -38.918902 | 0.0 | 0.0 | 0.0 | 805.507655 |
42 | 2017-07-01 | 848.344074 | 650.086358 | 939.639765 | 848.344058 | 848.344089 | -56.050359 | -56.050359 | -56.050359 | -56.050359 | -56.050359 | -56.050359 | 0.0 | 0.0 | 0.0 | 792.293715 |
43 | 2017-08-01 | 852.392175 | 565.438941 | 866.059661 | 852.392157 | 852.392194 | -137.943755 | -137.943755 | -137.943755 | -137.943755 | -137.943755 | -137.943755 | 0.0 | 0.0 | 0.0 | 714.448421 |
44 | 2017-09-01 | 856.440277 | 1061.035048 | 1368.132822 | 856.440253 | 856.440299 | 352.175705 | 352.175705 | 352.175705 | 352.175705 | 352.175705 | 352.175705 | 0.0 | 0.0 | 0.0 | 1208.615981 |
45 | 2017-10-01 | 860.357794 | 599.915070 | 890.540836 | 860.357767 | 860.357820 | -112.835925 | -112.835925 | -112.835925 | -112.835925 | -112.835925 | -112.835925 | 0.0 | 0.0 | 0.0 | 747.521869 |
46 | 2017-11-01 | 864.405896 | 1131.166407 | 1424.700197 | 864.405865 | 864.405927 | 406.605191 | 406.605191 | 406.605191 | 406.605191 | 406.605191 | 406.605191 | 0.0 | 0.0 | 0.0 | 1271.011087 |
47 | 2017-12-01 | 868.323413 | 1333.490239 | 1617.829474 | 868.323378 | 868.323449 | 605.025364 | 605.025364 | 605.025364 | 605.025364 | 605.025364 | 605.025364 | 0.0 | 0.0 | 0.0 | 1473.348777 |
pred = pred[['ds', 'yhat']] # keep only the date and the point forecast (yhat)
pred.set_index('ds', inplace=True)
pred = pred['2017':] # keep only the 2017 forecast period
pred
yhat | |
---|---|
ds | |
2017-01-01 | 804.654438 |
2017-02-01 | 601.206518 |
2017-03-01 | 782.905069 |
2017-04-01 | 754.585298 |
2017-05-01 | 685.352803 |
2017-06-01 | 805.507655 |
2017-07-01 | 792.293715 |
2017-08-01 | 714.448421 |
2017-09-01 | 1208.615981 |
2017-10-01 | 747.521869 |
2017-11-01 | 1271.011087 |
2017-12-01 | 1473.348777 |
data
Sales | |
---|---|
Order Date | |
2014-01-01 | 480.194231 |
2014-02-01 | 367.931600 |
2014-03-01 | 857.291529 |
2014-04-01 | 567.488357 |
2014-05-01 | 432.049187 |
2014-06-01 | 695.059242 |
2014-07-01 | 601.169500 |
2014-08-01 | 457.521656 |
2014-09-01 | 992.353367 |
2014-10-01 | 769.015437 |
2014-11-01 | 980.221486 |
2014-12-01 | 1532.298325 |
2015-01-01 | 978.328467 |
2015-02-01 | 522.395667 |
2015-03-01 | 781.236437 |
2015-04-01 | 805.822962 |
2015-05-01 | 624.996700 |
2015-06-01 | 428.565500 |
2015-07-01 | 719.706316 |
2015-08-01 | 602.412012 |
2015-09-01 | 1382.790684 |
2015-10-01 | 632.980184 |
2015-11-01 | 1286.701354 |
2015-12-01 | 1049.355418 |
2016-01-01 | 508.182867 |
2016-02-01 | 356.868273 |
2016-03-01 | 609.575810 |
2016-04-01 | 695.373158 |
2016-05-01 | 687.265227 |
2016-06-01 | 816.910750 |
2016-07-01 | 768.736412 |
2016-08-01 | 734.307782 |
2016-09-01 | 1135.953371 |
2016-10-01 | 624.872474 |
2016-11-01 | 1271.345152 |
2016-12-01 | 1410.719808 |
2017-01-01 | 397.602133 |
2017-02-01 | 528.179800 |
2017-03-01 | 544.672240 |
2017-04-01 | 453.297905 |
2017-05-01 | 678.302328 |
2017-06-01 | 826.460291 |
2017-07-01 | 562.524857 |
2017-08-01 | 857.881889 |
2017-09-01 | 1209.508583 |
2017-10-01 | 875.362728 |
2017-11-01 | 1277.817759 |
2017-12-01 | 1256.298672 |
ax = data.plot()
pred.plot(ax=ax, figsize=(20,10)) # pass ax so the pred plot is drawn on top of the data plot
<AxesSubplot:xlabel='ds'>
The blue line is the actual Sales values and the orange line is our forecast. There is some gap in early 2017, but we can see that the overall trend is similar.
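To put a number on this visual comparison, we can compute error metrics on the 12 held-out months. A minimal sketch, assuming the data (monthly actuals) and pred (2017 forecast) variables from the cells above:
import numpy as np
actual = data['2017':]['Sales'] # the 12 actual 2017 values
forecast = pred['yhat'] # the 12 forecast values
mae = np.mean(np.abs(actual.values - forecast.values)) # mean absolute error
rmse = np.sqrt(np.mean((actual.values - forecast.values) ** 2)) # root mean squared error
print(f'MAE: {mae:.2f}, RMSE: {rmse:.2f}')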
3. Forecasting with the AutoRegressive algorithm (its performance falls short of Prophet)¶
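For reference, an AR(p) model predicts the current value as a linear combination of its own p previous values: y(t) = c + w1·y(t-1) + … + wp·y(t-p) + noise. The lags parameter used later in this section sets p, i.e. how many past months feed each prediction.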
data = pd.read_excel('./data/Superstore.xls', index_col=0) # reload the raw data, using Row ID as the index
data['Category'].unique()
array(['Furniture', 'Office Supplies', 'Technology'], dtype=object)
The .unique() function shows the distinct values in the Category column.
data['Category'].value_counts()
Office Supplies 6026 Furniture 2121 Technology 1847 Name: Category, dtype: int64
The .value_counts() function shows each Category value together with how many rows it appears in.
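As an aside, value_counts also accepts a normalize=True option that returns proportions instead of raw counts, which is handy for checking category shares:
data['Category'].value_counts(normalize=True) # fractions of all rows, summing to 1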
data['Category'] == 'Furniture' ## returns a boolean Series of True or False values
Row ID 1 True 2 True 3 False 4 True 5 False ... 9990 True 9991 True 9992 False 9993 False 9994 False Name: Category, Length: 9994, dtype: bool
data.head() # compare these rows against the boolean Series above
Order ID | Order Date | Ship Date | Ship Mode | Customer ID | Customer Name | Segment | Country | City | State | Postal Code | Region | Product ID | Category | Sub-Category | Product Name | Sales | Quantity | Discount | Profit | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Row ID | ||||||||||||||||||||
1 | CA-2016-152156 | 2016-11-08 | 2016-11-11 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | Kentucky | 42420 | South | FUR-BO-10001798 | Furniture | Bookcases | Bush Somerset Collection Bookcase | 261.9600 | 2 | 0.00 | 41.9136 |
2 | CA-2016-152156 | 2016-11-08 | 2016-11-11 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | Kentucky | 42420 | South | FUR-CH-10000454 | Furniture | Chairs | Hon Deluxe Fabric Upholstered Stacking Chairs,... | 731.9400 | 3 | 0.00 | 219.5820 |
3 | CA-2016-138688 | 2016-06-12 | 2016-06-16 | Second Class | DV-13045 | Darrin Van Huff | Corporate | United States | Los Angeles | California | 90036 | West | OFF-LA-10000240 | Office Supplies | Labels | Self-Adhesive Address Labels for Typewriters b... | 14.6200 | 2 | 0.00 | 6.8714 |
4 | US-2015-108966 | 2015-10-11 | 2015-10-18 | Standard Class | SO-20335 | Sean O'Donnell | Consumer | United States | Fort Lauderdale | Florida | 33311 | South | FUR-TA-10000577 | Furniture | Tables | Bretford CR4500 Series Slim Rectangular Table | 957.5775 | 5 | 0.45 | -383.0310 |
5 | US-2015-108966 | 2015-10-11 | 2015-10-18 | Standard Class | SO-20335 | Sean O'Donnell | Consumer | United States | Fort Lauderdale | Florida | 33311 | South | OFF-ST-10000760 | Office Supplies | Storage | Eldon Fold 'N Roll Cart System | 22.3680 | 2 | 0.20 | 2.5164 |
data[data['Category'] == 'Furniture' ] # passing the boolean Series into data[] keeps only the rows where it is True
Order ID | Order Date | Ship Date | Ship Mode | Customer ID | Customer Name | Segment | Country | City | State | Postal Code | Region | Product ID | Category | Sub-Category | Product Name | Sales | Quantity | Discount | Profit | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Row ID | ||||||||||||||||||||
1 | CA-2016-152156 | 2016-11-08 | 2016-11-11 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | Kentucky | 42420 | South | FUR-BO-10001798 | Furniture | Bookcases | Bush Somerset Collection Bookcase | 261.9600 | 2 | 0.00 | 41.9136 |
2 | CA-2016-152156 | 2016-11-08 | 2016-11-11 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | Kentucky | 42420 | South | FUR-CH-10000454 | Furniture | Chairs | Hon Deluxe Fabric Upholstered Stacking Chairs,... | 731.9400 | 3 | 0.00 | 219.5820 |
4 | US-2015-108966 | 2015-10-11 | 2015-10-18 | Standard Class | SO-20335 | Sean O'Donnell | Consumer | United States | Fort Lauderdale | Florida | 33311 | South | FUR-TA-10000577 | Furniture | Tables | Bretford CR4500 Series Slim Rectangular Table | 957.5775 | 5 | 0.45 | -383.0310 |
6 | CA-2014-115812 | 2014-06-09 | 2014-06-14 | Standard Class | BH-11710 | Brosina Hoffman | Consumer | United States | Los Angeles | California | 90032 | West | FUR-FU-10001487 | Furniture | Furnishings | Eldon Expressions Wood and Plastic Desk Access... | 48.8600 | 7 | 0.00 | 14.1694 |
11 | CA-2014-115812 | 2014-06-09 | 2014-06-14 | Standard Class | BH-11710 | Brosina Hoffman | Consumer | United States | Los Angeles | California | 90032 | West | FUR-TA-10001539 | Furniture | Tables | Chromcraft Rectangular Conference Tables | 1706.1840 | 9 | 0.20 | 85.3092 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
9963 | CA-2015-168088 | 2015-03-19 | 2015-03-22 | First Class | CM-12655 | Corinna Mitchell | Home Office | United States | Houston | Texas | 77041 | Central | FUR-BO-10004218 | Furniture | Bookcases | Bush Heritage Pine Collection 5-Shelf Bookcase... | 383.4656 | 4 | 0.32 | -67.6704 |
9965 | CA-2016-146374 | 2016-12-05 | 2016-12-10 | Second Class | HE-14800 | Harold Engle | Corporate | United States | Newark | Delaware | 19711 | East | FUR-FU-10002671 | Furniture | Furnishings | Electrix 20W Halogen Replacement Bulb for Zoom... | 13.4000 | 1 | 0.00 | 6.4320 |
9981 | US-2015-151435 | 2015-09-06 | 2015-09-09 | Second Class | SW-20455 | Shaun Weien | Consumer | United States | Lafayette | Louisiana | 70506 | South | FUR-TA-10001039 | Furniture | Tables | KI Adjustable-Height Table | 85.9800 | 1 | 0.00 | 22.3548 |
9990 | CA-2014-110422 | 2014-01-21 | 2014-01-23 | Second Class | TB-21400 | Tom Boeckenhauer | Consumer | United States | Miami | Florida | 33180 | South | FUR-FU-10001889 | Furniture | Furnishings | Ultra Door Pull Handle | 25.2480 | 3 | 0.20 | 4.1028 |
9991 | CA-2017-121258 | 2017-02-26 | 2017-03-03 | Standard Class | DB-13060 | Dave Brooks | Consumer | United States | Costa Mesa | California | 92627 | West | FUR-FU-10000747 | Furniture | Furnishings | Tenex B1-RE Series Chair Mats for Low Pile Car... | 91.9600 | 2 | 0.00 | 15.6332 |
2121 rows × 20 columns
data = data[data['Category'] == 'Furniture'] # assign the filtered rows back to data
data.head()
Order ID | Order Date | Ship Date | Ship Mode | Customer ID | Customer Name | Segment | Country | City | State | Postal Code | Region | Product ID | Category | Sub-Category | Product Name | Sales | Quantity | Discount | Profit | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Row ID | ||||||||||||||||||||
1 | CA-2016-152156 | 2016-11-08 | 2016-11-11 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | Kentucky | 42420 | South | FUR-BO-10001798 | Furniture | Bookcases | Bush Somerset Collection Bookcase | 261.9600 | 2 | 0.00 | 41.9136 |
2 | CA-2016-152156 | 2016-11-08 | 2016-11-11 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | Kentucky | 42420 | South | FUR-CH-10000454 | Furniture | Chairs | Hon Deluxe Fabric Upholstered Stacking Chairs,... | 731.9400 | 3 | 0.00 | 219.5820 |
4 | US-2015-108966 | 2015-10-11 | 2015-10-18 | Standard Class | SO-20335 | Sean O'Donnell | Consumer | United States | Fort Lauderdale | Florida | 33311 | South | FUR-TA-10000577 | Furniture | Tables | Bretford CR4500 Series Slim Rectangular Table | 957.5775 | 5 | 0.45 | -383.0310 |
6 | CA-2014-115812 | 2014-06-09 | 2014-06-14 | Standard Class | BH-11710 | Brosina Hoffman | Consumer | United States | Los Angeles | California | 90032 | West | FUR-FU-10001487 | Furniture | Furnishings | Eldon Expressions Wood and Plastic Desk Access... | 48.8600 | 7 | 0.00 | 14.1694 |
11 | CA-2014-115812 | 2014-06-09 | 2014-06-14 | Standard Class | BH-11710 | Brosina Hoffman | Consumer | United States | Los Angeles | California | 90032 | West | FUR-TA-10001539 | Furniture | Tables | Chromcraft Rectangular Conference Tables | 1706.1840 | 9 | 0.20 | 85.3092 |
data = data[['Order Date', 'Sales']]
data
Order Date | Sales | |
---|---|---|
Row ID | ||
1 | 2016-11-08 | 261.9600 |
2 | 2016-11-08 | 731.9400 |
4 | 2015-10-11 | 957.5775 |
6 | 2014-06-09 | 48.8600 |
11 | 2014-06-09 | 1706.1840 |
... | ... | ... |
9963 | 2015-03-19 | 383.4656 |
9965 | 2016-12-05 | 13.4000 |
9981 | 2015-09-06 | 85.9800 |
9990 | 2014-01-21 | 25.2480 |
9991 | 2017-02-26 | 91.9600 |
2121 rows × 2 columns
data.sort_values('Order Date') # sort the rows in ascending order by the Order Date column
Order Date | Sales | |
---|---|---|
Row ID | ||
7475 | 2014-01-06 | 2573.820 |
7661 | 2014-01-07 | 76.728 |
867 | 2014-01-10 | 51.940 |
717 | 2014-01-11 | 9.940 |
2979 | 2014-01-13 | 545.940 |
... | ... | ... |
7637 | 2017-12-29 | 258.750 |
7633 | 2017-12-29 | 1207.840 |
5132 | 2017-12-29 | 101.120 |
7636 | 2017-12-29 | 300.980 |
907 | 2017-12-30 | 323.136 |
2121 rows × 2 columns
What we want to see is the trend of Sales over time, so let's reorganize the data accordingly.
data = data.groupby('Order Date').sum() # total Sales for each date
data
Sales | |
---|---|
Order Date | |
2014-01-06 | 2573.8200 |
2014-01-07 | 76.7280 |
2014-01-10 | 51.9400 |
2014-01-11 | 9.9400 |
2014-01-13 | 879.9390 |
... | ... |
2017-12-24 | 1393.4940 |
2017-12-25 | 832.4540 |
2017-12-28 | 551.2568 |
2017-12-29 | 2330.7180 |
2017-12-30 | 323.1360 |
889 rows × 1 columns
data.index
DatetimeIndex(['2014-01-06', '2014-01-07', '2014-01-10', '2014-01-11', '2014-01-13', '2014-01-14', '2014-01-16', '2014-01-19', '2014-01-20', '2014-01-21', ... '2017-12-18', '2017-12-19', '2017-12-21', '2017-12-22', '2017-12-23', '2017-12-24', '2017-12-25', '2017-12-28', '2017-12-29', '2017-12-30'], dtype='datetime64[ns]', name='Order Date', length=889, freq=None)
Let's draw a plot.
plt.figure(figsize = (20,10))
sns.lineplot(x = data.index, y = data['Sales'])
<AxesSubplot:xlabel='Order Date', ylabel='Sales'>
As above, when the DataFrame has only one column and the index already holds the x-axis values (dates, for example), .plot draws the same chart more simply, as shown below. (Here plain .plot is more convenient than Seaborn.)
data.plot(figsize = (20,10))
<AxesSubplot:xlabel='Order Date'>
It is hard to read a trend from this plot. Let's aggregate by month and look again, using the .resample() function.
data = data.resample('MS').mean() # 'MS' = month-start frequency; take the mean within each month
data # kept in full so we can compare predictions against the actual values later
Sales | |
---|---|
Order Date | |
2014-01-01 | 480.194231 |
2014-02-01 | 367.931600 |
2014-03-01 | 857.291529 |
2014-04-01 | 567.488357 |
2014-05-01 | 432.049187 |
2014-06-01 | 695.059242 |
2014-07-01 | 601.169500 |
2014-08-01 | 457.521656 |
2014-09-01 | 992.353367 |
2014-10-01 | 769.015437 |
2014-11-01 | 980.221486 |
2014-12-01 | 1532.298325 |
2015-01-01 | 978.328467 |
2015-02-01 | 522.395667 |
2015-03-01 | 781.236437 |
2015-04-01 | 805.822962 |
2015-05-01 | 624.996700 |
2015-06-01 | 428.565500 |
2015-07-01 | 719.706316 |
2015-08-01 | 602.412012 |
2015-09-01 | 1382.790684 |
2015-10-01 | 632.980184 |
2015-11-01 | 1286.701354 |
2015-12-01 | 1049.355418 |
2016-01-01 | 508.182867 |
2016-02-01 | 356.868273 |
2016-03-01 | 609.575810 |
2016-04-01 | 695.373158 |
2016-05-01 | 687.265227 |
2016-06-01 | 816.910750 |
2016-07-01 | 768.736412 |
2016-08-01 | 734.307782 |
2016-09-01 | 1135.953371 |
2016-10-01 | 624.872474 |
2016-11-01 | 1271.345152 |
2016-12-01 | 1410.719808 |
2017-01-01 | 397.602133 |
2017-02-01 | 528.179800 |
2017-03-01 | 544.672240 |
2017-04-01 | 453.297905 |
2017-05-01 | 678.302328 |
2017-06-01 | 826.460291 |
2017-07-01 | 562.524857 |
2017-08-01 | 857.881889 |
2017-09-01 | 1209.508583 |
2017-10-01 | 875.362728 |
2017-11-01 | 1277.817759 |
2017-12-01 | 1256.298672 |
new_df = data[:'2016'] # this will serve as the training data
from statsmodels.tsa.ar_model import AutoReg
model = AutoReg(new_df, lags = 12) # regress each value on its previous 12 monthly values
/home/ubuntu/.local/lib/python3.6/site-packages/statsmodels/tsa/ar_model.py:252: FutureWarning: The parameter names will change after 0.12 is released. Set old_names to False to use the new names now. Set old_names to True to use the old names.
model_fit = model.fit()
len(new_df)
36
pred = model_fit.predict(start = len(new_df) , end = len(new_df)+11 ) # forecast 12 steps past the training data (2017-01 through 2017-12)
pred
2017-01-01 660.758250 2017-02-01 310.973369 2017-03-01 433.467903 2017-04-01 589.733771 2017-05-01 634.609408 2017-06-01 932.697417 2017-07-01 959.741129 2017-08-01 983.674137 2017-09-01 989.383080 2017-10-01 554.888314 2017-11-01 1030.961310 2017-12-01 1194.389892 Freq: MS, dtype: float64
ax = data.plot()
pred.plot(ax=ax, figsize=(20,10))
<AxesSubplot:xlabel='Order Date'>
It follows the trend to some extent, but the performance is visibly worse than Prophet's.
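To check this with a number rather than the plot alone, the same error metric can be computed for the AutoReg forecast. A minimal sketch, assuming data (monthly actuals) and the AutoReg pred Series from the cells above:
import numpy as np
actual = data['2017':]['Sales'] # the 12 held-out 2017 values
mae_ar = np.mean(np.abs(actual.values - pred.values)) # mean absolute error
print(f'AutoReg MAE: {mae_ar:.2f}') # compare with the Prophet MAE computed earlier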
- Using Prophet is the better choice here.
- Source: 파이썬을 활용한 이커머스 데이터 분석