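Data Visualization Practice with Kaggle (Suicide Rates)
Suicide rates have been considerably high for many years in many countries, developing and developed alike. We first look at suicide rates by sex, then by age and generation, by country, and by income. After that, we rank the top 10 countries by suicide rate.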
- reference
In [3]:
import pandas as pd
df = pd.read_csv('./data/master.csv')
In [4]:
df
Out[4]:
index | country | year | sex | age | suicides_no | population | suicides/100k pop | country-year | HDI for year | gdp_for_year ($) | gdp_per_capita ($) | generation |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Albania | 1987 | male | 15-24 years | 21 | 312900 | 6.71 | Albania1987 | NaN | 2,156,624,900 | 796 | Generation X |
1 | Albania | 1987 | male | 35-54 years | 16 | 308000 | 5.19 | Albania1987 | NaN | 2,156,624,900 | 796 | Silent |
2 | Albania | 1987 | female | 15-24 years | 14 | 289700 | 4.83 | Albania1987 | NaN | 2,156,624,900 | 796 | Generation X |
3 | Albania | 1987 | male | 75+ years | 1 | 21800 | 4.59 | Albania1987 | NaN | 2,156,624,900 | 796 | G.I. Generation |
4 | Albania | 1987 | male | 25-34 years | 9 | 274300 | 3.28 | Albania1987 | NaN | 2,156,624,900 | 796 | Boomers |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
27815 | Uzbekistan | 2014 | female | 35-54 years | 107 | 3620833 | 2.96 | Uzbekistan2014 | 0.675 | 63,067,077,179 | 2309 | Generation X |
27816 | Uzbekistan | 2014 | female | 75+ years | 9 | 348465 | 2.58 | Uzbekistan2014 | 0.675 | 63,067,077,179 | 2309 | Silent |
27817 | Uzbekistan | 2014 | male | 5-14 years | 60 | 2762158 | 2.17 | Uzbekistan2014 | 0.675 | 63,067,077,179 | 2309 | Generation Z |
27818 | Uzbekistan | 2014 | female | 5-14 years | 44 | 2631600 | 1.67 | Uzbekistan2014 | 0.675 | 63,067,077,179 | 2309 | Generation Z |
27819 | Uzbekistan | 2014 | female | 55-74 years | 21 | 1438935 | 1.46 | Uzbekistan2014 | 0.675 | 63,067,077,179 | 2309 | Boomers |
27820 rows × 12 columns
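In the preview above, `gdp_for_year ($)` is displayed with thousands separators (for example `2,156,624,900`), which suggests pandas read it as text rather than as a number. A minimal sketch of converting such a column, using a stand-in Series with two values copied from the preview (the name `gdp_text` is only for illustration):

```python
import pandas as pd

# Stand-in for the text column; the two values are copied from the preview above.
gdp_text = pd.Series(["2,156,624,900", "2,126,000,000"])

# Drop the thousands separators, then parse the strings as integers.
gdp_numeric = pd.to_numeric(gdp_text.str.replace(",", "", regex=False))
print(gdp_numeric.tolist())
```

Passing `thousands=","` to `pd.read_csv` is another way to get numeric values at load time.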
In [6]:
df.head(30)
Out[6]:
index | country | year | sex | age | suicides_no | population | suicides/100k pop | country-year | HDI for year | gdp_for_year ($) | gdp_per_capita ($) | generation |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Albania | 1987 | male | 15-24 years | 21 | 312900 | 6.71 | Albania1987 | NaN | 2,156,624,900 | 796 | Generation X |
1 | Albania | 1987 | male | 35-54 years | 16 | 308000 | 5.19 | Albania1987 | NaN | 2,156,624,900 | 796 | Silent |
2 | Albania | 1987 | female | 15-24 years | 14 | 289700 | 4.83 | Albania1987 | NaN | 2,156,624,900 | 796 | Generation X |
3 | Albania | 1987 | male | 75+ years | 1 | 21800 | 4.59 | Albania1987 | NaN | 2,156,624,900 | 796 | G.I. Generation |
4 | Albania | 1987 | male | 25-34 years | 9 | 274300 | 3.28 | Albania1987 | NaN | 2,156,624,900 | 796 | Boomers |
5 | Albania | 1987 | female | 75+ years | 1 | 35600 | 2.81 | Albania1987 | NaN | 2,156,624,900 | 796 | G.I. Generation |
6 | Albania | 1987 | female | 35-54 years | 6 | 278800 | 2.15 | Albania1987 | NaN | 2,156,624,900 | 796 | Silent |
7 | Albania | 1987 | female | 25-34 years | 4 | 257200 | 1.56 | Albania1987 | NaN | 2,156,624,900 | 796 | Boomers |
8 | Albania | 1987 | male | 55-74 years | 1 | 137500 | 0.73 | Albania1987 | NaN | 2,156,624,900 | 796 | G.I. Generation |
9 | Albania | 1987 | female | 5-14 years | 0 | 311000 | 0.00 | Albania1987 | NaN | 2,156,624,900 | 796 | Generation X |
10 | Albania | 1987 | female | 55-74 years | 0 | 144600 | 0.00 | Albania1987 | NaN | 2,156,624,900 | 796 | G.I. Generation |
11 | Albania | 1987 | male | 5-14 years | 0 | 338200 | 0.00 | Albania1987 | NaN | 2,156,624,900 | 796 | Generation X |
12 | Albania | 1988 | female | 75+ years | 2 | 36400 | 5.49 | Albania1988 | NaN | 2,126,000,000 | 769 | G.I. Generation |
13 | Albania | 1988 | male | 15-24 years | 17 | 319200 | 5.33 | Albania1988 | NaN | 2,126,000,000 | 769 | Generation X |
14 | Albania | 1988 | male | 75+ years | 1 | 22300 | 4.48 | Albania1988 | NaN | 2,126,000,000 | 769 | G.I. Generation |
15 | Albania | 1988 | male | 35-54 years | 14 | 314100 | 4.46 | Albania1988 | NaN | 2,126,000,000 | 769 | Silent |
16 | Albania | 1988 | male | 55-74 years | 4 | 140200 | 2.85 | Albania1988 | NaN | 2,126,000,000 | 769 | G.I. Generation |
17 | Albania | 1988 | female | 15-24 years | 8 | 295600 | 2.71 | Albania1988 | NaN | 2,126,000,000 | 769 | Generation X |
18 | Albania | 1988 | female | 55-74 years | 3 | 147500 | 2.03 | Albania1988 | NaN | 2,126,000,000 | 769 | G.I. Generation |
19 | Albania | 1988 | female | 25-34 years | 5 | 262400 | 1.91 | Albania1988 | NaN | 2,126,000,000 | 769 | Boomers |
20 | Albania | 1988 | male | 25-34 years | 5 | 279900 | 1.79 | Albania1988 | NaN | 2,126,000,000 | 769 | Boomers |
21 | Albania | 1988 | female | 35-54 years | 4 | 284500 | 1.41 | Albania1988 | NaN | 2,126,000,000 | 769 | Silent |
22 | Albania | 1988 | female | 5-14 years | 0 | 317200 | 0.00 | Albania1988 | NaN | 2,126,000,000 | 769 | Generation X |
23 | Albania | 1988 | male | 5-14 years | 0 | 345000 | 0.00 | Albania1988 | NaN | 2,126,000,000 | 769 | Generation X |
24 | Albania | 1989 | male | 75+ years | 2 | 22500 | 8.89 | Albania1989 | NaN | 2,335,124,988 | 833 | G.I. Generation |
25 | Albania | 1989 | male | 25-34 years | 18 | 283600 | 6.35 | Albania1989 | NaN | 2,335,124,988 | 833 | Boomers |
26 | Albania | 1989 | male | 35-54 years | 15 | 318400 | 4.71 | Albania1989 | NaN | 2,335,124,988 | 833 | Silent |
27 | Albania | 1989 | male | 55-74 years | 6 | 142100 | 4.22 | Albania1989 | NaN | 2,335,124,988 | 833 | G.I. Generation |
28 | Albania | 1989 | male | 15-24 years | 12 | 323500 | 3.71 | Albania1989 | NaN | 2,335,124,988 | 833 | Generation X |
29 | Albania | 1989 | female | 35-54 years | 7 | 288600 | 2.43 | Albania1989 | NaN | 2,335,124,988 | 833 | Silent |
In [7]:
df.sample(30)
Out[7]:
index | country | year | sex | age | suicides_no | population | suicides/100k pop | country-year | HDI for year | gdp_for_year ($) | gdp_per_capita ($) | generation |
---|---|---|---|---|---|---|---|---|---|---|---|---|
6864 | Cuba | 2011 | male | 15-24 years | 75 | 808772 | 9.27 | Cuba2011 | 0.776 | 68,990,000,000 | 6419 | Millenials |
4116 | Belize | 2013 | female | 15-24 years | 1 | 35551 | 2.81 | Belize2013 | 0.715 | 1,612,573,850 | 5272 | Millenials |
14272 | Kuwait | 1997 | female | 55-74 years | 0 | 25000 | 0.00 | Kuwait1997 | NaN | 30,354,434,553 | 18766 | Silent |
5069 | Canada | 1997 | female | 75+ years | 42 | 970900 | 4.33 | Canada1997 | NaN | 652,825,364,726 | 23245 | G.I. Generation |
13465 | Japan | 1993 | female | 55-74 years | 2418 | 13512000 | 17.90 | Japan1993 | NaN | 4,454,143,876,947 | 37832 | Silent |
14718 | Kyrgyzstan | 2009 | male | 5-14 years | 8 | 531076 | 1.51 | Kyrgyzstan2009 | NaN | 4,690,062,255 | 977 | Generation Z |
5563 | Chile | 2009 | female | 5-14 years | 11 | 1277663 | 0.86 | Chile2009 | NaN | 172,389,498,445 | 11062 | Generation Z |
17729 | New Zealand | 2008 | female | 15-24 years | 38 | 304640 | 12.47 | New Zealand2008 | NaN | 133,279,679,483 | 33585 | Millenials |
22471 | Singapore | 1990 | female | 5-14 years | 1 | 194700 | 0.51 | Singapore1990 | 0.718 | 36,152,027,893 | 14393 | Generation X |
4679 | Bulgaria | 1996 | male | 35-54 years | 376 | 1129300 | 33.29 | Bulgaria1996 | NaN | 10,109,612,142 | 1270 | Boomers |
23771 | Spain | 2005 | male | 25-34 years | 402 | 3844965 | 10.46 | Spain2005 | 0.845 | 1,157,276,458,152 | 28092 | Generation X |
7125 | Czech Republic | 1992 | female | 55-74 years | 164 | 1035800 | 15.83 | Czech Republic1992 | NaN | 34,590,052,812 | 3573 | Silent |
26610 | United Kingdom | 1996 | male | 75+ years | 218 | 1460275 | 14.93 | United Kingdom1996 | NaN | 1,408,781,591,264 | 25609 | G.I. Generation |
3692 | Belgium | 2006 | male | 75+ years | 142 | 319905 | 44.39 | Belgium2006 | NaN | 409,813,197,842 | 41135 | Silent |
6701 | Cuba | 1997 | male | 5-14 years | 5 | 868234 | 0.58 | Cuba1997 | NaN | 25,366,200,000 | 2472 | Millenials |
6672 | Cuba | 1995 | female | 35-54 years | 241 | 1330000 | 18.12 | Cuba1995 | 0.653 | 30,429,803,651 | 2993 | Boomers |
17146 | Netherlands | 1991 | female | 25-34 years | 107 | 1245900 | 8.59 | Netherlands1991 | NaN | 323,320,449,906 | 22906 | Boomers |
23804 | Spain | 2008 | male | 75+ years | 484 | 1470360 | 32.92 | Spain2008 | NaN | 1,635,015,380,108 | 37848 | Silent |
1020 | Armenia | 1995 | male | 55-74 years | 29 | 242300 | 11.97 | Armenia1995 | 0.605 | 1,468,317,350 | 426 | Silent |
18008 | Norway | 1997 | female | 75+ years | 12 | 214100 | 5.60 | Norway1997 | NaN | 161,354,369,893 | 39335 | G.I. Generation |
1668 | Australia | 2006 | male | 25-34 years | 277 | 1442942 | 19.20 | Australia2006 | NaN | 745,521,862,833 | 39014 | Generation X |
22320 | Seychelles | 2009 | female | 35-54 years | 0 | 12916 | 0.00 | Seychelles2009 | NaN | 847,397,850 | 10157 | Boomers |
6677 | Cuba | 1995 | male | 5-14 years | 1 | 846300 | 0.12 | Cuba1995 | 0.653 | 30,429,803,651 | 2993 | Millenials |
8565 | Estonia | 2012 | female | 35-54 years | 7 | 184354 | 3.80 | Estonia2012 | 0.855 | 23,043,864,510 | 18411 | Generation X |
25131 | Thailand | 2000 | female | 55-74 years | 153 | 3820685 | 4.00 | Thailand2000 | 0.648 | 126,392,308,498 | 2169 | Silent |
6601 | Croatia | 2014 | male | 25-34 years | 42 | 287190 | 14.62 | Croatia2014 | 0.818 | 57,629,518,806 | 14299 | Millenials |
444 | Antigua and Barbuda | 2002 | female | 15-24 years | 0 | 7196 | 0.00 | Antigua and Barbuda2002 | NaN | 814,615,333 | 10499 | Millenials |
4606 | Bulgaria | 1990 | male | 55-74 years | 294 | 873400 | 33.66 | Bulgaria1990 | 0.695 | 20,632,090,909 | 2451 | G.I. Generation |
1128 | Armenia | 2006 | male | 75+ years | 6 | 48123 | 12.47 | Armenia2006 | NaN | 6,384,451,606 | 2310 | Silent |
16675 | Mexico | 1994 | male | 35-54 years | 554 | 7691700 | 7.20 | Mexico1994 | NaN | 527,813,238,126 | 6735 | Boomers |
In [9]:
df.isnull().sum()
Out[9]:
country 0
year 0
sex 0
age 0
suicides_no 0
population 0
suicides/100k pop 0
country-year 0
HDI for year 19456
gdp_for_year ($) 0
gdp_per_capita ($) 0
generation 0
dtype: int64
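The counts above show that only `HDI for year` has missing values. A minimal sketch of turning such counts into shares, using a tiny made-up frame (the values are illustrative and not taken from master.csv):

```python
import pandas as pd

# Hypothetical stand-in for the real data: HDI is missing in 3 of 4 rows.
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "HDI for year": [None, None, 0.7, None],
})

# isnull() marks missing cells; mean() turns the True/False marks into a fraction.
missing_share = toy.isnull().mean()
print(missing_share)
```

In the real data, 19456 of 27820 rows lack an HDI value (about 70%), so any analysis that uses that column covers only a subset of the rows.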
In [12]:
df.groupby("age")
Out[12]:
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f2810e74790>
In [24]:
df.groupby("age").head(1)
Out[24]:
index | country | year | sex | age | suicides_no | population | suicides/100k pop | country-year | HDI for year | gdp_for_year ($) | gdp_per_capita ($) | generation |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Albania | 1987 | male | 15-24 years | 21 | 312900 | 6.71 | Albania1987 | NaN | 2,156,624,900 | 796 | Generation X |
1 | Albania | 1987 | male | 35-54 years | 16 | 308000 | 5.19 | Albania1987 | NaN | 2,156,624,900 | 796 | Silent |
3 | Albania | 1987 | male | 75+ years | 1 | 21800 | 4.59 | Albania1987 | NaN | 2,156,624,900 | 796 | G.I. Generation |
4 | Albania | 1987 | male | 25-34 years | 9 | 274300 | 3.28 | Albania1987 | NaN | 2,156,624,900 | 796 | Boomers |
8 | Albania | 1987 | male | 55-74 years | 1 | 137500 | 0.73 | Albania1987 | NaN | 2,156,624,900 | 796 | G.I. Generation |
9 | Albania | 1987 | female | 5-14 years | 0 | 311000 | 0.00 | Albania1987 | NaN | 2,156,624,900 | 796 | Generation X |
In [21]:
df.groupby("age").describe()
Out[21]:
age | year count | year mean | year std | year min | year 25% | year 50% | year 75% | year max | suicides_no count | suicides_no mean | ... | HDI for year 75% | HDI for year max | gdp_per_capita ($) count | gdp_per_capita ($) mean | gdp_per_capita ($) std | gdp_per_capita ($) min | gdp_per_capita ($) 25% | gdp_per_capita ($) 50% | gdp_per_capita ($) 75% | gdp_per_capita ($) max |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
15-24 years | 4642.0 | 2001.275312 | 8.479669 | 1985.0 | 1995.0 | 2002.0 | 2008.0 | 2016.0 | 4642.0 | 174.179664 | ... | 0.855 | 0.944 | 4642.0 | 16876.57346 | 18894.414357 | 251.0 | 3450.0 | 9378.0 | 24874.0 | 126352.0 |
25-34 years | 4642.0 | 2001.275312 | 8.479669 | 1985.0 | 1995.0 | 2002.0 | 2008.0 | 2016.0 | 4642.0 | 242.118053 | ... | 0.855 | 0.944 | 4642.0 | 16876.57346 | 18894.414357 | 251.0 | 3450.0 | 9378.0 | 24874.0 | 126352.0 |
35-54 years | 4642.0 | 2001.275312 | 8.479669 | 1985.0 | 1995.0 | 2002.0 | 2008.0 | 2016.0 | 4642.0 | 528.250969 | ... | 0.855 | 0.944 | 4642.0 | 16876.57346 | 18894.414357 | 251.0 | 3450.0 | 9378.0 | 24874.0 | 126352.0 |
5-14 years | 4610.0 | 2001.173102 | 8.419515 | 1985.0 | 1994.0 | 2002.0 | 2008.0 | 2015.0 | 4610.0 | 11.337093 | ... | 0.855 | 0.944 | 4610.0 | 16815.56833 | 18863.290560 | 251.0 | 3436.0 | 9283.0 | 24796.0 | 126352.0 |
55-74 years | 4642.0 | 2001.275312 | 8.479669 | 1985.0 | 1995.0 | 2002.0 | 2008.0 | 2016.0 | 4642.0 | 357.269065 | ... | 0.855 | 0.944 | 4642.0 | 16876.57346 | 18894.414357 | 251.0 | 3450.0 | 9378.0 | 24874.0 | 126352.0 |
75+ years | 4642.0 | 2001.275312 | 8.479669 | 1985.0 | 1995.0 | 2002.0 | 2008.0 | 2016.0 | 4642.0 | 140.697544 | ... | 0.855 | 0.944 | 4642.0 | 16876.57346 | 18894.414357 | 251.0 | 3450.0 | 9378.0 | 24874.0 | 126352.0 |
6 rows × 48 columns
In [25]:
df.groupby(["age", "sex"]).head(1)
Out[25]:
index | country | year | sex | age | suicides_no | population | suicides/100k pop | country-year | HDI for year | gdp_for_year ($) | gdp_per_capita ($) | generation |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Albania | 1987 | male | 15-24 years | 21 | 312900 | 6.71 | Albania1987 | NaN | 2,156,624,900 | 796 | Generation X |
1 | Albania | 1987 | male | 35-54 years | 16 | 308000 | 5.19 | Albania1987 | NaN | 2,156,624,900 | 796 | Silent |
2 | Albania | 1987 | female | 15-24 years | 14 | 289700 | 4.83 | Albania1987 | NaN | 2,156,624,900 | 796 | Generation X |
3 | Albania | 1987 | male | 75+ years | 1 | 21800 | 4.59 | Albania1987 | NaN | 2,156,624,900 | 796 | G.I. Generation |
4 | Albania | 1987 | male | 25-34 years | 9 | 274300 | 3.28 | Albania1987 | NaN | 2,156,624,900 | 796 | Boomers |
5 | Albania | 1987 | female | 75+ years | 1 | 35600 | 2.81 | Albania1987 | NaN | 2,156,624,900 | 796 | G.I. Generation |
6 | Albania | 1987 | female | 35-54 years | 6 | 278800 | 2.15 | Albania1987 | NaN | 2,156,624,900 | 796 | Silent |
7 | Albania | 1987 | female | 25-34 years | 4 | 257200 | 1.56 | Albania1987 | NaN | 2,156,624,900 | 796 | Boomers |
8 | Albania | 1987 | male | 55-74 years | 1 | 137500 | 0.73 | Albania1987 | NaN | 2,156,624,900 | 796 | G.I. Generation |
9 | Albania | 1987 | female | 5-14 years | 0 | 311000 | 0.00 | Albania1987 | NaN | 2,156,624,900 | 796 | Generation X |
10 | Albania | 1987 | female | 55-74 years | 0 | 144600 | 0.00 | Albania1987 | NaN | 2,156,624,900 | 796 | G.I. Generation |
11 | Albania | 1987 | male | 5-14 years | 0 | 338200 | 0.00 | Albania1987 | NaN | 2,156,624,900 | 796 | Generation X |
1. Which age groups have the most suicides? (age and sex)
In [26]:
df["age"].value_counts()
Out[26]:
25-34 years 4642
35-54 years 4642
15-24 years 4642
75+ years 4642
55-74 years 4642
5-14 years 4610
Name: age, dtype: int64
In [27]:
# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
In [37]:
plt.figure(figsize=(10,5))
sns.barplot(x = "age", y = 'suicides/100k pop', hue = "sex", data = df.groupby(["age", "sex"]).sum(numeric_only=True).reset_index())
plt.show()
- Rates are higher for males than for females.
- The 5-14 age group has the lowest suicide rate.
- The 75+ age group has the highest suicide rate.
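The bar heights above come from summing the `suicides/100k pop` column, which adds up per-row rates, so the totals also depend on how many rows each group has. An alternative, sketched here on a small made-up frame (all values hypothetical), is to recompute the rate from the summed `suicides_no` and `population`:

```python
import pandas as pd

# Hypothetical rows: two age groups, two rows each.
toy = pd.DataFrame({
    "age": ["75+ years", "75+ years", "5-14 years", "5-14 years"],
    "sex": ["male", "male", "male", "male"],
    "suicides_no": [10, 30, 1, 1],
    "population": [100_000, 100_000, 500_000, 500_000],
})

# Sum the raw counts per group first, then derive one rate per group.
grouped = (
    toy.groupby(["age", "sex"])[["suicides_no", "population"]]
    .sum()
    .reset_index()
)
grouped["rate_per_100k"] = grouped["suicides_no"] / grouped["population"] * 100_000
print(grouped)
```

The column name `rate_per_100k` is an illustrative choice; the resulting frame can be passed to `sns.barplot` in the same way as before.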
In [34]:
df.groupby(["age", "sex"]).sum(numeric_only=True)
Out[34]:
age | sex | year | suicides_no | population | suicides/100k pop | HDI for year | gdp_per_capita ($) |
---|---|---|---|---|---|---|---|
15-24 years | female | 4644960 | 175437 | 4245159089 | 10045.33 | 541.291 | 39170527 |
15-24 years | male | 4644960 | 633105 | 4397787807 | 31487.36 | 541.291 | 39170527 |
25-34 years | female | 4644960 | 208823 | 4190523226 | 10614.42 | 541.291 | 39170527 |
25-34 years | male | 4644960 | 915089 | 4247580361 | 45957.10 | 541.291 | 39170527 |
35-54 years | female | 4644960 | 506233 | 7266872023 | 13732.15 | 541.291 | 39170527 |
35-54 years | male | 4644960 | 1945908 | 7109016100 | 55653.87 | 541.291 | 39170527 |
5-14 years | female | 4612704 | 16997 | 4107939076 | 1065.49 | 541.291 | 38759885 |
5-14 years | male | 4612704 | 35267 | 4290754161 | 1792.90 | 541.291 | 38759885 |
55-74 years | female | 4644960 | 430036 | 4756740046 | 16533.52 | 541.291 | 39170527 |
55-74 years | male | 4644960 | 1228407 | 4046505294 | 58460.68 | 541.291 | 39170527 |
75+ years | female | 4644960 | 221984 | 1705548397 | 23023.86 | 541.291 | 39170527 |
75+ years | male | 4644960 | 431134 | 957732856 | 88177.15 | 541.291 | 39170527 |
The grouped index can be flattened with .reset_index(), which turns the group keys back into regular columns, one row per group.
In [35]:
df.groupby(["age", "sex"]).sum(numeric_only=True).reset_index()
Out[35]:
index | age | sex | year | suicides_no | population | suicides/100k pop | HDI for year | gdp_per_capita ($) |
---|---|---|---|---|---|---|---|---|
0 | 15-24 years | female | 4644960 | 175437 | 4245159089 | 10045.33 | 541.291 | 39170527 |
1 | 15-24 years | male | 4644960 | 633105 | 4397787807 | 31487.36 | 541.291 | 39170527 |
2 | 25-34 years | female | 4644960 | 208823 | 4190523226 | 10614.42 | 541.291 | 39170527 |
3 | 25-34 years | male | 4644960 | 915089 | 4247580361 | 45957.10 | 541.291 | 39170527 |
4 | 35-54 years | female | 4644960 | 506233 | 7266872023 | 13732.15 | 541.291 | 39170527 |
5 | 35-54 years | male | 4644960 | 1945908 | 7109016100 | 55653.87 | 541.291 | 39170527 |
6 | 5-14 years | female | 4612704 | 16997 | 4107939076 | 1065.49 | 541.291 | 38759885 |
7 | 5-14 years | male | 4612704 | 35267 | 4290754161 | 1792.90 | 541.291 | 38759885 |
8 | 55-74 years | female | 4644960 | 430036 | 4756740046 | 16533.52 | 541.291 | 39170527 |
9 | 55-74 years | male | 4644960 | 1228407 | 4046505294 | 58460.68 | 541.291 | 39170527 |
10 | 75+ years | female | 4644960 | 221984 | 1705548397 | 23023.86 | 541.291 | 39170527 |
11 | 75+ years | male | 4644960 | 431134 | 957732856 | 88177.15 | 541.291 | 39170527 |
In [38]:
df.head()
Out[38]:
index | country | year | sex | age | suicides_no | population | suicides/100k pop | country-year | HDI for year | gdp_for_year ($) | gdp_per_capita ($) | generation |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Albania | 1987 | male | 15-24 years | 21 | 312900 | 6.71 | Albania1987 | NaN | 2,156,624,900 | 796 | Generation X |
1 | Albania | 1987 | male | 35-54 years | 16 | 308000 | 5.19 | Albania1987 | NaN | 2,156,624,900 | 796 | Silent |
2 | Albania | 1987 | female | 15-24 years | 14 | 289700 | 4.83 | Albania1987 | NaN | 2,156,624,900 | 796 | Generation X |
3 | Albania | 1987 | male | 75+ years | 1 | 21800 | 4.59 | Albania1987 | NaN | 2,156,624,900 | 796 | G.I. Generation |
4 | Albania | 1987 | male | 25-34 years | 9 | 274300 | 3.28 | Albania1987 | NaN | 2,156,624,900 | 796 | Boomers |
2. Suicide rate by country
In [39]:
plt.figure(figsize=(10,5))
sns.barplot(x = "country", y = 'suicides/100k pop', hue = "sex", data = df.groupby(["country", "sex"]).sum(numeric_only=True).reset_index())
plt.show()
The x-axis labels overlap. Let's fix that.
In [41]:
plt.figure(figsize=(20,5))
sns.barplot(x = "country", y = 'suicides/100k pop', hue = "sex", data = df.groupby(["country", "sex"]).sum(numeric_only=True).reset_index())
plt.xticks(rotation = 90) # rotate the labels by 90 degrees
plt.show()
In [94]:
df.groupby(["country"]).sum(numeric_only=True)
Out[94]:
country | year | suicides_no | population | suicides/100k pop | HDI for year | gdp_per_capita ($) |
---|---|---|---|---|---|---|
Albania | 527796 | 1970 | 62325467 | 924.76 | 32.304 | 490788 |
Antigua and Barbuda | 647832 | 11 | 1990228 | 179.14 | 28.140 | 3385212 |
Argentina | 744000 | 82219 | 1035985431 | 3894.59 | 93.552 | 2944044 |
Armenia | 596832 | 1905 | 77348173 | 976.21 | 66.252 | 558428 |
Aruba | 336720 | 101 | 1259677 | 1596.52 | 0.000 | 4069236 |
... | ... | ... | ... | ... | ... | ... |
United Arab Emirates | 144540 | 622 | 36502275 | 94.89 | 19.800 | 3035664 |
United Kingdom | 744000 | 136805 | 1738767780 | 2790.92 | 103.620 | 11869908 |
United States | 744000 | 1034013 | 8054027201 | 5140.97 | 106.992 | 14608296 |
Uruguay | 672072 | 13138 | 84068943 | 6538.96 | 80.628 | 2561016 |
Uzbekistan | 528348 | 34803 | 486422532 | 2138.17 | 54.600 | 257712 |
101 rows × 6 columns
The plot works, but it is hard to read at a glance. Let's show the countries in order of suicide rate, highest first.
In [47]:
country_sucides = df.groupby("country").sum(numeric_only=True).reset_index()
Aggregated by country:
In [50]:
country_sucides
Out[50]:
index | country | year | suicides_no | population | suicides/100k pop | HDI for year | gdp_per_capita ($) |
---|---|---|---|---|---|---|---|
0 | Albania | 527796 | 1970 | 62325467 | 924.76 | 32.304 | 490788 |
1 | Antigua and Barbuda | 647832 | 11 | 1990228 | 179.14 | 28.140 | 3385212 |
2 | Argentina | 744000 | 82219 | 1035985431 | 3894.59 | 93.552 | 2944044 |
3 | Armenia | 596832 | 1905 | 77348173 | 976.21 | 66.252 | 558428 |
4 | Aruba | 336720 | 101 | 1259677 | 1596.52 | 0.000 | 4069236 |
... | ... | ... | ... | ... | ... | ... | ... |
96 | United Arab Emirates | 144540 | 622 | 36502275 | 94.89 | 19.800 | 3035664 |
97 | United Kingdom | 744000 | 136805 | 1738767780 | 2790.92 | 103.620 | 11869908 |
98 | United States | 744000 | 1034013 | 8054027201 | 5140.97 | 106.992 | 14608296 |
99 | Uruguay | 672072 | 13138 | 84068943 | 6538.96 | 80.628 | 2561016 |
100 | Uzbekistan | 528348 | 34803 | 486422532 | 2138.17 | 54.600 | 257712 |
101 rows × 7 columns
Sort in descending order
In [52]:
country_sucides.sort_values(by = "suicides/100k pop", ascending = False)
Out[52]:
index | country | year | suicides_no | population | suicides/100k pop | HDI for year | gdp_per_capita ($) |
---|---|---|---|---|---|---|---|
75 | Russian Federation | 648648 | 1209742 | 3690802620 | 11305.13 | 0.000 | 2112096 |
52 | Lithuania | 525420 | 28039 | 68085210 | 10588.88 | 77.136 | 2431504 |
40 | Hungary | 621060 | 73891 | 248644256 | 10156.07 | 77.172 | 2904716 |
47 | Kazakhstan | 624780 | 101546 | 377513869 | 9519.52 | 80.016 | 1662684 |
73 | Republic of Korea | 744000 | 261730 | 1354944936 | 9350.45 | 0.000 | 5506068 |
... | ... | ... | ... | ... | ... | ... | ... |
45 | Jamaica | 407604 | 184 | 39481817 | 106.44 | 50.448 | 673452 |
96 | United Arab Emirates | 144540 | 622 | 36502275 | 94.89 | 19.800 | 3035664 |
65 | Oman | 72396 | 33 | 8987087 | 26.50 | 19.056 | 746664 |
76 | Saint Kitts and Nevis | 71676 | 0 | 117300 | 0.00 | 0.000 | 198900 |
27 | Dominica | 23820 | 0 | 66400 | 0.00 | 0.000 | 17820 |
101 rows × 7 columns
In [53]:
plt.figure(figsize=(20,5))
sns.barplot(x = "country", y = 'suicides/100k pop', data = country_sucides.sort_values(by = "suicides/100k pop", ascending = False))
plt.xticks(rotation = 90) # rotate the labels by 90 degrees
plt.show()
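- The country with the highest value is the Russian Federation; at the other end, Dominica and Saint Kitts and Nevis both show 0.
- The Republic of Korea also ranks near the top, in 5th place.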
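The ranking above is based on summed per-row rates, so a country with more rows (more years of data) can move up simply because it has more entries. A small sketch with made-up numbers shows how a sum-based and a mean-based ranking can differ (countries `A` and `B` are hypothetical):

```python
import pandas as pd

# Hypothetical data: country A has two rows, country B only one.
toy = pd.DataFrame({
    "country": ["A", "A", "B"],
    "suicides/100k pop": [10.0, 10.0, 15.0],
})

rates = toy.groupby("country")["suicides/100k pop"]
by_sum = rates.sum().sort_values(ascending=False)
by_mean = rates.mean().sort_values(ascending=False)

print(by_sum.index[0])   # country ranked first when rates are summed
print(by_mean.index[0])  # country ranked first when rates are averaged
```

Here `A` leads by sum (20 vs. 15) while `B` leads by mean (15 vs. 10), so it may be worth checking both before drawing conclusions from the order.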
3. Top 10 countries by suicide rate
In [54]:
country_sucides.sort_values(by = "suicides/100k pop", ascending = False)[:10]
Out[54]:
index | country | year | suicides_no | population | suicides/100k pop | HDI for year | gdp_per_capita ($) |
---|---|---|---|---|---|---|---|
75 | Russian Federation | 648648 | 1209742 | 3690802620 | 11305.13 | 0.000 | 2112096 |
52 | Lithuania | 525420 | 28039 | 68085210 | 10588.88 | 77.136 | 2431504 |
40 | Hungary | 621060 | 73891 | 248644256 | 10156.07 | 77.172 | 2904716 |
47 | Kazakhstan | 624780 | 101546 | 377513869 | 9519.52 | 80.016 | 1662684 |
73 | Republic of Korea | 744000 | 261730 | 1354944936 | 9350.45 | 0.000 | 5506068 |
6 | Austria | 764160 | 50073 | 243853094 | 9076.23 | 101.700 | 13088000 |
95 | Ukraine | 672192 | 319950 | 1286469184 | 8931.66 | 68.496 | 627492 |
46 | Japan | 744000 | 806902 | 3681024844 | 8025.23 | 103.356 | 13539888 |
32 | Finland | 696348 | 33677 | 141925658 | 7924.11 | 92.760 | 12342960 |
12 | Belgium | 744000 | 62761 | 303302621 | 7900.50 | 103.284 | 11928828 |
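Sorting in descending order and slicing the first rows can also be written with `DataFrame.nlargest`. A minimal sketch on a made-up frame (values hypothetical):

```python
import pandas as pd

# Hypothetical per-country values.
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "suicides/100k pop": [5.0, 20.0, 12.5, 0.0],
})

# Keep the two rows with the largest values, already ordered from high to low.
top2 = toy.nlargest(2, "suicides/100k pop")
print(top2)
```

With the real data, the same call with `10` instead of `2` would select the top 10 rows of the aggregated frame.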
In [55]:
plt.figure(figsize=(20,5))
sns.barplot(x = "country", y = 'suicides/100k pop', data = country_sucides.sort_values(by = "suicides/100k pop", ascending = False)[:10])
plt.xticks(rotation = 90) # rotate the labels by 90 degrees
plt.show()
4. Suicide rate by year
In [56]:
sns.histplot(df['year'], kde=True, stat='density')  # distplot is deprecated; histplot is its axes-level replacement
Out[56]:
<AxesSubplot:xlabel='year', ylabel='Density'>
In [57]:
df['year'].value_counts()
Out[57]:
2009 1068
2010 1056
2001 1056
2000 1032
2011 1032
2007 1032
2002 1032
2003 1032
2006 1020
2008 1020
2005 1008
2004 1008
1999 996
2012 972
2013 960
1998 948
1995 936
2014 936
1997 924
1996 924
1994 816
1992 780
1993 780
1991 768
1990 768
2015 744
1987 648
1989 624
1988 588
1986 576
1985 576
2016 160
Name: year, dtype: int64
In [65]:
plt.figure(figsize=(20,5))
sns.countplot(x = 'year', data = df)
plt.show()
In [66]:
plt.figure(figsize=(20,5))
sns.countplot(x = 'year', data = df)
plt.xticks(rotation = 90)
plt.show()
- The dataset contains more records for the 2000s than for the 1980s and 1990s. This plot counts rows per year, so it reflects data coverage rather than the suicide rate itself.
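A count plot of `year` counts rows per year rather than measuring suicides. To see how the rate itself changes, one option is to aggregate `suicides_no` and `population` per year and derive a rate from them. A sketch on made-up numbers, where the row count rises while the rate falls:

```python
import pandas as pd

# Hypothetical data: one row for 1990, three rows for 2000.
toy = pd.DataFrame({
    "year": [1990, 2000, 2000, 2000],
    "suicides_no": [30, 10, 10, 10],
    "population": [100_000, 100_000, 100_000, 100_000],
})

# What a count plot shows: number of rows per year.
rows_per_year = toy["year"].value_counts().sort_index()

# Rate per 100k computed from yearly totals.
totals = toy.groupby("year")[["suicides_no", "population"]].sum()
rate_per_year = totals["suicides_no"] / totals["population"] * 100_000

print(rows_per_year)
print(rate_per_year)
```

The resulting `rate_per_year` Series could then be drawn as a line, for example with `rate_per_year.plot()`.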
In [67]:
plt.figure(figsize=(20,5))
sns.stripplot( x = 'year', y = 'suicides/100k pop', data = df)
plt.xticks(rotation = 90)
plt.show()
In [68]:
df
Out[68]:
index | country | year | sex | age | suicides_no | population | suicides/100k pop | country-year | HDI for year | gdp_for_year ($) | gdp_per_capita ($) | generation |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Albania | 1987 | male | 15-24 years | 21 | 312900 | 6.71 | Albania1987 | NaN | 2,156,624,900 | 796 | Generation X |
1 | Albania | 1987 | male | 35-54 years | 16 | 308000 | 5.19 | Albania1987 | NaN | 2,156,624,900 | 796 | Silent |
2 | Albania | 1987 | female | 15-24 years | 14 | 289700 | 4.83 | Albania1987 | NaN | 2,156,624,900 | 796 | Generation X |
3 | Albania | 1987 | male | 75+ years | 1 | 21800 | 4.59 | Albania1987 | NaN | 2,156,624,900 | 796 | G.I. Generation |
4 | Albania | 1987 | male | 25-34 years | 9 | 274300 | 3.28 | Albania1987 | NaN | 2,156,624,900 | 796 | Boomers |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
27815 | Uzbekistan | 2014 | female | 35-54 years | 107 | 3620833 | 2.96 | Uzbekistan2014 | 0.675 | 63,067,077,179 | 2309 | Generation X |
27816 | Uzbekistan | 2014 | female | 75+ years | 9 | 348465 | 2.58 | Uzbekistan2014 | 0.675 | 63,067,077,179 | 2309 | Silent |
27817 | Uzbekistan | 2014 | male | 5-14 years | 60 | 2762158 | 2.17 | Uzbekistan2014 | 0.675 | 63,067,077,179 | 2309 | Generation Z |
27818 | Uzbekistan | 2014 | female | 5-14 years | 44 | 2631600 | 1.67 | Uzbekistan2014 | 0.675 | 63,067,077,179 | 2309 | Generation Z |
27819 | Uzbekistan | 2014 | female | 55-74 years | 21 | 1438935 | 1.46 | Uzbekistan2014 | 0.675 | 63,067,077,179 | 2309 | Boomers |
27820 rows × 12 columns
In [70]:
df["age"].unique()
Out[70]:
array(['15-24 years', '35-54 years', '75+ years', '25-34 years', '55-74 years', '5-14 years'], dtype=object)
In [71]:
df["age"].value_counts()
Out[71]:
25-34 years 4642
35-54 years 4642
15-24 years 4642
75+ years 4642
55-74 years 4642
5-14 years 4610
Name: age, dtype: int64
In [72]:
df["generation"].unique()
Out[72]:
array(['Generation X', 'Silent', 'G.I. Generation', 'Boomers', 'Millenials', 'Generation Z'], dtype=object)
In [73]:
df["generation"].value_counts()
Out[73]:
Generation X 6408
Silent 6364
Millenials 5844
Boomers 4990
G.I. Generation 2744
Generation Z 1470
Name: generation, dtype: int64
In [74]:
sns.countplot(x = 'generation', data = df)
Out[74]:
<AxesSubplot:xlabel='generation', ylabel='count'>
Let's draw it as a pie chart.
In [75]:
df['generation'].value_counts().plot.pie()
Out[75]:
<AxesSubplot:ylabel='generation'>
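A pie chart shows each category's share of the rows. The same shares can be read as numbers by normalizing the value counts, sketched here on a short made-up Series:

```python
import pandas as pd

# Hypothetical generation labels for four rows.
toy = pd.Series(["Silent", "Silent", "Boomers", "Generation X"], name="generation")

# normalize=True returns fractions of the total instead of raw counts.
shares = toy.value_counts(normalize=True)
print(shares)
```

To print the percentages on the chart itself, `plot.pie` accepts matplotlib's `autopct` argument, e.g. `autopct='%.1f%%'`.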
In [76]:
sns.boxplot(x = df['year'])
Out[76]:
<AxesSubplot:xlabel='year'>
In [77]:
sns.boxplot(x = df['suicides/100k pop'])
Out[77]:
<AxesSubplot:xlabel='suicides/100k pop'>
In [79]:
sns.jointplot(x = 'suicides/100k pop', y = 'year', data = df)
Out[79]:
<seaborn.axisgrid.JointGrid at 0x7f280c692e20>
Let's change the plot kind and visualize it again.
In [81]:
sns.jointplot(x = 'suicides/100k pop', y = 'year', data = df, kind = 'kde')
Out[81]:
<seaborn.axisgrid.JointGrid at 0x7f27fa8af6a0>
5. Income and suicide rate
In [82]:
sns.boxplot(x = df['gdp_per_capita ($)'])
Out[82]:
<AxesSubplot:xlabel='gdp_per_capita ($)'>
In [83]:
sns.histplot(df['gdp_per_capita ($)'], kde=True, stat='density')  # distplot is deprecated; histplot is its axes-level replacement
Out[83]:
<AxesSubplot:xlabel='gdp_per_capita ($)', ylabel='Density'>
In [84]:
sns.jointplot(x = 'suicides/100k pop', y = 'gdp_per_capita ($)', data = df)
Out[84]:
<seaborn.axisgrid.JointGrid at 0x7f27facb1fd0>
- In this plot, the highest suicide rates appear mostly where GDP per capita is low.
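A scatter plot gives a visual impression; a correlation coefficient puts a number on it. A minimal sketch using pandas' default Pearson correlation on made-up values (a perfectly linear toy case, not the real data):

```python
import pandas as pd

# Hypothetical values: the rate falls by 5 for every 1000 increase in GDP per capita.
toy = pd.DataFrame({
    "gdp_per_capita ($)": [1000, 2000, 3000, 4000],
    "suicides/100k pop": [20.0, 15.0, 10.0, 5.0],
})

# Series.corr defaults to the Pearson correlation coefficient.
r = toy["gdp_per_capita ($)"].corr(toy["suicides/100k pop"])
print(r)
```

On the real data the value is unlikely to be this extreme, and a correlation alone does not show that income causes the difference.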