Complete Python Pandas Data Science Tutorial! (Reading CSV/Excel files, Sorting, Filtering, Groupby)
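Download and install Python 3 from the official website
pip install pandas (Pandas is a data analysis library for Python)
pip install jupyterlab (Jupyter Notebook is a web-based environment for writing, editing, and running code)
Run jupyter notebook from cmd and wait a moment for the browser to open; click New at the top right and choose Python 3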
read_excel fails with: ImportError: Missing optional dependency 'xlrd'. Install xlrd >= 1.0.0 for Excel support Use pip or conda to install xlrd.
pip install xlrd
pip install openpyxl
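A quick way to see which of these packages are still missing before calling read_excel is to ask Python whether it can find them. This is only a sketch: `missing_packages` is a hypothetical helper name, not part of pandas. For .xlsx files, pandas relies on openpyxl, while recent xlrd releases only handle the older .xls format.

```python
import importlib.util


def missing_packages(names):
    """Return the package names that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means everything needed for read_csv / read_excel is installed.
print(missing_packages(["pandas", "xlrd", "openpyxl"]))
```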
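To get a title in Jupyter Notebook: add a new cell, change the Code dropdown to Markdown, type # Title, and run the cell; it is then rendered as a heading
The default working directory of jupyter notebook is C:\Users\<account name>\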
# Loading data into Pandas
import pandas as pd
df = pd.read_csv('pokemon_data.csv')
print(df.head(3))
#df2 = pd.read_excel('pokemon_data.xlsx')
#print(df2.tail(3))
#df3 = pd.read_csv('pokemon_data.txt',delimiter='\t')
#print(df3.head(3))
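To try the loaders above without the data files, here is a minimal sketch that reads the same kind of data from in-memory text. The sample rows are made up for illustration (the real pokemon_data.csv has more rows and columns), and `csv_text` / `tsv_text` are hypothetical names.

```python
import io

import pandas as pd

# Made-up sample rows that only mimic the layout of the real file.
csv_text = """#,Name,Type 1,HP
1,Bulbasaur,Grass,45
4,Charmander,Fire,39
7,Squirtle,Water,44
"""

# read_csv accepts a file path or any file-like object.
df_csv = pd.read_csv(io.StringIO(csv_text))

# The same rows, tab-separated, need an explicit delimiter.
tsv_text = csv_text.replace(",", "\t")
df_tsv = pd.read_csv(io.StringIO(tsv_text), delimiter="\t")

print(df_csv.head(2))         # first two rows
print(df_csv.equals(df_tsv))  # both loaders produce the same DataFrame
```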
#-------------------------------------------------
#read Headers
#print(df.columns)
#read each column
#print(df['Name'][0:5])
#print(df[['Name','Type 1','HP']][0:2])
#read each row
#print(df.iloc[1])
#print(df.iloc[0:4])
#for index,row in df[0:3].iterrows():
#    print(index,row['Name'])
df.loc[df['Type 1'] == 'Grass']
#read a specific location (R,C)
#print(df.iloc[1,1])
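The difference between the selectors above is easier to see on a tiny table. A minimal sketch, assuming a made-up four-row DataFrame (the values are illustrative and not read from the CSV): iloc selects by integer position, while loc selects by label or by a boolean condition.

```python
import pandas as pd

# Made-up sample data with a few of the columns used in these notes.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Charmander", "Squirtle"],
    "Type 1": ["Grass", "Grass", "Fire", "Water"],
    "HP": [45, 60, 39, 44],
})

second_row = df.iloc[1]                   # row at position 1, as a Series
cell = df.iloc[1, 0]                      # row 1, column 0
grass = df.loc[df["Type 1"] == "Grass"]   # rows where the condition is True

# iterrows yields (index, row) pairs; fine for a peek, slow on large tables.
names = [row["Name"] for _, row in df[0:2].iterrows()]

print(cell)
print(grass)
```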
#-------------------------------------------------
#Sorting/Describing Data
#df
#df.describe()
#df.sort_values('Name',ascending=False)
df.sort_values(['Type 1','HP'],ascending=[True,False])
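A small sketch of the two-key sort, again on made-up rows: Type 1 ascending, then HP descending within each type. Note that sort_values returns a new DataFrame, so the result has to be assigned if it is needed later; in a notebook the bare call only displays it.

```python
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Charmander", "Squirtle"],
    "Type 1": ["Grass", "Grass", "Fire", "Water"],
    "HP": [45, 60, 39, 44],
})

# One ascending flag per sort key; True/False behave like the 1/0 form.
sorted_df = df.sort_values(["Type 1", "HP"], ascending=[True, False])

print(sorted_df)
print(df["Name"].tolist())  # the original order is unchanged
```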
#-------------------------------------------------
df['Attack-Defense'] = df['Attack'] - df['Defense']
df['Defense-Attack'] = df['Defense'] - df['Attack']
df = df.drop(columns=['Attack-Defense'])
df['HP+Attack'] = df.iloc[:,4:6].sum(axis=1)  # columns 4 and 5 by position (HP and Attack here)
df[0:5]
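Summing with iloc[:, 4:6] depends on the column order, so it silently changes meaning if a column is added or removed earlier in the table. A possible alternative, sketched on made-up rows, is to select the columns by name:

```python
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander"],
    "HP": [45, 39],
    "Attack": [49, 52],
    "Defense": [49, 43],
})

# Element-wise arithmetic between two columns creates a new column.
df["Defense-Attack"] = df["Defense"] - df["Attack"]
diff_values = df["Defense-Attack"].tolist()

# Row-wise sum over columns chosen by name, independent of their position.
df["HP+Attack"] = df[["HP", "Attack"]].sum(axis=1)

# drop returns a new DataFrame, so the result is assigned back.
df = df.drop(columns=["Defense-Attack"])

print(df)
```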
#-------------------------------------------------
df.to_csv('TestBySam.csv',index=False)
df.to_excel('TestBySam.xlsx',index=False)
df.to_csv('TestBySam.txt',index=False,sep='\t')
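To check what index=False does, one option is to write a small table to a temporary folder and read it back. This is a sketch with made-up rows; `sample.csv` is a hypothetical file name. Only the CSV path is shown, since to_excel additionally needs openpyxl.

```python
import os
import tempfile

import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({"Name": ["Bulbasaur", "Charmander"], "HP": [45, 39]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.csv")  # hypothetical file name
    df.to_csv(path, index=False)                # no extra index column

    with open(path) as f:
        header = f.readline().strip()           # first line of the file

    round_trip = pd.read_csv(path)

print(header)
print(round_trip)
```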
#-------------------------------------------------
#new_df = df.loc[(df['Type 1'] == 'Grass') & (df['Type 2'] == 'Poison') & (df['HP'] > 70)]
#new_df.reset_index(drop=True,inplace=True)
#new_df
#df.loc[df['Name'].str.contains('Mega')]
#~ negates the condition: rows whose Name does NOT contain 'Mega'
#df.loc[~df['Name'].str.contains('Mega')]
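The filters in this cell can be tried on a small made-up table. Each condition needs its own parentheses because & binds more tightly than == and >. In this sketch the reset index is assigned back rather than using inplace=True; both leave a fresh 0-based index.

```python
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Venusaur", "VenusaurMega Venusaur", "Charizard"],
    "Type 1": ["Grass", "Grass", "Grass", "Fire"],
    "Type 2": ["Poison", "Poison", "Poison", "Flying"],
    "HP": [45, 80, 80, 78],
})

# Combine conditions with & (and); | would mean or.
strong_grass = df.loc[
    (df["Type 1"] == "Grass") & (df["Type 2"] == "Poison") & (df["HP"] > 70)
]
strong_grass = strong_grass.reset_index(drop=True)  # renumber rows from 0

# ~ inverts the boolean mask: keep names that do not contain "Mega".
no_mega = df.loc[~df["Name"].str.contains("Mega")]

print(strong_grass)
print(no_mega)
```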
#-------------------------------------------------
import re
df.loc[df['Name'].str.contains('^pi[a-z]*',flags=re.I,regex=True)]
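A sketch of the same regex on a made-up list of names: ^ anchors the match to the start of the string and re.I makes it case-insensitive. Since [a-z]* can also match zero characters, the pattern behaves like plain ^pi when used with contains.

```python
import re

import pandas as pd

# Made-up names for illustration.
names = pd.Series(["Pikachu", "Pidgey", "Caterpie", "pichu", "Squirtle"])

# Keep only names that start with "pi", ignoring case.
mask = names.str.contains("^pi[a-z]*", flags=re.I, regex=True)
starts_with_pi = names[mask]

# "Caterpie" contains "pi" but not at the start, so it is filtered out.
print(starts_with_pi.tolist())
```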
#-------------------------------------------------
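# numeric_only=True skips text columns such as Name, which mean() rejects in pandas 2.0+
df.groupby(['Type 1']).mean(numeric_only=True).sort_values('Attack',ascending=False)
df.groupby(['Type 1']).sum(numeric_only=True)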
df.groupby(['Type 1']).count()
df.groupby(['Type 1','Type 2']).count()['#']
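A minimal groupby sketch on made-up rows, showing the mean per type and a row count per type. Here size() is used for counting, as an alternative to count()['#']: it counts rows per group without needing to pick a column.

```python
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Charmander", "Charmeleon", "Squirtle"],
    "Type 1": ["Grass", "Grass", "Fire", "Fire", "Water"],
    "Attack": [49, 62, 52, 64, 48],
})

# Mean of one column per group, highest first.
mean_attack = df.groupby("Type 1")["Attack"].mean().sort_values(ascending=False)

# Mean of every numeric column per group; the text column Name is skipped.
numeric_means = df.groupby("Type 1").mean(numeric_only=True)

# Number of rows in each group.
counts = df.groupby("Type 1").size()

print(mean_attack)
print(counts)
```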
#-------------------------------------------------
for chunkdf in pd.read_csv('pokemon_data.csv',chunksize=5):
    print("ChunkSize")
    print(chunkdf)
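Reading in chunks is mainly useful when a file is too large to load at once: each chunk is a normal DataFrame, so it can be filtered and the small results combined afterwards. A sketch on generated in-memory text (the rows P1..P7 are made up):

```python
import io

import pandas as pd

# Seven made-up rows: P1,10 ... P7,70
csv_text = "Name,HP\n" + "\n".join(f"P{i},{i * 10}" for i in range(1, 8))

chunk_sizes = []
parts = []
for chunk_df in pd.read_csv(io.StringIO(csv_text), chunksize=3):
    chunk_sizes.append(len(chunk_df))             # rows in this chunk
    parts.append(chunk_df[chunk_df["HP"] > 20])   # keep only what is needed

# Stitch the filtered pieces back into one DataFrame.
filtered = pd.concat(parts)

print(chunk_sizes)
print(filtered)
```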
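Python notes
Running python with no script starts the REPL (Read-Eval-Print Loop) by default
pip installs third-party packages
ipython is an interactive shell (I don't understand it yet; noting it here for now)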
WebApi => Flask
Website => Django
If anything here is wrong, please point it out. Thank you.