[Personal Notes] Python Knowledge + Pandas

  • 2021-02-21

Complete Python Pandas Data Science Tutorial! (Reading CSV/Excel files, Sorting, Filtering, Groupby)

Download and install Python 3 from the official website.

pip install pandas  (pandas is a data analysis library for Python)

Test data

pip install jupyterlab  (Jupyter Notebook: a web-based environment for writing, editing, and running code)

Run jupyter notebook in cmd. After a moment the browser opens; click New in the upper right and choose Python 3.

read_excel fails with: ImportError: Missing optional dependency 'xlrd'. Install xlrd >= 1.0.0 for Excel support Use pip or conda to install xlrd.

pip install xlrd

pip install openpyxl
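
Both packages are needed because they cover different file formats: xlrd 2.0 and later read only legacy .xls files, while openpyxl reads .xlsx. The sketch below picks an engine name from the file extension. The helper excel_engine_for and the table ENGINE_BY_SUFFIX are hypothetical names made up for this note, not part of pandas; the only pandas feature assumed is the engine parameter of read_excel.

```python
from pathlib import Path

# Hypothetical lookup table (not part of pandas): file extension -> engine name
# accepted by the `engine` parameter of pd.read_excel.
ENGINE_BY_SUFFIX = {
    ".xlsx": "openpyxl",  # modern Excel workbooks
    ".xlsm": "openpyxl",  # macro-enabled workbooks
    ".xls": "xlrd",       # legacy Excel format
}


def excel_engine_for(path):
    """Return the engine name to use for the given Excel file path."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"no engine known for {suffix!r}") from None


engine = excel_engine_for("pokemon_data.xlsx")
print(engine)

# With pandas and openpyxl installed, the file could then be read as:
# df2 = pd.read_excel("pokemon_data.xlsx", engine=engine)
```

Passing the engine explicitly is optional; pandas normally picks one on its own, but naming it makes the dependency visible when the ImportError shows up.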

To get a title in Jupyter Notebook: add a new cell (In), switch the Code dropdown to Markdown, type "# Title", and run the cell. It is then rendered as a heading.

By default, Jupyter Notebook opens in the folder it was started from. From a fresh cmd window that is C:\Users\<account name>\
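
Relative file names such as 'pokemon_data.csv' are resolved against the notebook's working directory, so a quick check can save a FileNotFoundError. A minimal sketch, assuming the CSV is expected next to the notebook; the variable names are made up for this note:

```python
from pathlib import Path

# Directory that relative paths (e.g. in pd.read_csv) are resolved against.
cwd = Path.cwd()
print("Working directory:", cwd)

# Assumed file name from the tutorial; exists() only reports whether it is there.
data_file = cwd / "pokemon_data.csv"
print("Found data file:", data_file.exists())
```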

# Loading data into Pandas
import pandas as pd

df = pd.read_csv('pokemon_data.csv')
print(df.head(3))

#df2 = pd.read_excel('pokemon_data.xlsx')
#print(df2.tail(3))

#df3 = pd.read_csv('pokemon_data.txt',delimiter='\t')
#print(df3.head(3))

#-------------------------------------------------
#read Headers
#print(df.columns)

#read each column
#print(df['Name'][0:5])
#print(df[['Name','Type 1','HP']][0:2])

#read each row
#print(df.iloc[1])
#print(df.iloc[0:4])
#for index,row in df[0:3].iterrows():
#    print(index,row['Name'])

#rows where 'Type 1' is Grass
df.loc[df['Type 1'] == 'Grass']

#read a specific location (R,C)
#print(df.iloc[1,1])
#-------------------------------------------------
#Sorting/Describing Data
#df
#df.describe()
#df.sort_values('Name',ascending=False)
#sort by Type 1 ascending, then by HP descending
df.sort_values(['Type 1','HP'],ascending=[True,False])
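#-------------------------------------------------
#Making changes to the data
df['Attack-Defense'] = df['Attack'] - df['Defense']
df['Defense-Attack'] = df['Defense'] - df['Attack']
#drop a column that is no longer needed
df = df.drop(columns=['Attack-Defense'])
#select the columns by name rather than by position (iloc[:,4:6]) so the sum does not depend on column order
df['HP+Attack'] = df[['HP','Attack']].sum(axis=1)
df[0:5]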
#-------------------------------------------------
#Saving data (index=False leaves the row index out of the file)
df.to_csv('TestBySam.csv',index=False)
df.to_excel('TestBySam.xlsx',index=False)
df.to_csv('TestBySam.txt',index=False,sep='\t')
#-------------------------------------------------
#Filtering data
#new_df = df.loc[(df['Type 1'] == 'Grass') & (df['Type 2'] == 'Poison') & (df['HP'] > 70)]
#new_df.reset_index(drop=True,inplace=True)
#new_df
#df.loc[df['Name'].str.contains('Mega')]
#~ negates the condition: names that do NOT contain 'Mega'
#df.loc[~df['Name'].str.contains('Mega')]
#-------------------------------------------------
#Regex filtering: names that start with 'pi', case-insensitive
import re
df.loc[df['Name'].str.contains(r'^pi[a-z]*',flags=re.I,regex=True)]
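#-------------------------------------------------
#Aggregate statistics (groupby)
#numeric_only=True skips text columns such as Name; pandas 2.0+ raises TypeError without it
df.groupby(['Type 1']).mean(numeric_only=True).sort_values('Attack',ascending = False)

df.groupby(['Type 1']).sum(numeric_only=True)

df.groupby(['Type 1']).count()

#number of rows per (Type 1, Type 2) pair, counted on the '#' column
df.groupby(['Type 1','Type 2']).count()['#']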
#-------------------------------------------------
#Working with large files: read the CSV 5 rows at a time
for chunkdf in pd.read_csv('pokemon_data.csv',chunksize=5):
    print("ChunkSize")
    print(chunkdf)
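
The filtering, groupby, and chunked-reading steps above can be tried without pokemon_data.csv at hand. The sketch below loads a few hand-typed rows from a string instead of a file. The rows in CSV_TEXT are illustrative only and are not copied from the real data file; the variable names are made up for this note.

```python
import io
import re

import pandas as pd

# Hand-typed sample with the same column layout as the tutorial file.
# Values are illustrative, not taken from pokemon_data.csv.
CSV_TEXT = """#,Name,Type 1,Type 2,HP,Attack,Defense
1,Bulbasaur,Grass,Poison,45,49,49
2,Ivysaur,Grass,Poison,60,62,63
3,Venusaur,Grass,Poison,80,82,83
4,Charmander,Fire,,39,52,43
16,Pidgey,Normal,Flying,40,45,40
25,Pikachu,Electric,,35,55,40
"""

# read_csv accepts any file-like object, so StringIO stands in for a file.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Combine conditions with &; each comparison needs its own parentheses.
strong_grass = df.loc[(df["Type 1"] == "Grass") & (df["HP"] > 50)]
strong_grass = strong_grass.reset_index(drop=True)

# Case-insensitive regex: names starting with "pi".
pi_mask = df["Name"].str.contains(r"^pi[a-z]*", flags=re.I, regex=True)
pi_names = df.loc[pi_mask, "Name"].tolist()

# Average the numeric columns per type, highest Attack first.
by_type = (
    df.groupby("Type 1")
    .mean(numeric_only=True)
    .sort_values("Attack", ascending=False)
)

# chunksize turns read_csv into an iterator of smaller DataFrames.
chunk_sizes = [
    len(chunk) for chunk in pd.read_csv(io.StringIO(CSV_TEXT), chunksize=4)
]

print(strong_grass[["Name", "HP"]])
print(pi_names)
print(by_type["Attack"])
print(chunk_sizes)
```

Reading from a string keeps the example self-contained; swapping io.StringIO(CSV_TEXT) for 'pokemon_data.csv' should give the same calls on the real data.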

Python Knowledge

Running python with no arguments starts the REPL (Read-Eval-Print Loop) by default.

pip installs third-party packages.

ipython: an interactive shell (I don't understand it yet; noting it for now)

Web API => Flask

Website => Django

If anything here is wrong, please point it out. Thanks.