[SQL SERVER]提高資料處理效能

2014-10-21

10041
0
SQL_SERVER

很久以前我個人有寫一篇 [C#][SQL SERVER] 提高 Insert 效能，

基本上要大幅提高資料處理效能，我下面列幾個較重要影響因子

ps:我測試資料庫復原模式採用完整，當處理大量資料時，建議改成大量紀錄以最小化Transaction log為主

1.移除索引(constraint、triggers)

我提供個人實際使用SSIS執行ETL轉約2千萬筆資料

heap table:約 4分鐘 (cpu*2 、ram:8G)

nonheap table:約23分鐘 (cpu*2 、ram:8G)

nonheap table with nonclustered index(*3): 約600分鐘 (cpu*2 、ram:8G)

你可以看到2千萬筆資料新增到空資料表只花不到4分鐘，

如果table有1個clustered index要花23分鐘，

最慘就是總共有4各index(clustered index *1+ nonclustered index*3)花了快10小時才搞定，

所以當你要轉移大量資料時，建議移除所有索引，等資料轉好後再建立索引。

ps:這裡不考慮網路頻寬問題

2.批次處理

盡量避免row by row處理方式，我常看新增100筆資料的邏輯就是for loop 100次，

row by row 處理可說是關聯資料庫效能殺手，

下面簡單測試批次處理。

目前資料庫交易紀錄檔大小

批次處理(一百萬筆一包直接下，但要注意長時間交易)

總共花費約2.6秒

執行後交易紀錄檔大小

批次處理(20萬筆批次)

資料庫交易紀錄檔大小

DECLARE @batchcount int,@stime varchar(20),@etime varchar(20)
set @batchcount=200000
BEGIN try
set @stime=convert(varchar(20),getdate(),120)
print 'starttiem:'+@stime
    WHILE(@batchcount<=1000000)
    begin        
        insert into mytest (EMP_ID,EMP_NAME,SDATE,STIME,EMP_DEP)
        select top(200000) EMP_ID,EMP_NAME,SDATE,STIME,EMP_DEP
       from rsa241_20130821 s 
       where not exists
       (
       select 1 from mytest where EMP_ID = s.EMP_ID 
       and EMP_NAME=s.EMP_NAME and SDATE=s.SDATE and  STIME=s.STIME
       and EMP_DEP=s.EMP_DEP
       )     
        set @batchcount=@batchcount+200000          
    end
    set @etime=convert(varchar(20),getdate(),120)
print 'endtime:'+@etime
print '執行時間(ms):'+ cast(DATEDIFF(ms,@stime,@etime) as varchar)
end try
begin catch
 print error_message();
end catch