[SQL SERVER][Performance]刪除重複資料效能比較
我在 Plurk 看到有篇文章標題如下
由於看到快速兩字便引起我高度興趣,但我看了該格主所分享的方法後(方法一和方法三好像相同..Orz),
心中產生很大疑問,原因無他,就是使用迴圈的做法,
所以我拿格主第2和第3種方法並和自己的方法做了以下測試…
先建立測試資料
--方法2
declare @recorddatetime datetime
declare @xname varchar(50)
declare @xid int
declare @no int
declare @start time(7),@end time(7),@elaspedtime numeric(20,7)
set @start=GETDATE()
DECLARE DeleteRepeat_cursor CURSOR FOR
With CTETable(xid,nostr)
as
(
Select c1,count(c1) AS no
from ##mytable
group by c1
)
Select * from CTETable where nostr>1
OPEN DeleteRepeat_cursor
FETCH NEXT FROM DeleteRepeat_cursor INTO @xid,@NO
WHILE @@FETCH_STATUS = 0
BEGIN
delete top(@NO-1) from ##mytable where c1=@xid
FETCH NEXT FROM DeleteRepeat_cursor INTO @xid,@NO
End
CLOSE DeleteRepeat_cursor
DEALLOCATE DeleteRepeat_cursor
set @end=GETDATE()
set @elaspedtime=convert(int, datediff(ms, @start, @end));
select @elaspedtime as '花費時間'
花費時間(ms)=173
ps.把邏輯或實體刪除作業包在迴圈中並使用Cursor,無疑這只會產生更多Lock,並耗費更多時間,
--方法3
--先新增資料
declare @start time(7),@end time(7),@elaspedtime numeric(20,7)
set @start=GETDATE()
while 1=1
begin
delete top(1) from ##mytable
where c1 in
( select c1 from ##mytable
group by c1
having COUNT(*)>1 );
if @@ROWCOUNT=0 break
end
set @end=GETDATE()
set @elaspedtime=convert(int, datediff(ms, @start, @end));
select @elaspedtime as '花費時間'
--查看結果
select * from ##mytable
花費時間(ms)=20000
ps:每一次的刪除作業就得先在子查詢取得鍵值,直到@@ROWCOUNT=0才跳出,
這樣的做法效能奇差無比,但如果你需要一個燒機方法,這方法應該算首選…XD
--我的方法
--先新增資料
declare @start time(7),@end time(7),@elaspedtime numeric(20,7)
set @start=GETDATE()
;with mycte(c1,c2,row)
as
(
select *,ROW_NUMBER() over(partition by c1 order by c1 ) as 'row'
from ##mytable
)
delete mycte where row>1
set @end=GETDATE()
set @elaspedtime=convert(int, datediff(ms, @start, @end));
select @elaspedtime as '花費時間'
--查看結果
select * from ##mytable
花費時間(ms)=6
測試結果總表
最後~如果你還有更快速的做法,還請在告訴我一下,謝謝.