[SQL Server]hash index

2017-10-15

1824
0
SQL Server 2016
2017-11-03

Hash index只能建立在In-memory table，並且提供point更快速的搜尋效能，

SQL2016開始可透過rebuild index方式修改bucket_count，無須drop and create。

查詢使用須知

避免Rang query，適用point query
避免order by 操作，因為hash在記憶體中都不會排序
所有的hash key都必須要存在where 子句中

Note:每個bucket使用 8 bytes

當建立hash index，SQL Server會透過hash function把指定的hash key進行hash，

這過程中有可能會發生hash collision，所以我們應該要盡量使用唯一性高的key(無法完全避免)，

來減少發生hash collision頻率和bucked link list長度。

From Microsoft

假設我建立一個hash index並有 4 個bucket，下面我大概簡單說明這過程

如果這時候我在新增一筆FIFI的資料，

那麼透過hash function就會被分配到bucket=4的address，這時候就會發生hash collision，如下圖

你可以看到，就算該index key唯一性高，但還是會有機率發生hash collision並造成長的bucked link list，

這主要對我們的query of point和batch insert效能影響很大。

Bucked count建議大小=唯一索引鍵值1.5~2倍，因為太大浪費記憶體，太小影響效能，

而SQL Server在建立bucket count是以2的次方來處理，下面我簡單驗證這數字不是我自己唬爛的。

while @maxloop <= 50 
begin
    -- build dynamic string with table name and bucket count having loop counter
    set @mysql = N'

    CREATE TABLE buckedcounttest'+ CONVERT(varchar(10), @maxloop)+'
    (
    ID INT NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = '+CONVERT(varchar(10), @maxloop) +'), --each bucket_count
    EName VARCHAR(20) NOT NULL,
    CName VARCHAR(20) NOT NULL
    ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)'
   
    Exec (@mysql)
  
    set @maxloop = @maxloop+1
end
note:schema_and_data:must have a PK, schema_only:must have at least on index or a PK。

Select  object_name( [object_id]) AS 'Table Name',
        bucket_count AS 'Number of Buckets'
from  sys.hash_indexes
WHERE object_name( [object_id]) LIKE N'buckedcounttest%'
order by bucket_count, [object_id] asc

你可以看到我雖然最大指定50，但SQL Server所建立的bucket_count有效值卻是64，

我們可以透過下面script來驗算這數字

DECLARE @BucketNumber int = 50 ;

DECLARE @BucketsCreatedBySQL int = 0;
-- Check of @BucketNumber is power of 2. if yes, that's the bucket number

IF (LOG(@BucketNumber,2) = ROUND(LOG(@BucketNumber,2),0))    
    SET @BucketsCreatedBySQL = @BucketNumber;
ELSE
-- If not power of 2, round off to next power of 2
    SET @BucketsCreatedBySQL = POWER( 2, floor(LOG(@BucketNumber,2))+1); 
SELECT @BucketsCreatedBySQL [BucketsCreatedBySQL]

--query hash index statistics
SELECT Object_name(his.object_id) AS 'Table Name', 
       idx.name AS 'Index Name', 
       total_bucket_count AS 'total buckets', 
       empty_bucket_count AS 'empty buckets',            
       total_bucket_count - empty_bucket_count AS 'used buckets', 
       avg_chain_length AS 'avg chain length', 
       max_chain_length AS 'max chain length', 
       90000 AS 'Rows - hardcoded value' 
FROM   sys.dm_db_xtp_hash_index_stats as his 
       JOIN sys.indexes as idx 
       ON his.object_id = idx.object_id 
       AND his.index_id = idx.index_id;

參考

Hash Indexes for Memory-Optimized Tables

sys.dm_db_xtp_hash_index_stats (Transact-SQL)

Determining the Correct Bucket Count for Hash Indexes

Table and Row Size in Memory-Optimized Tables

In-Memory

回首頁

RiCo技術農場

後續文章將發佈在https://medium.com/ricos-note

[SQL Server]hash index

RiCo技術農場粉絲頁

標籤雲

系列文章

最新留言