New LINQ Features (2): Chunk

This post introduces another feature added in .NET 6: Chunk.

In this episode
  • Framework: .NET 6
  • Feature: Chunk

Explanation

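Chunk does one simple thing: it splits a sequence into several chunks of the same size (the last chunk holds whatever remains, so it can be smaller than the others). The method returns an IEnumerable<T[]>.

For example, take the following sequence of thirteen elements: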

 List<Person> people = new List<Person>
     {
         new Person { Name = "John", Age = 21 },
         new Person { Name = "Alex", Age = 34 },
         new Person { Name = "Mary", Age = 29 },
         new Person { Name = "Sophia", Age = 24 },
         new Person { Name = "Michael", Age = 40 },
         new Person { Name = "Emma", Age = 26 },
         new Person { Name = "Daniel", Age = 33 },
         new Person { Name = "Olivia", Age = 28 },
         new Person { Name = "James", Age = 45 },
         new Person { Name = "Isabella", Age = 30 },
         new Person { Name = "Benjamin", Age = 31 },
         new Person { Name = "Mia", Age = 32 },
         new Person { Name = "Lucas", Age = 27 }
     };

To split this List<Person> into groups of three and display the result:

 IEnumerable<Person[]> chunkedPeople = people.Chunk(3);
 int index = 1;
 foreach (var chunk in chunkedPeople)
 {
     Console.WriteLine($"Chunk {index}: {string.Join(", ", chunk.Select(p => p.Name))}");
     index++;
 }

Output:

Chunk 1: John, Alex, Mary
Chunk 2: Sophia, Michael, Emma
Chunk 3: Daniel, Olivia, James
Chunk 4: Isabella, Benjamin, Mia
Chunk 5: Lucas
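
The output above has five chunks because 13 elements in groups of 3 leave one element over. As a minimal sketch (using plain integers instead of Person, and variable names chosen only for illustration), the chunk count can be checked against a ceiling division. The last part also shows that, as far as I know from the documentation, a size below 1 is rejected with an ArgumentOutOfRangeException:

```csharp
using System;
using System.Linq;

// 13 elements, the same count as the people list above.
int[] numbers = Enumerable.Range(1, 13).ToArray();
int size = 3;

int[][] chunks = numbers.Chunk(size).ToArray();

// Number of chunks is ceil(count / size); the integer form avoids floating point.
int expectedChunks = (numbers.Length + size - 1) / size;

Console.WriteLine($"chunks: {chunks.Length}, expected: {expectedChunks}");
Console.WriteLine($"last chunk length: {chunks[^1].Length}");

// A chunk size below 1 is not accepted.
bool threw = false;
try
{
    numbers.Chunk(0).ToArray();
}
catch (ArgumentOutOfRangeException)
{
    threw = true;
}
Console.WriteLine($"Chunk(0) threw: {threw}");
```

Here the last chunk has length 1, matching "Chunk 5: Lucas" in the output above.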

Without Chunk, we would probably have to do something like this:

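 // Hand-rolled equivalent of Chunk. Because this is an iterator method,
 // the argument checks run when enumeration starts, not at the call site.
 static IEnumerable<T[]> CustomChunk<T>(IEnumerable<T> source, int size)
 {
     if (source == null)
     {
         throw new ArgumentNullException(nameof(source));
     }
     // Guard against size < 1: a size of 0 would yield empty arrays forever.
     if (size < 1)
     {
         throw new ArgumentOutOfRangeException(nameof(size));
     }
     // Copies the whole source up front, so the remaining count is always known.
     var queue = new Queue<T>(source);
     while (queue.Count > 0)
     {
         // The last chunk may be shorter than size.
         var chunk = new T[Math.Min(size, queue.Count)];
         for (int i = 0; i < chunk.Length; i++)
         {
             chunk[i] = queue.Dequeue();
         }
         yield return chunk;
     }
 }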
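
As a quick sanity check, the hand-rolled version can be compared with the built-in Chunk chunk by chunk. This is only a sketch: the core loop of CustomChunk is repeated (argument checks omitted for brevity) so the snippet compiles on its own, and the list of sizes is an arbitrary choice for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Core logic of the CustomChunk listing above, repeated so this snippet is self-contained.
static IEnumerable<T[]> CustomChunk<T>(IEnumerable<T> source, int size)
{
    var queue = new Queue<T>(source);
    while (queue.Count > 0)
    {
        var chunk = new T[Math.Min(size, queue.Count)];
        for (int i = 0; i < chunk.Length; i++)
        {
            chunk[i] = queue.Dequeue();
        }
        yield return chunk;
    }
}

List<int> numbers = Enumerable.Range(1, 13).ToList();

bool allMatch = true;
foreach (int chunkSize in new[] { 1, 3, 7, 13, 20 })
{
    List<int[]> builtIn = numbers.Chunk(chunkSize).ToList();
    List<int[]> custom = CustomChunk(numbers, chunkSize).ToList();

    // Same number of chunks, and each pair of chunks has the same elements in order.
    bool same = builtIn.Count == custom.Count
        && builtIn.Zip(custom, (a, b) => a.SequenceEqual(b)).All(equal => equal);

    allMatch &= same;
    Console.WriteLine($"size {chunkSize}: {(same ? "match" : "MISMATCH")}");
}
```

Sizes 13 and 20 cover the cases where the whole list fits into a single chunk.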

The sample code above is here.

Benchmark

While I was at it, I also wrote a benchmark:

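 // BenchmarkDotNet namespaces needed by the attributes and the runner below.
 using BenchmarkDotNet.Attributes;
 using BenchmarkDotNet.Running;

 internal class Program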
 {
     static void Main(string[] args)
     {
         var summary = BenchmarkRunner.Run<ChunkBenchmark>();
     }
 }

 [MemoryDiagnoser]
 public class ChunkBenchmark
 {
     private List<Person> _people;

     [GlobalSetup]
     public void Setup()
     {
         var random = new Random();
         _people = Enumerable.Range(1, 1000).Select(i => new Person
         {
             Name = $"Name_{i}",
             Age = random.Next(10, 81)
         }).ToList();
     }

     [Benchmark]
     [Arguments(3)]
     [Arguments(7)]
     [Arguments(43)]
     public void CallChunk(int size)
     {
         var result = _people.Chunk(size).ToList();
     }

     [Benchmark]
     [Arguments(3)]
     [Arguments(7)]
     [Arguments(43)]
     public void CallCustomChunk(int size)
     {
         var result = CustomChunk(_people, size).ToList();
     }

     static IEnumerable<T[]> CustomChunk<T>(IEnumerable<T> source, int size)
     {
         if (source == null)
         {
             throw new ArgumentNullException(nameof(source));
         }
         var queue = new Queue<T>(source);
         while (queue.Count > 0)
         {
             var chunk = new T[Math.Min(size, queue.Count)];
             for (int i = 0; i < chunk.Length; i++)
             {
                 chunk[i] = queue.Dequeue();
             }
             yield return chunk;
         }
     }
 }

 public class Person
 {
     public string Name { get; set; }
     public int Age { get; set; }
 }

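The result is a little surprising: my hand-rolled CustomChunk is faster (roughly 1.6 to 2 times faster across the three sizes), but it allocates more memory (about 7 to 8 KB more in each case). The extra allocation is roughly what copying 1,000 references into a Queue<T> costs on x64, so the up-front copy is a likely contributor. As for the speed difference, perhaps my code does not cover everything the built-in method has to handle, or perhaps it is because Microsoft's implementation uses Array.Resize.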
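
One behavioral difference that the timings do not show: Chunk is lazily evaluated and reads the source only as far as the consumer asks, while CustomChunk copies the entire source into a Queue<T> before yielding anything. The sketch below illustrates this with an endless sequence; Naturals is a hypothetical helper written only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: an endless sequence 0, 1, 2, ...
static IEnumerable<int> Naturals()
{
    int n = 0;
    while (true)
    {
        yield return n++;
    }
}

// Chunk pulls only as many elements as the consumer requests,
// so taking two chunks from an endless source terminates.
List<int[]> firstTwo = Naturals().Chunk(3).Take(2).ToList();

foreach (int[] chunk in firstTwo)
{
    Console.WriteLine(string.Join(", ", chunk));
}

// A Queue<T>-based version such as CustomChunk would try to copy the whole
// source first, so it must not be called on an endless sequence.
```

This does not explain the benchmark numbers by itself, but it is worth keeping in mind when choosing between the two for large or streamed inputs.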

// * Summary *

BenchmarkDotNet v0.14.0, Windows 11 (10.0.22631.4751/23H2/2023Update/SunValley3)
12th Gen Intel Core i7-1265U, 1 CPU, 12 logical and 10 physical cores
.NET SDK 9.0.200-preview.0.25057.12
  [Host]     : .NET 9.0.1 (9.0.124.61010), X64 RyuJIT AVX2
  DefaultJob : .NET 9.0.1 (9.0.124.61010), X64 RyuJIT AVX2


| Method          | size | Mean      | Error     | StdDev    | Gen0   | Gen1   | Allocated |
|---------------- |----- |----------:|----------:|----------:|-------:|-------:|----------:|
| CallChunk       | 3    | 10.981 us | 0.1972 us | 0.2192 us | 3.9063 | 0.2899 |  23.98 KB |
| CallCustomChunk | 3    |  6.380 us | 0.1259 us | 0.1177 us | 5.1804 | 0.5112 |  31.77 KB |
| CallChunk       | 7    |  8.072 us | 0.0886 us | 0.0829 us | 2.5330 | 0.1221 |  15.57 KB |
| CallCustomChunk | 7    |  4.426 us | 0.0506 us | 0.0448 us | 3.7994 | 0.3128 |  23.27 KB |
| CallChunk       | 43   |  6.759 us | 0.0616 us | 0.0515 us | 1.6251 | 0.0534 |     10 KB |
| CallCustomChunk | 43   |  3.430 us | 0.0288 us | 0.0256 us | 2.7618 | 0.1717 |  16.91 KB |


Second run, same environment as above:


| Method          | size | Mean      | Error     | StdDev    | Gen0   | Gen1   | Allocated |
|---------------- |----- |----------:|----------:|----------:|-------:|-------:|----------:|
| CallChunk       | 3    | 10.532 us | 0.1145 us | 0.1071 us | 3.9063 | 0.2899 |  23.98 KB |
| CallCustomChunk | 3    |  6.557 us | 0.1219 us | 0.1140 us | 5.1804 | 0.5112 |  31.77 KB |
| CallChunk       | 7    |  8.208 us | 0.1490 us | 0.2363 us | 2.5330 | 0.1221 |  15.57 KB |
| CallCustomChunk | 7    |  4.501 us | 0.0877 us | 0.1009 us | 3.7994 | 0.3128 |  23.27 KB |
| CallChunk       | 43   |  6.835 us | 0.1349 us | 0.1385 us | 1.6251 | 0.0534 |     10 KB |
| CallCustomChunk | 43   |  3.436 us | 0.0654 us | 0.0778 us | 2.7618 | 0.1717 |  16.91 KB |

The benchmark code is here.