通過 MiniExcel 批次匯入/匯出

最近有機會又要操作 Excel,很久以前就知道 MiniExcel,趁假日有機會來把玩一下,這次的重點是研究批次讀寫的使用方式,看看有沒有機會降低一次載入大量 Excel 所造成的記憶體損耗。除了 Excel 之外,它也支援 Csv 呢。還沒開始之前看一下效能比較圖。

開發環境

定義型別

跟其他的 Excel 套件一樣,這裡也是用 Attribute 定義欄位名稱、寬度...等等

class Data
{
    [ExcelColumnIndex("A")]
    [ExcelColumnName("商品名稱")]
    public string SaleName { get; set; }

    [ExcelColumnIndex("B")]
    [ExcelColumnName("規格")]
    public string Option { get; set; }

    [ExcelColumnIndex("C")]
    [ExcelColumnName("編號")]
    public string Id { get; set; }

    [ExcelColumnIndex("D")]
    [ExcelColumnName("Code")]
    public string Code { get; set; }

    [ExcelColumnIndex("E")]
    [ExcelColumnName("庫存")]
    public string StockQty { get; set; }
}

更多的細節參考
MiniExcel/README.zh-Hant.md at master · mini-software/MiniExcel (github.com)

讀檔

  • 支援強型別集合(IEnumerable<T>)、dynamic 集合
  • 支援延遲載入
  • 讀檔時,下列方法擇一使用
    • var inputRows = await MiniExcel.QueryAsync<Data>(inputPath);
    • var inputRows = await inputStream.QueryAsync<Data>();
[Fact]
public async Task 批次匯入大檔案()
{
    //import large excel file
    var inputPath = "10萬.xlsx";
    var chunkSize = 1024;

    await using var inputStream = File.OpenRead(inputPath);
    var results = new List<Data>();

    // chunk read the data
    var inputRows = await inputStream.QueryAsync<Data>();
    foreach (var chunks in inputRows.Chunk(chunkSize))
    {
        results.AddRange(chunks);
    }
}

官方文件強調

 除非必要請不要使用 ToList 等方法讀取全部資料到記憶體

 

存檔

  • 支援強型別、匿名型別、字典集合、IEnumerable<IDictionary<string, object>>)
  • 存檔時,下列方法擇一
    • await MiniExcel.SaveAsAsync(outputPath, value, overwriteFile: true);
    • await outputStream.SaveAsAsync(value);
public async Task 強型別另存會員表()
{
    var outputPath = "MemberResult.xlsx";

    await using var outputStream = File.Open(outputPath, FileMode.OpenOrCreate, FileAccess.ReadWrite);
    var value = new List<Member>()
    {
        new()
        {
            Id = "1",
            Name = "Alice",
            Birthday = DateTime.Now,
            Age = 25,
            Phone = "1234567890",
            Reason = null,
        },
        new()
        {
            Id = "2",
            Name = "Bob",
            Birthday = DateTime.Now,
            Age = 35,
            Phone = "1234567890",
            Reason = "年齡超過30",
        }
    };

    // await MiniExcel.SaveAsAsync(outputPath, value, overwriteFile: true);
    await outputStream.SaveAsAsync(value);
}

批次匯入並另存範本檔

在 Excel 定義模板 {{變量名稱}}, 或是集合渲染 {{集合名稱.欄位名稱}}

 

定義型別

class Member
{
    [ExcelColumnName(excelColumnName: "編號", aliases: ["MemberId"])]
    public string Id { get; set; }

    [ExcelColumnName(excelColumnName: "姓名", aliases: ["FullName"])]
    public string Name { get; set; }

    [ExcelColumnName(excelColumnName: "生日")]
    public DateTime Birthday { get; set; }

    // [ExcelColumnName(excelColumnName: "年齡", aliases: ["Age"])]
    [DisplayName("年齡")]
    public int Age { get; set; }

    [DisplayName("電話")]

    public string Phone { get; set; }

    [DisplayName("失敗原因")]
    public string Reason { get; set; }
}

 

先用 QueryAsync 讀檔,每批讀兩筆,再用 SaveAsByTemplateAsync 另存新檔

[Fact]
public async Task 批次匯入後批次填充範本()
{
    //import excel file
    var inputPath = "Import.xlsx";
    var outputPath = "MemberResult.xlsx";
    var templatePath = "Template/Member.xlsx";
    var chunkSize = 2;

    await using var inputStream = File.OpenRead(inputPath);
    var inputRows = await inputStream.QueryAsync<Member>();
    var results = new List<Member>();
    foreach (var chunks in inputRows.Chunk(chunkSize))
    {
        foreach (var row in chunks)
        {
            var age = (DateTime.Now - row.Birthday).TotalDays / 365.25;
            if (age > 30)
            {
                row.Reason = "年齡超過30";
            }

            row.Age = (int)age;
        }

        results.AddRange(chunks);
        var value = new
        {
            Members = results
        };

        //Append to the same file
        await MiniExcel.SaveAsByTemplateAsync(outputPath, templatePath, value);
    }
}


每次兩筆兩筆的寫入同一個檔案,最終有 10 筆,這好棒

 

經實驗,當筆數大時,這樣的寫法檔案會產不出來,需要把存檔的動作放在迴圈外面

[Fact]
public async Task 產生大資料填充範本()
{
    //import excel file
    var outputPath = "MemberResult.xlsx";
    var templatePath = "Template/Member.xlsx";
    var chunkSize = 128;
    
    //generate 10000000 member row
    var inputRows = Enumerable.Range(1, 600000).Select(x => new Member
    {
        Id = x.ToString(),
        Name = "Name" + x,
        Birthday = DateTime.Now,
        Phone = "1234567890"
    });
    var results = new List<Member>();
    foreach (var chunks in inputRows.Chunk(chunkSize))
    {
        foreach (var row in chunks)
        {
            var age = (DateTime.Now - row.Birthday).TotalDays / 365.25;
            if (age > 30)
            {
                row.Reason = "年齡超過30";
            }

            row.Age = (int)age;
        }

        results.AddRange(chunks);
    }

    var value = new
    {
        Members = results
    };

    //Append to the same file
    MiniExcel.SaveAsByTemplate(outputPath, templatePath, value);
}

 

SaveAsByTemplate vs SaveAs 比較

比較了一下,填充範本跟直接匯出所需要的時間,兩者的花費的時間差蠻多的,匯出 800000 筆的場景

MiniExcel.SaveAsByTemplate() 約花費 40sec

MiniExcel.SaveAs() 約花費 7sec

結論

最讓我驚豔的就是批次匯入,這是過往我沒有過的使用體驗,往往碰到大一點的檔案,記憶體就被吃掉一大票,現在就看看在真實現場的表現如何

範例位置

sample.dotblog/Excel/Lab.MiniExcelQuery at master · yaochangyu/sample.dotblog (github.com)

若有謬誤,煩請告知,新手發帖請多包涵


Microsoft MVP Award 2010~2017 C# 第四季
Microsoft MVP Award 2018~2022 .NET

Image result for microsoft+mvp+logo