Quick—off the top of your head, how many times have you had to write code to move a bunch of data from an external source into your application database?  Was there a requirement that the process be fast?  Yeah, we both know the answer to that one.

Really, shuffling bits is the core of what we do as developers.  And who doesn’t want the process to be fast and efficient?

For simple use cases where you’d prefer to implement your data loading in C# and you need it to be extremely fast and memory-efficient, the Bulk Writer library will fit the bill nicely.

TL;DR: Bulk load data into your database in a fast, memory-efficient way using the Bulk Writer library.

Functionality in brief

At its core, Headspring.BulkWriter wraps the SqlBulkCopy class and simplifies the interface so that all you really need to do is define a data model, provide a connection string and an IEnumerable, and the BulkWriter takes care of the rest.

The library provides an extremely simple mapping function to map your model to a database table.  This mapping function will honor attributes from the System.ComponentModel.DataAnnotations and System.ComponentModel.DataAnnotations.Schema namespaces, including TableAttribute, NotMappedAttribute, ColumnAttribute, KeyAttribute and MaxLengthAttribute.
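As a rough sketch (the Customer type, table name, and column names here are hypothetical, not part of the library), a model decorated with those attributes might look like this:

using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

[Table("Customers")]              // maps this class to the Customers table
public class Customer
{
    [Key]
    public int Id { get; set; }

    [Column("FullName")]          // property name differs from the column name
    [MaxLength(200)]
    public string Name { get; set; }

    [NotMapped]                   // computed at runtime, never written to the table
    public string DisplayLabel => $"{Id}: {Name}";
}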

Real world example

I’ve downloaded some weather data from NOAA, which I’d like to get into my local database for further analysis and reporting. This data is formatted as a CSV file, and for my particular weather station (Camp Mabry in Austin, TX) I can get Daily Summary data including precipitation, air temperature and wind data all the way back to 1938 (!).

Using BulkWriter and a library called CsvHelper, the code to load from my CSV file into my database is very compact:

First, the database setup script:
USE master;
GO

CREATE DATABASE [BulkWriter.Demo]
GO

USE [BulkWriter.Demo]
GO

CREATE TABLE dbo.WeatherData 
(
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [StationId] [nvarchar](100) NOT NULL,
    [StationName] [nvarchar](100) NOT NULL,
    [ReadingDate] [date] NOT NULL,
    [Precipitation] [decimal](9, 2) NOT NULL,   -- explicit precision; bare decimal defaults to (18, 0) and rounds away fractions
    [MinTemperature] [decimal](9, 2) NOT NULL,
    [MaxTemperature] [decimal](9, 2) NOT NULL,
    [FastestMileWindDirection] [decimal](9, 2) NULL,
    [FastestMileWindSpeed] [decimal](9, 2) NULL,
    [PeakGustWindDirection] [decimal](9, 2) NULL,
    [PeakGustWindSpeed] [decimal](9, 2) NULL,
    CONSTRAINT PK_WeatherData PRIMARY KEY CLUSTERED ([Id] ASC)
)
And the C# code:
using System;
using System.ComponentModel.DataAnnotations.Schema;
using System.Globalization;
using System.IO;
using System.Threading.Tasks;
using CsvHelper;
using CsvHelper.Configuration.Attributes;

namespace BulkWriter.Demo
{
    class Program
    {
        static async Task Main(string[] args)
        {
            var filePath = "CampMabryWeatherData.csv";

            using var reader = new StreamReader(filePath);
            using var csv = new CsvReader(reader, CultureInfo.InvariantCulture);
            using var bulkWriter = new BulkWriter<WeatherData>(@"Data Source=.;Database=BulkWriter.Demo;Trusted_Connection=True;Connection Timeout=300")
            {
                BulkCopyTimeout = 0,
                BatchSize = 10000
            };

            var records = csv.GetRecords<WeatherData>();
            await bulkWriter.WriteToDatabaseAsync(records);
        }

        [Table("WeatherData")]
        public class WeatherData
        {
            [Name("STATION")]
            public string StationId { get; set; }

            [Name("NAME")]
            public string StationName { get; set; }

            [Name("DATE")]
            public DateTime ReadingDate { get; set; }

            [Name("PRCP")]
            public decimal Precipitation { get; set; }

            [Name("TMIN")]
            public decimal MinTemperature { get; set; }

            [Name("TMAX")]
            public decimal MaxTemperature { get; set; }

            [Name("WDFM")]
            public decimal? FastestMileWindDirection { get; set; }

            [Name("WSFM")]
            public decimal? FastestMileWindSpeed { get; set; }

            [Name("WDFG")]
            public decimal? PeakGustWindDirection { get; set; }

            [Name("WSFG")]
            public decimal? PeakGustWindSpeed { get; set; }
        }
    }
}

Note that most of the code actually goes into defining and attributing my model class.  The Name attributes on each property are for CsvHelper, mapping the column names in the CSV file to my properties when reading records; the Table attribute on the WeatherData class is one of the aforementioned attributes from the System.ComponentModel.DataAnnotations.Schema namespace and tells BulkWriter which table to write to.

A note on efficiency

BulkWriter is efficient in two distinct ways, which is what makes it a powerful solution for custom data loading from code:

First, it’s extremely memory efficient.  Since the BulkWriter takes an IEnumerable as input, we can capitalize on the “streaming” nature of enumerables to ensure we use as little memory as possible when passing our records for writing to the database.  CsvHelper is very complementary in this regard as the CsvReader class only reads one record at a time into memory.  Note that in order to see this benefit, your source enumerable must yield values on demand; if you have already loaded a huge list into memory as your source IEnumerable, you won’t see any additional memory savings.
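To make the "yield values on demand" point concrete, here's a minimal sketch of a streaming source (the Reading record, file format, and parsing are placeholders, not part of the library):

using System.Collections.Generic;
using System.IO;

public record Reading(string Station, decimal Value);

public static class StreamingSource
{
    // Yields one record at a time as each line is read, so only the current
    // record is held in memory while the consumer (e.g. BulkWriter) drains it.
    public static IEnumerable<Reading> ReadRecords(string path)
    {
        using var reader = new StreamReader(path);
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            var parts = line.Split(',');
            yield return new Reading(parts[0], decimal.Parse(parts[1]));
        }
    }
}

Passing the result of ReadRecords (or CsvReader.GetRecords) straight to WriteToDatabaseAsync keeps memory usage flat; materializing it first with ToList() would throw that benefit away.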

Second, the heart of BulkWriter, SqlBulkCopy, is highly efficient at batching and loading groups of records all at once. Contrast this with an approach you’ve probably used before (either directly or via your ORM), where an INSERT is issued for each record being loaded to the target table. By batching, you save network round trips for each command and lots of statement processing and log writing for the database server itself.  That all translates into a lot less time to load your data into the database.
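For contrast, here's roughly what that row-by-row pattern looks like in plain ADO.NET (a sketch with placeholder table and column names, reusing the Reading record from the sketch above):

using System.Collections.Generic;
using Microsoft.Data.SqlClient;

public static class RowByRowLoader
{
    // One INSERT, one network round trip, and its own statement processing and
    // log writing per record -- the overhead that batching with SqlBulkCopy avoids.
    public static void Load(string connectionString, IEnumerable<Reading> readings)
    {
        using var connection = new SqlConnection(connectionString);
        connection.Open();

        foreach (var reading in readings)
        {
            using var command = new SqlCommand(
                "INSERT INTO dbo.Readings (Station, Value) VALUES (@station, @value)",
                connection);
            command.Parameters.AddWithValue("@station", reading.Station);
            command.Parameters.AddWithValue("@value", reading.Value);
            command.ExecuteNonQuery();
        }
    }
}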

In summary

Headspring.BulkWriter provides a very simple interface for bulk loading data from an IEnumerable data source into a target table on your database server via the SqlBulkCopy operation.  When used properly, the process will be much faster and more memory efficient than approaches you may have used in the past.
