Biggy Available as Pre-Release on Nuget

Posted on December 22 2014 10:41 AM by jatten in Biggy, C#, Database   ||   Comments (0)

Biggy is a very fast, synchronized, in-memory document/relational query tool for .NET.

If you have been following the Biggy project over the past year, you are aware that there has been a rapid evolution of ideas, changes in structure, and changes in stewardship.

Biggy was originally an experiment by Rob Conery, a man who likes to challenge convention, break rules, and in general stir things up in a way which causes folks to re-think some closely held conventions. All of the above are why I follow Rob's blog, Twit-stream, and Github repo.

Image by fortfan  |  Some Rights Reserved

Plus, he's just a damn fine fellow.

Rob recently withdrew from active development on Biggy, and I am now doing my best to shepherd this interesting project. We'll see how this goes. My goal is to try to maintain the spirit of the original project, and find some happy medium between what I think Rob would have done, what I would like to see, and what the community wants (if anything) from this unique data access and query tool. I'll have more to say on that in another post, but I had a fantastic time working on this with Rob, and as you might imagine, I learned a lot.

On that note, I am pleased to announce that there is now a pre-release version of Biggy available on Nuget.

Getting to Know Biggy

Biggy works by loading all of the data in your store (be it a flat JSON file or a relational database) into memory, which makes querying super fast. Also, because your data is deserialized into POCO objects behind an ICollection<T> interface, it is all queryable using LINQ.

This works with relational data in the way you might expect, but it really shines when you work with JSON documents. Complex, nested JSON documents are now queryable at blazing speed.

In its current form, Biggy is comprised of a core package, and data extension packages specific to each supported backing store. Biggy directly supports simple file-based JSON as a store, as well as SQLite and Postgresql relational databases. However, Biggy is designed to be extensible, so alternative stores can be supported by implementing the IDataStore interface found in Biggy.Core.

The basic Nuget packages are Biggy.Core, plus the store-specific packages Biggy.Data.Json, Biggy.Data.Sqlite, and Biggy.Data.Postgres.

The store-specific packages all take a dependency on Biggy.Core, and it is not necessary to pull down Biggy.Core as a separate package, unless you intend to implement your own store against the interfaces and base classes therein.

Inject Stores into Lists

At present, the underpinning of how Biggy works is based on the idea of synchronizing an in-memory list with a backing store. An implementation of IDataStore<T> knows how to talk to a database of some sort (even if it may be a flat JSON file). The BiggyList<T>, an in-memory implementation of ICollection<T>, doesn't know or care what specific store it is working with. It just knows how to implement the ICollection interface (plus a few additional methods) and present a LINQ-queryable API to the world.

The relationship between the two is achieved by injecting an instance of IDataStore<T> into an instance of BiggyList<T> upon instantiation. There are a number of ways to do this, depending upon the needs of your application.

A Simple How-To: File-Based JSON Data

We'll take a quick look at the very basics of using Biggy using a Visual Studio Console Application. To get started, you'll need to create a new Console Application, and pull down the Biggy.Data.Json package:

Get the Biggy.Data.Json Package from Nuget:
PM> Install-Package Biggy.Data.Json

 

Now we can work against flat-file JSON data.

The IDataStore<T> interface offers a fairly simple CRUD interface to the world. We can work directly with an implementation of IDataStore<T> to Add, Update, and Delete records, and to read the entirety of the store into an IEnumerable<T> for use in our code.
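
Conceptually, the contract looks something like the sketch below. The member names here are illustrative only - see Biggy.Core for the actual interface definition:

Conceptual Sketch of the IDataStore<T> Contract:
// Illustrative sketch - not the actual Biggy.Core definitions:
public interface IDataStore<T>
{
    // Read the entirety of the backing store into memory:
    IEnumerable<T> ReadAll();
 
    // Basic CRUD operations against the backing store:
    void Add(T item);
    void Update(T item);
    void Delete(T item);
}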

Since we have started with the JSON implementation, let's take a look at how the JsonStore<T> works on its own.

We will need to add the following namespaces to the usings at the top of our code file:

Add Biggy References to the Program.cs Code File:
using Biggy.Core;
using Biggy.Extensions;
using Biggy.Data.Json;

 

Then, we might do the following:

Add a Document to the Json Store:
public class ArtistDocument
{
    public ArtistDocument()
    {
        this.Albums = new List<AlbumDocument>();
    }
 
    [PrimaryKey(Auto: false)]
    public int ArtistDocumentId { get; set; }
    public string Name { get; set; }
    public List<AlbumDocument> Albums { get; set; }
}
 
 
public partial class AlbumDocument
{
    public AlbumDocument()
    {
        this.Tracks = new List<Track>();
    }
 
    public int AlbumId { get; set; }
    public string Title { get; set; }
    public int ArtistId { get; set; }
    public virtual List<Track> Tracks { get; set; }
}
 
 
// The Track class referenced above - properties inferred from the JSON output shown below:
public class Track
{
    public int TrackId { get; set; }
    public int AlbumId { get; set; }
    public string Name { get; set; }
}
 
 
class Program
{
    static void Main(string[] args)
    {
        var jsonArtistStore = new JsonStore<ArtistDocument>();
        var newArtist = new ArtistDocument
        {
            ArtistDocumentId = 1,
            Name = "Nirvana"
        };
        newArtist.Albums.Add(new AlbumDocument
        {
            AlbumId = 1,
            ArtistId = 1,
            Title = "Bleach"
        });
        newArtist.Albums.Add(new AlbumDocument
        {
            AlbumId = 2,
            ArtistId = 1,
            Title = "Incesticide"
        });
        jsonArtistStore.Add(newArtist);
    }
}

 

If we run this code, the following happens behind the scenes:

  • By default, a new directory is created in our project root named ~\Data\Json\BiggyDemo1
  • A file is created in that directory, named artistdocuments.json
  • The new artist data is persisted as raw JSON in the artistdocuments.json file

If we crack open the file, we find:

Contents of Artist Documents JSON File:
[
  {
    "Albums": [
      { "AlbumId": 1, "Title": "Bleach", "ArtistId": 1, "Tracks": [ ] },
      { "AlbumId": 2, "Title": "Incesticide", "ArtistId": 1, "Tracks": [ ] }
    ],
    "ArtistDocumentId": 1,
    "Name": "Nirvana"
  }
]

 

Yup. The POCO objects we used in our .NET code have been serialized to JSON and saved in the file.

We can use the JsonStore<T> this way to our heart's content if we like, but that's not really the point of Biggy. Instead, let's use our store in conjunction with a BiggyList<T> instance, and get some real work done.

Inject IDataStore into BiggyList

Biggy is intended to present an in-memory representation of the data in your store, so you can query away using LINQ, make additions, updates, and deletions, and the data in memory will remain in sync with your backing store.

Let's expand on what we did above, and inject a store into a BiggyList<T>. In this case, we will use some data from the Chinook Database, which I have used to create a full set of JSON artist documents, including albums and tracks for each artist. There are 275 artists in the JSON data set, with even more albums (nested under each artist), and several thousand tracks (nested under the appropriate albums).

For example, the JSON data for a single artist in the modified Chinook sample data looks like this:

Single Sample JSON Artist Document Record Using Modified Chinook Data:
[
  {
    "Albums": [
      {
        "AlbumId": 1,
        "Title": "For Those About To Rock We Salute You",
        "ArtistId": 1,
        "Tracks": [
          { "TrackId": 1, "AlbumId": 1, "Name": "For Those About To Rock (We Salute You)" },
          { "TrackId": 6, "AlbumId": 1, "Name": "Put The Finger On You" },
          { "TrackId": 7, "AlbumId": 1, "Name": "Let's Get It Up" },
          { "TrackId": 8, "AlbumId": 1, "Name": "Inject The Venom" },
          { "TrackId": 9, "AlbumId": 1, "Name": "Snowballed" },
          { "TrackId": 10, "AlbumId": 1, "Name": "Evil Walks" },
          { "TrackId": 11, "AlbumId": 1, "Name": "C.O.D." },
          { "TrackId": 12, "AlbumId": 1, "Name": "Breaking The Rules" },
          { "TrackId": 13, "AlbumId": 1, "Name": "Night Of The Long Knives" },
          { "TrackId": 14, "AlbumId": 1, "Name": "Spellbound" }
        ]
      },
      {
        "AlbumId": 4,
        "Title": "Let There Be Rock",
        "ArtistId": 1,
        "Tracks": [
          { "TrackId": 15, "AlbumId": 4, "Name": "Go Down" },
          { "TrackId": 16, "AlbumId": 4, "Name": "Dog Eat Dog" },
          { "TrackId": 17, "AlbumId": 4, "Name": "Let There Be Rock" },
          { "TrackId": 18, "AlbumId": 4, "Name": "Bad Boy Boogie" },
          { "TrackId": 19, "AlbumId": 4, "Name": "Problem Child" },
          { "TrackId": 20, "AlbumId": 4, "Name": "Overdose" },
          { "TrackId": 21, "AlbumId": 4, "Name": "Hell Ain't A Bad Place To Be" },
          { "TrackId": 22, "AlbumId": 4, "Name": "Whole Lotta Rosie" }
        ]
      }
    ],
    "ArtistDocumentId": 1,
    "Name": "AC/DC"
  }
]

 

I went ahead and overwrote artistdocuments.json in our Data\Json directory with this Chinook file. We could change our Main() method to look like this:

Query Chinook Document Data:
static void Main(string[] args)
{
    var artistDocumentStore = new JsonStore<ArtistDocument>();
    var artistDocuments = new BiggyList<ArtistDocument>(artistDocumentStore);
 
    // Select artists with names beginning with 'M' and write to the console, 
    // along with a count of albums for each:
    var selected = from a in artistDocuments where a.Name.StartsWith("M") select a;
 
    foreach(var artistDoc in selected)
    {
        Console.WriteLine("{0}: {1} Albums", artistDoc.Name, artistDoc.Albums.Count);
    }
    Console.Read();
}

 

Notice how we simply initialized a store, injected it into the BiggyList<ArtistDocument>, and presto, our document data was loaded and queryable using LINQ?

Again, by default, Biggy is going to look in the ~\Data\Json directory for a folder with the current project name, then look for a file with a name matching the POCO class represented by T. If one is found, the data is loaded. If not, a new file is created the first time data is added.

In this example, a file named artistdocuments.json already exists in the default directory, so the Chinook document data is loaded up during initialization, and is ready for querying.

Output from the above would resemble the following:

Console Output from LINQ Query Against Chinook Artist Documents:


Or, we might want to query up a specific artist and examine the albums and tracks on file for that artist. If we change our Main() method again:

Query a Specific Artist and Output the Albums and Tracks to the Console:
static void Main(string[] args)
{
    var artistDocumentStore = new JsonStore<ArtistDocument>();
    var artistDocuments = new BiggyList<ArtistDocument>(artistDocumentStore);
 
    // Select a single artist, and list the albums/tracks for that artist:
    var selected = artistDocuments.FirstOrDefault(a => a.Name == "Metallica");
    Console.WriteLine("Albums by {0}:", selected.Name);
    foreach(var albumDoc in selected.Albums)
    {
        Console.WriteLine(albumDoc.Title);
        foreach(var track in albumDoc.Tracks)
        {
            Console.WriteLine("\t{0}", track.Name);
        }
    }
    Console.Read();
}

 

Output here is as expected, and makes clear that the maintainers of Chinook Database are pretty big Metallica fans.

Biggy and Relational Database Stores

There are limitations to using a flat-file JSON store. For one, concurrency can become an issue. When your application's needs call for a more robust persistence mechanism, you can use a relational database, both in the conventional sense, and for persisting document data.

SQLite is a nice, simple solution when a file-based relational data store is a good fit. SQLite requires no configuration or administration, and is easily added to your project. Also, like Postgres, SQLite is free, cross-platform, and open-source. In terms of scaling up, SQLite makes a logical next step from flat-file JSON as your application grows.

Postgresql is also free, cross-platform, and open-source, and is our default, large-scale client-server database of choice.

We've designed Biggy to minimize the pain associated with moving between different backing stores. While file-based JSON stores, SQLite, and Postgresql all have different capabilities, advantages, and disadvantages, the IDataStore interface and the BiggyList don't care. Concrete implementations of IDataStore can capitalize on the various strengths of each storage format, but you can also pass them into an existing BiggyList and everything should "just work."
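
For example, moving document data from a flat JSON file to a SQLite-backed document store is mostly a matter of swapping the store injected into the list, using the constructors demonstrated elsewhere in this post:

Swap the Backing Store Without Changing the List Code:
// A file-based JSON store:
var jsonStore = new JsonStore<ArtistDocument>();
var artistDocuments = new BiggyList<ArtistDocument>(jsonStore);
 
// Or a SQLite-based document store - the code consuming the list is unchanged:
var sqliteStore = new SqliteDocumentStore<ArtistDocument>("Chinook");
var artistDocumentsFromSqlite = new BiggyList<ArtistDocument>(sqliteStore);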

RelationalStore and DocumentStore

When working with relational data in the traditional sense, use a concrete implementation of RelationalStore<T>. In this case, Biggy will expect to find a table and schema which match the POCO class T specified as a type argument (or appropriately mapped using some extensions found in Biggy.Core).

When working with document data, even when stored in a relational database, Biggy will serialize/de-serialize the POCO class T to and from JSON, nesting any child collections or contained objects.

Also, Biggy will create Document tables on the fly, as any document table used by Biggy will have the same schema: id, body, and created_at.
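
In other words, if you were to script an equivalent document table by hand, it would look roughly like the sketch below (a sketch only - the exact DDL Biggy emits may differ, and in SQLite the body column is plain text rather than a JSON type):

Rough Shape of a Biggy Document Table (Postgres Flavor):
CREATE TABLE artistdocuments
(
    id integer PRIMARY KEY NOT NULL,
    body json NOT NULL,
    created_at timestamp NOT NULL
);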

Document Storage in a Relational Data Store

Once again, one of the primary use-cases for Biggy is to work with JSON-format document data. Postgres, because it is awesome, implements its own JSON data type (and now, with the release of version 9.4, additionally supports binary JSON via the JSONB type). Persisting documents in Postgres takes full advantage of the JSON data type.

For SQLite (or any other concrete implementation of IDataStore<T> you choose to make), JSON is persisted as simple string data.

For working with Relational data stores, the Chinook Database once again provides a handy, ready-to-use data set for both SQLite and Postgresql. We will use Chinook in the following examples.

Using Biggy with SQLite

To use Biggy with a SQLite backing store, just pull Biggy.Data.Sqlite down from Nuget:

Use Nuget Package Manager Console to get Biggy.Data.Sqlite:
PM> Install-Package Biggy.Data.Sqlite

 

If we drop the Chinook Database file into our project Data directory (if needed, change the file extension to .db), we can get right to work querying and using the relational data present in Chinook out of the box.

When using SQLite with Biggy, the primary constructor argument will at a minimum specify the name of the database file to look for. If no other arguments are provided, Biggy will look in the ~\Data directory at the root of our project directory, and try to match the string value with a file by the same name, with a .db extension.

If no database with that name is found, Biggy will create one.

If we change the using statement to Biggy.Data.Sqlite instead of Biggy.Data.Json, we can do the following, working with some slightly different POCO models, and the basic relational tables in the Chinook Database just as they are. Notice here, we specify SqliteRelationalStore<T>, and not SqliteDocumentStore<T>, because we will be working with a relational data set.

Note: Biggy will create DocumentStore<T> tables on the fly. However, Biggy cannot, at present, create standard relational tables - to work against relational data, the tables need to already exist.

Connect BiggyList<T> to a SQLite Database:
public partial class Artist
{
    public int ArtistId { get; set; }
    public string Name { get; set; }
}
 
public partial class Album
{
    public int AlbumId { get; set; }
    public string Title { get; set; }
    public int ArtistId { get; set; }
}
 
 
static void Main(string[] args)
{
    // Pass the name of the database file as a constructor argument:
    var artistStore = new SqliteRelationalStore<Artist>("Chinook");
    var albumStore = new SqliteRelationalStore<Album>("Chinook");
 
    // Pass the store into the list:
    var artists = new BiggyList<Artist>(artistStore);
    var albums = new BiggyList<Album>(albumStore);
 
    var someArtist = artists.FirstOrDefault(a => a.Name == "AC/DC");
    var artistAlbums = albums.Where(a => a.ArtistId == someArtist.ArtistId);
 
    Console.WriteLine("Albums by {0}:", someArtist.Name);
    foreach(var album in artistAlbums)
    {
        Console.WriteLine(album.Title);
    }
}

 

Creating a Document Store Using SQLite

We can use SQLite to persist documents as well as work with traditional relational data, simply by using SqliteDocumentStore<T>, similar to the way we did with the JSON store. The following code will create a new table in the Chinook database we are using for our back-end, and add a single artist document record:

Use SQLite to Create a Document Store in the Chinook Database:
var artistDocumentStore = new SqliteDocumentStore<ArtistDocument>("Chinook");
var artistDocuments = new BiggyList<ArtistDocument>(artistDocumentStore);
 
var newArtist = new ArtistDocument
{
    ArtistDocumentId = 1,
    Name = "Nirvana"
};
 
newArtist.Albums.Add(new AlbumDocument
{
    AlbumId = 1,
    ArtistId = 1,
    Title = "Bleach"
});
 
newArtist.Albums.Add(new AlbumDocument
{
    AlbumId = 2,
    ArtistId = 1,
    Title = "Incesticide"
});
 
artistDocuments.Add(newArtist);

 

In this case, the artistdocuments table has three simple fields: id, body, and created_at. The id will be the same as the primary key for each artist object. The body field contains the JSON document itself, and the created_at field is simply a date-time stamp.

We could, of course, decide we want to pull all of the artist/album/track data together, and compose it all into artist documents similar to the JSON file I used in the previous section, and then push it out into a new SQLite database.

Aggregate Artists, Albums, and Tracks into a Document Store Using SQLite:
// Pass the name of the database file as a constructor argument:
var artistStore = new SqliteRelationalStore<Artist>("Chinook");
var albumStore = new SqliteRelationalStore<Album>("Chinook");
var trackStore = new SqliteRelationalStore<Track>("Chinook");
 
// Pass the store into the list:
var artists = new BiggyList<Artist>(artistStore);
var albums = new BiggyList<Album>(albumStore);
var tracks = new BiggyList<Track>(trackStore);
 
// Use a list, and do a bulk add when all the artist documents have been created:
var newArtistDocs = new List<ArtistDocument>();
foreach (var artist in artists)
{
    // Copy the artist's fields (including Name) onto the new document:
    var artistDoc = new ArtistDocument
    {
        ArtistDocumentId = artist.ArtistId,
        Name = artist.Name
    };
    var artistAlbums = albums.Where(a => a.ArtistId == artist.ArtistId);
 
    foreach (var album in artistAlbums)
    {
        // Copy the album's fields onto the album document as well:
        var albumDoc = new AlbumDocument
        {
            AlbumId = album.AlbumId,
            Title = album.Title,
            ArtistId = album.ArtistId
        };
        var albumTracks = tracks.Where(t => t.AlbumId == album.AlbumId);
        albumDoc.Tracks.AddRange(albumTracks);
        artistDoc.Albums.Add(albumDoc);
    }
    newArtistDocs.Add(artistDoc);
}
 
// Now let's new up a NEW SQLite-based database:
var artistDocumentStore = new SqliteDocumentStore<ArtistDocument>("ChinookDocuments");
var artistDocumentsList = new BiggyList<ArtistDocument>(artistDocumentStore);
 
// Push our new nested artist documents into the new database:
artistDocumentsList.Add(newArtistDocs);

 

That's a large mess of code, but you get the idea. In the above, we materialized some relational data, composed it into document form, and persisted it into a brand-new SQLite database (this time named "ChinookDocuments.db" and again, in our ~\Data\ directory) as JSON string values in a table named "artistdocuments".

Using Biggy with Postgres

Unlike simple JSON files and SQLite, Postgresql is a full-fledged client-server database. This means we need to work with real connection strings, connect to a real database, and that Biggy cannot create a database for us.

However, Postgres is an awesome database, and if you are not familiar, I recommend you go check it out.

With the Json store, and the SQLite store, we were able to provide some minimal initialization arguments to the concrete implementation of IDataStore<T> to get up and moving. With Postgres, we still don't need much, but we do need to specify a connection string in our App.config or Web.config file.

You can pull down the Biggy.Data.Postgres package from Nuget using the Package Manager Console:

Get Biggy.Data.Postgres from Nuget:
PM> Install-Package Biggy.Data.Postgres

 

Once again, we can get started by pulling down the Chinook Database for Postgres, running the CREATE script, and voilà.

Specifics may vary, but your App.config should look something like this:

Example App.config File with Postgres Connection:
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <startup>
    <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.0" />
  </startup>
  <connectionStrings>
    <add name="chinook" connectionString="server=localhost;user id=biggy;password=password;database=chinook" />
  </connectionStrings>
</configuration>

 

Note in the above, our connection string is named "chinook" and we are using a local PG instance.

With that, we can once again start with some code that should be looking familiar by now:

Query Chinook Database Using Biggy with Postgres:
// Pass the name of the connection string as a constructor argument:
var artistStore = new PgRelationalStore<Artist>("chinook");
var albumStore = new PgRelationalStore<Album>("chinook");
 
// Pass the store into the list:
var artists = new BiggyList<Artist>(artistStore);
var albums = new BiggyList<Album>(albumStore);
 
var someArtist = artists.FirstOrDefault(a => a.Name == "AC/DC");
var artistAlbums = albums.Where(a => a.ArtistId == someArtist.ArtistId);
 
Console.WriteLine("Albums by {0}:", someArtist.Name);
foreach (var album in artistAlbums)
{
    Console.WriteLine(album.Title);
}
Console.Read();

 

Here again, we are able to directly query the relational data loaded into the BiggyList<T> from Chinook Database. Biggy looks in the App.config file for a connection string matching the one provided as a constructor argument, and then does its thing.

From here, working with document data in Postgres is much the same as we saw from SQLite. Remember though, that Biggy cannot create a new database on the fly with Postgres as we did using SQLite.

What Next for Biggy?

As mentioned, the packages available on Nuget at the moment are definitely "pre-release" in that we expect to be making changes. I don't feel the product is all the way there yet in terms of its front-side API and usefulness.

At this point, I would love to hear some feedback from those who have taken some time to play with it. What would make it more useful? Is there a critical feature missing that would also be widely used (as opposed to special-case features)?

My interest now is to arrive at a stable, minimal feature set, and a maximally useful API. I'm less interested in adding additional store options or features, and more in refining what is already there.

Please do check it out, and please do open an issue on the Github Repo for any bugs, suggestions, or other comments you may have.

Want to Contribute?

Again, please do! I will happily accept Pull Requests for bug fixes and features, so long as they fit in the general scheme of things. A word of caution, though: creeping feature-itis was one of the problems we had with the "version 1" repo of Biggy. We really want to make sure any new feature is needed, and I really would like to keep things as simple as possible until we have a ready-for-prime-time release version.

What's with the Strange Code Formatting in the Repo?

The code in the repo is very much not in idiomatic C# format. We are using two spaces for indentation, and same-line braces. This is something I picked up from Rob, who I think picked it up during his years in the Ruby/JS wilderness. However, after struggling with it for a bit, I came to really like the compact style.

For the moment, I plan to maintain the repo using this format. However, any PR's you send my way do not need to match - I'll fix them.

At some point, I may cave and revert back to idiomatic C#. But I challenge you to check it out, try it out, and see if you don't find yourself thinking most C# code looks a little spread out after a while…

Where can I Find More Info and Better Docs?

A more fully developed documentation page is coming. There is a lot more to Biggy than you saw here - this was a quick and dirty intro, since the code has changed so much in the past year. There are better ways to do almost everything I discussed in this article, but the examples here were kept purposefully simple.

We are also gratefully accepting contributions to documentation :-)


C#: Avoiding Performance Issues with Inserts in SQLite

Posted on December 15 2014 08:49 PM by jatten in SQLite, Database, C#, CodeProject   ||   Comments (0)


If you are new to SQLite, you may well run across one of the most confounding of its implementation details the moment you attempt to do some sort of bulk or batch processing of inserts or updates.

What you will discover is that, unless properly handled, inserting or updating multiple records in a SQLite database can seem abysmally slow. Slow to the point of unsuitability in certain cases.

Not to fear, this has to do with some default (and not entirely improper) design choices in SQLite, for which there is an easy workaround.

Image by Lance McCord  |  Some Rights Reserved

SQLite is a wonderfully simple to use, cross-platform/open source database with terrific performance specs. It is a mature product, and, if we are to believe the estimates of SQLite.org, is the most widely deployed SQL database in the world.

SQLite manages to cram a host of mature, well-developed features into a compact and well-documented package, including full transaction support.

This transaction support, and the way it is implemented, has a significant impact on certain performance characteristics of SQLite.

Transactions by Default in SQLite

As stated previously, one of the selling points of SQLite, despite it being a simple, file-based database, is that it is fully transactional. What does this mean?

From Wikipedia:

A transaction comprises a unit of work performed within a database management system (or similar system) against a database, and treated in a coherent and reliable way independent of other transactions. Transactions in a database environment have two main purposes:

  1. To provide reliable units of work that allow correct recovery from failures and keep a database consistent even in cases of system failure, when execution stops (completely or partially) and many operations upon a database remain uncompleted, with unclear status.
  2. To provide isolation between programs accessing a database concurrently. If this isolation is not provided, the programs' outcomes are possibly erroneous.

A database transaction, by definition, must be atomic, consistent, isolated and durable.[1] Database practitioners often refer to these properties of database transactions using the acronym ACID.

Transactions provide an "all-or-nothing" proposition, stating that each work-unit performed in a database must either complete in its entirety or have no effect whatsoever. Further, the system must isolate each transaction from other transactions, results must conform to existing constraints in the database, and transactions that complete successfully must get written to durable storage.

SQLite is not alone, of course, in implementing transactions - in fact, transactions are a core concept in database design. However, SQLite's implementation is such that, unless otherwise specified, each individual write action against your database (any action through which you modify a record) is treated as an individual transaction.

In other words, if you perform multiple INSERTs (or UPDATEs, or DELETEs) in a "batch," each INSERT will be treated as a separate transaction by SQLite.

The trouble is, transactions carry processing overhead. When we decide we need to perform multiple INSERTs in a batch, we can run into some troubling performance bottlenecks.

Batch Processing in SQLite - A Console Example

If we are using SQLite from the SQLite Console, we can see exactly what I am talking about by running an easy insert script, and seeing how things go. For this example, I borrowed a few lines from the Chinook Database to create and populate a table of Artists. If you don't have the SQLite Command Line Console on your machine, install it now (see Installing and Using SQLite on Windows for details). Then copy the SQL script from my Gist on Github, paste it into a text file, and save the file in your user folder as create-insert-artists.sql.

The script should look like this in the text file before you save:

Paste the SQL Script Into a Text File and Save:
DROP TABLE IF EXISTS [Artist];
 
CREATE TABLE [Artist]
(
    [ArtistId] INTEGER  NOT NULL,
    [Name] NVARCHAR(120),
    CONSTRAINT [PK_Artist] PRIMARY KEY  ([ArtistId])
);
 
INSERT INTO [Artist] ([ArtistId], [Name]) VALUES (1, 'AC/DC');
INSERT INTO [Artist] ([ArtistId], [Name]) VALUES (2, 'Accept');
INSERT INTO [Artist] ([ArtistId], [Name]) VALUES (3, 'Aerosmith');
INSERT INTO [Artist] ([ArtistId], [Name]) VALUES (4, 'Alanis Morissette');
 
-- . . . A bunch more artist records . . . 
 
INSERT INTO [Artist] ([ArtistId], [Name]) VALUES (273, 'C. Monteverdi, Nigel Rogers');
INSERT INTO [Artist] ([ArtistId], [Name]) VALUES (274, 'Nash Ensemble');
INSERT INTO [Artist] ([ArtistId], [Name]) VALUES (275, 'Philip Glass Ensemble');

 

If we open a new database in the SQLite Console (navigate to your User folder to do this for our purposes here) and read the script, we can see how long it takes. There are 275 Artist records in the script to be INSERTED.

Run SQLite3, Open a New Database, and Read the Artists Script:
Microsoft Windows [Version 6.3.9600]
(c) 2013 Microsoft Corporation. All rights reserved.
C:\Users\John>sqlite3
SQLite version 3.8.7.3 2014-12-05 22:29:24
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .open txdemo.db
sqlite> .read create-insert-artists.sql

 

We can see that (depending on your machine - your mileage may vary) executing the script takes roughly 10 seconds. Inserting 275 records should NOT take 10 seconds. Ever.

Console Output from Running Script (Took Way Too Long!):


As mentioned previously, unless we tell it otherwise, SQLite will treat each of those INSERT commands as an individual transaction, which slows things WAAAYYYY DOOOOWWWWN. We can do better. We tell SQLite to override this behavior by explicitly specifying our own transaction, beginning before the INSERT batch, and committing after each INSERT batch.

Add Transactions to SQLite Scripts Using BEGIN and COMMIT

When we are executing batches of INSERTs, UPDATEs, or DELETEs in a script, wrap all the writes against each table up in a transaction using the BEGIN and COMMIT SQLite keywords. Modify the create-insert-artists.sql script in our text file by adding a BEGIN before the table INSERTs, and a COMMIT after the table INSERTs (for scripts involving more than one table, do this for the INSERTs for each table):

Modified Script Wraps INSERTs in single transaction:
DROP TABLE IF EXISTS [Artist];
 
CREATE TABLE [Artist]
(
    [ArtistId] INTEGER  NOT NULL,
    [Name] NVARCHAR(120),
    CONSTRAINT [PK_Artist] PRIMARY KEY  ([ArtistId])
);
 
BEGIN;
INSERT INTO [Artist] ([ArtistId], [Name]) VALUES (1, 'AC/DC');
INSERT INTO [Artist] ([ArtistId], [Name]) VALUES (2, 'Accept');
INSERT INTO [Artist] ([ArtistId], [Name]) VALUES (3, 'Aerosmith');
INSERT INTO [Artist] ([ArtistId], [Name]) VALUES (4, 'Alanis Morissette');
 
-- . . . A bunch more artist records . . . 
 
INSERT INTO [Artist] ([ArtistId], [Name]) VALUES (273, 'C. Monteverdi, Nigel Rogers');
INSERT INTO [Artist] ([ArtistId], [Name]) VALUES (274, 'Nash Ensemble');
INSERT INTO [Artist] ([ArtistId], [Name]) VALUES (275, 'Philip Glass Ensemble');
COMMIT;

 

If we re-run our script now, we see a significant performance boost. In fact, the script execution is nearly immediate.

Re-Run the Script in the SQLite Console (this time, with a Transaction):


The above will apply to all INSERTs, UPDATEs, and DELETEs when you execute scripts in the SQLite console.

Improve SQLite Performance in Your .NET Application Using Transactions

We see a similar problem when we use SQLite in a .NET application, and the solution is conceptually the same, although the implementation is necessarily a little different. If you are new to using SQLite (and many .NET developers are, at some point), this is exactly the type of confounding quirk that can have you running back to yet another "integrated" Microsoft database solution before giving this great database a chance. "I tried SQLite, but the inserts and updates were too damn slow . . ."

Sample .NET Application - The Slow, Hard Way

Consider the following Console application example. It is a small, simplistic example, and has no exception handling, but you get the idea. The Main() method performs some basic set-up, then builds a List<User> which is passed to the AddUsers() method.

Program to Insert a List of Users Using System.Data.SQLite:
class Program
{
    static string _connectionString;
    static void Main(string[] args)
    {
        // Database file will live in the current directory ( ..\bin\Debug\):
        string dbDirectory = Environment.CurrentDirectory;
        string dbName = "test.db";
 
        // Add System.IO to the using statements at the top of your code file:
        string dbPath = Path.Combine(dbDirectory, dbName);
        _connectionString = string.Format("Data Source = {0}", dbPath);
 
        CreateDbIfNotExists(dbPath);
        CreateUsersTable();
 
        int qtyToAdd = 100;
 
        // Load some users into a list...
        var usersToAdd = new List<User>();
        for(int i = 0; i < qtyToAdd; i++)
        {
            usersToAdd.Add(new User { Name = "User #" + i });
        }
 
        // And THEN add them:
        var sw = new System.Diagnostics.Stopwatch();
        sw.Start();
        int qtyAdded = AddUsers(usersToAdd);
        sw.Stop();
 
        Console.WriteLine("Added {0} Users successfully in {1} ms", 
        	qtyAdded, sw.ElapsedMilliseconds);
 
        var allUsers = ReadUsers();
 
        Console.WriteLine("Read {0} Users from SQLite", allUsers.Count());
        Console.Read();
    }
 
 
    static void CreateDbIfNotExists(string dbPath)
    {
        string directory = Path.GetDirectoryName(dbPath);
        if (!File.Exists(dbPath))
        {
            // Creates directory if it doesn't already exist:
            Directory.CreateDirectory(directory);
 
            // Creates file if it doesn't already exist:
            SQLiteConnection.CreateFile(dbPath);
        }
    }
 
 
    static SQLiteConnection CreateConnection()
    {
        return new SQLiteConnection(_connectionString);
    }
 
 
    static void CreateUsersTable()
    {
        string sqlTestTable =
            @"CREATE TABLE IF NOT EXISTS Users 
            ( 
                Id INTEGER PRIMARY KEY AUTOINCREMENT, 
                Name TEXT NOT NULL 
            )";
 
        using (var cn = new SQLiteConnection(_connectionString))
        {
            using (var cmd = new SQLiteCommand(sqlTestTable, cn))
            {
                cn.Open();
                cmd.ExecuteNonQuery();
                cn.Close();
            }
        }
    }
 
 
    class User
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }
 
 
    static int AddUsers(IEnumerable<User> users)
    {
        var results = new List<int>();
        string sqlInsertUsers =
            @"INSERT INTO Users (Name) VALUES (@0);";
 
        using (var cn = new SQLiteConnection(_connectionString))
        {
            // Open the connection once. Note there is NO explicit transaction
            // here, so SQLite treats each INSERT as its own transaction:
            cn.Open();
            foreach (var user in users)
            {
                using (var cmd = cn.CreateCommand())
                {
                    cmd.CommandText = sqlInsertUsers;
                    cmd.Parameters.AddWithValue("@0", user.Name);
                    results.Add(cmd.ExecuteNonQuery());
                }
            }
            cn.Close();
        }
        return results.Sum();
    }
 
 
    // ReadUsers() is called from Main(); a minimal implementation:
    static IEnumerable<User> ReadUsers()
    {
        var users = new List<User>();
        string sqlSelectUsers = "SELECT Id, Name FROM Users;";
 
        using (var cn = new SQLiteConnection(_connectionString))
        using (var cmd = new SQLiteCommand(sqlSelectUsers, cn))
        {
            cn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    users.Add(new User
                    {
                        Id = Convert.ToInt32(reader["Id"]),
                        Name = Convert.ToString(reader["Name"])
                    });
                }
            }
            cn.Close();
        }
        return users;
    }
}

 

The AddUsers() method creates a connection and a command, opens the connection, and then iterates over the IEnumerable<User>, successively inserting the user data for each into the SQLite database. We are using a System.Diagnostics.Stopwatch to time the execution of the call to AddUsers() from Main().

It looks like we've done everything right here - we set up the connection only once, open it only once (opening and closing connections for each loop iteration causes its own performance hit). However, it still takes upwards of four seconds to insert only 100 users. We can see the results in our console output.

Console Output from Example Program Inserting 100 Users:


Pretty lame, but not surprising, given what we have learned about transactionality defaults in SQLite. But, once again, we can do better.

Wrap SQLite Batch Operations in an ADO Transaction in Your .NET Application

Similar to using the SQLite console, the solution here is also to use a transaction. We can modify the code in the AddUsers() method as follows:

Modified Code for AddUsers() Method Wrapping Command Execution in a Transaction:
static int AddUsers(IEnumerable<User> users)
{
    var results = new List<int>();
    string sqlInsertUsers =
        @"INSERT INTO Users (Name) VALUES (@0);";
 
    using (var cn = new SQLiteConnection(_connectionString))
    {
        // Open the connection, and also a transaction:
        cn.Open();
        using(var transaction = cn.BeginTransaction())
        {
            foreach (var user in users)
            {
                using (var cmd = cn.CreateCommand())
                {
                    cmd.CommandText = sqlInsertUsers;
                    cmd.Parameters.AddWithValue("@0", user.Name);
                    results.Add(cmd.ExecuteNonQuery());
                }
            }
            transaction.Commit();
        }
        cn.Close();
    }
    return results.Sum();
}

 

With that, if we run our application again, we see an order of magnitude performance improvement:

Improved SQLite Insert Performance Using Transaction in .NET:


Yep. 52 milliseconds, down from over 4,000 milliseconds.

Be Cautious with Your Transactions in SQLite

We've seen how we can realize some serious performance wins in SQLite by using transactions to wrap up bulk operations. However, let's not put the cart before the horse without thinking it through. Sometimes, you actually need a more granular level of transaction to ensure data integrity.

It simply would not do to maximize performance of a banking application if transactions were implemented only at the top level of a batch operation. After all, transactions in the world of relational databases are first and foremost about creating assurance that an operation succeeds in its entirety, or not at all.
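
As a sketch of what finer granularity might look like (the Transfer type and the ApplyDebit/ApplyCredit helpers here are hypothetical, not part of the sample application above), each logical unit of work gets its own transaction:

Hypothetical Per-Transfer Transactions in a Batch Operation:
static void ProcessTransfers(IEnumerable<Transfer> transfers)
{
    using (var cn = new SQLiteConnection(_connectionString))
    {
        cn.Open();
        foreach (var transfer in transfers)
        {
            // Each debit/credit pair must succeed or fail together, so each
            // transfer gets its OWN transaction, rather than sharing one big
            // transaction with the rest of the batch:
            using (var transaction = cn.BeginTransaction())
            {
                ApplyDebit(cn, transfer);   // hypothetical helper
                ApplyCredit(cn, transfer);  // hypothetical helper
                transaction.Commit();
            }
        }
        cn.Close();
    }
}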


Installing and Using SQLite on Windows

Posted on December 7 2014 04:14 PM by jatten in SQLite, Database, CodeProject   ||   Comments (0)

When one is developing in .NET with Visual Studio and other Microsoft tools, it is easy to lose sight of alternative solutions to common problems. MS does a competent job of creating a tightly integrated development tool chain, where available MS products (both free and paid) offer reasonable default choices which generally get the job done.

Given this, .NET devs often fail to explore outside this arena, or to try alternative solutions which might address the problem at hand equally well, or better. Also, of course, there is always a learning curve with new choices, and we often choose the familiar out of simple expediency.

Image by shinichi  |  Some Rights Reserved

Some Background

SQLite is an awesome, open source, cross-platform, freely available file-based relational database. Database files created on Windows will move seamlessly to OSX or Linux OSes. The tools (in particular the SQLite3 Command Line CLI we examine here) work the same from one environment to the next.

It is also not new. If you have been around for a while, you doubtless know SQLite has been in active and open development for well over a decade, and is widely used in many different scenarios and operating environments. In fact, SQLite.org estimates that SQLite is in fact the most widely deployed SQL database solution in the world. Their most recent figures (albeit from 2006) would indicate that there are over 500 million deployments of SQLite (this number is no doubt higher by now). 

SQLite documentation is also widely regarded as above average in completeness and usability, providing both new and experienced users a well-developed canonical resource for learning and troubleshooting.

SQLite was originally designed by D. Richard Hipp in 2000 for the U.S. Navy, with the goal of allowing SQLite-based programs to function without installing a database management system, and without requiring a system administrator (from Wikipedia). These design requirements result in, as the SQLite site describes it, "a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine."

Until recently, I had not spent much time with SQLite. However, in developing the Biggy project, we decided that the core supported database systems would be cross-platform and open source. We wanted both a full-on client/server option, as well as a file-based relational database option. For our file-based relational database we chose SQLite, after exploring other alternatives.

In fitting SQLite into the Biggy workflow, I got the chance to familiarize myself with SQLite, its strengths, some weaknesses, some peculiarities to watch for, and some tips and tricks for getting the most out of the product.

In this post, we will get familiar with the basics of using the database in a Windows environment. Next post, we will explore integration with .NET development, and Visual Studio. But, learn the hard way first, I always say, so… let's get our command line on.

Getting Started - Using SQLite on Windows

Before we look at using SQLite in Visual Studio, let's walk through the basics of using SQLite in a Windows environment outside the IDE.

First, download the pre-compiled binaries from the SQLite Downloads page. At a minimum you will want the binaries for the Win32 x86 SQLite dll, and for the SQLite x86 Command Shell. Unzip the contents of the files into a folder named C:\SQLite3 (or whatever other location suits your needs). Then add C:\SQLite3 to your PATH variable so that you can invoke the SQLite Command Shell right from the Windows console.

In your new directory C:\SQLite3 you should now have the following items:

  • sqlite3.def
  • sqlite3.dll
  • sqlite3.exe

If we run the sqlite3.exe, we are greeted with a Console Application designed to allow us to work with SQLite databases:

The SQLite Console:


The command prompt is easy to use. Text entered without the "." qualifier will be treated as SQL (and succeed or fail accordingly). There is also a set of commands preceded with the "." qualifier, which are application commands. An example is the .open command, which the console greeting instructs us to use to open a database file.

The complete list of SQLite console commands is beyond the scope of this article, but we will walk through a list of the most useful here.

Open a New Database and Create Some Tables from the SQLite3 Console

The SQLite3 Console will open in the current directory (or in the directory in which the .exe is found, if you double-click in the GUI). Let's start by opening a new Windows terminal (which should generally open in our home directory), creating a new sub-directory named sqlite_databases, and navigating into that folder:

Create a new Directory and Navigate Into the New Directory:
C:\Users\John> mkdir sqlite_databases
C:\Users\John> cd sqlite_databases

 

Next, let's try on that .open command. Open sqlite3 and open a new database in the directory we just created:

Open SQlite3.exe and Open a New Database File:
C:\Users\John\sqlite_databases>sqlite3
sqlite> .open test.sqlite

 

Your console output should now look like this:

Console Output after Opening SQLite3 and Creating a New Database File:


Next, let's create a few tables to play with.

Entering SQL in the SQLite Console

Recall that plain text entered without the "." qualifier will be interpreted by the SQLite console as SQL. There are a few additional things to bear in mind:

  • SQL text may span multiple lines - hitting the Enter key will not cause the text following the prompt to execute until the statement is ended with a semi-colon.
  • SQLite uses either square brackets or double-quotes as delimiters for object names, in cases where the literal name would not be allowed. For example, Last Name would NOT be a valid column name, but will work as [Last Name]. Likewise, the keyword Group is not allowed as a column name, but "Group" will work.
  • SQLite is not case-sensitive. Unlike some other databases (most notably Postgresql), casing is ignored both in SQL syntax and in object names.

So, with that said, let's create a table or two. We will keep this really basic, since we are interested in how the console works, more than in a SQLite SQL syntax tutorial.

Create a Table in a Single-Line Statement:

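The statement was along these lines (an illustrative reconstruction - the column names are assumed, not taken from the original screenshot):

sqlite> CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, first_name TEXT, last_name TEXT);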

Above, we just kept typing our whole SQL statement, and allowed the console to wrap the text when it needed to (that lovely Windows console, with its under-developed display characteristics…). Kinda ugly and hard to read. Let's try a multi-line statement.

Create a Table Using a Multi-Line Statement:

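The multi-line version looks something like this (again illustrative - note the ...> continuation prompt the console shows until the statement is terminated):

sqlite> CREATE TABLE groups (
   ...> id INTEGER PRIMARY KEY AUTOINCREMENT,
   ...> name TEXT NOT NULL
   ...> );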

Aside from the ugliness that is the Windows Console, that's a little more readable.

Now let's add a few records.

Insert Records into Test Database:

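The statements were along these lines (reconstructed - note the deliberately inconsistent casing, which SQLite happily accepts):

sqlite> insert into users (first_name, last_name) values ('John', 'Lennon');
sqlite> INSERT INTO users (first_name, last_name) VALUES ('Paul', 'McCartney');
sqlite> INSERT INTO users (first_name, last_name) VALUES ('George', 'Harrison');
sqlite> INSERT INTO users (first_name, last_name) VALUES ('Ringo', 'Starr');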

Notice how the case of my SQL doesn't matter in the above? (And, yes, in the original console session there was a syntax error in the midst of things - I accidentally used an angle bracket instead of a paren…)

So now, we have added a little data. Let's read it back:

Select Data from Users Table:


As it happens, for unrelated reasons (ahem… I closed the wrong window…), I had to exit the application, and then go back in. However, once I opened our test.sqlite database again, I was able to enter a standard SELECT statement, and return the data.

See that ...>? That was the result of me forgetting to add the semi-colon at the end of my SELECT statement. If you do that (and you WILL…), simply add a semi-colon on the continued line, and the statement will execute (remember, until SQLite3 sees a semi-colon, it will continue to interpret text input as more SQL).
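
In other words, the exchange looks like this, with the statement executing as soon as SQLite sees the terminating semi-colon:

sqlite> SELECT * FROM users
   ...> ;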

Formatting the Console Output

We can tell SQLite3 how we would like our data displayed. For example, we may prefer to see a more tabular display, with columns and headers. To accomplish this, we use a few of those application commands, prefixed with a period:

Change the Display Mode for the SQLite Console

We can use the following two commands to change the display mode and use columns and headers in our console output:

Use Column Display Mode with Headers in SQLite3
sqlite> .mode column
sqlite> .headers on

 

If we run our SELECT statement again, the output looks like this:

Console Output Using Columns and Headers:

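Using the illustrative users table from earlier, the output takes a shape something like this (values and column widths are illustrative):

id          first_name  last_name
----------  ----------  ----------
1           John        Lennon
2           Paul        McCartney
3           George      Harrison
4           Ringo       Starr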

Executing Script Files from the SQLite Console Using the .Read Command

Of course, typing in SQL in the console can become painful. While it is fine for quick-and-dirty queries and maintenance tasks, doing a lot of work is better accomplished by scripting out what you need in a text file, and then executing that from the console.

To see this in action, we will download my personal favorite test database, the Chinook database. Chinook has a database script for most of the popular database platforms, providing a handy way to use the same data set for evaluating multiple platforms (among other things). Download the Chinook Database, extract the .zip file, and locate the Chinook_Sqlite_AutoIncrementPKs.sql file. To keep things simple, drop a copy of it into your sqlite_databases folder, so it is in the current directory. Then, also to keep our typing down, rename the file you just moved to simply "Chinook.sql".

We can execute SQL scripts using the SQLite .read command. To illustrate, we will read in the Chinook database.

You will notice a couple things when we do this. First, the console may show an error at the top of the run, but the script is still running - errors are logged out to the console.

Second, executing this script in its current form is SLOOOOWWWW. This is due to a peculiarity with SQLite we will address momentarily, but was not addressed by the creators of the Chinook Database script.

Execute SQL Script from the SQLite Console Using the .Read Command
sqlite> .read Chinook.sql

 

The script may run for a good number of minutes, so go grab a cup of coffee or something. Your computer has not seized up. The Console will return when the script is finished (really, this took about 10 minutes on my machine, but we're going to fix that…).

< Coffee Break . . . >

Ok. Now that the script has finished running, let's use the .tables command to see a list of the tables in our database. If everything worked as we expect, we should see our own users and groups tables, as well as a bunch of new ones populated with Chinook data:

List Tables Using the .Tables Command:
sqlite> .tables

 

We should see something like this:

Console Output from .Tables Command:

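With the Chinook tables in place alongside the users and groups tables we created earlier, the listing will include something like the following (layout approximate):

Album          Customer       Genre          InvoiceLine    Playlist       Track          users
Artist         Employee       Invoice        MediaType      PlaylistTrack  groups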

Now, why the hell did it take so long to run that script??!!

Wrap Multiple Actions in Transactions for Instant Performance Boost

SQLite is inherently transaction-based. Meaning, unless you specify otherwise, each statement will be treated as an individual transaction, which must succeed, or be rolled back.

Transactions are a key feature of relational databases, and critical in the big scheme of things. However, individually, transactions add significant performance overhead, and when we are inserting (or updating, or otherwise modifying) thousands of records in multiple tables, treating each insert as an individual transaction slows things WAAAAYYYY DOWWN.

If we go through the Chinook.sql script and place a BEGIN; statement before the INSERTs for each table, and a COMMIT; statement at the end of the INSERTs for each table, we will see several orders of magnitude better performance from this script.

We can skip wrapping the DROP and CREATE table statements in transactions for our purposes here. As an example, open the file in your favorite text editor, go through and find the beginning of the INSERTs for the Genre table. Add a BEGIN and COMMIT clause like so:

Wrap Table Inserts in Transactions:
BEGIN;
INSERT INTO [Genre] ([Name]) VALUES ('Rock');
INSERT INTO [Genre] ([Name]) VALUES ('Jazz');
... Etc ...
INSERT INTO [Genre] ([Name]) VALUES ('Alternative');
INSERT INTO [Genre] ([Name]) VALUES ('Classical');
INSERT INTO [Genre] ([Name]) VALUES ('Opera');
COMMIT;

 

Now scroll on down, and do the same for each table. When you are done, let's create a dedicated Chinook database to try it out.

Open the Windows Console, navigate back to the sqlite_databases directory, run sqlite3, and open a new database named chinook.db. Then use .read to execute the chinook.sql script again:

Read Chinook Script into Chinook.db:
C:\Users\John>cd sqlite_databases
C:\Users\John\sqlite_databases>sqlite3
sqlite> .open chinook.db
sqlite> .read chinook.sql

Next, use the .tables command again to see that all the tables were created. The console output should look like this:

Console Output from Execution after Wrapping Table Inserts in Transactions:


We see there is still a little error bugaboo at Line 1 (most likely due to some unicode issue at the beginning of the file - welcome to the world of scripts). However, we can see if our data imported fairly easily:

Select Artists from the Chinook Artists Table:

 

GUI-Based Tools

We've covered enough here that we can explore what SQLite has to offer from the Windows console, and become familiar with this fantastic little database. Of course, there are other tools available to work with SQLite databases, including SQLiteBrowser, a terrific multi-platform GUI which makes a very competent management interface for SQLite databases.

As mentioned previously, the documentation available at SQLite.org is first-rate, and there are a host of other resources out there as well.

SQLite is a handy, mature, highly performant database which is easy to use, and works on all the major OS platforms. Database files created on a Windows machine can move seamlessly between OSX and *Nix OSes, as can most of the tools designed to work with them.

I like to start everything with the most fundamental tools available, and then once I have developed a solid understanding of the system, move on up to more advanced tools. Take some time and get to know SQLite from the basic CLI interface. You won't regret it.
