Keeping Things Fast for Large n

Some years ago, we had customers reporting poor network response times when fetching content from the server. Our product was nowhere near being wildly popular, and the number of records in the database was still counted in the tens of thousands. Even our puny instance was able to cache the entire database in memory. Not just a single table or the results of a few queries. The entire database could be cached in RAM. So the slowdown probably wasn’t caused by something in the database. All customers reported more or less similar latency, irrespective of their geographical location, internet service provider or time of day. That also ruled out network problems.

So I rolled up my sleeves and began investigating.

One of the features of this product was that while it used sequential 128-bit integers for the primary key columns, the data retrieval was done with a shorter 5-character identifier, called a short-code. The short-code was also unique, but made up of fewer characters for legibility when users passed links around. The short-code wasn’t appropriate for a primary key column though, as its randomness would cause too much index fragmentation.

The short-code was generated by hashing the primary key value using the MD5 algorithm, and truncating it to the first 5 characters of the result. If there was a collision…well nobody had thought about that back then. It was one of the subtle bugs that would come back to trouble us years later. But that’s another story.

Someone had decided to implement this feature in the application code. When a content link was required, it was computed using .NET cryptography libraries and the result embedded into a URL string template for the user to share. The hashed value was not stored in the database, even though it was going to remain the same every time. And we would be paying heavily later for this oversight.
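
For illustration, the application-side computation would have looked something along these lines. This is a sketch; the helper name and the exact string formatting are assumptions, not the original code.

using System;
using System.Security.Cryptography;
using System.Text;

public static class ShortCode
{
    // Hash the primary key with MD5 and keep the first 5 characters
    // of the hexadecimal digest
    public static string FromId(Guid id)
    {
        using (var md5 = MD5.Create())
        {
            // Hash the 36-character string form of the GUID, upper-cased to
            // match how SQL Server renders a uniqueidentifier as CHAR(36)
            var bytes = Encoding.ASCII.GetBytes(id.ToString("D").ToUpperInvariant());
            var hash = md5.ComputeHash(bytes);
            var hex = BitConverter.ToString(hash).Replace("-", string.Empty);

            return hex.Substring(0, 5);
        }
    }
}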

Now since the application had no way to identify the record directly by its short-code, the developers had to come up with a Rube Goldberg-esque contraption to retrieve it again. For this, they fetched the ID column for all the content records, ran the MD5 function on each row, truncated the result and compared it to the value given in the incoming request, until a match was found. The CS101 guys already know where this is going. Since everything is fast for small n, this technique worked flawlessly on the developers’ own computers. It was only when the application was deployed to production, and stayed there for a few months, that the performance bottlenecks began to show up.

Locating the bug itself was easy. I set up a network trace using Wireshark, inspected the queries between the application and the database server, and promptly proceeded to fall off my chair in disbelief.

After excluding essential communications such as handshakes and authorisation, the application was still receiving almost half a megabyte of data, split into 400 packets, for a table containing only 31,000 rows. All this activity before it could even begin looking for a match.

This was going to require some re-engineering to fix.

For unrelated reasons, our goal was only to reduce the amount of data being received from the database server, with zero changes to the public API and no modifications to the database tables. We could only change the application code and deploy a new build. This code was written to use ADO.NET and inline queries to perform data operations, so changing its behaviour was going to be relatively easy.

The first thing was to assemble a query that could generate the MD5 hash of a primary key value.

SELECT HASHBYTES('MD5', CAST([Article].[Id] AS CHAR(36))) AS [Hash]
FROM [Article]
WHERE [Article].[Id] = '6BA1CE84-FDB1-EA11-8269-C038960D1C7A';

Since the HASHBYTES function in T-SQL works only with char, nchar or binary data, the uniqueidentifier had to be cast into a fixed-width char. The output of this function was like so.

-- 0x704E87BA59EB6F930C020E5D6DA6B444

This hash was converted into a string by using the CONVERT function, and finally truncated to the first 5 characters, resulting in the output shown below.

SELECT LEFT(
            CONVERT(
                CHAR(32),
                HASHBYTES(
                    'MD5',
                    CAST([Article].[Id] AS CHAR(36))
                ), 2), 5) AS [Hash]
FROM [Article]
WHERE [Article].[Id] = '6BA1CE84-FDB1-EA11-8269-C038960D1C7A';
-- 704E8

Cool!

Now came the retrieval by the short-code. The hash-computation query was nested inside another simple select query.

SELECT *
FROM
(
    SELECT [Article].[Id],
           [Article].[Name] AS [ArticleName],
           LEFT(CONVERT(CHAR(32), HASHBYTES('MD5', CAST([Article].[Id] AS CHAR(36))), 2), 5) AS [Hash]
    FROM [Article]
) [Article_]
WHERE [Article_].[Hash] = '704E8';
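
In the application, this query could then be executed over ADO.NET with the short-code supplied as a parameter. A sketch, assuming a SQL Server connection string is available in connectionString:

using (var connection = new SqlConnection(connectionString))
using (var command = connection.CreateCommand())
{
    command.CommandText =
        @"SELECT *
          FROM
          (
              SELECT [Article].[Id],
                     [Article].[Name] AS [ArticleName],
                     LEFT(CONVERT(CHAR(32), HASHBYTES('MD5', CAST([Article].[Id] AS CHAR(36))), 2), 5) AS [Hash]
              FROM [Article]
          ) [Article_]
          WHERE [Article_].[Hash] = @ShortCode;";
    command.Parameters.Add(new SqlParameter("@ShortCode", "704E8"));

    connection.Open();
    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            // Map the single matching row into the application model
        }
    }
}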

This was executed against the database and measured again using Wireshark.

The results were remarkably different, but not at all unexpected. Only 882 bytes of data were transferred between the database and the application, and of that, 630 bytes were the query string going into the database server. The only record the server now returned was 252 bytes long, and required no further processing in the application.

There was still a lot of processing going on in the database itself, which was ripe for further optimisation. Storing the short-code in the table permanently and indexing the column would improve the product’s performance even further.

But for that moment, I was king of the world.

This story has been altered slightly to protect the guilty and gloss over irrelevant details. The performance bottleneck was made much worse by nested loops (yay, quadratic growth!) and suboptimal data types.

Runtime Resource Authorisation in ASP.NET MVC

The Authorize attribute is a feature of the ASP.NET MVC framework that programmers learn early on. While it is a good out-of-the-box solution for general cases, it doesn’t work well for dynamic authorisation. Take the HTTP request shown below.

GET /posts/edit/12 HTTP/1.1
Host: www.example.com

In colloquial MVC, this requests the PostsController to retrieve the contents of the post with ID 12 and display them in a form. The Authorize attribute cannot determine whether the currently logged-in user has been granted editing rights for that specific post. At best, operations are allowed based on roles or claims, which still results in an all-or-nothing situation. Either an individual user can edit all posts, or none at all.

Finer-grained control over individual resources for each user in the system requires a custom solution.

The system described below eschews the Authorize attribute entirely, and chooses to instead use filters in the ASP.NET request pipeline. It imposes the restriction that the name of the resource identifier parameter should always be well-known, such as id. Since the default route already follows this convention, this usually isn’t a problem.

Identifying the What

The first piece of the puzzle is a custom marker attribute called SecuredAttribute. This class inherits from System.Attribute and is applied to methods. Any controller action method that is marked with this attribute is identified as a sensitive access point that requires some kind of screening before being invoked.

But this attribute only identifies the method. It does not perform any kind of screening on incoming requests. This is also why it doesn’t inherit from any of the higher-level attributes in the MVC framework, such as ActionFilterAttribute.

[AttributeUsage(AttributeTargets.Method)]
public class SecuredAttribute : Attribute
{
}

The SecuredAttribute is used by applying it to the top of the controller method that needs runtime screening.

public class AdminController : Controller
{
    [Secured]
    public ActionResult Edit(int id)
    {
        …
    }
}

Implementing the How

The screening is performed by a class that implements IActionFilter. There can be multiple screening filters, and they are queued up in the GlobalFilterCollection during Application_Start(). The screening process is performed before the action method is executed, by implementing it in the OnActionExecuting method of the filter class.
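
For reference, the registration in Global.asax.cs might look like this. A minimal sketch, assuming the default ASP.NET MVC project layout:

public class MvcApplication : System.Web.HttpApplication
{
    protected void Application_Start()
    {
        // Queue the screening filter so that it runs for every request
        GlobalFilters.Filters.Add(new AuthorizationFilter());
    }
}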

public class AuthorizationFilter : IActionFilter
{
    … 
    public void OnActionExecuting(ActionExecutingContext context)
    {
        var secured = context.ActionDescriptor.GetCustomAttributes(typeof(SecuredAttribute), false).FirstOrDefault();
        if (null == secured)
        {
            return;
        }

        var user = context.HttpContext.User;
        var param = context.ActionParameters.Where(p => p.Key == "id").FirstOrDefault();
        var id = Convert.ToInt32(param.Value);

        // Invoke a service to check if the request should be allowed
        var isAllowed = securityService.IsAllowed(user, id);
        if (!isAllowed)
        {
             context.Result = new HttpStatusCodeResult(HttpStatusCode.Unauthorized);
        }
    }
}

The filter looks for the [Secured] attribute. If the method being invoked doesn’t have the attribute, the filter immediately returns and lets the method execution proceed. If the attribute is found, the filter performs a screening procedure to determine if the request should be allowed or not. It may use an injected service class or even a third-party API to perform this action.

Since the attribute only identifies the method, it remains simple. Discrete behaviours can be attached to the same action method, and these behaviours can also depend on the request context (e.g. invocation through the web vs. the API), while maintaining a clean separation of concerns.

Some of these techniques are shown below.

Extending Beyond Simple Authorisation

The method attribute can be leveraged for other cross-functional requirements which are tangential to authorisation. For example, the secured method may require an audit trail.

public class AuditTrailFilter : IActionFilter
{
    public void OnActionExecuting(ActionExecutingContext filterContext)
    {
        var secured = filterContext.ActionDescriptor.GetCustomAttributes(typeof(SecuredAttribute), false).FirstOrDefault();
        if (null == secured)
        {
            return;
        }

        // Invoke a service to log the method access
        Logger.Info(…);
    }
}

The authorisation and audit trail filters can coexist and are fired independently. They use the same marker to identify the methods, but perform widely different tasks with different resources at their disposal. AuditTrailFilter can be programmed to log requests to secured locations in one store and all other requests in another, while AuthorizationFilter always allows requests to unsecured locations.

Another example is to return different responses to the client based on its type. When a request comes from a browser, its Accept header is set to text/html, while an API client such as a SPA or a mobile app sets it to application/xml or application/json. The WebAuthorizationFilter class returns the access-denied error as an HTML view, which the browser displays as a user-friendly error page.

public class WebAuthorizationFilter : IActionFilter
{
    … 
    public void OnActionExecuting(ActionExecutingContext context)
    {
        // Skip requests that do not accept HTML (i.e. API requests)
        var acceptTypes = HttpContext.Current.Request.AcceptTypes ?? new string[0];
        if (!acceptTypes.Contains("text/html"))
        {
            return;
        }

        var secured = context.ActionDescriptor.GetCustomAttributes(typeof(SecuredAttribute), false).FirstOrDefault();
        if (null == secured)
        {
            return;
        }

        var user = context.HttpContext.User;
        var param = context.ActionParameters.Where(p => p.Key == "id").FirstOrDefault();
        var id = Convert.ToInt32(param.Value);

        // Invoke a service to check if the request should be allowed
        var isAllowed = securityService.IsAllowed(user, id);
        if (!isAllowed)
        {
            context.Result = new ViewResult()
            {
                ViewName = "AccessDenied"
            };
        }
    }
}

The ApiAuthorizationFilter class, on the other hand, returns an HTTP 403 status code in the response. The API client generates an appropriate error view on the client-side.

public class ApiAuthorizationFilter : IActionFilter
{
    … 
    public void OnActionExecuting(ActionExecutingContext context)
    {
        // Return if a non-API request is received
        var acceptTypes = HttpContext.Current.Request.AcceptTypes ?? new string[0];
        if (!acceptTypes.Contains("application/xml") && !acceptTypes.Contains("application/json"))
        {
            return;
        }

        var secured = context.ActionDescriptor.GetCustomAttributes(typeof(SecuredAttribute), false).FirstOrDefault();
        if (null == secured)
        {
            return;
        }

        var user = context.HttpContext.User;
        var param = context.ActionParameters.Where(p => p.Key == "id").FirstOrDefault();
        var id = Convert.ToInt32(param.Value);

        // Invoke a service to check if the request should be allowed
        var isAllowed = securityService.IsAllowed(user, id);
        if (!isAllowed)
        {
            context.Result = new HttpStatusCodeResult(HttpStatusCode.Forbidden);
        }
    }
}

Practical Design Patterns in C# – Singleton

The Singleton pattern is one of the simplest and most commonly used in practice. Its purpose is to restrict a class to only one instance throughout the lifetime of the application. This is desirable when there are only one or a few instances of a resource available and access to them has to be controlled.

A common use case is a resource pool for objects which are expensive to acquire, such as a database connection pool or a local cache of the results of a network operation. The object pool itself is a design pattern worthy of its own detailed discussion. In this article, however, I will limit myself to describing its implementation as a singleton.

Single-threaded, Lazy Initialisation

The first example is the simplest possible implementation of this pattern. This version is not thread-safe, but can use lazy initialisation in a single-threaded scenario. If multiple threads attempt to invoke the Instance property of this class, several or all of them may end up invoking the constructor. In that case, the instance has to be eagerly initialised instead.

public sealed class ConnectionPool
{
    private static ConnectionPool _instance;

    private ISet<IDbConnection> _released;

    private ISet<IDbConnection> _acquired;

    private ConnectionPool()
    {
        // Initialise the collection objects
    }

    public IDbConnection Acquire()
    {
        // Retrieve a free connection instance
        // Instantiate a new one if there are no free instances
        // Move the connection instance into the acquired collection
    }

    public void Release(IDbConnection connection)
    {
        // Remove the connection from the acquired collection
        // Add it into the released collection
    }

    public static ConnectionPool Instance
    {
        get
        {
            if (null == _instance)
            {
                _instance = new ConnectionPool();
            }

            return _instance;
        }
    }
}

This could be an acceptable solution for simple situations where multiple threads are not involved. But with multi-threaded applications being more or less the norm in a language like C#, the actual number of situations where this code will be safe to use is very limited. This is truer than ever with the advent of opaque multi-threading as supported by the Task Parallel Library (TPL). In my experience, it is safer to err on the side of caution and not use this implementation in production.

In multi-threaded situations, the risk of invoking the constructor multiple times can be mitigated by calling the Instance property immediately on startup before launching any additional threads, i.e. by using eager initialisation. But the cost of eager initialisation and high risk of introducing synchronisation bugs in the future makes it an unattractive option.

If you need thread-safety in your singleton, you need a different solution. The simplest way is to use a lock.

Multi-threaded, Lazy Initialisation

public sealed class ConnectionPool
{
    private static ConnectionPool _instance;

    …

    private static readonly object padlock = new object();

    private ConnectionPool()
    {
        // Initialise the collection objects
    }

    public IDbConnection Acquire()
    {
        …
    }

    public void Release(IDbConnection connection)
    {
        …
    }

    public static ConnectionPool Instance
    {
        get
        {
            lock (padlock)
            {
                if (null == _instance)
                {
                    _instance = new ConnectionPool();
                }

                return _instance;
            }
        }
    }
}

The drawback to this technique is that every invocation of Instance has to acquire a lock. This is an unnecessary performance hit after an instance has been created once, because the property does not modify the instance ever afterwards.

Again, this is not incorrect. The code will perform exactly according to the pattern specification, and can be used if bleeding-edge performance is not an absolute must-have or the instance is not referenced frequently. In practice, the overhead of acquiring an uncontended lock is small, typically a single-digit percentage.

Lock-free, Multi-threaded, Lazy Initialisation

But it can be made faster still. The ease of maintaining and reasoning about lock-free code is a bonus.

public sealed class ConnectionPool
{
    private static readonly ConnectionPool _instance = new ConnectionPool();

    static ConnectionPool()
    {
    }

    private ConnectionPool()
    {
    }

    public IDbConnection Acquire()
    {
        …
    }

    public void Release(IDbConnection connection)
    {
        …
    }

    public static ConnectionPool Instance
    {
        get
        {
            return _instance;
        }
    }
}

The above example contains a static constructor along with the regular instance constructor. The static constructor itself does not require any code. But as a side-effect of its presence, the compiler no longer marks the type as beforefieldinit, so the type initialiser is executed only when a static member of the class is accessed for the first time, or when an instance of the type is created for the first time.

This is a by-product of the language design and is described in detail by Jon Skeet on his website. The runtime performs this check before the type is first used, and guarantees that the type initialiser executes only once. This implementation leverages that behaviour to constrain instantiation of the class to a single occurrence, triggered the first time the Instance property is accessed.
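
Consuming the singleton looks the same in all three variants. A minimal usage sketch:

// Every access to Instance returns the same ConnectionPool object
var connection = ConnectionPool.Instance.Acquire();

// … perform database work with the connection …

ConnectionPool.Instance.Release(connection);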

Practical Design Patterns in C# – Prototype

The prototype pattern simplifies creating multiple instances of a class by copying the contents of an existing instance to create a new one. This alleviates the cost of any expensive computation or I/O being performed in the constructor, or simply copies over the state of an existing object to the new instance for further modification.

This article demonstrates a document editor that has the ability to make and display multiple concurrent snapshots of a document. Each snapshot can be reviewed and edited individually, and compared with the others. The process begins with a single document in memory, which the user can edit before cloning. When a draft is ready, the user duplicates the object by invoking a command from the UI. This copies the contents of the document over into a new instance. The document also contains a date and time field which is modified to match the time when the duplicate is created.

Both versions now exist in memory simultaneously and the user can cycle between them as needed.

For this, there must be a way for the document class to be duplicated. The .NET framework already defines an interface called System.ICloneable with a method called Clone() for just this purpose. The caveat of this technique is that the return type of the Clone() method is object, thus requiring a type-cast on the returned instance. It is quite easy to declare a generic version of the ICloneable interface to avoid this problem.

public interface ICloneable<T>
{
    T Clone();
}

The Document class implements this interface as shown below. This snippet also shows the CreatedOn and Text properties that the Document class exposes.

[Serializable]
public class Document : ICloneable<Document>
{
    private int _version = 0;

    public Document()
    {
        CreatedOn = DateTime.UtcNow;
        Text = string.Empty;
        _version = 1;
    }

    public DateTime CreatedOn
    {
        get;
        private set;
    }

    public string Text
    {
        get;
        set;
    }

    public Document Clone()
    {
        Document clone;

        using (var stream = new MemoryStream())
        {
            var formatter = new BinaryFormatter();
            formatter.Serialize(stream, this);

            stream.Seek(0, SeekOrigin.Begin);
            clone = (Document)formatter.Deserialize(stream);
            clone.CreatedOn = DateTime.UtcNow;
            clone._version++;
        }

        return clone;
    }
}

When the document instance has to be duplicated, the client calls the Clone method to create a whole new instance of the object based on the contents of the original.

The implementation of the Clone method itself is highly specific to the language and platform in use. The example above demonstrates a straightforward binary copy by serialising the class into a byte stream, then immediately deserialising it back into a new object instance. This mandates that the Document class be decorated with the Serializable attribute.

This technique works in C# and other .NET languages which implement the BinaryFormatter and MemoryStream APIs.

Another technique to create a duplicate is calling the constructor and setting each property and field individually. Since the Clone method is a part of the same class, it has access to the private members of the new instance as well.

public Document Clone()
{
    var clone = new Document();
    clone.CreatedOn = DateTime.UtcNow;
    clone.Text = Text;
    clone._version = _version + 1;

    return clone;
}

Finally, the .NET Framework provides a built-in method called MemberwiseClone, which creates a shallow object copy.

public Document Clone()
{
    var clone = (Document)base.MemberwiseClone();
    clone.CreatedOn = DateTime.UtcNow;
    clone._version++;

    return clone;
}

The constructor-based and MemberwiseClone approaches create a shallow copy of the object, while the serialisation approach copies the entire object graph and therefore produces a deep copy. In order to create a deep copy with the former techniques, each reference type that the Document class uses must implement a similar Clone method, which is invoked when the instance is cloned. This can quickly escalate out of hand if these classes further reference other types. For this reason, the shallow techniques are best applied to classes that use value types only.
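
As an illustration of that nested-Clone approach, assume the Document held a reference to a hypothetical Revision class. Both the class and the member names below are assumptions made for this sketch:

[Serializable]
public class Revision : ICloneable<Revision>
{
    public string Comment { get; set; }

    public Revision Clone()
    {
        // Revision only holds a string, so a member-by-member copy suffices
        return new Revision { Comment = Comment };
    }
}

// Inside Document, the reference-typed member must be cloned explicitly
public Revision LastRevision { get; set; }

public Document Clone()
{
    var clone = new Document();
    clone.CreatedOn = DateTime.UtcNow;
    clone.Text = Text;
    clone._version = _version + 1;
    clone.LastRevision = LastRevision == null ? null : LastRevision.Clone();

    return clone;
}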

Practical Design Patterns in C# – Builder

The builder pattern abstracts multiple steps required to create a complex object into a coherent interface. Objects that need several input parameters are prime candidates for refactoring behind a builder object.

This example shows how to create an email message as well as a JSON message object from the same input. The utility of builders can be increased manifold through the use of generic interfaces. This is also demonstrated in this example. To understand the benefits of this pattern, we must first analyse some code that does not utilise it.

var message = new MailMessage("john@example.com",
                "jane@example.com",
                "101 Ways to Refactor",
                "Lorem ipsu…");

Quick, can you identify what each parameter in this statement is? Is John the sender or the recipient? How do you add more than one recipient? How does one create an HTML-formatted email? The answers are not immediately obvious. And adding more configurability to the MailMessage class would make for ever longer constructor calls.

A more readable alternative is clearly required, at least to retain the maintenance programmer’s sanity for the future. And there’s our hook for the builder pattern.

The fundamental type in this example is the builder itself, EmailBuilder in this case. This class separates the components of the message into discrete methods.

public class EmailBuilder
{
    private MailMessage _message;

    public EmailBuilder()
    {
        _message = new MailMessage();
    }

    public EmailBuilder AddressedTo(string recipient)
    {
        …
    }

    public EmailBuilder From(string sender)
    {
        …
    }

    public EmailBuilder WithBody(string body)
    {
        …
    }

    public EmailBuilder WithSubject(string subject)
    {
        …
    }

    public MailMessage GetResults()
    {
        return _message;
    }
}
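
The method bodies are elided above. One possible implementation, assuming System.Net.Mail.MailMessage:

public EmailBuilder AddressedTo(string recipient)
{
    _message.To.Add(new MailAddress(recipient));
    return this;
}

public EmailBuilder From(string sender)
{
    _message.From = new MailAddress(sender);
    return this;
}

public EmailBuilder WithSubject(string subject)
{
    _message.Subject = subject;
    return this;
}

public EmailBuilder WithBody(string body)
{
    _message.Body = body;
    return this;
}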

The client code consumes this interface by invoking each method one after the other. All methods in the builder class return the instance itself (i.e. this) so that multiple methods can be chained together into a single, fluent statement.

var builder = new EmailBuilder();
var message = builder.From("john@example.com")
    .AddressedTo("jane@example.com")
    .AddressedTo("jim@example.com")
    .WithSubject("101 Ways to Refactor")
    .WithBody("Lorem ipsu…")
    .GetResults();

This simple class already makes the code considerably more readable and maintainable. But there’s more!

What if the program has to create multiple types of messages, such as an email plus a browser notification encoded into a JSON object? The API from the EmailBuilder can be easily extracted into an IMessageBuilder interface, then applied to different types of builders.

public interface IMessageBuilder<T>
{
    IMessageBuilder<T> WithSubject(string subject);

    IMessageBuilder<T> WithBody(string body);

    IMessageBuilder<T> From(string sender);

    IMessageBuilder<T> AddressedTo(string recipient);

    T GetResults();
}

This is a generic interface as the return type of each builder differs, depending on the type of message it creates. As a result, the EmailBuilder class definition and method signatures change as shown below. Any code that consumes this class is not affected.

public class EmailBuilder : IMessageBuilder<MailMessage>
{
    private MailMessage _message;

    public EmailBuilder()
    {
        _message = new MailMessage();
    }

    public IMessageBuilder<MailMessage> AddressedTo(string recipient)
    {
        …
    }

    public IMessageBuilder<MailMessage> From(string sender)
    {
        …
    }

    public IMessageBuilder<MailMessage> WithBody(string body)
    {
        …
    }

    public IMessageBuilder<MailMessage> WithSubject(string subject)
    {
        …
    }

    public MailMessage GetResults()
    {
        return _message;
    }
}

This done, we can move on to the JSON builder. For this example, we use types defined in the wildly popular Json.NET library from Newtonsoft.

public class JsonBuilder : IMessageBuilder<string>
{
    private StringBuilder _builder;

    private JsonTextWriter _writer;

    public JsonBuilder()
    {
        _builder = new StringBuilder();
        var sw = new StringWriter(_builder);
        _writer = new JsonTextWriter(sw);
        _writer.WriteStartObject();
    }

    public IMessageBuilder<string> AddressedTo(string recipient)
    {
        _writer.WritePropertyName("AddressedTo");
        _writer.WriteValue(recipient);
        return this;
    }

    public IMessageBuilder<string> From(string sender)
    {
        _writer.WritePropertyName("From");
        _writer.WriteValue(sender);
        return this;
    }

    public IMessageBuilder<string> WithBody(string body)
    {
        _writer.WritePropertyName("Body");
        _writer.WriteValue(body);
        return this;
    }

    public IMessageBuilder<string> WithSubject(string subject)
    {
        _writer.WritePropertyName("Subject");
        _writer.WriteValue(subject);
        return this;
    }

    public string GetResults()
    {
        _writer.WriteEndObject();
        return _builder.ToString();
    }
}

The client code must instantiate the JsonBuilder and invoke the same methods on it as it does on any other builder. Each builder can have its own customisations, such as only storing a single recipient’s name in the JSON object, which gets overwritten by any subsequent call to AddressedTo().

var builder = new JsonBuilder();
var message = builder.From("john@example.com")
    .AddressedTo("jane@example.com")
    .AddressedTo("jim@example.com")
    .WithSubject("101 Ways to Refactor")
    .WithBody("Lorem ipsu…")
    .GetResults();

Practical Design Patterns in C# – Abstract Factory

The abstract factory design pattern provides an interface to create instances of related objects which operate in sync with each other. This pattern is a superset of the factory method design pattern, as it separates the object creation from consumption, while adding the necessary checks to ensure that only related types of objects can be instantiated at a given time.

In this example, we look at an application that offers options to connect to SQL Server or MySQL database engines to store its data. This is a common problem faced by many applications. A rudimentary way of implementing this functionality is to pepper the code with if-else statements throughout the application. Needless to say, this is impractical, brittle and error prone – all traits to avoid in good code.

A database operation requires a type to make a connection to the database, a command to contain the query and a type to hold each query parameter. All three have to belong to the same family in order to operate correctly. Swapping any of them for a type from another family is not allowed. For example, a MySQL connection instance will not be able to operate with a SQL Server command instance. Therefore, if a program has to offer the ability to work with various types of database engines, it has to guarantee that only compatible types are instantiated at a given time.

This pattern consists of at least 2 types – one to define the abstract factory interface, and a concrete factory class that implements this interface. More families of related objects are added to the application by implementing more factories.

The core API is provided by the IDbFactory interface.

public interface IDbFactory
{
    IDbConnection CreateDbConnection();

    IDbCommand CreateDbCommand();

    IDbDataParameter CreateDbDataParameter();
}

IDbFactory defines methods for the database factory.  IDbConnection, IDbCommand and IDbDataParameter are interfaces provided by the framework in the System.Data namespace. All major third-party database vendors provide bindings for their engines which are compatible with these interfaces.

SQL Server Factory

When the application has to operate against SQL Server, it instantiates the SqlFactory class, which implements the IDbFactory interface and returns the SQL Server-specific concrete types for IDbConnection, IDbCommand and IDbDataParameter.

public class SqlFactory : IDbFactory
{
    public IDbConnection CreateDbConnection()
    {
        return new System.Data.SqlClient.SqlConnection();
    }

    public IDbCommand CreateDbCommand()
    {
        return new System.Data.SqlClient.SqlCommand();
    }

    public IDbDataParameter CreateDbDataParameter()
    {
        return new System.Data.SqlClient.SqlParameter();
    }
}

MySQL Factory

The MySqlFactory class operates in the same way and returns objects from the MySql.Data.MySqlClient namespace.

public class MySqlFactory : IDbFactory
{
    public IDbConnection CreateDbConnection()
    {
        return new MySql.Data.MySqlClient.MySqlConnection();
    }

    public IDbCommand CreateDbCommand()
    {
        return new MySql.Data.MySqlClient.MySqlCommand();
    }

    public IDbDataParameter CreateDbDataParameter()
    {
        return new MySql.Data.MySqlClient.MySqlParameter();
    }
}

Usage

When the application is launched, it determines the active database engine it has to operate against (through UI or configuration or some other means), and then instantiates the appropriate IDbFactory implementation. All objects required to operate on the database are subsequently instantiated from this IDbFactory instance.

public class Program
{
    public static void Main(string[] args)
    {
        IDbFactory factory;

        if ("sql" == args[1])
        {
            factory = new SqlFactory();
        }
        else
        {
            factory = new MySqlFactory();
        }

        using (var connection = factory.CreateDbConnection())
        {
            connection.Open();
            var command = factory.CreateDbCommand();
            command.Connection = connection;
            command.CommandText = "SELECT * FROM [Users] WHERE [UserId] = ?";
            var param = factory.CreateDbDataParameter();
            param.DbType = DbType.Int32;
            param.Value = 42;
            command.Parameters.Add(param);

            command.ExecuteReader();
        }
    }
}

Practical Design Patterns in C# – Factory Method

The factory method is exactly what it says on the label – a method that produces object instances on demand. Ergo, a factory method. This method separates the creation of object instances from their consumption. On its own, it adds very little value to a code-base in most cases. However, the factory method is the foundation of the abstract factory, which offers much more utility.

In this example, we look at an application that has to write data to one of two different file formats. Separate file writer classes are required for each format, but both must share a common interface for this pattern to work. The correct class is instantiated and the Write method invoked upon it to perform the save operation.

All writer classes are required to implement the IFileWriter interface.

public interface IFileWriter
{
    void Write(byte[] data, string filename);
}

This interface is implemented by each writer class. The example below shows the JpegWriter and PngWriter classes.

public class JpegWriter : IFileWriter
{
    public void Write(byte[] data, string filename) { … }
}

public class PngWriter : IFileWriter
{
    public void Write(byte[] data, string filename) { … }
}

Then a factory interface is declared, and a factory class created for each file type.

public interface IFileWriterFactory
{
    IFileWriter Create();
}

public class JpegWriterFactory : IFileWriterFactory
{
    public IFileWriter Create()
    {
        return new JpegWriter();
    }
}

public class PngWriterFactory : IFileWriterFactory
{
    public IFileWriter Create()
    {
        return new PngWriter();
    }
}

Usage

The application determines the file type that is required, then invokes the Create method on the appropriate writer factory, which in turn returns a concrete implementation of the IFileWriter class.

The application then invokes the Write method on the writer instance, which handles format-specific details such as endianness, file headers, encoding and more.

static class Program
{
    private static readonly IFileWriterFactory _jpegWriterFactory = new JpegWriterFactory();

    private static readonly IFileWriterFactory _pngWriterFactory = new PngWriterFactory();

    static void Main(string[] args)
    {
        // Image created and serialized into byte[]
        …

        IFileWriter writer;
        if ("jpg" == args[1])
        {
            writer = _jpegWriterFactory.Create();
        }
        else
        {
            writer = _pngWriterFactory.Create();
        }

        writer.Write(data, "lena");
    }
}

The Content Data Context

This post is part of a series on learning how to use Entity Framework. The rest of the posts in the series are linked below.

Basics of Entity Framework

Code First

Database First


Recall from an earlier post how the DbContext class provides the framework to establish and maintain an open connection to the database. This class is typically extended with a custom context for the application, which allows the developer to add DbSet instances for all the types which are part of the application’s Entity Framework model. Each DbSet property is a collection of entity instances, one for each record in the database. The DbSet is an implementation of IDbSet, which in turn extends IQueryable and IEnumerable.

The DbContext primarily fulfils the following roles.

Querying

Provides an interface to convert statements from LINQ-to-Entities (using queries as well as method syntax) to the SQL syntax specific to the underlying database provider.

Object Materialisation

Converts database records from ADO.NET types such as DataSet to application-specific types through mapping rules.

Change Tracking

Tracks the state of newly created, modified or deleted entities for later committing to the database.

Persistence

Triggers INSERT, UPDATE or DELETE queries on changed entities when the SaveChanges method is called.

Caching

Provides a local, lightweight embedded store for recently retrieved entities to avoid repeated querying against the primary database.

Connecting to the Database

The DbContext class automatically creates a database whose name is derived from the fully-qualified name of the context class. However, this behaviour can be overridden by the programmer by passing a custom connection string, database name, or connection string name from the .config file associated with the running application.

The DbContext class has 7 variants of the constructor to provide different ways of establishing a connection to the database server. The default constructor triggers the behaviour described above.

The constructor variant that takes a string parameter can be used to specify any one of the following.

  1. A standard connection string.
  2. A name of a connection string declared in the configuration file (using the syntax "name=connectionString").
  3. An Entity Framework connection string declared in the configuration file.
  4. A database name (which is sought on the server .\SQLEXPRESS by default).

Alternative constructors can accept DbConnection or ObjectContext instances instead of a string in order to work with already open connections.
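
A short sketch of the string-based variant. The connection string name Notadesigner is assumed to be declared in the application’s .config file, and the database name is only illustrative:

public class ContentContext : DbContext
{
    // Forward the caller's connection string, "name=…" reference or
    // database name to the base constructor
    public ContentContext(string nameOrConnectionString)
        : base(nameOrConnectionString)
    {
    }
}

// A connection string name declared in the configuration file
var db = new ContentContext("name=Notadesigner");

// A plain database name, created on .\SQLEXPRESS if it does not exist
var db2 = new ContentContext("ContentDb");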

Entity Collections

Records in the database tables are referenced through a DbSet instance, which is an implementation of the set abstract data type specifically designed for Entity Framework. A collection can be non-generic by using the System.Data.Entity.DbSet type, or generic through the use of System.Data.Entity.DbSet<TEntity> type.

Collections of root entities are often exposed as DbSet properties of the application’s context class. But specific entity types can also be extracted from the database by using the Set method (or its generic variant) on the context.

The DbSet class exposes methods to perform your standard CRUD operations on the table. The most important methods are Create, Add, Find, Remove and Attach. You will notice that most of these methods are also available on the Set ADT, which the DbSet type is based on.

Create, Add and Attach sound similar in nature, but are quite different. The Create method instantiates an entity without adding it to the DbSet. The Add method adds the entity to the context and marks it for insertion when changes are saved. Finally, Attach adds an existing entity to the context in the Unchanged state, without marking it for saving. This is useful when the entity already exists in the database and only subsequent changes to it need to be tracked.

The revised ContentContext class for this application looks like this.

namespace Notadesigner.Blog
{
    public class ContentContext : DbContext
    {
        public ContentContext()
            : base("Notadesigner")
        {
        }

        public virtual DbSet<Blog> Blogs
        {
            get;
            set;
        }

        public virtual DbSet<Post> Posts
        {
            get;
            set;
        }

        public virtual DbSet<Category> Categories
        {
            get;
            set;
        }
    }
}

Based on this structure, the code to retrieve a specific Post by its primary key would be as shown below.

var db = new ContentContext();
var post = db.Posts.Find(10);

Inserting a new record is done in two parts. First, the entity instance is created, then it is added to the collection and the changes saved to the database.

var db = new ContentContext();
var post = db.Posts.Create();
post.Title = "Ways to Try and Take Over the World";
post.Slug = GenerateSlug(post.Title);
// Set other properties on the entity
…
db.Posts.Add(post);
db.SaveChanges();

Attach is useful when you wish to update only a few properties in a record which contains many columns.

var db = new ContentContext();
var post = db.Posts
                .AsNoTracking()
                .Where(p => p.PostId == 10)
                .FirstOrDefault();
// Attach the untracked entity first, then change the title only
db.Posts.Attach(post);
post.Title = "More Ways to Try and Take Over the World";
db.SaveChanges();

The above code generates an update statement that only affects the Title column, as opposed to updating all the columns in the Posts table.

Deleting records is as simple as finding an entity and marking it for deletion through the Remove method.

var db = new ContentContext();
var post = db.Posts.Find(10);
db.Posts.Remove(post);
db.SaveChanges();

Irrespective of the method used, changes are committed only if the SaveChanges method is called on the context instance.

Conventions, Overrides

When an application using the code-first approach is launched for the first time, Entity Framework creates tables for each entity in the newly created database by converting the entity name into its plural form. For example, the Post entity is associated with a table called "Posts". The columns in the table are also created automatically based on the entity properties. The property PostId is associated with a non-nullable integer column of the same name, and it is assigned as the primary key. Similarly, the Title property gets associated with a non-nullable nvarchar column called Title, whose maximum size of 256 characters comes from its StringLength annotation. Default conventions can be overridden by using attributes as shown in the previous post, or by using the Fluent API in the initialisation strategy.
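
A sketch of the equivalent Fluent API configuration, placed in the context’s OnModelCreating override (the table name BlogPosts is only an example):

public class ContentContext : DbContext
{
    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // Equivalent of [Required] and [StringLength(256)] on Post.Title
        modelBuilder.Entity<Post>()
            .Property(p => p.Title)
            .IsRequired()
            .HasMaxLength(256);

        // Override the pluralisation convention for the table name
        modelBuilder.Entity<Post>().ToTable("BlogPosts");
    }
}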

Some of the most essential default conventions are shown below.

Schema

The dbo schema is used for all database operations by default.

Tables

Table names are derived from the plural form of the entity they represent.

Primary and Foreign Keys

The Id property or <Entity>Id property is the default primary key. If the dependent entity declares a foreign key property, it is matched by name; otherwise the foreign key column is named by combining the dependent navigation property name with the principal primary key property name, separated by an underscore (e.g. Post_OwnerId).

Nullability

Columns that store reference type and nullable value type properties are nullable, while non-nullable value type properties are stored in non-nullable columns.

Laying the Groundwork

This post is part of a series on learning how to use Entity Framework. The rest of the posts in the series are linked below.

Basics of Entity Framework

Code First

Database First


Many application developers prefer a code-first approach to implementing a business application with Entity Framework. In this workflow, the programmer begins by writing classes that represent entities in the domain model. Each entity class corresponds to a single table, and each instance of the class represents a single row from that table. Public properties on the class are mapped to columns in this table.

For example, a Person class maps to the Persons table in the database in the conventional Entity Framework implementation. The properties Name (string), Age (int) and Gender (enum), correspond to columns Name (nvarchar), Age (int) and Gender (int). When a row is retrieved from the database, the values from its columns are loaded into the properties of this instance. Conversely, modified values in the Person instance can be written back to the table by passing its reference back into the appropriate Entity Framework APIs.
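
A sketch of such a Person entity; the Gender enum and its members are assumptions for this example:

public enum Gender
{
    Unspecified = 0,
    Female = 1,
    Male = 2
}

public class Person
{
    public int PersonId { get; set; }   // Primary key by convention

    public string Name { get; set; }    // Maps to an nvarchar column

    public int Age { get; set; }        // Maps to an int column

    public Gender Gender { get; set; }  // Stored as its underlying int value
}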

Traditionalists would cringe at the thought of writing application code without a rock-solid database model already in place to back it up. Their outlook is born of poor outcomes from automatic table-generation tools. Fortunately, Entity Framework (and several other modern ORM tools) addresses table performance characteristics and data integrity problems very well.

If performance or correctness is a problem, it is probably the result of a poor description of the application domain by the programmer, rather than a shortcoming of the ORM. Being able to generate entity structures from the application layer does not take away the need to understand the performance characteristics of a database engine. And errors made by an inexperienced application developer in designing or describing the domain model can still affect the product adversely.

The Business Model

The subject of this exercise is a multi-user blog that can be used to host and serve posts authored by one or more registered users. Posts are created and owned by any user with the appropriate level of access to the editor interface. The owner can then invite more authors to collaborate with them. Collaborators can edit the contents of a post, but cannot delete it.

Finally, anonymous users can view the post by visiting a specific URL for each post. The blog home page is a read-only index of all posts, sorted in reverse chronological order.

The Blog

The blog is the primary entity in our application. It has several descriptive properties of the site itself, such as the name, description and URL. A blog is owned by a user who has super-admin rights to the application. This user has to be created and linked to the blog at the time of setup.

using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

namespace Notadesigner.Blog.Models
{
    public class Blog
    {
        [Key]
        public int BlogId { get; set; }

        [Required]
        [StringLength(128)]
        public string Name { get; set; }

        [Required]
        public string BaseUrl { get; set; }

        [Required]
        [ForeignKey("Owner")]
        public int OwnerId { get; set; }

        public User Owner { get; set; }

        public ICollection<Category> Categories { get; set; }

        public ICollection<Post> Posts { get; set; }

        public ICollection<User> Users { get; set; }
    }
}

This example introduces several attributes from the DataAnnotations namespaces which Entity Framework uses when creating the database tables.

Key

The default convention of the framework is to treat a property named Id, or suffixed with the word Id, as the primary key for the corresponding table. The Key attribute is therefore not required on the BlogId property, but making it explicit helps maintenance later down the line. Indexes created in the database engine are prefixed with IX_, e.g. IX_PostId.

Required

Columns for reference type and nullable value type properties are nullable by default in Entity Framework. The Required attribute changes this and marks them as non-nullable. In addition to enforcing data integrity, this attribute is also used to provide client-side validation in the MVC framework.

StringLength

This attribute defines an explicit maximum on the size of string columns, such as nvarchar in MS SQL. Like Required, this attribute is also usable for client-side validation.

ForeignKey

This attribute defines a foreign key between two entities which enforces relational integrity between tables in the database. The OwnerId column can only contain a value which already exists in the primary key column of the Users table.

This attribute requires the name of the property which contains the referenced entity record. In this case, the parameter is "Owner", which is of type User. The User class is described further below.

Users & Roles

A User is the unique identification of a person who operates or accesses content from the blog. It is composed of, at a minimum, a unique identification number and a unique user name. A person who wishes to identify as a particular User in the application also requires the password for that account. Passwords are stored as salted and hashed strings.

Roles are coarse-level controls over the actions that a single User can perform in the application. A role is composed of several granular permissions, which can either be explicitly granted or denied. Permissions are provided by individual modules in the application. For example, the Posts module might have the permissions view, create, edit-title, edit-body, edit-attributes, publish, unpublish and delete. The Users module might offer the permissions create, reset-password, recover, edit-attributes and delete. A Role organises these permissions into a cohesive group of related permissions (e.g. Subscriber, Publisher, Content Creator, Intern, Moderator, Administrator, etc.).

In a typical usage scenario, a User must belong to at least one Role in order to perform any useful activity in the application. But the application can also be configured to allow anonymous access throughout.
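
The User entity itself is not part of this post’s listings. A minimal sketch of how it might look, based on the description above (the Role entity and all property names are assumptions):

using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

namespace Notadesigner.Blog.Models
{
    public class User
    {
        [Key]
        public int UserId { get; set; }

        [Required]
        [Index(IsUnique = true)]
        [StringLength(64)]
        public string UserName { get; set; }

        // Stored as a salted hash, never as plain text
        [Required]
        public string PasswordHash { get; set; }

        public ICollection<Role> Roles { get; set; }
    }
}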

Posts

A Post is a single unit of content in the application. It is created and owned by a single user, and can have one or more collaborators who contribute to the content.

using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

namespace Notadesigner.Blog.Models
{
    public class Post
    {
        [Key]
        public int PostId { get; set; }

        [Required]
        [StringLength(256)]
        public string Title { get; set; }

        [Required]
        [Index(IsUnique = true)]
        [StringLength(128)]
        public string Slug { get; set; }

        [StringLength(102400)]
        public string Body { get; set; }

        [Required]
        public PostStatus Status { get; set; }

        public DateTime CreatedOn { get; set; }

        public DateTime ModifiedOn { get; set; }

        [Required]
        [ForeignKey("Owner")]
        public int OwnerId { get; set; }

        public User Owner { get; set; }

        public ICollection<User> Collaborators { get; set; }

        public ICollection<Category> Categories { get; set; }
    }
}

The class is fairly vanilla, and has all the data annotations which were described previously. It also introduces the Index attribute on the Slug property. A slug is a human-friendly identifier for a post. Words and phrases add more context to the human mind than plain integers. So it is easier to remember and parse what the phrase “a-field-guide-to-entity-framework” points to, than it is to remember the post ID.

In order for slugs to work correctly, they have to be unique. Therefore, the model requires a unique constraint on the value contained in the slug. This instruction is converted into a unique, non-clustered index in the database table (the clustered index is already occupied by the primary key).

A single entity can have multiple indexes, only one of which can be clustered. An index can be composed of multiple properties.
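
For example, a composite index spanning two properties is declared by giving both Index attributes the same name and an order. A sketch (the index name is arbitrary):

[Index("IX_OwnerCreatedOn", Order = 1)]
public int OwnerId { get; set; }

[Index("IX_OwnerCreatedOn", Order = 2)]
public DateTime CreatedOn { get; set; }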

Categories

Categories provide the taxonomy framework to organise posts into groups of common topics. It is a fairly common feature in all content management systems.

using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

namespace Notadesigner.Blog.Models
{
    public class Category
    {
        [Key]
        public int CategoryId { get; set; }

        [Required]
        [StringLength(256)]
        public string Name { get; set; }

        [Required]
        [Index(IsUnique = true)]
        [StringLength(128)]
        public string Slug { get; set; }

        [Required]
        [ForeignKey("Owner")]
        public int OwnerId { get; set; }

        public User Owner { get; set; }

        public ICollection<Post> Posts { get; set; }
    }
}

These classes collectively provide all the data objects needed to store and manipulate records from the database in the application. The next post in the series will cover the process of connecting to a data source and loading records into memory by using the data context.

Domain Entities

This post is part of a series on learning how to use Entity Framework. The rest of the posts in the series are linked below.

Basics of Entity Framework

Code First

Database First


An entity class is written in your application programming language using standard language constructs such as access modifiers, a class name and public properties. All properties must have both an accessor and a mutator (a getter and a setter).

namespace Notadesigner.Blog
{
    public class Post
    {
        public int PostId { get; set; }
        public string Title { get; set; }
        public string Body { get; set; }
        public DateTime CreatedOn { get; set; }
    }
}

The class is registered with Entity Framework by creating a DbSet property for it on the application’s DbContext.

namespace Notadesigner.Blog
{
    public class ContentContext : DbContext
    {
        public DbSet<Post> Posts { get; set; }
    }
}

Scalar Properties

Entity classes have scalar properties to represent values which are stored in the database column directly. For example, the values for PostId, Title, Body and CreatedOn fields are stored in columns in the Posts table in the database.

Navigation Properties

Navigation properties are used to represent relationships between database tables. Entity Framework supports relationship multiplicities of zero-or-one, one, zero-or-many, or one-or-many. Object references are used to represent relationships. Properties whose type is of another entity are used for zero-or-one, or one multiplicity. Properties whose type is a generic collection instance represent zero-or-many, or one-or-many relationships.

namespace Notadesigner.Blog
{
    public class Post
    {
        public int PostId { get; set; }
        …
        // Multiplicity of one or zero-or-one
        public Author CreatedBy { get; set; }

        // Multiplicity of zero-or-many, or one-or-many
        public ICollection<Category> Categories { get; set; }
    }
}

Entity States

Each entity class instance has an associated state which is maintained by the Entity Framework internally. The value of the state property changes automatically based on the operations that the developer performs on the entity instance or the database context instance.

This property can transition between the following EntityState values.

  • Added
  • Modified
  • Deleted
  • Unchanged
  • Detached

The context instance in the object service layer is responsible for maintaining the current state and changing it in response to API calls. When the SaveChanges method is called, the context instance determines what database operation has to be performed on each entity instance in the DbSet collection based on the value of its State property.

An insert operation is performed for entities whose state is set to Added, an update for entities whose state is set to Modified, and a delete for entities whose state is set to Deleted.
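
A short sketch of how the state can be observed as an entity moves through these transitions, assuming the ContentContext and Post types from earlier in the series (the title strings are illustrative):

using (var db = new ContentContext())
{
    var post = db.Posts.Create();             // Detached until it is added
    post.Title = "Entity States in Practice";

    db.Posts.Add(post);
    var state = db.Entry(post).State;         // EntityState.Added

    db.SaveChanges();
    state = db.Entry(post).State;             // EntityState.Unchanged

    post.Title = "Entity States, Revisited";
    state = db.Entry(post).State;             // EntityState.Modified

    db.Posts.Remove(post);
    state = db.Entry(post).State;             // EntityState.Deleted
}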