Splitting and Merging Pdf Files in C# Using iTextSharp

Posted on March 9 2013 11:44 AM by John Atten in CodeProject, C#   ||   Comments (3)

I recently posted about using PdfBox.net to manipulate Pdf documents in your C# application. This time, I take a quick look at iTextSharp, another library for working with Pdf documents from within the .NET framework.

What is iTextSharp?

iTextSharp is a direct .NET port of the open source iText Java library for PDF generation and manipulation. As the project’s summary page on SourceForge states, iText “  . . . can be used to create PDF Documents from scratch, to convert XML to PDF . . . to fill out interactive PDF forms, to stamp new content on existing PDF documents, to split and merge existing PDF documents, and much more.”

iTextSharp presents a formidable set of tools for developers who need to create and/or manipulate Pdf files. This does come with a cost, however. The Pdf file format itself is complex; therefore, programming libraries which seek to provide a flexible interface for working with Pdf files become complex by default. iText is no exception.

I noted in my previous post on PdfBox that PdfBox was a little easier for me to get up and running with, at least for rather basic tasks such as splitting and merging existing Pdf files. I also noted that iText looked to be a little more complex, and I was correct. However, iTextSharp does not suffer some of the performance drawbacks inherent to PdfBox, at least on the .net platform.

Superior Performance vs. PdfBox

As I observed in my previous post, PdfBox.net is NOT a direct port of the PdfBox Java library, but instead is a Java library running within .net using IKVM. While I found it very cool to be able to run Java code in a .NET context, there was a serious performance hit. This was most noticeable the first time the PdfBox library was called, when the massive IKVM library spun up what amounts to a .Net implementation of the Java Virtual Machine, within which the Java code of the PdfBox library is then executed.

Needless to say, iTextSharp does not suffer this limitation. The library itself is relatively lightweight, and fast.

Extracting and Merging Pages from an Existing Pdf File

One of the most common tasks we need to do is extract pages from one Pdf into a new file. We’ll take a look at some relatively basic sample code which does just that, and get a feel for using the iTextSharp programming model.

In the following code sample, the primary iTextSharp classes we will be using are the PdfReader, Document, PdfCopy, and PdfImportedPage classes.

My simplified understanding of how this works is as follows: The PdfReader instance contains the content of the source PDF file. The Document class, initialized with the page size taken from the PdfReader instance, essentially becomes a container into which pages extracted from the source file will be copied. The PdfCopy instance ties that Document to a new output FileStream and does the actual copying of pages from the PdfReader. When the Document is closed, the result is written to the FileStream, and saved to disk at the location specified by the destination file name.

You can download the iTextSharp source code and binaries as a single package from the Files page at the iTextSharp project site. Just click on the “Download itextsharp-all-5.4.0.zip” link. Extract the files from the .zip archive, and stash them somewhere convenient. Next, set a reference in your project to the itextsharp.dll. You will need to browse to the folder where you stashed the extracted contents of the iTextSharp download.

NOTE: The complete example code for this post is available at my Github Repo.

I went ahead and created a project named iTextTools, with a class file named PdfExtractorUtility. Add the following using statements at the top of the file:

Set up references and Using Statements to use iTextSharp

using iTextSharp.text;
using iTextSharp.text.pdf;
using System;
// CLASS DEPENDS ON iTextSharp: http://sourceforge.net/projects/itextsharp/
namespace iTextTools
{
    public class PdfExtractorUtility
    {
    }
}

 

First, I’ll add a simple method to extract a single page from an existing PDF file and save to a new file:

Extract Single Page from Existing PDF to a new File:

public void ExtractPage(string sourcePdfPath, string outputPdfPath, 
    int pageNumber, string password = "")
{
    PdfReader reader = null;
    Document document = null;
    PdfCopy pdfCopyProvider = null;
    PdfImportedPage importedPage = null;
    try
    {
        // Initialize a new PdfReader instance with the contents of the source Pdf file:
        reader = new PdfReader(sourcePdfPath);

        // Capture the correct size and orientation for the page:
        document = new Document(reader.GetPageSizeWithRotation(pageNumber));

        // Initialize an instance of the PdfCopy class with the
        // document and an output file stream:
        pdfCopyProvider = new PdfCopy(document, 
            new System.IO.FileStream(outputPdfPath, System.IO.FileMode.Create));
        document.Open();

        // Extract the desired page number:
        importedPage = pdfCopyProvider.GetImportedPage(reader, pageNumber);
        pdfCopyProvider.AddPage(importedPage);
        document.Close();
        reader.Close();
    }
    catch (Exception)
    {
        // Release the source file, then rethrow with the original stack trace intact:
        if (reader != null) reader.Close();
        throw;
    }
}

 

As you can see, simply pass in the path to the source document, the page number to be extracted, and an output file path, and you’re done.

If we want to be able to extract a range of contiguous pages, we might add another method defining a start and end point:

Extract a Range of Pages from Existing PDF to a new File:

public void ExtractPages(string sourcePdfPath, string outputPdfPath, 
    int startPage, int endPage)
{
    PdfReader reader = null;
    Document sourceDocument = null;
    PdfCopy pdfCopyProvider = null;
    PdfImportedPage importedPage = null;
    try
    {
        // Initialize a new PdfReader instance with the contents of the source Pdf file:
        reader = new PdfReader(sourcePdfPath);

        // For simplicity, I am assuming all the pages share the same size
        // and rotation as the first page in the range:
        sourceDocument = new Document(reader.GetPageSizeWithRotation(startPage));

        // Initialize an instance of the PdfCopy class with the
        // document and an output file stream:
        pdfCopyProvider = new PdfCopy(sourceDocument, 
            new System.IO.FileStream(outputPdfPath, System.IO.FileMode.Create));
        sourceDocument.Open();

        // Walk the specified range and add the page copies to the output file:
        for (int i = startPage; i <= endPage; i++)
        {
            importedPage = pdfCopyProvider.GetImportedPage(reader, i);
            pdfCopyProvider.AddPage(importedPage);
        }
        sourceDocument.Close();
        reader.Close();
    }
    catch (Exception)
    {
        // Release the source file, then rethrow with the original stack trace intact:
        if (reader != null) reader.Close();
        throw;
    }
}
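
Note that the range method trusts startPage and endPage to fall inside the source document. As a minimal sketch (this is my own helper, not part of iTextSharp), a guard like the following could run right after the PdfReader is created, assuming the page count is taken from reader.NumberOfPages. The PageRangeGuard class and its EnsureValidRange method are hypothetical names used only for illustration:

```csharp
using System;

public static class PageRangeGuard
{
    // Throws if the 1-based, inclusive range [startPage, endPage] does not
    // fit inside a document containing pageCount pages.
    public static void EnsureValidRange(int startPage, int endPage, int pageCount)
    {
        if (startPage < 1 || startPage > pageCount)
        {
            throw new ArgumentOutOfRangeException("startPage",
                "Start page must be between 1 and " + pageCount + ".");
        }
        if (endPage < startPage || endPage > pageCount)
        {
            throw new ArgumentOutOfRangeException("endPage",
                "End page must be between the start page and " + pageCount + ".");
        }
    }
}
```

Inside ExtractPages, the call would look something like PageRangeGuard.EnsureValidRange(startPage, endPage, reader.NumberOfPages), placed before the Document is created.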

 

What if we want non-contiguous pages from the source document? Well, we might overload the above method with one which accepts an array of ints representing the desired pages:

Extract multiple non-contiguous pages from Existing PDF to a new File:

public void ExtractPages(string sourcePdfPath, 
    string outputPdfPath, int[] extractThesePages)
{
    PdfReader reader = null;
    Document sourceDocument = null;
    PdfCopy pdfCopyProvider = null;
    PdfImportedPage importedPage = null;
    try
    {
        // Initialize a new PdfReader instance with the 
        // contents of the source Pdf file:
        reader = new PdfReader(sourcePdfPath);

        // For simplicity, I am assuming all the pages share the same size
        // and rotation as the first page requested:
        sourceDocument = new Document(reader.GetPageSizeWithRotation(extractThesePages[0]));

        // Initialize an instance of the PdfCopy class with the
        // document and an output file stream:
        pdfCopyProvider = new PdfCopy(sourceDocument,
            new System.IO.FileStream(outputPdfPath, System.IO.FileMode.Create));
        sourceDocument.Open();

        // Walk the array and add the page copies to the output file:
        foreach (int pageNumber in extractThesePages)
        {
            importedPage = pdfCopyProvider.GetImportedPage(reader, pageNumber);
            pdfCopyProvider.AddPage(importedPage);
        }
        sourceDocument.Close();
        reader.Close();
    }
    catch (Exception)
    {
        // Release the source file, then rethrow with the original stack trace intact:
        if (reader != null) reader.Close();
        throw;
    }
}
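
The same PdfCopy pattern should also cover merging whole files into one. What follows is a minimal sketch rather than a tested utility, assuming iTextSharp 5.x. The PdfMergeSketch class and MergeFiles method are hypothetical names of my own; PdfReader.NumberOfPages, PdfCopy.GetImportedPage, PdfCopy.AddPage, and PdfCopy.FreeReader are the library members I am relying on:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class PdfMergeSketch
{
    // Appends every page of each source file, in order, to a single output file.
    public static void MergeFiles(IList<string> sourcePdfPaths, string outputPdfPath)
    {
        if (sourcePdfPaths == null || sourcePdfPaths.Count == 0)
        {
            throw new ArgumentException(
                "At least one source file is required.", "sourcePdfPaths");
        }

        using (var output = new FileStream(outputPdfPath, FileMode.Create))
        {
            // PdfCopy takes each page's size from the imported page, so the
            // default Document page size is only a placeholder here:
            var document = new Document();
            var copier = new PdfCopy(document, output);
            document.Open();

            foreach (string path in sourcePdfPaths)
            {
                var reader = new PdfReader(path);
                try
                {
                    for (int page = 1; page <= reader.NumberOfPages; page++)
                    {
                        copier.AddPage(copier.GetImportedPage(reader, page));
                    }

                    // Let the copier flush what it holds for this reader:
                    copier.FreeReader(reader);
                }
                finally
                {
                    reader.Close();
                }
            }

            // Closing the document finishes writing the output file:
            document.Close();
        }
    }
}
```

A call such as PdfMergeSketch.MergeFiles(new[] { "first.pdf", "second.pdf" }, "merged.pdf") would then produce one file containing the pages of both sources (the file names here are placeholders).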

 

Scratching the Surface

Obviously, the example(s) above are a simplistic first exploration of what appears to be a powerful library. What I notice about iText in general is that, unlike some APIs, the path to achieving your desired result is often not intuitive. I believe this has a lot to do with the nature of the PDF file format, and possibly with the structure of the lower-level libraries upon which iTextSharp is built.

That said, there is without a doubt much to be learned by exploring the iTextSharp source code. Additionally, there are a number of resources to assist the developer in using this library:

Additional Resources for iTextSharp

Lastly, there is a book authored by one of the primary contributors to the iText project, Bruno Lowagie: iText in Action (Second Edition).

 


Working with Pdf Files in C# Using PdfBox and IKVM

Posted on January 30 2013 07:09 PM by John Atten in Java, C#, Hacks, CodeProject   ||   Comments (2)

I have found two primary libraries for programmatically manipulating PDF files: PdfBox and iText. These are both Java libraries, but I needed something I could use with C Sharp. Well, as it turns out there is an implementation of each of these libraries for .NET, each with its own strengths and weaknesses:

PdfBox - .Net version

The .NET implementation of PdfBox is not a direct port - rather, it uses IKVM to run the Java version inter-operably with .NET. IKVM features an actual .net implementation of a Java Virtual Machine, and a .net implementation of Java Class Libraries along with tools which enable Java and .Net interoperability. 

PdfBox’s dependency on IKVM incurs a lot of baggage in performance terms. When the IKVM libraries load, and (I am assuming) the “’Virtual’ Java Virtual Machine” spins up, things slow way down until the load is complete. On the other hand, for some of the more common things one might want to do with a PDF programmatically, the API is (relatively) straightforward, and well documented.

When you run a project which uses PdfBox, you WILL notice a lag the first time PdfBox and IKVM are loaded. After that, things seem to perform sufficiently, at least for what I needed to do.

Side Note: iTextSharp

iTextSharp is a direct port of the Java library to .Net.

iTextSharp looks to be the more robust library in terms of fine-grained control, and is extensively documented in a book by one of the authors of the library, iText in Action (Second Edition). However, the learning curve was a little steeper for iText, and I needed to get a project out the door. I will examine iTextSharp in another post, because it looks really cool, and supposedly does not suffer the performance limitations of PdfBox.

Getting started with PdfBox

Before you can use PdfBox, you need to either build the project from source, or download the ready-to-use binaries. I just downloaded the binaries for version 1.2.1 from this helpful gentleman’s site, which, since they depend on IKVM, also includes the IKVM binaries. However, there are detailed instructions for building from source on the PdfBox site. Personally, I would start with the downloaded binaries to see if PdfBox is what you want to use first.

Important to note here: apparently, the PdfBox binaries are dependent upon the exact dependent DLL’s used to build them. See the notes on the PdfBox .Net Version page.

Once you have built or downloaded the binaries, you will need to set references to PdfBox and ALL the included IKVM binaries in your Visual Studio Project. Create a new Visual Studio project named “PdfBoxExamples” and add references to ALL the PdfBox and IKVM binaries. There are a LOT. Deal with it. Your project references folder will look like the picture to the right when you are done.

The PdfBox API is quite dense, but there is a handy reference at the Apache Pdfbox site. The PDF file format is complex, to say the least, so when you first take a gander at the available classes and methods presented by the PdfBox API, it can be difficult to know where to begin. Also, there is the small issue that what you are looking at is a Java API, so some of the naming conventions are a little different. Also, the PdfBox API often returns what appear to be Java classes. This comes back to that .Net implementation of the Java Class libraries I mentioned earlier.

Things to Do with PdfBox

It seems like there are three common things I often want to do with PDF files: Extract text into a string or text file, split the document into one or more parts, or merge pages or documents together. To get started with using PdfBox we will look at extracting text first, since the set up for this is pretty straightforward, and there isn’t any real Java/.Net weirdness here.

Extracting Text from a PDF File

To do this, we will call upon two PdfBox namespaces (“Packages” in Java, loosely), and two Classes:

The namespace org.apache.pdfbox.pdmodel gives us access to the PDDocument class, and the namespace org.apache.pdfbox.util gives us the PDFTextStripper class.

In your new PdfBoxExamples project, add a new class, name it “PdfTextExtractor,” and add the following code:

The PdfTextExtractor Class
using System;
using org.apache.pdfbox.pdmodel;
using org.apache.pdfbox.util;
namespace PdfBoxExamples
{
    public class PdfTextExtractor
    {
        public static String PDFText(String PDFFilePath)
        {
            PDDocument doc = PDDocument.load(PDFFilePath);
            try
            {
                PDFTextStripper stripper = new PDFTextStripper();
                return stripper.getText(doc);
            }
            finally
            {
                // Release the file handle held by the loaded document:
                doc.close();
            }
        }
    }
}

 

As you can see, we use the PDDocument class (from the org.apache.pdfbox.pdmodel namespace) and initialize it using the static .load method defined as a class member on PDDocument. As long as we pass it a valid file path, the .load method will return an instance of PDDocument, ready for us to work with.

Once we have the PDDocument instance, we need an instance of the PDFTextStripper class, from the namespace org.apache.pdfbox.util. We pass our instance of PDDocument to the stripper's getText method as a parameter, and get back a string representing the text contained in the original PDF file.

Be prepared. PDF documents can employ some strange layouts, especially when there are tables and/or form fields involved. The text you get back will tend not to retain the formatting from the document, and in some cases can be bizarre.

However, the ability to strip text in this manner can be very useful. For example, I recently needed to download an individual PDF file for each county in the state of Missouri, and strip some tabular data out of each one. I hacked together an iterator/downloader to pull down the files, and then, using a modified version of the text stripping tool illustrated above and some rather painful Regex, I was able to get what I needed.

Splitting the Pages of a PDF File

At the simplest level, suppose you had a PDF file and you wanted to split it into individual pages. We can use the Splitter Class, again from the org.apache.pdfbox.util namespace. Add another class to your project, named PDFFileSplitter, and copy the following code into the editor:

The PdfFileSplitter Class
using org.apache.pdfbox.pdmodel;
using org.apache.pdfbox.util;
namespace PdfBoxExamples
{
    public class PDFFileSplitter
    {
        public static java.util.List SplitPDFFile(string SourcePath, 
            int splitPageQty = 1)
        {
            var doc = PDDocument.load(SourcePath);
            var splitter = new Splitter();
            splitter.setSplitAtPage(splitPageQty);
            return (java.util.List)splitter.split(doc);
        }
    }
}

 

Notice anything strange in the code above? That’s right. We have declared a static method with a return type of java.util.List. WHAT? This is where working with PdfBox and more importantly, IKVM becomes weird/cool. Cool, because I am using a direct Java class implementation in Visual Studio, in my C# code. Weird, because my method returns a bizarre type (from a C# perspective, anyway) that I was unsure what to do with.

I would probably add to the above class so that the splitter persisted the split documents to disk, or change the return type of my method to object[], and use the .ToArray() method, like so:

The PdfFileSplitter Class (improved?)
public static object[] SplitPDFFile(string SourcePath, 
    int splitPageQty = 1)
{
    var doc = PDDocument.load(SourcePath);
    var splitter = new Splitter();
    splitter.setSplitAtPage(splitPageQty);
    return (object[])splitter.split(doc).toArray();
}

 

In any case, the code in either example loads up the specified PDF file into a PDDocument instance, which is then passed to the org.apache.pdfbox.util.Splitter, along with an int parameter. With the default splitPageQty of 1, the output in the example above is a Java ArrayList containing a single page from your original document in each element. Your original document is not altered by this process, by the way.

The int parameter is telling the Splitter how many pages should be in each split section. In other words, if you pass in 2 and start with a six-page PDF file, the output will be three two-page files. If you started with a 5-page file, the output would be two two-page files and one single-page file. You get the idea.
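
The arithmetic behind that parameter can be sketched in plain C#, with no PdfBox involved. This is only an illustration of the page counts I would expect from each split section, not code taken from the Splitter class; SplitMath and ChunkSizes are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

public static class SplitMath
{
    // Returns the number of pages in each section when a document with
    // totalPages pages is split every pagesPerChunk pages.
    public static List<int> ChunkSizes(int totalPages, int pagesPerChunk)
    {
        if (totalPages < 0)
        {
            throw new ArgumentOutOfRangeException("totalPages");
        }
        if (pagesPerChunk < 1)
        {
            throw new ArgumentOutOfRangeException("pagesPerChunk");
        }

        var sizes = new List<int>();
        for (int remaining = totalPages; remaining > 0; remaining -= pagesPerChunk)
        {
            // Every section is full except possibly the last one:
            sizes.Add(Math.Min(pagesPerChunk, remaining));
        }
        return sizes;
    }
}
```

Here ChunkSizes(6, 2) yields 2, 2, 2 and ChunkSizes(5, 2) yields 2, 2, 1, matching the six-page and five-page cases described above.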

Extract Multiple Pages from a PDF Into a New File

Something slightly more useful might be a method which accepts an array of integers as a parameter, with each integer representing a page number within a group to be extracted into a new, composite document. For example, say I needed pages 1, 6, and 7 from a 44 page PDF pulled out and merged into a new document (in reality, I needed to do this for pages 1, 6, and 7 for each of about 200  individual documents). We might add a method to our PdfFileSplitter Class as follows:

The ExtractToSingleFile Method
public static void ExtractToSingleFile(int[] PageNumbers, 
    string sourceFilePath, string outputFilePath)
{
    var originalDocument = PDDocument.load(sourceFilePath);
    var originalCatalog = originalDocument.getDocumentCatalog();
    java.util.List sourceDocumentPages = originalCatalog.getAllPages();
    var newDocument = new PDDocument();
    foreach (var pageNumber in PageNumbers)
    {
        // Page numbers are 1-based, but PDPages are contained in a zero-based array:
        int pageIndex = pageNumber - 1;
        newDocument.addPage((PDPage)sourceDocumentPages.get(pageIndex));
    }
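    newDocument.save(outputFilePath);

    // Close both documents to release their file handles. The source stays
    // open until after the save, since the new document reuses its pages:
    newDocument.close();
    originalDocument.close();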
}

 

Below is a simple example to illustrate how we might call this method from a client:

Calling the ExtractToSingleFile Method:
public void ExtractAndMergePages()
{
    string sourcePath = @"C:\SomeDirectory\YourFile.pdf";
    string outputPath = @"C:\SomeDirectory\YourNewFile.pdf";
    int[] pageNumbers = { 1, 6, 7 };
    PDFFileSplitter.ExtractToSingleFile(pageNumbers, sourcePath, outputPath);
}

 

Limit Class Dependency on PdfBox

It is always good to limit dependencies within a project. In this case, especially, I would want to keep those odd Java class references constrained to the highest degree possible. In other words, where possible, I would attempt to either return standard .net types from my classes which consume the PdfBox API, or otherwise complete execution so that client code calling upon this class doesn’t need to be aware of IKVM, or funky C#/Java hybrid types.

Or, I would build out my own “PdfUtilities” library project, within which objects are free to depend upon and intermix this Java hybrid. However, I would make sure public methods defined within the library itself accepted and returned only standard C# types.

In fact, that is precisely what I am doing, and I’ll look at that in a following post.
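
To give a feel for the idea, here is a minimal sketch of a wrapper that keeps java.util.List and PDDocument on the inside and hands back only a List of file paths. It assumes the same PdfBox 1.x and IKVM references used earlier in this post. The PdfSplitFacade class, the SplitToFiles method, and the output file naming scheme are all hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using org.apache.pdfbox.pdmodel;
using org.apache.pdfbox.util;

public static class PdfSplitFacade
{
    // Splits the source file into sections of pagesPerChunk pages, saves each
    // section to outputDirectory, and returns the paths of the files written.
    public static List<string> SplitToFiles(string sourcePath,
        string outputDirectory, int pagesPerChunk = 1)
    {
        if (pagesPerChunk < 1)
        {
            throw new ArgumentOutOfRangeException("pagesPerChunk");
        }

        var outputPaths = new List<string>();
        PDDocument source = PDDocument.load(sourcePath);
        try
        {
            var splitter = new Splitter();
            splitter.setSplitAtPage(pagesPerChunk);

            // The Java list never leaves this method:
            java.util.List chunks = splitter.split(source);
            string baseName = Path.GetFileNameWithoutExtension(sourcePath);

            for (int i = 0; i < chunks.size(); i++)
            {
                var chunk = (PDDocument)chunks.get(i);
                string outputPath = Path.Combine(outputDirectory,
                    string.Format("{0}_{1}.pdf", baseName, i + 1));
                chunk.save(outputPath);
                chunk.close();
                outputPaths.Add(outputPath);
            }
        }
        finally
        {
            source.close();
        }
        return outputPaths;
    }
}
```

Client code calling SplitToFiles sees only strings, ints, and a generic List, so it needs no knowledge of IKVM or the Java types, although the project still has to reference the PdfBox and IKVM binaries.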


.Net DataGridview: Change Formatting of the Active Column Header to Track User Location

Posted on October 9 2012 10:34 PM by John Atten in C#, Hacks, CodeProject, Controls   ||   Comments (0)

I bumped into a question on StackOverflow this evening that I felt might make a short post. Accompanying code is available at my  

The guy who posted observed that the standard .net DataGridview control provides a helpful little glyph next to the row which contains the active cell:

[Image: the standard DataGridView active-row indicator]

The original poster of the question was wondering how he might include a similar glyph to indicate the active column as well. My problem with that is that there already exists an option for an arrow-like glyph in a column header. Unfortunately, THAT glyph, by convention, tends to mean “Click here to sort on this column.”

That does NOT mean that the OP was off-base, though. I can think of many cases where it would be handy to have some sort of reference to the active column in addition to the active row.

Emphasize the Active Column with the DataGridViewColumn.HeaderCell.Style property

One way to approach this is to simply cause the text in the header cell to be bold when the user navigates to a cell within that column. We can create a class which inherits from DataGridView, and take advantage of the CellEnter Event to cause this to happen:

[Image: DataGridView with the active column header shown in bold text]

In the following code, we have a member variable which holds a reference to the last active column. In our constructor, we initialize this column object so that when the control is instantiated, the reference is not null.

We also add an event handler to catch the CellEnter event locally. When this event fires, the handler (dgvControl_CellEnter) catches it, and makes a call to our final method, OnColumnFocus. This method accepts a column index as a parameter, and uses the index to identify the new active column. From there, we can use the HeaderCell.Style property to set the font to “bold” for this particular column.

In our constructor, note that we have to make an initial call to the OnColumnFocus method, so that the default starting column will be highlighted when the control is displayed at first. However, we have to check to see if there are actually any columns present first. This is because the Visual Studio Designer needs to be able to draw the empty control when we first place it on a form.

DataGridView: Cause the Active Column Header to Display Bold Text
using System.Drawing;
using System.Windows.Forms;

class dgvControl : DataGridView
{
    // hold a reference to the last active column:
    private DataGridViewColumn _currentColumn;

    public dgvControl() : base()
    {
        // Add a handler for the cell enter event:
        this.CellEnter += new DataGridViewCellEventHandler(dgvControl_CellEnter);

        // When the Control is initialized, instantiate the placeholder
        // variable as a new object:
        _currentColumn = new DataGridViewColumn();

        // In case there are no columns added (for the designer):
        if (this.Columns.Count > 0)
        {
            this.OnColumnFocus(0);
        }
    }

    void dgvControl_CellEnter(object sender, DataGridViewCellEventArgs e)
    {
        this.OnColumnFocus(e.ColumnIndex);
    }

    void OnColumnFocus(int ColumnIndex)
    {
        // If the new cell is in the same column, do nothing:
        if (ColumnIndex != _currentColumn.Index)
        {
            // Set up a custom font to represent the current column:
            Font selectedFont = new Font(this.Font, FontStyle.Bold);

            // Grab a reference to the current column:
            var newColumn = this.Columns[ColumnIndex];

            // Change the font to indicate status:
            newColumn.HeaderCell.Style.Font = selectedFont;

            // Set the font of the previous column back to normal:
            _currentColumn.HeaderCell.Style.Font = this.Font;

            // Set the current column placeholder to refer to the new column:
            _currentColumn = newColumn;
        }
    }
}

 

What if I want More?

What if we want more than just bold text in the active header? Well, things get trickier. Manipulating the other properties of the HeaderCell Style requires setting EnableHeadersVisualStyles to false. This has the unfortunate side effect of flattening out the styling which comes from the Windows 7 GUI styles. The slight gradient and color scheme are replaced by a much flatter header. While we could work around this by overriding the OnPaint method (at least to a degree) and implementing our own painting scheme, the impact of the effect is not too disturbing.

For example, we could decide that in addition to bolding the text in the header, we will set the BackColor to a slightly darker gray:

[Image: DataGridView with the active column header shown in bold text on a darker gray background]

To do this, we need only add three lines of code. First off, in our constructor, we set the EnableHeadersVisualStyles property to false. Next, in our OnColumnFocus method, we set the Style.BackColor property of the new active column to a darker shade of gray, and restore the previous active column to the default (empty) BackColor:

DataGridView: Cause the Active Column Header to Display Bold Text with a Darker Back Color:
using System.Drawing;
using System.Windows.Forms;

class dgvControl : DataGridView
{
    // hold a reference to the last active column:
    private DataGridViewColumn _currentColumn;

    public dgvControl() : base()
    {
        // Required before header properties other than the font take effect:
        this.EnableHeadersVisualStyles = false;

        // Add a handler for the cell enter event:
        this.CellEnter += new DataGridViewCellEventHandler(dgvControl_CellEnter);

        // When the Control is initialized, instantiate the placeholder
        // variable as a new object:
        _currentColumn = new DataGridViewColumn();

        // In case there are no columns added (for the designer):
        if (this.Columns.Count > 0)
        {
            this.OnColumnFocus(0);
        }
    }

    void dgvControl_CellEnter(object sender, DataGridViewCellEventArgs e)
    {
        this.OnColumnFocus(e.ColumnIndex);
    }

    void OnColumnFocus(int ColumnIndex)
    {
        // If the new cell is in the same column, do nothing:
        if (ColumnIndex != _currentColumn.Index)
        {
            // Set up a custom font to represent the current column:
            Font selectedFont = new Font(this.Font, FontStyle.Bold);

            // Grab a reference to the current column:
            var newColumn = this.Columns[ColumnIndex];

            // Change the font to indicate status:
            newColumn.HeaderCell.Style.Font = selectedFont;

            // Change the color to a slightly darker shade of gray:
            newColumn.HeaderCell.Style.BackColor = Color.LightGray;

            // Set the font of the previous column back to normal:
            _currentColumn.HeaderCell.Style.Font = this.Font;

            // Change the color of the previous column back to the default:
            _currentColumn.HeaderCell.Style.BackColor = Color.Empty;

            // Set the current column placeholder to refer to the new column:
            _currentColumn = newColumn;
        }
    }
}

 

There are other options you might explore. In this post, we walked through some very basic ways to provide visual feedback to the user about their location within the DataGridView control.

The source code for this post is available at my

 


C#: A Better Date-Masked Text Box

Posted on September 25 2012 06:14 PM by John Atten in C#, CodeProject, Hacks   ||   Comments (0)

Let’s face it. Managing date information within the .net framework (or any framework, really . . . Java is not much better) is a pain in the ass. Really. What makes it even worse is managing user data entry of date information. If that isn’t bad enough, there is a definite data type mismatch between the manner in which the .net framework represents date information, and the way relational databases handle dates.

The source code for this project (with a silly demo) is available on Github as a VS 2010 solution. Please feel free to fork, and if you make happy improvements, hit me with a pull request. There is plenty of room for improvement.

The Date Time Picker is Not Appropriate for All Situations . . . Because Sometimes, the Date is Unknown . . .

Sometimes we need to provide a means for users to enter a date if they have the information, and/or leave the date empty (null, if you will) until such time as they do. For example, in entering form data for a person, we may or may not know their Date of Birth. Do we really want to require some date, if we don’t know the correct birthdate? If we use the .Net DateTimePicker control, we have to. While there are hacks and workarounds for this, most require some sort of painful validation checking in our code.

Never mind that the DateTimePicker is not the preferred data-entry choice for people who know how to tab through fields. Folks who are good, tab through fields, and enter data. When they come to a date field for which they have no data, they skip past it. They do NOT leave the default date there. And the DateTimePicker requires some date to be present. Not to mention the temptation to stop the tab-type-tab workflow by making you pick from a popup calendar.

The .Net/Winforms Masked Textbox Sucks for Date Entry

That’s right. You heard me. Once upon a time, way back in MS Access, there existed a decent masking approach for entering date values into a textbox. MS seems to have tossed this aside, and delivered the lame control we have at our disposal in the .Net Framework. There are probably reasons for this, but I don’t know what they are. If you have tried using the MaskedTextbox control in a .net application for the purpose of masking date entry, you know what I mean. If you haven’t, go try it out. Then come back here, and see if my solution might be of help.

What I Needed for a Project at Work

I have been stuck working with a rather dull database application at work, and what I needed was a means to perform date entry with the following requirements:

  • Null Values are allowed and desirable in the Database backend.
  • There will be many places where date-entry is performed, across many forms (it is a date-heavy application related to property management), so the date entry control must be easy to toss onto a form and build around, without a bunch of bs validation and string parsing every time.
  • Null values are allowed, but other invalid entries are not.
  • All date entry will be performed using the USA-centric mm/dd/yyyy format.
  • The time component is irrelevant, or will be handled as a separate entry using a different control.
  • Given entry of a valid date string, a .net DateTime object should be retrieved from the control.
  • Only dates between 1900 and 2099 will be recognized as valid.

What I wanted, for this project and for general use in whatever other context pops up, was a means to type a date into a text box, have the result validated as a real date, and let the client code simply retrieve a nullable DateTime object.

For the purposes of my project, I have achieved these requirements. The control has flaws to this point, in terms of general use (limiting the acceptable centuries comes immediately to mind), but it is a starting point.
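
The validation rules in the requirements list can be sketched without any UI at all. The following is a minimal illustration of those rules using DateTime.TryParseExact, and is not the code of the control itself. The MaskedDateParsing class and TryGetDate method are hypothetical names, and I am assuming an untouched mask shows up as some mix of slashes, underscores, and spaces:

```csharp
using System;
using System.Globalization;

public static class MaskedDateParsing
{
    // Returns true when the text is either blank (value stays null) or a
    // valid mm/dd/yyyy date between 1900 and 2099 (value holds the date).
    // Returns false for anything else, such as a partial or impossible date.
    public static bool TryGetDate(string text, out DateTime? value)
    {
        value = null;

        // Strip the mask literals and prompt characters. If nothing is left,
        // the user skipped the field, which is allowed:
        string entered = (text ?? string.Empty)
            .Replace("/", string.Empty)
            .Replace("_", string.Empty)
            .Trim();
        if (entered.Length == 0)
        {
            return true;
        }

        DateTime parsed;
        bool isValid = DateTime.TryParseExact(text, "MM/dd/yyyy",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed)
            && parsed.Year >= 1900
            && parsed.Year <= 2099;

        if (isValid)
        {
            value = parsed;
        }
        return isValid;
    }
}
```

The point of the bool-plus-nullable shape is that client code can tell "no date entered" apart from "bad date entered" without doing any string parsing of its own.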

My Solution: The MaskedDateTextbox Control

I set out to replicate the user-facing aspects of the venerable VBA Masking approach found in the MS Access Textbox, and join it with the .net type system such that a text string date representation could be validated, and then returned to the client code in a useful form, even if null was present.

Inheriting From MaskedTextbox

I began by inheriting from the crusty .net MaskedTextBox control. First order of business was to define the mask we would be using for date entry. For my purposes, I needed to get this done fast, and the project I am working on will only ever require dates in the US-style mm/dd/yyyy format, so I opted to basically fix this as the only mask available.

But what about globalization?

As you will see, for this work-specific implementation, I thwart attempts to change to a different mask, because the code to this point depends upon a final output format of mm/dd/yyyy. Anything else will require more work. That said, it would be a small issue to adapt the code to utilize a different date format. I didn’t have time to build in the kind of flexibility which would allow the mask, and the required validations and text manipulations, to be variable. But adapting the code to recognize and work with some other format should be a small problem.

In any case, the following code defines a class, MaskedDateTextBox, which inherits from the .net MaskedTextBox. As you can see, I have defined a few private members, a couple event handlers, a constructor overload, and overridden the OnMaskChanged method. Most importantly, I have set the mask and the prompt character to the standard format, using private constants. The mask format 00/00/0000 requires a digit at each zero placeholder, and rejects non-digit entries.

The Beginnings of the MaskedDateTextBox Class:
public class MaskedDateTextBox : MaskedTextBox
{
    // Default setting is to require a valid date string before allowing 
    // the user to navigate away from the control:
    public bool _RequireValidEntry = true;
 
    // The default mask is traditional, USA-centric mm/dd/yyyy format. 
    private const string DEFAULT_MASK = "00/00/0000";
    private const char DEFAULT_PROMPT = '_';
 
    // A flag is set when control initialization is complete. This 
    // will be used to determine if the Mask property of the control
    // (inherited from the Base class) can be changed. 
    private bool _Initialized = false;
 
 
    public MaskedDateTextBox() : this(true) { }
 
 
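    public MaskedDateTextBox(bool RequireValidEntry = true) : base()
    {
        // Store the caller's preference; OnLeave and OnInvalidDateEntry rely on it:
        _RequireValidEntry = RequireValidEntry;
 
        // This is the only mask that will work in the current implementation:
        this.Mask = DEFAULT_MASK;
        this.PromptChar = DEFAULT_PROMPT;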
 
        // Handle Events:
        this.Enter +=new EventHandler(MaskedDateTextBox_SelectAllOnEnter);
        this.PreviewKeyDown +=new PreviewKeyDownEventHandler(MaskedDateBox_PreviewKeyDown);
 
        // prevent further changes to the mask:
        _Initialized = true;
    }
 
 
    protected override void OnMaskChanged(EventArgs e)
    {
        if (_Initialized)
        {
            throw new NotImplementedException("The Mask is not changeable in this control");
        }
    }
}

 

Note the boolean member _Initialized. This is set to true immediately after the mask is set in the constructor. From this point forward, attempts by client code to change the mask will fail, and throw a not implemented exception when the OnMaskChanged method is called by the base.

Also note the optional Constructor parameter RequireValidEntry, which is used to set the local member _RequireValidEntry. This matters a little further along. As it is, the constructor parameter defaults to true, even when the default constructor is used. However, there are cases in which one might prefer to handle invalid date entry from the client code, and this parameter (and the member it sets) come into play at that point. More on this in a minute.
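
If you want a feel for what the 00/00/0000 mask accepts without dropping a control on a form, one option is to poke at System.ComponentModel.MaskedTextProvider, the mask-parsing class which MaskedTextBox exposes through its MaskedTextProvider property. What follows is a rough sketch and not part of the control: the MaskProbe class and its method names are made up for illustration, and I pass the invariant culture explicitly on the assumption that we always want "/" as the separator, whatever the regional settings are.

```csharp
using System.ComponentModel;
using System.Globalization;

// Hypothetical helper for experimenting with the date mask outside of a form.
public static class MaskProbe
{
    private const string DateMask = "00/00/0000";

    // Feeds the typed characters through the mask, one at a time.
    private static MaskedTextProvider Fill(string typed)
    {
        MaskedTextProvider provider =
            new MaskedTextProvider(DateMask, CultureInfo.InvariantCulture);

        foreach (char c in typed)
        {
            // Add returns false and leaves the provider unchanged
            // when the mask rejects the character.
            provider.Add(c);
        }
        return provider;
    }

    // The text as it would be displayed, prompts and literals included.
    public static string Display(string typed)
    {
        return Fill(typed).ToDisplayString();
    }

    // True when every required position in the mask has been filled.
    public static bool IsComplete(string typed)
    {
        return Fill(typed).MaskCompleted;
    }
}
```

Calling MaskProbe.Display("01012012") should come back with the slashes filled in by the mask, while a stray letter in the input is simply dropped.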

Straightening Out The Date

The core of this control, and the reason I needed to build it, are evidenced in the following logic, which examines user input, and attempts to get it into the proper mm/dd/yyyy format. The trick here is that some people may enter 1/1/2012, others may enter 01/01/2012, and still others may try to use 1-1-12. In my mind, all of these should resolve to the same date.

Add this code to the MaskedDateTextBox class:

Correcting Date Text Entry to Match the Standard Format:
void CorrectDateText(MaskedTextBox dateTextBox)
{
    // Replace any odd date separators with the mm/dd/yyyy Standard:
    Regex rgx = new Regex(@"(\\|-|\.)");
    string FormattedDate = rgx.Replace(dateTextBox.Text, @"/");
 
    // Separate the date components as delimited by standard mm/dd/yyyy formatting:
    string[] dateComponents = FormattedDate.Split('/');
    string month = dateComponents[0].Trim();
    string day = dateComponents[1].Trim();
    string year = dateComponents[2].Trim();
 
    // We require a two-digit month. If there is only one digit, add a leading zero:
    if (month.Length == 1)
        month = "0" + month;
 
    // We require a two-digit day. If there is only one digit, add a leading zero:
    if (day.Length == 1)
        day = "0" + day;
 
    // We require a four-digit year. If there are only two digits, add 
    // two digits denoting the current century as leading numerals:
    if (year.Length == 2)
        year = "20" + year;
 
    // Put the date back together again with proper delimiters, and 
    // write the result back to the control:
    dateTextBox.Text = month + "/" + day + "/" + year;
}

Note that we pass this method a reference to a MaskedTextBox. I could have accessed the properties of the containing MaskedTextBox class directly, but it seemed cleaner this way. Also, I may decide to extract this method out into its own class (DateStringFormatter?).
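
For what it is worth, here is a rough sketch of what that extracted class might look like. This is not the code used in the control: DateStringFormatter is just the name floated above, Normalize is a name I made up, and the guard for input that does not have three parts is an assumption about how such a class ought to behave. The "20" century prefix is the same hard-coded choice as in CorrectDateText.

```csharp
using System.Text.RegularExpressions;

// Hypothetical stand-alone version of the CorrectDateText logic.
public static class DateStringFormatter
{
    // Backslash, dash and period are treated as "odd" date separators.
    private static readonly Regex OddSeparators = new Regex(@"[\\\-.]");

    public static string Normalize(string rawText)
    {
        string text = OddSeparators.Replace(rawText ?? "", "/");
        string[] parts = text.Split('/');

        // Assumption: anything without exactly three components is
        // returned as-is and left for validation to reject.
        if (parts.Length != 3)
            return text;

        string month = parts[0].Trim();
        string day = parts[1].Trim();
        string year = parts[2].Trim();

        // Pad single-digit months and days with a leading zero:
        if (month.Length == 1) month = "0" + month;
        if (day.Length == 1) day = "0" + day;

        // Two-digit years are assumed to belong to the 2000s:
        if (year.Length == 2) year = "20" + year;

        return month + "/" + day + "/" + year;
    }
}
```

Because the sketch works on plain strings, it can be exercised without a form: DateStringFormatter.Normalize("1-1-12") and DateStringFormatter.Normalize("01/01/2012") both produce the same mm/dd/yyyy text.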

OK. So, we will use the CorrectDateText method from a number of locations. First off, directly, as the user is entering text. We want to force the user’s input into the proper format as they type (for example, if they separate their date parts with dashes instead of slashes). For this, we will handle the PreviewKeyDown Event. Remember, in our constructor, we added a handler for the PreviewKeyDown Event? Now we’re going to handle that event:

Handle User Input as it Happens with PreviewKeyDown Event:
protected virtual void MaskedDateBox_PreviewKeyDown(object sender, 
                                        PreviewKeyDownEventArgs e)
{
    MaskedTextBox txt = (MaskedTextBox)sender;
 
    // Check for common date delimiting characters. When encountered, 
    // adjust the text entry for proper date formatting:
    if (e.KeyCode == Keys.Divide
        || e.KeyCode == Keys.Oem5
        || e.KeyCode == Keys.OemQuestion
        || e.KeyCode == Keys.OemPeriod
        || e.KeyValue == 190
        || e.KeyValue == 110)
 
        // If any of the above key values are encountered, apply a formatting 
        // check to the text entered so far, and make adjustments as needed. 
        this.CorrectDateText(txt);
}

In the above, we test for the various keys which might indicate the wrong sorts of date delimiter inputs. If any of these undesirable characters are found, we make a quick call to our CorrectDateText method, and straighten things out on the fly, so to speak.

Validate the User Input

Next, we want to perform a check when the user navigates away from the control, to be sure that what they have entered is, in fact, a valid date, as well as to perform any additional re-formatting required. We need three methods to do this. The OnLeave method, which overrides the same method on the base class, uses the boolean function IsValidDate to see if the string represented in the control is a valid date. If so, the overridden method calls the OnLeave method on the base and allows the user to navigate away from the control. If the date is not valid, then the OnInvalidDateEntry method is executed, which raises the InvalidDateEntered event, and depending upon the state of _RequireValidEntry, returns the user to the control to correct the issue.

Testing for a Valid Date Entry Before Leaving the Control
bool IsValidDate(MaskedTextBox dateTextBox)
{
    // Remove delimiters from the text contained in the control. 
    string DateContents = dateTextBox.Text.Replace("/", "").Trim();
 
    // If no date was entered, trimming leaves us with an empty string:
    if (!string.IsNullOrEmpty(DateContents))
    {
        // Split the original date into components:
        string[] dateSoFar = dateTextBox.Text.Split('/');
        string month = dateSoFar[0].Trim();
        string day = dateSoFar[1].Trim();
        string year = dateSoFar[2].Trim();
 
        // If the component values are of the proper length for mm/dd/yyyy formatting:
        if (month.Length == 2
            && day.Length == 2
            && year.Length == 4
            && (year.StartsWith("19") || year.StartsWith("20")))
        {
            // Check to see if the string resolves to a valid date:
            DateTime d;
            if (!DateTime.TryParse(dateTextBox.Text, out d))
            {
                // The string did NOT resolve to a valid date:
                return false;
            }
            else
                // The string resolved to a valid date:
                return true;
        }
        else
        {
            // The Components are not of the correct size, and automatic adjustment
            // is unsuccessful:
            return false;
 
        } // End if Components are correctly sized
    }
    else
        // The date string is empty or whitespace - no date is a valid return:
        return true;
} 
 
 
protected override void OnLeave(EventArgs e)
{
    // Perform a final adjustment of the text entry to fit the mm/dd/yyyy format:
    this.CorrectDateText(this);
 
    // If the entry is a valid date, fire the leave event. We are done here. 
    if (this.IsValidDate(this))
    {
        base.OnLeave(e);
    }
    else
    {
        this.OnInvalidDateEntry(this, new InvalidDateTextEventArgs(this.Text.Trim()));
 
        // if a valid date entry is not required, the user is free to navigate away
        // from the control:
        if (!_RequireValidEntry)
        {
            base.OnLeave(e);
        }
    }
}
 
 
protected virtual void OnInvalidDateEntry(object sender, InvalidDateTextEventArgs e)
{
    if (_RequireValidEntry)
    {
        // Force the user to address the problem before 
        // navigating away from the control:
        MessageBox.Show(e.Message);
        this.Focus();
        this.MaskedDateTextBox_SelectAllOnEnter(this, new EventArgs());
    }
 
    // Raise the invalid entry event either way. Client code can determine 
    // if and how invalid entry should be dealt with:
    if (InvalidDateEntered != null)
    {
        InvalidDateEntered(this, e);
    }
}
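
One thing to be aware of: DateTime.TryParse reads the string according to the current culture, so on a machine with non-US regional settings a string such as 01/02/2012 may be read day-first. Since this control is fixed to mm/dd/yyyy anyway, a culture-independent check is possible with DateTime.TryParseExact. The following is a sketch of that alternative, not the code the control uses; UsDateValidator and its TryParse signature are hypothetical names, and the 1900 to 2099 limits simply restate the requirement from the top of the post.

```csharp
using System;
using System.Globalization;

// Hypothetical culture-independent take on the IsValidDate logic.
public static class UsDateValidator
{
    // Returns true for a blank entry (value is null) or a valid
    // mm/dd/yyyy date between 1900 and 2099 (value is the date).
    // Returns false for anything else.
    public static bool TryParse(string text, out DateTime? value)
    {
        value = null;

        // A blank mask leaves nothing but delimiters and spaces:
        string stripped = (text ?? "").Replace("/", "").Trim();
        if (stripped.Length == 0)
            return true;

        DateTime parsed;
        bool ok = DateTime.TryParseExact(
            text.Trim(),
            "MM/dd/yyyy",
            CultureInfo.InvariantCulture,
            DateTimeStyles.None,
            out parsed);

        if (!ok || parsed.Year < 1900 || parsed.Year > 2099)
            return false;

        value = parsed;
        return true;
    }
}
```

A caller gets the "empty is fine, garbage is not" distinction from the return value, and the nullable date from the out parameter.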

 

Define a Custom EventArgs Class: InvalidDateTextEventArgs

The OnInvalidDateEntry method above raises the InvalidDateEntered event, which client code can subscribe to. That event requires a custom EventArgs class, which carries the invalid text along with a message for the user. Define the following class in a separate code file:

A Custom EventArgs Class: InvalidDateTextEventArgs
using System;
 
 
namespace MaskedDateEntryControl
{
    public class InvalidDateTextEventArgs : EventArgs
    {
 
        private string _Message = "" 
            + "Text does not resolve to a valid date. "
            + "Enter a date in mm/dd/yyyy format, "
            + "or clear the text to represent an empty date.";
 
        private string _InvalidDateString = "";
 
 
        public InvalidDateTextEventArgs(string InvalidDateString) : base()
        {
            _InvalidDateString = InvalidDateString;
        }
 
 
        public InvalidDateTextEventArgs(string InvalidDateString, string Message) 
            : this(InvalidDateString)
        {
            _Message = Message;
        }
 
 
        public String Message
        {
            get { return _Message; }
            set { _Message = value; }
        }
 
 
        public String InvalidDateString
        {
            get { return _InvalidDateString; }
        }
    }
}
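
The declaration of the InvalidDateEntered event itself does not appear in the listings above. One plausible way to declare and raise it, assuming the generic EventHandler<T> delegate, is sketched below. DateEntrySketch is a made-up stand-in for the control (so the example runs without Windows Forms), and the EventArgs class here is a trimmed copy of the one just shown.

```csharp
using System;

// Trimmed copy of the InvalidDateTextEventArgs class listed above.
public class InvalidDateTextEventArgs : EventArgs
{
    private readonly string _InvalidDateString;

    public InvalidDateTextEventArgs(string InvalidDateString)
    {
        _InvalidDateString = InvalidDateString;
    }

    public string InvalidDateString
    {
        get { return _InvalidDateString; }
    }
}

// Hypothetical stand-in for MaskedDateTextBox, with no UI dependency.
public class DateEntrySketch
{
    // Assumed declaration; the original control may use a custom delegate.
    public event EventHandler<InvalidDateTextEventArgs> InvalidDateEntered;

    public void ReportInvalid(string text)
    {
        // Copy to a local so the null check and the call see the same delegate:
        EventHandler<InvalidDateTextEventArgs> handler = InvalidDateEntered;
        if (handler != null)
        {
            handler(this, new InvalidDateTextEventArgs(text));
        }
    }
}
```

Client code would then subscribe with the usual += syntax and read e.InvalidDateString inside its handler.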

 

Return a Nullable DateTime Object

Ok. Remember one of my requirements was the ability to retrieve a DateTime object (or null) directly from the MaskedDateTextBox control? This next bit of code is where we do that. Add this code to the MaskedDateTextBox control right after the OnInvalidDateEntry method:

A Property Which Returns a Nullable DateTime Object Based on User Input:
public DateTime? DateValue
{
    get
    {
        DateTime d;
        DateTime? Result = null;
        if (DateTime.TryParse(this.Text, out d))
        {
            Result = d;
        }
        return Result;
    }
    set
    {
        string DateString = "";
        if (value.HasValue)
            DateString = value.Value.ToString("MM/dd/yyyy");
        this.Text = DateString;
    }
}
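
A small caveat about the setter: in a custom format string such as "MM/dd/yyyy", the "/" character is a placeholder for the date separator of the current culture, so on a machine whose regional settings use a different separator the result may not contain slashes at all. If that matters, passing CultureInfo.InvariantCulture keeps the slashes. Here is a minimal sketch of the formatting half only; UsDateText is a hypothetical helper name, not something from the control.

```csharp
using System;
using System.Globalization;

// Hypothetical helper mirroring the DateValue setter's formatting.
public static class UsDateText
{
    // Null becomes an empty string; a date becomes mm/dd/yyyy text
    // with literal slashes regardless of regional settings.
    public static string Format(DateTime? value)
    {
        if (!value.HasValue)
            return "";

        return value.Value.ToString("MM/dd/yyyy", CultureInfo.InvariantCulture);
    }
}
```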

 

Using BeginInvoke to Overcome a Deficiency in the .Net MaskedTextBox Control:

So, the thing which was driving me crazy about this control (and the .net MaskedTextBox from which it derives) was that there did NOT seem to be a way to cause the text in the control to be selected upon entry, such as when an invalid string was entered, and the user is returned to the control to fix it. I wanted the complete text existing in the control to be selected at such time as the user enters the control. Unfortunately, there is apparently a bug in the implementation of the MaskedTextBox control which prevents this from happening using the familiar Textbox.Select() method. As it turns out, there is a workaround, which requires the following code:

Use BeginInvoke to Select the Text contained in the MaskedTextBox Control (and derived controls):
void MaskedDateTextBox_SelectAllOnEnter(object sender, EventArgs e)
{
    MaskedTextBox m = (MaskedTextBox)sender;
    this.BeginInvoke((MethodInvoker)delegate()
    {
        m.SelectAll();
    });
}

 

In the end, the MaskedDateTextBox as described here provided the solution for the moment. Is this a great control? No. It needs work. There are some limitations resulting from the need to get it done NOW which some design improvements would correct. What could be made better?

  • Not restricted to a single date format
  • Not restricted to years beginning with 19 and 20
  • Probably some refactoring could be done with some of the nested conditionals

You can find the source code for this in my repository. Clone the VS2010 project if you want to see the complete code. If you see ways to improve it, please, feel free. If you succeed, hit me up with a pull request - I will happily merge your changes.

 

Posted on September 25 2012 06:14 PM by John Atten     


Getting Started with Git for the Windows Developer (Part I)

Posted on September 1 2012 08:50 AM by John Atten in C#, CodeProject, Git, Microsoft   ||   Comments (2)

I am a late-comer to version control in general, and, having grown up teaching myself programming in the Windows/Visual Studio/C# realm, it took the growing prominence of Github to draw my attention to what is currently the most visible Distributed Version Control System (and “social coding” site) in the developer universe.

Once I graduated to using version control for my code, there was no turning back.

Hasn’t All of this been covered somewhere else?

Why, yes it has. In many cases, better than I am about to cover it. The web is chock full of tutorials on using git and Github. Some are better than others, and it is great to get information and opinions from a variety of sources.

I am writing this post as much for myself as my two readers, as a means of increasing my familiarity with the Git/Bash ecosystem (what better way than to explain it to someone else, right?), and so that I will have a reference to my own best thinking on the subject, relevant links, and such.

In this series, I plan to walk through the basics of getting started with git in a Windows environment.

Get Git for Windows

Git is a Unix-based application. In order to use Git in Windows, it is necessary to install the Windows port, msysgit.  msysgit is an open source project, freely available for download. As of this writing, the most recent release is version 1.7.11. When you go to the project download page, you will notice that all versions of msysgit are named Git-1.x.xx-preview and are tagged with a little “beta” flag. Don’t worry about this. msysgit is widely used. Below is a link to the downloads page for the “Full Windows Installer” version of the download.

Go get it now at the link below. I’ll wait.

Installing Git for Windows

When you first run the downloaded installer, you may be greeted with a security warning about unknown publisher and such. You can ignore the ominous warning, and click the “Run” button. Click next to move through the “Welcome to the Git Setup Wizard” splash/greeting window, and again to accept the license terms. The next window is the installation components window:

[Screenshot: Git Setup, Installation Components]

The default values here should be fine, but make sure that the “Windows Explorer Integration” item is checked, and that “Git Bash Here” and “Git GUI Here” are selected. Now click Next again:

[Screenshot: Git Setup, Command Line Environment (PATH options)]

Of the three options available here, which you choose will depend upon your comfort level with the Windows command line. Personally, I prefer to use Git Bash only. I figure, it can only help me as a developer to become fluent with the world of the *nix Bash command line, and the hybrid environment(s) created by the other two options seem more like a potential source of irritation than anything else.

[Screenshot: Git Setup, Line Ending Conversion]

Unix/Linux systems use a different convention for line endings than Windows. I recommend the default option, which should be “Checkout Windows-style, commit Unix-style line endings.” This option offers the greatest flexibility if you will be sharing your code with others.

Click next, and the installation will start. This should only take a minute or so (or less). Great! You now have Git installed on your Windows machine. What next?

Configuring Git on your Windows Machine

Now that Git is installed on your Windows machine, you need to do some basic configuration. Git maintains several configuration files:

  • The highest order configuration file contains system-level configuration values for all users, and all repositories on the system. This file is usually created in a directory relative to the msysgit installation directory.
  • The global user-level configuration file, which is specific to the individual user, and usually resides in the user’s home folder (usually, but not always, C:\Documents and Settings\<UserName> on Windows XP, or C:\Users\<UserName> on Windows Vista and later). This is where your user name and email address values are generally set up, along with the default configuration settings for your repos.
  • Each repository also contains a configuration file, specific to that repository, which is located in the .git directory of the specific repository or folder.

Note that configuration settings in each more specific level override those in the level above. For example, you can override the global user name and email address values (global here, again, means user-level) within a specific repository by setting them in the repository-level configuration file.

We’ll look more closely at this momentarily.
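
Since most of us here are C# folks, the override order described above can be pictured as a simple lookup that tries the most specific level first. This is only a toy model to illustrate the precedence; it does not read any real Git configuration files, and ConfigLevels and Resolve are names invented for the example.

```csharp
using System.Collections.Generic;

// Toy model of Git's three configuration levels.
public static class ConfigLevels
{
    // Looks a key up in the repository settings first, then the global
    // (user-level) settings, then the system settings.
    // Returns null when no level defines the key.
    public static string Resolve(
        string key,
        IDictionary<string, string> system,
        IDictionary<string, string> global,
        IDictionary<string, string> repository)
    {
        IDictionary<string, string>[] mostSpecificFirst = { repository, global, system };

        foreach (IDictionary<string, string> level in mostSpecificFirst)
        {
            string value;
            if (level != null && level.TryGetValue(key, out value))
                return value;
        }
        return null;
    }
}
```

With user.name defined at both the global and the repository level, Resolve returns the repository value; remove it from the repository dictionary and the global value shows through.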

We are concerned right now with what we call the global configuration, where we provide values for your default user configuration. Open the Git Bash command window through your Start menu. You should find it at Start Menu/All Programs/Git/Git Bash.

[Screenshot: The Git Bash Command Line Interface]

If you are new to command-line interfaces (I was, and still am), overcome your fears. We’re not going to be doing anything scary, and if you are exploring version control, you are likely a developer of some level, or learning to be. We are not to live in fear of the command line anymore! Follow along for now. I will take an introductory look at command line usage in another post. For the moment, note the following:

  • You will notice that the window opens with the text “xivSolutions@XIVMAIN ~” then the next line contains a “$” symbol. The xivSolutions@XIVMAIN is my log-in name on the local system and the name of the local computer: LocalUserName@LocalSystemName
  • The “$” symbol you see is the default command prompt. Text you type here represents a command which will be executed when you press the enter key.
  • Typing into the command prompt can be finicky. Bash is case-sensitive, and every character counts. Placement of spaces counts. While you are following along, read the commands carefully, and type exactly into the command line what you are reading here (or from the images of my window).
  • Once you hit the enter key, Bash will either:
    • Execute the command and present you with a new command prompt, or
    • If the command was one which is supposed to return data or feedback, the requested information will be displayed, followed by a new command prompt.
  • Commands tend (with some variation) to adhere to the format CommandName -Option1 -Option2  . . . -OptionN InputValue

The first thing we want to do is set our Global Username and Global User Email. Git uses these two pieces of information to identify us, and to associate us with each commit to the repository. When we want to access the user’s global config file, we supply the --global option with our command.

Again: Bash, like Linux, is case sensitive. Pay careful attention to typing your commands. While you are unlikely to hurt anything by mistyping, the command will fail to execute if you don’t use the proper case. Additionally, the spacing of commands and options is important.

Set the Global User.Name Property

Type the following into the command window, being careful to use the correct case, and note the single space which precedes the double dash in front of the global option.  (the “$” symbol is the command prompt, and should already be visible at the beginning of the line):

git config --global user.name "Your Username"

[Screenshot: Git Configuration, the User.Name Value]

When you hit “Enter,” if you have typed correctly, you should be rewarded with . . . a new command prompt.

[Screenshot: Git Configuration, after entering the User.Name Value]

It is a principle of Linux programming that a function or command which executed properly does so silently, unless there is a compelling reason to do otherwise (like when the command is supposed to return data, or report progress). However, we can check to see what happened by typing:

git config --global user.name

Note that this is essentially the command again, but without any input.

[Screenshot: Git Config, confirming the User.Name Value, before Enter]

Then hit the Enter key:

[Screenshot: Git Config, confirming the User.Name Value, after Enter]

The line immediately following our command represents the return value, in this case, my user name as set in the previous command. Then Bash presents us with a new command line.

Set the Global User.Email Property

Next we will set the global user email. Type the following into the fresh command prompt:

git config --global user.email "yourEmailAddress"

[Screenshot: Git Config, set User.Email Value]

Then hit Enter:

[Screenshot: Git Config, set User.Email Value, after Enter key]

You can check the value of this setting the same as before: retype the command, sans any input value (in an attempt at brevity, I show the entry and the result in one step this time):

[Screenshot: Git Config, check User.Email Value, after Enter key]

Congratulations! You have now installed and configured Git on your Windows machine. In the next post, we will look at using git to get some things done.

Summary

To this point, we have:

  • Downloaded and installed Git for Windows
  • Walked through the basic installation of git and default set up for your Windows machine
  • Performed the most basic initial configuration of git on your local machine so that you can actually start using git to do meaningful things.

Wow. That seems like a long post just to install an application and do the minimum initial configuration for use. It seems strange to leave this post at this point, because we really haven’t looked at using git for anything useful. But I am trying to break this up into meaningful pieces, of easily digestible length. Next up, we will take a short excursion into using the Bash command line and look at a list of basic, frequently-used git commands.


About the author

My name is John Atten, and my username on many of my online accounts is xivSolutions. I am fascinated by all things technology and software development. I work mostly with C#, Java, and SQL Server 2012, and I am learning ASP.NET MVC and HTML5/CSS/JavaScript. I am always looking for new information, and value your feedback (especially where I got something wrong!).
