I have found two primary libraries for programmatically manipulating PDF files: PdfBox and iText. Both are Java libraries, but I needed something I could use with C#. As it turns out, there is a .NET implementation of each of these libraries, each with its own strengths and weaknesses:
PdfBox - .Net version
The .NET implementation of PdfBox is not a direct port. Rather, it uses IKVM to run the Java version interoperably with .NET. IKVM features an actual .NET implementation of a Java Virtual Machine and a .NET implementation of the Java class libraries, along with tools which enable Java and .NET interoperability.
PdfBox’s dependency on IKVM incurs a lot of baggage in performance terms. When the IKVM libraries load, and (I am assuming) the “’Virtual’ Java Virtual Machine” spins up, things slow way down until the load is complete. On the other hand, for some of the more common things one might want to do with a PDF programmatically, the API is (relatively) straightforward and well documented.
When you run a project which uses PdfBox, you WILL notice a lag the first time PdfBox and IKVM are loaded. After that, things seem to perform sufficiently, at least for what I needed to do.
Side Note: iTextSharp
iTextSharp is a direct port of the Java library to .Net.
iTextSharp looks to be the more robust library in terms of fine-grained control, and is extensively documented in a book by one of the authors of the library, iText in Action (Second Edition). However, the learning curve was a little steeper for iText, and I needed to get a project out the door. I will examine iTextSharp in another post, because it looks really cool, and supposedly does not suffer the performance limitations of PdfBox.
Getting started with PdfBox
Before you can use PdfBox, you need to either build the project from source or download the ready-to-use binaries. I just downloaded the binaries for version 1.2.1 from this helpful gentleman’s site; since they depend on IKVM, the download also includes the IKVM binaries. However, there are detailed instructions for building from source on the PdfBox site. Personally, I would start with the downloaded binaries to see if PdfBox is what you want to use first.
Important to note here: apparently, the PdfBox binaries depend on the exact versions of the DLLs they were built against. See the notes on the PdfBox .Net Version page.
Once you have built or downloaded the binaries, you will need to set references to PdfBox and ALL the included IKVM binaries in your Visual Studio Project. Create a new Visual Studio project named “PdfBoxExamples” and add references to ALL the PdfBox and IKVM binaries. There are a LOT. Deal with it. Your project references folder will look like the picture to the right when you are done.
The PdfBox API is quite dense, but there is a handy reference at the Apache PdfBox site. The PDF file format is complex, to say the least, so when you first take a gander at the classes and methods presented by the PdfBox API, it can be difficult to know where to begin. There is also the small issue that what you are looking at is a Java API, so some of the naming conventions are a little different. In addition, the PdfBox API often returns what appear to be Java classes. This comes back to that .NET implementation of the Java class libraries I mentioned earlier.
Things to Do with PdfBox
It seems like there are three common things I often want to do with PDF files: Extract text into a string or text file, split the document into one or more parts, or merge pages or documents together. To get started with using PdfBox we will look at extracting text first, since the set up for this is pretty straightforward, and there isn’t any real Java/.Net weirdness here.
To do this, we will call upon two PdfBox namespaces (“Packages” in Java, loosely), and two Classes:
The namespace org.apache.pdfbox.pdmodel gives us access to the PDDocument class and the namespace org.apache.pdfbox.util gives us the PDFTextStripper class.
In your new PdfBoxExamples project, add a new class, name it “PdfTextExtractor,” and add the following code:
The PdfTextExtractor Class
using System;
using org.apache.pdfbox.pdmodel;
using org.apache.pdfbox.util;

namespace PdfBoxExamples
{
    public class PdfTextExtractor
    {
        public static String PDFText(String PDFFilePath)
        {
            PDDocument doc = PDDocument.load(PDFFilePath);
            try
            {
                PDFTextStripper stripper = new PDFTextStripper();
                return stripper.getText(doc);
            }
            finally
            {
                // Release the file and any resources held by the document:
                doc.close();
            }
        }
    }
}
As you can see, we use the PDDocument class (from the org.apache.pdfbox.pdmodel namespace) and initialize it using the static .load method defined as a class member on PDDocument. As long as we pass it a valid file path, the .load method will return an instance of PDDocument, ready for us to work with.
Once we have the PDDocument instance, we need an instance of the PDFTextStripper class, from the namespace org.apache.pdfbox.util. We pass our instance of PDDocument to its getText method as a parameter, and get back a string representing the text contained in the original PDF file.
Be prepared. PDF documents can employ some strange layouts, especially when there are tables and/or form fields involved. The text you get back will tend not to retain the formatting from the document, and in some cases can be bizarre.
However, the ability to strip text in this manner can be very useful. For example, I recently needed to download an individual PDF file for each county in the state of Missouri, and strip some tabular data out of each one. I hacked together an iterator/downloader to pull down the files, and then, using a modified version of the text stripping tool illustrated above and some rather painful Regex, I was able to get what I needed.
At the simplest level, suppose you had a PDF file and you wanted to split it into individual pages. We can use the Splitter class, again from the org.apache.pdfbox.util namespace. Add another class to your project, named PDFFileSplitter, and copy the following code into the editor:
The PDFFileSplitter Class
using org.apache.pdfbox.pdmodel;
using org.apache.pdfbox.util;

namespace PdfBoxExamples
{
    public class PDFFileSplitter
    {
        public static java.util.List SplitPDFFile(string SourcePath,
            int splitPageQty = 1)
        {
            var doc = PDDocument.load(SourcePath);
            var splitter = new Splitter();
            splitter.setSplitAtPage(splitPageQty);
            return (java.util.List)splitter.split(doc);
        }
    }
}
Notice anything strange in the code above? That’s right. We have declared a static method with a return type of java.util.List. WHAT? This is where working with PdfBox and more importantly, IKVM becomes weird/cool. Cool, because I am using a direct Java class implementation in Visual Studio, in my C# code. Weird, because my method returns a bizarre type (from a C# perspective, anyway) that I was unsure what to do with.
I would probably add to the above class so that the splitter persisted the split documents to disk, or change the return type of my method to object[] and use the .toArray() method, like so:
The PdfFileSplitter Class (improved?)
public static object[] SplitPDFFile(string SourcePath,
    int splitPageQty = 1)
{
    var doc = PDDocument.load(SourcePath);
    var splitter = new Splitter();
    splitter.setSplitAtPage(splitPageQty);
    return (object[])splitter.split(doc).toArray();
}
In any case, the code in either example loads the specified PDF file into a PDDocument instance, which is then passed to the org.apache.pdfbox.util.Splitter, along with an int parameter. With the default value of 1, the output is a Java ArrayList (or, in the second example, an object array) in which each element is a new PDDocument containing a single page from your original document. Your original document is not altered by this process, by the way.
The int parameter tells the Splitter how many pages should be in each split section. For example, with a value of 2, a six-page PDF file produces three two-page documents, and a five-page file produces two two-page documents and one single-page document. You get the idea.
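The grouping arithmetic can be sketched in a few lines. The snippet below is plain Java (the language PdfBox itself is written in) and does not call PdfBox at all; the SplitSizes class and chunkSizes method are names made up for this illustration. It only shows how many pages land in each output document for a given page count and split size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PdfBox: models how a document of
// totalPages pages is grouped when it is split every pagesPerChunk pages.
final class SplitSizes {

    static List<Integer> chunkSizes(int totalPages, int pagesPerChunk) {
        if (totalPages < 0 || pagesPerChunk < 1) {
            throw new IllegalArgumentException("invalid page counts");
        }
        List<Integer> sizes = new ArrayList<>();
        for (int start = 0; start < totalPages; start += pagesPerChunk) {
            // The last chunk holds whatever pages remain.
            sizes.add(Math.min(pagesPerChunk, totalPages - start));
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(chunkSizes(6, 2)); // [2, 2, 2]
        System.out.println(chunkSizes(5, 2)); // [2, 2, 1]
    }
}
```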
Something slightly more useful might be a method which accepts an array of integers as a parameter, with each integer representing a page number to be extracted into a new, composite document. For example, say I needed pages 1, 6, and 7 from a 44-page PDF pulled out and merged into a new document (in reality, I needed to do this for pages 1, 6, and 7 for each of about 200 individual documents). We might add a method to our PDFFileSplitter class as follows:
The ExtractToSingleFile Method
public static void ExtractToSingleFile(int[] PageNumbers,
    string sourceFilePath, string outputFilePath)
{
    var originalDocument = PDDocument.load(sourceFilePath);
    var originalCatalog = originalDocument.getDocumentCatalog();
    java.util.List sourceDocumentPages = originalCatalog.getAllPages();
    var newDocument = new PDDocument();
    foreach (var pageNumber in PageNumbers)
    {
        // Page numbers are 1-based, but PDPages are contained in a zero-based list:
        int pageIndex = pageNumber - 1;
        newDocument.addPage((PDPage)sourceDocumentPages.get(pageIndex));
    }
    newDocument.save(outputFilePath);
    // Close both documents once the new file has been written:
    newDocument.close();
    originalDocument.close();
}
Below is a simple example to illustrate how we might call this method from a client:
Calling the ExtractToSingleFile Method:
public void ExtractAndMergePages()
{
    string sourcePath = @"C:\SomeDirectory\YourFile.pdf";
    string outputPath = @"C:\SomeDirectory\YourNewFile.pdf";
    int[] pageNumbers = { 1, 6, 7 };
    PDFFileSplitter.ExtractToSingleFile(pageNumbers, sourcePath, outputPath);
}
Limit Class Dependency on PdfBox
It is always good to limit dependencies within a project. In this case, especially, I would want to keep those odd Java class references constrained to the highest degree possible. In other words, where possible, I would attempt to either return standard .net types from my classes which consume the PdfBox API, or otherwise complete execution so that client code calling upon this class doesn’t need to be aware of IKVM, or funky C#/Java hybrid types.
Or, I would build out my own “PdfUtilities” library project, within which objects are free to depend upon and intermix this Java hybrid. However, I would make sure public methods defined within the library itself accepted and returned only standard C# types.
In fact, that is precisely what I am doing, and I’ll look at that in a following post.
John on Google CodeProject
Most of what I know (or at least, think I know) was learned in the .net/C# environment. However, I believe it is nearly axiomatic that exceptions are to be used for exceptional circumstances, and (mostly) should not be used as a business logic construct. In the Oracle article, and in my example above, the InvalidAccountException, StopPaymentException and the InsufficientFundsException would appear to flagrantly violate this principle. Within limits, the only valid contingency exception here is AccountNotAvailableException, which is really just an obfuscation of SQLException.
Can we re-write these classes to do away with some of the possibly extraneous exception handling?
First off, three of our four contingencies look a whole lot like validation problems. Depending on our project architecture, we could do away with InvalidAccountException, StopPaymentException and InsufficientFundsException and check for these contingencies from our client code before calling the getCheckingAccount method defined on the Bank class, and prior to calling the processCheck method defined on the CheckingAccount Class. First, we can add a couple of boolean methods on CheckingAccount to tell us if there is a stop payment order for a check submitted, and whether there are sufficient funds in the account to process the check:
The re-worked CheckingAccount Class:
import java.util.ArrayList;

public class CheckingAccount
{
    private String _accountID;
    private double _currentBalance;
    private ArrayList<Integer> _stoppedCheckNumbers;

    public String getAccountID()
    {
        return _accountID;
    }

    public double getCurrentBalance()
    {
        return _currentBalance;
    }

    public void setAccountID(String accountID)
    {
        _accountID = accountID;
    }

    public void setCurrentBalance(double currentBalance)
    {
        _currentBalance = currentBalance;
    }

    public ArrayList<Integer> getStoppedCheckNumbers()
    {
        if(_stoppedCheckNumbers == null)
        {
            _stoppedCheckNumbers = new ArrayList<Integer>();
        }
        return _stoppedCheckNumbers;
    }

    public boolean checkAmountApproved(double amount)
    {
        // Approve the check as long as the balance would not go negative.
        // This mirrors the test used by the exception-based processCheck:
        return (_currentBalance - amount) >= 0;
    }

    public boolean checkPaymentStopped(int checkNo)
    {
        // Go through the getter so the list is initialized before use:
        return getStoppedCheckNumbers().contains(checkNo);
    }

    public double processCheck(Check submitted)
        throws DatabaseAccessException
    {
        double newBalance = _currentBalance - submitted.getAmount();
        try
        {
            // <Code to Update Database to reflect current transaction>
        }
        catch(Exception e)
        {
            // <Code to log SQLException details>
            /*
             * After logging and/or otherwise handling the SQL failure, throw
             * an exception more appropriate to the context of the client code,
             * which does not care about the details of the data access operation,
             * only that the information could not be retrieved.
             */
            throw new DatabaseAccessException("Database Error");
        }
        return newBalance;
    }
}
Note in the above that we have done away with all but our single contingency, DatabaseAccessException, which in reality, is our API response to a fault. Theoretically, the only exception client code is required to handle now is the DatabaseAccessException. Now, what happens to our Bank class?
The re-Worked Bank Class:
public class Bank
{
    public static CheckingAccount getCheckingAccount(String AccountID)
        throws DatabaseAccessException
    {
        CheckingAccount account = new CheckingAccount();
        try
        {
            /*
             * <Code to retrieve Account data from data store>
             */
            // Use test data to initialize an account instance:
            account.setAccountID("0001 1234 5678");
            account.setCurrentBalance(500.25);
            account.getStoppedCheckNumbers().add(1000);
        }
        catch(Exception e)
        {
            // <Code to log SQLException details>
            /*
             * After logging and/or otherwise handling the SQL failure, throw
             * an exception more appropriate to the context of the client code,
             * which does not care about the details of the data access operation,
             * only that the information could not be retrieved.
             */
            throw new DatabaseAccessException("Database Error");
        }
        return account;
    }

    public static boolean checkAccountExists(String AccountID)
        throws DatabaseAccessException
    {
        boolean exists = false;
        try
        {
            /*
             * <Code to check accountID exists in data store>
             */
            // Use test data in place of a real query: a valid row is
            // returned only for the sample account.
            if("0001 1234 5678".equals(AccountID))
            {
                exists = true;
            }
        }
        catch(Exception e)
        {
            // <Code to log SQLException details>
            /*
             * After logging and/or otherwise handling the SQL failure, throw
             * an exception more appropriate to the context of the client code,
             * which does not care about the details of the data access operation,
             * only that the information could not be retrieved.
             */
            throw new DatabaseAccessException("Database Error");
        }
        return exists;
    }
}
Here, we have eliminated the InvalidAccountException, and added a boolean method to check whether a valid account exists for a given account number. As before, since the original static method getCheckingAccount is still performing data access, we need to retain our DatabaseAccessException.
Last, what does our client code look like now? We have done away with using Exceptions in a business logic context. What was the impact?
The re-worked Mock Client Code:
import java.text.DecimalFormat;

public class MockClientCode
{
    /*
     * Assume this code is supporting UI operations.
     */
    static String SYSTEM_ERROR_MSG_UI = ""
        + "The requested account is unavailable due to a system error. "
        + "Please try again later.";
    static String INVALID_ACCOUNT_MSG_UI = ""
        + "The account number provided is invalid. Please try again.";
    static String INSUFFICIENT_FUNDS_MSG_UI = ""
        + "There are insufficient funds in the account to process this check.";
    static String STOP_PAYMENT_MSG_UI = ""
        + "There is a stop payment order on the check submitted."
        + " The transaction cannot be processed";

    public static void main(String args[])
    {
        // Sample Data:
        String accountID = "0001 1234 5678";
        int checkNo = 1000;
        double checkAmount = 100.00;
        // Use test data to initialize a test check instance:
        Check customerCheck = new Check(accountID, checkNo, checkAmount);
        CheckingAccount customerAccount = null;
        double newBalance;
        try
        {
            // The account lookup also touches the data store, so it sits inside the try:
            if(Bank.checkAccountExists(customerCheck.getAccountID()))
            {
                customerAccount = Bank.getCheckingAccount(customerCheck.getAccountID());
                if(!customerAccount.checkPaymentStopped(customerCheck.getCheckNo()))
                {
                    if(customerAccount.checkAmountApproved(customerCheck.getAmount()))
                    {
                        newBalance = customerAccount.processCheck(customerCheck);
                        // Output transaction result to UI:
                        System.out.println(""
                            + "The transaction has been processed. New Balance is: "
                            + DecimalFormat.getCurrencyInstance().format(newBalance));
                    }
                    else // there were insufficient funds
                    {
                        // Output the message to the user interface:
                        System.out.println(INSUFFICIENT_FUNDS_MSG_UI);
                    }
                }
                else // payment was stopped on this check no.
                {
                    // Output the message to the user interface:
                    System.out.println(STOP_PAYMENT_MSG_UI);
                }
            }
            else // No valid account
            {
                // Output the message to the user interface:
                System.out.println(INVALID_ACCOUNT_MSG_UI);
            }
        }
        catch (DatabaseAccessException e)
        {
            // Output the message to the user interface:
            System.out.println(SYSTEM_ERROR_MSG_UI);
        }
    }
}
Wow. Look at that ugly nested conditional. Well, we have absolved our client code of having to handle a bunch of exceptions related to our business logic, and it is possible that improvements to the class structure, and a little refactoring could make significant improvements. In the end, though, it seems like a bit of a trade off.
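One possible refactoring, sketched here under some assumptions, is to collapse the nested checks into a single method that returns early with a result code. The CheckResult enum, the CheckValidator class and the cut-down Account type are stand-ins I invented for this sketch; they are not part of the classes above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical result codes for the business-rule checks.
enum CheckResult { APPROVED, INVALID_ACCOUNT, PAYMENT_STOPPED, INSUFFICIENT_FUNDS }

// Cut-down stand-in for CheckingAccount, just enough for the sketch.
final class Account {
    final double balance;
    final Set<Integer> stoppedChecks = new HashSet<>();

    Account(double balance) {
        this.balance = balance;
    }
}

final class CheckValidator {

    // Each rule is tested once and returns immediately on failure, so the
    // happy path is the last line rather than the innermost block.
    static CheckResult validate(Account account, int checkNo, double amount) {
        if (account == null) {
            return CheckResult.INVALID_ACCOUNT;
        }
        if (account.stoppedChecks.contains(checkNo)) {
            return CheckResult.PAYMENT_STOPPED;
        }
        if (account.balance - amount < 0) {
            return CheckResult.INSUFFICIENT_FUNDS;
        }
        return CheckResult.APPROVED;
    }

    public static void main(String[] args) {
        Account account = new Account(500.25);
        account.stoppedChecks.add(1000);
        System.out.println(validate(account, 1000, 100.00)); // PAYMENT_STOPPED
        System.out.println(validate(account, 1001, 100.00)); // APPROVED
    }
}
```

Client code could then switch on the returned value to pick the message for the UI. Data access failures would still arrive as a DatabaseAccessException, so this flattens the business rules without removing the one exception that remains.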
In concept, I liked the idea that a method could declare, as part of its signature, that it throws a specific type of exception. What I do NOT like about it is that you HAVE to. In many cases, this mechanism can become quite annoying. A shining example, excerpted from the article linked above, follows:
“To programmers, it seemed like most of the common methods in Java library classes declared checked exceptions for every possible failure. For example, the java.io package relies heavily on the checked exception IOException. At least 63 Java library packages issue this exception, either directly or through one of its dozens of subclasses.”
“An I/O failure is a serious but extremely rare event. On top of that, there is usually nothing your code can do to recover from one. Java programmers found themselves forced to provide for IOException and similar unrecoverable events that could possibly occur in a simple Java library method call. Catching these exceptions added clutter to what should be simple code because there was very little that could be done in a catch block to help the situation. Not catching them was probably worse since the compiler required that you add them to the list of exceptions your method throws. This exposes implementation details that good object-oriented design would naturally want to hide.”
Obviously, it seems that it is possible to go a little too far with this scenario.
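As a concrete illustration of the clutter being described, here is a small sketch. The class and method names are hypothetical. The first method must either catch IOException or declare it; the second wraps it in the standard java.io.UncheckedIOException so that callers are not forced to handle it. Whether that wrapping is appropriate depends on the design; it is shown only to make the trade-off visible.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class FirstLineReader {

    // Check-or-specify: readLine() declares IOException, so this method
    // must either catch it or list it in its own throws clause.
    static String firstLineChecked(BufferedReader reader) throws IOException {
        return reader.readLine();
    }

    // Alternative: translate the checked exception into an unchecked one.
    // Callers may still catch it, but the compiler no longer insists.
    static String firstLineUnchecked(BufferedReader reader) {
        try {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BufferedReader reader = new BufferedReader(new StringReader("hello\nworld"));
        System.out.println(firstLineUnchecked(reader)); // hello
    }
}
```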
On the other hand, wouldn’t it be a handy optional language feature to be able to add that throws clause if, in your infinite designer wisdom, it seemed the superior design choice?
I grew up, so to speak, using C#, which does not have anything like the Java check-or-specify policy. It is up to the developer to properly anticipate, test for, and handle exceptions. Or to throw them programmatically, as the design may require. But in any case, client code accessing an API is blissfully unaware that a method might throw a particular type of exception until either the designer thinks of it, or it occurs in use (er, I mean, “testing”).
I propose that a useful (but at present, non-existent) feature for an existing language would be to put that check-or-specify policy into the hands of the developer, and allow the developer to invoke it by adding the throws clause to a method signature. My hypothetical mechanism would look like this:
- If a method defined on a class throws one or more exceptions, a compiler warning will evidence itself (at least, in the land of IDE’s such as Eclipse or Visual Studio) as opposed to the compiler error we get in Eclipse with Java. However, if there is not a throws clause included in the method signature, the warning is all you would get. You could still consume the method without handling or propagating the exception from below.
- If the developer or designer adds a throws clause to the method signature specifying one or more particular exception types, then client code is required to recognize and address those exceptions, similar to the existing Java Check-or-Specify policy.
The above mechanism would place the control in the hands of the designer, and allow selective enforcement of such a policy, while still providing helpful compiler warnings when, by design, such enforcement is relaxed. After all, if one team invests the time and mental energy to think through the exception possibilities in their code, why not spare the consumer of that code the headache of doing it all over again? This would leave responsibility upon the developer of the client code to address (or not) such exceptions as he/she sees fit.
- Exceptions in Java are often misused (as they are in C# and other languages).
- Checked Exceptions, and the Check-or-Specify Policy, can add complexity to a design, and potentially create the need for a large number of additional types within a project.
- There is a solid theoretical philosophy behind Checked Exceptions in Java that has not been well-implemented in actual use. Understanding the intent behind this language/environment feature can help in design decisions.
- Thinking of Exceptions in terms of “Faults” and “Contingencies” is a helpful way to guide Exception Handling design in accordance with the previous point.
- Thinking of Exceptions in terms of “Faults” and “Contingencies” encourages the use of Exceptions to enforce business rules (Not sure how I feel about this).
- As with most things in programming, effective use of Exceptions is more often than not a case of making appropriate tradeoffs and design decisions. Careful analysis of the problem domain, exception context, and attention to client code context may demand overriding theory and dogma.
- More study and analysis is needed on my part.
In researching this post, and through the discussion on the r/java sub-reddit, I have developed the following thoughts. In examining an API or class structure, and designing an effective Exception mechanism, ask yourself:
- Who (which component of your code) cares? Where is the specific exception most effectively handled in your design? Can you deal with the specific exception appropriately at the point it is thrown, and propagate (or otherwise notify) a different exception up the call stack which does not contain implementation details of the source class? Can you maintain encapsulation?
- There is no law which states you can’t log the initial exception, and then throw another which is more appropriate to the context of the calling code.
- Is the exception a result of user input? If so, this may be a sign that it is more properly handled with a validation mechanism and/or improved implementation of business rules.
- The previous point is not always true. It could be a tradeoff, where the exception results from user input, but deep within a series of calls. In this case it might be more practically handled with an exception, even though this breaks form, and uses an exception in place of business rules and validation.
- The Java Checked Exception and Check or Specify policy is a real mixed bag. Philosophically, I like it. However, employing it effectively, and the way it was intended, requires careful design - more, I think, than many put into exception handling.
- It is easy to use an exception when you don’t know what else to do (meaning, you need to re-examine your design, or go back to the documentation. I am 100% guilty of this at times).
- Don’t use exceptions to “pass the buck” and make what should have been YOUR problem into the next guy’s problem. As a consumer of your API, he will have even less context to deal with YOUR exception than you do.
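The second point in the list, logging the original exception and then throwing one that suits the caller, can be sketched as follows. AccountStore and the version of DatabaseAccessException shown here are assumptions made for this example. The one detail worth noting is that the original exception is passed along as the cause, so nothing is lost for whoever reads the logs.

```java
import java.sql.SQLException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical checked exception exposed to client code.
class DatabaseAccessException extends Exception {
    DatabaseAccessException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class AccountStore {
    private static final Logger LOG = Logger.getLogger(AccountStore.class.getName());

    // Stand-in for a real JDBC call; in this sketch it always fails.
    private static double queryBalance(String accountId) throws SQLException {
        throw new SQLException("connection refused");
    }

    static double loadBalance(String accountId) throws DatabaseAccessException {
        try {
            return queryBalance(accountId);
        } catch (SQLException e) {
            // Log the implementation detail here, where it makes sense...
            LOG.log(Level.SEVERE, "Balance lookup failed for " + accountId, e);
            // ...and hand the caller something phrased in its own terms,
            // keeping the original exception as the cause.
            throw new DatabaseAccessException("Database Error", e);
        }
    }
}
```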
This has been a difficult post to write. We tend to think of exceptions and exception handling as something we understand well. I for one tend to think about it when I need it, and not much else. Further, I am learning much of this as I go, meaning I am self-taught, and there is always the danger that I will think I have something figured out, only to learn (often in quite humbling, “what the hell am I doing writing about this” kinds of ways) I had it all wrong, or missed something which should have been obvious.
In trying to formulate a coherent representation of my understanding here, I have developed a greater appreciation for the difficult design choices we are faced with. Trying to wrap my head around the deeper intentions of the Java Exception mechanism, and the semi-polarized disagreement about its usefulness among Java devs, has been an eye-opening experience. The education win for me extends beyond Java, and into my toolbox.
This post is a re-examination of some topics I discussed in an older post. This one got long, so I broke it into two.
The Enthusiasm of Discovery
Almost exactly a year ago, I had completed an initial exploration of the Java language, and written a post here in this space about a feature I found attractive in the exception handling mechanism defined by the language. Specifically, I found checked exceptions and the “check or specify” policy associated with them of interest.
That post is here, enthusiastically titled Things to love about Java: Exception Handling. Ok, perhaps I got a little carried away with the title. Also, in my enthusiasm, I had not yet learned enough to authoritatively comment on the subject. Lastly, I confused the entire Checked Exception paradigm with the small piece of it which I found of interest. More on that in a moment.
In the interest of getting some feedback on my observations, I posted a link in /r/java on Reddit. Apparently, in my naiveté, I had touched on a sore point in the Java community. The very first comment on the post was a good-natured “Well I see a religious war starting here fairly soon.”
Soon after this first provocative (but humorous and good-natured) comment, there flowed a wealth of fascinating discussion, as experienced Java devs weighed in, both pro and con on the subject of checked exceptions, and provided a plethora of useful context and information. In considering my responses to some of these, I realized that I had missed my mark in my original article (or, the discussion helped me find it, anyway!).
One very, very helpful tidbit was a link to this article on the Oracle website describing what some of the original thinking was around the checked exception mechanism. This article establishes on paper, at least, an attractive design philosophy, and example cases for the use of checked exceptions.
Cases Made Against the Existing Java Checked Exception Mechanism
Among the discussion points in the Reddit thread, the following stand out to me as strong arguments, not necessarily against the Checked Exception architecture itself, but more against the manner in which it has often come to be used in the field:
- Many Java libraries and frameworks (especially/including core Java API’s) don’t use checked exceptions in accordance with the original design philosophy.
- Checked Exceptions are often used in cases which represent programmer error, rather than predictable events which are at times unavoidable, such as a network timeout or a storage device that is damaged or not available.
- Implementation of an existing interface can be problematic if the implementation code requires a checked exception be thrown, and the method definition on the interface does not throw the correct exception.
- Using Checked Exceptions is often equivalent to “passing the buck” to the consumer of your Method, class, or API.
- The reasoning and actual implementation decisions regarding checked and unchecked exceptions are inconsistent, and no clear rules are available.
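The interface problem from the third point shows up with java.lang.Runnable, whose run() method declares no checked exceptions. In this sketch (ConfigLoader and loadSettings are invented names) the implementation cannot add a throws clause, so it has to catch the checked exception and either handle it or rethrow it wrapped in an unchecked one.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

final class ConfigLoader implements Runnable {

    // Hypothetical operation that can fail with a checked exception.
    private void loadSettings() throws IOException {
        throw new IOException("settings file missing");
    }

    // Runnable.run() has no throws clause, so writing
    // "public void run() throws IOException" would not compile.
    @Override
    public void run() {
        try {
            loadSettings();
        } catch (IOException e) {
            // The only options are to handle it here or to wrap it.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        try {
            new ConfigLoader().run();
        } catch (UncheckedIOException e) {
            System.out.println("Wrapped cause: " + e.getCause().getMessage());
        }
    }
}
```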
Cases Made in Favor of the Checked Exception Mechanism
The following points were made in support of the Checked Exception notion. Philosophically, I agree with them all. It sounds, though, like points 2 and 3 above may come into play more often than they should in actual practice, negating some of these philosophically sound points:
- When methods declare their exceptions, it forces the designer of an API to carefully consider what can go wrong and throw the appropriate exceptions.
- Exceptions, declared as part of a method signature, force the consumer of an API to anticipate and handle what may go wrong.
- A method signature represents a contract between the caller and the implementer. It defines the arguments required, the return type, and in the case of Java Checked Exceptions, makes the error cases visible. Requiring a consumer to address anticipatable exceptions theoretically should result in a more robust API (points 2 and 3 from the previous section notwithstanding).
Clearly, there are some good points on either side of this argument, and like in politics, I find myself stuck on the fence. We will examine my thoughts on this in a bit, after we walk through my current understanding of how things were supposed to be.
On Faults and Contingencies - The Way it was Supposed to Be
According to the article referenced in the link above, a central notion to effective usage of the Java exception architecture is the differentiation between Faults and Contingencies:
Contingency
An expected condition demanding an alternative response from a method that can be expressed in terms of the method's intended purpose. The caller of the method expects these kinds of conditions and has a strategy for coping with them.
Fault
An unplanned condition that prevents a method from achieving its intended purpose that cannot be described without reference to the method's internal implementation.
The article provides a simple example case in which an API defines a processCheck() method. processCheck will either process a check as requested by the client code, or throw one of two exceptions related to the problem domain: StopPaymentException or InsufficientFundsException. These are presented as examples of Contingencies for which any client calling upon this method should be prepared, and therefore, as exemplary models for Checked Exception usage.
The article additionally discusses a third possibility, in which database access, as part of the transaction processing performed by processCheck, utilizes the JDBC API. JDBC throws a single checked exception, SQLException, to report problems with accessing the data store. Therefore, our processCheck method is required to handle SQLException, or pass any such occurrence up the call stack by including SQLException in its throws clause. In this last case, client code is unlikely to have sufficient context to deal appropriately with whatever caused the SQLException (nor, for that matter, is the processCheck method itself) other than gracefully exiting and informing the calling procedure that something went wrong while accessing the database. This last case is an example of a fault.
To my way of thinking, Contingencies usually exist within the problem domain, and in fact client code calling upon an API is more likely to contain an effective strategy for dealing with them than is the API itself. Faults, on the other hand, represent unexpected conditions which, when our equipment and program are working as designed, should not occur at all. Note that I include “program working as designed” in that sentence. Programmer error and code bugs, for me, fall into this category.
An Overly Simple, Hacked Example
So, believing I have absorbed the ideas put forth in the article, I constructed a hasty example which extends the article's example into a bit of pseudo-code. Note that I am not presenting this as good code; it is a hack design at best, and greatly over-simplified. My objective is to illustrate the exception usage concepts under discussion. First is the CheckingAccount class:
A silly mockup of the CheckingAccount Class:
public class CheckingAccount
{
private String _accountID;
private double _currentBalance;
private ArrayList<Integer> _stoppedCheckNumbers;
public String getAccountID()
{
return _accountID;
}
public double getCurrentBalance()
{
return _currentBalance;
}
public void setAccountID(String accountID)
{
_accountID = accountID;
}
public void setCurrentBalance(double currentBalance)
{
_currentBalance = currentBalance;
}
public ArrayList<Integer> getStoppedCheckNumbers()
{
if(_stoppedCheckNumbers == null)
{
_stoppedCheckNumbers = new ArrayList<Integer>();
}
return _stoppedCheckNumbers;
}
public double processCheck(Check submitted)
throws InsufficientFundsException, StopPaymentException,
DatabaseAccessException
{
// Use the getter so the list is created on demand instead of being null:
if(getStoppedCheckNumbers().contains(submitted.getCheckNo()))
{
throw new StopPaymentException();
}
double newBalance = _currentBalance - submitted.getAmount();
if(newBalance < 0)
{
throw new InsufficientFundsException(_currentBalance,
submitted.getAmount(),
newBalance);
}
try
{
// <Code to Update Database to reflect current transaction>
}
catch(SQLException e)
{
// <Code to log SQLException details>
/*
* After logging and/or otherwise handling the SQL failure, throw
* an exception more appropriate to the context of the client code,
* which does not care about the details of the data access operation,
* only that the information could not be retrieved.
*/
throw new DatabaseAccessException("Database Error");
}
return newBalance;
}
}
In order to examine this in context, we will also want to look at a usage scenario with some mock client code. In order to do THAT, we also need a Bank class, which:
- Provides a static factory method to access CheckingAccount objects, and;
- Is also subject to the nefarious SQLException while doing so.
- Introduces yet another contingency - what if the account number submitted on a check does not exist? For this eventuality, we define a fourth Contingency Exception: InvalidAccountException.
The Bank Class (with pseudo-code):
public class Bank
{
public static CheckingAccount getCheckingAccount(String AccountID)
throws DatabaseAccessException, InvalidAccountException
{
CheckingAccount account = new CheckingAccount();
try
{
/*
* <Code to retrieve Account data from data store>
*/
if(/* no row is returned for AccountID */)
{
throw new InvalidAccountException();
}
// Use test data to initialize an account instance:
account.setAccountID("0001 1234 5678");
account.setCurrentBalance(500.25);
account.getStoppedCheckNumbers().add(1000);
}
catch(SQLException e)
{
// <Code to log SQLException details>
/*
* After logging and/or otherwise handling the SQL failure, throw
* an exception more appropriate to the context of the client code,
* which does not care about the details of the data access operation,
* only that the information could not be retrieved.
*/
throw new DatabaseAccessException("Database Error");
}
return account;
}
}
The Bank Class above provides the functionality needed to mock up some client code. I am not going to get all fancy with this, and the overall class structure is NOT what I am here to examine. We will pretend that the void main method used here actually represents some code in the service of a user interface, and see what our Checked Exception-heavy design looks like from the consumption standpoint:
Some Mock Client Code Consuming the Bank and CheckingAccount APIs:
public class MockClientCode
{
/*
* Assume this code is supporting UI operations.
*/
static String SYSTEM_ERROR_MSG_UI = ""
+ "The requested account is unavailable due to a system error. "
+ "Please try again later.";
static String INVALID_ACCOUNT_MSG_UI = ""
+ "The account number provided is invalid. Please try again.";
static String INSUFFICIENT_FUNDS_MSG_UI = ""
+ "There are insufficient funds in the account to process this check.";
static String STOP_PAYMENT_MSG_UI = ""
+ "There is a stop payment order on the check submitted. "
+ "The transaction cannot be processed.";
public static void main(String args[])
{
// Sample Data:
String accountID = "0001 1234 5678";
int checkNo = 1000;
double checkAmount = 100.00;
// Use test data to initialize a test check instance:
Check customerCheck = new Check(accountID, checkNo, checkAmount);
CheckingAccount customerAccount = null;
double newBalance;
try
{
customerAccount = Bank.getCheckingAccount(customerCheck.getAccountID());
newBalance = customerAccount.processCheck(customerCheck);
// Output transaction result to UI:
System.out.println("The transaction has been processed. New Balance is: "
+ DecimalFormat.getCurrencyInstance().format(newBalance));
}
catch (DatabaseAccessException e)
{
// Output the message to the user interface:
System.out.println(SYSTEM_ERROR_MSG_UI);
}
catch (InvalidAccountException e)
{
// Output the message to the user interface:
System.out.println(INVALID_ACCOUNT_MSG_UI);
}
catch (InsufficientFundsException e)
{
// Output the message to the user interface:
System.out.println(INSUFFICIENT_FUNDS_MSG_UI);
}
catch (StopPaymentException e)
{
// Output the message to the user interface:
System.out.println(STOP_PAYMENT_MSG_UI);
}
}
}
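The three listings above rely on a Check class and several exception types that I never show. Here is one way they might look. This is a sketch only: the fields, getters, and constructor signatures are assumptions inferred from how the listings call them, and nothing here comes from the Oracle article.

```java
// Hypothetical supporting types, inferred from the calls in the listings.

// A check presented for payment against an account.
class Check {
    private final String accountID;
    private final int checkNo;
    private final double amount;

    Check(String accountID, int checkNo, double amount) {
        this.accountID = accountID;
        this.checkNo = checkNo;
        this.amount = amount;
    }

    String getAccountID() { return accountID; }
    int getCheckNo() { return checkNo; }
    double getAmount() { return amount; }
}

// Contingency: the account number on the check is unknown to the bank.
class InvalidAccountException extends Exception {
}

// Contingency: a stop payment order exists for the check number.
class StopPaymentException extends Exception {
}

// Contingency: the account cannot cover the check. Carries the three
// figures that processCheck passes in, so client code can report them.
class InsufficientFundsException extends Exception {
    private final double currentBalance;
    private final double checkAmount;
    private final double resultingBalance;

    InsufficientFundsException(double currentBalance, double checkAmount,
            double resultingBalance) {
        super("Insufficient funds");
        this.currentBalance = currentBalance;
        this.checkAmount = checkAmount;
        this.resultingBalance = resultingBalance;
    }

    double getCurrentBalance() { return currentBalance; }
    double getCheckAmount() { return checkAmount; }
    double getResultingBalance() { return resultingBalance; }
}

// Raised in place of SQLException when the data store cannot be used.
class DatabaseAccessException extends Exception {
    DatabaseAccessException(String message) {
        super(message);
    }
}
```

All four exceptions extend Exception rather than RuntimeException, which is what makes them checked and subjects callers to the Check-or-Specify policy.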
As With Most Things, a Series of Trade-offs
The “design” above (I am using the term very loosely here) works in accordance with the Java Checked Exception mechanism, and specifically addresses most of the concerns discussed in the Oracle article. I included my own embellishment at the point where the SQLException is potentially thrown locally, by way of logging the data-access specifics for examination by the dev team and/or the dba, and throwing a more general contingency-type exception defined for the problem space (DatabaseAccessException) which can then be handled by client code in a proper context (“Sorry, we seem to be experiencing a system outage. Please try again later”).
Upsides:
On the upside, this code makes robust use of the Check-or-Specify policy built in to the Java environment. Code which retrieves an account through Bank.getCheckingAccount and then calls the public processCheck method will be required to handle all four contingency cases:
- The account number submitted on the check is not valid within the system
- There is a Stop Payment order on the check number being processed
- There are not sufficient funds available to cover the withdrawal
- There is a problem accessing account data (for whatever reason)
From an API design standpoint, this could be considered a good thing. Developers who may be utilizing this class as part of a library or framework will know immediately what contingencies they will have to address within their own code.
With respect to the SQLException (a fault, as opposed to a domain contingency), there is very little that can be done about this even at the point where it is first thrown, other than log the details and notify the calling method that there was a problem retrieving the requested data. In my mind, the farther this specific exception is allowed to propagate from its source, the less ability there is to do anything with the information it contains. So in my example, I use the try . . . catch block to deal with it as best I can, and then propagate a more appropriate contingency exception.
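The translation step described above can be sketched a little more fully. In this version the original SQLException is passed along as the cause, so the detail is not lost even though client code only sees the domain exception. The two-argument constructor, the AccountStore class, and its method names are all assumptions of this sketch, and the simulated failure stands in for real JDBC work.

```java
import java.sql.SQLException;

// Hypothetical domain exception. The (message, cause) constructor is an
// addition of this sketch; the listings in this post pass only a message.
class DatabaseAccessException extends Exception {
    DatabaseAccessException(String message, Throwable cause) {
        super(message, cause);
    }
}

class AccountStore {
    // Stand-in for real JDBC work; always fails so the path can be exercised.
    private static void updateBalance(String accountID, double newBalance)
            throws SQLException {
        throw new SQLException("Connection refused", "08001");
    }

    // Translate the low-level fault at the point where it occurs.
    static void saveBalance(String accountID, double newBalance)
            throws DatabaseAccessException {
        try {
            updateBalance(accountID, newBalance);
        } catch (SQLException e) {
            // Log the details here, then rethrow in domain terms. Passing e
            // as the cause keeps the original exception reachable.
            throw new DatabaseAccessException("Database Error", e);
        }
    }

    // Helper for the demo: returns what a caller would see.
    static String describeFailure() {
        try {
            saveBalance("0001 1234 5678", 400.25);
            return "saved";
        } catch (DatabaseAccessException e) {
            return e.getMessage() + " caused by "
                    + e.getCause().getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeFailure());
    }
}
```

Client code still catches only DatabaseAccessException, but anyone debugging can reach the underlying SQLException through getCause().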
Also, while the wealth of business-logic-related exceptions is a bit strange looking from my C# background, the code in the mocked up client class is actually pretty easy to read, and figure out what is going on.
Downsides:
On the other hand, what should be a (relatively) simple class structure actually introduces a total of SIX new types into our project:
- The CheckingAccount class itself
- The Bank Class
- Four new exception types:
- InvalidAccountException
- StopPaymentException
- InsufficientFundsException
- DatabaseAccessException
Client code attempting to use our CheckingAccount class must be aware of all four exception types, which increases the number of dependencies within a client class. While not serious in itself, as a project grows larger the number of new types created as contingency exceptions may grow large, and the number of dependencies may grow proportionately. All in all, this could substantially increase project complexity. As one Reddit commenter pointed out, one of the problems with implementations of the Checked Exception mechanism in Java is that it results in what is essentially a “shadow type system.” Hard to disagree, in light of the example Oracle provides us with.
On top of this, the excessive number of catch clauses is almost as bad as introducing a big switch statement.
Also, within the Oracle article, and within this class definition, what we are essentially doing through the use of “contingency exceptions” is using the exception mechanism to address business logic concerns.
In my quest to learn and become a better developer, I undertook an excursion into Java-land. Partly this was driven by the fact that Java is the language used for Android app development, and partly because it was time to branch out and explore my first non-Microsoft-driven development platform. Until sometime in 2010, I had only worked within various Microsoft languages, mostly VB and C# .NET and VBA/VB6.
As with any significant change, there was some initial frustration. However, as I became accustomed to the Java way of doing things, I discovered a few implementation gems within the language which I really liked. Chief among these was the exception handling model.
Java identifies two categories of exception: the Checked Exception, which well designed code should anticipate and handle, and Unchecked Exceptions, which arise from errors external to the system, or within the runtime execution of the program, and which are difficult to anticipate and/or handle in any practical sense.
Java REQUIRES that any method which might potentially encounter or throw a Checked Exception adhere to a Check or Specify policy (the official Java Tutorials call this the Catch or Specify Requirement). What this means is that any method which throws such an exception must declare it in a throws clause as part of the method signature, and that client code consuming the method must either handle the exception, or in turn declare that it may throw the same exception.
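As a quick illustration of the two options, here is a small sketch built on SimpleDateFormat.parse, which declares the checked ParseException. The class and method names (CheckOrSpecifyDemo, parseDate, isValidDate) are made up for this example.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

class CheckOrSpecifyDemo {
    // Option 1, "Specify": declare the checked exception in the signature
    // and leave the handling to whoever calls this method.
    static Date parseDate(String text) throws ParseException {
        return new SimpleDateFormat("MM/dd/yyyy").parse(text);
    }

    // Option 2, "Check": handle the exception here, so this method needs
    // no throws clause of its own.
    static boolean isValidDate(String text) {
        try {
            parseDate(text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidDate("01/01/2011"));
        System.out.println(isValidDate("not a date"));
    }
}
```

Removing both the throws clause from parseDate and the try . . . catch from isValidDate would leave the ParseException neither checked nor specified, and the compiler would reject the class.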
A Trivial Example for Comparison Part I - The C# Way:
By way of illustrating the difference between exception handling in C# and that of Java, we will create a simple library class called RentalAgreement (We are really just focusing on exception handling here, and this is a REALLY trivial example, so I don't wanna hear about problems with the business logic, or clunkiness of the examples!) A rental agreement has a start date and an end date (and some additional information, but for the purpose of brevity, we will leave our class at that for the moment).
C# Example 1 – Basic C# Code:
public class RentalAgreement
{
private DateTime _startDate;
private DateTime _endDate;
public RentalAgreement(DateTime StartDate, DateTime EndDate)
{
_startDate = StartDate;
_endDate = EndDate;
}
}
The constructor for this class accepts two arguments, a start date and an end date. Since our business model dictates that the end date must occur AFTER the start date, we might want to set up some exception handling to ensure a valid range between the start and end dates. Our code can then propagate an exception to any client code if such an event occurs. We'll modify our class slightly, adding a local function to compare two dates for precedence, and an if/else throw block in the constructor:
C# Example 2 – Improved C# Code:
public class RentalAgreement
{
private DateTime _startDate;
private DateTime _endDate;
public RentalAgreement(DateTime StartDate, DateTime EndDate)
{
//Use local function NoPrecedence to compare the start and end dates:
if (this.NoPrecedence(StartDate, EndDate))
{
_startDate = StartDate;
_endDate = EndDate;
}
else
{
// If the end date occurs before the start date, let client
// code know about it:
throw (new Exception("The end date cannot occur before the start date"));
}
}
private bool NoPrecedence(DateTime StartDate, DateTime EndDate)
{
// True when the end date does NOT precede the start date:
return EndDate >= StartDate;
}
}
The exception thrown in the constructor will propagate up the call stack to the client code, which can then implement some well-thought-out handling. Or not. It could be that our erstwhile developer might have overlooked the need to validate user input, or otherwise missed the potential exception case. In any case, the following test code mimics what might happen if a user were to enter a start date of 1/1/2011, and an end date of 12/31/2010:
C# Example 3 – Bad, BAD Client Code:
private void button1_Click(object sender, EventArgs e)
{
DateTime startDate = new DateTime(2011, 1, 1);
DateTime endDate = new DateTime(2010, 12, 31);
RentalAgreement rentalAgreement = new RentalAgreement(startDate, endDate);
MessageBox.Show("Start Date = "
+ startDate.ToShortDateString()
+ " : End Date = " + endDate.ToShortDateString());
}
However it happened, our hapless user, on entering the above incorrect date combination, would be faced with THIS ugliness:
![Exception-Message-to-User-C-Sharp-Ex[2] Exception-Message-to-User-C-Sharp-Ex[2]](http://www.typecastexception.com/image.axd?picture=Exception-Message-to-User-C-Sharp-Ex%5B2%5D_thumb.png)
Of course, all of this might be averted if our developer implements some exception handling in his client code:
C# Example 4 – Much Better Client Code (kind of):
private void button1_Click(object sender, EventArgs e)
{
DateTime startDate = new DateTime(2011, 1, 1);
DateTime endDate = new DateTime(2010, 12, 31);
try
{
RentalAgreement rentalAgreement = new RentalAgreement(startDate, endDate);
MessageBox.Show("Start Date = "
+ startDate.ToShortDateString()
+ " : End Date = " + endDate.ToShortDateString());
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
// Now do some stuff to reset the GUI so that the user can see exactly
// where they screwed up . . .
}
}
A Trivial Example for Comparison Part II – The Java Way
The Java version of our class differs only slightly from the C# code. If we add the throw statement to the else block before we add the throws declaration to the method signature, the compiler (I am using Eclipse) reports an unhandled exception, and the class will not compile. This is the Check or Specify policy informing us that we either need to handle the exception condition within the current method, or specify in the method signature that there is potential for the exception to occur, and that client code must provide the handling mechanism. Once we add the throws declaration in the method signature, everything is fine again:
Java Example #1 – with throws keyword
public class RentalAgreement
{
private Calendar startDate;
private Calendar endDate;
// Note the throws clause of the method signature:
public RentalAgreement(Calendar StartDate, Calendar EndDate) throws Exception
{
if (this.NoPrecedence(StartDate, EndDate))
{
startDate = StartDate;
endDate = EndDate;
}
else
{
// Because this method throws a checked exception, we are REQUIRED
// to either handle the exception condition or specify in the method
// signature that the exception might be thrown.
throw (new Exception("The end date cannot occur before the start date"));
}
}
private boolean NoPrecedence(Calendar StartDate, Calendar EndDate)
{
// True when the end date does NOT precede the start date:
return !EndDate.before(StartDate);
}
}
Now, we have defined a library class containing a method which throws a checked exception. Next, let's create another silly piece of code which consumes the class, mimicking some faulty user input:
Java Example #2 – Consuming the Method
public static void main(String[] args)
{
//Mimic some user input:
Calendar startDate = Calendar.getInstance();
startDate.set(Calendar.YEAR, 2011);
startDate.set(Calendar.MONTH, Calendar.JANUARY);
startDate.set(Calendar.DAY_OF_MONTH, 1);
Calendar endDate = Calendar.getInstance();
endDate.set(Calendar.YEAR, 2010);
endDate.set(Calendar.MONTH, Calendar.DECEMBER);
endDate.set(Calendar.DAY_OF_MONTH, 31);
//Attempt to create an instance of the RentalAgreement class:
RentalAgreement newRentalAgreement = new RentalAgreement(startDate, endDate);
}
The compiler flags our code at the point where we attempt to create an instance of the RentalAgreement class, and the code will not compile as written. Why? Because the constructor of the RentalAgreement class declares that it might throw an exception, and we have not provided a handling or propagation mechanism for this. Our client code is REQUIRED by Java to either Check the exception (most often with a try . . . catch block) or Specify, as part of its own method signature, that the exception may be raised by the current method. Since the current code represents the application entry point, there is no caller of ours left to handle the exception, so we will provide some graceful handling (with a try . . . catch block) before our application will compile:
Java Example #3 – Client Code with Exception Handling:
public static void main(String[] args)
{
//Mimic some user input:
Calendar startDate = Calendar.getInstance();
startDate.set(Calendar.YEAR, 2011);
startDate.set(Calendar.MONTH, Calendar.JANUARY);
startDate.set(Calendar.DAY_OF_MONTH, 1);
Calendar endDate = Calendar.getInstance();
endDate.set(Calendar.YEAR, 2010);
endDate.set(Calendar.MONTH, Calendar.DECEMBER);
endDate.set(Calendar.DAY_OF_MONTH, 31);
//Attempt to create an instance of the RentalAgreement class:
try
{
RentalAgreement newRentalAgreement = new RentalAgreement(startDate, endDate);
SimpleDateFormat formatter = new SimpleDateFormat("MM/dd/yyyy");
System.out.print("Start Date = " +
formatter.format(startDate.getTime()) +
" : End Date = " +
formatter.format(endDate.getTime()));
}
catch (Exception ex)
{
// Inform the user about the error of their ways:
System.out.print(ex.getMessage());
}
}
The code in Java Example #3 compiles and runs properly.
Note that not ALL exceptions receive this special treatment within Java. Unchecked Exceptions, which derive from java.lang.RuntimeException or java.lang.Error, do NOT require adherence to the Check or Specify policy. In fact, because the requirement to build the Check or Specify mechanism into code is often viewed as a pain in the ass, some Java developers tend to write code which throws RuntimeExceptions where in fact a checked exception is warranted, or derive their own Exception classes from RuntimeException in order to avoid writing a bunch of handling code and/or adding the throws clause to method signatures.
In my humble opinion, these folks are depriving themselves (and more importantly, consumers of their code) of one of the more useful benefits of the Java language architecture. Yes, it IS a pain in the ass to follow up and Check/Specify all those Checked Exceptions. But this requires us to construct better code, in which many of the exceptional cases are either pinned down with proper handling, or eliminated through design improvements and structural code changes. This ALSO provides an informative mechanism for developers who may use our libraries in their own applications, through which they will know straight away what type of exception to expect when calling one of our methods.
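For contrast, here is a sketch of the unchecked approach described above. Because the exception derives from RuntimeException, the method needs no throws clause and callers compile without any handling at all. The names (InvalidDateRangeException, UncheckedDemo, rentalDays) are hypothetical, and dates are reduced to plain day numbers to keep the example short.

```java
// Hypothetical unchecked exception: it extends RuntimeException, so the
// compiler does not enforce Check or Specify for it.
class InvalidDateRangeException extends RuntimeException {
    InvalidDateRangeException(String message) {
        super(message);
    }
}

class UncheckedDemo {
    // No throws clause is required, so the signature gives callers no hint
    // that this method can fail.
    static long rentalDays(long startDay, long endDay) {
        if (endDay < startDay) {
            throw new InvalidDateRangeException(
                    "The end date cannot occur before the start date");
        }
        return endDay - startDay;
    }

    // Compiles without try/catch, and fails only at run time when the
    // range is invalid.
    static long unguardedCall(long startDay, long endDay) {
        return rentalDays(startDay, endDay);
    }

    // Handling is still possible, but nothing forces a caller to write it.
    static String guardedCall(long startDay, long endDay) {
        try {
            return rentalDays(startDay, endDay) + " days";
        } catch (InvalidDateRangeException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(guardedCall(5, 10));
        System.out.println(guardedCall(10, 5));
    }
}
```

Nothing in the signature of rentalDays tells a library consumer that unguardedCall is a risk, which is exactly the information a checked exception would have provided.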
I am a strong fan of C# and .NET in general. But one area where the designers of the Java language got things right was in requiring such handling of exceptions, and the Check/Specify policy.