By Derek Ashmore
April 1, 2000 12:00 AM EST
As a consultant, developer and database administrator, I've often been asked to provide coding guidelines and tuning assistance for Java code that utilizes JDBC. Over time, I've been introduced to or developed standard coding practices that make JDBC code faster and less error-prone, and easier to read, understand and use. This article documents some of the more important "best practices" for using JDBC libraries to perform database access. As most of my clients are using Oracle database technologies, I've included several practices that are Oracle-specific.
For the purposes of this article the goals of best practices for JDBC programming are maintainability, portability and performance.
- Maintainability refers to the ease with which developers can understand, debug and modify JDBC code that they didn't write.
- Portability refers to the ease with which JDBC code can be used with multiple databases. It turns out that JDBC doesn't make database programming as platform independent as I'd like. In addition, I consider portability a noble goal even if you have no current plans to support multiple databases. Who knows how long your code will be around and what kinds of changes will have to be made to it?
- Performance refers to optimizing the speed and/or memory needed to run JDBC code.
Best Practices for JDBC Programming
The most common recommendations I make to Java programmers using JDBC are the following (discussed individually later):
- Use host variables for literals - avoid hard-coding them (Oracle specific).
- Always close statements, prepared statements and connections.
- Consolidate formation of SQL statement strings.
- Use the delegate model for database connection.
- Use Date, Time and Timestamp objects as host variables for temporal fields (avoid using strings).
- Limit use of column functions.
- Always specify a column list with a select statement (avoid "select *").
- Always specify a column list with an insert statement.
Use Host Variables for Literals - Avoid Hard-Coding Them (Oracle Specific)
I recommend that developers use host variables in SQL statements instead of hard-coding literals in SQL strings. As a convenience, many developers embed literals in SQL statements instead; the following code is an example. While using host variables greatly improves Oracle performance, it won't hurt performance on any other database platform that I'm aware of. Note that this example places a user ID directly in the SQL statement. (As an aside, note that this example uses the "+" operator for string concatenation. While this is convenient, using StringBuffers and the StringBuffer.append() method is a faster way to concatenate strings.)
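// Anti-pattern: the user ID is embedded in the SQL text as a literal.
stmt = dbconnection.createStatement();
rst = stmt.executeQuery("select count(*) from portfolio_info where USER_ID = " + userID);
rst.next(); // position the cursor on the single result row
count = rst.getInt(1);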
To get the benefit of Oracle's optimizations, we need to use PreparedStatements instead of Statements for SQL that will be executed multiple times. Furthermore, we need to use host variables instead of literals for values that will change between executions. In the code above the SQL statement for user ID 1 will be different from the one for user ID 2 ("where USER_ID = 1" is different from "where USER_ID = 2"). A better way to approach this SQL statement is the following:
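// The SQL text is identical for every user ID; only the bound value changes.
pstmt = dbconnection.prepareStatement("select count(*) from portfolio_info where USER_ID = ?");
pstmt.setDouble(1, userID);
rst = pstmt.executeQuery();
rst.next(); // position the cursor on the single result row
count = rst.getInt(1);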
In this code, because we're using host variables instead of literals, the SQL statement is identical no matter what the qualifying user ID is. Furthermore, we used a PreparedStatement instead of a statement. So that we can better understand the source of the performance benefit, let's walk through how SQL statements are processed by the Oracle optimizer. When SQL statements are executed, Oracle will execute (roughly speaking) the following steps:
1. Look up the statement in the shared pool to see if it has already been parsed or interpreted. If yes, Oracle will go directly to step 4.
2. Parse (or interpret) the statement.
3. Figure out how it will get the data you want; record that information in a portion of memory called the shared pool.
4. Get your data.
When Oracle looks up a SQL statement to see if it has already been parsed (step 1), it attempts a character-by-character match of the SQL statement text. If it finds a match, it can use the parse information already in the shared pool and doesn't have to do steps 2 and 3 above because the work has already been done. If you hard-code literals in your SQL statements, the probability of finding a match is very low ("where USER_ID = 1" isn't the same as "where USER_ID = 2"). This means that Oracle will have to reparse the first code example for each different user ID. Had the code used host variables, that statement (which would look something like "where USER_ID = :1" in the shared pool) would have been parsed once and only once.
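The text-matching behavior described above can be mimicked in plain Java. The following is only a rough sketch, not Oracle code: it counts distinct statement texts as a stand-in for the character-by-character comparison against the shared pool. The class and method names (SqlTextReuseDemo, literalSql, hostVariableSql, distinctTexts) are hypothetical and exist only for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: a set of SQL texts stands in for the shared pool.
class SqlTextReuseDemo {

    // Builds the statement with the user ID embedded as a literal.
    static String literalSql(int userId) {
        return "select count(*) from portfolio_info where USER_ID = " + userId;
    }

    // Builds the statement with a host variable; the text never changes.
    static String hostVariableSql() {
        return "select count(*) from portfolio_info where USER_ID = ?";
    }

    // Counts how many distinct statement texts a run over the given IDs produces.
    static int distinctTexts(int[] userIds, boolean useHostVariable) {
        Set<String> texts = new HashSet<String>();
        for (int i = 0; i < userIds.length; i++) {
            texts.add(useHostVariable ? hostVariableSql() : literalSql(userIds[i]));
        }
        return texts.size();
    }

    public static void main(String[] args) {
        int[] ids = {1, 2, 3, 4, 5};
        System.out.println("distinct texts with literals:       " + distinctTexts(ids, false));
        System.out.println("distinct texts with host variables: " + distinctTexts(ids, true));
    }
}
```

With five different user IDs, the literal form produces five distinct statement texts, each of which would need its own parse, while the host-variable form produces one.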
I've experienced anywhere from a 5% to a 25% performance increase by writing SQL statements that are reusable (results vary with transaction volume, number of users, network latency and many other things). More information on this can be found in the Oracle Tuning manual. Within this manual look at the "Writing Identical SQL Statements" subheading within the "Tuning the Shared Pool" section.
While this best practice is Oracle-specific, many database platforms optimize preparing and reusing similar SQL statements. Most database platforms do this by optimizing reuse of PreparedStatement objects. Some databases, such as Cloudscape, optionally will store prepared statements in the database so they can be reused and shared by many users. Following this practice won't hurt performance with any database platform I'm aware of.
Always Close Statements, Prepared Statements and Connections
Many databases allocate resources to servicing statements, prepared statements and connections, and many continue to hold those resources for a period of time if these objects aren't closed after use. With Oracle databases it's possible to get a "maximum open cursors exceeded" error (ORA-01000) when you don't close statements or prepared statements. In addition, with Oracle databases, unclosed connections stay around on the server. Following this practice reduces the time and resources spent on maintenance to keep these errors from happening.
An example can be found in Listing 1. Note that I use a "finally" block to close the PreparedStatement. I don't close the connection in the example method as it is used elsewhere in the application. Note also that I call a utility to close the PreparedStatement for me. The code for this utility can be found in Listing 2. I use a utility to do the close so I don't have to replicate the exception-catching code everywhere.
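For readers without the listings at hand, the pattern might look roughly like the sketch below. This is not the author's Listing 1 or Listing 2; the names DbUtil, PortfolioDao and countPortfolios are hypothetical, and the user ID is assumed to be an int.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical utility: keeps the exception-catching code in one place.
class DbUtil {
    // Closes a Statement (PreparedStatement extends Statement).
    // Returns true if the statement was closed without error.
    static boolean close(Statement stmt) {
        if (stmt == null) {
            return false; // nothing was opened, so nothing to close
        }
        try {
            stmt.close();
            return true;
        } catch (SQLException e) {
            System.err.println("Could not close statement: " + e.getMessage());
            return false;
        }
    }
}

// Hypothetical caller showing the "finally" block.
class PortfolioDao {
    private static final String COUNT_SQL =
        "select count(*) from portfolio_info where USER_ID = ?";

    // The connection belongs to the caller and is deliberately not closed here.
    static int countPortfolios(Connection conn, int userId) throws SQLException {
        PreparedStatement pstmt = null;
        try {
            pstmt = conn.prepareStatement(COUNT_SQL);
            pstmt.setInt(1, userId);
            ResultSet rst = pstmt.executeQuery();
            rst.next(); // count(*) always returns exactly one row
            return rst.getInt(1);
        } finally {
            // Runs on success and on exception; closing the statement
            // also closes the ResultSet it produced.
            DbUtil.close(pstmt);
        }
    }
}
```

Because the close call sits in the "finally" block, the statement is released even when executeQuery() throws.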
Consolidate Formation of SQL Statement Strings
As a database administrator, a substantial portion of my time is spent reading the code of others and suggesting ways to improve performance. As you might expect, looking at the SQL statements being issued is of particular interest to me. It's hard to follow SQL statements that are constructed by string manipulation scattered over several methods. Developers who maintain this kind of code must have the same problem. It greatly enhances readability if you consolidate the logic that forms the SQL statement in one place.
Listing 2 is a good example of this point. The string manipulation to form the SQL statement is located in one place, and the SQL statement logic is in a separate static block instead of within the method itself. This is done to reduce the number of times this string concatenation happens. Also note that StringBuffers are used for the string manipulation, not Strings. StringBuffers are more efficient at string concatenation than Strings are. In a project I recently completed the development team adopted this convention of consolidating SQL statements in static blocks directly above the method in which they were used. We found this practice quite readable and maintainable.
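As a rough illustration of the convention (not the author's listing), the SQL text can be assembled once in a static block that sits directly above the method using it. The class name PortfolioSql and the constant name COUNT_BY_USER are hypothetical.

```java
// Hypothetical holder class: the whole statement is formed in one place,
// once, when the class is loaded.
class PortfolioSql {

    static final String COUNT_BY_USER;

    static {
        // StringBuffer is used here to follow the article's convention.
        StringBuffer sql = new StringBuffer();
        sql.append("select count(*) ");
        sql.append("from portfolio_info ");
        sql.append("where USER_ID = ?");
        COUNT_BY_USER = sql.toString();
    }

    public static void main(String[] args) {
        // Anyone reviewing the code can read the full statement here.
        System.out.println(COUNT_BY_USER);
    }
}
```

A reviewer can then read the complete statement in one spot instead of tracing string manipulation through several methods.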
Use Delegate Model for Database Connection
I recently had the task of making the same application runnable on Oracle 8i, Cloudscape and Oracle Lite with as few modifications to existing code as possible. The development team wanted to avoid making JDBC-related classes platform-aware. In addition, the team wanted to take advantage of some platform-specific features, such as array processing and write batching in Oracle 8i, in special cases.
I was able to port the application to multiple environments largely through manipulation of one class responsible for managing our database connection. We had the foresight to create a delegate class for java.sql.Connection that manages needed connection functions and allows us to take advantage of platform-specific performance-tuning enhancements. All of our code used the delegate, not a native JDBC connection, as illustrated in Figure 2. While the specific class used for the project is proprietary, I've created another delegate, dvt.util.db.Connection, that illustrates the concept for the purposes of this article. The source for this delegate can be found in Listing 3.
Note that dvt.util.db.Connection determines which database platform is being used. If the platform is Oracle 8i, I establish array processing by setting the default row prefetch size (available with Oracle database connections) to improve the performance of our "select" statements. I also establish write batching to improve performance of update, insert and delete statements.
Since I consolidate the platform-specific code in my connection object delegate, classes that use my connection delegate don't need to be platform specific. In case they do, however, developers can use getPlatform() to get information about the database platform being used. Furthermore, I can add support for additional database platforms (e.g., Cloudscape and Sybase) largely by changing this class. The connection delegate won't solve all portability issues, but it will solve a good percentage of them.
I recommend using a connection delegate even for projects that currently support only one database platform. As we saw from recent Y2K efforts, you may find that your code is used for longer than you think, and used in other applications down the road.
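The shape of such a delegate might look roughly like the sketch below. This is not the source of dvt.util.db.Connection from Listing 3; the class name ConnectionDelegate, the classify() helper and the platform constants are hypothetical, and platform detection through DatabaseMetaData.getDatabaseProductName() is an assumption about one reasonable way to do it. Only getPlatform() is named in the article.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Locale;

// Hypothetical delegate: application code talks to this class,
// never to the native JDBC connection.
class ConnectionDelegate {
    static final String ORACLE = "ORACLE";
    static final String OTHER = "OTHER";

    private final Connection conn;
    private final String platform;

    ConnectionDelegate(Connection conn) throws SQLException {
        this.conn = conn;
        this.platform = classify(conn.getMetaData().getDatabaseProductName());
        if (ORACLE.equals(platform)) {
            // Assumption: vendor-specific tuning (row prefetch, write
            // batching) would be applied here through the vendor's driver
            // classes, keeping that code out of the rest of the application.
        }
    }

    // Maps the driver-reported product name to a platform constant.
    static String classify(String productName) {
        if (productName != null
                && productName.toUpperCase(Locale.ENGLISH).indexOf("ORACLE") >= 0) {
            return ORACLE;
        }
        return OTHER;
    }

    // Lets the rare platform-aware caller ask which database is in use.
    String getPlatform() {
        return platform;
    }

    // Pass-through methods for the connection functions the application needs.
    PreparedStatement prepareStatement(String sql) throws SQLException {
        return conn.prepareStatement(sql);
    }

    void commit() throws SQLException {
        conn.commit();
    }

    void close() throws SQLException {
        conn.close();
    }
}
```

Supporting another platform then means adding a constant and a branch in this one class rather than touching every class that issues SQL.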
Use Date, Time and Timestamp Objects as Host Variables for Temporal Fields (Avoid Using Strings)
For convenience, I've seen many developers use strings as host variables to represent dates, times and timestamps. I think they consider java.sql.Date, Time and Timestamp awkward. I agree with them from a coding perspective. Unfortunately, using strings as host variables for temporal fields can affect data access performance.
The following code snippet contains a SQL statement meant for an Oracle platform that uses a string variable to represent a DATE field. Without an understanding of how the database optimizers work, this appears to be an acceptable coding technique. For the small inconvenience of using a "to_char" function in the SQL statement, we avoid the Java work of converting a java.sql.Date or Timestamp into a more easily displayable data type elsewhere in the code.
Where to_char(sale_dt,'YYYY-MM-DD') >= ?
Unfortunately, Oracle and most database optimizers can't use an index to speed up performance of the query in this snippet. The database will have to read all rows of the order_sales table and convert the sale_dt of every row to a string before it can do the comparison to see which rows satisfy the where clause of the query.
If we rewrite the query in the snippet to use a java.sql.Timestamp host variable, Oracle (and most of the common database platforms) will use an index and significantly improve performance in most cases, as follows:
Where sale_dt >= ?
For applications that use Oracle exclusively, I recommend using java.sql.Timestamp exclusively. Oracle's DATE data type actually contains time information (hours, minutes, seconds) as well as date information. Most other database platforms would call this type of field a TIMESTAMP. Oracle has no direct counterpart for a DATE (which has year, month and day only) and TIME data type offered by other platforms.
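On the Java side, binding a java.sql.Timestamp might look like the following sketch. The table and column names (order_sales, sale_dt) come from the text above; the class name SalesQuery, the helper startOfDay(), the count(*) select list and the choice of midnight as the time portion are assumptions made for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;

// Hypothetical query class: the date is converted once in Java, so the
// where clause compares the bare column and an index on sale_dt stays usable.
class SalesQuery {
    static final String SALES_SINCE_SQL =
        "select count(*) from order_sales where sale_dt >= ?";

    // Converts a 'YYYY-MM-DD' string to a Timestamp at midnight.
    static Timestamp startOfDay(String isoDate) {
        return Timestamp.valueOf(isoDate + " 00:00:00");
    }

    static int countSalesSince(Connection conn, String isoDate) throws SQLException {
        PreparedStatement pstmt = null;
        try {
            pstmt = conn.prepareStatement(SALES_SINCE_SQL);
            pstmt.setTimestamp(1, startOfDay(isoDate)); // temporal host variable
            ResultSet rst = pstmt.executeQuery();
            rst.next(); // count(*) always returns exactly one row
            return rst.getInt(1);
        } finally {
            if (pstmt != null) {
                pstmt.close();
            }
        }
    }
}
```

The string-to-date conversion happens once in Java rather than once per row in the database.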
Limit Use of Column Functions
I generally recommend that developers limit use of column functions to the select lists of select statements. Moreover, I tend to stick to aggregate functions (e.g., count, sum, average) needed for select statements that use a "group by" clause. I make this recommendation for two reasons: performance and portability. Limiting function use to select lists (and keeping it out of where clauses) means that the use of a function won't block the use of an index. In the same way that the use of the "to_char" function prohibited the database from using an index in the earlier code snippet, column functions in where clauses likely prohibit the database from using an index.
In addition, many of the operations for which developers use SQL column functions (data type conversion, value formatting, etc.) are faster in Java than if the database did them. I've had between a 5% and a 20% performance improvement in many applications by opting to avoid some column functions and implementing the logic in Java instead. Another way to look at it is that column functions aren't tunable as we don't control the source code. Implementing that logic in Java makes it code that we can tune if need be.
Moreover, using non-ANSI-standard column functions can also cause portability problems. There are large differences in which column functions are implemented by the database vendors. For instance, one of my favorite Oracle column functions, "decode", which allows you to translate one set of values into another, isn't implemented in many of the other major database platforms. In general, column function use such as the use of "decode" has the potential to become a portability issue.
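As one possible illustration, a translation that might otherwise be written with Oracle's decode(status_cd, 'A', 'Active', 'I', 'Inactive', 'Unknown') can be done in Java after the raw column value is fetched. The column name status_cd, the code values and the class name StatusDecoder are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical Java-side replacement for a "decode" column function.
class StatusDecoder {
    private static final Map<String, String> LABELS = new HashMap<String, String>();

    static {
        LABELS.put("A", "Active");
        LABELS.put("I", "Inactive");
    }

    // Translates a raw code to its display label; anything unmapped,
    // including null, falls through to the default, as decode's last argument would.
    static String decode(String code) {
        String label = LABELS.get(code);
        return label != null ? label : "Unknown";
    }

    public static void main(String[] args) {
        System.out.println(decode("A"));
        System.out.println(decode("X"));
    }
}
```

The select statement then returns the plain column and runs unchanged on platforms that lack "decode", and the translation logic is code the team can tune.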
Always Specify a Column List with a Select Statement (Avoid "Select *")
A common shortcut for developers is to use the "*" in select statements to avoid having to type out a column list. The line below illustrates this shortcut while the snippet immediately following illustrates the alternative where desired columns are explicitly listed.
Select * from customer
Select last_nm, first_nm, address, city, state, customer_nbr from customer
I recommend that developers explicitly list columns in select statements as illustrated above. The reason is that if the columns in any of the tables in the select are reordered or new columns are added, the results obtained with the select-asterisk shortcut will change and the class will have to be modified. For example, suppose a database administrator changes the order of the columns and puts column customer_nbr first (there are valid reasons why a DBA could reorder columns). In addition, suppose the DBA adds a column called country. The developer who used the shortcut select * from customer will have to change code. All the offset references used in processing the ResultSet will change. The developer who explicitly listed all columns can be oblivious to the change because the code will still work.
Explicitly listing columns in a select statement is a best practice because it prevents the need for maintenance in some cases.
Always Specify a Column List with an Insert Statement
A common shortcut for developers is to omit the column list in insert statements to avoid having to type out a column list. By default, the column order is the same as physically defined in the table. The first snippet below illustrates this shortcut while the next one illustrates the alternative where desired columns are explicitly listed.
Insert into customer
Values ('Ashmore','Derek','3023 N. Clark','Chicago','IL', 555555)

Insert into customer
(last_nm, first_nm, address, city, state, customer_nbr)
Values (?,?,?,?,?,?)
I recommend that developers explicitly list columns in insert statements as illustrated in the second snippet above. The reason is the same as why we should explicitly list columns in select statements. If the columns in the table are reordered or new columns are added, the insert could generate an exception and the class containing the insert will have to be modified. For example, suppose a DBA, as in the previous example, changes the order of the columns, puts column customer_nbr first and adds a column called country. The developer who used the shortcut in the first snippet above will have to change code. The developer who explicitly listed all columns may be oblivious to the change because the code may still work. In addition, note that the version in the second snippet above uses host variables so the same PreparedStatement can be used for all inserts if there are multiple inserts.
Explicitly listing columns in an insert statement is a best practice because it prevents the need for maintenance in many cases.
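In JDBC code, an insert with an explicit column list and host variables might be sketched as follows. The table and column names come from the snippets above; the class name CustomerInsert, the method names and the Java types chosen for each column are assumptions.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical insert class: each host variable position is tied to the
// explicit column list, not to the table's physical column order.
class CustomerInsert {
    static final String INSERT_SQL =
        "insert into customer "
        + "(last_nm, first_nm, address, city, state, customer_nbr) "
        + "values (?, ?, ?, ?, ?, ?)";

    // Returns the number of rows inserted.
    static int insert(Connection conn, String lastNm, String firstNm,
                      String address, String city, String state,
                      int customerNbr) throws SQLException {
        PreparedStatement pstmt = null;
        try {
            pstmt = conn.prepareStatement(INSERT_SQL);
            pstmt.setString(1, lastNm);
            pstmt.setString(2, firstNm);
            pstmt.setString(3, address);
            pstmt.setString(4, city);
            pstmt.setString(5, state);
            pstmt.setInt(6, customerNbr);
            return pstmt.executeUpdate();
        } finally {
            if (pstmt != null) {
                pstmt.close();
            }
        }
    }

    // Sanity-check helper: counts the host variables in a statement.
    static int countPlaceholders(String sql) {
        int count = 0;
        for (int i = 0; i < sql.length(); i++) {
            if (sql.charAt(i) == '?') {
                count++;
            }
        }
        return count;
    }
}
```

If a DBA later reorders the table's columns, this statement still binds each value to the column named in the list.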
Recommendations for Stored Procedure Usage
Stored procedure programming languages (such as Oracle's PL/SQL) are handy and in many cases very convenient. I use them often for utility scripts and data-cleansing activities. I'm often asked about recommendations for stored procedure use in applications, but as their capabilities differ greatly among the major database platforms, I can't give platform-independent advice on the subject. I can, however, provide some thoughts on stored procedure use as it relates to portability and performance.
As these languages differ so greatly, their use within applications causes portability issues. For instance, some stored procedure languages allow procedures to return result sets, some do not. Some stored procedure languages allow temporary tables (usable within the current session only), some do not. We could find many more differences, but I think the point is clear. If portability is a concern, I recommend avoiding use of stored procedures except for database triggers.
Performance is a tougher issue because it differs radically between database vendors. Stored procedure use for some database platforms enhances performance; in others it degrades it. For Oracle platforms I advocate stored procedures within Java applications for database triggers only. For most other situations their use provides no benefit. If you want a more detailed discussion on when and how to use stored procedures, functions and packages within Oracle databases, see my article in JDJ December 1999 (Vol. 4, issue 12).
This article has discussed several ways to make JDBC code more performance-, maintenance- and portability-friendly on an individual basis. I always recommend team code reviews and documented coding standards as ways to develop more best practices and consistently apply existing practices. Furthermore, team code reviews help further the goals of best practices by improving the maintainability and general quality of code within an application.