|By Derek Ashmore||
|April 1, 2000 12:00 AM EST||
As a consultant, developer and database administrator, I've often been asked to provide coding guidelines and tuning assistance for Java code that utilizes JDBC. Over time, I've been introduced to or developed standard coding practices that make JDBC code faster and less error-prone, and easier to read, understand and use. This article documents some of the more important "best practices" for using JDBC libraries to perform database access. As most of my clients are using Oracle database technologies, I've included several practices that are Oracle-specific.
For the purposes of this article the goals of best practices for JDBC programming are maintainability, portability and performance.
- Maintainability refers to the ease with which developers can understand, debug and modify JDBC code that they didn't write.
- Portability refers to the ease with which JDBC code can be used with multiple databases. It turns out that JDBC doesn't make database programming as platform independent as I'd like. In addition, I consider portability a noble goal even if you have no current plans to support multiple databases. Who knows how long your code will be around and what kinds of changes will have to be made to it?
- Performance refers to optimizing the speed and/or memory needed to run JDBC code.
Best Practices for JDBC Programming
The most common recommendations I make to Java programmers using JDBC are the following (discussed individually later):
- Use host variables for literals - avoid hard-coding them (Oracle specific).
- Always close statements, prepared statements and connections.
- Consolidate formation of SQL statement strings.
- Use the delegate model for database connection.
- Use Date, Time and Timestamp objects as host variables for temporal fields (avoid using strings).
- Limit use of column functions.
- Always specify a column list with an select statement (avoid "select *").
- Always specify a column list with an insert statement.
I recommend that developers use host variables in SQL statements instead of hard-coding literals in SQL strings. As a convenience, many developers embed literals in SQL statements instead. I've provided an example of embedding literals in the following code. While the performance benefits of using host variables greatly improve Oracle performance, it won't hurt performance for other database platforms that I'm aware of. Note that this example places a user ID directly in the SQL statement. (As an aside, note that this example uses the "+" operator for string concatenation. While this is convenient, using StringBuffers and the StringBuffer.append() method is a faster way to concatenate strings.)
stmt = dbconnection.createStatement();
rst = stmt.executeQuery("select count(*) from portfolio_info where
USER_ID = " + userID);
count = rst.getInt(1);
To get the benefit of Oracle's optimizations, we need to use PreparedStatements instead of statements for SQL that will be executed multiple times. Furthermore, we need to use host variables instead of literals for literals that will change between executions. In the code above the SQL statement for User id 1 will be different than for User Id 2 ("where USER_ID = 1" is different from "where USER_ID = 2"). A better way to approach this SQL statement is the following:
pstmt = dbconnection.prepareStatement("select count(*) from portfolio_info where USER_ID = ? "); pstmt.setDouble(1,userID);
rst = pstmt.executeQuery();
count = rst.getInt(1);
In this code, because we're using host variables instead of literals, the SQL statement is identical no matter what the qualifying user ID is. Furthermore, we used a PreparedStatement instead of a statement. So that we can better understand the source of the performance benefit, let's walk through how SQL statements are processed by the Oracle optimizer. When SQL statements are executed, Oracle will execute (roughly speaking) the following steps:
- Look up the statement in the shared pool to see if it has already been parsed or interpreted. If yes, Oracle will go directly to step 4.
- Parse (or interpret) the statement.
- Figure out how it will get the data you want; record that information in a portion of memory called the shared pool.
- Get your data.
When an Oracle user looks up a SQL statement to see if it's already been executed (step 1), he or she attempts a character-by-character match of the SQL statement. If the user finds a match, he or she can use the parse information already in the shared pool and doesn't have to do steps 2 and 3 above because the work has already been done. If you hard-code literals in your SQL statements, the probability of finding a match is very low ("where USER_ID = 1" isn't the same as "where USER_ID = 2"). This means that Oracle will have to reparse the second code example for each portfolio selected. Had the code used host variables, that statement (which would look something like "where USER_ID = :1" in the shared pool) would have been parsed once and only once.
I've experienced anywhere from a 5% to a 25% performance increase by writing SQL statements that are reusable (results vary with transaction volume, number of users, network latency and many other things). More information on this can be found in the Oracle Tuning manual. Within this manual look at the "Writing Identical SQL Statements" subheading within the "Tuning the Shared Pool" section.
While this best practice is Oracle-specific, many database platforms optimize preparing and reusing similar SQL statements. Most database platforms do this by optimizing reuse of PreparedStatement objects. Some databases, such as Cloudscape, optionally will store prepared statements in the database so they can be reused and shared by many users. Following this practice won't hurt performance with any database platform I'm aware of.
Always Close Statements, Prepared Statements and Connections
Many databases allocate resources to servicing statements, prepared statements and connections. Many database platforms continue to allocate those resources for a period of time if these objects aren't closed after use. With Oracle databases it's possible to get a "max cursors exceeded" error message when you don't close statements or prepared statements. In addition, with Oracle databases, the connections stay around on the server. This practice improves time and resources spent on maintenance to keep errors from happening.
An example can be found in Listing 1. Note that I use a "finally" block to close the PreparedStatement. I don't close the connection in the example method as it is used elsewhere in the application. Note also that I call a utility to close the PreparedStatement for me. The code for this utility can be found in Listing 2. I use a utility to do the close so I don't have to replicate the exception-catching code everywhere.
Consolidate Formation of SQL Statement Strings
As a database administrator, a substantial portion of my time is spent reading the code of others and suggesting ways to improve performance. As you might expect, looking at the SQL statements being issued is of particular interest to me. It's hard to follow SQL statements that are constructed by string manipulation scattered over several methods. Developers who maintain this kind of code must have the same problem. It greatly enhances readability if you consolidate the logic that forms the SQL statement in one place.
Listing 2 is a good example of this point. The string manipulation to form the SQL statement is located in one place, and the SQL statement logic is in a separate static block instead of within the method itself. This is done to reduce the number of times this string concatenation happens. Also note that StringBuffers are used for the string manipulation, not Strings. StringBuffers are more efficient at string concatenation than Strings are. In a project I recently completed the development team adopted this convention of consolidating SQL statements in static blocks directly above the method in which they were used. We found this practice quite readable and maintainable.
Use Delegate Model for Database Connection
I recently had the task of making the same application runnable on Oracle 8i, Cloudscape and Oracle Lite with as few modifications to existing code as possible. The development team wanted to avoid making JDBC-related classes platform-aware. In addition, the team wanted to take advantage of some platform-specific features, such as array processing and write batching in Oracle 8i, in special cases.
I was able to port the application to multiple environments largely through manipulation of one class responsible for managing our database connection. We had the foresight to create a delegate class for the java.sql.connection that manages needed connection functions and allows us to take advantage of platform-specific performance-tuning enhancements. All of our code used the delegate, not a native JDBC connection, as illustrated in Figure 2. While the specific class used for the project is proprietary, I've created another delegate, dvt.util.db.Connection, that illustrates the concept for the purposes of this article. The source for this delegate can be found in Listing 3.
Note that dvt.util.db.Connection determines that the database platform is being used. If the platform is Oracle 8i, I establish array processing by setting the default row prefetch size (available with Oracle database connections) to improve the performance of our "select" statements. I also establish write batching to improve performance of update, insert and delete statements.
Since I consolidate the platform-specific code in my connection object delegate, classes that use my connection delegate don't need to be platform specific. In case they do, however, developers can use getPlatform() to get information about the database platform being used. Furthermore, I can add support for additional database platforms (e.g., Cloudscape and Sybase) largely by changing this class. The connection delegate won't solve all portability issues, but it will solve a good percentage of them.
I recommend using a connection delegate even for projects that current supporting only one database platform. As we saw from recent Y2K efforts, you may find that your code is used for longer than you think, and used in other applications down the road.
Use Date, Time and Timestamp Objects as Host Variables for Temporal Fields
(Avoid Using Strings)
For convenience, I've seen many developers use strings as host variables to represent dates, times and timestamps. I think they consider Java.sql.Date, Time and Timestamp awkward. I agree with from a coding perspective. Unfortunately, using strings as host variables for temporal fields can affect data access performance.
The following code snippet contains a SQL statement meant for an Oracle platform that uses a string variable to represent a DATE field. Without an understanding of how the database optimizers work, this appears to be an acceptable coding technique. For the small inconvenience of using a "to_char" function in the SQL statement, we avoid the Java work of converting a java.sql.Date or Timestamp into a more easily displayable data type elsewhere in the code.
Where to_char(sale_dt,'YYYY-MM-DD') >= ?
Unfortunately, Oracle and most database optimizers can't use an index to speed up performance of the query in this snippet. Developers will have to read all rows of the order_sales table and convert the sale_dt of all rows to a string before they can do the comparison to see which rows satisfy the where clause of the query.
If we rewrite the query in the snippet to use a java.sql.Timestamp hostvariable, Oracle (and most of the common database platforms) will use an index and significantly improve performance in most cases, as follows:
Where sale_dt >= ?
For applications that use Oracle exclusively, I recommend using java.sql.Timestamp exclusively. Oracle's DATE data type actually contains time information (hours, minutes, seconds) as well as date information. Most other database platforms would call this type of field a TIMESTAMP. Oracle has no direct counterpart for a DATE (which has year, month and day only) and TIME data type offered by other platforms.
Limit Use of Column Functions
I generally recommend that developers limit use of column functions to the select lists of select statements. Moreover, I tend to stick to aggregate functions (e.g., count, sum, average) needed for select statements that use a "group by" clause. I make this recommendation for two reasons: performance and portability. Limiting function use to select lists (and keeping it out of where clauses) means that the use of a function won't block the use of an index. In the same way that the use of the "to_char" function prohibited the database from using an index in the earlier code snippet, column functions in where clauses likely prohibit the database from using an index.
In addition, many of the operations for which developers use SQL column functions (data type conversion, value formatting, etc.) are faster in Java than if the database did them. I've had between a 5% and a 20% performance improvement in many applications by opting to avoid some column functions and implementing the logic in Java instead. Another way to look at it is that column functions aren't tunable as we don't control the source code. Implementing that logic in Java makes it code that we can tune if need be.
Moreover, using non-ANSIstandard column functions can also cause portability problems. There are large differences in which column functions are implemented by the database vendors. For instance, one of my favorite Oracle column functions, "decode", which allows you to translate one set of values into another, isn't implemented in many of the other major database platforms. In general, column function use such as the use of "decode" has the potential to become a portability issue.
Always Specify a Column List with a Select Statement (Avoid "Select *")
A common shortcut for developers is to use the "*" in select statements to avoid having to type out a column list. The line below illustrates this shortcut while the snippet immediate following illustrates the alternative where desired columns are explicitly listed.
Select * from customer
Select last_nm, first_nm, address, city, state, customer_nbr from customer
Select last_nm, first_nm, address, city, state, customer_nbr from customer
I recommend that developers explicitly list columns in select statements as illustrated above. The reason is that if the columns in any of the tables in the select are reordered or new columns are added, the results obtained with the select-asterisk shortcut will change and the class will have to be modified. For example, suppose a database administrator changes the order of the columns and puts column customer_nbr first (there are valid reasons why a DBA could reorder columns). In addition, suppose the DBA adds a column called country. The developer who used the shortcut select * from customer will have to change code. All the offset references used in processing the Resultset will change. The developer who explicitly listed all columns can be oblivious to the change because the code will still work.
Explicitly listing columns in a select statement is a best practice because it prevents the need for maintenance in some cases.
Always Specify a Column List with an Insert Statement
A common shortcut for developers is to omit the column list in insert statements to avoid having to type out a column list. By default, the column order is the same as physically defined in the table. The first snippet below illustrates this shortcut while the next one illustrates the alternative where desired columns are explicitly listed.
Insert into customer
Insert into customer
Values ('Ashmore','Derek','3023 N. Clark','Chicago','IL', 555555)
(last_nm, first_nm, address, city, state, customer_nbr)
Insert into customer
I recommend that developers explicitly list columns in insert statements as illustrated in the second snippet above. The reason is the same as why we should explicitly list columns in select statements. If the columns in any of the tables in the select are reordered or new columns are added, the insert could generate an exception and insert in class will have to be modified. For example, suppose a DBA, as in the previous example, changes the order of the columns, puts column customer_nbr first and adds a column called country. The developer who used the first shortcut above will have to change code. The developer who explicitly listed all columns may be oblivious to the change because the code may still work. In addition, note that the version in second snippet above uses host variables so the same PreparedStatement can be used for all inserts if there are multiple inserts.
Explicitly listing columns in an insert statement is a best practice because it prevents the need for maintenance in many cases.
Recommendations for Stored Procedure Usage
Stored procedure programming languages (such as Oracle's PL/SQL) are handy and in many cases very convenient. I use them often for utility scripts and data-cleansing activities. I'm often asked about recommendations for stored procedure use in applications, but as their capabilities differ greatly among the major database platforms, I can't give platform-independent advice on the subject. I can, however, provide some thoughts on stored procedure use as it relates to portability and performance.
As these languages differ so greatly, their use within applications causes portability issues. For instance, some stored procedure languages allow procedures to return result sets, some do not. Some stored procedure languages allow temporary tables (usable within the current session only), some do not. We could find many more differences, but I think the point is clear. If portability is a concern, I recommend avoiding use of stored procedures except for database triggers.
Performance is a tougher issue because it differs radically between database vendors. Stored procedure use for some database platforms enhances performance; in others it degrades it. For Oracle platforms I advocate stored procedures within Java applications for database triggers only. For most other situations their use provides no benefit. If you want a more detailed discussion on when and how to use stored procedures, functions and packages within Oracle databases, see my article in JDJ December 1999 (Vol. 4, issue 12).
This article has discussed several ways to make JDBC code more performance-, maintenance- and portability-friendly on an individual basis. I always recommend team code reviews and documented coding standards as ways to develop more best practices and consistently apply existing practices. Furthermore, team code reviews help further the goals of best practices by improving the maintainability and general quality of code within an application.
Cloud Expo, Inc. has announced today that Andi Mann returns to DevOps Summit 2015 as Conference Chair. The 4th International DevOps Summit will take place on June 9-11, 2015, at the Javits Center in New York City. "DevOps is set to be one of the most profound disruptions to hit IT in decades," said Andi Mann. "It is a natural extension of cloud computing, and I have seen both firsthand and in independent research the fantastic results DevOps delivers. So I am excited to help the great team at ...
May. 24, 2015 02:00 PM EDT Reads: 1,876
Software is eating the world. Companies that were not previously in the technology space now find themselves competing with Google and Amazon on speed of innovation. As the innovation cycle accelerates, companies must embrace rapid and constant change to both applications and their infrastructure, and find a way to deliver speed and agility of development without sacrificing reliability or efficiency of operations. In her Day 2 Keynote DevOps Summit, Victoria Livschitz, CEO of Qubell, discussed...
May. 24, 2015 02:00 PM EDT Reads: 5,605
How does one bridge the gap between traditional enterprise storage infrastructures and the private, hybrid, and public cloud? In his session at 15th Cloud Expo, Dan Pollack, Chief Architect of Storage Operations at AOL Inc., examed the workload differences and required changes to reuse existing knowledge and components when building and using a cloud infrastructure. He also looked into the operational considerations, tool requirements, and behavioral changes required for private cloud storage s...
May. 24, 2015 02:00 PM EDT Reads: 2,491
There’s a lot of discussion around managing outages in production via the likes of DevOps principles and the corresponding software development lifecycles that does enable higher quality output from development, however, one cannot lay all blame for “bugs” and failures at the feet of those responsible for coding and development. As developers incorporate features and benefits of these paradigm shift, there is a learning curve and a point of not-knowing-what-is-not-known. Sometimes, the only way ...
May. 24, 2015 01:00 PM EDT Reads: 1,318
Working with Big Data is challenging, especially when decision makers depend on market insights and intelligence from your data but don't have quick access to it or find it unusable. In their session at 6th Big Data Expo, Ian Khan, Global Strategic Positioning & Brand Manager at Solgenia; Zel Bianco, President, CEO and Co-Founder of Interactive Edge of Solgenia; and Ermanno Bonifazi, CEO & Founder at Solgenia, discussed how a revolutionary cloud-based BI along with mobile analytics is already c...
May. 24, 2015 01:00 PM EDT Reads: 5,437
How can you compare one technology or tool to its competitors? Usually, there is no objective comparison available. So how do you know which is better? Eclipse or IntelliJ IDEA? Java EE or Spring? C# or Java? All you can usually find is a holy war and biased comparisons on vendor sites. But luckily, sometimes, you can find a fair comparison. How does this come to be? By having it co-authored by the stakeholders. The binary repository comparison matrix is one of those rare resources. It is edite...
May. 24, 2015 01:00 PM EDT Reads: 1,759
Hardware will never be more valuable than on the day it hits your loading dock. Each day new servers are not deployed to production the business is losing money. While Moore's Law is typically cited to explain the exponential density growth of chips, a critical consequence of this is rapid depreciation of servers. The hardware for clustered systems (e.g., Hadoop, OpenStack) tends to be significant capital expenses. In his session at Big Data Expo, Mason Katz, CTO and co-founder of StackIQ, disc...
May. 24, 2015 12:30 PM EDT Reads: 5,271
Container frameworks, such as Docker, provide a variety of benefits, including density of deployment across infrastructure, convenience for application developers to push updates with low operational hand-holding, and a fairly well-defined deployment workflow that can be orchestrated. Container frameworks also enable a DevOps approach to application development by cleanly separating concerns between operations and development teams. But running multi-container, multi-server apps with containers ...
May. 24, 2015 12:00 PM EDT Reads: 2,165
There is no doubt that Big Data is here and getting bigger every day. Building a Big Data infrastructure today is no easy task. There are an enormous number of choices for database engines and technologies. To make things even more challenging, requirements are getting more sophisticated, and the standard paradigm of supporting historical analytics queries is often just one facet of what is needed. As Big Data growth continues, organizations are demanding real-time access to data, allowing immed...
May. 24, 2015 11:30 AM EDT Reads: 3,126
SYS-CON Events announced today that EnterpriseDB (EDB), the leading worldwide provider of enterprise-class Postgres products and database compatibility solutions, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. EDB is the largest provider of Postgres software and services that provides enterprise-class performance and scalability and the open source freedom to divert budget from more costly traditiona...
May. 24, 2015 11:00 AM EDT Reads: 1,749
Enterprises are fast realizing the importance of integrating SaaS/Cloud applications, API and on-premises data and processes, to unleash hidden value. This webinar explores how managers can use a Microservice-centric approach to aggressively tackle the unexpected new integration challenges posed by proliferation of cloud, mobile, social and big data projects. Industry analyst and SOA expert Jason Bloomberg will strip away the hype from microservices, and clearly identify their advantages and d...
May. 24, 2015 10:45 AM EDT Reads: 1,995
Do you think development teams really update those BMC Remedy tickets with all the changes contained in a release? They don't. Most of them just "check the box" and move on. They rose a Risk Level that won't raise questions from the Change Control managers and they work around the checks and balances. The alternative is to stop and wait for a department that still thinks releases are rare events. When a release happens every day there's just not enough time for people to attend CAB meeting...
May. 24, 2015 10:45 AM EDT Reads: 1,535
Data-intensive companies that strive to gain insights from data using Big Data analytics tools can gain tremendous competitive advantage by deploying data-centric storage. Organizations generate large volumes of data, the vast majority of which is unstructured. As the volume and velocity of this unstructured data increases, the costs, risks and usability challenges associated with managing the unstructured data (regardless of file type, size or device) increases simultaneously, including end-to-...
May. 24, 2015 10:30 AM EDT Reads: 4,319
SYS-CON Events announced today that the "First Containers & Microservices Conference" will take place June 9-11, 2015, at the Javits Center in New York City. The “Second Containers & Microservices Conference” will take place November 3-5, 2015, at Santa Clara Convention Center, Santa Clara, CA. Containers and microservices have become topics of intense interest throughout the cloud developer and enterprise IT communities.
May. 24, 2015 10:00 AM EDT Reads: 2,255
Buzzword alert: Microservices and IoT at a DevOps conference? What could possibly go wrong? In this Power Panel at DevOps Summit, moderated by Jason Bloomberg, the leading expert on architecting agility for the enterprise and president of Intellyx, panelists will peel away the buzz and discuss the important architectural principles behind implementing IoT solutions for the enterprise. As remote IoT devices and sensors become increasingly intelligent, they become part of our distributed cloud en...
May. 24, 2015 10:00 AM EDT Reads: 1,998
I’ve been thinking a bit about microservices (μServices) recently. My immediate reaction is to think: “Isn’t this just yet another new term for the same stuff, Web Services->SOA->APIs->Microservices?” Followed shortly by the thought, “well yes it is, but there are some important differences/distinguishing factors.” Microservices is an evolutionary paradigm born out of the need for simplicity (i.e., get away from the ESB) and alignment with agile (think DevOps) and scalable (think Containerizati...
May. 24, 2015 09:45 AM EDT Reads: 1,488
In this Power Panel at DevOps Summit, moderated by Jason Bloomberg, president of Intellyx, panelists Roberto Medrano, Executive Vice President at Akana; Lori MacVittie, IoT_Microservices Power PanelEvangelist for F5 Networks; and Troy Topnik, ActiveState’s Technical Product Manager; will peel away the buzz and discuss the important architectural principles behind implementing IoT solutions for the enterprise. As remote IoT devices and sensors become increasingly intelligent, they become part of ...
May. 24, 2015 09:45 AM EDT Reads: 1,720
You often hear the two titles of "DevOps" and "Immutable Infrastructure" used independently. In his session at DevOps Summit, John Willis, Technical Evangelist for Docker, will cover the union between the two topics and why this is important. He will cover an overview of Immutable Infrastructure then show how an Immutable Continuous Delivery pipeline can be applied as a best practice for "DevOps." He will end the session with some interesting case study examples.
May. 24, 2015 09:00 AM EDT Reads: 2,110
SYS-CON Media named Andi Mann editor of DevOps Journal. DevOps Journal is focused on this critical enterprise IT topic in the world of cloud computing. DevOps Journal brings valuable information to DevOps professionals who are transforming the way enterprise IT is done. Andi Mann, Vice President, Strategic Solutions, at CA Technologies, is an accomplished digital business executive with extensive global expertise as a strategist, technologist, innovator, marketer, communicator, and thought lea...
May. 24, 2015 09:00 AM EDT Reads: 1,854
Even though it’s now Microservices Journal, long-time fans of SOA World Magazine can take comfort in the fact that the URL – soa.sys-con.com – remains unchanged. And that’s no mistake, as microservices are really nothing more than a new and improved take on the Service-Oriented Architecture (SOA) best practices we struggled to hammer out over the last decade. Skeptics, however, might say that this change is nothing more than an exercise in buzzword-hopping. SOA is passé, and now that people are ...
May. 24, 2015 09:00 AM EDT Reads: 3,657