Welcome!

SOA & WOA Authors: Marilyn Moux, Elizabeth White, Pat Romanski, Liz McMillan, Maureen O'Gara

Related Topics: SOA & WOA, Oracle

SOA & WOA: Article

Overcoming Large Oracle Database Performance Problems

Practical Database Sharding with Amazon Web Services

Typical Oracle VLDB is multi terabyte megalith running on big, expensive hardware. It is hard or impossible to back up, adding or modifying columns can take days and query optimization is very difficult. Database sharding is a well known method of breaking up a large database into smaller, manageable pieces ( database shards ). It is data warehouses i.e. VLDBs that can best take advantage of AWS database sharding capabilities. Basic premise is: manage huge volume of data by splitting it into multiple databases instead of creating table partitions.

Database sharding provides a method for scalability across independent servers, each with their own CPU, memory and disk. A database shard is a horizontal partition in a database. AWS quick instance creation/decommissioning capabilities make it quite easy to implement database sharding in very flexible fashion.

Fact table population can be done in usual ways - via batch jobs or other data load methods. Dimension tables are identical and replicated accross all databases. It is trivial to create copies of dimension tables since they are small in size.  User sessions should be directed to appropriate database either manually or in automatic manner - depending on a data segment they wish to access.

Summaries, i.e., queries that need access to data contained in all databases ( month end, year end, YTD summaries ) can be created by issuing distributed query unions across all database shards. Summary queries will take longer time to execute since distributed queries have to access all databases to produce a complete view of data. This can be optimized to perform all heavy join activity ( fact to dimenstions ) locally and only unions ( UNION ALL, to avoid distributed sorting ) are processed across all databases.

It is very simple to create new database instance and populate it with template dimension data. Whole process of instance creation can be scripted in a manner similar to automated creation of new partitions in data warehouses. Dimension data can be loaded either as full database restore, Data Pump import of small dimension tables or CTAS over database link. You should also put in place automated replication jobs to synchronize slowly changing dimension tables across database shards.

Database sharding can be done in classic, non-cloud environments too, but it is quick AWS provisioning capabilities that make this method practical and economical. This way you get all the performance and manageability gains without downsides typically associated with VLDBs.

More Stories By Ranko Mosic

Ranko Mosic is consultant - provider of remote Oracle Database Administration Services. He has more than 20 years of experience in IT industry in various consulting roles throught North America. He can be reached at ranko.mosic@gmail.com