| By Ranko Mosic | Article Rating: |
|
| November 4, 2011 05:09 AM EDT | Reads: |
1,990 |
Typical Oracle VLDB is multi terabyte megalith running on big, expensive hardware. It is hard or impossible to back up, adding or modifying columns can take days and query optimization is very difficult. Database sharding is a well known method of breaking up a large database into smaller, manageable pieces ( database shards ). It is data warehouses i.e. VLDBs that can best take advantage of AWS database sharding capabilities. Basic premise is: manage huge volume of data by splitting it into multiple databases instead of creating table partitions.
Database sharding provides a method for scalability across independent servers, each with their own CPU, memory and disk. A database shard is a horizontal partition in a database. AWS quick instance creation/decommissioning capabilities make it quite easy to implement database sharding in very flexible fashion.

Fact table population can be done in usual ways - via batch jobs or other data load methods. Dimension tables are identical and replicated accross all databases. It is trivial to create copies of dimension tables since they are small in size. User sessions should be directed to appropriate database either manually or in automatic manner - depending on a data segment they wish to access.
Summaries, i.e., queries that need access to data contained in all databases ( month end, year end, YTD summaries ) can be created by issuing distributed query unions across all database shards. Summary queries will take longer time to execute since distributed queries have to access all databases to produce a complete view of data. This can be optimized to perform all heavy join activity ( fact to dimenstions ) locally and only unions ( UNION ALL, to avoid distributed sorting ) are processed across all databases.
It is very simple to create new database instance and populate it with template dimension data. Whole process of instance creation can be scripted in a manner similar to automated creation of new partitions in data warehouses. Dimension data can be loaded either as full database restore, Data Pump import of small dimension tables or CTAS over database link. You should also put in place automated replication jobs to synchronize slowly changing dimension tables across database shards.
Database sharding can be done in classic, non-cloud environments too, but it is quick AWS provisioning capabilities that make this method practical and economical. This way you get all the performance and manageability gains without downsides typically associated with VLDBs.
Published November 4, 2011 Reads 1,990
Copyright © 2011 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By Ranko Mosic
Ranko Mosic is consultant - provider of remote Oracle Database Administration Services. He has more than 20 years of experience in IT industry in various consulting roles throught North America. He can be reached at ranko.mosic@gmail.com
- Cloud Expo New York: Why PostgreSQL is the Database for the Cloud
- Cloud Expo New York Speaker Profile: Dave Linthicum – Blue Mountain Labs
- Agile Adoption – Crossing the Chasm
- Cloud Expo New York: The Java EE 7 Platform - Developing for the Cloud
- Write Once Run Anywhere or Cross Platform Mobile Development Tools
- Cross-Platform Mobile Website Development – a Tool Comparison
- Cloud Expo New York: Cloud Architectures Require Scale-Out Storage
- Cloud Expo New York: The Growing Big Data Tools Landscape
- Architecture Governance – the TOGAF Way
- Big Data – A Sea Change of Capabilities in IT
- Cloud Expo New York: Cloud Computing and Healthcare
- Cloud Expo New York: Mobilizing Enterprise Applications for the Cloud
- Cloud Expo New York: Why PostgreSQL is the Database for the Cloud
- Cloud Expo New York Speaker Profile: Dave Linthicum – Blue Mountain Labs
- Agile Adoption – Crossing the Chasm
- Red Hat Executive Appointed to Technology Services Industry Association (TSIA) Support Services Advisory Board
- Graal, a Dynamic Java Compiler in the Works
- Cloud Expo New York: The Java EE 7 Platform - Developing for the Cloud
- Write Once Run Anywhere or Cross Platform Mobile Development Tools
- Cross-Platform Mobile Website Development – a Tool Comparison
- Cloud Expo New York: Cloud Architectures Require Scale-Out Storage
- What Motivates Open Standards in the Cloud?
- Cloud Expo New York: The Growing Big Data Tools Landscape
- Architecture Governance – the TOGAF Way
- The i-Technology Right Stuff
- The Top 150 Players in Cloud Computing
- Who Are The All-Time Heroes of i-Technology?
- Where Are RIA Technologies Headed in 2008?
- Get the Message
- ESB Myth Busters: 10 Enterprise Service Bus Myths Debunked
- i-Technology Viewpoint: Is Web 2.0 the Global SOA?
- i-Technology Viewpoint: Thinking Outside the VC Box
- i-Technology Viewpoint: When to Leave Your First IT Job
- SOA Web Services Edge Conference Coverage on SYS-CON.TV
- SYS-CON.TV's "SOA Web Services" and "Enterprise Open Source" Programs To Air in December
- Five Reasons Why Web 2.0 Matters


















