Database as a Service: A Different Way to Manage Data

An important tool in a developer’s toolbox for rapid development

SaaS has rapidly evolved from a source of online applications into a provider of application building blocks such as:

  • Platform-as-a-Service (PaaS)
  • Infrastructure-as-a-Service (IaaS)
  • Database-as-a-Service (DaaS)

DaaS is the latest entrant into the "as a Service" realm and typically provides tools for defining logical data structures, data services like APIs and web service interfaces, customizable user interfaces, and data storage, backup, recovery and export policies. To ensure successful DaaS implementations, developers and database professionals need to address traditional challenges associated with data design and performance tuning. They will also need to address new challenges introduced by the lack of physical access for backup, recovery and integration.

What Is DaaS?
DaaS provides traditional database features, typically data definition, storage and retrieval, on a subscription basis over the web. To subscribers, DaaS appears as a black box that supports logical data operations and logical data stores in which customers can see only their own organization's data. Physical access is seen as a security risk and is therefore not available. As with SaaS, DaaS vendors build and manage data centers incorporating best practices in security, backup, recovery and customer support. Data services are typically provided as SOAP or REST APIs that allow users to define data structures, perform CRUD operations, manage entitlements and query the database using a subset of standard SQL.
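
As a rough illustration of how CRUD operations might map onto a REST-style data service, the sketch below builds (but never sends) one HTTP request per operation. The base URL, the table name and the JSON payload shape are hypothetical and do not describe any particular vendor's API; only the standard library's `urllib.request.Request` is used.

```python
import json
from urllib.request import Request

# Hypothetical endpoint; a real DaaS vendor defines its own URLs and auth.
BASE_URL = "https://daas.example.com/v1"

# Conventional (not universal) mapping of CRUD operations to HTTP methods.
CRUD_TO_HTTP = {"create": "POST", "read": "GET", "update": "PUT", "delete": "DELETE"}


def build_request(operation, table, record_id=None, payload=None):
    """Build, without sending, the HTTP request for one CRUD operation."""
    url = f"{BASE_URL}/{table}"
    if record_id is not None:
        url += f"/{record_id}"
    body = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return Request(url, data=body, headers=headers, method=CRUD_TO_HTTP[operation])


create_req = build_request("create", "contacts", payload={"first_name": "Tara"})
read_req = build_request("read", "contacts", record_id="42")
delete_req = build_request("delete", "contacts", record_id="42")
```

Keeping request construction separate from sending makes the mapping easy to inspect and test before any credentials or network access are involved.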

Real-World Examples: Force.com and Amazon SimpleDB
Two real-world examples of DaaS are Salesforce.com's Force.com, which provides data services in its toolkit for building applications, and Amazon's SimpleDB, which provides an API for creating data stores which can be used for applications or pure data storage.

Force.com
Force.com supports the Model-View-Controller paradigm for application development where Model refers to the data model.

  • Database schema: Developers can configure pick list values for fields in standard CRM objects (tables), or create custom objects and fields via the Salesforce.com Setup menu. Data elements can also be defined programmatically through the Metadata API, which is used by the Force.com IDE, an add-on for Eclipse. Lookup fields and parent-child relationships allow foreign key relationships between tables.
  • CRUD operations: Data entry, updates and deletes can be performed using Force.com pages that are automatically generated for each table, or through the Force.com Web Services API. Apex, Force.com's programming language, provides the ability to develop object oriented code to perform data operations.
  • Database queries: Querying data is done through SOQL, Force.com's subset of SQL. SOQL provides read-only access via the Web Services API or Apex.
  • Stored procedures: Custom business rules can be implemented as Triggers, which are written in Apex and serve as the equivalent of database stored procedures.
  • Pro: Force.com database development and functionality parallels traditional database development.
  • Con: Force.com applications require careful data design and coding.

Amazon SimpleDB
Amazon.com's SimpleDB service appears to be geared to developing applications quickly with minimal effort on database design and definition:

  • Database schema: SimpleDB stores data in "domains," the equivalent of a spreadsheet tab. Once a domain is created, attributes (fields) are created as records are added to it. Each item (record) requires a unique ID string, and attributes are added as name-value pairs such as ("First Name", "Tara"). Items are limited to 256 name-value pairs, and domains are limited to 1 billion attributes.
  • CRUD operations: SimpleDB uses Put to insert and update items, Get to retrieve an item by its unique ID, and Delete to delete items.
  • Database queries: SimpleDB supports a subset of SQL for read-only access to data. SimpleDB does not support queries across domains, so SQL joins are not available. The Developer Guide suggests storing related data in a single domain as a workaround.
  • Stored procedures: SimpleDB currently does not support stored procedures.
  • Pro: SimpleDB allows rapid development of web based applications requiring data services.
  • Con: SimpleDB does not support joins, foreign keys and stored procedures. Porting complex applications to SimpleDB may not be feasible.
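
The schemaless domain model described above can be sketched with a small in-memory stand-in. This is a toy model for illustration only, not the Amazon SimpleDB API: the `Domain` class, its method names and the single-valued attributes are assumptions, and only the 256-pair limit is taken from the text.

```python
# Toy in-memory model of a SimpleDB-style domain; NOT the AWS API.
MAX_PAIRS_PER_ITEM = 256  # per-item limit quoted in the article


class Domain:
    """A schemaless store mapping item ID -> {attribute name: value}."""

    def __init__(self, name):
        self.name = name
        self.items = {}

    def put(self, item_id, attributes):
        """Insert a new item, or add/overwrite attributes on an existing one."""
        merged = {**self.items.get(item_id, {}), **attributes}
        if len(merged) > MAX_PAIRS_PER_ITEM:
            raise ValueError("item would exceed 256 name-value pairs")
        self.items[item_id] = merged

    def get(self, item_id):
        """Return a copy of the item's attributes, or {} if it does not exist."""
        return dict(self.items.get(item_id, {}))

    def delete(self, item_id):
        """Remove the item; deleting a missing item is a no-op."""
        self.items.pop(item_id, None)

    def attribute_names(self):
        """Attributes exist only because at least one item carries them."""
        return sorted({name for attrs in self.items.values() for name in attrs})


people = Domain("people")
people.put("item-001", {"First Name": "Tara"})
people.put("item-001", {"City": "Austin"})  # new attribute, no schema change
people.put("item-002", {"First Name": "Lee", "Phone": "555-0100"})
```

Note that no attribute is ever declared up front: the set of "columns" is simply whatever names the stored items happen to use, which is what makes this model quick to start with and hard to join against.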

What Is Data Management?
One could argue that data management began with the invention of written communication. Cataloging, bookkeeping and archiving, which are all forms of data management, are known to have existed in ancient times. In recent history, the first computerized database management systems started to evolve in the 1960s, when the primary data storage media were magnetic tapes.[1] In the 1980s, data management also became known as Data Resource Management and Enterprise Information Management as organizations recognized corporate data as assets that must be managed. The publication of the DAMA-DMBOK Guide [2] in 2009 is a major step toward formalizing data management as a science and practice. The data management functions discussed in this article are based upon this guide.

The DAMA-DMBOK Guide identifies ten data management functions found in most organizations. These functions are briefly described below:

  1. Data Governance - planning, supervision and control over data management and use
  2. Data Architecture Management - as an integral part of the enterprise architecture
  3. Data Development - analysis, design, building, testing, deployment and maintenance
  4. Database Operations Management - support for structured physical data assets
  5. Data Security Management - ensuring privacy, confidentiality and appropriate access
  6. Reference & Master Data Management - managing golden versions and replicas
  7. Data Warehousing & Business Intelligence Management - enabling access to decision support data for reporting and analysis
  8. Document & Content Management - storing, protecting, indexing and enabling access to data found in unstructured sources (electronic files and physical records)
  9. Meta Data Management - integrating, controlling and delivering meta data
  10. Data Quality Management - defining, monitoring and improving data quality

Figure 1 shows the scope of each of these functions. This article focuses on the Data Development and Database Operations Management functions as they relate to DaaS. It points out similarities and differences between managing data that resides in a DaaS environment and managing data in a non-DaaS environment, as well as key implementation challenges with present-day technologies.

DaaS Data Management
How does data management in a DaaS environment differ from traditional environments?

All of the data management functions shown in Figure 1 apply to DaaS, but with a twist introduced by the additional layer of abstraction of the Service-Oriented Architecture (SOA) that defines DaaS. As with other types of SOA implementations, a DaaS provider hides the physical implementation and the complexities of managing the data stores from its consumers, while providing a ubiquitous, language-agnostic API, typically built on a format such as XML, to enter, retrieve and manipulate data.

For those familiar with relational database management systems (RDBMS) such as Oracle and SQL Server, DaaS is to an RDBMS what an RDBMS is to flat files. In both cases, the abstractions introduced by the newer platform simplify access to and management of data, but in some cases limit what you can do with the data. Such limitations are usually overcome as the platform matures. For example, a DaaS query language might initially allow access to only one object (table) at a time, while future releases may allow two or more objects to be joined.

Just as the logical abstractions of an RDBMS hide the fact that tables are implemented as files, DaaS objects might actually be implemented as RDBMS tables behind the scenes. However, a DaaS consumer does not need to know that and does not have to be concerned with the maintenance of the underlying RDBMS. Table 1 shows more examples of how DaaS differs from RDBMS, using Force.com and Oracle as bases for comparison.

DaaS Challenges
DaaS presents many advantages and promises. However, adopters of this new paradigm may find some new challenges, some of which are highlighted below using Force.com as an example.

Data Design
Joins

In order to optimize performance and simplify data access, DaaS typically places limits on resource-intensive queries and reports. For queries, this may mean that certain join statements or outer joins are not supported, or that the number of entities that can be queried is limited. Similarly, report writers may limit joins by controlling which entities are available. Approaches for dealing with joins include copying some attributes of master objects into child objects, or writing code to merge master and detail results.
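
Both workarounds can be sketched in a few lines. The example below assumes the master and detail rows have already been fetched by two separate single-object queries; the record shapes, field names and the `parent_` prefix are hypothetical.

```python
from collections import defaultdict

# Hypothetical results of two separate single-object queries.
accounts = [
    {"id": "A1", "name": "Acme"},
    {"id": "A2", "name": "Globex"},
]
contacts = [
    {"id": "C1", "account_id": "A1", "last_name": "Ng"},
    {"id": "C2", "account_id": "A1", "last_name": "Ortiz"},
    {"id": "C3", "account_id": "A2", "last_name": "Patel"},
]


def merge_master_detail(masters, details, foreign_key):
    """Client-side join: attach each master's detail rows under 'children'."""
    by_parent = defaultdict(list)
    for row in details:
        by_parent[row[foreign_key]].append(row)
    return [{**m, "children": by_parent.get(m["id"], [])} for m in masters]


def denormalize(masters, details, foreign_key, copied_fields):
    """Copy selected master attributes into each child row."""
    lookup = {m["id"]: m for m in masters}
    result = []
    for row in details:
        parent = lookup.get(row[foreign_key], {})
        extra = {f"parent_{field}": parent.get(field) for field in copied_fields}
        result.append({**row, **extra})
    return result


merged = merge_master_detail(accounts, contacts, "account_id")
flat = denormalize(accounts, contacts, "account_id", ["name"])
```

Merging in code keeps the stored data normalized at the cost of an extra query per object, while denormalizing makes single-object queries sufficient but requires the copied attributes to be kept in sync when the master record changes.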

Physical Database Access
Performance Tuning

DaaS by nature hides the underlying details of the physical database implementation. At this point in time, troubleshooting and performance tuning require cobbling together various tools and approaches. While a best practice is to always follow vendor recommendations carefully, empirical data can be gathered via commercial performance testing tools, custom scripts/code, manual testing and vendor profiling tools. Analyze the results carefully and consult the vendor if their best practices do not mesh with the data.
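
A custom script for gathering empirical data can be as simple as timing repeated calls and summarizing the latencies. In this minimal sketch, `fake_query` is a stand-in that merely sleeps; in practice it would be replaced by a real request to the DaaS API. The function names and the simple percentile estimate are assumptions made for illustration.

```python
import statistics
import time


def measure(operation, runs=20):
    """Call operation() repeatedly and summarize wall-clock latency in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "runs": runs,
        "min_ms": samples[0],
        "median_ms": statistics.median(samples),
        # Simple high-percentile estimate; clamps to the slowest sample.
        "p95_ms": samples[min(runs - 1, int(0.95 * runs))],
        "max_ms": samples[-1],
    }


def fake_query():
    """Stand-in for a real DaaS API call; replace with an actual request."""
    time.sleep(0.002)


report = measure(fake_query, runs=10)
```

Comparing such reports across query shapes, data volumes and times of day gives concrete numbers to bring to the vendor when its recommendations do not match observed behavior.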

Data Partitioning
Since DaaS provides logical database services, there is no standard for partitioning data. Best practice is to review vendor documentation on performance, especially for large data volumes. Approaches to data partitioning include defining tables or namespaces in lieu of partitions, creating indexed fields as partition filters, creating hierarchies and entitlements to control data visibility, or licensing multiple DaaS instances.
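
The "indexed field as partition filter" approach can be illustrated with an in-memory sketch. The partitioning rule (region plus creation year), the record fields and the `PartitionedStore` class are all hypothetical; a real DaaS would hold the index on its side, and the client would simply include the partition field in every query.

```python
from collections import defaultdict


def partition_key(record):
    """Hypothetical rule: partition by region and year of creation."""
    return f"{record['region']}-{record['created'][:4]}"


class PartitionedStore:
    """Keeps an index from partition key to records, like an indexed filter field."""

    def __init__(self):
        self.by_partition = defaultdict(list)

    def insert(self, record):
        """Stamp the record with its partition key and index it."""
        stored = {**record, "partition": partition_key(record)}
        self.by_partition[stored["partition"]].append(stored)
        return stored

    def query(self, partition, predicate=lambda record: True):
        """Scan a single partition instead of every record."""
        return [r for r in self.by_partition.get(partition, []) if predicate(r)]


store = PartitionedStore()
store.insert({"id": 1, "region": "EMEA", "created": "2009-03-14", "amount": 120})
store.insert({"id": 2, "region": "EMEA", "created": "2009-07-02", "amount": 80})
store.insert({"id": 3, "region": "AMER", "created": "2009-01-20", "amount": 200})

large_emea = store.query("EMEA-2009", lambda r: r["amount"] > 100)
```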

Backup and Recovery
While DaaS provides high-performance tools for querying and exporting data, it can be difficult to perform a "database dump" that exports data, metadata and code in one operation. And once the data is "dumped," there may be no facility to rebuild a database from the dump. Administrators used to these features in on-premise software must develop custom scripts for dumping and loading data for DaaS.
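
A custom dump-and-load script might bundle metadata and data into a single document so the two cannot drift apart. The sketch below is a minimal example under stated assumptions: the schema is a plain mapping of table name to field names, the rows have already been exported through the vendor's API, and the JSON layout with its `format_version` field is invented for illustration.

```python
import json


def dump_database(schema, tables):
    """Serialize metadata and data together as one JSON document."""
    document = {"format_version": 1, "schema": schema, "tables": tables}
    return json.dumps(document, indent=2, sort_keys=True)


def load_database(text):
    """Parse a dump and check every row against the dumped schema."""
    document = json.loads(text)
    schema, tables = document["schema"], document["tables"]
    for table, rows in tables.items():
        allowed = set(schema[table])
        for row in rows:
            unknown = set(row) - allowed
            if unknown:
                raise ValueError(f"{table}: unexpected fields {sorted(unknown)}")
    return schema, tables


schema = {"contacts": ["id", "first_name"]}
tables = {"contacts": [{"id": "1", "first_name": "Tara"}]}

dump_text = dump_database(schema, tables)
restored_schema, restored_tables = load_database(dump_text)
```

Validating rows against the dumped schema at load time catches a dump whose data and metadata were exported at different moments, which is a common failure mode when the two come from separate API calls.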

Transaction Processing
Some DaaS implementations allow the equivalent of stored procedures to support referential integrity and transaction logic. Where these are unavailable or insufficient, one workaround is the tried-and-true polling service that looks for updated records and performs the appropriate operations for inserts, deletes and updates. Regardless of the approach, pay careful attention to commit/rollback logic and error handling.
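
A polling service of this kind can be sketched as a loop body that compares the current records against what was seen on the previous pass. Everything here is an assumption for illustration: the `fetch_records` callable, the `(modification stamp, data)` record shape and the `handler` signature. The one design point taken from the text is the error handling: a change is marked as processed only after its handler succeeds, so failed changes are retried on the next poll.

```python
class PollingSync:
    """Detects inserts, updates and deletes by comparing successive snapshots."""

    def __init__(self, fetch_records):
        # fetch_records() returns {record_id: (modification_stamp, data)}.
        self.fetch_records = fetch_records
        self.seen = {}  # record_id -> stamp of the last processed version

    def poll(self, handler):
        """Run one pass; return the IDs whose handler raised an exception."""
        current = self.fetch_records()
        failed = []
        for record_id, (stamp, data) in current.items():
            if record_id not in self.seen:
                kind = "insert"
            elif self.seen[record_id] != stamp:
                kind = "update"
            else:
                continue
            try:
                handler(kind, record_id, data)
                self.seen[record_id] = stamp  # "commit" only after success
            except Exception:
                failed.append(record_id)  # left unmarked, retried next poll
        for record_id in [r for r in self.seen if r not in current]:
            try:
                handler("delete", record_id, None)
                del self.seen[record_id]
            except Exception:
                failed.append(record_id)
        return failed


source = {"r1": (1, {"name": "Tara"}), "r2": (1, {"name": "Lee"})}
log = []


def record_change(kind, record_id, data):
    log.append((kind, record_id))


sync = PollingSync(lambda: dict(source))
sync.poll(record_change)                 # first pass: everything is new
source["r1"] = (2, {"name": "Tara N."})  # simulate an update
del source["r2"]                         # simulate a delete
sync.poll(record_change)
```

In a real service the `poll` call would run on a timer, and the handler would need to be idempotent, since a change whose handler partially succeeded before raising will be delivered again.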

Benefits
Despite some of the limitations listed above, DaaS adoption is being driven by multiple factors that speed application delivery including:

Ease of Deployment
Without the need to procure, install and configure equipment, DaaS can be rapidly deployed. And since vendors have already done extensive performance tuning on logical data services, there may be little need for further performance tuning or extensive data design.

Platform Independence
Because DaaS is web-based, most vendors comply with web standards, providing interoperability with desktops, servers and development tools from many vendors. Web service APIs for SOAP or REST are typically interoperable with multiple development platforms such as Adobe, Java, Mac OS X Cocoa, .NET and Visual Basic, Ruby on Rails, Perl, PHP and Python.

Simplified Database Administration
Database administrators may not need to understand SQL or APIs to configure DaaS databases. Data objects, custom fields, validation rules and data entry forms may be configured via the DaaS user interface. Because these are logical operations, changes are usually available immediately via reports, SQL queries and web services.

Standardized Data Integration
DaaS web services provide programmatic access to data via the vendor's API. Ubiquitous support for SOAP and REST services in ETL tools, middleware and application servers facilitates integration with most platforms. And since most DaaS vendors provide text-based file import and export, batch processing or semi-manual procedures allow integration with legacy systems and new applications.
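
Text-based import and export is often the simplest integration path with legacy systems. The sketch below round-trips a few records through CSV using Python's standard `csv` module; the record fields are hypothetical, and a real DaaS export would have its own column names and encoding rules.

```python
import csv
import io


def export_csv(rows, fieldnames):
    """Write rows (dicts) to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def import_csv(text):
    """Read CSV text back into a list of dicts; every value is a string."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


rows = [
    {"id": "1", "name": "Acme, Inc."},  # embedded comma must be quoted
    {"id": "2", "name": "Globex"},
]
csv_text = export_csv(rows, ["id", "name"])
round_tripped = import_csv(csv_text)
```

Because CSV carries no type information, numbers and dates come back as strings, so any batch integration built this way needs an explicit conversion step on the receiving side.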

Conclusion
Database as a Service can be an important tool in a developer's toolbox for rapid development, as well as for organizations that have more limited IT infrastructure resources. As illustrated above, there are some limitations, but the overall benefits to a project often outweigh them. Regardless of the approach, Database as a Service is something all Data Architects need to know and understand as we move into the next decade. It is critical to enabling the movement of enterprise applications to the cloud.

References

  1. Olle, T. William, 2006, "Nineteen Sixties History of Data Base Management", (ISBN: 978-0-397-34637-3), Springer Boston.
  2. DAMA International, 2009, "The DAMA Guide to the Data Management Body of Knowledge", (ISBN: 0977140083), Technics Publications, LLC.

More Stories By Gary Hamilton

Gary Hamilton is a Global Services expert at Acumen Solutions, a leading business and technology consulting firm with offices across the U.S. and Europe. He is a hands-on leader skilled in application development, delivery and operations. Over the last five years, Gary has established expertise in CRM systems, with a focus on web services and systems integrations.

More Stories By Jocelyn Quimbo

Jocelyn Quimbo is a Senior Data Architect/Integrator at Acumen Solutions, a leading business and technology consulting firm with offices across the U.S. and Europe. Acumen Solutions helps clients turn customers into advocates, suppliers into partners and leverage systems and data for effective business decisions and process automation. Jocelyn holds an MS in Computer and Information Science from Florida State University.

More Stories By Saurabh Verma

Saurabh Verma is Director of Global Services at Acumen Solutions, a leading business and technology consulting firm with offices across the U.S. and Europe. He is PMP certified with more than 13 years of progressive technical services and program management experience. Saurabh is an expert in strategic and management consulting with a focus on the Americas and EMEA telecommunications industry.
