Microservices Expo: Article

SOA to the Rescue, When Drug Discovery Needs Data Fast!

Information is key to drug discovery

SOA Data Services Approach Selected
After reviewing several new alternative approaches, we identified SOA data services as the best one for meeting our criteria.

Data services are a form of Web Service optimized for real-time data integration. They virtualize data to decouple physical and logical locations, avoiding unnecessary data replication; they abstract complex data structures and syntax; and they federate disparate data into useful composites. Data services also support data integration across both SOA and non-SOA applications.

Architecturally, data services combine to form a middle layer of reusable services, or a data services layer, decoupled from both the underlying source-data layer and the consuming solutions layer. This provides the flexibility to deal with each layer in the most effective manner, as well as the agility to respond quickly as applications, schemas, or underlying data sources change (see Figure 1).
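As a concrete illustration of the three ideas above - virtualization, abstraction, and federation - the sketch below builds one composite view, on demand, from two hypothetical in-memory sources. All names and data are illustrative; no real Pfizer systems or vendor APIs are shown.

```python
def fetch_projects():
    # Stand-in for a relational source (e.g., a project-tracking database).
    return [
        {"project_id": 1, "name": "Compound A", "phase": "Discovery"},
        {"project_id": 2, "name": "Compound B", "phase": "Phase I"},
    ]

def fetch_costs():
    # Stand-in for a second, differently structured source.
    return {1: 1_250_000, 2: 4_800_000}

def project_portfolio_service():
    """Federate the two sources into one composite view on demand.

    Nothing is replicated: each call reads the sources live, so the
    composite reflects their current state (virtualization), and the
    caller never sees the underlying schemas (abstraction).
    """
    costs = fetch_costs()
    return [
        {**p, "cost_usd": costs.get(p["project_id"], 0)}
        for p in fetch_projects()
    ]
```

Because the consuming solution calls `project_portfolio_service()` rather than the sources directly, either source's schema can change without touching the consumer, which is the decoupling the layered architecture provides.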

Beyond providing complex multi-source data integration, data services meet our other criteria as well. Because data services are on-demand, they meet our requirement for real-time information delivery. By not replicating data, data services eliminate the time required for building and testing marts. Further, data services can be automatically generated directly from our data models and so don't require coding. Data services, due to abstraction, can often be reused across projects. Finally, data services, because of their architecture, XML support capabilities, and standards compliance, are inherently SOA-compliant.

Data Services Infrastructure Technology Selected
Once we chose a SOA data services approach, we searched for a data services infrastructure provider that offered development tools and an appropriate run-time environment. We selected Composite Software. With more than 20 projects running in various Pfizer divisions and a Composite Center of Excellence at our headquarters, Composite was a proven vendor at Pfizer and its best-of-breed offerings met our search criteria.

Now our overall data integration capabilities include data virtualization, data abstraction, and data federation across both SOA and non-SOA environments. Delivered via Composite's Information Server, the solution supports both our design and run-time requirements. At build time, we have an easy-to-use data modeler and code generator to abstract our data in the form of relational views for reporting and other uses and/or Web data services for SOA initiatives. Its high-performance query engine securely accesses, federates, and delivers the diverse distributed data to our consuming solutions in real time.
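The dual delivery described above - relational views for reporting tools and Web data services for SOA consumers - can be sketched as publishing one federated result in two shapes. This is a hypothetical illustration, not Composite's API; JSON stands in for the WSDL/XML payloads, and all names are invented.

```python
import json

# One federated result set, as the query engine might return it.
FEDERATED_ROWS = [
    {"project_id": 1, "name": "Compound A", "cost_usd": 1250000},
    {"project_id": 2, "name": "Compound B", "cost_usd": 4800000},
]

def as_relational_view(rows):
    # Column-ordered tuples, the shape a reporting tool consumes.
    columns = ["project_id", "name", "cost_usd"]
    return columns, [tuple(r[c] for c in columns) for r in rows]

def as_data_service_document(rows):
    # A single document payload, the shape a portal consumes.
    return json.dumps({"projects": rows})
```

The point is that both shapes are generated from the same abstracted model, so there is no second copy of the data to build, load, or keep in sync.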

The Proof Was in the Portal
With our data services strategy and data integration toolset in hand, our next task was to run a pilot project. We wanted to verify that we could complete the project successfully, and that we could complete it much faster while complying with SOA principles.

For our pilot, we selected the Drug Discovery Portfolio portal. This project easily met our evaluation criteria.

Business Requirements
Senior management, project team leaders, business analysts, and research scientists across Pfizer's R&D and commercial business units need to continuously evaluate our portfolio of discovery projects and drugs in development. This analysis includes how these projects fit into Pfizer's overall strategic portfolio as well as how each will be impacted by costs, market conditions, and available resources. A complete picture of each project, as well as an overview of all the projects, is needed so that major business decisions are based on all relevant factors. Real-time access to this information is critical so that Pfizer can react to unforeseen events quickly and intelligently.

User Interface Requirements
We selected a Web portal as the user interface because this provides the most flexible and accessible solution for our wide range of information users. This means existing data has to be delivered in the form of Web data services for our portal developers and our portal toolset to consume easily.

Data Integration Requirements
Key data to be delivered includes both key metrics and details such as project costs, resources, timelines, and ROI calculations, to name a few. This diverse data needs to be integrated from a wide variety of source applications across various Pfizer groups. The diversity of source-system data structures let us thoroughly evaluate Composite's data connector and transform capabilities during the pilot project, while the dynamic nature of the sources and the need for real-time delivery put its high-performance query algorithms to the test. Because many teams from across the globe needed to be involved to provide access to the right data, we added ease of use to our RAD evaluation criteria.

Pilot Benchmark: The Data Mart Approach
To compare the relative and absolute strengths and weaknesses of the new data services approach and the Information Server versus our traditional approach, we invested in a small benchmark of the "old way." Benchmarking the functional and technical specifications let us compare end-solution delivery; benchmarking the development process let us compare time-to-solution and development costs.

Functional and Technical Specification
We already knew we could use our ETL/data mart tools to combine the required data into a mart. Unfortunately, putting the relational data into a mart was only half the job. We still needed to get this data out of the mart and into the portal in the form of a Web Service, which we found requires manual coding and an additional toolset. What's more, to meet the real-time delivery requirement we would have needed unrealistically high refresh rates and highly complex change data capture techniques.
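A minimal sketch of why the refresh-based mart falls short of real time, using a hypothetical in-memory source: between ETL refresh cycles the replicated copy drifts from the live data, so only impractically frequent refreshes (or change data capture) can approximate on-demand delivery. All names and figures are illustrative.

```python
# Hypothetical live source that changes over time.
live_source = [{"project_id": 1, "cost_usd": 1_250_000}]

def load_mart(sources):
    """One ETL refresh cycle: replicate each source into the mart.

    The mart is a snapshot; it only reflects changes made before the
    last refresh, which is the staleness problem described above.
    """
    mart = []
    for source in sources:
        mart.extend(dict(row) for row in source)  # copy the rows
    return mart

mart = load_mart([live_source])

# A cost revision lands in the source after the refresh...
live_source[0]["cost_usd"] = 1_400_000

# ...but the mart keeps serving the stale copy until the next cycle.
```

An on-demand data service, by contrast, reads the source at query time, so the revision is visible on the very next request with no refresh schedule to tune.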

Development Process
In a side-by-side comparison, Table 1 shows the steps used in an ETL approach versus a data services approach.

Problems with the Data Mart Approach
The ETL/data mart approach was not ideal for this project for the following reasons:
  •   We could only come close to meeting the real-time integration requirements if we used advanced change data capture and frequent refresh features.
  •   The data mart was physically instantiated in relational form, yet our portal developers wanted the data as WSDL Web Services, which are easier for the portal to consume.
  •   Sequential development (building the ETL scripts, then the mart, then the delivery scripts, and then the portal application) stretched the elapsed time, pushing out business benefits and adding costs.
  •   ETL and Web Service scripting were slow manual development processes.
  •   Scheduling the setup of the data mart infrastructure required coordinating with our operations group, fitting into its schedule and backlog.
  •   Replicated data in the mart would need to be maintained and controlled in addition to the original source data.
  •   Data security required additional manual coding.
  •   Any changes required ETL scripts to be changed, as well as the mart to be reloaded, slowing our response to new requirements or even simple bug fixes.
  •   Developers needed deeper data structure and syntax expertise throughout the process, not just basic SQL.

SOA Data Services Approach Pilot Meets the Spec, Is Faster, and More
The data services approach proved ideal for our Drug Discovery Portfolio Portal project.
  •   We completed our project in less than half the time of traditional development. Much of the data-level development was automated, freeing our skilled development team to work on application-level development.
  •   Fewer skills were needed due to the drag-and-drop data service development environment, built-in security, and automated generation of Web data services.
  •   SOA-compliant WSDL data services provided data in the form the portal developers needed.
  •   Loosely coupled data services were easier to maintain than ETL scripts in case of changes either to the underlying data sources or the portal.
  •   Data service assets built for the portal project can be reused by other development projects.
  •   We no longer needed our IT operations team to build and maintain the data mart infrastructure, and we avoided the extra costs of the mart itself.

Pfizer Informatics Adopts Data Services Approach
Going forward, we plan to use the data services approach and tools for all projects requiring complex data integration across multiple heterogeneous sources because the data services approach reduces unnecessary data replication and provides real-time information delivery, rapid application development, and SOA compliance.

We learned a number of lessons applicable to future projects. Data integration doesn't have to be hard or time-consuming with the right approach and right supporting tools. Virtualizing data versus replicating saves time and money. Rapid prototyping is possible, even automatic, when the right tools are used. Agility and reuse, the promise of SOA, comes to life in loosely coupled data services that span the gap between source data and end applications.

Moving from Pilot to Enterprise, Funded by Time and Cost Savings
With the new SOA data services approach to data integration proven, we have put together a roadmap for future adoption. First, we will educate our business analysts, developers, and architects on when to use data services, adopting the RAD approach to building SOA data services as the standard across all new SOA projects that require data integration. Second, we plan to implement a "data services reuse" metric for measuring success across future projects and reducing development and maintenance costs. In addition, we're working with the centralized shared services team to create a Data Services Center of Excellence that promotes best practices, optimizes economies of scale, and maximizes reach across projects. Finally, we'll continue to seek out emerging technologies and agile development practices that accelerate SOA projects and enable us to move to SOA in a safe and powerful way.

As advances in medical care continue and the need for new medicines grows, so does the need for better ways to manage and deliver information. In the same spirit that makes Pfizer a trusted leader in drug discovery and commercialization, the informatics group is pressing forward to meet the ever-demanding needs of our internal R&D customers as well.

Successful drug discovery needs data fast. Achieving rapid delivery requires new real-time portals and composite applications that rely heavily on existing data sourced from multiple systems across the enterprise. Delivering that data to our researchers and managers has been one of our biggest bottlenecks, adding months and cost to our project timelines. These data integration needs, along with our aggressive SOA strategy and RAD objectives, have driven us to find, test, and deploy a new approach to data integration - SOA data services.

More Stories By Daniel Eng

Daniel Eng has over 17 years of diverse IT experience in managing projects, leading technical teams, and developing enterprise applications within Fortune 100 companies. Currently at Pfizer Global Research and Development, Dan is leading efforts in transitioning business processes and applications into a SOA environment by using emerging technologies and agile management practices. Prior to Pfizer, he was an independent consultant helping his Fortune 500 clients in developing intranet sites, portable applications and e-commerce solutions. Dan has also worked in many e-commerce start-ups and healthcare organizations. He holds a BSEE degree from Polytechnic University and an MBA degree from Gonzaga University.

