Case Study
SOA to the Rescue, When Drug Discovery Needs Data Fast!
Information is key to drug discovery
Dec. 1, 2007 06:30 PM
Page 2 of 2
SOA Data Services Approach Selected
After reviewing several new alternative approaches, we identified SOA data services as the best one for meeting our criteria.
Data services are a form of Web Service optimized for real-time data
integration. Data services virtualize data to decouple physical and
logical locations and therefore avoid unnecessary data replication.
Data services abstract complex data structures and syntax. Data
services federate disparate data into useful composites. Data services
also support data integration across both SOA and non-SOA applications.
Architecturally, data services combine to form a middle layer of
reusable services, or a data services layer, decoupled from both the
underlying source-data layer as well as the consuming solutions layer.
This provides the flexibility required to deal with each layer in the
most effective manner, as well as the agility to work quickly across
layers as applications, schemas, or underlying data sources change
(see Figure 1).
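As a rough illustration of the layering described above, the following Python sketch federates two sources on demand. It is not Composite's API or the code used in the project; the source names (`finance_db`, `RESOURCE_SYSTEM`), the fields, and the function `project_summary` are all hypothetical and exist only for this example.

```python
import sqlite3

# Source layer (hypothetical): a relational finance system...
finance_db = sqlite3.connect(":memory:")
finance_db.execute("CREATE TABLE project_costs (project_id TEXT, cost_usd REAL)")
finance_db.executemany(
    "INSERT INTO project_costs VALUES (?, ?)",
    [("P-001", 1200000.0), ("P-002", 450000.0)],
)

# ...and a non-relational resource-planning system, modeled here as a dict.
RESOURCE_SYSTEM = {
    "P-001": {"headcount": 14, "phase": "Discovery"},
    "P-002": {"headcount": 6, "phase": "Preclinical"},
}


def project_summary(project_id):
    """Data services layer: build a composite record at request time.

    Nothing is copied into a mart; both sources are read when called.
    """
    row = finance_db.execute(
        "SELECT cost_usd FROM project_costs WHERE project_id = ?",
        (project_id,),
    ).fetchone()
    resources = RESOURCE_SYSTEM.get(project_id)
    if row is None or resources is None:
        return None
    return {
        "project_id": project_id,
        "cost_usd": row[0],
        "headcount": resources["headcount"],
        "phase": resources["phase"],
    }


# Consuming layer: a portal would call the service, not the sources.
summary = project_summary("P-001")
print(summary)
```

The consumer sees one logical record and does not need to know that the cost comes from a SQL table while the headcount comes from another system. Either source could change its storage without the consumer changing, provided `project_summary` keeps returning the same shape.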
Beyond providing complex multi-source data integration, data services
meet our other criteria as well. Because data services are on-demand,
they meet our requirement for real-time information delivery. By not
replicating data, data services eliminate the time required for
building and testing marts. Further, data services can be automatically
generated directly from our data models and so don't require coding.
Data services, due to abstraction, can often be reused across projects.
Finally, data services, because of their architecture, XML support
capabilities, and standards compliance, are inherently SOA-compliant.
Data Services Infrastructure Technology Selected
Once we chose a SOA data services approach, we searched for a data
services infrastructure provider that offered development tools and an
appropriate run-time environment. We selected Composite Software. With
more than 20 projects running in various Pfizer divisions and a
Composite Center of Excellence at our headquarters, Composite was a
proven vendor at Pfizer and its best-of-breed offerings met our search
criteria.
Now our overall data integration capabilities include data
virtualization, data abstraction, and data federation across both SOA
and non-SOA environments. Delivered via Composite's Information Server,
the solution supports both our design and run-time requirements. At
build time, we have an easy-to-use data modeler and code generator to
abstract our data in the form of relational views for reporting and
other uses and/or Web data services for SOA initiatives. Its
high-performance query engine securely accesses, federates, and
delivers the diverse distributed data to our consuming solutions in
real-time.
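To make the two delivery forms concrete, here is a minimal Python sketch that exposes one abstracted view both as relational-style rows and as an XML document. The view contents, the column names, and the functions `as_rows` and `as_xml` are assumptions for illustration; they do not represent what the Information Server generates.

```python
import xml.etree.ElementTree as ET

# Hypothetical abstracted view, already federated from the sources.
PROJECT_VIEW = [
    {"project_id": "P-001", "phase": "Discovery", "cost_usd": 1200000.0},
    {"project_id": "P-002", "phase": "Preclinical", "cost_usd": 450000.0},
]

COLUMNS = ("project_id", "phase", "cost_usd")


def as_rows(view):
    """Relational-style delivery: a list of tuples in column order."""
    return [tuple(record[c] for c in COLUMNS) for record in view]


def as_xml(view):
    """Web-service-style delivery: the same records as an XML document."""
    root = ET.Element("projects")
    for record in view:
        item = ET.SubElement(root, "project", id=record["project_id"])
        ET.SubElement(item, "phase").text = record["phase"]
        ET.SubElement(item, "cost_usd").text = str(record["cost_usd"])
    return ET.tostring(root, encoding="unicode")


rows = as_rows(PROJECT_VIEW)
xml_doc = as_xml(PROJECT_VIEW)
print(rows)
print(xml_doc)
```

Both outputs derive from the same model, so a reporting tool and a portal can consume the same data without a second integration effort.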
The Proof Was in the Portal
With our data services
strategy and data integration toolset in hand, our next task was to do
a pilot project. We wanted to see if we could successfully complete the
project, and if we could complete it much faster while complying with
SOA principles.
For our pilot, we selected the Drug Discovery Portfolio portal. This project easily met our evaluation criteria.
Business Requirements
Senior management,
project team leaders, business analysts, and research scientists across
Pfizer's R&D and commercial business units need to continuously
evaluate our portfolio of discovery projects and drugs in development.
This analysis includes how these projects fit into Pfizer's overall
strategic portfolio as well as how each will be impacted by costs,
market conditions and available resources. A complete picture of each
particular project, as well as an overview of all the projects, is
needed for major business decisions to be based on all relevant
factors. Real-time access to this information is critical, so Pfizer
can react rapidly and intelligently to unforeseen events.
User Interface Requirements
We selected a
Web portal as the user interface because this provides the most
flexible and accessible solution for our wide range of information
users. This means existing data has to be delivered in the form of Web
data services for our portal developers and our portal toolset to
consume easily.
Data Integration Requirements
Key data to
be delivered includes both key metrics and details such as project
costs, resources, timelines and ROI calculations, to name a few. This
diverse data needs to be integrated from a wide variety of source
applications from across various Pfizer groups. This diversity of
source system data structures enabled us to evaluate and thoroughly
test Composite's data connector and transform capabilities during the
pilot project. The dynamic nature of the sources and the need for
real-time delivery also let us thoroughly test Composite's
high-performance query algorithms. Because many teams from across the globe needed
to be involved to provide access to the right data, we added
ease-of-use to our RAD evaluation criteria.
Pilot Benchmark: The Data Mart Approach
To compare
the relative and absolute strengths and weaknesses of the new data
services approach and the Information Server versus our traditional
approach, we invested in a small benchmark of the "old way."
Benchmarking the functional and technical specifications let us
compare end solution delivery. Benchmarking the development process
let us compare time-to-solution and development costs.
Functional and Technical Specification
We
already knew we could use our ETL/data mart tools to successfully
combine the data required into a mart. Unfortunately, putting the
relational data into a mart was only half the job. We still needed to
get this data out of the mart and into the portal in the form of a Web
Service. We found this required manual coding and an additional
toolset. What's more, to achieve the real-time delivery requirement, we
found we needed to achieve unrealistic refresh rates using highly
complex change data capture techniques.
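The refresh problem can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the benchmark itself: `source`, `refresh_mart`, and `read_on_demand` are hypothetical names, and a real mart load would involve ETL tooling rather than a dictionary copy.

```python
# Hypothetical source system holding the current cost of each project.
source = {"P-001": 1200000.0}


def refresh_mart():
    """ETL-style load: copy the source into a separate store."""
    return dict(source)


def read_on_demand(project_id):
    """Data-service-style read: go to the source at request time."""
    return source[project_id]


mart = refresh_mart()            # scheduled load
source["P-001"] = 1350000.0      # the source changes after the load

stale_value = mart["P-001"]      # still the value from the last load
live_value = read_on_demand("P-001")
print(stale_value, live_value)
```

Until the next refresh runs, the mart keeps serving the old figure, which is why approaching real-time delivery with a replicated copy pushes the refresh interval toward zero.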
Development Process
Table 1 compares, side by side, the steps used in the ETL and data services approaches.
Problems with the Data Mart Approach
The ETL/data mart approach was not ideal for this project for the following reasons:
• We could only come close to meeting the real-time integration
requirements if we used advanced change data capture and frequent
refresh features.
• We found that the data mart was physically instantiated in a
relational form. Yet, our portal developers wanted the data in the form
of WSDL Web Services that are easier for the portal to consume.
• Sequential development, such as building the ETL scripts, the
mart, the delivery scripts, and then the portal application, stretched
the elapsed time, thereby pushing out business benefits and adding costs.
• ETL and Web Service scripting were slow manual development processes.
• Scheduling the setup of the data mart infrastructure required
coordinating with our operations group, fitting into its schedule and
backlog.
• Replicated data in the mart would need to be maintained and controlled in addition to the original source data.
• Data security required additional manual coding.
• Any changes required ETL scripts to be changed, as well as the
mart to be reloaded, slowing our response to new requirements or even
simple bug fixes.
• More data structure and syntax expertise was required by developers throughout the process, not just basic SQL.
SOA Data Services Approach Pilot Meets the Spec, Is Faster, and More
The data services approach proved ideal for our Drug Discovery Portfolio Portal project.
• We completed our project in less than half the time of
traditional development. Much of the data-level development was
automated, freeing our skilled development team to work on
application-level development.
• Fewer skills were needed due to the drag-and-drop data service
development environment, built-in security, and automated generation of
Web data services.
• SOA-compliant WSDL data services provided data in the form the portal developers needed.
• Loosely coupled data services were easier to maintain than ETL
scripts in case of changes either to the underlying data sources or the
portal.
• Data service assets built for the portal project can be reused by other development projects.
• We no longer needed our IT operations team to build and maintain
the data mart infrastructure, and there were no extra costs for the mart itself.
Pfizer Informatics Adopts Data Services Approach
Going forward, we plan to use the data services approach and tools for
all projects requiring complex data integration across multiple
heterogeneous sources because the data services approach reduces
unnecessary data replication and provides real-time information
delivery, rapid application development, and SOA compliance.
We learned a number of lessons applicable to future projects. Data
integration doesn't have to be hard or time-consuming with the right
approach and right supporting tools. Virtualizing data versus
replicating saves time and money. Rapid prototyping is possible, even
automatic, when the right tools are used. Agility and reuse, the
promise of SOA, come to life in loosely coupled data services that
span the gap between source data and end applications.
Moving from Pilot to Enterprise, Funded by Time and Cost Savings
With the new SOA data services approach to data integration proven, we
have now put together our roadmap for future adoption. First, the roadmap
includes educating our business analysts, developers, and architects on
when to use data services and when to adopt the RAD approach to
building SOA data services as the solution standard across all new SOA
projects where data integration is required. Second, we plan to
implement a "data services reuse" metric for measuring success across
future projects to reduce development and maintenance costs. In
addition, we're working with the centralized shared services team to
create a Data Services Center of Excellence that promotes best
practices, optimizes economies of scale, and maximizes reach across
projects. Finally, we'll continue to seek emerging technologies and
agile development practices that accelerate SOA projects and enable us
to move to SOA in a safe and powerful way.
Conclusion
As advances in medical care and the
need for new medicines continue to grow, the need for better ways to
manage and deliver information is growing. In the same spirit that
makes Pfizer a trusted leader in drug discovery and commercialization,
the informatics group is pressing forward to meet the ever-demanding
needs of our internal R&D customers as well.
Successful drug discovery needs data fast. Achieving rapid delivery
requires new real-time portals and composite applications that rely
heavily on existing data sourced from multiple systems from across the
enterprise. Delivering that data to our researchers and managers has
been one of our biggest bottlenecks, adding months and cost to our
project timelines. These data integration needs, along with our
aggressive SOA strategy and RAD objectives, have driven us to find,
test, and deploy a new approach to data integration: SOA data services.
About Daniel Eng
Daniel Eng has over 17 years of diverse IT experience in managing projects, leading technical teams, and developing enterprise applications within Fortune 100 companies. Currently at Pfizer Global Research and Development, Dan is leading efforts in transitioning business processes and applications into a SOA environment by using emerging technologies and agile management practices. Prior to Pfizer, he was an independent consultant helping his Fortune 500 clients in developing intranet sites, portable applications and e-commerce solutions. Dan has also worked in many e-commerce start-ups and healthcare organizations. He holds a BSEE degree from Polytechnic University and an MBA degree from Gonzaga University.