Transforming Clinical Research Data: Application Migration to Databricks and Automation

Accelerating Data Efficiency for a Global CRO

A large CRO migrated applications to Databricks, introduced automation & BI tools, cutting manual effort by ~12% and accelerating matching record discrepancies with self-service reporting.

Client

Our client, one of the world’s largest global Contract Research Organizations (CROs), embarked on a significant Databricks migration to modernize their data infrastructure and enhance operational efficiency. This global CRO has been offering a comprehensive range of expert-based clinical research, consulting, and technology solutions to the pharmaceutical, biotechnology, and medical device industries. With over 40 years of experience, their mission is to make clinical research a viable care option for individuals worldwide.

Challenge

Our client faced several significant obstacles that hindered their operations:

Performance Complexity: Managing vast amounts of study data from various sources, including clinical and third-party applications, the client struggled with severe performance issues due to outdated legacy enterprise data warehouses.
Manual Dependency: Data comparison between sources and targets required substantial manual effort. This meticulous process of examining and cross-referencing datasets, investigating discrepancies, and ensuring data accuracy was both time-consuming and labor-intensive.
Quality Assurance: Ensuring data integrity involved manual verification of successful data transfers from source to target, including checking for truncation errors. Frequent production releases further exacerbated the manual workload.
Lack of Cloud-native Services: The client faced scalability challenges due to the absence of cloud-native services, impeding the development and deployment of machine learning models.
Limited Reporting Features: Without self-service BI capabilities, end-users relied heavily on IT and data analysts for guided analytics, restricting the scalability of reporting features.

Solution

To tackle these challenges and meet the client’s expanding business needs, Hexaware implemented a comprehensive strategy:

Azure Data Factory Implementation: Hexaware assisted the client in exploring Azure Data Factory (ADF) as a viable solution for their finance module. This involved transferring data from the source to bronze, silver, and gold layers, optimizing data flow and management.
Application Migration and Databricks Integration: As part of the client’s ongoing migration of over 40 applications, Hexaware supported the initial adoption of ADF for ETL processes from bronze to gold layers. The client is now leveraging Databricks, with Hexaware’s expertise, for faster and more efficient query resolution and data transformation.
Power BI Reporting: Hexaware is developing Power BI reports that seamlessly integrate with Azure Databricks data, providing the client with advanced analytics and visualization capabilities.
Python Automation Framework: To expedite the validation process, Hexaware proposed a Python automation framework. This innovative solution automates data validation between various source formats and target databases within the Databricks environment, significantly reducing manual effort.
Data Discrepancy Resolution: The automation framework also highlights mismatched data between the source and target, ensuring data accuracy and integrity throughout the process.

Benefits

The client experienced significant transformative benefits, including:

Boosted Productivity: Automation has significantly streamlined processes, resulting in enhanced operational efficiency and superior outcomes for the client.
Effort Savings: The client has realized a notable reduction in manual processes, achieving around 12% effort savings and allowing resources to be redirected towards more strategic tasks.
Accelerated Analysis: The time required to identify record mismatches has been drastically reduced, speeding up data analysis and decision-making.
Empowered Users: With the implementation of self-service BI, end users can now independently create dashboards and reports, reducing reliance on IT and data analysts.
Enhanced Performance: Leveraging Databricks Spark engine, the client can process large data sets without performance bottlenecks, ensuring smooth and efficient data handling.
Seamless Integration: Databricks’ flexibility allows for easy integration with other systems, enhancing the client’s data ecosystem.
Legacy System Modernization: By decommissioning outdated on-premises systems, the client can now fully leverage modern, cloud-native solutions, paving the way for innovation and scalability.

Summary

Hexaware embarked on a groundbreaking journey to address the legacy transformation and data migration challenges faced by one of the world’s largest clinical research organizations. By implementing a strategic application migration to Databricks and introducing an advanced automation framework, we significantly reduced manual dependencies in data ingestion, validation, and verification processes. This transformation led to remarkable effort savings and a substantial decrease in analysis time for record mismatches. Empowered with the ability to leverage native cloud services, the client can now develop and deploy cutting-edge AI and ML models, driving operational excellence and paving the way for future innovations in clinical research.

View PDF

Every outcome starts with a conversation

Ready to Pursue Opportunity?

Connect Now