Health services researchers spend a substantial amount of time integrating, cleansing, interpreting, and aggregating raw data from multiple public or private data sources. Often, each researcher (or someone on their team) duplicates this effort for their own project, facing the same challenges and encountering the same pitfalls discovered by those before them.
This paper describes a design process for creating a data warehouse that includes the most frequently used databases in health services research.
The design is based on a conceptual iterative process model framework that applies sociotechnical systems theory and includes the capacity for subsequent updates of the existing data sources and the addition of new ones. We introduce the theory and the framework and then explain how they inform the methodology of this study.
We describe the application of the iterative process model to the design research process of problem identification and solution design for the Healthcare Research and Analytics Data Infrastructure Solution (HRADIS). Each phase of the iterative model produced end products to inform the implementation of HRADIS. The analysis phase produced the problem statement and requirements documents. The projection phase produced a list of tasks and goals for the <i>ideal</i> system. Finally, the synthesis phase provided the process for a plan to implement HRADIS. HRADIS structures and integrates data dictionaries provided by the data sources, allowing the creation of dimensions and measures for a multidimensional business intelligence system. We discuss how HRADIS is complemented by a set of data mining, analytics, and visualization tools that enable researchers to apply multiple methods to a given research project more efficiently. HRADIS also includes a built-in security and account management framework for data governance, which provides customized authorization based on user roles and the parts of the data each role is authorized to access.
To address existing inefficiencies in the acquisition, extraction, preprocessing, cleansing, and filtering stages of data processing in health services research, we envision HRADIS as a full-service data warehouse integrating frequently used data sources, processes, and methods along with a variety of data analytics and visualization tools. This paper presents the application of the iterative process model to build such a solution. It also discusses several prominent issues, lessons learned, reflections, recommendations, and future considerations that arose as the model was applied.