Health data: a complex landscape
Understanding health issues often requires combining complex and heterogeneous data sources, even in the context of single-country interventions. Data can come from HMIS platforms such as DHIS2, from individual tracking systems, from custom software built to address specific issues, or from various Excel reports provided by health experts.
Having such diverse data in disconnected silos is often the biggest obstacle to an efficient exploration and analysis process. It also makes collaboration difficult, and many data analysts working on health data end up developing ad-hoc scripts and visualisations on their own laptops and communicating their results in scattered publications from which it is hard to get unified insights.
Breaking down the silos: the Bluesquare data integration platform
To address this issue, Bluesquare has built a cloud-based data integration platform consisting of three components: extraction, analysis and visualisation. The platform is mostly based on mature open-source technologies, such as the Jupyter ecosystem, and gives its users a lot of flexibility.
Extraction: our platform is able to gather data from different sources:
- Automatic extracts of DHIS2 data, the HMIS of choice in many countries. Using D2D, we extract data from DHIS2 and export it as CSV in a standard flat format that has already proved useful in many data analysis projects.
- GIS data management. A lot of relevant data in health systems has a strong geospatial dimension, whether it concerns the location of populations or of health services. Because the data platform is directly integrated with IASO, users can draw on several sources and work with the best available and most up-to-date geolocation data.
- Ad-hoc reports. Other data in Excel, CSV or JSON format, coming from specific actors or from ad-hoc surveys, are key to specific thematic analyses. They can be consolidated with DHIS2 data and combined with the GIS data repository.
Once extracted, these data are stored in the project data lake in standard formats (mostly CSV or GeoPackage) that most data analysts are familiar with and that are easy to use in Python or R.
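To illustrate how a flat DHIS2 extract can be consolidated with an ad-hoc report, here is a minimal sketch using only the Python standard library. The column names (`org_unit_id`, `period`, `data_element`, `value`, `bednets_distributed`) and the sample values are hypothetical: they are not the actual D2D export format, only an assumption about what a flat layout might look like. In practice, an analyst would probably do the same join with pandas or R.

```python
import csv
import io

# Hypothetical flat DHIS2 extract: one row per org unit, period and data element.
DHIS2_CSV = """org_unit_id,period,data_element,value
OU1,202001,malaria_cases,120
OU2,202001,malaria_cases,45
OU1,202002,malaria_cases,98
"""

# Hypothetical ad-hoc report, e.g. exported from an Excel sheet.
SURVEY_CSV = """org_unit_id,period,bednets_distributed
OU1,202001,300
OU1,202002,150
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries (all values stay strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def left_join(left, right, keys):
    """Attach matching columns from `right` to each row of `left`.

    Rows of `left` without a match are kept unchanged.
    """
    index = {tuple(row[k] for k in keys): row for row in right}
    joined = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys), {})
        extra = {k: v for k, v in match.items() if k not in keys}
        joined.append({**row, **extra})
    return joined


combined = left_join(
    read_rows(DHIS2_CSV), read_rows(SURVEY_CSV), ["org_unit_id", "period"]
)
for row in combined:
    print(row)
```

Joining on the organisation unit and the period keeps every routine data row, and rows with no matching survey record simply lack the extra column, so gaps in the ad-hoc source stay visible rather than being silently dropped.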
Analysis: the core of the platform is a cloud-based computation environment. Based on JupyterHub, it allows different actors to collaborate using Jupyter notebooks:
- Our in-house data science team takes a first pass at the extracted data, cleaning it up, storing the corrected data for future use, and offering imputations of key data series based on our experience and knowledge of the source data systems.
- All project stakeholders can then perform their own analysis work on the raw or cleaned data using their tools of choice (Python/pandas, R, Julia).
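As an illustration of what imputing a key data series can look like, the sketch below fills gaps in a monthly series by linear interpolation. This is only one simple technique chosen for the example, not necessarily the method our data science team applies, and the function name and sample values are hypothetical.

```python
def interpolate_gaps(series):
    """Fill None values that lie between two observed values by linear interpolation.

    Leading and trailing gaps are left as None because there is nothing
    to interpolate between.
    """
    filled = list(series)
    known = [i for i, v in enumerate(filled) if v is not None]
    # Walk over each pair of consecutive observed points and fill what lies between.
    for start, end in zip(known, known[1:]):
        step = (filled[end] - filled[start]) / (end - start)
        for i in range(start + 1, end):
            filled[i] = filled[start] + step * (i - start)
    return filled


# Hypothetical monthly case counts with two missing reports and a trailing gap.
monthly_cases = [120, None, None, 90, 100, None]
print(interpolate_gaps(monthly_cases))
```

Leaving leading and trailing gaps untouched is a deliberate choice in this sketch: extrapolating beyond the observed months would require assumptions about trend or seasonality that depend on the data series.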
Analysis results can be exported in simple formats, and our data science team helps integrate them into national dashboards and reports. When needed, specific analyses can be scheduled to re-run on the platform so that results are refreshed as the data evolves.
Visualisation: many different solutions can be used to visualise data, and the choice of the tool will depend on the project. Here are a few example scenarios that we can propose:
- Using off-the-shelf tools such as Tableau or PowerBI to visualise the extracted data
- Developing custom dashboards using Dash, Shiny or Voilà
- Hosting Jupyter notebooks that document specific analyses and produce rich reports that can be easily updated and shared with partners and decision makers
- Re-importing the transformed data into routine systems such as DHIS2 so that visualisations can be built within these systems
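For the last scenario, the sketch below shows how analysis results might be reshaped before being sent back to DHIS2. It assumes the aggregate data value set JSON structure of the DHIS2 Web API (a `dataValues` list with `dataElement`, `period`, `orgUnit` and `value`), which should be checked against the DHIS2 version in use. The identifiers and input column names are placeholders, and the actual upload step is deliberately left out.

```python
import json


def to_data_value_set(rows):
    """Convert flat result rows into a DHIS2-style data value set payload."""
    return {
        "dataValues": [
            {
                "dataElement": row["data_element_id"],
                "period": row["period"],
                "orgUnit": row["org_unit_id"],
                "value": str(row["value"]),  # values are sent as strings
            }
            for row in rows
        ]
    }


# Hypothetical analysis output; the identifiers are placeholders, not real DHIS2 UIDs.
results = [
    {"data_element_id": "DE_PLACEHOLDER", "period": "202001",
     "org_unit_id": "OU_PLACEHOLDER_1", "value": 110.0},
    {"data_element_id": "DE_PLACEHOLDER", "period": "202002",
     "org_unit_id": "OU_PLACEHOLDER_1", "value": 100.0},
]

payload = to_data_value_set(results)
print(json.dumps(payload, indent=2))
```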
This hosted analysis environment offers multiple benefits:
- For national health system managers, it offers a simple way to give researchers or Monitoring and Evaluation specialists access to their data, and to get feedback on the analyses and results obtained. They can build reports that draw on both the routine data and the results produced by national and international research teams, thus benefiting from the expertise of a wide range of actors. It can also help enforce data sharing agreements by providing a single platform to share data.
- For analytical teams, it provides up-to-date data from a variety of sources, so they can concentrate on their expertise rather than on acquiring and formatting data. They can also easily see how their results change when routine data is updated, which loosens the rigid publication cycle that is poorly suited to policy-oriented data use.
Our data platform is currently being deployed in four countries, in order to:
- Combine different data sources to help with the COVID-19 Response
- Aggregate GIS data to build a consolidated Health Facility Registry
- Support strategic dashboards for Reproductive, Maternal, Adolescent and Child health in support of the GFF platform
- Support strategic dashboards for malaria management and elimination
- Monitor the indirect impact of COVID-19 on other health services
Get in touch
Our data experts can set up an integrated data platform that will combine your various data sources and help you produce the reports you need for decision-making.
Contact our Data Science lead Grégoire Lurton at email@example.com