We often hear about the scarcity of health data coming from low-resource countries. However, in the last decade, we at Bluesquare have experienced a significant increase in the volume of data being collected through the expansion of software solutions such as DHIS2.
As data now flows through multiple communication networks, it can become hard to keep track of everything being collected by public facilities, private actors, NGOs running campaigns, research projects or logistics systems. It is even harder to mobilize these various sources of data to create meaningful analyses and reports that can help guide public health decisions.
To help solve this problem, Bluesquare is proud to introduce OpenHexa, our open-source data integration platform targeted at health data professionals.
Solving the key challenges of data processing: Exploration, Extraction, Visualization
OpenHexa is focused on four key capabilities:
- Data exploration through the data catalog component, which allows users to browse, search and discuss data from different sources (S3 buckets, PostgreSQL databases, DHIS2 instances…). This makes it possible to synchronize data from different sources and formats.
- Collaborative interactive computing through the notebooks component, based on JupyterHub, where users can create and share notebooks. This allows users to easily share their ideas and progress across teams.
- Automated data extraction and transformation using the data pipelines component, which will greatly speed up the extraction processes.
- Data visualization through the visualization component, which provides an easy way to use OpenHexa data in different data visualization and business intelligence tools.
OpenHexa also provides powerful access control features, allowing you to make sure that every user can only see and work on the data they have been authorized to use.
A user-oriented, open-source platform
By developing OpenHexa as an open-source platform, we are tackling two challenges:
- The lack of open-source integrated data platforms: there are many high-quality open-source software tools for data visualization, analysis or automation, but integrated platforms that combine the different aspects of data science tend to be proprietary, expensive, and plagued by opaque pricing structures.
- The brittleness of data workflows in public health projects: when working with health data, experts are often confronted with heterogeneous data, tasks that require manual operations, and siloed information, leading to analyses that are disorganized and difficult to reproduce.
OpenHexa offers a novel solution that is both integrated and 100% open-source (codebase available on GitHub). We can host the platform for you in our cloud infrastructure, or you can deploy it yourself on any cloud provider, or even in your own infrastructure.
Our aim in developing this platform is to offer our current and future partners a tool that enables programs and managers to automate processes that are often manual, time-consuming and error-prone. We were committed to making it open source and giving it a user-friendly interface, so that as many administrators and projects as possible can benefit from it.
Are you as excited about the launch of OpenHexa as we are? Participate in the development of the source code on GitHub or request a demo session with one of our experts.
Use case: Improving the national surveillance system for infectious diseases
While not limited to health-centric workflows, our platform has been developed with health data as the primary use case.
OpenHexa offers an ideal environment for local universities, analytical units within Ministries of Health, Institutes of Public Health, National Statistical Institutes or international partners to implement a wide variety of data analyses, at national or sub-national level.
As an example, let’s consider an epidemiologist who is interested in improving the national surveillance system for infectious diseases. This system relies on weekly data collection on about 20 diseases, collated at the district level. Each week, data is aggregated manually by a data manager in a provincial bureau and sent to the national surveillance team at the Ministry of Health (MOH). Analysts from the national surveillance team then evaluate the data using Microsoft Excel and try to identify outbreaks.
How can OpenHexa improve this workflow?
- Using the data pipelines component, automated extraction pipelines are implemented to ensure that up-to-date data is consolidated every week. Current use cases include data coming from DHIS2, Excel-based systems, EpiData collection systems and Access databases.
- Within the notebooks component, a data scientist develops an outbreak detection algorithm in collaboration with national experts, academic teams, and colleagues inside and outside the country.
- The data scientist can share the outbreak notebook with their colleagues and local experts.
- The outbreak algorithm is then deployed as a data transformation pipeline and scheduled to run every week using the latest data.
- Using a third-party visualization tool (such as Tableau or PowerBI) connected to OpenHexa, a data visualization expert creates a dashboard to visualize the outbreak data.
- A local monitoring and evaluation team is trained to use the data integration platform, operate the outbreak detection code, and evaluate the data visualized in the dashboard. The team also oversees regular updates to the surveillance system. They can zoom in on specific zones, compare outbreak dynamics across different years, and relate various relevant data series to one another.
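To make the consolidation and detection steps above more concrete, here is a minimal Python sketch of what such a weekly job could look like. It is an illustration only: it does not use the OpenHexa API, the function names and record fields (`district`, `disease`, `week`, `cases`) are hypothetical, and the detection rule (flag a district when the current count exceeds the historical mean plus two standard deviations) is a simple stand-in for whatever algorithm the data scientist and national experts would actually agree on.

```python
from collections import defaultdict
from statistics import mean, stdev


def consolidate(records):
    """Sum case counts per (district, disease, week) across all sources.

    `records` is an iterable of dicts with the hypothetical keys
    "district", "disease", "week" and "cases".
    """
    totals = defaultdict(int)
    for record in records:
        key = (record["district"], record["disease"], record["week"])
        totals[key] += record["cases"]
    return dict(totals)


def flag_outbreaks(history, current, k=2.0):
    """Return alerts where the current count exceeds mean + k * stdev.

    `history` maps (district, disease) to a list of past weekly counts.
    `current` maps (district, disease) to this week's count.
    Series with fewer than two past values are skipped, because a
    sample standard deviation cannot be computed for them.
    """
    alerts = []
    for key, count in sorted(current.items()):
        past = history.get(key, [])
        if len(past) < 2:
            continue
        threshold = mean(past) + k * stdev(past)
        if count > threshold:
            district, disease = key
            alerts.append((district, disease, count, round(threshold, 2)))
    return alerts


# Illustrative, made-up data: two sources reporting the same week.
records = [
    {"district": "District A", "disease": "measles", "week": "2021-W10", "cases": 4},
    {"district": "District A", "disease": "measles", "week": "2021-W10", "cases": 5},
    {"district": "District B", "disease": "measles", "week": "2021-W10", "cases": 6},
]
weekly = consolidate(records)
current = {(d, dis): n for (d, dis, _week), n in weekly.items()}
history = {
    ("District A", "measles"): [2, 3, 2, 3],
    ("District B", "measles"): [5, 6, 5, 6],
}
alerts = flag_outbreaks(history, current)
print(alerts)
```

In a real deployment, the input records would come from the extraction pipelines rather than a hard-coded list, and the resulting alerts would feed the dashboard used by the monitoring and evaluation team.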
Thanks to OpenHexa, we have moved from a manual, error-prone process to an automated and reproducible solution which allows in-country teams to generate better insights into district-level epidemiological trends.