Health Facility Registry Matching

Good information is a key input to successful strategy. If you saw that there’s a thunderstorm in the forecast, chances are you would rethink your trip to the beach. If you’re traveling somewhere you’ve never been before, you would probably use a map to figure out the best way to get there. Reliable, accessible data allow individuals to make the best decisions that they can. The same is true at the organizational level – governments, businesses, and nonprofits rely on what they know to manage their operations and inform their future plans. Unfortunately, data in the global health sector are frequently scarce or fragmented. Bluesquare’s novel approaches to compiling and synthesizing those data help to create an information environment where decision-making is better informed and resource allocation is more efficient.

Of the nearly 18,000 health facilities in the Democratic Republic of the Congo’s national health information system (SNIS), only about 35% have a GPS coordinate point. GPS coordinates are key to measuring population access to services and products, and they help the Ministry of Health (MoH) and NGOs better estimate populations’ needs and manage stocks. The SNIS is not the only source of data on the DRC’s health infrastructure, however. There is substantial information collected by NGOs, academics, and aid organizations working in the country that, combined with existing SNIS data, can significantly expand the information we have about the location of Congolese health facilities. For this work, we used 24 sets of health facility data, containing a total of 73,190 individual points.

Combining divergent data sources

There are three key challenges to combining these divergent data sources. The first is that the data are stored separately and organized in different ways, making integration a complicated and time-consuming process. Enter Iaso, Bluesquare’s geospatial data management platform. Iaso allows easy importing of data, can store as many different sources as needed, and lets users ‘link’ data points from different sources that represent the same clinic or hospital. Crucially, Iaso stores facility data in a consistent format that makes working with them straightforward.

Once all of the data are in Iaso, we face challenge number two: the inconsistency of facility names across datasets. While it is clear to a human that Mutiene Poste de Santé and Ps Mutien are the same facility, the slight difference in spelling and the inconsistency in how the type of facility is specified make it impossible for a computer to naively merge the datasets together. To address this issue, we use a text-matching approach affectionately known as “the love machine.” Simply put, for a given facility from an outside source, we find the facility in the SNIS with the closest name. We then search the outside source for the facility whose name is closest to the name from the SNIS that we just found. If we find the original facility name, we consider those two facilities a match and add the outside source’s coordinate data to our merged dataset. Where possible, the sources’ geographic hierarchies are used to restrict matches to the correct province or zone de santé.
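The two-way lookup described above can be sketched in a few lines of Python. This is an illustration rather than the production code: the similarity measure (the standard library's `difflib.SequenceMatcher`), the `normalize` step, and the small `FACILITY_TYPES` abbreviation table are all assumptions made for the example, and the restriction to a province or zone de santé is left out.

```python
import unicodedata
from difflib import SequenceMatcher

# Hypothetical abbreviation table, for illustration only.
FACILITY_TYPES = {"poste de sante": "ps", "centre de sante": "cs"}


def normalize(name):
    """Lowercase, strip accents, abbreviate facility types, sort the words."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch)).lower()
    for phrase, abbreviation in FACILITY_TYPES.items():
        text = text.replace(phrase, abbreviation)
    return " ".join(sorted(text.split()))


def similarity(a, b):
    """Similarity score between 0 and 1 on the normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def closest(name, candidates):
    """Return the candidate whose name is most similar to `name`."""
    return max(candidates, key=lambda candidate: similarity(name, candidate))


def reciprocal_matches(outside_names, snis_names):
    """Keep a pair only when each name is the other's closest counterpart."""
    matches = {}
    for name in outside_names:
        best_snis = closest(name, snis_names)
        if closest(best_snis, outside_names) == name:
            matches[name] = best_snis
    return matches


if __name__ == "__main__":
    outside = ["Ps Mutien", "Cs Kabinda"]
    snis = ["Mutiene Poste de Santé", "Kabinda Centre de Santé", "Lubao Hôpital"]
    print(reciprocal_matches(outside, snis))
```

Requiring the match to hold in both directions is what keeps a facility with no true counterpart from being attached to whichever name happens to be least dissimilar.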

Synthesizing geospatial data

The set of synthesized geodata from the sources matched by our love machine approach presents challenge number three. Namely, with multiple coordinate points for a facility, how do we determine that facility’s most likely location? Here we implement a GPS selection algorithm that takes into account the number of coordinate points available, their relation to the zone de santé that contains the facility, and how they are clustered, in order to identify outliers and select the location closest to all of the valid points.

For example, the health facility shown in the map on the left, Bashimikie Centre de Santé in Lomami, has three points in the zone de santé (red, blue, and green). However, one is noticeably farther away than the other two. The algorithm recognizes this, classifies the red point as an outlier, and takes the midpoint of the other two points (yellow) as the new ‘best’ coordinate location.

The facility on the right (Mambote Clinique in Kinshasa) has four points that are considered to be within an acceptable distance from the zone de santé. However, after computing each point’s distance to the midpoint of the other points, the algorithm considers the green point in the upper left corner to be an outlier and selects as the ‘best’ point the midpoint of the three others, represented by the yellow dot.
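The outlier step in these two examples can be approximated with a short Python sketch. It is a simplified illustration under stated assumptions, not the algorithm as deployed: the 5 km threshold (`max_km`), the function names, and the coordinates in the usage example are all made up, the midpoint is a plain average of latitudes and longitudes (reasonable only for points that are close together), and the check against the zone de santé boundary is omitted.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(p, q):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(radians, p)
    lat2, lon2 = map(radians, q)
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def midpoint(points):
    """Plain average of latitudes and longitudes (fine for nearby points)."""
    lats, lons = zip(*points)
    return (sum(lats) / len(points), sum(lons) / len(points))


def select_location(points, max_km=5.0):
    """Drop outliers one at a time, then return (best_point, kept_points).

    max_km is an assumed threshold chosen for this illustration.
    """
    if not points:
        raise ValueError("at least one coordinate point is required")
    valid = list(points)
    while len(valid) > 2:
        # Distance from each point to the midpoint of all the other points.
        distances = [
            haversine_km(p, midpoint(valid[:i] + valid[i + 1:]))
            for i, p in enumerate(valid)
        ]
        worst = max(range(len(valid)), key=distances.__getitem__)
        if distances[worst] <= max_km:
            break  # every remaining point sits close to the others
        valid.pop(worst)
    return midpoint(valid), valid


if __name__ == "__main__":
    # Made-up coordinates: two nearby points and one far away.
    candidates = [(-6.00, 24.00), (-6.01, 24.01), (-6.50, 24.60)]
    best, kept = select_location(candidates)
    print(best, kept)
```

With only two points left there is no way to tell which one is the outlier, so the sketch stops there and returns their midpoint.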

Our data synthesis approach increases the percentage of facilities in the DRC with coordinate locations from 35% to 73%, more than doubling the location data contained in the SNIS. The gains were not uniformly distributed – unsurprisingly, the biggest improvements came from areas where aid organizations and academics have been most active (and thus where we have the most data), such as the provinces of Kasai Central, Kasai Oriental, and Tanganyika.

Improving our knowledge about countries’ health infrastructure

Although we hope to have humans verify the matching process in the near future, quantitative analysis of the results suggests that the quality of the synthesized data is quite good. The histogram above shows that most facilities have two or more GPS data points contributing to their identified location. Furthermore, the point that our GPS selection algorithm chooses is, on average, just 2 kilometers from the points identified in the matching process.

Using Iaso, our geodata management platform, as a backbone, we were able to combine data from the DRC Ministry of Health with data from third-party aid and academic organizations. A text-matching and GPS selection approach increased the share of health facilities in the national health information system with location information by more than 100%. This work highlights how the platform can be used to compile and synthesize data from different sources to make substantial improvements in how much we know about the country’s health infrastructure. Better information in the hands of international actors and national policy-makers can make operations more efficient and strategy more effective, and ultimately improve the health of the Congolese people.