Health Facility Registry Matching
Good information is a key input to successful strategy. If you see a thunderstorm in the forecast, chances are you will rethink your trip to the beach. If you’re traveling somewhere you’ve never been before, you will probably use a map to figure out the best route. Reliable, accessible data allow everyone to make the best decisions they can. The same is true at the organizational level: governments, businesses, and nonprofits rely on what they know to manage their operations and inform their future plans. Unfortunately, data in the global health sector are frequently scarce or fragmented. Bluesquare’s novel approaches to compiling and synthesizing those data help create an information environment in which decision-making is better informed and resource allocation more efficient.
Of the nearly 18,000 health facilities in the Democratic Republic of the Congo’s national health information system (SNIS), only about 35% have a GPS coordinate point. GPS coordinates are key to measuring population access to services and products. They help the Ministry of Health (MoH) and NGOs better estimate a population’s needs and better manage stocks. The SNIS is not the only source of data on the DRC’s health infrastructure, however. NGOs, academics, and aid organizations working in the country have collected substantial information of their own. Combined with existing SNIS data, it can significantly expand what we know about the location of Congolese health facilities. For this work we used 24 sets of health facility data, containing a total of 73,190 individual points.
Combining divergent data sources
There are three key challenges to combining these divergent data sources. The first is that the data are stored separately and organized in different ways, which makes integration a complicated and time-consuming process. Enter Iaso, Bluesquare’s geospatial data management platform. Iaso allows easy importing of data, can store as many different sources as needed, and lets users ‘link’ data points from different sources that represent the same clinic or hospital. Crucially, Iaso stores facility data in a consistent format that makes working with them straightforward.
Once all of the data are in Iaso, we face challenge number two: the inconsistency of facility names across datasets. While a person can deduce that Mutiene Poste de Santé and Ps Mutien are the same facility, a computer comparing names exactly cannot. The differences in spelling and the inconsistency in how the type of facility is specified make it impossible to merge these records with a simple exact-name join.
To address this issue we use a text-matching approach known as “the love machine”. Simply put, for a given facility from an outside source, we find the facility in the SNIS with the closest name. We then search the outside source for the facility whose name is closest to that SNIS name. If this second search returns the original facility, we consider the two facilities a match and add the outside source’s coordinate data to our merged dataset. Where possible, each source’s geographic hierarchy is used to restrict matches to the correct province or zone de santé.
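The reciprocal matching described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Bluesquare’s implementation: the article does not say which similarity measure is used, so the sketch borrows the standard library’s `difflib.SequenceMatcher`, and the `TYPE_TOKENS` set and all function names are hypothetical.

```python
import unicodedata
from difflib import SequenceMatcher

# Hypothetical list of facility-type words to ignore when comparing names.
TYPE_TOKENS = {"ps", "cs", "poste", "centre", "de", "sante", "hopital", "hgr"}


def normalize(name):
    """Strip accents, lowercase, and drop facility-type words."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_name = decomposed.encode("ascii", "ignore").decode("ascii")
    tokens = [t for t in ascii_name.lower().split() if t not in TYPE_TOKENS]
    return " ".join(tokens)


def similarity(a, b):
    """Similarity in [0, 1]; SequenceMatcher is an assumed stand-in metric."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def closest(name, candidates):
    """Return the candidate whose normalized name is most similar to `name`."""
    return max(candidates, key=lambda c: similarity(name, c))


def reciprocal_matches(outside_names, snis_names):
    """Keep only pairs where each name is the other's closest counterpart."""
    matches = {}
    for outside in outside_names:
        snis = closest(outside, snis_names)
        # Search back from the SNIS name; accept only if we land where we began.
        if closest(snis, outside_names) == outside:
            matches[outside] = snis
    return matches


outside = ["Ps Mutien", "Mutien II Ps", "Cs Kabinda"]
snis = ["Mutiene Poste de Santé", "Kabinda Centre de Santé", "Lubao Centre de Santé"]
print(reciprocal_matches(outside, snis))
```

In this toy data, “Mutien II Ps” also points at the Mutiene record, but the search back from SNIS prefers “Ps Mutien”, so the weaker candidate is left unmatched. A real pipeline would probably also apply a minimum similarity score and run the search separately within each province or zone de santé; both are left out here for brevity.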
Synthesizing geospatial data
The set of synthesized geodata from the sources matched by our love machine approach presents challenge number three. Namely, with multiple coordinate points for a facility, how do we determine that facility’s most likely location? Here we implement a GPS selection algorithm that takes into account the number of coordinate points available, their relation to the health zone that contains the facility, and how they are clustered to determine outliers and select the location closest to all of the valid points.
For example, the health facility shown in the map on the left, Bashimikie Centre de Santé in Lomami, has three points in the health zone (red, blue, and green). However, one is noticeably farther away than the other two. The algorithm recognizes this, classifies the red point as an outlier, and takes the midpoint of the other two points (yellow) as the new ‘best’ coordinate location.
The facility on the right has four points considered to be within an acceptable distance of the health zone. However, after computing each point’s distance to the midpoint of the others, the algorithm considers the green point in the upper left corner to be an outlier and selects as the ‘best’ point the midpoint of the three others, represented by the yellow dot.
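The outlier step in these two examples can be sketched as follows. This is a simplified illustration, not the actual selection algorithm: it assumes the points have already been filtered to those in or near the health zone, it removes at most one outlier, and the 5 km threshold `max_km` is a hypothetical value that does not come from the project.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(p, q):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def midpoint(points):
    """Mean latitude and longitude; adequate for points a few km apart."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def select_best_point(points, max_km=5.0):
    """Drop the single worst outlier, if any, and return the midpoint of the rest.

    max_km is a hypothetical threshold chosen for this sketch.
    """
    if not points:
        raise ValueError("at least one coordinate point is required")
    if len(points) < 3:
        # With one or two points there is no way to tell which one is wrong.
        return midpoint(points)
    # Distance from each point to the midpoint of all the other points.
    distances = [
        haversine_km(p, midpoint(points[:i] + points[i + 1:]))
        for i, p in enumerate(points)
    ]
    worst = max(range(len(points)), key=distances.__getitem__)
    if distances[worst] > max_km:
        points = points[:worst] + points[worst + 1:]
    return midpoint(points)


# Two nearby points and one far away: the far point is discarded.
candidates = [(-6.0, 24.0), (-6.01, 24.01), (-5.5, 24.6)]
print(select_best_point(candidates))
```

Comparing each point with the midpoint of the others, rather than with the midpoint of all points, keeps a distant point from dragging the reference location toward itself and hiding its own error.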
Our data synthesis approach increases the share of facilities in the DRC with coordinate locations from 35% to 73%, more than doubling the location data contained in the SNIS. The gains were not uniformly distributed. Unsurprisingly, the biggest improvements came from areas where aid organizations and academics have been most active (and thus where we have the most data), such as the provinces of Kasai Central, Kasai Oriental, and Tanganyika.
Improving our knowledge about countries’ health infrastructure
Although we hope to have humans verify the matching process in the near future, quantitative analysis of the results suggests that the quality of the synthesized data is quite good. The histogram above shows that most facilities have two or more GPS data points contributing to their identified location. Furthermore, the point that our GPS selection algorithm chooses is, on average, just 2 kilometers from the points identified in the matching process.
Using Iaso, our geodata management platform, as a backbone, we were able to combine data from the DRC Ministry of Health and third-party aid and academic organizations using a text-matching and GPS selection approach. This increased the share of health facilities in the national health information system with location information by more than 100%. The work highlights how the platform can be used to compile and synthesize data from different sources to make substantial improvements in how much we know about the country’s health infrastructure. Better information in the hands of international actors and national policy-makers can make operations more efficient, make strategy more effective, and improve the health of the Congolese people.