Improving Access to Clean, Safe Drinking Water in Tanzania
Motivation
Tanzania is recognized by the UN as a least developed country (LDC) [1]. LDCs are, in part, defined by the UN as “low-income countries confronting severe structural impediments to sustainable development” [2]. One critical impediment the country has struggled with for decades is supplying and maintaining access to clean and safe drinking water for its population. In 2017, a reported 24.8 million citizens lacked access to ‘at least basic’ water [3]. Even for those fortunate enough to have access to this life-essential resource, it is never guaranteed.
Figure 0 | Map of Pump Locations
In most rural parts of the country, much of the potable water is accessed via water pumps that draw water from local sources. Around 60,000 hand-pumps are installed in sub-Saharan Africa every year, but between 30 and 40% of all pumps do not function at any given time [4]. Figure 0 above shows the distribution of functioning and non-functioning pumps throughout Tanzania. When pumps no longer function, they are often abandoned. These non-functioning pumps represent an estimated loss of $1.2 billion USD over the last two decades [4]. The number of functioning pumps is thus always in a state of flux, and local and national organizations and governments must constantly install new pumps to replace broken-down ones.
The first step in addressing problematic pumps is identifying their functional status. Once the functional status of a pump is known, the appropriate resources can be deployed to target sites in need of repair or replacement. Unfortunately, there is currently no efficient or accurate system to tackle this monumental problem. The status of a pump must be physically checked, but local governments and organizations battling the water crisis simply do not possess enough time, resources, or manpower to check the millions of pumps distributed throughout the country. Alternative methods of identifying the functional status of these water pumps are therefore critical to ensuring access to clean drinking water and are the focus of this work.
The problem statement can be phrased as follows. Data exists for tens of thousands of pumps which are assumed to be functional, but their true status is unknown. Rather than physically checking every pump to determine its true functional status, can we predict, with accuracy better than random, which pumps will be functional and which will be non-functional? The desired outcome is to improve on the current state by allocating resources for pump maintenance more efficiently, reducing pump downtime, and ensuring basic water access for tens of millions of Tanzanians.
The Data
Taarifa is an open-source platform for crowd-sourced reporting and triaging of infrastructure-related issues. Together with the Tanzanian Ministry of Water, Taarifa has collected data for thousands of water pumps throughout Tanzania. The data is hosted as part of a competition by DrivenData.org:
https://www.drivendata.org/competitions/7/pump-it-up-data-mining-the-water-table/page/23/
EDA
The dataset comprises 59,400 unique water pumps described by 39 features. Each water pump is represented exactly once, so the shape of the training data is (59,400, 39). Figure 1 below illustrates the large class imbalance with the following breakdown: 54.3% `functional`, 7.3% `functional needs repair`, and 38.4% `non functional`. This imbalance had to be taken into account by stratifying the train/test splits.
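A stratified split keeps the class proportions of the full dataset in both the training and the test portion. The sketch below illustrates the idea with the standard library only; the function name, the toy label list (1,000 labels in the proportions quoted above), and the 20% test fraction are assumptions for illustration, not the split used in this work. In practice, scikit-learn's `train_test_split` offers a `stratify` argument for the same purpose.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with every class split in the same proportion."""
    rng = random.Random(seed)
    # Group row indices by class label.
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction of each class for the test set.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels mirroring the reported class shares (illustrative only).
labels = (["functional"] * 543
          + ["functional needs repair"] * 73
          + ["non functional"] * 384)
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
```

Because the split is done per class, even the small `functional needs repair` class is guaranteed to appear in both portions.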
Figure 1 | Pump Status Classes
What is the most common type of pump?
The top 10 types of pumps are shown in Table 1. The largest group of pumps (about 45% of the dataset) uses gravity as the pumping mechanism. Nira and Tanira pumps are the most popular models of handpump represented in this dataset. However, the `india mark ii` is the world’s most widely used water handpump [5]. The majority of these hand pumps are designed to serve communities of around 300 people.
Table 1 | Pump Count by Pump Type
extraction_type | Pump Count |
---|---|
gravity | 26780 |
nira/tanira | 8154 |
other | 6430 |
submersible | 4764 |
swn 80 | 3670 |
mono | 2865 |
india mark ii | 2400 |
afridev | 1770 |
ksb | 1415 |
other - rope pump | 451 |
Which type of pump has the highest number of non functional pumps?
Table 2 shows that the number of non functional pumps within each pump type does not follow the same trend as the total number of pumps. The pump type with the highest number of non functional pumps is, not surprisingly, `gravity`. The second, however, is `other` with 5,195 non functional pumps. `other` is likely a catch-all term for any pump that didn’t fit into any of the other pump types, in which case the exact pump type is unclear. Interestingly, even though there are 1,500+ more `nira/tanira` pumps than `other` pumps, `other` contains over 3,000 more non functional pumps than `nira/tanira`. The final column in the table shows the percentage of non functional pumps for that pump type. Although `nira/tanira` has the 3rd highest total of non functional pumps, proportionately it has the fewest, with only 25.7% of all `nira/tanira` pumps being non functional. Compare that to the `mono` pump, for which 57.7% of all pumps are non functional! It will be interesting to see which pump type the model is best able to classify.
Table 2 | Highest Non Functional Pump Counts
extraction_type | functional | functional needs repair | non functional | % non functional |
---|---|---|---|---|
gravity | 16048 | 2701 | 8031 | 30.0% |
other | 1029 | 206 | 5195 | 80.8% |
nira/tanira | 5421 | 641 | 2092 | 25.7% |
submersible | 2626 | 227 | 1911 | 40.1% |
mono | 1082 | 129 | 1654 | 57.7% |
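The percentage column can be reproduced from the counts alone. The short sketch below divides the non functional counts in Table 2 by the per-type totals in Table 1 and then ranks the pump types by that share; the variable names are illustrative.

```python
# Total pumps per type (Table 1) and non functional pumps per type (Table 2).
total = {"gravity": 26780, "other": 6430, "nira/tanira": 8154,
         "submersible": 4764, "mono": 2865}
non_functional = {"gravity": 8031, "other": 5195, "nira/tanira": 2092,
                  "submersible": 1911, "mono": 1654}

# Share of non functional pumps within each type, in percent.
pct_non_functional = {
    pump_type: 100 * non_functional[pump_type] / total[pump_type]
    for pump_type in total
}

# Ranking by share differs from ranking by raw count.
by_share = sorted(pct_non_functional, key=pct_non_functional.get, reverse=True)
by_count = sorted(non_functional, key=non_functional.get, reverse=True)
```

Ranked by raw count, `gravity` comes first, but ranked by share it drops to fourth of the five types, which is why both views are worth looking at.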
Which type of pump serves the highest average number of people?
The average population for each pump type is summarized in Table 3. This factor is likely related to how long a pump will remain functional, since wear from repeated use is usually the cause of a pump breaking down. Questions like these can help drive the design of new features for the dataset to improve the model further. This will be explored in future work.
Table 3 | Population Per Pump Type
extraction_type | Population per pump |
---|---|
windmill | 408 |
india mark iii | 378 |
other - play pump | 343 |
submersible | 324 |
india mark ii | 298 |
afridev | 250 |
other - rope pump | 214 |
other | 210 |
swn 80 | 206 |
gravity | 148 |
Data Wrangling
A total of 7 feature columns have missing data that has to be dealt with. Luckily, all 7 of these feature columns are categorical, with the pandas `object` datatype; none of the numerical columns have missing values. The missing categorical values are dealt with automatically when converting those features into dummy variables. Dummy variable conversion is necessary when working with categorical text data since training any model requires numerical datatypes. This method changes each unique value in the categorical feature column into its own binary column. An example is shown below, where all missing values for the `waterpoint_type_group` feature are combined into a single binary feature column, `waterpoint_type_group_nan`.
Table 4 | Sample of Categorical Dummy Variables
id | date_recorded | funder_0 | ... | waterpoint_type_group_nan |
---|---|---|---|---|
69572 | 2011-03-14 | 0 | ... | 0 |
8776 | 2013-03-06 | 0 | ... | 0 |
34310 | 2013-02-25 | 0 | ... | 0 |
67743 | 2013-01-28 | 0 | ... | 0 |
19728 | 2011-07-13 | 0 | ... | 0 |
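To make the dummy variable conversion concrete, here is a minimal standard-library sketch that turns one categorical column into binary columns and gives missing values their own `_nan` column. The helper name and the sample values are hypothetical; the exact call used in this work is not shown, although pandas provides `pandas.get_dummies` with a `dummy_na=True` option that behaves along these lines.

```python
def one_hot_with_nan(values, prefix):
    """One-hot encode a categorical column; missing values get their own column."""
    # Distinct non-missing categories, in a stable sorted order.
    categories = sorted({v for v in values if v is not None})
    columns = [f"{prefix}_{c}" for c in categories] + [f"{prefix}_nan"]
    rows = []
    for v in values:
        # The single column that should be 1 for this row.
        hot = f"{prefix}_nan" if v is None else f"{prefix}_{v}"
        rows.append({col: int(col == hot) for col in columns})
    return rows


# Illustrative values; None stands in for a missing entry.
sample = ["hand pump", None, "communal standpipe", "hand pump"]
encoded = one_hot_with_nan(sample, "waterpoint_type_group")
```

Each row ends up with exactly one column set to 1, so a missing value is carried into the model as information rather than being dropped.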
Some of the feature columns are either duplicates or derived from other feature columns and getting rid of these will help reduce the total number of features. The features that were dropped from the dataset were:
- recorded_by
- payment_type
- quality_group
- quantity_group
I introduced one new engineered feature by computing the number of years between the pump’s year of installation and the year that the datapoint was recorded. This is another feature I think will have good correlation to the predicted class since the build materials for the pump will degrade with time and weathering.
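The engineered feature amounts to a subtraction of two years. The sketch below assumes the recording date is an ISO-formatted string, as in the `date_recorded` column of Table 4, and that the installation year lives in a column I call `construction_year` here (an assumed name). Treating a non-positive year as missing is also an assumption of this sketch.

```python
from datetime import date


def pump_age_years(date_recorded, construction_year):
    """Years between installation and the recording date, or None if unknown."""
    # Assumption: a missing or non-positive construction year means "unknown".
    if construction_year is None or construction_year <= 0:
        return None
    recorded_year = date.fromisoformat(date_recorded).year
    return recorded_year - construction_year


ages = [
    pump_age_years("2011-03-14", 1999),  # pump installed 12 years before the record
    pump_age_years("2013-03-06", 0),     # installation year unknown
]
```

Rows with an unknown age would still need a strategy before modeling, for example a separate indicator column or an imputed value.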
Results
To evaluate the performance of my models, I chose the precision, recall, and f1-score metrics. These are defined as follows:
\[\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}}\]
\[\text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}}\]
\[F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}\]
Precision and recall are favored over the true positive and true negative rates when there is a significant class imbalance, as in this case. The F1-score is the harmonic mean of precision and recall and gives us a single metric with which to compare one model to another.
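As a small illustration of these definitions, the sketch below computes precision, recall, and F1 for one class treated as "positive" against all others. The function names and the six toy labels are made up for the example and are not taken from the dataset.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def one_vs_rest_scores(y_true, y_pred, positive):
    """Precision, recall and F1 for `positive` against all other classes."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


# Toy labels for illustration only.
y_true = ["functional", "functional", "non functional",
          "non functional", "functional needs repair", "functional"]
y_pred = ["functional", "non functional", "non functional",
          "functional", "functional", "functional"]
precision, recall, f1 = one_vs_rest_scores(y_true, y_pred, "functional")
```

Repeating the call with each class label as `positive` yields one precision, recall, and F1 value per class.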
Two models are compared in this work: a logistic regression model and a random forest model. A Bayesian optimization algorithm is used to tune the hyperparameters of both models, and the comparison is made between the optimized versions of each. Logistic regression models are popular for many classification tasks due to their simplicity in both implementation and interpretation. The only hyperparameter tuned in the logistic regression model is the regularization parameter C. The results are summarized in Table 5 below.
Table 5 | Results of Optimized Logistic Regression Model
metric | functional | functional needs repair | non functional |
---|---|---|---|
precision | 0.721376 | 0.154448 | 0.638251 |
recall | 0.561340 | 0.526316 | 0.548136 |
f1-score | 0.631375 | 0.238815 | 0.589771 |
classification_rate | 0.553770 |
Precision, recall, and f1-score are calculated for each class in a one-vs-all fashion. The results are drastically different between classes. The precision was highest for the `functional` class (0.72) and lowest for the `functional needs repair` class (0.15). This isn’t a surprise since the dataset has roughly 32,000 `functional` samples to train on and only about 4,300 samples of the `functional needs repair` class. In addition to the large class imbalance, the `functional needs repair` class may simply be intrinsically harder to predict given the same set of predictor variables.
The recall scores are very similar across all 3 classes (0.53 to 0.56). Together with the widely varying precision, this implies that the model misses a similar fraction of each class (false negatives) but differs widely between classes in how many of its predictions are false positives.
The classification rate is simply the overall accuracy of the model and is what the DrivenData competition uses to score submissions. The classification rate for this model is 55.4%, meaning that 55.4% of all evaluated pumps were assigned their correct class.
Table 6 | Results of Optimized Random Forest Model
metric | functional | functional needs repair | non functional |
---|---|---|---|
precision | 0.813756 | 0.576444 | 0.841199 |
recall | 0.891874 | 0.357193 | 0.786270 |
f1-score | 0.851026 | 0.441075 | 0.812808 |
classification_rate | 0.812825 |
The results for the optimized random forest model are summarized in Table 6. The random forest model clearly outperforms the logistic regression model. The overall classification rate went from 55.4% to 81.3%! Precision and recall improved significantly for the `functional` and `non functional` classes. Precision also improved for the `functional needs repair` class (0.15 to 0.58), but at the expense of a lower recall (0.53 to 0.36). The f1-score for every class improved over the logistic regression model.
Figure 2 | Precision-Recall for Non Functional Class
The precision-recall curves for the `non functional` class are shown in Figure 2. To decide where along the curve we want to operate, we need to consider the business problem we’re trying to solve and the consequences of prioritizing precision over recall and vice versa. Having a high precision means we minimize false positives. In the case of the `non functional` class, false positives are misclassifications of `functional` pumps as being `non functional`. The consequence of misclassifying a `functional` pump is that we waste resources in deploying materials and personnel to the pump. However, the underlying assumption here is that there is currently no way to know a pump’s functional status without actually deploying someone to physically check it, in which case we would have to physically check the status of every pump anyway. So if we can reduce the number of pumps we have to check, we are already better than the baseline.
To maximize recall, false negatives must be minimized. False negatives in the case of the `non functional` class occur when a `non functional` pump is classified as either `functional` or `functional needs repair`. Both of these misclassifications imply the pump is still working and is not in immediate need of replacement. If a `non functional` pump is misclassified, no effort will be made to replace the broken-down pump, and hundreds of people can go without that water source for weeks or even months. They would likely be forced to find more distant or less sanitary sources of water. On the other hand, the consequences of misclassifying a `functional` or `functional needs repair` pump are not critical to human life. For these reasons I would argue that the critical metric we want to maximize is the recall of the `non functional` class.
A Case Study
Ultimately, the Tanzanian Ministry of Water and local organizations employing this research will have to decide how they want to balance the precision and recall of this model. Assume we want to maintain a recall of 95%. This means that for every 100 `non functional` pumps, we will miss only 5 of them. From Figure 2 we can see that a 95% recall corresponds to a precision of about 50%. Let’s use these figures along with the actual number of `functional` and `non functional` pumps in the dataset to come up with a real-world case study.
In the full training dataset, there are 22,824 `non functional` pumps out of a total of 59,400 pumps. With a recall of 95%, we would correctly classify 21,683 of them and misclassify only 1,141. Since the precision is locked to 50% for our chosen recall, roughly every second pump the model flags is actually working, so we would have to deploy personnel and resources to about double the total number of `non functional` pumps, or 45,648 pumps (strictly, 21,683 / 0.5 = 43,366 flagged pumps, so this figure is slightly conservative). Compare this to the current state in which the status of a pump can only be known by physically checking it. Assuming the chance of encountering a `non functional` pump is equal to the proportion of `non functional` pumps (38.4%), 56,466 out of the 59,400 pumps would have to be physically checked to find the same 21,683 `non functional` pumps that the model correctly identifies. The model would thus reduce the number of pumps that have to be checked by 10,818 in this sample, which corresponds to a 19.2% savings in time, money, and manpower!
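The arithmetic of the case study can be retraced in a few lines. The sketch below follows the text's own simplification of inspecting twice the number of `non functional` pumps at 50% precision and uses the 38.4% class share from the EDA as the chance of finding a broken pump without a model; the variable names are illustrative.

```python
n_non_functional = 22_824   # non functional pumps in the training data
recall = 0.95               # chosen operating point
base_rate = 0.384           # share of non functional pumps (from the EDA)

# Non functional pumps the model correctly identifies.
found = round(n_non_functional * recall)

# Pumps to visit with the model: the text doubles the non functional count
# to account for 50% precision.
visits_with_model = 2 * n_non_functional

# Pumps to visit without a model to find the same number of broken pumps.
visits_without_model = round(found / base_rate)

saved_visits = visits_without_model - visits_with_model
saving_rate = saved_visits / visits_without_model
```

Changing `recall` or the precision assumption shows how sensitive the estimated savings are to the chosen operating point.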
Savings Calculator
The proportion of `non functional` pumps is assumed to be more or less constant across Tanzania and more broadly across sub-Saharan Africa, as reported in the literature [4]. I’ve developed a simple Excel calculator for estimating the savings this model can offer over the current state of needing to survey every pump’s functional status. The `savings_calculator.xls` file can be accessed from the home repository here.
Figure 3 | Savings Calculator
An example of the calculator is shown in Figure 3. Assuming an organization is responsible for 10,000 pumps and the average cost to survey each pump is $50 (employee compensation, fuel, etc.), the organization could realize a savings of $96,000 (10,000 × $50 × 19.2%). The average price of a new Afridev or India Mark II pump is between $1,427 and $1,585. The average cost of a new pump deployment including labor, digging, etc., is $3,800 [6]. At this rate, the savings of $96,000 could be used to deploy 25 brand new pumps. At roughly 300 people per pump, this would be enough to support 7,500 people [3]!
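The same estimate can be written as a small function. This is a sketch of the calculation described in the text, not the formulas inside the spreadsheet; the function name and parameter names are hypothetical, and the defaults (19.2% savings, $3,800 per deployment, 300 people per pump) are the figures quoted in this article.

```python
def estimate_savings(n_pumps, survey_cost, saving_rate=0.192,
                     deployment_cost=3_800, people_per_pump=300):
    """Return (savings in USD, new pumps affordable, people served)."""
    # Money saved by surveying fewer pumps.
    savings = n_pumps * survey_cost * saving_rate
    # Whole pumps that can be deployed with that money.
    new_pumps = int(savings // deployment_cost)
    return savings, new_pumps, new_pumps * people_per_pump


savings, new_pumps, people = estimate_savings(n_pumps=10_000, survey_cost=50)
```

An organization can substitute its own pump count and survey cost to get a first-order estimate before opening the spreadsheet.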
Conclusions
A random forest classification model was used to predict the functional status of water pumps in Tanzania. A Bayesian optimization algorithm was used to tune the hyperparameters of the model. A precision of 0.84 and a recall of 0.79 were achieved for the `non functional` class on a test dataset, while the overall classification rate was 0.81. The model offers average savings in resources of 19.2% over the current state. These savings could be used to deploy additional pump replacements in places with critical need. Finally, a simple calculator is offered to help organizations involved in tackling the water crisis estimate the savings they could realize by employing the model presented in this work.
There are several next steps to explore for this project. The first is to engineer new features based on the questions answered in the EDA section. The second is to dig into the misclassified cases for the `non functional` class and understand why the model has a hard time predicting those particular pumps. Finally, an ensemble of several models may improve the overall classification rate by allowing the models to compensate for each other’s weaknesses.
References
1. https://www.un.org/development/desa/dpad/wp-content/uploads/sites/45/publication/ldc_list.pdf
2. https://www.un.org/development/desa/dpad/least-developed-country-category.html
3. https://washwatch.org/en/countries/tanzania/summary/statistics/
4. https://www.theguardian.com/global-development-professionals-network/2016/mar/22/how-do-you-solve-a-problem-like-a-broken-water-pump
5. https://en.wikipedia.org/wiki/India_Mark_II
6. http://www.ropepumps.org/uploads/2/9/9/2/29929105/rope_pump_-_piston_pumps._comp._study_tz.pdf