1 Introduction

Health insurance reduces extreme health costs and out-of-pocket spending by pooling resources. It is an important component towards the attainment of Universal Health Care (UHC) (Dye et al. 2015). The goal of UHC was set by the World Health Organization (WHO) member states in 2005 (World Health Organization 2005). The goal is to assist member countries to achieve UHC through health system financing. UHC has been defined as the provision of the needed quality health services to the whole population with less cost (World Health Organization 2013). In 2015, the General Assembly adopted the 2030 agenda that includes 17 Sustainable Development Goals (SDGs). The SDG goal 3.8 seeks: To attain UHC, with financial risk security, access to quality vital health care services and inexpensive key medicines and vaccines for everyone (United Nations General Assembly 2015).

Countries in Sub-Saharan Africa face many health challenges. These include; low investment in health care, slow economic growth, extensive out-of-pocket expenditure and reduced access to health services (Sambo et al. 2013). To achieve the health related SDG’s and UHC, the regional committee for Africa suggested strategies including; more investment, efficient use of health resources and expand coverage. The objective is to foster efficient and sustainable health financing and achieve these goals. Over the years these countries have prioritized investments towards achieving UHC (Lagomarsino et al. 2012; Cotlear and Rosemberg 2018). This also follows the “Abuja Declaration” of World Health Organization et al. (2010) that set a minimum of 15% of the total government expenditure.

To mention a few countries; Ghana’s National Health Insurance Scheme (NHIS) has been in existence since 2003. The goal is to guarantee fairness and access to health care services by reducing financial barriers to access at the point of use (Kusi et al. 2015). By 2014, over 10.5 million Ghanaians (an estimated 40% of the population) were covered by the NHIS, with inpatient and outpatient visits to health facilities increasing from 0.5 to around 3 per capita between 2005 and 2014 (Wang et al. 2017). A study by Aikins et al. (2021) established that the scheme will likely achieve UHC if protected from political interference and improved accountability. In 2017 Zambia developed the National Health Strategic Plan 2017–2021. The plan outlined strategies to establish a social health insurance scheme (Government of Zambia 2018). This was passed into the National Health Insurance Act 2018 whose goal is to provide reliable health system financing and universal access to health services. Under the NHI, all eligible citizens contribute to the pool of resources in addition to external funding. Households classified as poor by measuring absolute poverty based on monthly consumption expenditures are exempted from contributing (Government of Zambia 2018).

Kenya is a lower-middle-income country with a population of 47.5 million, 12.2 million households and an average household size of 3.9. In Kenya, 75.1% are below 35 years and 32.73 million (68.9%) live in rural areas (Kenya National Bureau of Statistics 2019). Approximately \(83\%\) of Kenyans do not have financial protection against extreme health care costs. Around 1.5 million become poor due to health care costs (Ministry of Health Kenya 2014; Okungu et al. 2017). As outlined in Vision 2030 Kenya seeks to achieve UHC by 2030. Towards this goal, several strategies have been implemented. To start with, the government piloted UHC in four out of 47 counties in Kenya (Isiolo, Kisumu, Machakos, and Nyeri). These counties were selected because they have a high incidence of both communicable and non-communicable diseases, maternal mortality, and road traffic injuries. Results from this pilot showed great success, however, not sustainable by the government capitation (Ministry of Health Kenya 2018). Secondly, the government abolished charges in public hospitals and health care. It also introduced free maternity services in all health care facilities (Maina and Kirigia 2015). Further, it expanded the National Health Insurance Fund (NHIF) package from an inpatient-only package to outpatient services (Mwaura et al. 2015).

Some studies on health insurance in Kenya include; Kazungu and Barasa (2017), where they examined the levels, inequalities (where households are categorized into five socio-economic quintiles) and factors associated with health insurance coverage in Kenya. They analyzed data from the Kenya Demographic and Health Survey (KDHS) 2009 and 2014. Results show health insurance coverage in Kenya remains low and show high inequalities. Otieno et al. (2019) carried out a study to determine the prevalence of health insurance and associated factors among households in urban slum settings in Nairobi, Kenya. They used cross-sectional data of adults aged 18 years or older from randomly selected households in Viwandani slums (Nairobi, Kenya). The study was conducted between June and July 2018. Their findings show that the prevalence of health insurance in the sample was 43%. Health insurance coverage in Viwandani slums in Nairobi, Kenya, is low.

The KDHS 2014 was a national survey. It was designed to provide reliable direct estimates at the national level only and county estimates for some selected indicators (health insurance not included). The direct estimators (they rely only on the survey data) are approximately designed unbiased and consistent (Pfeffermann 2013). However, direct estimators generally have large variances and estimates are unreliable when the sample sizes are small (Rao and Molina 2015) – for example at the county level in Kenya. In contrast, model-based small area methods produce more reliable estimates in terms of smaller mean squared error and coefficient of variation (Tzavidis et al. 2018). This is because they combine survey and census/administrative data through a model and therefore increase the effective sample size. For more theory on small area estimation, we refer the reader to Ghosh and Rao (1994); Pfeffermann (2002, 2013); Jiang and Lahiri (2006); Rao and Molina (2015); Pratesi (2016); Tzavidis et al. (2018) and Morales et al. (2021).

For this study, therefore, we rely on SAE to better estimate the proportion of persons with health insurance for Kenyan counties. The health insurance status of a person is binary. Some approaches have been proposed to estimating binary variables in the small area context. To mention a few Bayesian approaches; Hierarchical Bayes of Malec et al. (1997); Nandram et al. (1999); Liu et al. (2007) and Empirical Bayes of MacGibbon and Tomberlin (1987); Farrell et al. (1997) and Ghosh et al. (1998). For frequentist approaches Jiang and Lahiri (2001) proposed empirical best predictor (EBP) for a binary response. Chambers et al. (2016) outlines the use of a binary logistic generalized linear mixed model (GLMM) in SAE. However, they note that GLMM based on maximum likelihood is influenced by outliers. M‑quantile SAE model provides a robust alternative to GLMM’s.

The rest of this paper is organized as follows. We describe the KDHS 2014 and the Kenya Population and Housing Census (KPHC) 2009 in Sect. 2. In Sect. 3, we outline the statistical methodology applied in this paper. In particular, the direct estimation and the binary M‑quantile small area model estimation, hereafter called the MQ model, are examined by means of the point estimation and the mean squared error. In Sect. 4, we present the results of the application to estimate health insurance coverage for Kenyan counties. Lastly, in Sect. 5, we give the concluding remarks based on the findings, some possibilities for further research as well limitations.

2 Data description

In this section, we describe the data sources used in this paper. We had access to KDHS 2014 and KPHC 2009. The links to the data sources are provided at the end of this paper. We assume that the functional relation between having health insurance and auxiliary data remains constant between the survey and census time.

2.1 The Kenya Demographic and Health Survey

The Demographic and Health Survey (DHS) collects, analyzes and disseminates data on population, health, HIV and nutrition in over 90 countries (Croft et al. 2018). In this study, we had access to the Kenya Demographic and Health Survey (KDHS) done in 2014. The KDHS has been conducted in Kenya after every 5 years i.e. 1989, 1993, 1998, 2003, 2008–2009 and 2014. The 2014 KDHS collected several data on household characteristics, education and employment, and health-related indicators such as HIV and child health survival (Kenya National Bureau of Statistics 2015).

The 2014 KDHS sample was drawn from a sample master called the Fifth National Sample Survey and Evaluation Program (NASSEP V). The Kenya National Bureau of Statistics (KNBS) currently uses this framework to conduct household surveys in Kenya. It includes \(5{,}360\) clusters derived from the 2009 Kenya Population and Housing Census (Kenya National Bureau of Statistics 2015). The framework has a total of \(96{,}251\) enumeration areas (EA’s). The KDHS 2014 sought to create representative estimates for the majority of survey variables at the national level, for individual urban and rural regions, for regional (formerly provincial) levels, and selected indicators at the county level. To meet these objectives, the sample was designed to comprise 40,300 households from 1,612 clusters spread across the country, with 995 clusters in rural areas and 617 clusters in urban areas. Samples were selected separately in each sampling stratum using a two-stage sample approach. In the first stage, the \(1{,}612\) EA’s were chosen with equal probability from the NASSEP V frame. The properties from the listing operations served as the sampling frame for the second round of selection, which included selecting 25 households from each cluster (Kenya National Bureau of Statistics 2015).

Three main questionnaires were used in the KDHS; (i) A household questionnaire, (ii) A questionnaire for women aged 15 to 49, and (iii) A questionnaire for men aged 15 to 54. They were based on model questionnaires designed for the DHS program, as well as questionnaires used in earlier KDHS surveys and Kenya’s current information needs. During the questionnaire development process, input was sought from relevant stakeholders and data users. Producing county-level estimates necessitated gathering data from a large number of families within each county, resulting in a significant rise in sample size from around 10,000 homes in the 2008–09 KDHS to 40,300 households in 2014 (Kenya National Bureau of Statistics 2015).

A total of 39,679 households were selected in the sample, of which 36,812 were found occupied at the time of the fieldwork. Of these, 36,430 households were successfully interviewed, yielding an overall household response rate of \(99\%\). The shortfall of households occupied was primarily due to structures that were found to be vacant or destroyed and households that were absent for an extended time. Among the households selected using the full questionnaires, a total of 15,317 women were identified as eligible for the full women’s questionnaire, of whom 14,741 were interviewed, generating a response rate of \(96\%\) (Kenya National Bureau of Statistics 2015). A total of 14,217 men were identified as eligible in these households, of whom 12,819 were successfully interviewed, generating a response rate of \(90\%\). For this application, we use only complete cases for our variable of interest giving a total sample of 12,007 men and 14,730 women. County-specific samples sizes for women and men are summarized in Table 1. The women sample sizes are higher than for men because most indicators in the DHS (fertility, maternal mortality rate, infant mortality rate and neonatal mortality rate) relates to children and women.

Table 1 Summary of sample sizes for women and men over counties in the Kenya Demographic and Health Survey 2014

2.1.1 Socio-demographic characteristics

Table 2 are socio-demographic characteristics of respondents in the survey. It included women (aged 15 to 49 years) and men (aged 15 to 54 years.) For comparison purposes, we selected men aged 15 to 49 years. The majority of the respondents are between ages 15 to 34 years. Kenya is composed majorly of a youthful population. According to the 2019 Kenyan census the median age is approximately 20 years. The survey also inquired whether respondents lived in urban or rural areas. Most women \((63\%)\) live in rural areas while most men \((61\%)\) live in urban areas. The KDHS 2014 was planned to give representative estimates for most of the survey indicators at the national levels. Other characteristics include education level (no education, primary, secondary or higher), wealth index (poorer, poorest, middle, richer and richest) and marital status (never married, married, widowed, separated, divorced). The majority of women and men are either never married or married.

Table 2 Socio-demographic characteristics for women and men in the Kenya Demographic and Health Survey 2014

2.1.2 Direct estimation and type of insurance per wealth quantile

The 2014 KDHS asked respondents if they were covered by any health insurance and, if yes, what type. We first estimated health insurance coverage for the whole country (using KDHS only). We used the R package emdi (Kreutzmann et al. 2019) for direct estimation. Table 3 shows the percentage of women and men age 15–49 covered by health insurance at the national level together with the mean squared error and the coefficient of variation. A small percentage of Kenyans aged 15–49 (18% of women and 21% of men) have health insurance. The mean squared error values are very low, 0.000016 and 0.000027 for women and men respectively. Also, the coefficient of variation is 2.1% and 2.4% for women and men groups. Therefore the estimates are reliable (as expected at the design level of the survey).

Table 3 Estimated proportions (direct estimates) with health insurance, mean squared error and coefficient of variation for women and men at the national level in Kenya using the Kenya Demographic and Health Survey 2014

In this paper, we are interested in estimating health insurance coverage at the county level. Table 4 below is a summary of the coefficient of variations(CV’s) for the direct estimates. The CV’s are quite high reaching values of 63% and 67% for women and men, respectively. Model-based SAE methods that borrow strength from other counties of interest are required to increase the accuracy of the estimation.

Table 4 Descriptive statistics of the coefficients of variation for the distribution of health insurance coverage at the county level in Kenya for the direct estimates

Table 5 shows the type of health insurance for each wealth quantile (built based on household asset data (Kenya National Bureau of Statistics 2015)) category for women and men. Among those covered, the national insurance scheme is the most common type for both genders. Employer-based insurance is the next most common type of insurance. This is because employers are obliged by law to provide insurance to their employees. The trend in insurance coverage varies per wealth quintile, with the richer and richest most covered across all insurance types. Ilinca et al. (2019) also found out that there are significant levels of inequality in access to health services in Kenya across the wealth quintile. Mwenda et al. (2021) established that poor households pay more for health care especially for outpatient services. This is because poor households cannot pay or do not have health insurance hence more out of pocket spending. In the study they also noted that the rich also spent more on outpatient care owing to their financial abilities.

Table 5 Percentages for each type of health insurance per socio-economic quintiles in Kenya from Kenya Demographic and Health Survey 2014

2.2 The Kenya Population and Housing Census

For model-based SAE we need supplemental data collected from all areas. We had access to the Kenya Population and Housing Census (KPHC) 2009 in this case. Kenya has consistently conducted a census every ten years, i.e. 1969, 1979, and so on, with the most recent being in 2019. Under Kenyan legislation, the KNBS is the primary government body in charge of collecting, processing, and disseminating census and other statistical data. Statistics are needed to track the progress of numerous development goals and worldwide initiatives, such as the SDG’s. The main goal of the KPHC 2009 was to offer essential information on the population’s demographic, social, and economic features, as well as housing. These include population size and composition, fertility, mortality and migration rates, levels of education, labour force size, and so on. The data for this census was taken using scanning technology, with technical help from the United States Census Bureau (USCB) (Government of Kenya 2010). This census was conducted based on old administrative areas, such as villages, sub-locations, locations, divisions, districts, and provinces. Kenya had 46 legal districts, minus Nairobi, the capital city, which was the 47th district. After 2010, these districts were changed to the present 47 counties with no changes in borders (Government of Kenya 2013). As a result, we can connect the survey and census data. The census data serve as potential covariates in the small area model described in Sect. 3.4. Table 6 is a summary of population sizes at the county level in Kenya. The census is the \(10\%\) sample, i.e. every 10th household of the whole data set is released by the KNBS (Government of Kenya 2010).

Table 6 Summary of population sizes in Kenya Population and Housing Census 2009 at the county level in Kenya

3 Statistical Methodology

In this section, we outline the methodology applied in this paper. To begin with we describe the direct estimation using the survey data only. We then introduce M‑quantile regression differentiating it from standard mean regression. Next, we give the general small area estimation setting. Thereafter, we discuss the M‑quantile small area model together with the point and mean squared error estimation.

3.1 Direct estimation

The Horvitz-Thompson (HT) estimator of Horvitz and Thompson (1952) is used to estimate the population proportion \(\bar{Y_{i}}\) for area \(i\), \(i=1,2,{\ldots},m\), where \(m\) is the total number of areas of the whole population, from a complex sampling design. Using this estimator, the direct estimator of the target proportion for area \(i\) based on sample data is defined as \(\hat{\bar{y}}_{i}^{\text{{\;dir}}}=\frac{1}{N_{i}}\sum_{j=1}^{n_{i}}w_{ij}\,y_{ij},\,i=1,2,{\ldots},m\), where \(\hat{\bar{y}}_{i}^{\text{{\;dir}}}\) is the direct proportion estimator for area \(i\), \(N_{i}\) is the population size in area \(i\), \(y_{ij}\) is the response of individual/household \(j\) in area \(i\) and \(w_{ij}\) are sampling weights – inverse of first order inclusion probabilities. The weights compensate for unequal probabilities of sampling and unit non-response. The HT-estimator for population proportions is design-unbiased (Särndal et al. 2003). However, the variance reaches high values for areas with small sample sizes. For KDHS 2014, all counties were sampled, although sample sizes in some counties are not sufficient to provide reliable direct estimates as seen from the high values of coefficient of variation (beyond 20% using guidelines set by the UK Office for National Statistics [ONS]).

3.2 Small area estimation

In SAE we assume the following idealized setting: There is a finite population \(U\) of size \(N\) which is divided into \(m\) disjoint areas of sizes \(N_{1},N_{2},{\ldots},N_{m}\) where \(i=1,2,{\ldots},m\) is the \(i\text{th}\) small area. A sample of size \(n\) is taken from this population using a complex sampling design with sample sizes \(n_{1},n_{2},{\ldots},n_{m}\) for each area \(i\). The sampled and non-sampled units will be denoted by \(s\) and \(r\) respectively. Let \(y_{ij}\) be the response variable of interest of individual/household \(j\) in area \(i\) and has been observed for sampled units only; \(\mathbf{x}_{ij}\) denote a \(p\times 1\) vector of unit level covariates with intercept. In general it is assumed that the values of \(\mathbf{x}_{ij}\) are known for all units in the population, as are the values \(\mathbf{z}_{i}\) of a \(q\times 1\) vector of area level covariates. We are interested in using sample values of \(y_{ij}\) and the population values of \(\mathrm{x}_{ij}\) and \(\mathrm{z}_{i}\) to estimate the small area \(i\) proportion of health insurance coverage given by \({\bar{Y}}_{i}=N_{i}^{-1}\sum_{j\in U_{i}}y_{ij}\).

3.3 M-quantile regression

The standard linear regression summaries the average relationship between a continuous response \(y_{i}\) given explanatory variables \(x_{i}\) i.e. \(E[y_{i}|x_{i}]\) where \(i=1,2,{\ldots},n\) is the number of observations. This does not give a complete picture of the conditional distribution of the response variable given the explanatory variables, and we might be interested in other parts of this distribution for example the \(10\text{th}\) percentile. In the same manner, a relationship between the response and the explanatory variables can also be established using the conditional median function instead. The quantile \(q\in(0,1)\) is that \(y\) which splits the data into proportions \(q\) below and \((1-q)\) above such that \(F\left(y_{q}\right)=q\) and \(y_{q}=F^{-1}(q)\). The median has \(q=0.5\). Whereas the mean regression minimizes the squared error, a regression model based on the median (or median regression) minimizes the least absolute deviation (LAD). Median regression is also more robust to outliers than mean regression and no parametric assumption is required. To generalize the mean and median regression, we discuss the expectile and quantile regression. Expectile regression (Newey and Powell 1987) generalizes the mean regression to estimate the expectiles while quantile regression generalizes the median regression to estimate other parts of the conditional distribution (quantiles) of \(y\) given \(x\) (Koenker and Bassett Jr 1978; Koenker and Hallock 2001). M‑quantile regression (Breckling and Chambers 1988) estimates the conditional distribution lying between the quantiles and expectiles. It is an extension of M‑estimation of Huber (1992). The M‑quantile of order \(q\) of a continuous random variable \(y\) with distribution function \(F(y)\) is the value \(Q_{q}\) that satisfies

$$\int\psi_{q}\left(\frac{y-Q_{q}}{\sigma_{q}}\right)dF(y)=0,$$
(1)

where \(\psi_{q}(t)=2\psi(t)\{qI(t> 0)+(1-q)I(t\leqslant 0)\}\), \(\psi\) is an influence function defined by the user and \(\sigma_{q}\) is an appropriate scale measure for the random variable \(Y-Q_{q}\). According to Chambers and Tzavidis (2006) when the response variable \(y\) is binary, there is no obvious definition of a quantile function as in the continuous case in 1. But, given that the influence function \(\psi\) is continuous and monotone non-decreasing, the M‑quantiles of a binary variable exist and are unique. In that case we are interested in predicting \(P(y=1)=p\) which means that 1 becomes

$$pq\psi\left(\frac{1-Q_{q}}{\sigma_{q}}\right)=(1-p)(1-q)\psi\left(\frac{Q_{q}}{\sigma_{q}}\right).$$
(2)

Since \(y\) is binary, following Chambers and Tzavidis (2006), we impose a linear logistic function as

$$Q_{q}\left(\mathbf{x}_{j};\psi\right)=\exp\left(\mathbf{x}_{j}^{\mathrm{T}}\boldsymbol{\beta}_{q}\right)\left\{1+\exp\left(\mathbf{x}_{j}^{\mathrm{T}}\boldsymbol{\beta}_{q}\right)\right\}^{-1},$$
(3)

where \(\boldsymbol{\beta}_{q}\) are regression coefficients estimated using a robust maximum likelihood estimating equations following (Cantoni and Ronchetti 2001).

3.4 M-quantile small area model

Small area estimation mostly uses random-area effects to characterize between area variations beyond that explained by auxiliary variables in the model (Rao and Molina 2015). However, mixed effect models depend on distributional assumptions (for example, the assumption of normally distributed residuals). Further, it requires the specification of the random part of the model. An alternative approach to mixed effect modeling is the use of M‑quantile models in SAE. The M‑quantile model for SAE was proposed by (Chambers and Tzavidis 2006). They model the between-area heterogeneity using M‑quantile coefficients. In this case, the population model is specified and fitted at unit level without specifying any small area geography. First define \(q_{ij}\) such that \(y_{ij}=Q_{q_{ij}}\left(\mathbf{x}_{ij};\psi\right)\), i.e. \(q_{ij}\) is a random index that varies between 0 and 1. Since the response variable is binary, we specify a linear logistic function, where the population M‑quantile model for \(q_{ij}\) (and hence \(y_{ij}\)) is then defined by

$$Q_{q_{ij}}\left(\mathbf{x}_{ij};\psi\right)=\exp\left(\mathbf{x}_{ij}^{\mathrm{T}}\boldsymbol{\beta}_{q_{ij}}\right)\left\{1+\exp\left(\mathbf{x}_{ij}^{\mathrm{T}}\boldsymbol{\beta}_{q_{ij}}\right)\right\}^{-1}.$$
(4)

Chambers and Tzavidis (2006) called \(q_{ij}\) the M‑quantile coefficients. The M‑quantile coefficient for area \(i\) is given by; \(\theta_{i}=E\left[q_{ij}\mid i\right]\), where the expectation is conditional on the distribution of the random indices \(q_{ij}\) within area \(i\).

3.4.1 Point estimation

To estimate the population proportion we proceed as follows. We first note that the empirical value \(\hat{q}_{ij}\) of the random index \(q_{ij}\) is the solution to \(y_{ij}=\hat{Q}_{\hat{q}_{ij}}\left(\mathbf{x}_{ij};\psi\right)\) and this value is referred to as the estimated M‑quantile coefficient of \(y_{ij}\) (Chambers and Tzavidis 2006).

  1. 1.

    Obtain sample observations in area \(i\) using a non-informative sampling method (for example a two-stage cluster sampling design).

  2. 2.

    Derive the Estimate \(\hat{\theta}_{i}\) of the area \(i\) specific M‑quantile coefficient \(\theta_{i}\) as the sample average of the estimated M‑quantile coefficients for that area; otherwise it is set \(\hat{\theta}_{i}=0.5\).

  3. 3.

    Compute the corresponding M‑quantile predictor of the average \(\bar{y}_{i}\) in small area \(i\) as \(\hat{\bar{y}}_{i}^{MQ}=N_{i}^{-1}\left\{\sum_{j\in s_{i}}y_{ij}+\sum_{j\in r_{i}}\hat{Q}_{\hat{\theta}_{i}}\left(\mathbf{x}_{ij};\psi\right)\right\}\). If \(y\) is binary, model the regression M‑quantiles of order \(q\) by 4.

3.4.2 Estimating the mean squared error

In this study, we estimate the mean squared error (MSE) of the proposed estimator using the approach by Chambers et al. (2014) based on the linearisation approach. In essence, the assumption is that the working model used in concluding influences the obtained values from the area under study. Therefore, the MSE of interest is relied upon it and is equal to a conditional prediction variance plus a squared conditional prediction bias. Further, the estimated area level M‑quantile coefficient values are assumed as having some slight variations and can be considered as fixed. According to Chambers et al. (2016), a first order approximation to the conditional prediction variance of \(\hat{\bar{y}}_{i}^{MQ}\) is then

$$\operatorname{Var}\left(\hat{\bar{y}}_{i}^{MQ}-\bar{y}_{i}\mid\theta_{i}\right)=N_{i}^{-2}\left\{\operatorname{Var}\left[\sum_{j\in r_{i}}\hat{Q}_{\theta_{i}}\left(\mathrm{x}_{j};\psi\right)\right]+\sum_{j\in r_{i}}\operatorname{Var}\left(y_{j}\right)\right\}\approx N_{i}^{-2}\left\{\left[\sum_{j\in r_{i}}Q_{\theta_{i}}\left(\mathrm{x}_{j};\psi\right)\mathbf{x}_{j}^{T}\right]\operatorname{Var}\left(\hat{\boldsymbol{\beta}}_{\theta_{i}}\right)\left[\sum_{j\in r_{i}}Q_{\theta_{i}}\left(\mathrm{x}_{j};\psi\right)\mathbf{x}_{j}^{T}\right]^{T}\right.\left.+\sum_{j\in r_{i}}\operatorname{Var}\left(y_{j}\right)\right\},$$
(5)

which can be estimated by

$$\widehat{\operatorname{Var}}\left(\hat{\bar{y}}_{i}^{MQ}\right)=N_{i}^{-2}\left\{\left[\sum_{j\in r_{i}}\hat{Q}_{\hat{\theta}_{i}}\left(\mathrm{x}_{j};\psi\right)\mathbf{x}_{j}^{T}\right]\widehat{\operatorname{Var}}\left(\hat{\boldsymbol{\beta}}_{\hat{\theta}_{i}}\right)\left[\sum_{j\in r_{i}}\hat{Q}_{\hat{\theta}_{i}}\left(\mathrm{x}_{j};\psi\right)\mathbf{x}_{j}^{T}\right]^{T}\right.\left.+\sum_{j\in r_{i}}\widehat{\operatorname{Var}}\left(y_{j}\right)\right\}.$$
(6)

In this case \(\widehat{\operatorname{Var}}\left(\hat{\boldsymbol{\beta}}_{\hat{\theta}_{i}}\right)\) is a sandwich-type estimator. The \(\widehat{\operatorname{Var}}\left(y_{j}\right)\) can be calculated either by using the sample data from area \(i,\widehat{\operatorname{Var}}\left(y_{j}\right)=\hat{\bar{y}}_{i}\left(1-\hat{\bar{y}}_{i}\right)\), or by pooling data from the entire sample, in which case \(\widehat{\operatorname{Var}}\left(y_{j}\right)=\hat{\bar{y}}(1-\hat{\bar{y}})\). According to Chambers et al. (2016) the pooled estimator should lead to more stable prediction variance estimates when area sample sizes are very small and the conditional prediction bias can be approximated using the results of Copas (1988) as

$$E\left(\hat{\bar{y}}_{i}^{MQ}-\bar{y}_{i}\mid\theta_{i}\right)\approx-\frac{1}{2N}\left\{\frac{\partial}{\partial\boldsymbol{\beta}_{\theta_{i}}}\Psi\left(\boldsymbol{\beta}_{\theta_{i}}\right)\right\}^{-1}\left\{\operatorname{tr}\left[\left\{\frac{\partial}{\partial\boldsymbol{\beta}_{\theta_{i}}\partial\boldsymbol{\beta}_{\theta_{i}}^{T}}\Psi\left(\boldsymbol{\beta}_{\theta_{i}}\right)\right\}\operatorname{Var}\left(\hat{\boldsymbol{\beta}}_{\theta_{i}}\right)\right]\right\}\left\{\frac{\partial}{\partial\boldsymbol{\beta}_{\theta}}\sum_{j\in r_{i}}Q_{\theta_{i}}\left(\mathrm{x}_{j};\psi\right)\right\},$$
(7)

with corresponding plug-in estimator

$$\widehat{\operatorname{Bias}}\left(\hat{\bar{y}}_{i}^{MQ}\right)=-\frac{1}{2N}\left\{\left.\frac{\partial}{\partial\boldsymbol{\beta}_{\theta_{i}}}\Psi\left(\boldsymbol{\beta}_{\theta_{i}}\right)\right|_{\boldsymbol{\beta}_{i}=\hat{\boldsymbol{\beta}}_{\hat{\theta}_{i}}}\right\}^{-1}\left\{\operatorname{tr}\left[\left\{\left.\frac{\partial}{\partial\boldsymbol{\beta}_{\theta_{i}}\partial\boldsymbol{\beta}_{\theta_{i}}^{T}}\Psi\left(\boldsymbol{\beta}_{\theta_{i}}\right)\right|_{\boldsymbol{\beta}_{\theta_{i}}=\hat{\boldsymbol{\beta}}_{\hat{\theta}}}\right\}\widehat{\operatorname{Var}}\left(\hat{\boldsymbol{\beta}}_{\hat{\theta}_{i}}\right)\right]\right\}\left\{\left.\frac{\partial}{\partial\boldsymbol{\beta}_{\theta_{i}}}\sum_{j\in r_{i}}Q_{\theta_{i}}\left(\mathbf{x}_{j};\psi\right)\right|_{\boldsymbol{\beta}_{\theta_{i}}=\hat{\boldsymbol{\beta}}_{\hat{\theta}_{i}}}\right\}.$$
(8)

The estimator of the conditional mean squared error of \(\hat{\bar{y}}_{i}^{MQ}\) is then

$$\operatorname{Mse}\left(\hat{\bar{y}}_{i}^{MQ}\right)=\widehat{\operatorname{Var}}\left(\hat{\hat{y}}_{i}^{MQ}\right)+\left\{\widehat{\operatorname{Bias}}\left(\hat{\hat{y}}_{i}^{MQ}\right)\right\}^{2}.$$
(9)

4 Results

In this section we present the results of estimating health insurance coverage in Kenya at the county level. The respondents were asked the question; Are you covered by any health insurance? Therefore the response is a binary variable coded as 0 (No) and 1 (Yes). We first fitted a binary logistic generalized linear mixed model (GLMM) and show that the assumption of normality of random effects is not met.

4.1 Initial analysis using binary logistic GLMM

To begin with we first fitted a binary logistic GLMM with normally distributed random effects using the function glmer in R package lme4 (Bates et al. 2015). Plots a and b from Fig. 1 represent the QQ plots for Pearson residuals obtained from fitting a logistic GLMM for women and men respectively. Within the same fitted model, the random effects for women and men were obtained are also displayed in plots c and d respectively. The Pearson residuals are not normally distributed. Although the random effects show normality especially for the men data, there is a slight departure from the tails. The Shapiro-Wilk normality test using significance level of 0.05 does not reject the null hypothesis that the random effects are normally distributed for men (\(p\text{-value}=0.591\)) but it does for women (\(p\text{-value}=0.00473\)).

Fig. 1
figure 1

QQ-plots for Pearson residuals for a women b men and random effects for c women d men of health insurance coverage at the county level in Kenya

Table 7 shows the estimated model parameters, standard errors and corresponding p‑values for women and men. The fixed effects are age (15–49) years, relationship to household head (1 = head, 2 = spouse, 3 = others), employment status (0 = unemployed, 1 = employed), education level (1 = completed primary, 2 = secondary school and above, 3 = no formal schooling), residence (0 = rural, 1 = urban), marital status (1 = never married, 2 = married, 3 = widowed, 4 = divorced, 5 = seperated), region (1 = Coast, 2 = North Eastern, 3 = Eastern, 4 = Central, 5 = Rift Valley, 6 = Western, 7 = Nyanza, 8 = Nairobi). The regression coefficient of age has a positive sign (for both women and men), implying age increases the probability of access to health insurance. The table also shows the variance component for the random part of the model. To test whether the variance components are significant to measure unobserved heterogeneity we use the Likelihood Ratio Test (LRT). For women, the test statistic is 1,087.248, with a p‑value of 0.0000, and for men equals 1,111.015, \(\textit{p-value}=0.0000\). Therefore we reject the null hypothesis of no significance and conclude there is evidence of significant unobserved heterogeneity.

Table 7 Estimated fixed effects coefficients, variance components for the random effect and likelihood ratio test from fitting a generalized linear mixed model to health insurance data from Kenya

4.2 Binary M-quantile modeling

The diagnostic plots in Sect. 4.1 show that the model assumptions of GLMM are not met. According to Chambers et al. (2016), GLMM’s have attractive properties that can be used to model binary response variables. However, using GLMM’s in SAE is not straightforward since the estimation of model parameters can be numerically demanding. Apart from computational complexity for using GLMM in small area estimation, if model are not met, invalid conclusions could be obtained. To reduce the adverse effects from deviations from distributional assumptions (provide robust inference), while at the same time borrowing strength from domains, we explore the use of M‑quantile small area estimation model. Robust in this case means the estimator is reasonably efficient and unbiased, small deviations from model assumptions do not substantially affect the model performance and large deviations will not totally invalidate the model entirely. Since SAE via M‑quantile uses M‑quantile coefficients as opposed to random effects in GLMM, we find the correlation between the predicted area effects and area-level M‑quantile coefficients. This was suggested by Chambers et al. (2016). The correlation equals 0.769 for women model and 0.77 for men. Fig. 2 visualizes the scatter plots between the predicted random effects from mixed model and M‑quantile model.

Fig. 2
figure 2

Scatter plots for the predicted random effects (estimated with GLMM) and the M‑quantile coefficients (estimated with the M‑quantile model) for a women b men at the county level in Kenya

There is a high correlation between the area level M‑quantile coefficients and the predicted area effects from GLMM to capture area variability. However, according to Chambers et al. (2016) M‑quantiles provides an alternative SAE method when GLMM assumptions are not met. Table 8 shows quantiles of the point estimates of the proportion of persons with health insurance at the county level in Kenya. On average the mean and median for both women and men for direct and MQ estimates are comparable. A higher proportion of men are covered with health insurance compared to women. Overall, these proportions are quite low despite the efforts put by the government. This finding implies the government should explore other better options to increase coverage.

Table 8 Distribution of health insurance coverage proportions over counties in Kenya for women and men aged 15–49 years.

Fig. 3 shows smooth maps of health insurance coverage for the 47 counties in Kenya using M‑quantile estimation. The observed distribution is similar for women and men. Counties with highest coverage for women are Bomet \((24\%)\), Embu \((23.6\%)\), Kirinyaga \((22.8\%)\) and Nairobi \((22\%)\). The five least covered counties are West Pokot \((10.9\%)\), Turkana \((10.8\%)\), Garissa \((10.7\%)\) and Marsabit \((10.3\%)\). For men the leading counties are; Nairobi \((27.2\%)\), Bomet \((24.8\%)\), Nyeri \((23.4\%)\), Nyamira \((22.2\%)\) while the least covered counties are; Mandera \((11\%)\), Tana River \((10.6\%)\), Garissa \((10.6\%)\) and Kwale \((10.4\%)\). From the findings, we note that counties neighboring Nairobi have higher coverage rates. Since these counties are close to Nairobi which is the capital city of Kenya, with more employment opportunities, people living here are able to afford health insurance premiums. This is in contrast to counties further away like Turkana and Garissa. These results have been possible with the use of SAE methodology.

Fig. 3
figure 3

Maps showing the proportion covered with health insurance for a women and b men for the 47 counties in Kenya

4.3 Evaluation of the M-quantile SAE model estimates

We evaluate the model-based results based on three criteria: (i) smaller MSE and CV for MQ compared to direct estimates, where the MSE is the sum of the variance and bias squared of the estimator, while the CV measures the dispersion of the estimates around the mean. (ii) consistency and (iii) usefulness to users. This has been proposed by Brown et al. (2001). The same approach has been used by Chandra et al. (2018) when estimating poverty incidence in the state of Bihar in India.

For MSE, Fig. 4a and b are the box plots of estimated MSEs of the estimated health insurance coverage for women and men. The MSE for both women and men are smaller for MQ compared to direct estimates. For CV’s, Table 9 shows quantiles of the coefficient of variation for the estimated health insurance percentages at county level in Kenya. For direct estimates especially for counties with small samples, the CV’s reach values greater than \(60\%\) for both women and men. For MQ estimates all the CV’s were less than 20. Since Kenya has not set a guideline for publishing official statistics, we use other statistical offices like the UK’s Office for National Statistics (ONS). They set a CV of \(20\%\) for publishable official statistics. Therefore, our estimates meet this cut-off.

Fig. 4
figure 4

Box-plots showing the mean squared error for the distribution of health insurance coverage percentages at the county in Kenya for a women b men

Table 9 Quantiles of the coefficient of variation for the estimated health insurance percentages at county level in Kenya

Fig. 5 are line graphs with counties ordered by increasing sample sizes. To start with as expected the CV’s for MQ estimates are smaller than direct estimates for all counties. As the sample sizes per county increase, the CV’s for direct estimates reduce, especially for the women sample. By contrast, however, the CVs for direct estimates do not reduce with increasing sample size for men. For this analysis, the samples ranged from 236 to 460 for women and from 118 to 370 for men.

Fig. 5
figure 5

Line graphs showing coefficient of variation of health insurance coverage with counties ordered by increasing sample sizes for: a women b men

For bias-diagnostics, Fig. 6 is a scatter plot comparing direct estimates of the proportion of persons covered with health insurance and corresponding M‑quantile estimates. According to Brown et al. (2001) the plots is based on the idea that, if the model-based estimates are “close” to the model-based SAE values of interest, then unbiased direct estimators should behave like random variables whose expected values correspond to the values of the model-based estimates. That is, the model-based estimates should be unbiased predictors of the direct estimates. To check this, we plot appropriately scaled values of these estimates (x-axis) against similarly scaled direct estimates (y-axis) and then test whether the OLS (ordinary least squares) regression line fitted to these points is significantly different from the identity line. To check homoscedasticity assumption required for OLS fitting, we ran the Goldfeld-Quandt test. Under the null hypothesis, the Goldfeld-Quandt test statistic follows an \(F\) distribution with degrees of freedom as specified in the parameter. For men (\(\text{GQ}=0.39191\), \(\text{df1}=22\), \(\text{df2}=21\), \(p\text{-value}=0.983\)), and women (\(\text{GQ}=0.44733\), \(\text{df1}=22\), \(\text{df2}=21\), \(p\text{-value}=0.9662\)) we do not reject the null hypothesis and conclude heteroscedasticity is not present. Therefore, the homoscedasticity assumption is satisfied. We note that the M‑quantile estimates are generally consistent with the direct estimates. Even though the model-based results depict some bias, the aggregated results in Table 10 are close.

Table 10 Aggregated direct and MQ estimates at the national level
Fig. 6
figure 6

Scatter plots with regression and intersection line (\(y=x\)) comparing direct and M‑quantile health insurance coverage estimates for a women b men at the county level in Kenya

For usefulness to users, we still adopt the criteria proposed by Brown et al. (2001). Accordingly, in SAE applications aggregation or bench marking of small area estimates at higher level is always desirable by the users. In small area applications, National statistical offices involved in generating the small area estimates always expect that the small area estimates are aggregated/bench marked to higher level estimate. At higher level of aggregation, the direct estimates are considered to be reliable and therefore the model-based small area estimates are expected to be near to the direct estimates when they are aggregated. We checked the aggregation of model-based small area estimates at the county level. We computed national level insurance coverage by aggregating the direct estimates and MQ small area estimates, as \(\sum_{i}(n_{i}.\text{Direct}_{i})/\sum_{i}n_{i}\) and \(\sum_{i}(n_{i}.MQ_{i})/\sum_{i}n_{i}\), respectively. Table 10 shows the aggregated estimates. The MQ estimates aggregate well to national level direct estimate.

5 Concluding remarks

In conclusion, health insurance reduces health care costs by pooling resources. It is an important component towards achieving UHC. Assessing health insurance coverage for policymaking requires reliable data especially at disaggregated levels. In this study, we have combined survey and census data through an M‑quantile model to get better estimates when sample sizes are small. This has the advantage that we avoid specifying random effects while providing robust inference against deviations from model assumptions. Findings show model-based estimates have smaller MSE’s and CV’s than direct estimates. Health insurance coverage remains low overall in Kenyan counties. Among those covered, our findings show inequality in health insurance coverage across the wealth quintile with the highest coverage being the richer and richest, especially for private insurance which requires monthly contributions. Health insurance in Kenya is mostly voluntary except for public and civil servants. The majority of Kenyans also work in the informal sector where health insurance is not compulsory. With the current voluntary health insurance scheme, health insurance coverage remains low. Kenya should establish a mechanism mainly funded by taxation to extend prepaid coverage to its population. Despite financial constraints, Kenya should provide total subsidy to the poor through NHIF. Further, Kenya should give a partial insurance subsidy, through the NHIF to people within the informal sector. Two possible directions for further research are a) to allow for more disaggreaged domains like the sub-county level and b) to incorporate additional predictors of health insurance coverage like geospatial data. A limitation of this study is the time difference between the survey and census data. Data collected around the same time might yield to more accurate results. Despite the limitation, this study has estimated the health insurance coverage at the county level in Kenya with better precision compared to direct estimates. It has been possible to establish the variation in health insurance coverage between counties, noting that counties neighboring Nairobi have more proportions of persons with health insurance.