Spatial Modeling of Cardiovascular Disease in Kenya

Grace Wanjiku Mwangi; Anthony Kibira Wanjoya

doi:10.11648/j.ajtas.20261502.14

Research Article | Peer-Reviewed

Published in American Journal of Theoretical and Applied Statistics (Volume 15, Issue 2)

Received: 20 March 2026 Accepted: 3 April 2026 Published: 16 April 2026

Abstract

The study conducts an assessment of the spatial distribution of cardiovascular diseases (CVD) in Kenya by integrating spatial modeling techniques and spatial autocorrelation measures. CVDs, which refer to disorders of the heart and blood vessels, have surpassed communicable diseases as the leading cause of morbidity and mortality worldwide, posing a critical public health concern, especially in low- and middle-income countries (LMICs) where resources remain limited. A growing body of global evidence has revealed marked geographical disparities in CVD incidence, prompting investigations into small-area spatial distribution patterns. This study employed both global and local spatial autocorrelation measures to analyze CVD prevalence across Kenyan counties. The Global Moran’s I statistic was used to assess the overall degree of spatial clustering, while the Local Moran’s I identified significant clusters of high and low prevalence, alongside spatial outliers. Additionally, the Getis-Ord Gi* statistic was applied to detect statistically significant hotspots and coldspots, revealing important spatial patterns in disease prevalence. Spatial regression models were compared using the Lagrange Multiplier (LM) test, Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC) for model selection. The Spatial Lag Model (SLM) demonstrated superior performance over the Spatial Error Model (SEM) and the Spatial Durbin Model (SDM), achieving a Rao’s score (RSlag) of 16.449 and an adjusted score (adjRSlag) of 12.181, both statistically significant at the 5% level. The SLM also recorded the lowest AIC and BIC values at -380.09 and -361.80, respectively, confirming its suitability in capturing spatial dependence in the data. The findings revealed significant spatial clustering of CVD prevalence, with distinct high-risk and low-risk regions across the country. 
High body mass index (HBMI), tobacco use, and poor dietary habits emerged as major risk factors driving CVD prevalence, while urbanization and economic development were associated with lower disease burdens. The study highlights the importance of incorporating spatial analysis in public health planning to inform targeted interventions, optimize resource allocation, and enhance community health education campaigns aimed at promoting heart-healthy lifestyles.

Page(s)	59-71
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2026. Published by Science Publishing Group

Keywords

Cardiovascular Diseases (CVDs), Global Moran’s I, Local Moran’s I, Spatial Lag Model (SLM), Spatial Error Model (SEM), Spatial Durbin Model (SDM), Getis-Ord Gi* Statistic

1. Introduction

Cardiovascular diseases (CVDs) refer to disorders of the heart and blood vessels, including ischemic heart disease, stroke, peripheral arterial disease, rheumatic heart disease, congenital heart disease, deep vein thrombosis, and pulmonary embolism [1]. These diseases are a leading cause of morbidity and mortality globally, contributing to approximately 17.8 million deaths annually, with nearly 80% of these deaths occurring in low- and middle-income countries (LMICs), including Kenya [2]. In Kenya, CVDs are increasingly recognized as a significant public health challenge, rapidly overshadowing the infectious diseases that once dominated the health landscape, contributing to premature deaths and straining an already overburdened healthcare system.

Multiple risk factors for non-communicable diseases have contributed to the rising prevalence of cardiovascular disease (CVD) in Kenya. The 2015 STEPwise survey for noncommunicable disease risk factors in Kenya highlighted widespread exposure to CVD risk factors, including elevated blood pressure, tobacco use, physical inactivity, unhealthy dietary habits, and excess body weight [3]. These factors collectively fuel the growing burden of CVD, placing significant pressure on the country’s healthcare system and economy.

Despite national efforts through NCD strategic plans, the scarcity of studies examining the spatial patterns and disparities of CVD in Kenya has contributed to the lack of a national strategy that prioritizes CVDs [4, 25, 26], limiting the ability to guide targeted interventions and allocate resources effectively. Advancements in spatial analysis techniques offer opportunities to uncover geographic patterns of disease distribution. Spatial regression and hotspot analysis methods can detect spatial autocorrelation, identify disease clusters, and model the relationships in CVD prevalence across counties.

A review of existing literature highlights the significance of this study in integrating non-spatial epidemiological approaches with advanced spatial modelling techniques in the analysis of CVDs [15-18, 20-24]. Although numerous studies have employed conventional statistical methods to examine CVD prevalence and associated risk factors, the application of spatial autoregressive models and hotspot analysis remains comparatively limited, especially in low- and middle-income countries.

This study addresses this methodological gap by identifying spatial dependencies, quantifying regional disparities, and detecting spatial clustering patterns in CVD distribution in Kenya. The findings offer valuable insights for public health policy and planning, providing a spatially explicit, data-driven framework to reduce morbidity and mortality, optimize the targeting of health interventions, and inform the equitable allocation of healthcare resources for reducing the burden of CVD.

2. Methodology

2.1. Variables and Data Source

Kenya, the study area, is made up of 47 counties, each characterized by diverse demographic, economic, and environmental conditions that shape health outcomes, particularly with regard to cardiovascular disease (CVD).

This study employs spatial models to analyze secondary data obtained from the Global Burden of Disease (GBD) 2021 dataset [27] and the 2019 national census data [28]. The GBD 2021 dataset offers detailed information on the prevalence of cardiovascular diseases (CVDs) and associated risk factors at both national and county levels. The county-level prevalence rate of cardiovascular disease is the dependent variable of this study. Specific risk factors considered include high body mass index (HBMI), high alcohol use, tobacco use, and dietary risks. The 2019 national census data provides essential demographic and socioeconomic variables, enriching the spatial analysis by offering critical context for adjusting the spatial distribution of CVD burden. The demographic variables are the urbanization rate and the population density per km². The socioeconomic variable is the gross county product (GCP).

2.2. Spatial Autocorrelation

Spatial autocorrelation describes the extent to which a variable is correlated with itself through space [5]. A spatial autocorrelation statistic captures both attribute similarity and locational similarity. The presence of spatial autocorrelation can be assessed using the Global Moran’s I index and Geary’s C index. Both indices summarize the degree to which similar observations tend to occur near each other over the study area. Moran’s I is produced by standardizing the spatial autocovariance by the variance of the data, while Geary’s C uses the sum of the squared differences between pairs of data values as its measure of covariation. Both depend on a spatial structural specification such as a spatial weights matrix or a distance-decay function.

2.2.1. Spatial Weights Matrix

A spatial weights matrix W indicates the strength of the potential interaction between spatial areal units [6]. In this study, the spatial structure of units is defined by spatial contiguity, represented through an n × n spatial weights matrix W with binary values.

The binary spatial weights matrix W is defined as:

w_{ij} = \begin{cases} 1 & \text{if area } i \text{ and area } j \text{ are contiguous} \\ 0 & \text{otherwise} \end{cases}

(1)

The row-standardized weights matrix W^{*} is given by:

w_{ij}^{*} = \frac{w_{ij}}{\sum_{j \in J_{i}} w_{ij}}

(2)

where J_i is the set of areal units contiguous with i.

The row-standardized matrix satisfies:

\sum_{j \in J_{i}} w_{ij}^{*} = 1 \quad \text{for all } i

(3)

The values of w_ij, the weights for each pair of locations, are assigned by preset rules that define the spatial relations among locations and therefore determine the spatial autocorrelation statistics. By convention, w_ii = 0 for the diagonal elements [7, 8].

Neighbors can be defined by contiguity in different ways: in Rook contiguity, two locations are considered neighbors if they share a common border or side, while in Queen contiguity, any region that touches either the boundary or a single point of region i is considered a neighbor.
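As an illustration of Equations (1) to (3), the following minimal sketch row-standardizes a binary contiguity matrix. The four-region chain and the function name `row_standardize` are hypothetical and are not taken from the study's data or code.

```python
def row_standardize(w):
    """Divide each row of a binary contiguity matrix by its row sum (Eq. 2)."""
    standardized = []
    for row in w:
        total = sum(row)
        # Assumption: a region with no neighbours keeps an all-zero row.
        # The source does not say how isolated units are handled.
        standardized.append([v / total if total else 0.0 for v in row])
    return standardized


# Hypothetical chain of four regions: 0-1, 1-2 and 2-3 share a border.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
W_star = row_standardize(W)

for row in W_star:
    print(row)
```

Each row of `W_star` sums to one, which is the property stated in Equation (3). An interior region with two neighbours gives each of them a weight of 0.5.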

2.2.2. Global Moran’s I

The study employed the Global Moran’s I index to assess the presence of spatial autocorrelation in the overall data. The index quantifies how similar each region is to its neighbors and averages these assessments over all regions.

The Global Moran’s I takes the form:

I = \frac{n \sum_{i} \sum_{j} w_{ij} (Y_{i} - \bar{Y}) (Y_{j} - \bar{Y})}{(\sum_{i \neq j} w_{ij}) \sum_{i} (Y_{i} - \bar{Y})^{2}}

(4)

where n is the number of regions, Y_i is the observed value of the variable of interest in region i, Y_j is the observed value of the variable of interest in region j, and Ȳ is the mean of all observed values. The w_ij are spatial weights that denote the spatial proximity between regions i and j, with w_ii = 0 and i, j = 1, ..., n.

Moran’s I values usually range from -1 to 1. Under the null hypothesis of no spatial autocorrelation, the expected value is E[I] = -1/(n - 1), which approaches zero as n grows. Values significantly above E[I] indicate positive spatial autocorrelation or clustering, which occurs when neighboring regions tend to have similar values. Values significantly below E[I] indicate negative spatial autocorrelation or dispersion, which occurs when regions that are close to one another tend to have different values. Values around E[I] indicate randomness, that is, the absence of a spatial pattern.

When the number of regions is sufficiently large, I is approximately normally distributed, so we can assess whether a given pattern deviates significantly from a random pattern by comparing the z-score to the standard normal distribution:

z = \frac{I - E(I)}{\sqrt{\operatorname{Var}(I)}}

(5)

An alternative approach to judge significance is Monte Carlo randomization. This method creates random patterns by reassigning the observed values among the areas and calculates the Moran’s I for each of the patterns, providing a randomization distribution for the Moran’s I. If the observed value of Moran’s I lies in the tails of this distribution, the assumption of independence among observations is rejected.
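The Global Moran’s I of Equation (4) and the Monte Carlo randomization test described above can be sketched as follows. This is a minimal illustration on a hypothetical six-region chain, not the study's implementation; the function names, the one-sided pseudo p-value, and the choice of 999 permutations are assumptions.

```python
import random


def morans_i(y, w):
    """Global Moran's I as in Eq. (4); w is a square list of lists with w[i][i] == 0."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    s0 = sum(sum(row) for row in w)  # sum of all weights
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return n * cross / (s0 * sum(d * d for d in dev))


def permutation_p_value(y, w, n_perm=999, seed=0):
    """One-sided pseudo p-value: share of random reassignments with I >= observed."""
    rng = random.Random(seed)
    observed = morans_i(y, w)
    shuffled = list(y)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign the observed values among the areas
        if morans_i(shuffled, w) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_perm + 1)


# Hypothetical chain of six regions with a smooth gradient in the outcome.
W = [[1 if abs(i - j) == 1 else 0 for j in range(6)] for i in range(6)]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
observed_i, p_value = permutation_p_value(y, W)
print(round(observed_i, 3), p_value)
```

For this toy gradient the observed statistic is 0.6, well above E[I] = -1/(n - 1) = -0.2, so few random reassignments reach it and the pseudo p-value is small.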

2.3. Hotspot Analysis

2.3.1. Getis-Ord Gi* Statistic

In this study, the Hot Spot Analysis tool was employed to identify spatial clusters of high and low values in the prevalence of cardiovascular disease across regions. The tool calculates the Getis-Ord G_i^* statistic for each feature in the dataset, generating associated z-scores and p-values that indicate where clusters of statistically significant high or low values occur [9]. The analysis considers each feature within the context of its neighboring features. While an individual county with a high prevalence value is notable, it is only classified as a statistically significant hot spot if it is also surrounded by other counties with similarly high values. The local sum of a feature and its neighbors is compared to the overall sum of all features in the dataset. A statistically significant hot spot is identified when this local sum differs from the expected value by an amount too large to be attributed to random chance, yielding a high positive z-score.

To account for multiple testing and spatial dependency (the likelihood of identifying statistically significant results by chance increases with the number of tests), the False Discovery Rate (FDR) correction was applied. This adjustment improves the robustness of the statistical inference, ensuring a more reliable identification of true hot and cold spots within the study area.
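The text does not state which FDR procedure was applied. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up procedure, the correction can be illustrated as follows; the p-values are hypothetical and the function name is illustrative.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the test survives FDR control.

    Assumption: Benjamini-Hochberg step-up procedure; the source only says
    that an FDR correction was applied.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            reject[idx] = True
    return reject


# Hypothetical local p-values for five counties.
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
significant = benjamini_hochberg(p_values)
print(significant)
```

In this toy case the third and fourth p-values are below 0.05 but do not survive the correction, which shows how the adjustment guards against chance findings when many local tests are run.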

This methodology was integral to the spatial analysis conducted in this study, providing critical insights into the geographic clustering patterns of cardiovascular disease prevalence in Kenya. The Getis-Ord local statistic is given as:

G_{i}^{*} = \frac{\sum_{j = 1}^{n} w_{i,j} x_{j} - \bar{X} \sum_{j = 1}^{n} w_{i,j}}{S \sqrt{\frac{n \sum_{j = 1}^{n} w_{i,j}^{2} - (\sum_{j = 1}^{n} w_{i,j})^{2}}{n - 1}}}

(6)

where x_j is the attribute value for feature j, w_i,j is the spatial weight between features i and j, n is the total number of features, and:

\bar{X} = \frac{\sum_{j = 1}^{n} x_{j}}{n}

(7)

S = \sqrt{\frac{\sum_{j = 1}^{n} x_{j}^{2}}{n} - (\bar{X})^{2}}

(8)

The G_i^* statistic is a z-score, so no further calculations are required.
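The Getis-Ord local statistic in its standard form can be sketched as follows. This is an illustrative computation on a hypothetical five-region chain and not the Hot Spot Analysis tool itself; the weights include each region as its own neighbour, which is the usual convention for the starred statistic and is an assumption here.

```python
import math


def getis_ord_gi_star(x, w):
    """Gi* z-scores; w[i][j] is the weight between i and j, with w[i][i] > 0."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum(v * v for v in x) / n - mean ** 2)
    scores = []
    for i in range(n):
        sum_w = sum(w[i])
        sum_w2 = sum(v * v for v in w[i])
        # Local weighted sum minus its expectation under no clustering.
        numerator = sum(w[i][j] * x[j] for j in range(n)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


# Hypothetical chain of five regions; each region is its own neighbour too.
W = [[1 if abs(i - j) <= 1 else 0 for j in range(5)] for i in range(5)]
x = [10.0, 9.0, 5.0, 1.0, 2.0]
gi_star = getis_ord_gi_star(x, W)
print([round(g, 3) for g in gi_star])
```

High values concentrated at one end of the chain give positive scores there (a candidate hot spot), and low values at the other end give negative scores (a candidate cold spot).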

2.3.2. Local Indicators of Spatial Association (LISA)

To assess local spatial association between the prevalence of CVD in each county and those of its neighboring counties, Local Indicators of Spatial Association (LISA) were applied in this study. LISA is specifically designed to measure the degree of significant spatial clustering of similar values around each observation, allowing the identification of localized spatial patterns that may be obscured in global statistics.

A desirable property of LISA is that the sum of its local values across all regions is proportional to the corresponding global measure of spatial autocorrelation. This allows for the decomposition of a global statistic into the contribution of each individual region, providing a more localized understanding of spatial patterns.

In this analysis, local Moran’s I values were computed for each county to detect significant clusters. For the i-th region, the local Moran’s I is defined as:

I_{i} = n (Y_{i} - \bar{Y}) \frac{\sum_{j} w_{ij} (Y_{j} - \bar{Y})}{\sum_{j} (Y_{j} - \bar{Y})^{2}}

(9)

where w_ij represents the spatial weight between regions i and j, and Ȳ is the mean of Y, the variable of interest.

Importantly, the global Moran’s I is proportional to the sum of the local Moran’s I values for all regions:

I = \frac{1}{\sum_{i \neq j} w_{i j}} \sum_{i} I_{i}

(10)

To interpret the significance of these local Moran’s I values, associated p-values were generated. These p-values represent the probability of obtaining a local Moran’s I as extreme as the observed one under the null hypothesis of spatial randomness (no spatial association). This allowed for the identification of regions that are part of statistically significant high-high clusters (hot spots) or low-low clusters (cold spots), as well as spatial outliers.

The p-values were estimated through a permutation approach, in which the observed value at a given region i is held constant while the values at other locations are randomly permuted a large number of times to generate a reference distribution of local Moran’s I. The position of the observed statistic within this simulated distribution determines its statistical significance.
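The local Moran’s I of Equation (9), its link to the global statistic in Equation (10), and the conditional permutation approach can be sketched as follows. The six-region chain, the function names, and the two-sided comparison of absolute values are assumptions made for illustration.

```python
import random


def local_morans_i(y, w):
    """Local Moran's I for every region, following Eq. (9)."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    m2 = sum(d * d for d in dev)
    return [
        n * dev[i] * sum(w[i][j] * dev[j] for j in range(n)) / m2
        for i in range(n)
    ]


def conditional_permutation_p(y, w, i, n_perm=999, seed=0):
    """Pseudo p-value for region i: its value is held fixed, the rest are shuffled."""
    rng = random.Random(seed)
    observed = local_morans_i(y, w)[i]
    others = [y[j] for j in range(len(y)) if j != i]
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(others)
        permuted = others[:i] + [y[i]] + others[i:]
        # Assumption: two-sided test based on absolute values.
        if abs(local_morans_i(permuted, w)[i]) >= abs(observed):
            as_extreme += 1
    return (as_extreme + 1) / (n_perm + 1)


# Hypothetical chain of six regions with binary contiguity weights.
W = [[1 if abs(i - j) == 1 else 0 for j in range(6)] for i in range(6)]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
local_i = local_morans_i(y, W)
s0 = sum(sum(row) for row in W)
global_i = sum(local_i) / s0  # Eq. (10)
p_first = conditional_permutation_p(y, W, i=0)
print([round(v, 3) for v in local_i], round(global_i, 3), p_first)
```

Dividing the sum of the local values by the sum of the weights recovers the global Moran’s I, which is the decomposition property described above.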

2.4. Spatial Models

The justification for using a spatial model begins with testing for spatial autocorrelation in the residuals of an Ordinary Least Squares (OLS) regression model that estimates the linear relationship between the dependent variable and the explanatory variables. The Moran’s I of the residuals of an OLS is used to assess whether residuals exhibit spatial dependence. A significant Moran’s I result indicates a violation of the OLS assumption of independent errors, necessitating the use of a spatial model to account for spatial dependencies across the geographical space.

Spatial econometrics offers a range of models tailored to different data characteristics and research objectives. The most commonly used spatial regression models, chosen according to the data or the theoretical literature, are the spatial lag model (also called the spatial autoregressive, or SAR, model) and the spatial error model (SEM) [10, 11]. While these two models provide essential frameworks for understanding spatial relationships, the Spatial Durbin Model (SDM) builds upon and combines them [12].

The SAR model addresses spatial dependencies by incorporating spatial lags of the dependent variable. This approach accounts for autocorrelation among observed variables by modeling equilibrium outcomes that result from spatial and/or social interactions [10, 11]. The Spatial Autoregressive (SAR) model can be defined as:

y = ρWy + Xβ + ϵ

(11)

where:

1) y is an N × 1 vector of the dependent variable,

2) W is an N × N spatial weights matrix,

3) X is an N × K matrix of covariates, indexed by k = 1, 2, ..., K,

4) ϵ is an N × 1 vector of normally distributed disturbances,

5) β is a K × 1 vector of parameters, and

6) ρ represents the autoregressive scalar parameter.

The Spatial Error Model (SEM) addresses spatial dependence in the error term by incorporating a spatial autoregressive process. In a standard linear regression model, the disturbance term u is typically assumed to be independent and identically distributed (i.i.d.). In the SEM, however, u follows a first-order autoregressive (AR) process, analogous to the Markov process in time-series analysis. This implies that the error term for one observation can be influenced by the error terms of neighboring observations, reflecting a spatial spillover effect [10, 11]. The Spatial Error Model (SEM) can be defined as:

y = Xβ + u

(12)

u = λWu + ϵ

(13)

In this model:

1) y represents the dependent variable,

2) X denotes the matrix of exogenous (independent) variables,

3) β represents the coefficients for these exogenous variables,

4) u is the vector of spatially correlated error terms,

5) λ is the spatial autoregressive coefficient, measuring the extent of spatial dependence in the error terms,

6) W is the spatial weights matrix indicating the spatial structure and influence of neighboring observations, and

7) ϵ is a vector of i.i.d. (independently and identically distributed) error terms.

The SEM is particularly useful when spatial dependence is confined to the error term rather than the dependent variable itself. It accounts for unobserved spatial heterogeneity and omitted variables that might be spatially correlated.

The Spatial Durbin Model (SDM) captures both spatial dependence in the dependent variable and the spatial spillovers in the explanatory variables. It combines the spatial autoregressive structure for the dependent variable with the spatial lag specification of the covariates, resulting in a more comprehensive representation of spatial interdependencies

[12]

. The general form of the SDM is expressed as follows:

y = ρWy + Xβ + WXθ + ϵ

(14)

where:

1) y is the n × 1 vector of the dependent variable (e.g., prevalence of cardiovascular disease),

2) W is the n × n spatial weights matrix that defines the spatial relationships between regions,

3) X is the n × k matrix of explanatory variables (covariates such as population density, socio-economic factors, etc.),

4) β is the k × 1 vector of coefficients for the explanatory variables,

5) WX is the spatial lag of the explanatory variables, capturing spillover effects from neighboring regions,

6) θ is the k × 1 vector of coefficients for the spatially lagged covariates,

7) ρ is the spatial autoregressive coefficient, indicating the impact of the dependent variable’s spatial lag on the outcome in each region,

8) ϵ is the n × 1 vector of errors, assumed to follow a normal distribution with zero mean and constant variance.
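To make the role of the spatial lag term ρWy concrete, the sketch below solves the lag equation y = ρWy + Xβ for y on a hypothetical three-region chain, with the error term set to zero for clarity. The fixed-point iteration, the function names, and all numbers are illustrative assumptions; the iteration converges when W is row-standardized and |ρ| < 1.

```python
def mat_vec(w, v):
    """Multiply a square list-of-lists matrix by a vector."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def solve_spatial_lag(w, xb, rho, tol=1e-12, max_iter=10_000):
    """Solve y = rho * W y + xb by fixed-point iteration.

    Assumes a row-standardized W and |rho| < 1 so that the iteration contracts.
    """
    y = list(xb)
    for _ in range(max_iter):
        new = [rho * lag + b for lag, b in zip(mat_vec(w, y), xb)]
        if max(abs(a - b) for a, b in zip(new, y)) < tol:
            return new
        y = new
    raise RuntimeError("iteration did not converge")


# Hypothetical row-standardized weights for a chain of three regions.
W_star = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
# Noise-free linear predictor: only region 0 has a non-zero X*beta.
xb = [1.0, 0.0, 0.0]
y = solve_spatial_lag(W_star, xb, rho=0.5)
print([round(v, 4) for v in y])
```

Although only region 0 has a non-zero predictor, regions 1 and 2 end up with positive outcomes, and region 0 ends above 1 because of feedback through its neighbour. The SDM adds the term WXθ to the same equation, so its solution can be sketched the same way after adding `mat_vec(W_star, x)` times θ to `xb`.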

2.5. The Analytical Strategy

In spatial econometrics, the specific-to-general approach is commonly used to build models progressively. This approach begins with a basic non-spatial model and systematically tests for possible misspecifications, such as omitted autocorrelation in the error term or the dependent variable [13]. Through these tests, the model is refined by incorporating the spatial components needed to account for the spatial dependencies in the data [14].

This study begins by conducting an Exploratory Spatial Data Analysis (ESDA) to understand the underlying patterns and distributions in the data. Spatial autocorrelation in the residuals from the Ordinary Least Squares (OLS) regression is then analyzed to assess the presence of spatial dependence and justify the use of a spatial model. The Global Moran’s I is computed on the data, using the Queen (first-order) adjacency weights matrix, to detect overall spatial autocorrelation and determine whether spatial dependence exists across the regions of interest. The study then applies three spatial modelling techniques: the spatial lag model, the spatial error model, and the Spatial Durbin Model. Model selection is guided by the Lagrange Multiplier (LM) test, and model fit is assessed through the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). These criteria enable a comparative evaluation of the relative performance of the spatial models. Once the best model is identified and fitted, model diagnostics are conducted to ensure the validity of the results. These include residual analysis, which is essential in assessing the adequacy of the fitted spatial model.

2.6. Histogram

The Freedman-Diaconis rule was used to determine the optimal bin width for the histogram. The bin width, h, is given by:

h = \frac{2 \, \mathrm{IQR}}{n^{1/3}}

where IQR is the interquartile range and n is the number of data points. By focusing on the interquartile range, the Freedman-Diaconis rule reduces the influence of outliers and helps to generate a histogram that accurately represents the data’s central tendency and distribution.

3. Results

3.1. Exploratory Spatial Data Analysis

3.1.1. Maps on CVD Prevalence in Kenya

The Natural Breaks (Jenks) classification method reveals inherent data groupings by placing class boundaries at points where significant differences in data values occur. This ensures that similar values are grouped while maximizing distinctions between classes. Figure 1 illustrates the spatial distribution of cardiovascular diseases in Kenya and highlights areas with varying concentrations.

Figure 1. Spatial distribution of Cardiovascular Disease in Kenya.

Figure 2 shows the distribution of CVD prevalence using a histogram.

Figure 2. Cardiovascular diseases in Kenya.
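The Freedman-Diaconis bin width behind the histogram can be sketched as follows. The data values are hypothetical, and the quartile convention (the "inclusive" method of Python's `statistics.quantiles`) is an assumption, since the text does not say how the interquartile range was computed.

```python
import math
import statistics


def freedman_diaconis(values):
    """Return (bin width h, number of bins) using h = 2 * IQR / n^(1/3)."""
    # Assumption: quartiles by the 'inclusive' method; other conventions
    # give slightly different IQR values for small samples.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    h = 2 * iqr / len(values) ** (1 / 3)
    n_bins = math.ceil((max(values) - min(values)) / h)
    return h, n_bins


# Hypothetical prevalence-like values for eight areas.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
h, n_bins = freedman_diaconis(values)
print(h, n_bins)
```

Because the width depends on the interquartile range rather than the full range, a single extreme county would change the number of bins but not the bin width.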

3.1.2. Horizontal Bar Graph

Figure 3 shows the CVD distribution using a horizontal bar graph.

Figure 3. CVD prevalence in the counties across Kenya.

3.1.3. Spatial Lag Scatterplot

Figure 4 shows the scatterplot of the original variable values versus the spatially lagged values.

Figure 4. Spatial Lag Scatterplot.

The spatial lag scatterplot for the prevalence of cardiovascular disease (CVD) shows a positive correlation, indicating that regions with higher CVD rates tend to be surrounded by neighboring areas with similarly elevated rates.

3.1.4. Residuals from Linear Regression

In Table 1, Moran’s I for the residuals of the OLS regression model is 0.2048, with a z-value of 2.4491 and a significant p-value of 0.0071.

Table 1. Moran’s I for residuals of the OLS regression model.

Disease	Moran’s I	z-value	P-value
Cardiovascular	0.2048	2.4491	0.0071

The significant Moran’s I value suggests that an OLS model would fail to account for key spatial dependencies, necessitating spatial regression techniques.
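As a quick consistency check on Table 1, the z-value can be converted to a p-value using the standard normal distribution. The sketch assumes a one-sided (upper-tail) test, which the text does not state explicitly; the function name is illustrative.

```python
import math


def upper_tail_p(z):
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# z-value reported in Table 1 for the OLS residuals.
p_residuals = upper_tail_p(2.4491)
print(round(p_residuals, 4))
```

Under this one-sided assumption the result agrees with the reported p-value of 0.0071 to within rounding.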

3.2. Spatial Analysis Results

3.2.1. Global Moran’s I

The results presented in Table 2 highlight the spatial autocorrelation as determined by the global Moran statistic I.

Table 2. Moran’s I Test Results.

Disease	Moran’s I Statistic	p-value
Cardiovascular	0.5875	< 0.001
Stroke	0.5135	< 0.001
Ischemic	0.5598	< 0.001
Rheumatic	0.4057	< 0.001
Hypertensive	0.4985	< 0.001

The statistically significant positive values obtained from the global Moran’s I test indicate the presence of positive spatial autocorrelation. This suggests that neighboring areas tend to have similar CVD prevalence rates, whether high or low.

3.2.2. Moran’s I Scatterplot

The plot shown in Figure 5 is a Moran’s I scatter plot, which is typically used to assess spatial autocorrelation. It compares the prevalence of cardiovascular disease (CVD) in each region (x-axis) with the prevalence in neighboring regions (y-axis).

Figure 5. Moran Scatterplot.

The positive slope of the regression line suggests that regions with high (low) CVD prevalence tend to be surrounded by other regions with similarly high (low) CVD prevalence, indicating positive spatial autocorrelation.

3.2.3. Hotspot Analysis

Local Moran’s I was used to identify clusters of similar values and to highlight spatial outliers, detecting regions where high values cluster with other high values (hotspots), low values cluster with low values (coldspots), and values differ significantly from their neighbors (outliers). In contrast, Getis-Ord Gi* was used specifically to identify statistically significant hotspots and coldspots, focusing on the concentration of high or low values without detecting outliers. Figure 6 displays the hotspots and coldspots of cardiovascular disease (CVD) prevalence, with hotspots marked in red and coldspots in green. The hotspots indicate regions with higher-than-expected CVD prevalence, while coldspots represent areas with lower-than-expected prevalence.

Figure 6. Hotspots and coldspots of CVD in Kenya.

Figure 7 illustrates the significant hotspots and coldspots for cardiovascular disease (CVD) prevalence. It highlights the counties with statistically significant clustering, showing regions with either elevated (hotspots) or reduced (coldspots) CVD prevalence in relation to neighboring areas.

Figure 7. Significant hotspots and coldspots of CVD in Kenya.

The results presented in Tables 3 and 4 summarize the counties identified as significant hotspots and coldspots for CVDs in Kenya.

Table 3. List of Hotspot Counties with Moran’s I and Gi* Scores.

County	Local Moran’s I	P-value	Gi* Score	Hot/Cold
Nyeri	2.8134	0.000074	3.9646	Hot Spot
Murang’a	2.6778	0.000251	3.6613	Hot Spot
Kirinyaga	2.6766	0.000109	3.8686	Hot Spot
Embu	2.0347	0.000063	4.0011	Hot Spot
Nyandarua	1.1412	0.011	2.5389	Hot Spot
Meru	1.0305	0.00503	2.8052	Hot Spot
Tharaka	0.8205	0.00056	3.4502	Hot Spot
Machakos	0.7998	0.0121	2.5085	Hot Spot

Counties such as Nyeri, Murang’a, Kirinyaga, and Embu emerge as key hotspots, with high local Moran’s I values of 2.8134, 2.6778, 2.6766, and 2.0347, respectively, and strongly positive Gi* scores. Their small p-values confirm the statistical significance of these clusters, indicating that CVD prevalence in these counties and in their neighbors is considerably higher than the overall mean.

Table 4. List of Coldspot Counties with Moran’s I and Gi* Scores.

County	Local Moran’s I	P-value	Gi* Score	Hot/Cold
Wajir	2.3004	0.0045	-2.8384	Cold Spot
Garissa	1.7446	0.0207	-2.3127	Cold Spot
Marsabit	1.2691	0.0095	-2.5921	Cold Spot

Coldspot counties such as Wajir, Garissa, and Marsabit exhibit positive local Moran’s I values with statistically significant p-values, coupled with significantly negative Gi* scores. This indicates that these counties and their neighbors share a CVD prevalence below the overall mean. The local Moran’s I values range from 1.2691 in Marsabit to 2.3004 in Wajir, confirming that low-low spatial clustering is present.

3.3. Spatial Models Comparison

Table 5 presents the results of four Lagrange Multiplier (LM) test statistics: the LM (lag) and robust LM (lag) statistics, which test for spatial lag dependence, and the LM (error) and robust LM (error) statistics, which test for spatial error dependence.

Table 5. Rao’s Score (Lagrange Multiplier) Diagnostics for Spatial Dependence.

Test	Statistic	Degrees of Freedom (df)	p-value
RSerr	4.3962	1	0.036
RSlag	16.449	1	<0.001
adjRSerr	0.1291	1	0.719
adjRSlag	12.181	1	<0.001

The results from the Lagrange Multiplier (LM) tests indicate that both the LM-lag and LM-error statistics are significant at the 5% level, rejecting the null hypothesis and confirming the existence of spatial dependence in the data. To determine the appropriate form of spatial dependence, we rely on the robust versions of the tests. In this case, the robust LM-lag statistic remains significant, while the robust LM-error statistic becomes non-significant. This suggests that, once the influence of a spatially lagged dependent variable is accounted for, the remaining error term no longer exhibits spatial autocorrelation. The spatial dependence in the data is therefore better explained by a spatial spillover effect of the dependent variable than by spatially correlated errors. Consequently, the Spatial Lag Model (SLM) emerges as the most appropriate specification among the tested models for analyzing the spatial distribution of CVD prevalence in this study.
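The decision logic applied to Table 5 can be sketched as follows. The rule encoded here is the common convention of consulting the robust statistics when both plain LM tests are significant; the function names and the 5% threshold are assumptions, and the p-values are recomputed from the reported statistics using the chi-square distribution with one degree of freedom.

```python
import math


def chi2_sf_df1(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


def choose_spatial_model(lm_err, lm_lag, robust_err, robust_lag, alpha=0.05):
    """Pick 'OLS', 'lag' or 'error' from the four LM statistics."""
    err_sig = chi2_sf_df1(lm_err) < alpha
    lag_sig = chi2_sf_df1(lm_lag) < alpha
    if not err_sig and not lag_sig:
        return "OLS"
    if lag_sig and not err_sig:
        return "lag"
    if err_sig and not lag_sig:
        return "error"
    # Both plain tests are significant: fall back on the robust versions.
    r_err_sig = chi2_sf_df1(robust_err) < alpha
    r_lag_sig = chi2_sf_df1(robust_lag) < alpha
    if r_lag_sig and not r_err_sig:
        return "lag"
    if r_err_sig and not r_lag_sig:
        return "error"
    # Assumption: if the robust tests do not separate, prefer the larger statistic.
    return "lag" if robust_lag > robust_err else "error"


# Statistics reported in Table 5.
choice = choose_spatial_model(
    lm_err=4.3962, lm_lag=16.449, robust_err=0.1291, robust_lag=12.181
)
print(choice, round(chi2_sf_df1(4.3962), 3), round(chi2_sf_df1(0.1291), 3))
```

With the Table 5 values, the rule selects the lag specification, and the recomputed p-values for RSerr and adjRSerr agree with the reported 0.036 and 0.719.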

Table 6 displays the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) values for three spatial models: the Spatial Lag Model (SLM), the Spatial Error Model (SEM), and the Spatial Durbin Model (SDM). Lower AIC and BIC values indicate a model that better balances goodness of fit against model complexity.

Table 6. Comparison of Spatial Models.

Model	AIC value	BIC value
Spatial Lag Model	-380.09	-361.80
Spatial Error Model	-373.71	-355.43
Spatial Durbin Model	-372.31	-341.22

The model comparison results indicate that, among the three models, the Spatial Lag Model (SLM), also referred to as the Spatial Autoregressive (SAR) model, achieves the lowest AIC and BIC values, reinforcing its suitability for analyzing the spatial dynamics of cardiovascular disease prevalence.
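To make the size of the gaps in Table 6 explicit, the sketch below computes each model's difference from the best (lowest) AIC and BIC. The dictionary layout and the helper name are illustrative choices; only the six criterion values come from the table.

```python
# AIC and BIC values copied from Table 6.
criteria = {
    "Spatial Lag Model": {"AIC": -380.09, "BIC": -361.80},
    "Spatial Error Model": {"AIC": -373.71, "BIC": -355.43},
    "Spatial Durbin Model": {"AIC": -372.31, "BIC": -341.22},
}


def deltas(table, key):
    """Difference between each model's criterion and the smallest one."""
    best = min(row[key] for row in table.values())
    return {model: round(row[key] - best, 2) for model, row in table.items()}


delta_aic = deltas(criteria, "AIC")
delta_bic = deltas(criteria, "BIC")
best_model = min(criteria, key=lambda model: criteria[model]["AIC"])

print("Best by AIC:", best_model)
for model in criteria:
    print(f"{model}: dAIC = {delta_aic[model]:.2f}, dBIC = {delta_bic[model]:.2f}")
```

The Spatial Error and Spatial Durbin models trail the Spatial Lag Model by roughly 6.4 and 7.8 AIC points, and the Durbin model's larger BIC gap reflects the heavier penalty BIC places on its additional parameters.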

3.4. Spatial Lag Model (SLM)

3.4.1. Impact Measures

Table 7 shows the impact measures derived from the Spatial Lag Model, which describe the relationships between the predictors and the prevalence of CVD. The direct effect represents the influence that a change in a predictor has on the prevalence of CVD within the same county. In contrast, the indirect effect reflects how a change in a predictor in one county influences CVD prevalence in neighboring counties through spatial interactions. The total effect combines the direct and indirect impacts, offering a holistic view of how each predictor contributes to the prevalence of CVD.

Table 7. Impact Measures from Spatial Lag Model.

Variable	Direct Impact	Indirect Impact	Total Impact
HBMI	0.2738	0.2439	0.5177
Alcohol Use	-0.4054	-0.3612	-0.7665
Tobacco use	0.2623	0.2337	0.4961
Dietary risks	0.2940	0.2620	0.5560
GCP	-0.0004	-0.0003	-0.0007
Population Density	0.0121	0.0107	0.0228
Urbanization Rate	-0.0078	-0.0070	-0.0148

The estimated spatial autoregressive parameter ρ is 0.5875, with a highly statistically significant p-value (p < 0.001), indicating considerable spatial spillover effects; this suggests that the prevalence of cardiovascular disease in one county is not independent of that in neighboring counties.
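The link between ρ and the impact measures in Table 7 can be illustrated with simple arithmetic. The sketch below checks that the direct and indirect impacts add up to the total impact, and computes the spillover multiplier 1/(1 − ρ). It then backs out the coefficient implied by each total impact using the identity total = β/(1 − ρ), which holds for a spatial lag model only when the spatial weights matrix is row-standardized. That row-standardization is an assumption here, and the implied coefficients are back-of-envelope values, not estimates reported in the study.

```python
rho = 0.5875  # spatial autoregressive parameter reported in the text

# (direct, indirect, total) impacts copied from Table 7.
impacts = {
    "HBMI": (0.2738, 0.2439, 0.5177),
    "Alcohol Use": (-0.4054, -0.3612, -0.7665),
    "Tobacco use": (0.2623, 0.2337, 0.4961),
    "Dietary risks": (0.2940, 0.2620, 0.5560),
    "GCP": (-0.0004, -0.0003, -0.0007),
    "Population Density": (0.0121, 0.0107, 0.0228),
    "Urbanization Rate": (-0.0078, -0.0070, -0.0148),
}

# Spillover multiplier: how much a coefficient is amplified by spatial feedback
# (assumes a row-standardized weights matrix).
multiplier = 1.0 / (1.0 - rho)

# Rounding gap between the reported total and direct + indirect.
gaps = {name: total - (direct + indirect)
        for name, (direct, indirect, total) in impacts.items()}

# Coefficient implied by each total impact under the same assumption.
implied_beta = {name: total * (1.0 - rho)
                for name, (_, _, total) in impacts.items()}

print(f"multiplier = {multiplier:.4f}")
for name in impacts:
    print(f"{name}: gap = {gaps[name]:+.4f}, implied beta = {implied_beta[name]:.4f}")
```

The gaps are all within rounding error of zero, and the multiplier of about 2.42 indicates that, under the stated assumption, the total impact of each predictor is a little under two and a half times its underlying coefficient.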

3.4.2. Residuals Distribution Analysis

The Q-Q plot of residuals, presented in Figure 8, is a visual assessment of the normality of the residuals from the spatial lag model.

Figure 8. Q-Q plot of SLM residuals.

The plot displays the quantiles of the residuals plotted against the quantiles of a standard normal distribution. Ideally, if the residuals are normally distributed, the points should closely follow the reference line, which in this case is shown in blue. The shaded blue area represents the confidence bands, indicating the expected range for normally distributed data.

The histogram shown in Figure 9 represents the distribution of residuals from the Spatial Lag Model (SLM).

Figure 9. Histogram of residuals of SLM.

The residuals are centered around zero, with the majority of values clustered near the center, suggesting that the model’s predictions are generally unbiased, without systematic overestimation or underestimation. Notably, the absence of extreme outliers suggests that the assumptions of normality and homoscedasticity in the residuals may hold. Although a formal statistical test would be necessary to confirm this, the visual assessment indicates that the residuals are evenly distributed around zero, enhancing confidence in the model’s reliability.

The scatter plot of residuals for the Spatial Lag Model (SLM), shown in Figure 10, illustrates the relationship between the residuals, which are the differences between the observed and predicted values, and the fitted values. Analyzing the residuals helps identify patterns that may indicate model misspecification, such as spatial autocorrelation or heteroscedasticity.

Figure 10. Scatter plot of SLM residuals.

From the plot, we observe that the residuals are randomly scattered around the horizontal line at zero. The lack of any obvious pattern supports the assumptions of linearity and homoscedasticity, and the absence of discernible trends is consistent with a model that is not materially affected by heteroscedasticity or non-linearity.

The results from the Moran I test on the SLM residuals, presented in Table 8, show a Moran I statistic of 0.0482, indicating minimal spatial autocorrelation in the residuals. This interpretation is supported by the p-value of 0.2272, which is above the commonly accepted significance threshold of 0.05. Therefore, we fail to reject the null hypothesis of no spatial autocorrelation in the residuals.

Table 8. Moran I Test Results.

Statistic	Value
Sample estimate: Moran I statistic	0.0482
Moran I statistic standard deviate	0.7482
p-value	0.2272
Expectation	-0.02222
Variance	0.0087

The results provide confidence in the model’s ability to account for spatial dependence within the data.
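The entries of Table 8 are linked by simple formulas, which the sketch below works through. It standardizes the Moran I statistic as z = (I − E[I]) / √Var(I), converts a z-score into a one-sided (upper-tail) normal p-value, and recovers the number of areal units from E[I] = −1/(n − 1). The one-sided alternative is an assumption that happens to match the reported p-value, and the function names are illustrative.

```python
import math


def moran_z(i_obs, expectation, variance):
    """Standard deviate of Moran's I given its expectation and variance."""
    return (i_obs - expectation) / math.sqrt(variance)


def upper_tail_p(z):
    """One-sided p-value P(Z > z) under the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Values copied from Table 8.
i_obs, expectation, variance = 0.0482, -0.02222, 0.0087
reported_z = 0.7482

# Recomputed from the rounded table entries, so it differs slightly
# from the reported standard deviate.
z_from_table = moran_z(i_obs, expectation, variance)

# p-value implied by the reported standard deviate.
p_from_reported_z = upper_tail_p(reported_z)

# Number of areal units implied by E[I] = -1 / (n - 1).
n_implied = round(1 - 1 / expectation)

print(f"z from table entries = {z_from_table:.4f}")
print(f"p from reported z = {p_from_reported_z:.4f}")
print(f"implied n = {n_implied}")
```

The standard deviate recomputed from the rounded entries is about 0.755 rather than 0.7482, a gap attributable to rounding of the variance, while the reported deviate reproduces the p-value of 0.2272 and the expectation corresponds to 46 areal units.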

4. Discussion

This study provides valuable insights into the spatial distribution of cardiovascular disease (CVD) prevalence in Kenya, revealing distinct regional patterns. Notably, counties such as Nyeri, Murang’a, Kirinyaga, and Embu emerged as CVD hotspots, characterised by significant clustering of CVD cases. These findings corroborate earlier research, which attributes such clustering to regional heterogeneity in environmental, socioeconomic, and behavioral determinants of health [15]. In contrast, counties such as Wajir, Garissa, and Marsabit exhibit lower prevalence of CVD.

Physiological factors such as high body mass index, a widely accepted marker of overweight and obesity, featured prominently, showing a direct and significant association with CVD risk. This finding is consistent with previous studies [16], which identified diabetes, a condition closely linked to excess body weight, as a major predictor of CVD morbidity and mortality, particularly in the context of ischemic heart disease and stroke. These results highlight the necessity for community-level interventions targeting obesity prevention through improved nutrition, increased physical activity, and health education on the risks of excess body weight.

Behavioral risk factors, particularly high use of tobacco, emerged as key contributors to CVD risk. The harmful chemicals introduced into the bloodstream through smoking and tobacco use have direct and indirect detrimental effects on cardiovascular health. These findings echo the conclusions of previous research, which identified tobacco control as a critical public health priority in Kenya [3]. The evidence calls for intensified anti-tobacco policies, public education campaigns, and cessation support services.

An intriguing aspect of the analysis was the role of population density in the risk of CVD. Counties with high population density tend to report slightly higher CVD prevalence, possibly attributable to environmental stressors such as air pollution. Interestingly, the study observed that high urbanization rates are correlated with lower prevalence of CVD, a finding that contrasts with earlier research which reported increased CVD risks in highly urbanized regions [17]. This discrepancy highlights the complexity of urban health dynamics in Kenya and suggests that urbanization may offer protective benefits through better access to healthcare and health promotion services, a relationship that warrants further investigation.

Economic disparities were also evident in the spatial distribution of CVD. The Gross County Product (GCP), a measure of local economic performance, exhibited a significant inverse association with CVD prevalence. This association is consistent with earlier research findings [18]. The study also finds high alcohol use to be associated with a lower risk of CVD, despite the known adverse effects of excessive consumption. A systematic review and meta-analysis reported a reduced risk of heart failure associated with moderate alcohol consumption in community settings [19]. The present findings echo this complex relationship, but caution is warranted, as the threshold between beneficial and harmful consumption remains poorly defined and requires further research. Finally, the study explored dietary risks, where higher dietary risks were associated with higher CVD prevalence, supporting conventional assumptions about diet and heart health.

Future efforts should prioritize strengthening health data systems to address the persistent gaps in CVD reporting, particularly in remote and underserved regions of Kenya. Enhanced collaboration between the Ministry of Health (MOH), the Kenya Health Information System (KHIS), and innovative health data platforms such as Signalytic is essential for improving the quality, timeliness, and representativeness of health data to support evidence-based decision-making. By investing in robust, inclusive data collection and analysis frameworks, Kenya can improve public health planning, reduce regional disparities, and enhance the effectiveness of national cardiovascular disease prevention strategies.

5. Conclusion

This study reveals significant spatial disparities in the prevalence of cardiovascular disease (CVD) across Kenya, shaped by a complex interplay of physiological, behavioral, and socioeconomic factors. Strengthening health information systems and improving data quality, particularly in underserved and remote areas, remains crucial to achieving equitable health outcomes. Partnerships between the Ministry of Health (MOH), the Kenya Health Information System (KHIS), and health data innovators may offer valuable opportunities to close data gaps and support evidence-based policy-making.

An integrated approach that combines health promotion, economic support, expanded access to healthcare, more thorough data collection, and further research is critical in addressing the complex drivers of CVD in Kenya.

Abbreviations

CVD	Cardiovascular Disease
GBD	Global Burden of Disease
GCP	Gross County Product
HBMI	High Body Mass Index
HBP	High Blood Pressure
KNBS	Kenya National Bureau of Statistics
LM	Lagrange Multiplier
NCD	Non-Communicable Diseases
SAR	Spatial Autoregressive
SDM	Spatial Durbin Model
SEM	Spatial Error Model
SR	Standardized Rate
WHO	World Health Organization

Acknowledgments

I extend my sincere appreciation to all individuals and institutions whose support made this research possible. I am particularly grateful to my academic supervisor and mentors for their invaluable guidance, constant feedback, and continuous encouragement throughout the study. I also acknowledge the contributions of stakeholders and organizations that facilitated access to essential data and resources, enabling the development and validation of the models presented. Lastly, I express deep gratitude to my family, friends, and colleagues for their unwavering support and encouragement, which were instrumental in the successful completion of this cardiovascular disease research.

Author Contributions

Grace Wanjiku Mwangi: Conceptualization, Data curation, Formal Analysis, Methodology, Resources, Visualization, Writing – original draft

Anthony Kibira Wanjoya: Conceptualization, Supervision, Writing – review & editing

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1]	World Health Organization, “Cardiovascular diseases”. (2021). Available from: https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds)
[2]	Mensah, G. A., Roth, G. A., & Fuster, V. (2019). The global burden of cardiovascular diseases and risk factors: 2020 and beyond. Journal of the American College of Cardiology, 74(20), 2529-2532.
[3]	Ministry of Health, “Kenya STEPwise Survey for Non-Communicable Diseases Risk Factors 2015 Report”. Available from: https://www.health.go.ke/wp-content/uploads/2016/04/Steps-Report-NCD-2015.pdf
[4]	Mbau, L., Fourie, J. M., Scholtz, W., Nel, G., Scarlatescu, O., & Gathecha, G. (2021). PASCAR and WHF Cardiovascular diseases Scorecard project. Cardiovascular Journal of Africa, 32(3), 161-167.
[5]	Moraga, P. (2023). Spatial statistics for data science: theory and practice with R. CRC Press.
[6]	Zhang, C. (2012). Spatial weights matrix and its application. Journal of Regional Development Studies, 15, 85-97.
[7]	Zhou, X., & Lin, H. (2008). Spatial weights matrix. In S. Shekhar & H. Xiong (Eds.), Encyclopedia of GIS (p. 1113). Springer US. https://doi.org/10.1007/978-0-387-35973-1_1307
[8]	Shekhar, S., Xiong, H., & Zhou, X. (Eds.). (2017). Encyclopedia of GIS (2nd ed.). Springer International Publishing.
[9]	Environmental Systems Research Institute. (2021). ArcGIS Pro (Version 2.8) [Computer software].
[10]	Elhorst, J. P. (2014). Spatial econometrics: from cross-sectional data to spatial panels (Vol. 479, p. 480). Heidelberg: Springer.
[11]	Yamagata, Y., & Seya, H. (Eds.). (2019). Spatial analysis using big data: Methods and urban applications. Academic Press.
[12]	Ruttenauer, T. (2022). Spatial regression models: a systematic comparison of different model specifications using Monte Carlo experiments. Sociological Methods & Research, 51(2), 728-759.
[13]	Florax, R. J., Folmer, H., & Rey, S. J. (2003). Specification searches in spatial econometrics: the relevance of Hendry’s methodology. Regional science and urban economics, 33(5), 557-579.
[14]	Grekousis, G. (2020). Spatial analysis methods and practice: describe-explore-explain through GIS. Cambridge University Press.
[15]	Addie, O., & John Taiwo, O. (2024). Predictors of diagnosed cardiovascular diseases and their spatial heterogeneity in Lagos State, Nigeria. Open Health, 5(1), 20230018.
[16]	Dwane, N., Wabiri, N., & Manda, S. (2020). Small-area variation of cardiovascular diseases and select risk factors and their association to household and area poverty in South Africa: Capturing emerging trends in South Africa to better target local level interventions. PLoS One, 15(4), e0230564.
[17]	Darikwa, T. B., & Manda, S. O. (2020). Spatial co-clustering of cardiovascular diseases and select risk factors among adults in South Africa. International Journal of Environmental Research and Public Health, 17(10), 3583.
[18]	Baptista, E. A., & Queiroz, B. L. (2022). Spatial analysis of cardiovascular mortality and associated factors around the world. BMC public health, 22(1), 1556.
[19]	Yoon, S. J., Jung, J. G., Lee, S., Kim, J. S., Ahn, S. K., Shin, E. S., Jang, J. E., & Lim, S. H. (2020). The protective effect of alcohol consumption on the incidence of cardiovascular diseases: Is it real? A systematic review and meta-analysis of studies conducted in community settings. BMC Public Health, 20, 1-9.
[20]	Roth, G. A., Johnson, C. O., Abate, K. H., AbdAllah, F., Ahmed, M., Alam, T., & Murray, C. J. L. (2017). Trends and patterns of geographic variation in cardiovascular mortality among US counties, 1980-2014. JAMA, 317(19), 1976-1992.
[21]	Zangeneh, A., Amini, S., Khalili, D., Fattahi, N., Shafiee, A., & Fotouhi, A. (2024). Epidemiological patterns and spatiotemporal analysis of cardiovascular disease mortality in Iran: Development of public health strategies and policies. Current Problems in Cardiology, 49(8), Article 102675.
[22]	Wang, Y., Chen, X., & Xue, F. (2024). A review of Bayesian spatiotemporal models in spatial epidemiology. ISPRS International Journal of Geo-Information, 13(3), Article 97.
[23]	Fabiyi, O. O., & Garuba, O. E. (2015). Geo-spatial analysis of cardiovascular disease and biomedical risk factors in Ibadan, South-Western Nigeria. Journal of Settlements and Spatial Planning, 6(1), 61-69.
[24]	Kandala, N.-B., Gebremedhin, G., Tlou, B., Koyanagi, A., Manda, S., & Lakew, Y. (2021). Mapping the burden of hypertension in South Africa: A comparative analysis of the national 2012 SANHANES and the 2016 Demographic and Health Survey. International Journal of Environmental Research and Public Health, 18(10), 5445.
[25]	Mwenzwa, E. M., & Misati, J. A. (2014). Kenya’s social development proposals and challenges: Review of Kenya Vision 2030 first medium-term plan, 2008-2012.
[26]	Asiki, G., Kyobutungi, C., Ezeh, A., Joshi, M. D., Oti, S., & Awuor, C. (2018). Policy environment for prevention, control and management of cardiovascular diseases in primary health care in Kenya. BMC Health Services Research, 18(1), 1-9.
[27]	Global Burden of Disease Collaborative Network (2022). Global Burden of Disease Study 2021 (GBD 2021) results. Seattle, United States: Institute for Health Metrics and Evaluation (IHME).
[28]	Kenya National Bureau of Statistics. (2023). 2019 Kenya Population and Housing Census: Volume IV- Distribution of population by socio-economic characteristics.

Cite This Article

APA Style

Mwangi, G. W., Wanjoya, A. K. (2026). Spatial Modeling of Cardiovascular Disease in Kenya. American Journal of Theoretical and Applied Statistics, 15(2), 59-71. https://doi.org/10.11648/j.ajtas.20261502.14

ACS Style

Mwangi, G. W.; Wanjoya, A. K. Spatial Modeling of Cardiovascular Disease in Kenya. Am. J. Theor. Appl. Stat. 2026, 15(2), 59-71. doi: 10.11648/j.ajtas.20261502.14

AMA Style

Mwangi GW, Wanjoya AK. Spatial Modeling of Cardiovascular Disease in Kenya. Am J Theor Appl Stat. 2026;15(2):59-71. doi: 10.11648/j.ajtas.20261502.14

@article{10.11648/j.ajtas.20261502.14,
  author = {Grace Wanjiku Mwangi and Anthony Kibira Wanjoya},
  title = {Spatial Modeling of Cardiovascular Disease in Kenya},
  journal = {American Journal of Theoretical and Applied Statistics},
  volume = {15},
  number = {2},
  pages = {59-71},
  doi = {10.11648/j.ajtas.20261502.14},
  url = {https://doi.org/10.11648/j.ajtas.20261502.14},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajtas.20261502.14},
  abstract = {The study conducts an assessment of the spatial distribution of cardiovascular diseases (CVD) in Kenya by integrating spatial modeling techniques and spatial autocorrelation measures. CVDs, which refer to disorders of the heart and blood vessels, have surpassed communicable diseases as the leading cause of morbidity and mortality worldwide, posing a critical public health concern, especially in low- and middle-income countries (LMICs) where resources remain limited. A growing body of global evidence has revealed marked geographical disparities in CVD incidence, prompting investigations into small-area spatial distribution patterns. This study employed both global and local spatial autocorrelation measures to analyze CVD prevalence across Kenyan counties. The Global Moran’s I statistic was used to assess the overall degree of spatial clustering, while the Local Moran’s I identified significant clusters of high and low prevalence, alongside spatial outliers. Additionally, the Getis-Ord Gi* statistic was applied to detect statistically significant hotspots and coldspots, revealing important spatial patterns in disease prevalence. Spatial regression models were compared using the Lagrange Multiplier (LM) test, Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC) for model selection. The Spatial Lag Model (SLM) demonstrated superior performance over the Spatial Error Model (SEM) and the Spatial Durbin Model (SDM), achieving a Rao’s score (RSlag) of 16.449 and an adjusted score (adjRSlag) of 12.181, both statistically significant at the 5% level. The SLM also recorded the lowest AIC and BIC values at -380.09 and -361.80, respectively, confirming its suitability in capturing spatial dependence in the data. The findings revealed significant spatial clustering of CVD prevalence, with distinct high-risk and low-risk regions across the country. 
High body mass index (HBMI), tobacco use, and poor dietary habits emerged as major risk factors driving CVD prevalence, while urbanization and economic development were associated with lower disease burdens. The study highlights the importance of incorporating spatial analysis in public health planning to inform targeted interventions, optimize resource allocation, and enhance community health education campaigns aimed at promoting heart-healthy lifestyles.},
 year = {2026}
}

TY  - JOUR
T1  - Spatial Modeling of Cardiovascular Disease in Kenya
AU  - Grace Wanjiku Mwangi
AU  - Anthony Kibira Wanjoya
Y1  - 2026/04/16
PY  - 2026
N1  - https://doi.org/10.11648/j.ajtas.20261502.14
DO  - 10.11648/j.ajtas.20261502.14
T2  - American Journal of Theoretical and Applied Statistics
JF  - American Journal of Theoretical and Applied Statistics
JO  - American Journal of Theoretical and Applied Statistics
SP  - 59
EP  - 71
PB  - Science Publishing Group
SN  - 2326-9006
UR  - https://doi.org/10.11648/j.ajtas.20261502.14
AB  - The study conducts an assessment of the spatial distribution of cardiovascular diseases (CVD) in Kenya by integrating spatial modeling techniques and spatial autocorrelation measures. CVDs, which refer to disorders of the heart and blood vessels, have surpassed communicable diseases as the leading cause of morbidity and mortality worldwide, posing a critical public health concern, especially in low- and middle-income countries (LMICs) where resources remain limited. A growing body of global evidence has revealed marked geographical disparities in CVD incidence, prompting investigations into small-area spatial distribution patterns. This study employed both global and local spatial autocorrelation measures to analyze CVD prevalence across Kenyan counties. The Global Moran’s I statistic was used to assess the overall degree of spatial clustering, while the Local Moran’s I identified significant clusters of high and low prevalence, alongside spatial outliers. Additionally, the Getis-Ord Gi* statistic was applied to detect statistically significant hotspots and coldspots, revealing important spatial patterns in disease prevalence. Spatial regression models were compared using the Lagrange Multiplier (LM) test, Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC) for model selection. The Spatial Lag Model (SLM) demonstrated superior performance over the Spatial Error Model (SEM) and the Spatial Durbin Model (SDM), achieving a Rao’s score (RSlag) of 16.449 and an adjusted score (adjRSlag) of 12.181, both statistically significant at the 5% level. The SLM also recorded the lowest AIC and BIC values at -380.09 and -361.80, respectively, confirming its suitability in capturing spatial dependence in the data. The findings revealed significant spatial clustering of CVD prevalence, with distinct high-risk and low-risk regions across the country. 
High body mass index (HBMI), tobacco use, and poor dietary habits emerged as major risk factors driving CVD prevalence, while urbanization and economic development were associated with lower disease burdens. The study highlights the importance of incorporating spatial analysis in public health planning to inform targeted interventions, optimize resource allocation, and enhance community health education campaigns aimed at promoting heart-healthy lifestyles.
VL  - 15
IS  - 2
ER  -

Author Information

Grace Wanjiku Mwangi

Department of Statistics and Actuarial Science, Jomo Kenyatta University of Agriculture and Technology (JKUAT), Nairobi, Kenya

http://orcid.org/0009-0005-6716-0353
Anthony Kibira Wanjoya

Department of Statistics and Actuarial Science, Jomo Kenyatta University of Agriculture and Technology (JKUAT), Nairobi, Kenya

http://orcid.org/0009-0009-7609-4318
