Strengths
- Near-complete county coverage: 3,248 counties spanning diverse geography, demographics, and agricultural practices
- Methodological convergence: 15+ analytical methods from bivariate through Bayesian spatial, with kidney and colorectal surviving the most stringent
- Positive control validation: The PM2.5–lung cancer relationship is recovered in the expected direction, at a magnitude somewhat above the published range (17.9% vs meta-analytic 10–15%), which supports pipeline validity
- Compound specificity: Herbicide-specific signal (5/6 sig for colorectal) vs null insecticides (0/6) and fungicides (0/3 for colorectal) argues against generic agricultural confounding
- Multiple negative controls: Livestock density (6/6 NS), diabetes (NS), and fungicides→colorectal (0/3 significant) argue against alternative explanations
- Temporal evidence: Long-difference estimator (1997→2012) shows within-county changes predict cancer rate changes
- Confounding robustness: RR stable or strengthening across 5 model specifications adding food, alcohol, nitrate, livestock, and diabetes
- Risk factor gauntlets: Pesticide associations persist when each of 4 established risk factors is added as a covariate (smoking 7/8, obesity 4/8, alcohol 2/7, inactivity 2/7), which argues against these risk pathways as the main confounders
- Exploratory confirmation: Hypothesis-free screening of all predictors across 26 cancers independently identifies pesticide as a top correlate for kidney and colorectal
- Panel trend evidence: CDC EPHT panel (2001–2020) shows kidney cancer rising in the regions where herbicide application is highest
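The long-difference idea listed above can be sketched in a few lines. This is only an illustration under stated assumptions, not the study's pipeline: the toy county values, the tuple layout, and the helper `ols_slope` are all hypothetical. The point it shows is that regressing the change in cancer rate on the change in pesticide density uses only within-county variation, so any county characteristic that is constant over time drops out of the comparison.

```python
def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical toy data, one tuple per county:
# (pesticide density 1997, pesticide density 2012, early cancer rate, late cancer rate)
counties = [
    (10.0, 14.0, 15.0, 16.1),
    (20.0, 21.0, 16.0, 16.4),
    (5.0, 12.0, 14.0, 15.9),
    (30.0, 28.0, 18.0, 17.7),
    (12.0, 12.5, 15.5, 15.8),
]

# First-difference both variables within each county.
delta_pesticide = [late - early for early, late, _, _ in counties]
delta_rate = [late - early for _, _, early, late in counties]

# Long-difference estimate: change in rate per unit change in pesticide density.
long_diff_slope = ols_slope(delta_pesticide, delta_rate)
print(f"long-difference slope: {long_diff_slope:.3f}")
```

A positive slope here only means that, in the toy data, counties with larger increases in application also had larger increases in rates. Time-varying confounders are not removed by differencing.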
Limitations
Ecological Fallacy
This is a county-level analysis. We observe that counties with higher pesticide application have higher cancer rates, but we cannot determine whether the individuals who develop cancer are the ones exposed to pesticides. The association could be driven by unmeasured county-level confounders that correlate with both pesticide use and cancer incidence. Individual-level cohort studies (e.g., the Agricultural Health Study) are needed to establish personal exposure–outcome relationships.
Cross-Sectional Cancer Data
Cancer incidence rates reflect a 5-year window (2016–2020), while pesticide application is measured primarily in 2019. Given cancer latency periods of 10–30 years, current pesticide levels serve as a proxy for historical exposure patterns under the assumption that county-level agricultural intensity is relatively stable over decades. The long-difference analysis (1997→2012) partially addresses this by examining temporal variation.
Exposure Misclassification
USGS PNSP estimates are modeled from USDA survey data and EPA registration data, not direct measurements. County-level application density (kg/mi²) is a crude proxy for actual human exposure, which depends on proximity to fields, water contamination, food residues, drift patterns, and individual behavior. This measurement error likely attenuates effect estimates toward the null (consistent with IV > OLS).
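The attenuation argument can be illustrated with a small simulation. This is a sketch under an explicit assumption, namely classical measurement error (independent noise added to the true exposure); modeled exposure estimates may instead carry other error structures, for which attenuation is not guaranteed. The variable names, the effect size `true_beta`, and the noise variances are hypothetical choices for illustration.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


random.seed(0)
n = 20000
true_beta = 0.5  # hypothetical true effect of exposure on the outcome

# True exposure (unobserved in practice) and an outcome that depends on it.
true_exposure = [random.gauss(0, 1) for _ in range(n)]
outcome = [true_beta * x + random.gauss(0, 1) for x in true_exposure]

# Observed proxy = true exposure + independent noise of equal variance.
proxy_exposure = [x + random.gauss(0, 1) for x in true_exposure]

slope_true = ols_slope(true_exposure, outcome)
slope_proxy = ols_slope(proxy_exposure, outcome)

# Reliability ratio = var(true) / (var(true) + var(noise)) = 1 / (1 + 1).
reliability = 1.0 / (1.0 + 1.0)
print(f"slope using true exposure:  {slope_true:.3f}")
print(f"slope using noisy proxy:    {slope_proxy:.3f}")
print(f"expected attenuated slope:  {true_beta * reliability:.3f}")
```

With equal signal and noise variance, the slope estimated from the proxy is expected to shrink to roughly half the true slope, which is the sense in which a crude exposure proxy biases estimates toward the null.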
Spatial Confounding
BYM2 models show very high spatial mixing parameters (ρ = 0.93–0.99), meaning that nearly all of the random-effect variance is spatially structured rather than unstructured county-level noise. This is both a strength (aggressive control for unmeasured spatial confounders) and a concern: the spatial random effects may absorb some true pesticide signal along with confounding, making surviving fixed effects conservative but potentially over-attenuated.
The fact that most cancer types (6 of 8) show null pesticide effects after BYM2 spatial smoothing demonstrates how aggressively the spatial RE absorbs cross-sectional signal. The kidney and colorectal associations that survive are therefore noteworthy, but the high ρ values mean we cannot rule out that even these surviving signals reflect residual spatial confounding from unmeasured factors.
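For reference, the role of ρ can be read off the standard BYM2 parameterization, sketched below. The Poisson likelihood, the expected counts $E_i$, and the single standardized covariate $x_i$ are assumptions made for illustration; the exact specification used in this study may differ.

```latex
\begin{aligned}
y_i &\sim \mathrm{Poisson}(E_i\,\theta_i) \\
\log \theta_i &= \beta_0 + \beta_1 x_i + b_i \\
b_i &= \sigma \left( \sqrt{1-\rho}\; v_i + \sqrt{\rho}\; u_i^{*} \right),
\qquad v_i \sim \mathcal{N}(0, 1)
\end{aligned}
```

Here $u^{*}$ is the ICAR spatial component scaled so that its typical marginal variance is about 1, $v_i$ is unstructured noise, and $\sigma$ is the overall standard deviation of the random effect. Under this form, ρ is the share of random-effect variance attributed to the spatially structured component, and the rate ratio per SD of exposure is $\exp(\beta_1)$.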
The Spatial Confounding Story
When we fit OLS or LASSO, pesticide effects look clear and consistent. As we add spatial controls (Spatial Error, GWR, BYM2), effects attenuate progressively. This is expected: spatially correlated unmeasured confounders are absorbed by spatial random effects. The question is whether BYM2 over-absorbs by removing true causal spatial signal along with confounding. The IV and long-difference results, which address endogeneity through different mechanisms, provide complementary evidence that supports a real (if modest) effect.
Modest Effect Sizes
The surviving associations are small: RR = 1.025–1.034 for kidney, RR = 1.015–1.018 for colorectal, per standard deviation of pesticide density. These are rate ratios per SD, not per absolute unit, and represent the county-level ecological association after spatial smoothing. For context:
- Smoking→all-site cancer: RR = 1.044 (in the same BYM2 framework)
- Obesity→all-site cancer: RR = 1.008
- The pesticide–kidney effect (RR = 1.025–1.034) is between these two
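To make the per-SD scale concrete, the following sketch converts the rate ratios quoted above into percent differences and rescales them to a larger exposure contrast. The helper names are hypothetical, and the rescaling assumes the log-linear form of the model (effects multiply across SDs), which is an assumption about how the reported RRs were estimated.

```python
import math


def rr_to_percent(rr):
    """Percent difference in rate implied by a rate ratio."""
    return (rr - 1.0) * 100.0


def rr_to_log_coef(rr):
    """Log-scale regression coefficient corresponding to a rate ratio."""
    return math.log(rr)


def scale_rr(rr_per_sd, n_sd):
    """Rate ratio for an n_sd-SD contrast, assuming a log-linear effect."""
    return rr_per_sd ** n_sd


# Rate ratios per SD as quoted in the text.
reported = {
    "kidney (low)": 1.025,
    "kidney (high)": 1.034,
    "colorectal (low)": 1.015,
    "colorectal (high)": 1.018,
    "smoking, all-site": 1.044,
    "obesity, all-site": 1.008,
}

for label, rr in reported.items():
    print(
        f"{label:20s} RR={rr:.3f}  "
        f"per SD: {rr_to_percent(rr):.1f}%  "
        f"per 2 SD: {rr_to_percent(scale_rr(rr, 2)):.1f}%"
    )
```

Read this way, the kidney estimate corresponds to a county-level rate roughly 2.5–3.4% higher per SD of pesticide density. It does not translate into an individual-level risk.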
Additional Limitations
- No individual exposure data: No biomarker, residential, or dietary exposure measures
- Single time point: Cancer and pesticide data are not perfectly temporally aligned
- BYM2 connectivity: ICAR requires a connected adjacency graph; rural and small counties are dropped after connected-component filtering
- Suppressed counties: NCI suppresses rates for small-count counties, creating non-random missingness
- No dose-response threshold: We model continuous linear effects; threshold or non-linear relationships may exist
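The connectivity limitation above can be illustrated with a small graph example. This is a sketch, not the study's code: the county labels and adjacency structure are hypothetical, and keeping only the largest connected component is an assumed filtering rule.

```python
from collections import deque


def connected_components(adjacency):
    """Return a list of sets, one per connected component of the graph."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search from an unvisited county.
        queue = deque([start])
        seen.add(start)
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


# Hypothetical county adjacency (symmetric neighbor sets).
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
    "E": set(),   # isolated county with no retained neighbors
    "F": {"G"},   # small two-county cluster
    "G": {"F"},
}

components = connected_components(adjacency)
largest = max(components, key=len)
dropped = set(adjacency) - largest

print("kept:   ", sorted(largest))
print("dropped:", sorted(dropped))
```

Counties whose neighbors are missing (for example, because of suppressed rates) can end up isolated or in small clusters, so this filtering step compounds the non-random missingness noted in the list above.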
The IV Paradox
The IV/2SLS and BYM2 results appear to tell different stories at first glance. IV shows a significant all-site cancer effect (using crop acreage as instrument), while BYM2 shows all-site as null but kidney and colorectal as significant.
These are actually complementary:
- IV addresses endogeneity (measurement error, reverse causation) but does not control for spatial autocorrelation
- BYM2 addresses spatial confounding (unmeasured spatially correlated factors) but assumes correct model specification
- The IV result (IV > OLS) is consistent with OLS being attenuated by measurement error, which supports the assumed direction of bias
- The BYM2 results identify which cancer types survive aggressive spatial control
- Together, they are consistent with a real but modest effect that is detectable for kidney and colorectal cancer specifically
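The IV > OLS pattern described above can be reproduced in a toy simulation. This is a sketch under strong assumptions, not the study's model: the instrument is valid by construction, there is no confounding, and the only problem with OLS is classical measurement error in the exposure. All variable names and parameter values are hypothetical. With a single instrument, 2SLS reduces to the ratio of covariances used below.

```python
import random


def covariance(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


random.seed(1)
n = 20000
true_beta = 0.5  # hypothetical true effect

# Instrument (standing in for crop acreage): drives exposure,
# and affects the outcome only through exposure.
instrument = [random.gauss(0, 1) for _ in range(n)]
true_exposure = [z + random.gauss(0, 1) for z in instrument]
outcome = [true_beta * x + random.gauss(0, 1) for x in true_exposure]

# Observed exposure is a noisy proxy of the true exposure.
observed_exposure = [x + random.gauss(0, 1) for x in true_exposure]

# OLS on the noisy proxy: attenuated toward zero.
ols_estimate = covariance(observed_exposure, outcome) / covariance(
    observed_exposure, observed_exposure
)

# IV (single-instrument 2SLS): cov(z, y) / cov(z, x).
iv_estimate = covariance(instrument, outcome) / covariance(
    instrument, observed_exposure
)

print(f"OLS estimate: {ols_estimate:.3f}")
print(f"IV estimate:  {iv_estimate:.3f}")
```

In this setup the IV estimate lands near the true effect while OLS is pulled toward zero. In real data, IV > OLS can also arise from a weak or invalid instrument, so the pattern is supportive rather than conclusive.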
Overall Evidence Assessment
The evidence for a pesticide–kidney cancer association is the strongest in this study. It is supported by: (1) BYM2 with posterior P(effect > 0) = 0.997 across all model specifications, (2) long-difference with p = 0.003, (3) compound specificity showing glyphosate (RR = 1.027) and dicamba (RR = 1.026) as strongest signals, (4) null livestock and diabetes negative controls, (5) robustness to nitrate confounding, and (6) survival across all four risk factor gauntlets (smoking, obesity, alcohol, inactivity).
The colorectal evidence is also robust but with a smaller effect size. The herbicide specificity for colorectal (5/6 sig, 0/3 insecticides, 0/3 fungicides) is the cleanest chemical-class pattern in the study. The alcohol gauntlet is particularly informative: alcohol is null for colorectal in BYM2 while pesticide remains significant.
The exploratory screening independently confirms these findings: among all available predictors tested against all 26 cancer types in a hypothesis-free framework, pesticide density emerges as a top correlate specifically for kidney and colorectal cancer.
However, this remains an ecological study with modest effect sizes and high spatial mixing parameters. It should be interpreted as hypothesis-generating evidence that motivates individual-level investigation, not as proof of causation.
What Would Strengthen the Evidence?
The evidence would be strengthened by individual-level prospective cohort studies with measured pesticide exposure (biomarkers or residential proximity), dose-response analysis with continuous exposure gradients, replication in non-US agricultural settings, and mechanistic studies linking specific herbicides (glyphosate, atrazine, 2,4-D) to kidney and colorectal carcinogenesis pathways.
Citation
If referencing this work, please cite as:
County-Level Pesticide Exposure and Cancer Incidence in the United States: An Ecological Analysis Using Bayesian Spatial Models. 2024–2026. Available at this URL.
Contact
Questions, corrections, or collaboration inquiries are welcome. This project was developed as an independent research effort combining publicly available data from eleven federal data sources.