Publication Tables
- table1_descriptive_stats.csv: Table 1. Descriptive statistics for all variables (mean, SD, range, coverage)
- table2_evidence_hierarchy.csv: Table 2. Evidence hierarchy across all analytical methods
- table3_bym2_robustness.csv: Table 3. BYM2 robustness across model specifications (v1, v2A, v2B)
- table4_compound_bym2.csv: Table 4. Compound-specific BYM2 results (12 compounds × 2 cancers)
- table5_long_difference.csv: Table 5. Long-difference results (Δpesticide 1997→2012)
- table6_gauntlet_summary.csv: Table 6. Cross-gauntlet summary (4 risk factors, IARC scores, pesticide survival)
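As a rough illustration of what a long-difference estimate (Table 5) involves, the sketch below regresses the change in an outcome on the change in exposure between two periods, so that time-invariant county characteristics cancel out. The function name, variable names, outcome periods, and synthetic data are assumptions made for illustration; they are not taken from the project's source modules.

```python
import numpy as np


def long_difference_slope(x_start, x_end, y_start, y_end):
    """OLS slope of the change in outcome on the change in exposure."""
    dx = np.asarray(x_end, dtype=float) - np.asarray(x_start, dtype=float)
    dy = np.asarray(y_end, dtype=float) - np.asarray(y_start, dtype=float)
    # The intercept absorbs any trend common to all units between the two periods.
    design = np.column_stack([np.ones_like(dx), dx])
    coef, *_ = np.linalg.lstsq(design, dy, rcond=None)
    return coef[1]


# Synthetic data (hypothetical): 500 counties with a fixed county effect.
rng = np.random.default_rng(0)
n = 500
county_effect = rng.normal(0.0, 5.0, n)       # time-invariant, differences out
exposure_1997 = rng.gamma(2.0, 1.0, n)
exposure_2012 = exposure_1997 + rng.normal(0.5, 1.0, n)
true_slope = 0.3
outcome_early = county_effect + true_slope * exposure_1997 + rng.normal(0, 0.2, n)
outcome_late = county_effect + 1.0 + true_slope * exposure_2012 + rng.normal(0, 0.2, n)

slope = long_difference_slope(exposure_1997, exposure_2012, outcome_early, outcome_late)
print(f"long-difference slope: {slope:.3f}")
```

Because the county effect appears in both periods, it drops out of the differenced regression, which is the main appeal of the design.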
Risk Factor Gauntlet Results
- smoking_bym2_results.csv: Smoking gauntlet: BYM2 results for 12 cancer types
- smoking_long_diff_results.csv: Smoking gauntlet: long-difference results
- smoking_iv_results.csv: Smoking gauntlet: IV/2SLS results
- smoking_vs_pesticide_comparison.csv: Smoking vs pesticide head-to-head comparison
- obesity_bym2_results.csv: Obesity gauntlet: BYM2 results for 12 cancer types
- obesity_long_diff_results.csv: Obesity gauntlet: long-difference results
- obesity_iv_results.csv: Obesity gauntlet: IV/2SLS results
- obesity_vs_pesticide_comparison.csv: Obesity vs pesticide head-to-head comparison
- alcohol_bym2_results.csv: Alcohol gauntlet: BYM2 results for 12 cancer types
- alcohol_long_diff_results.csv: Alcohol gauntlet: long-difference results
- alcohol_iv_results.csv: Alcohol gauntlet: IV/2SLS results
- alcohol_vs_pesticide_comparison.csv: Alcohol vs pesticide head-to-head comparison
- inactivity_bym2_results.csv: Physical inactivity gauntlet: BYM2 results for 12 cancer types
- inactivity_long_diff_results.csv: Physical inactivity gauntlet: long-difference results
- inactivity_iv_results.csv: Physical inactivity gauntlet: IV/2SLS results
- inactivity_vs_pesticide_comparison.csv: Physical inactivity vs pesticide head-to-head comparison
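For readers unfamiliar with the IV/2SLS files above, here is a minimal sketch of two-stage least squares with a single instrument, written with plain NumPy. The function names and the synthetic data are assumptions for illustration only; the instruments and specifications actually used in the gauntlets are not described in this listing. The second-stage standard errors from this manual approach are not valid, so a dedicated estimator (for example `IV2SLS` in the listed linearmodels dependency) is the better choice for inference.

```python
import numpy as np


def ols(design, y):
    """Least-squares coefficients for y ~ design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def two_stage_least_squares(y, x, z):
    """2SLS point estimate of the effect of x on y, using z as the instrument."""
    y, x, z = (np.asarray(a, dtype=float) for a in (y, x, z))
    ones = np.ones_like(z)
    instruments = np.column_stack([ones, z])
    # Stage 1: project the endogenous regressor onto the instrument.
    x_hat = instruments @ ols(instruments, x)
    # Stage 2: regress the outcome on the fitted values from stage 1.
    return ols(np.column_stack([ones, x_hat]), y)[1]


# Synthetic data (hypothetical): u confounds x and y, z affects y only through x.
rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = z + u + rng.normal(size=n)
y = 2.0 * x + 3.0 * u + rng.normal(size=n)

naive = ols(np.column_stack([np.ones(n), x]), y)[1]
iv = two_stage_least_squares(y, x, z)
print(f"naive OLS slope: {naive:.2f}, 2SLS slope: {iv:.2f} (true effect: 2.0)")
```

In this toy setup the naive OLS slope is pulled away from the true effect by the confounder, while the 2SLS estimate stays close to it.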
Exploratory Screening
- exploratory_synthesis.csv: Full exploratory synthesis: all predictors × 26 cancers (Spearman, partial, LASSO, OLS)
- top_correlates_by_cancer.csv: Top predictor correlates for each of 26 cancer types
Core Analysis Tables
- spearman_correlation_matrix.csv: Full pairwise Spearman rank correlation matrix across all variables
- spearman_pvalue_matrix.csv: P-value matrix for all pairwise Spearman correlations
- top_correlations.csv: Top predictor–cancer correlations ranked by absolute magnitude
- partial_correlations_cancer.csv: Partial correlations controlling for 9 confounders
- vif_scores.csv: Variance Inflation Factors for multicollinearity assessment
- morans_i_results.csv: Global Moran’s I spatial autocorrelation statistics for each variable
- data_coverage_report.csv: Variable-level coverage report: non-null counts, missing percentages
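The paired correlation and p-value matrices listed above can be produced along the following lines. This is a sketch under assumptions: the helper name `spearman_matrices`, the pairwise-deletion handling of missing values, and the toy columns are illustrative and may differ from what `src/analysis.py` does.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr


def spearman_matrices(df):
    """Return (rho, pval) DataFrames of pairwise Spearman correlations."""
    cols = list(df.columns)
    k = len(cols)
    rho = pd.DataFrame(np.eye(k), index=cols, columns=cols)
    pval = pd.DataFrame(np.zeros((k, k)), index=cols, columns=cols)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            # Pairwise deletion: keep rows where both variables are observed.
            pair = df[[a, b]].dropna()
            r, p = spearmanr(pair[a], pair[b])
            rho.loc[a, b] = rho.loc[b, a] = r
            pval.loc[a, b] = pval.loc[b, a] = p
    return rho, pval


# Toy data (hypothetical column names) with a few missing values.
rng = np.random.default_rng(2)
a = rng.normal(size=300)
toy = pd.DataFrame({
    "a": a,
    "b": a + 0.3 * rng.normal(size=300),   # strongly related to "a"
    "c": rng.normal(size=300),             # unrelated noise
})
toy.loc[:9, "b"] = np.nan

rho, pval = spearman_matrices(toy)
print(rho.round(2))
```

With many variables, the p-value matrix involves a large number of simultaneous tests, so some multiple-comparison adjustment is usually worth considering before reading individual cells.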
Bayesian & Compound-Specific Results
- all_cancer_type_models.csv: BYM2 results for all 8 cancer types: rate ratios, HDIs, rho, diagnostics
- all_compound_bym2_results.csv: Compound-specific BYM2: 12 compounds × 2 cancers = 24 models
- expanded_compound_bym2_results.csv: Expanded compound BYM2: 6 additional compounds (3 herbicides + 3 fungicides)
- exposure_pathway_results.csv: Urban/rural stratified BYM2, interaction models, crop sensitivity
Robust Analysis
- spatial_models_all_cancers.csv: OLS, Spatial Lag, and Spatial Error model comparisons across cancer types
- negative_control_tests.csv: Negative control results: lifestyle, random, and unrelated exposure tests
Sensitivity Analysis
- bootstrap_correlations.csv: Bootstrap confidence intervals (1,000 resamples) for Spearman correlations
- bootstrap_regression.csv: Bootstrap confidence intervals for OLS regression coefficients
- leave_one_state_out.csv: Jackknife estimates excluding each state in turn (50 iterations)
- lag_window_sensitivity.csv: Sensitivity to different exposure–outcome temporal lag windows
- suppression_sensitivity.csv: Sensitivity to varying county-inclusion thresholds (NCI suppression)
- outlier_cooks_comparison.csv: Outlier sensitivity: Cook’s distance-based trimming
- outlier_iqr_comparison.csv: Outlier sensitivity: IQR-based trimming
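A bootstrap interval for a Spearman correlation, as in the first file above, can be sketched as follows. The percentile method, the function name, the fixed seed, and the synthetic data are assumptions for illustration; only the figure of 1,000 resamples comes from the listing.

```python
import numpy as np
from scipy.stats import spearmanr


def bootstrap_spearman_ci(x, y, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for Spearman's rho."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = np.empty(n_resamples)
    for b in range(n_resamples):
        # Resample observation pairs with replacement, keeping x and y aligned.
        idx = rng.integers(0, n, size=n)
        estimates[b], _ = spearmanr(x[idx], y[idx])
    lo, hi = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return lo, hi


# Synthetic, strongly correlated data (hypothetical).
rng = np.random.default_rng(3)
x = rng.normal(size=200)
y = x + 0.5 * rng.normal(size=200)

lo, hi = bootstrap_spearman_ci(x, y)
print(f"95% bootstrap CI for Spearman's rho: [{lo:.3f}, {hi:.3f}]")
```

Resampling individual observations treats them as independent. For spatially clustered county data, resampling whole groups (for example states) may be more appropriate.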
Synthesis & Summary
- methods_comparison.csv: All analytical methods compared: effect estimates, spatial controls, interpretations
- confounding_robustness.csv: Confounding robustness: RR stability across 5 model specifications
Reproducibility
The complete analysis pipeline consists of 32 Jupyter notebooks and 5 Python source modules:
- src/data_collection.py: Data download functions (~3,400 lines)
- src/cleaning.py: FIPS standardization, merge logic (~1,300 lines)
- src/analysis.py: All statistical methods (~2,500 lines)
- src/visualization.py: Plotly maps, forest plots, figures (~1,700 lines)
- src/sensitivity.py: Bootstrap, jackknife, robustness tests (~540 lines)
Key Dependencies
Python 3.10+, pandas, numpy, scipy, statsmodels, scikit-learn, plotly, geopandas, libpysal, spreg,
mgwr (≥2.1), linearmodels (≥5.0), pymc (≥5.0), arviz (≥0.17), nutpie (≥0.12).
Install via pip install -r requirements.txt.
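After installing, the minimum versions stated above can be checked with a short script like the one below. This is a convenience sketch, not part of the project: the helper names are hypothetical, and the version parser is deliberately simplified (it reads only the leading numeric components). The `packaging` library offers a more robust comparison if it is available.

```python
from importlib import metadata

# Minimum versions as stated in the dependency list above.
MINIMUM_VERSIONS = {
    "mgwr": "2.1",
    "linearmodels": "5.0",
    "pymc": "5.0",
    "arviz": "0.17",
    "nutpie": "0.12",
}


def parse_version(text):
    """Leading numeric components only, e.g. '5.10.1rc1' -> (5, 10, 1)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_minimums(minimums):
    """Map each package name to 'ok', 'missing', or a 'too old' message."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        have, need = parse_version(installed), parse_version(minimum)
        # Pad with zeros so that '5' and '5.0' compare as equal.
        width = max(len(have), len(need))
        have += (0,) * (width - len(have))
        need += (0,) * (width - len(need))
        report[name] = "ok" if have >= need else f"too old ({installed} < {minimum})"
    return report


report = check_minimums(MINIMUM_VERSIONS)
for name, status in report.items():
    print(f"{name}: {status}")
```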