Limitations — Field Risk Atlas
This document is the long-form version of the README disclaimer. Every limitation here is a load-bearing constraint: ignoring any of them leads to material misuse of the tool's output.
1. What this tool is not
Not a real estate appraisal
The risk score is a screening artifact. It is not a substitute for a state-certified or state-licensed appraiser per FCA Regulation 12 CFR 614.4260. It was not produced under USPAP standards. Any use that requires an appraiser's signature is out of scope.
Not a private-data tool
Every input is public. The repo will not ingest, reference, or commit any proprietary data, internal organizational records, or commercially licensed sources. That excludes parcel aggregators such as ParcelQuest, Regrid, and Acres GIS, which are incompatible with the repo's open-source posture. If you build derivative work that pairs this tool with private data, that derivative work needs its own disclosure controls.
2. Snapshot tool, not a live system
The model produces a snapshot keyed on the gsp_status_as_of field per parcel. v1's snapshot is May 2026; the regulatory state encoded in the GSP status crosswalk is:
- Tule, Tulare Lake → Probationary
- Kaweah, Kern, Chowchilla → Returned to DWR
- Pleasant Valley → Inadequate
Refresh cadence is documented in methodology §3. A scoring run with stale inputs is fine for backtesting but should not be presented as the current state.
The USDM column (usdm_d2_weeks_52w) was zero across all 6 v1 counties in the May-2026 snapshot, because California was in a wet period after the 2020–22 drought. That is accurate for the snapshot date, but it makes the USDM component (weight 5) non-discriminating in v1. Conditions will shift as the snapshot date moves forward, so the v1 values should not be carried into any forward-looking presentation.
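A minimal sketch of why an all-zero column cannot discriminate, assuming the score is a plain weighted sum. Only the USDM weight of 5 comes from this document; the other component names, their weights, the parcel values, and the helper `weighted_score` are hypothetical.

```python
# Hypothetical weights and component names; only the USDM weight (5)
# is taken from this document.
WEIGHTS = {"gsp_status": 30, "water_tier": 25, "usdm": 5}


def weighted_score(components, weights):
    """Weighted sum over the components named in `weights`.

    Component values are assumed to be normalized to [0, 1].
    """
    return sum(weight * components[name] for name, weight in weights.items())


# Two made-up parcels; usdm is 0.0 for both, as in the May-2026 snapshot.
parcels = {
    "parcel_a": {"gsp_status": 1.0, "water_tier": 0.5, "usdm": 0.0},
    "parcel_b": {"gsp_status": 0.4, "water_tier": 0.8, "usdm": 0.0},
}

full = {pid: weighted_score(c, WEIGHTS) for pid, c in parcels.items()}

weights_without_usdm = {k: v for k, v in WEIGHTS.items() if k != "usdm"}
reduced = {pid: weighted_score(c, weights_without_usdm) for pid, c in parcels.items()}

# Dropping the all-zero component changes no score, so it cannot change a ranking.
print(full == reduced)
```

Under these assumptions the two score tables are identical, which is what "non-discriminating" means here: the component is present in the formula but adds no information until the column takes non-zero values.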
3. Data quality caveats
Per-county parcel data heterogeneity
The 6 counties use 6 different REST endpoints with 6 different APN field conventions. Notable wrinkles:
- Tulare's APN field is integer-encoded (drops leading zeros); PARCELID is used instead.
- Madera's GIS server has a broken intermediate cert chain; ingest tolerates it for that one source.
- Kern's parcel data carries an attribution disclaimer (Kern County Assessor's Office, Mapping Section; Kern Council of Governments; MCAG; City of Bakersfield; City of Shafter). Raw Kern parcel data is never committed to the repo.
- 109 parcels (~0.04%) have collisions on parcel_id in Fresno + Kings: different physical land sharing an APN string. The overlay collapses to the largest geometry per parcel_id.
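The collapse rule for colliding identifiers can be sketched in plain Python. This is an illustration only, not the repo's overlay code: the record layout, the `area_m2` field, the sample identifiers, and the function name `collapse_to_largest` are all hypothetical.

```python
def collapse_to_largest(records):
    """Keep one record per parcel_id: the one with the largest area.

    Each record is a dict with hypothetical keys "parcel_id" and "area_m2".
    """
    largest = {}
    for rec in records:
        kept = largest.get(rec["parcel_id"])
        if kept is None or rec["area_m2"] > kept["area_m2"]:
            largest[rec["parcel_id"]] = rec
    return list(largest.values())


# Made-up records: two distinct geometries share one APN string.
records = [
    {"parcel_id": "001-010-05", "area_m2": 40_000.0},
    {"parcel_id": "001-010-05", "area_m2": 152_000.0},
    {"parcel_id": "002-020-11", "area_m2": 8_000.0},
]

collapsed = collapse_to_largest(records)
for rec in collapsed:
    print(rec["parcel_id"], rec["area_m2"])
```

The consequence worth noting is that the smaller colliding geometry is dropped from the overlay entirely, so its land carries no score of its own.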
DWR i07 well-completion reports have known issues
DWR's own metadata documents missing/duplicate records and incorrect values. Aggregation to PLSS section median is robust to outliers; absolute well counts are not. The dominant_well_depth_ft column is a median over the parcel's PLSS section, not a parcel-specific drilling depth.
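A small illustration of that contrast, using made-up depths for one PLSS section. The values and the helper `section_summary` are hypothetical; only the choice of a per-section median comes from this document.

```python
from statistics import median


def section_summary(depths_ft):
    """Return (well_count, median_depth_ft) for one PLSS section."""
    return len(depths_ft), median(depths_ft)


clean = [310, 325, 340, 355, 360]   # made-up completed-well depths, in feet
with_duplicate = clean + [340]      # one record entered twice
with_bad_value = clean + [9999]     # one incorrect depth value

for label, depths in (
    ("clean", clean),
    ("duplicate", with_duplicate),
    ("bad value", with_bad_value),
):
    count, mid = section_summary(depths)
    print(f"{label}: count={count}, median={mid}")
```

In this example each bad record inflates the well count by one, while the median stays at or near its clean value: it is unchanged by the duplicate and moves by only a few feet with the 9999 ft entry.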
PLSS rancho-grant areas have no T/R/S
~6.5% of v1-bbox PLSS polygons (concentrated in Sonoma) are Mexican land-grant ranchos that predate the PLSS and so carry no township/range/section (T/R/S) designation. Parcels falling on these polygons have no well-depth statistics; that is correct behavior, not a bug. ~28% of all parcels are missing well-depth coverage as a result.
Land IQ vs CDL provenance
Land IQ accounts for 66.7% of ag-parcel crop classifications; CDL fills the rest (mostly with Grassland/Pasture → annual_low). Land IQ is the higher-fidelity source; CDL is a 30m-grid centroid-sample fallback. The crop_source audit column records which source classified each parcel.
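The source-priority rule can be sketched as a simple fallback. This is a hedged illustration: the function name `classify_crop`, the `crop_source` values "land_iq" and "cdl", the class name "permanent_high", and the behavior for unmapped CDL labels are assumptions. Only the Grassland/Pasture → annual_low mapping and the crop_source column name appear in this document.

```python
# Only this one mapping is taken from the document; a real crosswalk has more rows.
CDL_TO_CLASS = {"Grassland/Pasture": "annual_low"}


def classify_crop(land_iq_class, cdl_label):
    """Prefer Land IQ; fall back to the CDL centroid sample when it is missing."""
    if land_iq_class is not None:
        return {"crop_class": land_iq_class, "crop_source": "land_iq"}
    # Assumption: an unmapped CDL label yields no class rather than a guess.
    return {"crop_class": CDL_TO_CLASS.get(cdl_label), "crop_source": "cdl"}


print(classify_crop("permanent_high", "Grassland/Pasture"))  # Land IQ wins
print(classify_crop(None, "Grassland/Pasture"))              # CDL fallback
```

Recording the source alongside the class lets a reviewer filter to Land IQ-classified parcels when the lower-fidelity CDL fallback is not acceptable for a given analysis.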
Water tier crosswalk is partially filled
Of 505 districts in the v1 footprint, 49 have explicit ASFMRA Tier 1–4 assignments (covering 95% of ag-acres) and 175 are auto-marked n/a (mutual water companies, mobile-home parks, urban M&I, etc.). 281 long-tail districts remain blank, defaulting to white-area; collectively they cover 3.1% of ag-acres.
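The counts above reconcile (49 + 175 + 281 = 505), and the blank-to-white-area default can be sketched as a lookup. The district names, the tier labels, and the function `resolve_tier` are hypothetical. Treating a district that is absent from the crosswalk the same as a blank one is also an assumption.

```python
# District counts from this document, by crosswalk status.
DISTRICT_COUNTS = {"tiered": 49, "n/a": 175, "blank": 281}
assert sum(DISTRICT_COUNTS.values()) == 505

# Hypothetical crosswalk rows; an empty string stands for a blank assignment.
crosswalk = {
    "Example Irrigation District": "tier_2",
    "Example Mobile Home Park": "n/a",
    "Example Long-Tail District": "",
}


def resolve_tier(district, crosswalk):
    """Return the assigned tier, defaulting blank or unknown districts to white-area."""
    tier = crosswalk.get(district, "")
    return tier if tier else "white-area"


for name in crosswalk:
    print(name, "->", resolve_tier(name, crosswalk))
```

The default is worth keeping in mind when reading results: a parcel labeled white-area may sit in a long-tail district that simply has not been reviewed yet.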
4. Methodology limitations
Static weights, no learning
The weights are fixed by judgment in the build plan. They are not calibrated against historical outcomes, and they are not adjusted for cross-component correlation: critical-overdraft basins almost always have High priority, for example, so there is some double-counting. A serious calibration exercise would replace them with regression-fit weights against an outcomes panel.
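The double-counting effect can be made concrete with a toy example. The two weights, the 0/1 flag encoding, and the function `score` are hypothetical and are not the model's actual values.

```python
# Hypothetical weights for two components that tend to move together.
WEIGHTS = {"critical_overdraft": 20, "high_priority": 15}


def score(flags):
    """Weighted sum of 0/1 component flags."""
    return sum(WEIGHTS[name] * flags[name] for name in WEIGHTS)


stressed = {"critical_overdraft": 1, "high_priority": 1}
unstressed = {"critical_overdraft": 0, "high_priority": 0}

# When the flags almost always co-occur, one underlying condition
# earns both weights at once.
gap = score(stressed) - score(unstressed)
print(gap, gap == sum(WEIGHTS.values()))
```

In this sketch a single condition (a stressed basin) moves the score by the sum of both weights, which is the overlap a regression fit against outcomes would be expected to absorb.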
Single time-point GSP-status visualization
The map shows a single (May-2026) snapshot. A 2020 → 2026 time slider would require a manually curated GSP-status timeline (SWRCB order effective dates per basin), which is not in v1. Presenting the map as a "trajectory" visualization would be misleading.
Six counties, not statewide
v1 covers Tulare, Kings, Kern, Madera, Fresno, and Sonoma counties, chosen to pair the core SGMA stress of the San Joaquin Valley with a Sonoma vineyard contrast. The architecture is statewide-capable; extending it requires a per-county parcel ingest module. Inferences about counties not covered (Sacramento Valley, Salinas Valley, etc.) are out of scope.
5. Use boundaries
Appropriate uses:
- Field-level and regional water-risk analysis — which county / basin / crop-class concentrations are most exposed to SGMA stress
- Field prioritization — flag parcels for closer review by a state-licensed appraiser or domain expert
- Methodology research — a transparent and auditable model with documented data sources
- Public discourse on SGMA's water-allocation and basin-stress implications
Inappropriate uses:
- Any decision relying on the score in isolation, without ground-truth context the model doesn't capture
- Any presentation that strips the snapshot date from the score
- Any use that pairs the public score output with private data without separate disclosure controls
6. Where to read the source
- Methodology rationale: methodology
- Source URLs, licenses, vintages: sources
- Code: github.com/mcmillinanalytics/field-risk-atlas
If you find a methodology gap or data-quality issue not documented here, file a GitHub issue against the repo.