H14 | H15 | published | NPD release 2026-04-09

Duplicate detection

Same-NPI-multiple-resource-IDs for Practitioner, and normalized-name-plus-address collapse for Organization.

Headline

Practitioner dedup is clean: 0 excess rows across 7,441,212 NPIs (H14). Organizations, however, multiply: 70.5% of the 1,999,118 unique Org NPIs map to more than one Organization resource (H15b; 1,415,777 excess rows; at most 5 resources per NPI). By normalized (name, state, city), 70.3% of keys repeat (H15). Downstream consumers who assume one Organization resource equals one real-world entity will be wrong for roughly seven in ten Org NPIs.
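The row inflation implied by the headline figures can be worked out directly. The sketch below assumes that "excess rows" means Organization rows beyond the first for each NPI, so that rows carrying an NPI equal unique NPIs plus excess rows; the variable names are illustrative and not taken from the dataset.

```python
# Figures quoted in the headline (H15b).
unique_org_npis = 1_999_118
excess_rows = 1_415_777

# Assumption: excess rows = rows beyond the first per NPI.
rows_with_npi = unique_org_npis + excess_rows

# Ratio of Organization rows (with an NPI) to unique Org NPIs.
inflation = rows_with_npi / unique_org_npis

print(rows_with_npi)        # 3414895
print(round(inflation, 2))  # 1.71
```

Under that assumption, an Organization row count overstates the number of unique Org NPIs by about 1.7×.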

1.2M / 9.2M = 13.35%

Duplicate rate by test (percent):

H14 Practitioner by NPI: 0.000%
H15 Org by name+state+city: 70.3%
H15b Org by NPI: 70.5%

What this means

Everyone using NDH

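COUNT(Organization) overstates the number of unique real-world organizations by roughly 1.7× when measured against unique Org NPIs (1,415,777 excess rows over 1,999,118 unique NPIs). De-duplicate by _npi before treating org counts as unique entities. Practitioner dedup is clean (0 excess rows).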
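A minimal sketch of de-duplicating by _npi before counting. It assumes each Organization row is available as a dict with `_id` and `_npi` fields; the function name, the sample rows, and the choice to skip rows without an NPI are illustrative assumptions, not part of the published method.

```python
from collections import Counter

def npi_dedup_stats(rows):
    """Count unique NPIs and excess rows among rows that carry an NPI."""
    # Rows with a missing or empty _npi are skipped (assumption).
    counts = Counter(r["_npi"] for r in rows if r.get("_npi"))
    total = sum(counts.values())
    unique = len(counts)
    return {
        "rows_with_npi": total,
        "unique_npis": unique,
        "excess_rows": total - unique,  # rows beyond the first per NPI
        "max_copies": max(counts.values(), default=0),
    }

# Hypothetical sample: one NPI appears under three resource IDs.
sample = [
    {"_id": "org-1", "_npi": "1111111111"},
    {"_id": "org-2", "_npi": "1111111111"},
    {"_id": "org-3", "_npi": "1111111111"},
    {"_id": "org-4", "_npi": "2222222222"},
    {"_id": "org-5", "_npi": None},
]
stats = npi_dedup_stats(sample)
print(stats)
```

Reporting `unique_npis` rather than the raw row count gives an entity-level figure; `excess_rows` shows how much the raw count would have overstated it.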

Payer data teams

An org that appears multiple times in NDH under different resource IDs may be legitimate (one FHIR Organization per service location) or a defect (a true duplicate). Either way, your match-to-internal-roster logic needs a normalization pass on NPI or (name, state, city).

Researchers

70.5% of Org NPIs map to multiple Organization resources. Any study that treats the NDH Organization count as a population figure will overstate it by ~1.7× at the entity level.

Null hypothesis

Duplicate rate is below 1% for both Practitioner (by NPI) and Organization (by normalized name + address).
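One way to evaluate this null hypothesis is sketched below. It assumes "duplicate rate" means the share of distinct keys that occur more than once, which matches how the 70.5% and 70.3% figures are phrased; the function names and the treatment of a rate exactly at 1% as a rejection are assumptions.

```python
from collections import Counter

def repeated_key_rate(keys):
    """Share of distinct keys that appear more than once."""
    counts = Counter(keys)
    if not counts:
        return 0.0
    repeated = sum(1 for c in counts.values() if c > 1)
    return repeated / len(counts)

def rejects_null(rate, threshold=0.01):
    """True when the observed rate is not below the 1% threshold."""
    return rate >= threshold

# Hypothetical keys: "a" repeats; "b" and "c" are unique.
rate = repeated_key_rate(["a", "a", "b", "c"])
print(rejects_null(rate))   # True
print(rejects_null(0.0))    # False
```

With the published rates, the Practitioner test (0.000%) would not reject the null under this reading, while both Organization tests (70.3% and 70.5%) would.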

Denominator

All `Practitioner` and `Organization` resources.

Data source

CMS NPD bulk export.

Notes

The BigQuery dataset has primary-key dedup applied at ingest (-4.6M Practitioner, -383K Organization at _id). The figures reported here are residual entity-level duplicates.

H14 key = _npi on Practitioner. Max copies observed for a single Practitioner NPI: 1.

H15 key = (LOWER(name) stripped of LLC/INC/PC/PA/PLLC/CORP/LLP/LTD/CO/COMPANY/THE and non-alphanumerics, UPPER(state), UPPER(TRIM(city))); orgs with a missing name, state, or city are excluded. Max copies for one key: 2206.

H15b (H15-bonus) keys by _npi. Max copies for one Org NPI: 5.

Caveat: some portion of the Organization multiplicity may reflect CMS modeling one FHIR Organization resource per service location rather than true duplication. Either interpretation breaks the common downstream assumption that COUNT(Organization) equals the number of unique organizations.

Fuzzy matching (Jaro-Winkler, suite-unit tolerance) is a v2 enhancement.
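The H15 key can be approximated outside BigQuery as follows. This is a sketch under stated assumptions, not the query used for the published numbers: it assumes the legal-form words are removed as whole tokens rather than as substrings, that spaces count as non-alphanumerics and are dropped, and that blank strings count as missing. The names `SUFFIXES` and `org_key` are hypothetical.

```python
import re

# Words stripped from the name, per the H15 key description.
SUFFIXES = {"llc", "inc", "pc", "pa", "pllc", "corp",
            "llp", "ltd", "co", "company", "the"}

def org_key(name, state, city):
    """Return a (name, state, city) key, or None if any part is missing."""
    if not all(s and s.strip() for s in (name, state, city)):
        return None  # orgs with a missing name, state, or city are excluded
    # Lowercase, split on non-alphanumerics, drop whole-token suffixes.
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    core = "".join(t for t in tokens if t not in SUFFIXES)
    return (core, state.upper(), city.strip().upper())

key_a = org_key("The Acme Clinic, LLC", "tx", " Austin ")
key_b = org_key("ACME CLINIC INC.", "TX", "austin")
print(key_a)           # ('acmeclinic', 'TX', 'AUSTIN')
print(key_a == key_b)  # True
```

Exact-key matching of this kind will not merge spelling variants or suite/unit differences, and a dotted form such as "P.A." splits into two single-letter tokens that this sketch does not strip.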