Thompson C, Jin A, Luft HS, Lichtensztajn DY, Allen L, Liang SY, Schumacher BT, Gomez SL., Cancer Epidemiol Biomarkers Prev. pii: cebp.0882.2019. doi: 10.1158/1055-9965.EPI-19-0882. [Epub ahead of print], 2020 Feb 17
Su-Ying Liang, Ph.D., Research Economist / Faculty
BACKGROUND: Integrating electronic health records (EHRs) with population-based cancer registry data has tremendous potential value for research. Registries provide diagnosis details, tumor characteristics, and treatment summaries, while EHRs contain rich clinical detail. A carefully conducted cancer registry linkage may also be used to improve the internal and external validity of inferences made from EHR-based studies.
METHODS: We linked the EHRs of a large, multispecialty, mixed-payer healthcare system with the statewide cancer registry and assessed the validity of our linked population. For internal validity, we identified patients who might be "missed" in a linkage, threatening the internal validity of an EHR study population. For generalizability, we compared linked cases with all other cancer patients in the 22-county EHR catchment region.
RESULTS: From an EHR population of 4.5M, we identified 306,554 cancer patients, 26% of the catchment region's cancer patients. Of the linked patients, 22.7% were diagnosed with cancer after they migrated away from our healthcare system, highlighting an advantage of system-wide linkage. We observed demographic differences between EHR patients and non-EHR patients in the surrounding region and demonstrated the use of selection probabilities with model-based standardization to improve generalizability.
CONCLUSIONS: Our experiences lay a foundation to encourage and inform researchers interested in working with EHRs for cancer research, and provide context for leveraging linkages to assess and improve validity and generalizability.
IMPACT: Researchers conducting linkages may benefit from considering one or more of these approaches to establish and evaluate the validity of their EHR-based populations.