Breast Cancer Surveillance Consortium: Working together to advance breast cancer research

Algorithms to Identify Second Breast Cancer Events from Electronic Data

Grant number: R21CA143242
PI Name: Chubak J
Title: Algorithms to Identify Second Breast Cancer Events from Electronic Data

U.S. breast cancer survivors number 2.5 million, more than the survivors of any other cancer. Studies on how to improve survival and quality of life in this ever-growing population are critical to reducing the national cancer burden. The ability to identify second breast cancer events (i.e., breast cancer recurrences and second primary breast cancers) is essential for cancer survivorship research. In response to the National Cancer Institute’s call for studies of cancer surveillance using health claims data, we propose to develop and validate algorithms that identify second breast cancer events from automated healthcare utilization data, minimizing the need for expensive and time-consuming manual medical record review. Automated healthcare utilization data are becoming increasingly accessible; however, these sources have not yet been validated against gold-standard medical record abstraction as a way of obtaining information on second breast cancer events. This work is significant and necessary because state tumor registries do not routinely collect information on cancer recurrences.

The proposed study will be conducted using data from two integrated healthcare delivery systems within the Cancer Research Network (CRN): Group Health Cooperative (in western Washington State) and the Henry Ford Health System (in Detroit, Michigan). These healthcare systems have extensive automated data on enrollment, diagnoses, procedures, and prescription medication fills. The proposed study is efficient because it will use gold-standard data on second breast cancer events that have already been abstracted for approximately 2,500 women as part of previously funded studies of breast cancer outcomes. The sample of women will be divided into a training dataset (60%) for algorithm development and a testing dataset (40%) for validation. The primary aim of this study is to develop a “menu” of algorithms from which researchers can choose according to their goals, e.g., whether they want to maximize sensitivity, specificity, or positive predictive value. Secondary analyses will explore:

  1. whether algorithms developed in one population are valid in another, and
  2. whether valid algorithms can be developed using more limited sources of data that are likely to be available in a larger number of healthcare settings.
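The abstract does not describe the algorithms themselves. As a minimal sketch of the validation arithmetic it implies, assuming each candidate algorithm is a simple yes/no rule over flags derived from automated data (the flag names, rules, helper functions, and toy data below are all hypothetical), the following Python computes sensitivity, specificity, and positive predictive value against gold-standard labels, makes a random 60%/40% training/testing split, and picks the menu entry that maximizes a chosen metric on the training set before reporting its validity on the testing set.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical: an "algorithm" is any rule mapping a woman's automated-data
# flags to a yes/no prediction of a second breast cancer event.
Algorithm = Callable[[Dict[str, bool]], bool]


@dataclass
class Patient:
    """Hypothetical record: flags derived from automated healthcare data plus
    the chart-abstracted gold-standard label (True = second breast cancer event)."""
    patient_id: int
    flags: Dict[str, bool]
    gold_standard_event: bool


def split_train_test(patients: List[Patient], train_fraction: float = 0.6,
                     seed: int = 0) -> Tuple[List[Patient], List[Patient]]:
    """Randomly split patients into training (60% by default) and testing sets."""
    shuffled = list(patients)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def validity(algorithm: Algorithm, patients: List[Patient]) -> Dict[str, float]:
    """Compare predictions with the gold standard; NaN when a metric is undefined."""
    tp = fp = tn = fn = 0
    for p in patients:
        predicted = algorithm(p.flags)
        if predicted and p.gold_standard_event:
            tp += 1
        elif predicted:
            fp += 1
        elif p.gold_standard_event:
            fn += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        return num / den if den else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of true events that are flagged
        "specificity": ratio(tn, tn + fp),  # share of non-events that are not flagged
        "ppv": ratio(tp, tp + fp),          # share of flagged women with a true event
    }


def pick_from_menu(menu: Dict[str, Algorithm], patients: List[Patient],
                   target: str) -> Tuple[str, Dict[str, float]]:
    """Return the menu entry with the highest `target` metric (NaN ranks last)."""
    scores = {name: validity(alg, patients)[target] for name, alg in menu.items()}
    best = max(scores, key=lambda n: -math.inf if math.isnan(scores[n]) else scores[n])
    return best, scores


if __name__ == "__main__":
    # Toy synthetic cohort only; flag names are hypothetical placeholders.
    rng = random.Random(1)
    cohort = []
    for i in range(100):
        event = rng.random() < 0.2
        flags = {
            "secondary_malignancy_dx": (event and rng.random() < 0.9) or rng.random() < 0.05,
            "new_chemotherapy_fill": (event and rng.random() < 0.6) or rng.random() < 0.1,
        }
        cohort.append(Patient(i, flags, event))

    train, test = split_train_test(cohort)
    menu: Dict[str, Algorithm] = {
        "dx_or_chemo": lambda f: f["secondary_malignancy_dx"] or f["new_chemotherapy_fill"],
        "dx_and_chemo": lambda f: f["secondary_malignancy_dx"] and f["new_chemotherapy_fill"],
    }
    for target in ("sensitivity", "specificity", "ppv"):
        # Select on the training set, then report validity on the held-out testing set.
        name, _ = pick_from_menu(menu, train, target)
        print(target, name, validity(menu[name], test))
```

Selecting on the training data and reporting on the held-out testing data keeps the selection step from inflating the reported validity; the study's actual development methods are not specified in this abstract.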

This project will use innovative approaches to develop the algorithm “menu” and to explore the generalizability of algorithm development.

Maintained by the Healthcare Delivery Research Program, Division of Cancer Control and Population Sciences