This digital mammography dataset includes information from 20,000 digital and 20,000 film screening mammograms performed between January 2005 and December 2008 on women included in the Breast Cancer Surveillance Consortium (BCSC). Some women contribute more than one examination to the dataset. These data are recommended only for use in teaching data analysis or epidemiological concepts. Because the data represent only a small sample of the mammography data available from the BCSC, they should not be used to conduct primary research.
Variable Name | Description | Coding |
---|---|---|
age_c | patient's age in years at time of mammogram | Numerical |
assess_c | Radiologist's assessment based on the BI-RADS scale | 0 = Needs additional imaging 1 = Negative 2 = Benign finding(s) 3 = Probably benign 4 = Suspicious abnormality 5 = Highly suggestive of malignancy |
cancer_c | binary indicator of cancer diagnosis within one year of screening mammogram | 0 = no cancer diagnosis 1 = cancer diagnosis |
compfilm_c | comparison mammogram from prior mammography examination available | 0 = no 1 = yes 9 = missing |
density_c | patient's BI-RADS breast density as recorded at time of mammogram | 1 = Almost entirely fatty 2 = Scattered fibroglandular densities 3 = Heterogeneously dense 4 = Extremely dense |
famhx_c | family history of breast cancer in a first degree relative | 0 = no 1 = yes 9 = missing |
hrt_c | current use of hormone therapy at time of mammogram | 0 = no 1 = yes 9 = missing |
prvmam_c | binary indicator of whether the woman had ever received a prior mammogram | 0 = no 1 = yes 9 = missing |
biophx_c | history of breast biopsy | 0 = no 1 = yes 9 = missing |
mammtype | film or digital mammogram | 1 = film mammogram 2 = digital mammogram |
CaTypeO | cancer type | 1 = ductal carcinoma in situ 2 = invasive cancer 8 = no cancer diagnosis |
bmi_c | body mass index at time of mammogram | Numerical or -99 if missing |
ptid | patient's study id; repeats when a woman contributes more than one examination | Identifier |
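
The missing-value codes in the table (9 for several categorical variables, -99 for bmi_c) need to be recoded before any analysis, otherwise they are treated as real values. The following is a minimal sketch of one way to do that, using only the Python standard library. It assumes the data are available as CSV text with a header row whose column names match the table and whose categorical codes are written as integers; the helper names `clean_row` and `load_rows` and the two sample records are hypothetical and are not part of the BCSC distribution.

```python
import csv
import io
from collections import Counter

# Sentinel codes taken from the table above: 9 marks "missing" for these
# categorical variables, and -99 marks a missing bmi_c.
MISSING_9 = {"compfilm_c", "famhx_c", "hrt_c", "prvmam_c", "biophx_c"}
NUMERIC = {"age_c", "bmi_c"}


def clean_row(row):
    """Return a copy of one record with sentinel codes replaced by None."""
    out = {}
    for name, raw in row.items():
        if name == "ptid":
            out[name] = raw  # keep the identifier as text
            continue
        # Assumption: categorical codes are stored as plain integers.
        value = float(raw) if name in NUMERIC else int(raw)
        if name in MISSING_9 and value == 9:
            value = None
        elif name == "bmi_c" and value == -99:
            value = None
        out[name] = value
    return out


def load_rows(handle):
    """Read CSV text with a header row and clean every record."""
    return [clean_row(row) for row in csv.DictReader(handle)]


# Two made-up records for illustration only; these are not BCSC data.
sample = io.StringIO(
    "ptid,age_c,famhx_c,bmi_c,mammtype,cancer_c\n"
    "1,62,9,-99,1,0\n"
    "1,63,0,24.5,2,0\n"
)
rows = load_rows(sample)

# Some women contribute more than one examination, so count rows per ptid.
exams_per_woman = Counter(r["ptid"] for r in rows)
print(rows[0]["famhx_c"], rows[0]["bmi_c"], exams_per_woman["1"])
# prints: None None 2
```

Counting rows per ptid matters because repeated examinations from the same woman are not independent observations, which is worth keeping in mind when choosing an analysis method for teaching exercises.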
The following must be cited when using this dataset:
"Data collection and sharing was supported by the National Cancer Institute-funded Breast Cancer Surveillance Consortium (HHSN261201100031C). You can learn more about the BCSC at: http://www.bcsc-research.org/."
Information about the BCSC may also be included in the methods section using language such as:
"Data for this study was obtained from the BCSC: http://www.bcsc-research.org/."