Automated EHR text de-identification

The U.S. Veterans Health Administration (VHA) Consortium for Healthcare Informatics Research (CHIR) is a multi-disciplinary group of collaborating investigators affiliated with VHA sites across the U.S. Within the CHIR, this de-identification project focused on investigating the current state of the art of automatic clinical text de-identification, on developing a best-of-breed de-identification application for VHA clinical documents, and on evaluating both its impact on subsequent text analysis tasks and the risk of re-identification of the de-identified text.

The VHA best-of-breed clinical text de-identification system (BoB) combines the best-performing methods and resources for each type of protected health information (PHI). The choice of methods and resources was based on a large literature and technology review and on evaluations of several existing text de-identification applications on various clinical text corpora. These evaluations demonstrated that no existing system reached sufficient accuracy when de-identifying VHA clinical text, even when trained with such clinical text. These findings motivated the development and evaluation of a new system for VHA clinical text de-identification, one that combines the methods and resources allowing the best detection accuracy for each type of PHI.

High sensitivity and high positive predictive value are often difficult to achieve together when detecting or extracting information from text. A highly sensitive system typically has lower positive predictive value (i.e., it produces more false positives), and a highly precise system usually has lower sensitivity (i.e., more false negatives). Text de-identification requires very high sensitivity, but too many false positives could damage the non-PHI information in clinical notes. To combine very high sensitivity with high positive predictive value, we developed a “hybrid stepwise” approach for BoB. After pre-processing of the clinical text, this approach starts with a component focused only on high sensitivity, even at the cost of numerous false positives, and then continues with a component that filters out those false positives.
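The two-step idea can be illustrated with a minimal Python sketch. This is not the BoB implementation: the regular expressions, the NON_PHI_TERMS lexicon, the sample note, and all function names are hypothetical and chosen only for illustration. A deliberately over-eager detector flags every capitalized word and every date-like string, a simple lexicon lookup then stands in for the false-positive filter, and sensitivity and positive predictive value are computed before and after filtering.

```python
import re

# Step 1 patterns (assumed, deliberately over-inclusive): any capitalized
# word is a NAME candidate, any d/d/d pattern is a DATE candidate.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+\b")
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")

# Step 2 resource (hypothetical): terms known not to be PHI.
NON_PHI_TERMS = {"pt", "history", "parkinson", "aspirin"}


def detect_candidates(text):
    """High-sensitivity step: return (start, end, label) for every candidate."""
    spans = [(m.start(), m.end(), "NAME") for m in NAME_PATTERN.finditer(text)]
    spans += [(m.start(), m.end(), "DATE") for m in DATE_PATTERN.finditer(text)]
    return sorted(spans)


def filter_false_positives(text, spans):
    """Filtering step: drop NAME candidates found in the non-PHI lexicon."""
    return [
        (start, end, label)
        for start, end, label in spans
        if not (label == "NAME" and text[start:end].lower() in NON_PHI_TERMS)
    ]


def sensitivity_and_ppv(predicted, gold):
    """Sensitivity = TP / (TP + FN); PPV = TP / (TP + FP), on exact spans."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    sensitivity = true_positives / len(gold) if gold else 0.0
    ppv = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, ppv


def redact(text, spans):
    """Replace each span with its label, right to left so offsets stay valid."""
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


# Invented sample note and its reference (gold) PHI annotations.
NOTE = "Pt John Smith seen 03/12/2010. History of Parkinson disease, on Aspirin."
GOLD = [
    (NOTE.index(s), NOTE.index(s) + len(s), label)
    for s, label in [("John", "NAME"), ("Smith", "NAME"), ("03/12/2010", "DATE")]
]

candidates = detect_candidates(NOTE)
kept = filter_false_positives(NOTE, candidates)

print(sensitivity_and_ppv(candidates, GOLD))  # before filtering
print(sensitivity_and_ppv(kept, GOLD))        # after filtering
print(redact(NOTE, kept))
```

On this toy note the first step flags seven candidates, only three of which are PHI, so sensitivity is 1.0 while positive predictive value is 3/7. After filtering, the clinical terms "Parkinson" and "Aspirin" are preserved and both measures reach 1.0. A real filter would of course not be a fixed word list tuned to a single note.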

Publications:

  • Kim, Y., Heider, P., & Meystre, S. M. (2018). Ensemble-based Methods to Improve De-identification of Electronic Health Record Narratives (pp. 663–672). Presented at the AMIA Annu Symp Proc, San Francisco, CA.
  • Meystre, S., Carrell, D., Hirschman, L., Aberdeen, J., Fearn, P., Petkov, V., & Silverstein, J. C. (2018). Automatic Text De-Identification: How and When is it Acceptable? (pp. 124–126). Presented at the AMIA Annu Symp Proc, San Francisco, CA.
  • Meystre, S., Heider, P., Kim, Y., Trice, A., & Underwood, G. (2018). Clinical Text Automatic De-Identification to Support Large Scale Data Reuse and Sharing: Pilot Results (p. 2069). Presented at the AMIA Annu Symp Proc, San Francisco, CA.
  • Khalifa, A., & Meystre, S. (2017). Learning to De-Identify Clinical Text with Existing Hybrid Tools (pp. 150–151). Presented at the AMIA Joint Summits on Translational Science proceedings, San Francisco, CA.
  • Meystre, S. M. (2015). De-identification of Unstructured Clinical Data for Patient Privacy Protection. In Medical Data Privacy Handbook (pp. 697–716). Cham: Springer International Publishing. http://doi.org/10.1007/978-3-319-23633-9_26
  • Redd, A., Pickard, S., Meystre, S., Scehnet, J., Bolton, D., Heavirland, J., et al. (2015). Evaluation of PHI Hunter in Natural Language Processing Research. Perspect Health Inf Manag, 12, 1f.
  • Meystre, S. M., Ferrandez, O., Friedlin, F. J., South, B. R., Shen, S., & Samore, M. H. (2014). Text de-identification for privacy protection: a study of its impact on clinical text information content. Journal of Biomedical Informatics, 50, 142–150. 
  • South, B. R., Mowery, D., Suo, Y., Leng, J., Ferrandez, O., Meystre, S. M., & Chapman, W. W. (2014). Evaluating the effects of machine pre-annotation and an interactive annotation interface on manual de-identification of clinical text. Journal of Biomedical Informatics, 50, 162–172. 
  • Meystre, S., Shen, S., Hofmann, D., & Gundlapalli, A. (2014). Can Physicians Recognize Their Own Patients in De-identified Notes? Studies in Health Technology and Informatics, 205, 778–782.
  • Meystre, S., H, D., Aberdeen, J., & Malin, B. (2013). Automatic Clinical Text De-Identification: Is It Worth It, and Could It Work for Me? (pp. 1–3). Medinfo 2013.
  • Meystre, S. M., Ferrandez, O., South, B. R., Shen, S., & Samore, M. H. (2013). How Much Does Automatic Text De-Identification Impact Clinical Problems, Tests, and Treatments? (pp. 1–1). AMIA Summits Transl Sci Proc, CRI.
  • Ferrandez, O., South, B. R., Shen, S., Friedlin, F. J., Samore, M. H., & Meystre, S. M. (2013). BoB, a best-of-breed automated text de-identification system for VHA clinical documents. Journal of the American Medical Informatics Association, 20(1), 77–83.
  • Nokes, N., Meystre, S., Scehnet, J. S., South, B., Shen, S., Maw, M., et al. (2012). A Survey of VHA Privacy Officers for the External Use of Automatically De-Identified Clinical Documents (p. 1879). AMIA Annu Symp Proc.
  • Ferrandez, O., South, B. R., Shen, S., Friedlin, F. J., Samore, M. H., & Meystre, S. M. (2012). Evaluating current automatic de-identification methods with Veteran’s health administration clinical documents. BMC Medical Research Methodology, 12(1), 109. 
  • Ferrandez, O., South, B. R., Shen, S., & Meystre, S. M. (2012). A Hybrid Stepwise Approach for De-identifying Person Names in Clinical Documents (pp. 65–72). Proceedings of the 2012 Workshop on Biomedical Natural Language Processing (BioNLP 2012), Montreal, Canada.
  • Ferrandez, O., South, B., Shen, S., Maw, M., Nokes, N., Friedlin, F. J., & Meystre, S. (2012). Striving for Optimal Sensitivity to De-identify Clinical Documents (p. 117). AMIA Summits Transl Sci Proc, CRI.
  • South, B., Shen, S., Maw, M., Ferrandez, O., Friedlin, F. J., & Meystre, S. (2012). Prevalence Estimates of Clinical Eponyms in De-Identified Clinical Documents (p. 136). AMIA Summits Transl Sci Proc, CRI.
  • Friedlin, F. J., South, B., Shen, S., Ferrandez, O., Nokes, N., Maw, M., et al. (2012). An Evaluation of the Informativeness of De-identified Documents (p. 128). AMIA Summits Transl Sci Proc, CRI.
  • Ferrandez, O., South, B. R., Shen, S., Friedlin, F. J., Samore, M. H., & Meystre, S. M. (2012). Generalizability and comparison of automatic clinical text de-identification methods and resources. AMIA Annu Symp Proc, 2012, 199–208.
  • Shen, S., South, B., Friedlin, F. J., & Meystre, S. (2011). Coverage of Manual De-identification on VA Clinical Documents. AMIA Annu Symp Proc, 1958.
  • Meystre, S. M., Friedlin, F. J., South, B. R., Shen, S., & Samore, M. H. (2010). Automatic de-identification of textual documents in the electronic health record: a review of recent research. BMC Medical Research Methodology, 10, 70.