Evaluation of a Natural Language Processing Approach to Identify Social Determinants of Health in Electronic Health Records in a Diverse Community Cohort

Medical Care ◽  
2022 ◽  
Vol Publish Ahead of Print ◽  
Author(s):  
Christopher J. Rouillard ◽  
Mahmoud A. Nasser ◽  
Haihong Hu ◽  
Douglas W. Roblin
2021 ◽  
Author(s):  
Ye Seul Bae ◽  
Kyung Hwan Kim ◽  
Han Kyul Kim ◽  
Sae Won Choi ◽  
Taehoon Ko ◽  
...  

BACKGROUND Smoking is a major risk factor and an important variable in clinical research, but few studies have addressed automatically extracting smoking status from unstructured bilingual electronic health records (EHRs). OBJECTIVE We aim to develop an algorithm that classifies smoking status from unstructured EHRs using natural language processing (NLP). METHODS Using acronym replacement and the Python package Soynlp, we normalized 4,711 bilingual clinical notes. Each EHR note was classified into one of 4 categories: current smoker, past smoker, never smoker, or unknown. Subsequently, SPPMI (Shifted Positive Pointwise Mutual Information) was used to vectorize the words in the notes. By calculating the cosine similarity between these word vectors, keywords denoting the same smoking status were identified. RESULTS Compared with other keyword extraction methods (word co-occurrence-, PMI-, and NPMI-based methods), our proposed approach improves keyword extraction precision by as much as 20.0%. The extracted keywords are then used to classify the 4 smoking statuses in our bilingual clinical notes. Given an identical SVM classifier, the extracted keywords improve the F1 score by as much as 1.8% compared with unigram and bigram bag-of-words features. CONCLUSIONS Our study shows the potential of SPPMI for classifying smoking status from bilingual, unstructured EHRs. Our findings show how smoking information can be readily acquired and used for clinical practice and research.
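The SPPMI-plus-cosine-similarity step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state the co-occurrence context or the shift value, so the sketch assumes document-level co-occurrence and a user-chosen shift `shift_k`, and all function names and the toy notes are hypothetical. It uses the standard definition SPPMI(w, c) = max(PMI(w, c) - log k, 0).

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count word pairs that appear in the same note (symmetric counts).

    Assumption: the context is the whole note; the abstract does not say
    which context window was actually used.
    """
    pair_counts = Counter()
    for tokens in docs:
        for a, b in combinations(sorted(set(tokens)), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return pair_counts


def sppmi_vectors(pair_counts, shift_k=1.0):
    """Build sparse SPPMI word vectors: max(PMI(w, c) - log(shift_k), 0)."""
    total = sum(pair_counts.values())
    # Counts are symmetric, so row totals double as column totals.
    row_totals = Counter()
    for (w, _c), n in pair_counts.items():
        row_totals[w] += n
    vectors = {}
    for (w, c), n in pair_counts.items():
        pmi = math.log(n * total / (row_totals[w] * row_totals[c]))
        value = pmi - math.log(shift_k)
        if value > 0:  # keep only positive entries (the "positive" in SPPMI)
            vectors.setdefault(w, {})[c] = value
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(key, 0.0) for key, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, top_n=3):
    """Rank other words by cosine similarity to `word`."""
    target = vectors.get(word, {})
    scores = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical, already-tokenized toy notes (not data from the study).
toy_notes = [
    ["patient", "current", "smoker", "daily"],
    ["patient", "smokes", "daily"],
    ["patient", "never", "smoked"],
    ["patient", "quit", "smoking", "exsmoker"],
]
toy_vectors = sppmi_vectors(cooccurrence_counts(toy_notes), shift_k=1.0)
candidates = most_similar("smoker", toy_vectors)
```

In this sketch, `most_similar` returns candidate keywords for a seed word; words with high similarity to a seed such as "smoker" would be treated as denoting the same smoking status. A larger `shift_k` zeroes out weaker associations and yields sparser vectors.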


2019 ◽  
Author(s):  
Kelsey Berg ◽  
Chelsea Doktorchik ◽  
Hude Quan ◽  
Vineet Saini

Abstract Background: Electronic Health Records (EHRs) are key tools for integrating patient data into health information systems (IS). Advances in automated data collection methodology, particularly for the collection of social determinants of health (SDOH), provide opportunities to advance health promotion and illness prevention through advanced analytics (i.e., “Big Data” techniques). We ask how current data collection processes in EHRs permit SDOH data to flow throughout health systems. Methods: Using a scoping review framework, we searched the medical literature to identify current practices in SDOH data collection within EHR systems. We extracted relevant information on data collection methodology, focusing specifically on uses of automated technology. We discuss our findings in the context of research methodology and potential for health equity. Results: Practitioners collect a variety of SDOH data at the point of care through EHRs, predominantly via embedded screening tools and clinical notes, and primarily capture data on financial security, housing status, and social support. Health systems are increasingly using digital technology in data collection, including natural language processing algorithms. However, the overall use of automated technology is limited to date. End uses of the data pertain to improving system efficiency, coordinating patient care, and addressing health disparities. Discussion & Conclusion: EHRs can realistically promote the collection and meaningful use of SDOH data, although they have not yet been used extensively to collect and manage this type of information. Future applied research on systems-level application of SDOH data is necessary and should incorporate a range of stakeholders and interdisciplinary teams of researchers and practitioners in the fields of health, computing, and social sciences.


2021 ◽  
Vol 89 (9) ◽  
pp. S155
Author(s):  
Nicolas Nunez ◽  
Joanna M. Biernacka ◽  
Manuel Gardea-Resendez ◽  
Bhavani Singh Agnikula Kshatriya ◽  
Euijung Ryu ◽  
...  
