Top five selected features in the UNS dataset.

Source publication
Article
Simple Summary: Breast cancer is a heterogeneous disease characterized by different risks of relapse, which makes it challenging to predict progression and select the most appropriate follow-up strategies. With the ever-growing adoption of Electronic Health Records, there are great opportunities to leverage the amount of data collected routinely in...

Context in source publication

Context 1
... top five ranked features in the UNS dataset are shown in Table 4. For each concept, the mean and standard deviation of the number of times the concept was extracted for each patient are reported. ...
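The per-concept summary described in this context (mean and standard deviation of the number of times a concept was extracted for each patient) can be sketched as follows. This is a minimal illustration, not the authors' code: the concept names and counts are hypothetical, and the use of the sample standard deviation is an assumption, since the excerpt does not say which estimator was used.

```python
from statistics import mean, stdev

# Hypothetical data: for each concept, one extraction count per patient.
concept_counts = {
    "concept_a": [0, 2, 1, 3, 0],
    "concept_b": [5, 4, 6, 5, 5],
}

def summarize(counts):
    """Return (mean, sample standard deviation) of per-patient counts."""
    # Assumption: sample SD (n - 1 denominator); the source does not specify.
    return mean(counts), stdev(counts)

summary = {name: summarize(counts) for name, counts in concept_counts.items()}

for name, (m, s) in summary.items():
    print(f"{name}: mean={m:.2f}, sd={s:.2f}")
```

Swapping `stdev` for `statistics.pstdev` would give the population standard deviation instead.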

Similar publications

Article
Objectives: The objective of this study was to explore the use of natural language processing (NLP) algorithm to categorise contributing factors from patient safety event (PSE). Contributing factors are elements in the healthcare process (eg, communication failures) that instigate an event or allow an event to occur. Contributing factors can be us...

Citations

... Gupta [17] predicted the breast cancer tumor recurrence date with 78.7% accuracy using ML algorithms on the WPBC and WDBC datasets. By combining unstructured and structured data from electronic health records (EHRs) (the COMB, UNS, and STR datasets), González-Castro et al. [18] achieved 90.0% accuracy, 89.7% F1-score, 90.7% recall, and 80.7% AUROC for predicting breast cancer recurrence. ...
Article
Diagnosis and prognosis are especially difficult areas of medical research related to cancer due to the high incidence of breast cancer, which has surpassed all other cancers in terms of female mortality. Another factor that has a substantial influence on the quality of life of cancer patients is the fear that they may experience a relapse of their disease. The objective of the study is to give medical practitioners a more effective strategy for using ensemble learning techniques to forecast when breast cancer may recur. This research aimed to investigate the usage of deep neural networks (DNNs) and artificial neural networks (ANNs) in addition to machine learning (ML) based approaches, including bagging, averaging, and voting, to enhance the efficacy of breast cancer relapse diagnosis on two breast cancer relapse datasets. Results from the empirical study demonstrate that the proposed ensemble learning-enabled approach achieves accuracies of 96.31% and 95.81%, precisions of 96.70% and 96.15%, sensitivities of 98.88% and 98.68%, specificities of 84.62% on both, F1-scores of 97.78% and 97.40%, and areas under the curve (AUCs) of 0.987 and 0.978 on the University Medical Centre, Institute of Oncology (UMCIO) and Wisconsin prognostic breast cancer (WPBC) datasets, respectively. Consequently, these improved prediction results may encourage physicians to use this model to make better treatment choices.
... Another study (González-Castro et al., 2023) demonstrated that the ensemble learning algorithm XGBoost outperformed deep learning in predicting the recurrence of breast cancer in patients, achieving a precision rate of up to 90%. Notably, this study employed clinical attributes that differed from those utilized in our research, and it did not involve the interpretation of the model to identify significant attributes. ...
Article
Breast cancer has become one of the leading causes of death in Indonesia. This study contributes to global efforts to combat breast cancer by improving patient outcome prediction accuracy. This study employed ensemble learning techniques such as Random Forest, XGBoost, and LightGBM. The results of the study demonstrate LightGBM's superior performance (accuracy=85%, ROC-AUC=81%, AUPR=85%). Notably, all three algorithms identify key clinical attributes: "Relapse Free Status (Months)", "Overall Survival (Months)", "Nottingham Prognostic Index", and "Lymph Nodes Examined Positive". LightGBM uniquely highlights "pam50_LumA" as significant, suggesting reduced fatality risk for Luminal A subtype patients, while the others prioritize "Tumor Size". This research lays groundwork for intelligent systems to predict breast cancer outcomes, potentially transforming patient care and clinical practice.
... The ML model based on extreme gradient boosting (XGB) was selected in our study because of its generalizability, low risk of overfitting, high interpretability [25], and high scalability [34]. XGB has been confirmed to be a reliable method for recognizing patterns in other diseases such as lupus erythematosus [16], traumatic brain injury-induced coagulopathy [35], epilepsy [36], diabetes [37], Alzheimer's disease [38,39], HIV [40,41], or different types of cancer [42][43][44][45][46]. We, therefore, used the aforementioned ML technique to determine which factors were most predictive of disease severity in a closed group of patients hospitalized for COVID-19 during the first two months of the pandemic, a time when the population did not yet have herd immunity and had not yet been vaccinated. ...
Article
The COVID-19 pandemic demonstrated the need to develop strategies to control a new viral infection. However, the different characteristics of the health system and population of each country and hospital would require the implementation of self-systems adapted to their characteristics. The objective of this work was to determine predictors that should identify the most severe patients with COVID-19 infection. Given the poor situation of the hospitals in the first wave, the analysis of the data from that period with an accurate and fast technique can be an important contribution. In this regard, machine learning is able to objectively analyze data in hourly sets and is used in many fields. This study included 291 patients admitted to a hospital in Spain during the first three months of the pandemic. After screening seventy-one features with machine learning methods, the variables with the greatest influence on predicting mortality in this population were lymphocyte count, urea, FiO2, potassium, and serum pH. The XGB method achieved the highest accuracy, with a precision of >95%. Our study shows that the machine learning-based system can identify patterns and, thus, create a tool to help hospitals classify patients according to their severity of illness in order to optimize admission.
... Machine learning serves as a pivotal tool in our study, enabling us to delve into complex medical data and make precise predictions regarding survival outcomes in patients who have undergone breast-conserving surgery (BCS) [19]. Unlike conventional prognostic models, machine learning facilitates the creation of personalized prognostic models by considering a broad spectrum of patient-specific data, including clinical, pathological, and demographic information [20]. ...
Preprint
Background: Breast-conserving surgery (BCS) is a viable treatment for early-stage breast cancer, but post-operative recurrence is a significant concern linked to mortality. This study leverages Machine Learning and healthcare data to better identify patients at risk of recurrence. The goal is to assess how effectively the model predicts survival factors in breast cancer patients post-BCS. Methods: This study retrospectively analyzed data from January 1993 to June 2021 on 1518 breast cancer patients, of whom 430 were excluded due to unknown post-surgery recurrence status, using an XGBoost model optimized with grid search and 5-fold cross-validation. Feature importance was determined using the Shapley value technique, and data was collected with SPSS Statistics, Version 28.0, IBM. Results: The machine learning model showed high effectiveness in predicting patient outcomes, with notable metrics like accuracy (0.947) and precision (0.897). Key findings emphasize the importance of clear surgical margins and reveal that demographic factors like age and race significantly affect prognosis, while luminal subtype and comorbidity are less influential. These insights are crucial for understanding disease recurrence in breast cancer patients after BCS and radiotherapy. Conclusion: The XGBoost machine learning model demonstrated outstanding predictive performance for outcomes in breast cancer patients receiving BCS and radiotherapy. It confirmed the critical importance of clear surgical margins during initial surgery for prognosis. Demographic factors, especially age and race, were identified as significant predictors of patient outcomes.
... Gupta [18] proposed predicting the time of breast cancer tumor recurrence using machine learning approaches on the WPBC and WDBC datasets, achieving 78.7% accuracy. Castro et al. [19] proposed predicting breast cancer recurrence using structured and unstructured sources from electronic health records, considering the STR, UNS, and COMB datasets, and obtained 90.0% precision, 90.7% recall, 89.7% F1-score, and 80.7% AUROC. ...
Article
Predicting progression and deciding on the best follow-up techniques for breast cancer patients is difficult because the illness is diverse and characterized by varying relapse risks. Due to its prevalence, breast cancer has become the top cause of mortality among women worldwide, making diagnosis and prognosis particularly challenging areas of medical study. In addition, the fear of a cancer relapse is a major factor influencing cancer patients' quality of life. The study aims to help doctors determine the likelihood of a breast cancer relapse by applying ensemble learning techniques. In this research, artificial neural networks (ANN) and deep neural networks (DNN) ensembled with weighted averaging, minority voting, and majority voting approaches have been investigated for performance enhancements on the breast cancer recurrence dataset sourced from the UCI-ML repository. The empirical analysis shows that the proposed ensemble learning approach achieves improved accuracy, precision, sensitivity, specificity, and F1-score of 96.21%, 96.59%, 98.84%, 84.62%, and 97.41%, respectively. The findings of this study can aid doctors in making more informed treatment decisions, thereby improving patient outcomes.
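Two of the combination rules named in the abstract above, weighted averaging and majority voting, can be sketched generically as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the model probabilities, weights, and the 0.5 decision threshold are all hypothetical, and minority voting is omitted because the abstract does not define it.

```python
def weighted_average(probabilities, weights):
    """Combine per-model positive-class probabilities with a weighted mean."""
    total = sum(weights)
    return sum(p * w for p, w in zip(probabilities, weights)) / total

def majority_vote(probabilities, threshold=0.5):
    """Return 1 if more than half of the models predict the positive class."""
    votes = [1 if p >= threshold else 0 for p in probabilities]
    return 1 if sum(votes) * 2 > len(votes) else 0

# Hypothetical relapse probabilities from three models for one patient.
model_probs = [0.9, 0.4, 0.3]
# Hypothetical weights, e.g. reflecting each model's validation performance.
model_weights = [0.6, 0.2, 0.2]

avg_prob = weighted_average(model_probs, model_weights)
avg_label = 1 if avg_prob >= 0.5 else 0
vote_label = majority_vote(model_probs)

print(f"weighted average: p={avg_prob:.2f}, label={avg_label}")
print(f"majority vote: label={vote_label}")
```

With these made-up numbers the two rules disagree: one confident, heavily weighted model pushes the weighted average above the threshold, while the majority vote follows the two less confident models.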
... Several ML/AI studies have been conducted in oncology [29][30][31][32][33][34][35]50,51]. Some of these have attempted to predict 5-10-year breast cancer recurrences using both structured and unstructured clinicopathological data from Electronic Health Records (EHRs) [50,51]; however, these studies focused primarily on predicting local rather than distant recurrences. A previous study [52] predicted distant recurrences in breast cancer from both unstructured and structured clinical data in EHR using natural language processing and deep learning algorithms but lacked structured clinicopathological predictors such as those used in our study. ...
Article
Simple Summary: Breast cancer is a diverse disease with varying prognoses, even within the same subtype. Approximately 30% of breast cancer patients experience distant organ recurrence, known as metastasis, after treatment. The evaluation of breast tumors and surrounding lymph nodes occurs before and after neoadjuvant therapy, which aims to shrink the tumor before surgery. Following resection, residual tumor cells may remain in the breast tissue, lymph nodes, or other areas, necessitating adjuvant therapy. Typically, a follow-up visit is scheduled a year or more after adjuvant therapy, during which metastasis may be detected. By utilizing machine learning techniques, metastasis can be predicted earlier in a clinical setting, allowing for tailored surveillance and treatment strategies. This has the potential to significantly enhance the quality of life for breast cancer patients.
Abstract: Breast cancer is the most common type of cancer worldwide. Alarmingly, approximately 30% of breast cancer cases result in disease recurrence at distant organs after treatment. Distant recurrence is more common in some subtypes such as invasive breast carcinoma (IBC). While clinicians have utilized several clinicopathological measurements to predict distant recurrences in IBC, no studies have predicted distant recurrences by combining clinicopathological evaluations of IBC tumors pre- and post-therapy with machine learning (ML) models. The goal of our study was to determine whether classification-based ML techniques could predict distant recurrences in IBC patients using key clinicopathological measurements, including pathological staging of the tumor and surrounding lymph nodes assessed both pre- and post-neoadjuvant therapy, response to therapy via standard-of-care imaging, and binary status of adjuvant therapy administered to patients.
We trained and tested four clinicopathological ML models using a dataset (144 and 17 patients for training and testing, respectively) from Duke University and validated the best-performing model using an external dataset (8 patients) from Dartmouth Hitchcock Medical Center. The random forest model performed better than the C-support vector classifier, multilayer perceptron, and logistic regression models, yielding AUC values of 1.0 in the testing set and 0.75 in the validation set (p < 0.002) across both institutions, thereby demonstrating the cross-institutional portability and validity of ML models in the field of clinical research in cancer. The top-ranking clinicopathological measurements impacting the prediction of distant recurrences in IBC were identified as tumor response to neoadjuvant therapy as evaluated via SOC imaging and pathology, which included tumor as well as node staging.