MedNER: A Service-Oriented Framework for Chinese Medical Named-Entity Recognition with Real-World Application
Article


by Weisi Chen 1,†, Pengxiang Qiu 2,† and Francesco Cauteruccio 3,*,†
1 School of Software Engineering, Xiamen University of Technology, Xiamen 361024, China
2 Faculty of Innovation Engineering, Macau University of Science and Technology, Macau 999078, China
3 Department of Information Engineering, Electrical Engineering and Applied Mathematics (DIEM), University of Salerno, 84084 Fisciano, Italy
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Big Data Cogn. Comput. 2024, 8(8), 86; https://doi.org/10.3390/bdcc8080086
Submission received: 18 June 2024 / Revised: 18 July 2024 / Accepted: 31 July 2024 / Published: 2 August 2024
(This article belongs to the Special Issue Big Data and Information Science Technology)

Abstract

Named-entity recognition (NER) is a crucial task in natural language processing, especially for extracting meaningful information from unstructured text data. In the healthcare domain, accurate NER can significantly enhance patient care by enabling efficient extraction and analysis of clinical information. This paper presents MedNER, a novel service-oriented framework designed specifically for medical NER in Chinese medical texts. MedNER leverages advanced deep learning techniques and domain-specific linguistic resources to achieve good performance in identifying diabetes-related entities such as symptoms, tests, and drugs. The framework integrates seamlessly with real-world healthcare systems, offering scalable and efficient solutions for processing large volumes of clinical data. This paper provides an in-depth discussion on the architecture and implementation of MedNER, featuring the concept of Deep Learning as a Service (DLaaS). A prototype has encapsulated BiLSTM-CRF and BERT-BiLSTM-CRF models into the core service, demonstrating its flexibility, usability, and effectiveness in addressing the unique challenges of Chinese medical text processing.

1. Introduction

1.1. Background and Motivation

In the information age, data are proliferating with unprecedented speed and complexity, especially textual data on the Internet. This situation presents both great challenges and new opportunities. As one of the core tasks of natural language processing (NLP), named-entity recognition (NER) aims to extract information with practical significance from text data, such as the names of people, places, and organizations in organizational documents, as well as the names of diseases or medications in medical documents [1]. NER technology plays an important supporting role in text analysis, information retrieval, the construction of knowledge graphs, etc. [2,3]. Its effective application can not only significantly improve the efficiency and accuracy of information processing but also promote intelligent decision-making and facilitate the discovery of knowledge.
With the accelerating advancement of artificial intelligence (AI), deep learning (DL) models such as the pre-trained BERT model [4] have achieved impressive results for NER. Compared with traditional lexicon-based methods [5], BERT can capture complex linguistic patterns and deep semantics within texts and can substantially improve NER performance.
Despite the remarkable advancements in NER technology, its implementation still faces many challenges. First, there is a notable gap in the development and implementation of NER systems tailored specifically for Chinese medical texts. Existing NER models often fall short of capturing the complex and nuanced medical terminology present in Chinese clinical narratives, resulting in suboptimal extraction of relevant entities. Furthermore, most current NER approaches are monolithic, lacking the flexibility required for integration with diverse healthcare IT systems. This rigidity limits their applicability in real-world clinical environments, where scalability, interoperability, and ease of deployment are critical.
Moreover, it is a significant challenge for non-IT experts, such as clinicians and healthcare administrators, to efficiently select and utilize these NER tools. Most existing literature focuses on enhancing one particular NER model to improve performance. However, no consensus has been reached on which model works best, e.g., [6,7,8,9]. This indicates that the performance of pre-trained models may be inconsistent across datasets and contexts. Thus, there is no one-size-fits-all model that can simply be adopted by default. A systematic method that can effectively encapsulate multiple pre-trained NER models to enhance user-friendliness and flexibility is essential for adapting to healthcare providers’ heterogeneous and evolving needs. The absence of a user-friendly architecture in these systems hampers the ability to update and customize NER functionalities easily, leading to inefficiencies and increased costs in maintaining and scaling the systems. Consequently, healthcare providers face significant challenges in deploying robust and adaptable NER solutions that can integrate with existing IT infrastructures, limiting the potential for advanced clinical decision support and comprehensive data analysis in medical management.

1.2. Research Questions and Contributions

This paper aims to address the following research questions:
RQ 1: Is there a systematic method that can facilitate non-IT experts, such as clinicians and healthcare administrators, in utilizing advanced NER capabilities for clinical decision support and data analysis?
This research question aims to investigate the development of a systematic approach that empowers non-IT experts, such as clinicians and healthcare administrators, to effectively utilize advanced NER tools. The focus is on creating a method that simplifies the deployment and use of NER technologies without requiring extensive technical knowledge. This involves user interface design, workflow integration, and user customization.
RQ 2: How can a systematic framework balance the usability, user-friendliness, accuracy, and efficiency of NER for Chinese medical texts?
This research question explores the challenge of designing a framework that not only achieves high accuracy and efficiency in NER tasks but also remains user-friendly and usable for non-technical users. Balancing these aspects involves in-built candidate models with high performance, flexible modular architecture, automation, and simplification.
To address these research questions, we propose a systematic framework called MedNER that features a service-oriented architecture that can encapsulate multiple pre-trained NER models. The contributions of this work are as follows:
  • Introduce MedNER, a novel systematic framework featuring a service-oriented architecture that leverages the concept of “Deep Learning as a Service” (DLaaS). This framework is designed to encapsulate multiple pre-trained NER models, providing flexibility, scalability, and ease of integration with existing healthcare IT systems.
  • Employ a systematic method for integrating and evaluating multiple pre-trained NER models. This method enhances the framework’s adaptability to diverse datasets and contexts, ensuring robust performance and facilitating automatic comprehensive data analysis. The paper provides a thorough evaluation of the framework using real-world datasets and involving a healthcare expert testing the system.
  • Demonstrate an intuitive, user-friendly interface tailored for non-IT experts such as clinicians and healthcare administrators. This design facilitates the efficient use of advanced NER capabilities for clinical decision support and data analysis without requiring extensive technical knowledge.
The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 illustrates the proposed framework. Section 4 describes the selection and comparison of various NER models. Section 5 demonstrates a prototype of the proposed framework. Section 6 concludes the paper and highlights future research directions.

2. Related Work

NER is one of the core tasks of NLP [10], aiming at extracting and recognizing entities with explicit meanings from text, laying the foundation for advanced tasks such as knowledge discovery and text classification. Since its inception in the 1990s, NER techniques have undergone significant development [11], from lexicon-based traditional machine learning to state-of-the-art DL-based methods.
Chinese medical NER initially relied on pattern-matching techniques based on lexicons and rules [5] and gradually evolved to utilize traditional machine learning methods such as Naive Bayes [12], Support Vector Machines [13], Maximum Entropy Models (MEM) [14], the Hidden Markov Model (HMM) [15], and Conditional Random Fields (CRFs) [16]. These models combine complex feature engineering with statistical methods to identify entities within textual data. Feature engineering, a pivotal aspect of this process, involves the extraction of salient information from raw text, such as part-of-speech (POS) tags, syntactic structures, and morphological variations, thereby equipping models with the information needed to interpret the text. However, this process can be complicated and error-prone.
With the advancements in DL techniques, researchers have begun to explore more sophisticated and efficient modeling architectures for Chinese medical NER, such as recurrent neural networks (RNNs) [17] and their variants like long short-term memory (LSTM) and Bidirectional LSTM (BiLSTM) [18]. These models are capable of learning deeper contextual features in text, thus significantly improving the accuracy and efficiency of entity recognition tasks. Hybrid models that combine these DL models with other techniques like word vectors and CRFs have also been broadly used to further enhance performance. For instance, a study [19] has proposed multi-head self-attention based BiLSTM-CRF for Chinese clinical named-entity recognition and reported superior performance to other benchmarks. Table 1 provides a comprehensive overview of some prevalent models for the NER task, highlighting the key characteristics, advantages, and disadvantages, facilitating a better understanding of their applications and effectiveness.
Pre-trained models such as BERT [25] have greatly advanced the development of Chinese medical NER technology. Some studies have reported that models combining BERT with LSTM, BiLSTM, convolutional neural networks (CNNs), the attention mechanism, and/or CRF may not only improve the accuracy of NER but also show excellent cross-domain adaptability, i.e., they learn and understand the semantic information of the text better and obtain higher accuracy scores on different datasets [26,27]. A representative model proposed recently is MC-BERT [28], which combines BERT with BiLSTM, a CNN, and a CRF layer to handle the complexities of Chinese electronic medical records. This hybrid approach aims to leverage the strengths of each component: BERT for contextual word embeddings, BiLSTM for sequence processing, the CNN for feature extraction, and the CRF layer for sequence labeling, optimizing the recognition of medical entities. In [6], instead, the NER task is divided into two branch tasks that focus on entity boundary recognition and entity type recognition. The approach integrates medical entity dictionary information and Chinese radical features and shows that such a combination effectively improves performance. Another study [7] compared the performance of several BERT-based hybrid models for Chinese medical NER, including BioBERT, BlueBERT, PubMedBERT, and SciBERT, and found that PubMedBERT performed best by a small margin. Representative studies on NER in Chinese medical text analysis are summarized in Table 2, together with a brief overview of their key findings.
As seen in the literature, most existing studies concentrate on improving the model itself in the context of a particular domain or a particular language, and they sometimes provide contradictory conclusions on which model performs best; e.g., BERT does not outperform word2vec as reported in [8], which contradicts many studies such as [9]. This could confuse non-IT experts who would like to deploy such models in their real-world applications. New hybrid models are being continuously proposed in the academic world, which adds to the complexity of model selection by non-IT experts such as clinicians and healthcare administrators. Furthermore, advanced models often require substantial computational resources and deep technical expertise to deploy and maintain, which can be another barrier for non-IT experts.
In contrast, our proposed MedNER framework aims to bridge these gaps by providing a systematic, user-friendly platform that encapsulates these advanced models into a service-oriented architecture. MedNER simplifies the deployment and operational processes, making it more accessible for non-technical users to utilize advanced NER tools effectively via the DLaaS mechanism. It integrates these technologies with an intuitive interface and automates the model selection and data handling processes, thus balancing usability with high-performance NER capabilities.

3. Proposed Framework

3.1. MedNER: A Service-Oriented Framework

To address the research questions raised in Section 1.2, we propose MedNER, a novel unified framework designed to tackle the challenges of Chinese medical NER tasks. MedNER leverages a service-oriented architecture, incorporating multiple pre-trained NER models to enhance flexibility, user-friendliness, and adaptability.
The framework is composed of several key services, each contributing to a seamless and efficient NER process. Figure 1 describes the framework, with red lines denoting the data flow and blue lines denoting the model flow. Below is a detailed description of the services integrated within MedNER.
  • Text Import Service: the entry point for the MedNER framework, responsible for ingesting and preprocessing various types of clinical texts. This service supports multiple input formats, including plain text, XML, JSON, and common medical document formats used in electronic health records (EHRs). Key functionalities include data cleaning, normalization, format detection, and language detection. In general, this service encompasses (complete or partial) Extract, Transform, and Load (ETL) functions that prepare the data for the following tasks in the workflow.
  • Model Selection Service: a service that provides users with the ability to choose from a range of pre-trained NER models optimized for various medical domains and tasks. This service addresses the challenge of selecting the most suitable model for specific datasets and contexts. Key features include a model repository of pre-trained models (including those specifically trained on Chinese medical texts), model comparison (for comparing the performance of different models based on various metrics such as accuracy, precision, recall, and F1-score [29]), and automated recommendation (proposing the best-suited model as the default, based on the characteristics of the input text and the specific NER task). The pre-trained model repository is structured to facilitate easy access, management, and integration of various pre-trained NER models. It includes model metadata in a structured format like JSON or YAML, describing model architecture, training dataset, performance metrics, compatibility information, and versions. Models are stored in a cloud-based storage system, such as one provided by AWS, ensuring scalability and reliability. Access to models is managed through a secure API gateway, allowing authenticated and authorized users or services to fetch models as needed. Model versions can be tracked using Git-like repositories.
  • Simple Customization Service: a service designed to enhance user-friendliness by allowing non-IT experts to customize the NER process according to their needs. Features include entity selection, enabling users to specify which types of entities to extract (e.g., medical-related entities, general entities like locations and dates, etc.), and model selection, allowing users to select a model with automated parameter settings without requiring deep technical knowledge.
  • Deep Learning as a Service (DLaaS): the core component of the MedNER framework, providing the computational infrastructure necessary to run DL models for NER tasks. Key aspects of DLaaS include scalable infrastructure utilizing cloud-based resources, inference using pre-trained models, and performance optimization through techniques such as parallel processing and GPU acceleration. We provide more information about this component in the following sections.
  • Result and Visualization Service: a service providing users with a comprehensive view of the NER outcomes, facilitating data analysis and decision-making. Key features include user-friendly dashboards and reports summarizing and highlighting the NER results. Multiple export options can be implemented utilizing processing modules to define an interoperable pipeline offering outputs in various formats, such as TXT, CSV, and JSON, to enable a seamless integration of the results with other downstream healthcare information systems and tools.
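The repository metadata and the automated recommendation described for the Model Selection Service can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the field names, the model names `candidate-a` and `candidate-b`, and the metric values are placeholders, not the framework's actual schema or measured results.

```python
import json

# Hypothetical metadata records; names, fields, and scores are placeholders.
MODEL_REPOSITORY_JSON = """
[
  {"name": "candidate-a", "version": "0.1.0", "language": "zh",
   "metrics": {"precision": 0.80, "recall": 0.70, "f1": 0.75}},
  {"name": "candidate-b", "version": "0.1.0", "language": "zh",
   "metrics": {"precision": 0.85, "recall": 0.75, "f1": 0.80}}
]
"""


def load_repository(raw_json):
    """Parse the metadata document into a list of dictionaries."""
    return json.loads(raw_json)


def recommend_model(models, metric="f1"):
    """Return the entry with the highest value for the chosen metric."""
    if not models:
        raise ValueError("the model repository is empty")
    return max(models, key=lambda m: m["metrics"][metric])


def select_model(models, user_choice=None, metric="f1"):
    """Use the user's explicit choice if given, otherwise the recommended default."""
    if user_choice is None:
        return recommend_model(models, metric)
    for model in models:
        if model["name"] == user_choice:
            return model
    raise KeyError(f"unknown model: {user_choice}")
```

Calling `select_model(load_repository(MODEL_REPOSITORY_JSON))` returns the recommended default, while passing a model name mirrors the case where the user overrides the recommendation.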

3.2. Technologies Involved

The development and implementation of the MedNER framework involve a range of advanced technologies and methodologies, each playing a crucial role in achieving the desired functionality, flexibility, and performance. In the following, we highlight the technologies employed in MedNER.

3.2.1. Traditional NER Model Elements

Model elements commonly used in recent NER systems include CRFs and BERT. The CRF (Conditional Random Field) model is a type of probabilistic graphical model commonly used for sequence-labeling problems. The traditional HMM has certain limitations in sequence-labeling tasks, such as considering only local label dependencies and assuming that each observation depends only on its current state. In contrast, the CRF model is an undirected graphical model that can more flexibly model dependencies between labels. In a CRF model, each position in the sequence is modeled as a node, and the connections between nodes represent the dependencies between labels. The CRF model integrates the relationships between local observations and global labels by defining feature functions on the nodes and transition scores between labels to model the entire sequence.
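For reference, the standard textbook form of a linear-chain CRF can be written as follows. The notation is an assumption introduced here and is not taken from this paper: x is the observation sequence of length T, y is a label sequence (with y_0 a fixed start symbol), f_k are feature functions with weights λ_k, and Z(x) is the normalizer that sums over all candidate label sequences y'.

```latex
p(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \right),
\qquad
Z(x) = \sum_{y'} \exp\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \right)
```

Feature functions that depend only on (y_t, x, t) play the role of node features, and those that depend on (y_{t-1}, y_t) capture the dependencies between adjacent labels.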
The fundamental idea of the CRF model is to find, given an observation sequence, the hidden label sequence that maximizes the conditional probability. Inference algorithms from probabilistic graphical models are used with the CRF model: the forward–backward algorithm for training and the Viterbi algorithm [30] for prediction.
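As an illustration of the prediction step, the following is a minimal pure-Python sketch of Viterbi decoding. It assumes that per-position label scores (`emissions`) and label-to-label scores (`transitions`) are already available as nested lists; these names and the list-based layout are choices made for this sketch, not part of MedNER.

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence for a non-empty input.

    emissions[t][j]   : score of label j at position t
    transitions[i][j] : score of moving from label i to label j
    """
    n_labels = len(labels)
    # best[j] = best score of any path that ends in label j at the current position
    best = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(n_labels):
            # Pick the previous label that maximizes the path score into label j.
            prev = max(range(n_labels), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final label to recover the full path.
    last = max(range(n_labels), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [labels[j] for j in path]
```

The dynamic program keeps only the best path into each label at every position, so its cost grows linearly with the sequence length and quadratically with the number of labels.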

3.2.2. Deep Learning as a Service

The core of MedNER’s NER capabilities is built on various types of neural networks, particularly those well-suited for natural language processing tasks, such as long short-term memory (LSTM) networks and Transformer-based models like BERT. In addition, pre-trained models such as BERT, RoBERTa, and other variants specifically fine-tuned for medical text in Chinese can also be leveraged. These models provide a strong foundation for NER tasks by capturing contextual relationships in text. Note that advanced tokenization techniques can be used to handle the segmentation of Chinese text, which lacks explicit word boundaries. This includes the use of Python libraries like Jieba and spaCy. In line with what has been explored in the literature [31], we adopt DLaaS to facilitate the flexible selection and use of various pre-trained DL models. DLaaS also brings several advantages in our context. It enables researchers and practitioners to use sophisticated, state-of-the-art approaches without investing heavily in on-premise infrastructure. In addition, the inherent scalability of the DLaaS architecture benefits systems whose computational needs vary over time.
NER models, along with any MedNER implementation, can be deployed on cloud platforms to leverage the scalability and computational power of cloud infrastructure. Services like AWS, Google Cloud, or Azure can be utilized for this purpose. Tools and frameworks such as TensorFlow Serving, Kubernetes, and Docker can also be used to manage the deployment, scaling, and orchestration of NER models in a service-oriented architecture.

3.2.3. Service-Oriented Architecture (SOA)

MedNER is designed using a service-oriented architecture, where each NER model is encapsulated as an individual service. The Service-Oriented Architecture (SOA) is a design pattern in software architecture where application components provide services to other components through a communication protocol, typically over a network. These services are designed to be reusable and can be independently deployed, managed, and combined to form complex applications. This modular approach offers benefits like flexibility, scalability, interoperability, service reusability, and easy maintenance [32].
SOA provides the following benefits in implementing MedNER:
  • Scalability: Medical texts, especially in large healthcare institutions, can be vast and continuously growing. A service-oriented architecture allows for scalable infrastructure, which can dynamically allocate resources based on the volume of data and the complexity of tasks.
  • Flexibility: Different NER tasks may require different models or configurations. By modularizing the system into distinct services, each component can be updated or replaced independently without disrupting the entire framework. This flexibility is crucial for adapting to new advancements in NER technology and varying user needs.
  • Resource Efficiency: Utilizing cloud-based resources and techniques such as parallel processing and GPU acceleration ensures that computational resources are used efficiently. This is particularly important for deep learning tasks, which can be resource-intensive.
  • User Accessibility: The architecture is designed to be user-friendly, enabling non-IT experts to leverage advanced NER models without needing deep technical knowledge. This democratization of technology ensures broader accessibility and usability across different user groups within the medical field.
Commonly, RESTful APIs [33] are used to enable seamless communication between different services and integration with existing healthcare information systems and electronic health records. Encapsulating services with RESTful APIs allows each service to interact with others in a standardized way, ensuring that service-to-service communication is consistent, scalable, and maintainable. The APIs are designed to be stateless, so that each request is processed independently, which enhances scalability. Figure 2 shows details about how the key services in the MedNER framework can be encapsulated using RESTful APIs.
Services communicate synchronously using HTTP requests. For instance, when the frontend requests the NER Processing Service to process text, it sends an HTTP POST request to the “/dlaas” endpoint, and the service responds with the recognized entities.
For tasks that require longer processing times, services can communicate asynchronously using message queues like RabbitMQ or Apache Kafka. This ensures that the frontend remains responsive, and users can be notified once the processing is complete.
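The two communication styles can be sketched with the Python standard library alone. This is a simplified illustration under stated assumptions: `handle_dlaas_post` stands in for the handler behind the “/dlaas” endpoint, the request fields `text` and `model` are hypothetical, `recognize_entities` is a toy lexicon lookup rather than a real model, and an in-process `queue.Queue` stands in for a message broker such as RabbitMQ or Apache Kafka.

```python
import json
import queue
import threading


def recognize_entities(text, model_name):
    """Placeholder for model inference; a real service would call the selected model."""
    # Hypothetical toy lexicon so that the sketch runs without any trained model.
    lexicon = {"胰岛素": "Drug", "血糖": "Test"}
    found = []
    for term, label in lexicon.items():
        start = text.find(term)
        if start != -1:
            found.append({"text": term, "label": label,
                          "start": start, "end": start + len(term)})
    return found


def handle_dlaas_post(body):
    """Synchronous path: validate a JSON request body and return (status, response)."""
    try:
        payload = json.loads(body)
        text = payload["text"]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected a JSON object with a 'text' field"}
    model = payload.get("model", "default")
    return 200, {"model": model, "entities": recognize_entities(text, model)}


# Asynchronous path: jobs are queued and processed by a background worker.
jobs = queue.Queue()
results = {}


def worker():
    """Process queued (job_id, body) pairs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        job_id, body = job
        results[job_id] = handle_dlaas_post(body)
```

In the asynchronous case the caller only enqueues a job and later looks up `results[job_id]`, which is what allows a frontend to stay responsive while long-running inference completes.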

3.2.4. User Interface and Experience (UI/UX)

The frontend of MedNER is designed to be user-friendly, enabling non-IT experts to interact with the system seamlessly through intuitive web-based interfaces. Such interfaces can be built with frontend technologies like React.js (https://react.dev/, accessed on 17 July 2024), Angular (https://angularjs.org/, accessed on 17 July 2024), or Vue.js (https://vuejs.org/, accessed on 17 July 2024). A data visualization function that presents the extracted entities and analysis results in a clear and comprehensible manner is also essential. Libraries such as D3.js (https://d3js.org/, accessed on 17 July 2024) and Chart.js (https://www.chartjs.org/, accessed on 17 July 2024) are employed for this purpose.
The frontend uses AJAX (Asynchronous JavaScript and XML) and WebSockets (Version 12.0) to communicate with the backend, ensuring that users can upload data, configure models, and receive results without experiencing delays. This asynchronous communication is useful for maintaining a responsive user interface. In addition, the frontend should provide options for users to customize the NER process, such as selecting specific entity types and adjusting model parameters. These configurations are sent to the backend through RESTful API calls, which then trigger the appropriate services.

4. Service Layer: Candidate Model Comparison and DLaaS

This section delves into the backend service layer, focusing on the candidate model comparison and recommendation, as well as the concept of DLaaS. We define a real-world problem: the NER task on diabetes-related texts. Once the data are obtained, the comparison of candidate models is performed by the Model Selection Service. After evaluation, a recommended model is given as the default model choice. The user can then either accept this recommendation or select a different model from the pre-trained model repository. Note that the entire comparison and evaluation process happens at the backend, hiding the technical details from users who have little to no IT expertise. The backend services have been developed in Python with the Flask framework, which provides the RESTful APIs that handle frontend requests and service-to-service interoperations. A healthcare expert has been involved in the validation of the prototype.
The rest of this section will now illustrate the prototype of our system and its application on a defined use case, with a particular emphasis on the model comparison process and its details.
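The comparison performed by the Model Selection Service relies on metrics such as precision, recall, and F1-score. The sketch below shows one common way to compute them for BIO-tagged output, assuming entity-level exact matching (an entity counts as correct only if its type and both boundaries agree). The paper does not state which matching scheme was used, so this is an assumption, and the function names are hypothetical.

```python
def extract_entities(tags):
    """Collect (label, start, end) spans from a BIO tag sequence; end is exclusive."""
    spans, start, label = [], None, None
    # A trailing "O" flushes an entity that runs to the end of the sequence.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return set(spans)


def precision_recall_f1(gold_tags, predicted_tags):
    """Entity-level precision, recall, and F1 under exact span matching."""
    gold = extract_entities(gold_tags)
    pred = extract_entities(predicted_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Stray “I-” tags that do not continue an open entity of the same type are ignored by `extract_entities`, which is one of several possible conventions for malformed sequences.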

4.1. Data Preparation

We have used a Chinese healthcare corpus (with diabetes as the example topic), sourced from Alibaba Tianchi, which was collected from published diabetes research papers, clinical guidelines, and diabetes education materials. This corpus not only covers the basic medical knowledge of diabetes but also describes in detail the clinical manifestations of diabetes, treatment methods, possible complications, and other information. An example of the corpus can be seen in Figure 3. The corpus comprises a total of 363 text files. These files were annotated using the brat software (Version 1.3) [34] to create 363 corresponding annotated files.
When analyzing the corpus, we found that disease, body part, medication, test, and symptom are the entity categories that are described with high frequency. These categories are important in the field of diabetes in that they not only reflect the clinical features of the disease but also involve key information about the diagnostic and therapeutic process. Therefore, we subdivided the valid information in the corpus into four categories of entities: disease, body parts, drugs, and tests. Each category of entities is described as follows:
  • Disease: Disease entities refer to specific types of diabetes or related complications mentioned in the text, such as Type 2 diabetes, diabetic nephropathy, and so on. This information is critical to understanding the patient’s health status and medical needs.
  • Body Part (Anatomy): Body part entities indicate the specific location where the lesion or symptom occurs, e.g., pancreas, blood vessels, etc. In the context of diabetes, clarifying the body part helps in accurate diagnosis and treatment.
  • Drug: Drug entities include all medicines used to treat diabetes and its complications, e.g., insulin, metformin, etc. Drug information is extremely important for disease management and drug regimen development.
  • Test: Test entities include a variety of tests and measurements used to diagnose and monitor diabetes, including measurement of blood glucose levels and glycosylated hemoglobin (HbA1c). This information plays a critical role in accurately assessing the severity of a diabetic’s condition and tracking progress in treatment.
In this study, the BIO labeling method was used, where “B” represents the beginning part of the entity, “I” represents the internal part of the entity, and “O” represents the non-entity part. Each entity type has its corresponding B and I labels for labeling the beginning and internal parts of the entity. For example, for the disease entity “Type 2 Diabetes”, “Type 2” will be labeled as “B-Disease” and “Diabetes” will be labeled as “I-Disease”. The specific named-entity types are shown in Table 3.
To create a high-quality corpus of diabetes texts, we adopted a random sampling and labeling approach and then divided the texts into a training set and a test set. Specifically, we randomly selected 364 diabetes texts from the corpus and labeled them in detail using the brat software. Then, we assigned 30% of the texts to the test set and the rest to the training set. Through this approach, we not only laid a solid foundation for subsequent data analysis and model training but also ensured the quality and consistency of the corpus. Table 4 shows the number of named entities. Following the above labeling method, the diabetes corpus was labeled. An example follows:
糖\O 尿\O 病\O 患\O 者\O, \O 无\O 论\O 病\O 情\O 的\O 严\O 重\O 程\O 度\O 如\O 何\O, \O 不\O 管\O 是\O 需\O 要\O 注\O 射\O 胰\B-Drug 岛\I-Drug 素\I-Drug 还\O 是\O 服\O 用\O 口\O 服\O 降\O 糖\O 药\O, \O 都\O 必\O 须\O 严\O 格\O 控\O 制\O 饮\O 食\O. \O 我\O 们\O 要\O 进\O 行\O 每\O 周\O 的\O 血\B-Test 糖\I-Test 测\O 试\O, \O 保\O 证\O 不\O 会\O 因\O 为\O 胰\B-Anatomy 岛\I-Anatomy 素\I-Anatomy 分\O 泌\O 不\O 足\O 导\O 致\O 身\O 体\O 的\O 各\O 项\O 机\O 能\O 受\O 伤\O.
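The conversion from brat annotations to character-level BIO tags like those in the example above can be sketched as follows. The sketch assumes the brat standoff layout in which each text-bound annotation line has the form `ID<TAB>Type start end<TAB>text`; the function names are hypothetical, and discontinuous or overlapping annotations are simply skipped, which may differ from how the corpus was actually processed.

```python
def parse_brat_entities(ann_text):
    """Read text-bound annotations ("T" lines) from brat standoff content."""
    entities = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue                      # skip relations, attributes, and notes
        parts = line.split("\t")
        if len(parts) < 2 or ";" in parts[1]:
            continue                      # malformed or discontinuous span: skipped here
        label, start, end = parts[1].split()
        entities.append((label, int(start), int(end)))
    return entities


def to_bio(text, entities):
    """Assign one BIO tag per character; an entity overlapping an earlier one is skipped."""
    tags = ["O"] * len(text)
    for label, start, end in sorted(entities, key=lambda e: e[1]):
        if any(tag != "O" for tag in tags[start:end]):
            continue
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags
```

For instance, an annotation covering characters 2 to 5 of “注射胰岛素” with type Drug yields the tags O, O, B-Drug, I-Drug, I-Drug.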

4.2. Candidate NER Models in Prototype

In this prototype, we have chosen two prevalent NER models, BiLSTM-CRF and BERT-BiLSTM-CRF, as the candidate models that are potentially not only accurate but also practical for real-world healthcare applications where computational resources might be limited.
The rationale behind the selection is that balancing model performance with computational efficiency is crucial for practical reasons. Many other complex models like RoBERTa, ALBERT, XLNet, ELECTRA [35], ERNIE [36], BioBERT, SciBERT, and T5 [37] are known for their high performance but also come with increased computational requirements and complexity. Thus, in this prototype, we have selected the most prevalent state-of-the-art models, but it is noteworthy to point out how the interoperable and modular nature of our system allows one to easily extend it with other different pre-trained models. In what follows, we present the two integrated NER models of choice.

4.2.1. BiLSTM-CRF

BiLSTM-CRF is a hybrid model that combines a BiLSTM with a CRF and has been widely used for Chinese NER tasks. As shown in Figure 4, the BiLSTM layer captures forward and backward dependencies within the text through its bidirectional structure and thus efficiently extracts the textual features of a Chinese sentence, whereas the CRF layer optimizes the labeling of the whole sequence by learning the dependencies between the output labels. This combination not only enhances the model’s ability to recognize Chinese named entities but also improves the accuracy and consistency of the annotation, making BiLSTM-CRF a powerful tool for text processing in the medical field.
In this model, all embeddings are randomly initialized and adjusted iteratively during training. The input of BiLSTM-CRF is the word embedding vector, and the output is the predicted label for each word. While it is possible to train BiLSTM-based NER models without a CRF layer, a CRF layer can impose constraints on the label sequence: a sentence must begin with “B-” or “O” rather than “I-”; the labels within one entity must belong to the same entity class (e.g., “B-Drug” may be followed by “I-Drug” but not by “I-Test”); and every named entity must start with “B-” rather than “I-”. These constraints can significantly reduce the number of invalid predicted sequences and improve the accuracy and robustness of the model.
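The label constraints described above can be written down as a simple validity check. The Python sketch below is only an illustration under the assumption of the B-/I-/O scheme from Table 3: a trained CRF learns such regularities as transition scores rather than applying hard rules, and the function names are hypothetical.

```python
def is_valid_transition(prev, curr):
    """Return True if label `curr` may follow label `prev` under the B-/I-/O scheme.

    `prev` is None at the start of a sentence.
    """
    if curr.startswith("I-"):
        # An I- label may not open a sentence or follow 'O' ...
        if prev is None or prev == "O":
            return False
        # ... and it must continue an entity of the same class.
        return prev[2:] == curr[2:]
    # 'O' and any B- label may follow anything.
    return True


def is_valid_sequence(labels):
    """Check every adjacent label pair of a predicted sequence."""
    prev = None
    for curr in labels:
        if not is_valid_transition(prev, curr):
            return False
        prev = curr
    return True


print(is_valid_sequence(["O", "B-Drug", "I-Drug", "O"]))  # well-formed
print(is_valid_sequence(["I-Drug", "O"]))                 # starts with I-
print(is_valid_sequence(["B-Drug", "I-Test"]))            # class changes inside an entity
```

A check of this kind can also be used to count how many invalid sequences a model without a CRF layer produces.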

4.2.2. BERT-BiLSTM-CRF

As shown in Figure 5, the BERT-BiLSTM-CRF model consists of three main layers: the BERT layer, the BiLSTM layer, and the CRF decoding layer. The BERT layer converts the characters in a sentence into vector representations. The BiLSTM layer then captures the contextual semantic features embedded in these vectors, and finally the CRF decoding layer performs entity recognition and segmentation by generating the globally optimal label sequence.
The BERT module of this hybrid model is a pre-trained language model introduced by Google in October 2018 that marked a milestone in natural language processing. It adopts a bidirectional Transformer architecture that relies on the attention mechanism for word representation and processing, as shown in Figure 6.
The input to BERT is a linear sequence consisting of two sentences (sentence A and sentence B) that are split by a separator, with special tokens added at the beginning and the end. Each token has three embeddings: the position embedding, which encodes the position of the token in the sequence; the token embedding, which represents the semantic information of the token; and the segment embedding, which indicates which of the two sentences the token belongs to. The sum of these three embeddings forms the input to BERT.
In the input sequence, the special symbol [CLS] is placed at the beginning, and its representation stands for the whole input sequence in subsequent classification tasks. The special symbol [SEP] separates the two sentences, which are further distinguished by assigning segment code A to the tokens of sentence A and segment code B to the tokens of sentence B. In addition, positional encoding indicates the specific position of a token in the sequence; it is an important component of the Transformer architecture because it helps the model perceive the positional relationships between tokens.
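The layout of this input sequence can be sketched in a few lines of Python. The sketch assumes character-level tokenization of Chinese text and uses 0 and 1 for the segment codes A and B; it builds only the token, segment, and position indices (the embeddings themselves are looked up inside BERT), and the function name is hypothetical.

```python
def build_bert_input(sentence_a, sentence_b):
    """Build tokens, segment ids, and position ids for a sentence pair."""
    tokens = ["[CLS]"] + list(sentence_a) + ["[SEP]"] + list(sentence_b) + ["[SEP]"]
    # Segment code A (0) covers [CLS], sentence A, and the first [SEP];
    # segment code B (1) covers sentence B and the final [SEP].
    segment_ids = [0] * (len(sentence_a) + 2) + [1] * (len(sentence_b) + 1)
    # Position ids simply enumerate the tokens in order.
    position_ids = list(range(len(tokens)))
    return tokens, segment_ids, position_ids


tokens, segment_ids, position_ids = build_bert_input("血糖", "胰岛素")
for tok, seg, pos in zip(tokens, segment_ids, position_ids):
    print(pos, seg, tok)
```

For single-sentence tasks such as NER, sentence B is typically omitted, in which case all tokens carry segment code A.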

4.3. Evaluation Metrics

To assess the performance of the models, we use four classical metrics: (i) accuracy, (ii) precision, (iii) recall, and (iv) F1 score. Accuracy, the most basic metric, is the ratio of correctly predicted cases to the total number of cases and gives an intuitive picture of the model’s effectiveness. With TP, TN, FP, and FN denoting the numbers of true positives, true negatives, false positives, and false negatives, respectively, it is defined as:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$
Additionally, precision and recall, which can be considered complementary measures, assess the model’s precision and comprehensiveness in positive category prediction, respectively. Precision is the ratio of correct positive predictions to the total predicted positives and is defined as:
$$\mathrm{Precision} = \frac{TP}{TP + FP},$$
Recall, also known as sensitivity, measures the proportion of actual positives that were correctly identified and is calculated as:
$$\mathrm{Recall} = \frac{TP}{TP + FN},$$
The F1 score combines precision and recall into a single number and thus provides a basis for evaluating how well the model balances the two. It is the harmonic mean of precision and recall and is calculated as:
$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$
By integrating these metrics, the Model Selection Service can comprehensively assess the multidimensional performance of the model, ensuring the thoroughness and comprehensiveness of the evaluation process and providing a sound recommendation to the user. Pragmatically, in this study, we used the scikit-learn library [38] to calculate these key metrics.
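As a worked illustration of the four definitions, the following plain-Python sketch computes them from confusion-matrix counts. It does not reproduce the scikit-learn calls used in the study, and the counts passed in at the bottom are made-up numbers chosen only to keep the arithmetic easy to follow.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, precision, recall, f1


# Hypothetical counts for one entity class.
acc, prec, rec, f1 = classification_metrics(tp=80, tn=900, fp=20, fn=20)
print(f"accuracy={acc:.4f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```

With these counts, precision and recall are both 80/100 = 0.8, so their harmonic mean is also 0.8, while accuracy is 980/1020 ≈ 0.96 because the many true negatives dominate the ratio.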

4.4. Experiments Configuration and Parameters Tuning

The specific configurations used in the experiment are shown in Table 5, while Table 6 depicts the parameters used in our experiments for the BERT pre-trained model.
Different parameter settings can also affect model performance. Therefore, during the experiments, we repeatedly adjusted the parameters while analyzing the results in order to determine the final settings. This process was employed to ensure the accuracy and reliability of the model, in line with practice reported in various contexts in the literature [39,40]. The final parameters obtained by this process are shown in Table 7.
The tuning process was accomplished using a grid search technique, which identifies the best parameter settings by systematically traversing a series of predefined parameter combinations. In this way, the prototype could automatically search the parameter space to identify the optimized parameters, which in turn improves the model’s performance and generalization capabilities. The tuning of some key parameters is illustrated as follows.
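The idea of a grid search can be sketched as follows. This is a minimal illustration rather than the prototype's implementation: `grid_search` and `toy_evaluate` are hypothetical names, the candidate values are examples, and the toy objective merely stands in for "train the model with these parameters and return a validation score".

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in `param_grid` and return the best (params, score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_evaluate(params):
    """Placeholder objective; a real run would train a model and return, e.g., F1."""
    return -abs(params["learning_rate"] - 1e-3) * 100 + params["batch_size"] / 1000


grid = {
    "learning_rate": [1e-5, 1e-4, 1e-3],
    "batch_size": [20, 40, 60, 80, 100],
}
best_params, best_score = grid_search(grid, toy_evaluate)
print(best_params, best_score)
```

Because the search is exhaustive, its cost grows with the product of the number of candidate values per parameter, which is why the grid is usually kept small.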
The learning rate was initially set to 1 × 10⁻⁵. It was then gradually raised to 1 × 10⁻⁴ and finally settled at 1 × 10⁻³, at which the model demonstrated high accuracy. In addition to the learning rate, several other key parameters were optimized. For example, the target number of categories, which defines the number of categories or features that the model needs to recognize, was set to 31. The training period of the model was set to 25 to ensure that the model can fully utilize the entire dataset for performance optimization. Further, the size of the hidden layer was set to 256 units, a setting that directly affects the efficiency of the model in processing information. To cope with input sequences of different lengths, ‘PAD’ was employed as a filler, and ‘UNK’ was used to represent unknown words.
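The role of the ‘PAD’ and ‘UNK’ symbols can be illustrated with a small Python sketch. It assumes a character-level vocabulary and a fixed maximum length; the index values, function names, and sample strings are illustrative choices, not details taken from the prototype.

```python
PAD, UNK = "PAD", "UNK"


def build_vocab(sentences):
    """Map each character seen in training to an integer id; reserve 0 and 1."""
    vocab = {PAD: 0, UNK: 1}
    for sentence in sentences:
        for ch in sentence:
            vocab.setdefault(ch, len(vocab))
    return vocab


def encode(sentence, vocab, max_len):
    """Convert a sentence to ids, truncating or padding it to `max_len`."""
    # Characters missing from the vocabulary fall back to the UNK id.
    ids = [vocab.get(ch, vocab[UNK]) for ch in sentence[:max_len]]
    # Shorter sentences are filled up with the PAD id.
    ids += [vocab[PAD]] * (max_len - len(ids))
    return ids


vocab = build_vocab(["血糖", "胰岛素"])
encoded = encode("血糖高", vocab, max_len=5)
print(encoded)
```

Here the character 高 does not occur in the toy vocabulary, so it is mapped to the ‘UNK’ id, and the two remaining positions are filled with the ‘PAD’ id.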
The impact of the number of training epochs on performance is analyzed automatically. The number of training epochs is the number of complete passes over all training samples, and it is related to both the adequacy of model training and the risk of overfitting [41]. Too few epochs may lead to insufficient learning, while too many may lead to overfitting. In this process, the variation in model performance is monitored by fixing the other parameters and varying the number of training epochs. Table 8 gives an example of the results for different numbers of training epochs, i.e., 10, 20, 30, 40, 50, 60, 70, 80, 90, and 99, showing that the model achieves the highest precision, recall, and F1 at 10 epochs.
The batch size is the number of samples the model processes in a single training step. A larger batch size yields a more accurate gradient estimate and reduces oscillations but may also cause the model to fall into a local optimum; conversely, a smaller batch size introduces more randomness and can make convergence harder, although in some cases it improves model performance [42]. In this prototype, the optimal batch size was identified by setting it to 20, 40, 60, 80, and 100 and comparing the results. As shown in Figure 7, the model achieved its best performance with a batch size of 100, so the system selected it as the final value.

4.5. Comparison Results

The optimized BERT-BiLSTM-CRF model is compared with the basic BiLSTM-CRF model and the learning rate decay strategy, which are directly connected. The comparison results are shown in Figure 8, where panels (a) to (c) compare BERT-BiLSTM-CRF with BiLSTM-CRF in terms of precision, recall, and F1 score, respectively. BERT-BiLSTM-CRF outperforms BiLSTM-CRF on these measures and yields better recognition results, but it requires a large amount of training text to do so.
Figure 9 compares the two models (BERT-BiLSTM-CRF and BiLSTM-CRF) under three averaging schemes (micro-averaging, macro-averaging, and weighted averaging). BERT-BiLSTM-CRF outperforms BiLSTM-CRF under micro-averaging and weighted averaging, while the two models perform similarly under macro-averaging. This indicates that BERT-BiLSTM-CRF performs better overall and when classes are weighted by their frequency, whereas the two models are comparable when every class counts equally.
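The difference between the three averaging schemes is easiest to see on a small example. The following Python sketch computes micro-, macro-, and weighted-averaged F1 from per-class counts, following the usual definitions (micro pools the counts, macro averages the per-class scores equally, weighted averages them by class support). The per-class counts are made-up numbers, not results from this study.

```python
def f1_from_counts(tp, fp, fn):
    """F1 score from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def averaged_f1(counts):
    """counts maps each class to (tp, fp, fn); returns (micro, macro, weighted) F1."""
    per_class = {label: f1_from_counts(*c) for label, c in counts.items()}
    # Support = number of true instances of the class (tp + fn).
    support = {label: tp + fn for label, (tp, fp, fn) in counts.items()}

    macro = sum(per_class.values()) / len(per_class)
    weighted = sum(per_class[k] * support[k] for k in counts) / sum(support.values())

    total_tp = sum(c[0] for c in counts.values())
    total_fp = sum(c[1] for c in counts.values())
    total_fn = sum(c[2] for c in counts.values())
    micro = f1_from_counts(total_tp, total_fp, total_fn)
    return micro, macro, weighted


# Hypothetical counts: a frequent class recognized well, a rare class recognized poorly.
counts = {"Disease": (90, 10, 10), "Drug": (10, 10, 10)}
micro, macro, weighted = averaged_f1(counts)
print(f"micro={micro:.3f} macro={macro:.3f} weighted={weighted:.3f}")
```

In this toy case the per-class F1 scores are 0.9 and 0.5, so the macro average is 0.7, whereas the micro and weighted averages are both about 0.833 because the frequent class dominates them. A model can therefore lead on micro and weighted averages while being level on the macro average if its gains are concentrated in frequent classes.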
In this prototype, a performance comparison between BiLSTM-CRF and BERT-BiLSTM-CRF was conducted by the Model Selection Service to evaluate the effectiveness of the pre-trained models in the Chinese medical NER task in the diabetes context, aiming to make a recommendation to the user. Precision, recall, and F1 have been used as the evaluation metrics, as shown in Table 9.
By comparing the performance of the BERT-BiLSTM-CRF and BiLSTM-CRF models in the Chinese NER task, we find that including the pre-trained BERT model significantly improves the recognition results. Specifically, the BERT-BiLSTM-CRF model shows high precision, recall, and F1 scores for the entity categories of disease names, body parts, drugs, and test names, with F1 scores of 89% for disease names, 84% for body part names, 85% for drug names, and 86% for test names, all of which are significantly higher than those of the BiLSTM-CRF model. The weighted average F1 scores indicate that the overall performance of the BERT-BiLSTM-CRF model (78%) is also superior to that of the BiLSTM-CRF model (67%). These results emphasize the importance of incorporating BERT into the BiLSTM-CRF structure to improve the accuracy of NER tasks, validating the value of DL techniques, especially BERT, in medical text analysis.
In this case study, based on the result of the performed analyses and evaluations, the Model Selection Service would recommend BERT-BiLSTM-CRF to the user as the default option. Furthermore, while in this case study we only considered two models, the interested user could also decide to employ other models we have deployed in the model repository, and the Model Selection Service would recommend the optimal one.

5. User Layer: Simple User Interaction

To enhance usability, the user layer and the GUI of the prototype have been kept as simple as possible. Users can upload a text file or enter a single piece of text (see Figure 10 for an example). The user interface then interacts asynchronously with the Model Selection Service, which carries out the model comparison in a dedicated backend, and the recommended model is shown as the default option. The healthcare expert involved in the study can then apply some customizations, such as changing the model and selecting the named entities to be displayed (see Figure 11 for an example). Specifically, the healthcare expert is presented with a list of pre-trained NER models, each accompanied by a brief description of its strengths and suitable applications. For example, models might be labeled as “Best for Diabetes-related Entities” or “Optimized for Drug Names”. Such guidance helps users make informed choices without needing to understand the technical details of each model. In addition, users can choose to display only medical entities or all entities, including general ones such as locations. Once everything is confirmed, the user can trigger the analysis, and DLaaS will perform the NER task.
The result is generated at the backend in plain text format, and an example of such a result is depicted in Figure 12.
In our proposed prototype, we have developed the visualization feature using classic Web technologies such as HTML, CSS, and JavaScript and enhanced it with the modern frontend framework Vue.js to improve the user interaction experience. After processing is completed, the user sees the visualization on the page, with the option to download the output in TXT, CSV, PDF, or JSON format to facilitate further analysis and integration with other healthcare tools. If multiple documents are analyzed, the user can click on each document to check its recognition results, as shown in Figure 13, where different entity labels are highlighted in different colors within one document. Color-coding different concept classes (e.g., diseases, symptoms, medications) helps medical professionals quickly identify and interpret key information in the text. This visual aid reduces cognitive load and allows for faster insight generation. Furthermore, the system provides a brief explanation of each class.
Combining the frontend and backend, we are now able to answer the research questions proposed in Section 1.2, thus supporting the efficacy of the MedNER framework as well as the functionality of the proposed prototype. As far as RQ 1 is concerned, MedNER offers a systematic method that helps non-IT experts, such as clinicians and healthcare administrators, utilize advanced NER capabilities for clinical decision support and data analysis without requiring in-depth AI knowledge. Regarding RQ 2, as mentioned in Section 4, a healthcare expert was involved in the prototype validation and confirmed that the system has sound usability and produces very accurate results. This helps in assessing the effectiveness of our proposed system, as well as its general usability.
In light of these considerations, we observe that the MedNER framework can successfully balance accuracy and usability, resulting in a rewarding choice for healthcare providers and practitioners.

6. Conclusions

In this paper, we introduced MedNER, a service-oriented framework designed to enhance Named Entity Recognition (NER) for Chinese medical texts. MedNER leverages the concept of “Deep Learning as a Service” (DLaaS) to encapsulate multiple pre-trained NER models, providing a flexible, scalable, and user-friendly solution for healthcare providers. By addressing the specific challenges of Chinese medical text processing, particularly with an example focus on diabetes-related data, MedNER demonstrates significant advancements in the field of medical NER.
Note that in many applications, NER is the first step, followed by tools for normalizing concepts, recognizing relationships, and constructing knowledge graphs. For example, such tools can normalize different terms referring to the same drug (e.g., ‘metformin’ and ‘Glucophage’), identify relationships between symptoms and diseases, and build a knowledge representation that supports expert systems and clinical decision support tools. Even on its own, however, the MedNER system can help answer specific medical questions, such as identifying the most frequently mentioned symptoms in a set of patient records or tracking the frequency of certain drug prescriptions. These insights can inform clinical decision-making and policy development. Thanks to the SOA concept, the system can be extended to support more downstream services, e.g., statistical analysis, integration with other medical databases to support diagnosis, building patient profiles, and constructing knowledge graphs for a fully functional expert system.
The key contributions of this work include the development of a systematic, modular framework that integrates various NER models, balancing accuracy and efficiency in recognizing complex medical entities. The service-oriented architecture ensures that the system is adaptable, allowing for seamless integration with existing healthcare IT infrastructures and facilitating easy updates and maintenance. This design also supports interoperability, enabling the framework to work across different platforms and systems.
MedNER’s user-friendly interface is particularly beneficial for non-IT experts, such as clinicians and healthcare administrators, who can now utilize advanced NER capabilities without extensive technical knowledge. The intuitive design, combined with robust data visualization tools, enhances the practical application of NER in clinical decision support and comprehensive data analysis.
By addressing the research gaps, this MedNER framework not only improves the extraction and analysis of medical entities but also ensures that these advanced tools are accessible and usable by healthcare professionals, ultimately contributing to better patient care and more informed clinical decisions. Future work will focus on expanding the framework to cover a broader range of medical texts and further refining the comparison and integration of more models and user interfaces based on ongoing feedback from healthcare practitioners. In addition, while our current research is centered on the Chinese language due to its specific linguistic intricacies within the medical domain, it should be noted that multilingual capabilities are essential in the broader scope of health informatics. We are committed to considering language expansion in our future work, ensuring that our system remains adaptable and responsive to the diverse needs of the global healthcare community.

Author Contributions

Conceptualization, W.C. and F.C.; methodology, W.C. and F.C.; software, P.Q.; validation, W.C. and P.Q.; formal analysis, W.C. and P.Q.; investigation, F.C.; resources, W.C. and F.C.; data curation, P.Q.; writing—original draft preparation, P.Q.; writing—review and editing, W.C. and F.C.; visualization, W.C. and P.Q.; supervision, W.C. and F.C.; project administration, W.C. and F.C.; funding acquisition, W.C. and F.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Fujian Province, China (Grant No. 2022J05291).

Data Availability Statement

The data used in testing the prototype are sourced from Alibaba Cloud and are available via https://tianchi.aliyun.com/competition/entrance/231687/information (accessed on 17 July 2024), and the code used in the prototype is available via https://github.com/sunsh1neboys/MedNER (accessed on 17 July 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Jehangir, B.; Radhakrishnan, S.; Agarwal, R. A survey on Named Entity Recognition—Datasets, tools, and methodologies. Nat. Lang. Process. J. 2023, 3, 100017.
  2. Zhao, W.; Wu, X. Boosting Entity-Aware Image Captioning With Multi-Modal Knowledge Graph. IEEE Trans. Multimed. 2024, 26, 2659–2670.
  3. Al-Moslmi, T.; Ocaña, M.G.; Opdahl, A.L.; Veres, C. Named Entity Extraction for Knowledge Graphs: A Literature Overview. IEEE Access 2020, 8, 32862–32881.
  4. Zhang, Y.; Zhang, H. FinBERT–MRC: Financial Named Entity Recognition Using BERT Under the Machine Reading Comprehension Paradigm. Neural Process. Lett. 2023, 55, 7393–7413.
  5. Wang, C.; Wang, H.; Zhuang, H.; Li, W.; Han, S.; Zhang, H.; Zhuang, L. Chinese medical named entity recognition based on multi-granularity semantic dictionary and multimodal tree. J. Biomed. Inform. 2020, 111, 103583.
  6. Peng, H.; Zhang, Z.; Liu, D.; Qin, X. Chinese medical entity recognition based on the dual-branch TENER model. BMC Med. Inform. Decis. Mak. 2023, 23, 136.
  7. Li, J.; Wei, Q.; Ghiasvand, O.; Chen, M.; Lobanov, V.; Weng, C.; Xu, H. A comparative study of pre-trained language models for named entity recognition in clinical trial eligibility criteria from multiple corpora. BMC Med. Inform. Decis. Mak. 2022, 22, 235.
  8. Ashrafi, I.; Mohammad, M.; Mauree, A.S.; Nijhum, G.M.A.; Karim, R.; Mohammed, N.; Momen, S. Banner: A Cost-Sensitive Contextualized Model for Bangla Named Entity Recognition. IEEE Access 2020, 8, 58206–58226.
  9. Yu, Y.; Wang, Y.; Mu, J.; Li, W.; Jiao, S.; Wang, Z.; Lv, P.; Zhu, Y. Chinese mineral named entity recognition based on BERT model. Expert Syst. Appl. 2022, 206, 117727.
  10. Chen, W.; Rabhi, F.; Liao, W.; Al-Qudah, I. Leveraging State-of-the-Art Topic Modeling for News Impact Analysis on Financial Markets: A Comparative Study. Electronics 2023, 12, 2605.
  11. Nasar, Z.; Jaffry, S.W.; Malik, M.K. Named Entity Recognition and Relation Extraction: State-of-the-Art. ACM Comput. Surv. 2021, 54, 1–39.
  12. Yu, J.; Yang, X.; Chen, X. A text analysis model based on Probabilistic-KG. In Proceedings of the 2022 IEEE International Conference on Networking, Sensing and Control (ICNSC), Shanghai, China, 15–18 December 2022; pp. 1–6.
  13. Ju, Z.; Wang, J.; Zhu, F. Named Entity Recognition from Biomedical Text Using SVM. In Proceedings of the 2011 5th International Conference on Bioinformatics and Biomedical Engineering, Chongqing, China, 10–12 May 2011; pp. 1–4.
  14. Ahmed, I.; Sathyaraj, R. Named entity recognition by using maximum entropy. Int. J. Database Theory Appl. 2015, 8, 43–50.
  15. Morwal, S.; Jahan, N.; Chopra, D. Named entity recognition using hidden Markov model (HMM). Int. J. Nat. Lang. Comput. 2012, 1, 4.
  16. Liu, M.; Tu, Z.; Zhang, T.; Su, T.; Xu, X.; Wang, Z. LTP: A New Active Learning Strategy for CRF-Based Named Entity Recognition. Neural Process. Lett. 2022, 54, 2433–2454.
  17. Chowdhury, S.; Dong, X.; Qian, L.; Li, X.; Guan, Y.; Yang, J.; Yu, Q. A multitask bi-directional RNN model for named entity recognition on Chinese electronic medical records. BMC Bioinform. 2018, 19, 499.
  18. Dong, X.; Chowdhury, S.; Qian, L.; Li, X.; Guan, Y.; Yang, J.; Yu, Q. Deep learning for named entity recognition on Chinese electronic medical records: Combining deep transfer learning with multitask bi-directional LSTM RNN. PLoS ONE 2019, 14, e0216046.
  19. An, Y.; Xia, X.; Chen, X.; Wu, F.-X.; Wang, J. Chinese clinical named entity recognition via multi-head self-attention based BiLSTM-CRF. Artif. Intell. Med. 2022, 127, 102282.
  20. Sun, C.; Yang, Z.; Wang, L.; Zhang, Y.; Lin, H.; Wang, J. Biomedical named entity recognition using BERT in the machine reading comprehension framework. J. Biomed. Inform. 2021, 118, 103799.
  21. Wu, G.; Tang, G.; Wang, Z.; Zhang, Z.; Wang, Z. An attention-based BiLSTM-CRF model for Chinese clinic named entity recognition. IEEE Access 2019, 7, 113942–113949.
  22. Wu, Y.; Huang, J.; Xu, C.; Zheng, H.; Zhang, L.; Wan, J. Research on Named Entity Recognition of Electronic Medical Records Based on RoBERTa and Radical-Level Feature. Wirel. Commun. Mob. Comput. 2021, 2021, 2489754.
  23. Zhao, P.; Wang, W.; Liu, H.; Han, M. Recognition of the Agricultural Named Entities With Multifeature Fusion Based on ALBERT. IEEE Access 2022, 10, 98936–98943.
  24. Chai, Z.; Jin, H.; Shi, S.; Zhan, S.; Zhuo, L.; Yang, Y. Hierarchical shared transfer learning for biomedical named entity recognition. BMC Bioinform. 2022, 23, 1–14.
  25. Brandsen, A.; Verberne, S.; Lambers, K.; Wansleeben, M. Can BERT Dig It? Named Entity Recognition for Information Retrieval in the Archaeology Domain. J. Comput. Cult. Herit. 2022, 15, 1–18.
  26. Hou, J.; Saad, S.; Omar, N. Enhancing traditional Chinese medical named entity recognition with Dyn-Att Net: A dynamic attention approach. PeerJ Comput. Sci. 2024, 10, e2022.
  27. Fu, L.; Weng, Z.; Zhang, J.; Xie, H.; Cao, Y. MMBERT: A unified framework for biomedical named entity recognition. Med. Biol. Eng. Comput. 2024, 62, 327–341.
  28. Chen, P.; Zhang, M.; Yu, X.; Li, S. Named entity recognition of Chinese electronic medical records based on a hybrid neural network and medical MC-BERT. BMC Med. Inform. Decis. Mak. 2022, 22, 315.
  29. Han, J.; Kamber, M. Data Mining: Concepts and Techniques; Morgan Kaufmann: Burlington, MA, USA, 2006.
  30. Verlinde, P. Error Detecting and Correcting Codes. In Encyclopedia of Information Systems; Bidgoli, H., Ed.; Elsevier: New York, NY, USA, 2003; pp. 203–228.
  31. Bhattacharjee, B.; Boag, S.; Doshi, C.; Dube, P.; Herta, B.; Ishakian, V.; Jayaram, K.R.; Khalaf, R.; Krishna, A.; Li, Y.B.; et al. IBM Deep Learning Service. IBM J. Res. Dev. 2017, 61, 10:1–10:11.
  32. Chen, W.; El Majzoub, A.; Al-Qudah, I.; Rabhi, F.A. A CEP-driven framework for real-time news impact prediction on financial markets. Serv. Oriented Comput. Appl. 2023, 17, 129–144.
  33. Ehsan, A.; Abuhaliqa, M.A.M.E.; Catal, C.; Mishra, D. RESTful API Testing Methodologies: Rationale, Challenges, and Solution Directions. Appl. Sci. 2022, 12, 4369.
  34. Brat. Brat Rapid Annotation Tool. 2022. Available online: https://brat.nlplab.org/ (accessed on 17 July 2024).
  35. Fu, Y.; Bu, F. Research on Named Entity Recognition Based on ELECTRA and Intelligent Face Image Processing. In Proceedings of the 2021 IEEE International Conference on Emergency Science and Information Technology (ICESIT), Chongqing, China, 22–24 November 2021; pp. 781–786.
  36. Wang, Y.; Sun, Y.; Ma, Z.; Gao, L.; Xu, Y. An ERNIE-based joint model for Chinese named entity recognition. Appl. Sci. 2020, 10, 5711.
  37. Tavan, E.; Najafi, M. MarSan at SemEval-2022 task 11: Multilingual complex named entity recognition using T5 and transformer encoder. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), Online, 14–15 July 2022; pp. 1639–1647.
  38. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
  39. Agrawal, T. Hyperparameter Optimization in Machine Learning: Make Your Machine Learning and Deep Learning Models More Efficient; Springer: Berlin/Heidelberg, Germany, 2021.
  40. Ippolito, P.P. Hyperparameter Tuning. In Applied Data Science in Tourism: Interdisciplinary Approaches, Methodologies, and Applications; Egger, R., Ed.; Springer International Publishing: Cham, Switzerland, 2022; pp. 231–251.
  41. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning; Springer: Berlin/Heidelberg, Germany, 2013; Volume 112.
  42. Takase, T. Dynamic batch size tuning based on stopping criterion for neural network training. Neurocomputing 2021, 429, 1–11.
Figure 1. The MedNER framework.
Figure 2. Encapsulation of key services using RESTful API.
Figure 3. Example of the diabetes-related corpus (Chinese and its English translation).
Figure 4. The BiLSTM-CRF model.
Figure 5. Structure of the BERT-BiLSTM-CRF model.
Figure 6. BERT embedding layer.
Figure 7. Batch size comparison experiment.
Figure 8. Comparison of entity-level results against three metrics: (a) precision; (b) recall; (c) F1 score.
Figure 9. Performance comparison between BERT-BiLSTM-CRF and BiLSTM-CRF against three metrics: (a) precision; (b) recall; (c) F1 score.
Figure 10. Data typing or upload.
Figure 11. Model recommendation and user customization (note: the “Text” column shows Chinese medical texts).
Figure 12. Example of result at the backend (note: the original Chinese medical text is shown on the top of the figure, followed by NER classification result throughout the text, and the recognized named entities).
Figure 13. Example of result at the frontend (note: this is an example of Chinese medical text, with different named entities highlighted in various colours).
Table 1. Overview of prevalent NER models.
| Model | Description | Advantages | Disadvantages |
| --- | --- | --- | --- |
| BERT [20] | Pre-trained language model based on Transformers, suitable for various NLP tasks. | Handles long-range dependencies and adapts to multiple tasks. | Training and inference are time-consuming and resource-intensive. |
| BiLSTM-CRF [21] | Combination of Bidirectional LSTM and Conditional Random Fields for sequence tagging tasks. | Captures contextual information and optimizes sequence labeling. | May be less efficient for processing long sequences. |
| RoBERTa [22] | An improved version of BERT with larger training data and longer training duration. | Enhances BERT’s performance, especially in text classification and QA tasks. | High resource requirements. |
| ALBERT [23] | Lightweight BERT variant with parameter sharing and factorized embeddings to reduce model size. | Significantly reduces parameter count while maintaining high performance. | Despite fewer parameters, computational complexity remains high. |
| XLNet [24] | Utilizes autoregressive and self-attention mechanisms to capture bidirectional context. | Excels in handling long texts. | High training complexity. |
Table 2. Representative NER methods in Chinese medical text studies.
| Literature | Model | Key Findings |
|---|---|---|
| Dong et al. (2019) [18] | Multitask bi-directional RNN | Deep transfer learning is leveraged for the purpose of knowledge transfer and data augmentation in the context of limited data. |
| Wang et al. (2020) [5] | MSD_DT_NER | Improved recognition accuracy for Chinese medical texts using multi-granularity semantic dictionary and multimodal tree, combining vocabulary information and position information. |
| Li et al. (2022) [7] | BioBERT, BlueBERT, PubMedBERT, SciBERT | A comparative study has been conducted, and PubMedBERT outperformed other pre-trained models in clinical trial eligibility criteria recognition. |
| An et al. (2022) [19] | MUSA-BiLSTM-CRF | Multi-head self-attention-based BiLSTM-CRF model with an improved character-level feature representation method combining character embedding and character-label embedding achieved superior performance in Chinese clinical NER tasks. |
| Chen et al. (2022) [28] | MC-BERT | Combining BERT with BiLSTM, a CNN, and a CRF layer showed significant improvements in recognizing medical entities in Chinese electronic medical records. |
| Peng et al. (2023) [6] | Dual-branch TENER | TENER divides the NER task into two-branch tasks, focusing on entity boundary and type recognition, integrating medical entity dictionary information and Chinese radicals features for improved performance. TENER achieved the best F1 scores on various datasets. |
Table 3. Diabetes named-entity types.
| Entity Type | Prefixed Label | Non-Prefixed Label | Example |
|---|---|---|---|
| Disease Name | B-Disease | I-Disease | Diabetes, hyperglycemia |
| Body Part Name | B-Anatomy | I-Anatomy | Pancreas, blood vessels |
| Drug Name | B-Drug | I-Drug | Insulin, metformin |
| Test Name | B-Test | I-Test | Glucose measurement, HbA1c test |
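To illustrate how the labels in Table 3 map back to named entities, the following is a minimal sketch of decoding a character-level tag sequence into entity spans. It is not the authors' implementation: the function name `decode_bio`, the sample sentence, and the use of an `O` tag for characters outside any entity are assumptions based on the common BIO tagging convention.

```python
from typing import List, Optional, Tuple


def decode_bio(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group B-/I- tagged characters into (entity_text, entity_type) pairs.

    Hypothetical helper: assumes a "B-<Type>" tag opens an entity,
    "I-<Type>" continues an entity of the same type, and any other tag
    (e.g. "O", an assumption here) closes the current entity.
    """
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Store the entity collected so far, if there is one.
        if current and current_type is not None:
            entities.append(("".join(current), current_type))

    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            flush()
            current, current_type = [ch], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current.append(ch)
        else:
            # Outside tag, or an I- tag that does not continue the open entity.
            flush()
            current, current_type = [], None
    flush()
    return entities


# Illustrative sentence (not taken from the dataset).
chars = list("糖尿病患者服用胰岛素")
tags = ["B-Disease", "I-Disease", "I-Disease", "O", "O", "O", "O",
        "B-Drug", "I-Drug", "I-Drug"]
print(decode_bio(chars, tags))
```

In this sketch, an `I-` tag that does not follow a matching `B-` tag is discarded rather than repaired; a production service might choose a more lenient policy.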
Table 4. Number of named entities.
| Entity Name (Chinese) | Entity Name (English) | Training Set | Test Set |
|---|---|---|---|
| 测试 | Test | 126,529 | 64,995 |
| 疾病 | Disease | 107,877 | 46,482 |
| 身体部位 | Anatomy | 78,664 | 35,544 |
| 药物 | Drug | 47,313 | 17,800 |
Table 5. Experimental configuration.
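| Category | Item | Configuration |
|---|---|---|
| Hardware | CPU | Apple M1, 8 cores |
| Hardware | RAM | 8 GB |
| Hardware | GPU | Apple M1 |
| Software | Operating System | macOS Sonoma 14.2 |
| Software | Python | 3.8.18 |
| Software | PyTorch | 2.1.0 |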
Table 6. BERT pre-trained model parameters.
| Model Name | Embedding Dimension | Maximum Positional Embedding | Hidden Layer Size |
|---|---|---|---|
| BERT-base-chinese | 768 | 512 | 768 |
Table 7. The final parameters employed in our model.
| Parameter Name | Parameter Value |
|---|---|
| Target size | 31 |
| Learning rate | 1 × 10⁻³ |
| Training period | 25 |
| Hidden layer size | 256 |
| Filled words | <PAD> |
| Unknown word | <UNK> |
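The `<PAD>` and `<UNK>` entries in Table 7 are the tokens used for filling and for unknown words. As a minimal sketch of how such tokens are typically used when preparing character sequences for a BiLSTM-based tagger, the snippet below maps characters to integer ids, falls back to `<UNK>` for unseen characters, and fills short sequences with `<PAD>`. The function names `build_vocab` and `encode`, the id assignment, and the `max_len` parameter are illustrative assumptions, not details taken from the MedNER prototype.

```python
from typing import Dict, List

PAD, UNK = "<PAD>", "<UNK>"  # filler and unknown-word tokens from Table 7


def build_vocab(sentences: List[str]) -> Dict[str, int]:
    """Assign an integer id to every character seen in the training text.

    Hypothetical helper: ids 0 and 1 are reserved for <PAD> and <UNK>.
    """
    vocab = {PAD: 0, UNK: 1}
    for sentence in sentences:
        for ch in sentence:
            vocab.setdefault(ch, len(vocab))
    return vocab


def encode(sentence: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Convert a sentence to a fixed-length id list.

    Characters missing from the vocabulary map to <UNK>; sequences longer
    than max_len are truncated and shorter ones are filled with <PAD>.
    """
    ids = [vocab.get(ch, vocab[UNK]) for ch in sentence[:max_len]]
    return ids + [vocab[PAD]] * (max_len - len(ids))


vocab = build_vocab(["糖尿病"])
print(encode("糖尿", vocab, max_len=4))  # known characters, then padding
print(encode("胰病", vocab, max_len=3))  # "胰" is unseen and maps to <UNK>
```

Fixed-length encoding of this kind lets sentences of different lengths be batched together; the padded positions are normally masked out when computing the loss.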
Table 8. Comparing the performance of different training epochs.
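| Epochs | Precision | Recall | F1 |
|---|---|---|---|
| 10 | 69% | 66% | 67% |
| 20 | 68% | 66% | 67% |
| 30 | 68% | 65% | 67% |
| 40 | 68% | 65% | 67% |
| 50 | 67% | 65% | 66% |
| 60 | 67% | 65% | 66% |
| 70 | 68% | 65% | 67% |
| 80 | 69% | 64% | 66% |
| 90 | 69% | 65% | 67% |
| 99 | 68% | 65% | 66% |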
Table 9. Model performance comparison on various entity types.
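| Model Name | Entity Name | Precision (%) | Recall (%) | F1 Score (%) |
|---|---|---|---|---|
| BERT-BiLSTM-CRF | Disease name | 88 | 90 | 89 |
| BERT-BiLSTM-CRF | Body part name | 82 | 85 | 84 |
| BERT-BiLSTM-CRF | Drug name | 84 | 85 | 85 |
| BERT-BiLSTM-CRF | Test name | 84 | 88 | 86 |
| BERT-BiLSTM-CRF | Weighted average | 77 | 79 | 78 |
| BiLSTM-CRF | Disease name | 80 | 78 | 79 |
| BiLSTM-CRF | Body part name | 55 | 56 | 56 |
| BiLSTM-CRF | Drug name | 71 | 70 | 71 |
| BiLSTM-CRF | Test name | 77 | 74 | 76 |
| BiLSTM-CRF | Weighted average | 68 | 66 | 67 |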
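The F1 scores in Tables 8 and 9 follow the standard definition of F1 as the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below applies that formula to one row of Table 9; the function name `f1_score` is illustrative and is not taken from the MedNER code base.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing was predicted correctly
    return 2 * precision * recall / (precision + recall)


# BERT-BiLSTM-CRF, disease name row of Table 9: P = 88, R = 90.
print(round(f1_score(88, 90), 2))
```

For this row the result rounds to the reported 89. Because the published precision and recall values are themselves rounded to whole percentages, recomputing F1 for other rows can differ from the reported figure by about one point.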
