1. Background
Pneumonia is one of the leading causes of hospitalization and death in older adults worldwide. It has been reported that approximately 80% of older adults hospitalized for pneumonia have “aspiration pneumonia” and that this proportion increases with age [1]. Aspiration pneumonia is mainly caused by oral bacteria entering the lungs through the trachea together with food and saliva as a result of deteriorating swallowing function.
Standard methods that have already been established to evaluate swallowing function include videofluoroscopy (VF) and videoendoscopy (VE). VF is the gold standard for evaluating swallowing function, wherein a subject swallows a liquid or a bolus containing a contrast agent while being observed under X-ray fluoroscopy. This technique can help in observing the dynamics of each organ involved in swallowing beyond the oral cavity and determining whether aspiration has occurred. In VE, a fiber scope is inserted from the nasal cavity to observe the movement of various organs involved in swallowing and identify the extent of food residues [
1,
2].
Both of these methods allow the easy evaluation of the entire swallowing process and provide useful information; however, they require expensive testing equipment and environments as well as the technical proficiency of the physicians conducting the test and cannot be easily performed in individuals at their home or in patients under home care [
1]. Furthermore, VF is associated with the risk of radiation exposure, contrast medium aspiration, and contrast agent allergies [
3,
4,
5]. In many cases, patients report pain and discomfort when the fiber scope is inserted for VE. These two methods are also characterized by a testing environment that is markedly different from the usual eating/dietary environment of individuals.
Conversely, methods for evaluating swallowing function that do not require specific equipment include questionnaires and the repetitive saliva swallowing test (RSST) [6,7]. In the RSST, the examiner palpates the thyroid cartilage of the subject and counts how many times the subject is able to swallow voluntarily during a 30 s period to assess the risk of aspiration. It is a simple test that does not require any equipment; however, in clinical practice, patients with reduced cognitive function, such as those with dementia, often have difficulty understanding questions or instructions, so the result depends on the attention and motivation of the patient and may not be accurate. Dementia is reported to be a predictor of the onset of aspiration pneumonia in older adults [8,9,10], and as the number of patients with dementia is increasing rapidly, there is an urgent need to establish an evaluation method that can be applied even to patients with dementia and aspiration pneumonia.
In recent years, studies have focused on non-invasive and simple methods for evaluating swallowing function, such as evaluation of the tongue, hyoid bone movement, and tongue pressure using ultrasonography, electromyography, and pressure sensors, as well as the assessment of mastication and swallowing using microphones [
11,
12,
13,
14]. All these methods are non-invasive and useful means for obtaining information, but we cannot rule out the possibility that the method of applying an ultrasonic probe to the mandible while swallowing or attaching a sensor to the skin near the oral cavity or larynx may interfere with the movement of various organs involved in swallowing or may cause discomfort. Additionally, when sensors are attached to the skin, errors may occur depending on the thickness of subcutaneous fat and the degree of excess skin and this may pose difficulties in patients with skin fragility [
15]. Furthermore, since examination circumstances that vary from routine environments can be the cause of confusion or stress for patients with dementia, it would be ideal for the evaluation of swallowing function to be performed in daily eating/dietary environments [
16,
17]. Although there have been many studies exploring simple methods to evaluate tongue movement, tongue pressure, and hyoid bone movement [
11,
12,
13,
14], a method that can easily evaluate soft palate movement remains to be established, despite the important role of the soft palate in preventing regurgitation of the food bolus into the nasal cavity during swallowing and in generating pressure to transport the food bolus [18].
To address this limitation, we have developed an earphone-type sensor to objectively measure swallowing function (
Figure 1). Most people have used earphones in their lives, and these do not interfere with chewing and swallowing motions. Therefore, compared with existing evaluation methods, we believe that earphone-type sensors will allow for easier assessment of swallowing under conditions that are closer to swallowing motions that normally occur during routine meals.
This study aimed to verify the validity of the earphone-type sensor by simultaneously recording measurements with VF, the gold standard for evaluating swallowing function, and drawing comparisons between both these approaches to determine whether the movement of the soft palate can be evaluated.
3. Results
A total of 30 swallowing data trials were performed for the six subjects and, of these, only 27 data trials were accepted for analysis. The first trial for subject No. 2 was excluded due to poor sensor mounting, the first trial for subject No. 4 was excluded because it was indiscernible, and the fourth trial of subject No. 6 was excluded for a switch-pressing error.
Figure 4A shows the waveform resulting from poor sensor mounting, while Figure 4B shows the waveform from data deemed indiscernible. The average swallowing times indicated by the subjects pressing the switches were 1.38, 3.10, 2.60, 2.32, 5.49, and 3.03 s for subjects No. 1 to No. 6, respectively.
Table 1 shows the differences (DA, DB, and DC) between the times of emergence of soft palate movement in VF footage (VA, VB, and VC) and the times of emergence of waveform movement based on the earphone-type sensor (SA, SB, and SC) in terms of average ± standard deviation (SD).
Figure 5 illustrates the total error of the sensor relative to VF for each subject, based on the results in Table 1. As shown in Table 1 and Figure 5, the smallest errors between the VF and sensor readings were observed for subject No. 3, with DA: 0.09 ± 0.07 s, DB: 0.13 ± 0.07 s, and DC: 0.12 ± 0.04 s. The largest errors were observed for subject No. 2, with DA: 0.50 ± 0.15 s, DB: 0.54 ± 0.12 s, and DC: 0.83 ± 0.30 s.
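As an illustration of how per-subject values such as those in Table 1 could be derived, the following minimal Python sketch computes the difference between a VF landmark time and the corresponding sensor landmark time for each trial and summarizes it as average ± SD. The function names and the sample times are hypothetical and are not study data; the use of the absolute difference and of the sample SD are assumptions, since neither convention is restated in this section.

```python
from statistics import mean, stdev


def landmark_errors(vf_times, sensor_times):
    """Absolute time difference (s) per trial between VF and sensor landmarks.

    Assumption: the error is taken as an absolute value.
    """
    if len(vf_times) != len(sensor_times):
        raise ValueError("VF and sensor lists must have the same length")
    return [abs(s - v) for v, s in zip(vf_times, sensor_times)]


def mean_sd(values):
    """Return (average, sample standard deviation) of a list of values."""
    return mean(values), stdev(values)


# Hypothetical landmark times (s) for five trials of one subject.
va = [0.50, 0.62, 0.55, 0.48, 0.60]  # VA read from the VF footage
sa = [0.58, 0.75, 0.60, 0.61, 0.66]  # SA read from the sensor waveform

da = landmark_errors(va, sa)
m, sd = mean_sd(da)
print(f"DA: {m:.2f} ± {sd:.2f} s")
```

The same two functions would apply unchanged to the VB/SB and VC/SC pairs.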
Table 2 shows the time taken (VI) for the soft palate to move from the lowest and most retracted position (VA) to the highest and most advanced position (VB), the time taken (VII) from VB to the point at which the soft palate descends again (VC), the time taken (VIII) from VA to VC, the time taken (SI) from the lowest point of the sensor wave (SA) to the highest point of the sensor wave (SB), the time taken (SII) from SB to the point at which the wave descends again (SC), and the time taken (SIII) from SA to SC, in terms of average ± SD.
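The three durations described for Table 2 are differences between consecutive landmark times, so the third is the sum of the first two. A minimal Python sketch under that reading is given below; the `Landmarks` class, the `durations` function, and the example times are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass
class Landmarks:
    """Landmark times (s) of one swallow, from VF (VA, VB, VC) or sensor (SA, SB, SC)."""
    a: float  # lowest / most retracted point
    b: float  # highest / most advanced point
    c: float  # point at which the movement or wave descends again


def durations(lm):
    """Return the durations (A to B, B to C, A to C) in seconds."""
    if not (lm.a <= lm.b <= lm.c):
        raise ValueError("landmarks must be in chronological order")
    return lm.b - lm.a, lm.c - lm.b, lm.c - lm.a


# Hypothetical VF landmark times for a single trial.
first, second, total = durations(Landmarks(a=1.00, b=1.38, c=2.06))
print(f"A to B: {first:.2f} s, B to C: {second:.2f} s, A to C: {total:.2f} s")
```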
Figure 6 shows the Bland–Altman plots created for VI and SI, VII and SII, and VIII and SIII based on the results in Table 2. Figure 6A shows VI and SI, Figure 6B shows VII and SII, and Figure 6C shows VIII and SIII. The plots were color-coded for each subject. For TI, the average difference was −0.01 ± 0.14 s and the 95% LOA was −0.28 to 0.28 s. The 95% confidence interval (CI) of the one-sample t-test was −0.06 to 0.05 s and the correlation coefficient of the Bland–Altman plot was −0.13 (p > 0.05), which meant there was neither fixed error nor proportional error. The MDC was 0.28 s. Of the 27 measurements, 17 (approximately 63%) had relative errors within ±30%. The average values were 0.38 ± 0.14 s for VI and 0.38 ± 0.10 s for SI. For TII, the average difference was −0.33 ± 0.23 s and the 95% LOA was −0.79 to 0.13 s. The 95% CI of the one-sample t-test was −0.42 to −0.24 s, indicating a fixed error, while the correlation coefficient of the Bland–Altman plot was −0.01 (p > 0.05), which meant there was no proportional error. Of the 27 measurements, 4 (approximately 15%) had relative errors within ±30%. For TIII, the average difference was −0.34 ± 0.31 s and the 95% LOA was −0.97 to 0.28 s. The 95% CI of the one-sample t-test was −0.46 to −0.21 s, indicating a fixed error, while the correlation coefficient of the Bland–Altman plot was −0.13 (p > 0.05), which meant there was no proportional error. Of the 27 measurements, 11 (approximately 40%) had relative errors within ±30%.
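The quantities reported above (average difference, 95% LOA, the CI used to judge fixed error, and the correlation used to judge proportional error) can be sketched in Python as follows. This is a generic Bland–Altman calculation under stated assumptions, not the analysis code of this study: the difference is taken as sensor minus VF (consistent with the negative values for TII), the LOA is the average ± 1.96 SD, a fixed error is flagged when the 95% CI of the average difference excludes zero, and a proportional error is flagged when the correlation between pair means and differences is significant at the 5% level. The function names and the six data pairs are hypothetical, and the t critical values are standard table values passed in by the caller.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bland_altman(vf, sensor, t_crit_bias, t_crit_r):
    """Summarize agreement between paired VF and sensor durations (s).

    t_crit_bias: two-sided 95% t value for n - 1 degrees of freedom.
    t_crit_r:    two-sided 95% t value for n - 2 degrees of freedom.
    """
    if len(vf) != len(sensor) or len(vf) < 3:
        raise ValueError("need at least three paired measurements")
    n = len(vf)
    diffs = [s - v for v, s in zip(vf, sensor)]        # sensor minus VF
    means = [(s + v) / 2 for v, s in zip(vf, sensor)]  # X-axis of the plot
    bias, sd = mean(diffs), stdev(diffs)
    half_ci = t_crit_bias * sd / math.sqrt(n)
    r = pearson_r(means, diffs)
    t_r = r * math.sqrt((n - 2) / (1 - r * r))         # t statistic for r
    return {
        "bias": bias,
        "sd": sd,
        "loa": (bias - 1.96 * sd, bias + 1.96 * sd),
        "ci": (bias - half_ci, bias + half_ci),
        "fixed_error": not (bias - half_ci <= 0 <= bias + half_ci),
        "r": r,
        "proportional_error": abs(t_r) > t_crit_r,
    }


# Hypothetical durations (s) for six swallows; not study data.
vi = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55]
si = [0.33, 0.32, 0.44, 0.42, 0.52, 0.54]

# t values for 6 pairs: 2.571 (5 degrees of freedom), 2.776 (4 degrees of freedom).
result = bland_altman(vi, si, t_crit_bias=2.571, t_crit_r=2.776)
print(result)
```

For 27 pairs, as in this study, the corresponding table values would be 2.056 (26 degrees of freedom) and 2.060 (25 degrees of freedom).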
4. Discussion
In this study, we examined the validity of the earphone-type sensor by simultaneously recording measurements with VF and compared these approaches to determine whether the movement of the soft palate can be evaluated. Of the total of 30 measurements obtained from six subjects, we were able to collect 27 swallowing data trials. To the best of our knowledge, the method used in this study is the first approach to evaluate swallowing function noninvasively from the ear canal, thus providing novel findings.
Regarding the mechanism by which soft palate movement could be measured with the earphone-type sensor, it is possible that the reflected light received by the sensor better captured the movement of the eardrum owing to the more slender shape of the sensor tip and its proximity to the eardrum. The distance from the entrance of the ear canal to the eardrum has been reported to be approximately 25 mm in adults [30], and the elastic part of the sensor was approximately 15 mm long, indicating that the tip was close to the eardrum. The eardrum is approximately 0.1 mm thick and moves in response to pressure changes from the Eustachian tube, the passage that connects the pharynx and the tympanic cavity. The Eustachian tube is primarily dilated by the tensor veli palatini muscle, while the soft palate is primarily elevated by the levator veli palatini muscle, which attaches to the Eustachian tube [31]. Since the smooth sequence of movements from elevation of the soft palate to dilation of the Eustachian tube is produced by the coordinated action of these palatal muscles, we consider that the pressure changes in the Eustachian tube caused by soft palate movement are reflected in the movement of the eardrum, and that our sensor was able to detect these movements.
In this study, the validity of the earphone-type sensor was analyzed by comparing the time of emergence of the movement and the time required for the movement, assuming that the sensor waveform corresponded to the time of the soft palate movement (
Figure 2). We applied the Bland–Altman analysis used for method-comparison studies to examine TI, TII, and TIII (
Figure 6). No plots were created for the DA, DB, and DC measurements themselves because the five trials of each subject were filmed and recorded continuously, so the measured times increased as the recording progressed; this would bias the averages of the measurement pairs on the X-axis of the graph. In addition, since each subject underwent five repeated measurements, subject-specific errors occurred; thus, we avoided making comparisons across all 27 measurement trials.
Figure 6A is the graph of TI. The average difference of TI was −0.01 s, and the plot showed both a positive and negative distribution around the 0 point of the Y-axis. Statistically, there were neither fixed errors nor proportional errors. The time of soft palate rising movements (equivalent to VI) reported in previous studies ranged from 0.32 [
32] to 0.5 s [
33], which is consistent with the results of our study. Therefore, with regard to VI, we believe that the SI of the sensor can reflect the time from the soft palate being in the lowest and most retracted position (VA) to the highest and most advanced position (VB).
Figure 6B is the graph of TII. The average difference of TII was −0.33 s, and the plot showed a downward distribution from the center of the Y-axis. Statistically, there was no proportional error, but a fixed error was found. The average values were 0.68 ± 0.13 s for VII and 0.35 ± 0.15 s for SII.
Figure 6C is the graph of TIII. The average difference of TIII was −0.34 s, and like TII, the plot showed a downward distribution from the center of the Y-axis. Similarly, there was no proportional error, but a fixed error was noted. The average values were 1.06 ± 0.19 s for VIII and 0.73 ± 0.18 s for SIII. Because TI was consistent whereas TII and TIII presented a fixed error, we consider that SC did not match VC. In addition, the time lag occurred at regular intervals, which suggests that SC may have reflected the movement of another organ or a different soft palate movement. As previous studies have indicated that, on average, the time from when the soft palate starts rising until it returns to its original position is 1.159 s [32], we believe that VIII was measured correctly in this study. In addition, the opening of the Eustachian tube should occur at almost the same time that the soft palate reaches the highest position or 0.03–0.06 s later [32]. Since the sensor waveform reflects the movement of the ear canal, including the eardrum, it is possible that SC did not match VC because of the effect of the Eustachian tube opening and the middle ear pressure equalizing. However, SII was about 0.35 s and, in all subjects, did not match the opening time of the Eustachian tube reported in the previous study; thus, it did not represent the opening of the Eustachian tube itself. For SC to correspond to the point at which the soft palate descends again (VC), the method of adopting SC points needs to be reconsidered.
Regarding the sensor waveforms, SA, SB, and SC could be identified 27 times during 30 swallows, but there were cases where it was difficult to identify the point due to individual differences in the waveform.
Figure 7 shows an example of waveforms for each subject.
Figure 7A shows the data of subject No. 3, who presented less error than the other subjects in all items from DA to DC.
Figure 7B shows the data of subject No. 6. While the waveform of subject No. 3 was relatively simple and smooth, that of subject No. 6 showed small fluctuations. Similar trends were observed in subjects No. 2 and No. 5. Regarding this difference in waveform, a previous study pointed out that when measuring the pressure in the Eustachian tube, the carotid artery beats could be included in the pressure waveform because the carotid artery runs just behind the Eustachian tube [
34,
35]. Even with an earphone-type sensor, the waveform may become pulsatile depending on the condition of the subject, such as the position of sensor insertion, hypertension, and mental tension, and the adopted points may therefore have been incorrect. It is necessary to reconsider which point to adopt when such a pulsatile waveform is observed. Subject No. 1 (Figure 7C) appeared to show a pulsatile waveform at first glance, but the error was the second smallest after that of subject No. 3. This was because each subject pressed the switch before and after swallowing, and the interval between the presses was defined as swallowing and converted to a waveform; thus, differences in how each subject perceived swallowing were reflected in the length of time on the X-axis. The interval between switch presses for subject No. 1 was approximately 1.38 s, which was shorter than that for the other subjects. Therefore, we believe that the waveform was stretched horizontally, which made the positions of the points clearer. As a healthy person requires 0.5–1.0 s for the swallowing reflex, the differences in swallowing time observed between subjects here may have been due to the subjective perception of each subject and the speed of pressing the switch. If the extraction time is too short or too long, the points to be adopted may be misidentified; in the future, it will be necessary to devise methods that use machine learning to extract the characteristics of the waveform over a fixed period before and after the swallowing reflex.
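To make the difficulty with pulsatile waveforms concrete, the following Python sketch shows one simple way that small fluctuations could be suppressed before the lowest point (SA) and the highest point (SB) are picked. It is only an illustration: the moving average is a technique substituted here for the purpose of the example and was not used in this study, the function names, window length, and sampling rate are hypothetical, and the synthetic signal is not study data. The sketch also assumes that SA precedes SB within the extracted interval.

```python
def moving_average(signal, window):
    """Centered moving average; the window shrinks at both edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def pick_sa_sb(signal, sample_rate_hz, window=3):
    """Return (SA, SB) times in seconds from a smoothed waveform.

    SB is the global maximum; SA is the minimum at or before SB.
    """
    smoothed = moving_average(signal, window)
    sb_idx = max(range(len(smoothed)), key=smoothed.__getitem__)
    sa_idx = min(range(sb_idx + 1), key=smoothed.__getitem__)
    return sa_idx / sample_rate_hz, sb_idx / sample_rate_hz


# Synthetic waveform: a trough followed by a peak, with a small ripple added.
base = [0, -1, -2, -3, -2, 0, 2, 4, 5, 4, 2, 0, -1, 0]
signal = [b + (0.3 if i % 2 == 0 else -0.3) for i, b in enumerate(base)]

sa_time, sb_time = pick_sa_sb(signal, sample_rate_hz=10)
print(f"SA at {sa_time:.1f} s, SB at {sb_time:.1f} s, SI = {sb_time - sa_time:.1f} s")
```

A wider window suppresses more of the ripple but also blurs the landmark positions, so any such choice would need to be checked against VF.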
There were no fixed errors or proportional errors between VI and SI, which confirmed that the sensor measurements would be appropriate for clinical use. VI (SI) represents the time taken for the soft palate to move from the lowest and most retracted position immediately before swallowing to the highest and most advanced position during swallowing. To date, no methods or devices have been developed for simple determination of the timing and/or duration of soft palate movement. Consequently, to the best of our knowledge, there is no literature that provides objective values on the degree to which soft palate movement is delayed or shortened by pathology. Soft palate movement involves the tensor veli palatini muscle, the levator veli palatini muscle, the palatoglossus muscle, the palatopharyngeus muscle, the superior pharyngeal constrictor muscle, and the musculus uvulae [
36]. Because atrophy, weakness, rigidity, and ataxia of these muscles affect soft palate movement, the evaluation of VI (SI) can provide objective, clinically important information.
The first piece of information that can be obtained from the duration of swallowing is the time required to elevate the soft palate. While we were unable to confirm this with data from subjects other than healthy adults, clinically speaking, if the soft palate is paralyzed or atrophied it can be observed to elevate slowly or insufficiently and lower prematurely. If the timing of these events is off, the nasopharyngeal cavity may not close completely, potentially causing aspiration or reflux into the nasal cavity. Further, if the time from final chewing to the elevation of the soft palate (swallowing) is known, the time required for food bolus transport and the oral phase can be determined.
Regarding the compatibility between the VF and sensor measurements, approximately 63% of SI, approximately 15% of SII, and approximately 40% of SIII measurements had relative errors within ±30%, and none reached the 75% standard for compatibility. Therefore, the results indicated that the sensor and VF were not compatible. However, the tolerance for concordance in the Bland–Altman analysis is not clearly defined and is left to the discretion of the reporter. Here, compatibility was defined as 75% or more of the measurements having a relative error within ±30%, but this criterion should be re-examined based on reports in the same research field.
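The compatibility criterion described above can be written as a short Python sketch. The relative error is assumed here to be the sensor value minus the VF value, divided by the VF value and expressed as a percentage; this definition, the function names, and the four example pairs are assumptions made for illustration. Only the ±30% tolerance, the 75% threshold, and the count of 17 of 27 for SI come from the text.

```python
def relative_errors(vf, sensor):
    """Relative error (%) of each sensor value with respect to its VF value."""
    if len(vf) != len(sensor) or not vf:
        raise ValueError("need equally sized, non-empty lists")
    if any(v <= 0 for v in vf):
        raise ValueError("VF durations must be positive")
    return [(s - v) / v * 100.0 for v, s in zip(vf, sensor)]


def agreement_rate(vf, sensor, tolerance_pct=30.0):
    """Fraction of pairs whose relative error lies within the tolerance."""
    errors = relative_errors(vf, sensor)
    return sum(1 for e in errors if abs(e) < tolerance_pct) / len(errors)


def is_compatible(rate, threshold=0.75):
    """Apply the 75% criterion to an agreement rate."""
    return rate >= threshold


# Hypothetical durations (s); two of the four pairs fall within ±30%.
rate = agreement_rate([0.40, 0.40, 0.50, 0.60], [0.44, 0.20, 0.55, 0.30])
print(f"within tolerance: {rate:.0%}, compatible: {is_compatible(rate)}")

# Reported count for SI: 17 of 27 measurements within ±30%.
print(f"SI: {17 / 27:.0%}, compatible: {is_compatible(17 / 27)}")
```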
It is desirable to evaluate swallowing function, as far as possible, as a continuous sequence from taking food into the oral cavity through chewing, feeding, and swallowing; however, thus far, evaluating this sequence has required testing under conditions that differ from ordinary ones and involve radiation exposure or invasiveness. Since the earphone-type sensor can take measurements simply by being attached to the outer ear, it allows evaluation in a normal eating environment without requiring a specific examination room. In addition, this measuring device does not interfere with the swallowing motion. Furthermore, there is no risk associated with pain during examination, radiation exposure, or the use of contrast media, which allows a simple and non-invasive evaluation to be performed. Because the device is small and lightweight, it can be used not only in hospitals but also in residences for older adults, home-based medical care, and nursing care facilities. Furthermore, it has been verified that the earphone-type sensor can measure mastication and breathing rate [
20,
21,
22,
23,
24]. The combined use of earphone-type sensors and other non-invasive devices may enable continuous evaluation of swallowing function. For example, the pharyngeal phase of the swallowing process begins with a retraction of the soft palate leading to contact with the posterior pharyngeal wall. It follows that if the earphone-type sensor can detect the time from final chewing to SA, then the time required for Stage II transport [
37] (the transportation of a food bolus) can also be measured. When a fixed amount of sample or foodstuff is taken into the oral cavity, the time required to chew, form a food bolus, and make it ready for swallowing (length of oral phase) can be measured as well. Ordinarily, swallowing begins during the expiratory phase of respiration, and after swallowing, respiration resumes with continued expiration [
38]. We believe it is possible to evaluate the synchrony between swallowing and respiration by checking the phase of respiration at which swallowing has occurred as well as the appearance of the soft palate movement waveforms (SA and SB).
In future studies, it is necessary to reconsider the SC corresponding to the point where the soft palate descends (VC) and the method of adopting points in cases of a pulsatile waveform. Here, the analysis was performed by a single rater, and thus it is necessary to confirm the inter-rater and intra-rater reproducibility. Additionally, it is necessary to ascertain whether similar results can be obtained with a larger cohort of subjects. We plan to conduct the same study in older individuals and patients with dysphagia. Furthermore, a method that can continuously evaluate the entire swallowing process in combination with the already verified measurements of mastication [
20] and respiration [
21] should be developed.
This study has several limitations. First, the sensor obtained waveforms from the ear canal, which includes the tympanic membrane, making it difficult to completely capture the sole movement of the specified organ. Second, the study did not take into account the possible influence of the presence of cerumen or other mitigating conditions within the ear canal on the resulting measurements. Third, since the strength of the waveform changes depending on the position and angle of the sensor, it is difficult to evaluate the muscle strength of each organ and its movement distance. Fourth, nasopharyngeal closure is said to be affected by posture [
39], but in this study, measurements were performed only in a sitting position with the head, neck, and trunk held in a neutral position. Fifth, the presence or absence of aspiration cannot be confirmed. Sixth, although soft palate movement is known to be affected by the respiratory phase [
40], this study did not take such influence into account. Finally, the results of this study are based on data from only six healthy individuals; thus, future studies will be needed to determine whether the same measurements and results can be obtained with larger cohorts. Furthermore, because of the limited sample, this study was unable to confirm the generalizability of this method for elderly and ill patients, which is our topic for future research.