1. Introduction
Noise has become a big problem in big cities. It has been shown that affects human behavior, health and even children’s cognition [
1]. Different measurements and studies of this environmental factor have been performed in the past [
2,
3]. The normal way to do noise measurements is to collect noise samples with a sound level meter, however this technique has many drawbacks. On the one hand, only local measurements in a grid are taken and on the other, it is expensive due to the measuring equipment and personnel costs. Moreover, these studies have been performed observing objective parameters, the evaluation of the equivalent sound pressure level [
4]. However, previous works have shown that the evaluation of psychoacoustic parameters, such as loudness and sharpness, fits better to the assessment of noise subjective annoyance [
5,
6]. Psychoacoustic research has been widely studied and standards for evaluating subjective annoyance and calculating psychoacoustic parameters have been created [
7,
8].
In 2014, the International Standardization Organization introduced the standard ISO 12913 about soundscape [
9], following the recommendations of the COST action Soundscape [
10]. The standard defines the soundscape as: acoustic environment as perceived or experienced and/or understood by a person or people, in context [
9]. This context includes the relationships between person and activity and place, in space and time. The interaction between the different actors in the acoustic environment establish a dynamic context which is able to influence soundscape through different mechanisms as: (1) the auditory sensation, (2) the interpretation of auditory sensation, and (3) the responses to the acoustic environment. In this way, the soundscape differs from the acoustic environment (defined as: sound at the receiver from all sound sources as modified by the environment). In this work, the study is mainly oriented to the analysis of the acoustic environment, but as the evaluation is made in terms of the estimation of the psycho-acoustic annoyance (because it is not related to the direct subjective response of people) by using the Zwicker’s model. Here there is no specific qualification of the sound by subjective perception, but the description of the environment with a simplified version of the Zwicker’s model is used. This can be related to the subjective response, but this will be done in a later study. Other aspects that tackle this study are related to the development of an infrastructure to evaluate the acoustic environment in terms of binaural psycho-acoustic parameters and also its spatial and temporal analysis.
In the context of modern urban planning and smart cities, IoT systems are receiving increasing attention due to their ability to monitor different problematic areas within a population center [
11,
12]. These problems are mostly related to the environment, but also to e-Health, Intelligent Transportation Systems (ITS) and other emerging technological areas with high demands [
13,
14,
15].
This work explains the implementation of an IoT-based system oriented to gather binaural psycho-acoustic information from indoor and outdoor environments. As the annoyance model proposed uses only loudness and sharpness as the main parameters, some simplifications have been made. In order to explain the performance of this parameters in the considered environments, spatial and temporal analysis have been done in the measurements to check the validity of the spatial statistical model proposed using kriging.
The paper is organised as follows: in
Section 2, a brief state-of-the-art is sketched. In
Section 3, the theoretical and practical methodology used to deploy the experiments, both in indoor and outdoor environments, with spatial statistical analysis (using kriging method) are described, discussing also hardware, software and signal processing considerations. Also in the previous section, the procedure to obtain the binaural loudness and sharpness is described. In
Section 4, the results are presented and discussed. Finally,
Section 5 summarizes the main conclusions of this work.
2. Related Works
Different authors have implemented different monitoring systems for environmental noise and have evaluated the related annoyance. The use of WASNs for noise monitoring was the main focus of the works in [
16,
17]. The authors used a WASN with Tmote-Sky motes [
18] and Tmote Invent (TmI), for traffic noise monitoring using the equivalent level (
) and counting the number and type of vehicles. In that deployment, the sampling frequency was set at 8 kHz. As an important conclusion, the authors stated that Tmote-Sky had excessive self-noise making it inappropiate for acquiring sound measurements, while, TmI offered good audio features. In that research, no specific procedure for calibration was provided. In [
19,
20], the authors deployed a WASN in Ostrobothnia (Finland). In these references, the authors evaluated different tests to assess the noise impact. They measured the
with
ms using a sampling frequency of 33 kHz, with 14 calibrated nodes (MicaZ from Crossbow -now MEMSIC- with an ad-hoc acquisition circuit to allow a dynamic range of 60 dB), and synchronised during 96 h, they obtained good results. Other works such as [
21,
22,
23] used mobile phones for noise pollution monitoring. Although the authors obtained interesting results, in this paper there is a lack of information about the recording conditions which means less accuracy in the noise measurements. When evaluating noise parameters, the location of the measuring devices should follow specific rules [
24].
In the previous works [
16,
17,
18,
19,
20,
21,
22,
23], the measurements focused on the
or its A-weighted version (ITU-R 468),
. The calculation of these measurements was made by applying a frequency-selective filter that considers the frequency range around 3–6 kHz, where the human ear is most sensitive.
From the previous references, we can see similar works more oriented to the soundscape description. In [
25,
26], the referred authors focus the description of the soundscape in the evaluation of noise levels and psycho-acoustic parameters from recordings gathered in walk-around. However, none of them include an analysis based on binaural psycho-acoustic annoyance, which probably should be more accurate from the psycho-acoustic point of view in a human being.
In the literature, the evaluation of Psycho-acoustic Annoyance (PA) is mainly based on the work by Zwicker & Fastl [
27]. Different authors have tried to focus this research taking into account the subjective perception combined with specific psychoacoustic parameters [
28]. Loudness is an important factor affecting the perceived psychoacoustic annoyance of a sound [
27,
29]. This parameter is defined as the subjective intensity of sound, which qualifies sounds ranging from quiet to loud. It is mainly related to the perceived amplitude of a sound. There are some models to evaluate this parameter which deliver a numerical estimation of the loudness level based on some objective characteristics of the sound. The best known models are Zwicker’s [
27] and Moore’s [
30], which are standardised in the 2 parts of ISO 532 [
8]. They are based on a monaural stimulus and pass through different filtering stages which are similar to the ear natural filters and simulates the hearing system. Recently, new research in psychoacoustics has shown that monaural loudness is not enough to assess the acoustic loudness and different models for binaural loudness were developed [
28,
31]. The implementation of the Zwicker’s model [
32] allowed a computing perspective for loudness monitoring. In a previous work [
33], the authors developed a binaural loudness monitoring system by using the Zwicker’s model and using binaural synthesis by means of a combination of Head-Related Transfer Functions (HRTFs) and microphone array processing.
In [
34], the authors evaluated the environmental noise with a mobile application and display the results in a map using the kriging method. In [
35], the authors measured 5 minutes sound pressure level (SPL) values with a WASN and also evaluated spatial cross-validation using the kriging method in a small city in Spain. In [
36], the authors implemented a system oriented to compute the psychoacoustic parameters in the server side from recorded audio chunks. In [
33], the authors implemented a edge-computing system by using different Raspberry Pi 3 (Rpi3) nodes in order to perform an evaluation of performance when computing the binaural loudness directly on the Rpi3 nodes.
In this paper, an improvement of the previous work [
33] with the binaural loudness is done by incorporating into the model the computation of the binaural sharpness. Moreover, a simplified version of the Zwicker’s psycho-acoustic annoyance model is evaluated (assuming specific conditions), mainly to assess the spatial distribution of the subjective annoyance in an indoor and outdoor environments.
4. Results and Discussion
In this section, the WASN performance is evaluated carrying out two experiments in an indoor and outdoor environment. Moreover, the results of these experiments are presented and analyzed.
The experiments planned for this work consider an indoor and an outdoor environment. The indoor environment corresponds to the location of 4 nodes in different offices of the Computer Science Dpt at the ETSE in the University of Valencia and the outdoor environment corresponds to the location of different nodes in the façades of houses in a neighborhood from Murcia city close to a highway. The system performance is evaluated and the results are presented and analysed.
Figure 2 shows the ThingSpeak configuration for the data channels in the indoor environment, collecting SPL, channels left and right for loudness (L), binaural loudness (BL), channels left and right for sharpness (S), and binaural sharpness (BS).
4.1. Indoor Environment
The different nodes considered in this experiment were placed at different offices in the third floor of the first block in the ETSE. Here two offices (one with a professor and fluent activity of students during few hours and other with only a professor working), the secretary office of the department and a laboratory with one student working during several hours were selected. Although the activity was different in each location, it was not too high, therefore the noise levels and the computed psycho-acoustic annoyance had low variability. All the nodes were connected to Internet and every node was linked to a different channel in the ThingSpeak platform during 8 h.
Figure 3a shows the location of the four nodes from the upper part of the building.
Table 1 shows the statistical values (mean and standard deviation) for each node for the indoor measurements.
Table 2 shows the correlation matrix of binaural loudness, binaural sharpness and SPL.
4.2. Outdoor Environment Measurement
The outdoor environment corresponds to an urbanization near a highway. Therefore, the traffic noise is considerable. The measurement points were located in two different lines of buildings. The buildings in the second line (far away from the highway) are more protected from the traffic noise, but these buildings are near a pedestrian area where some children were playing during few hours. In this two different areas, the activity is also the criteria selected. In this location, five nodes were placed in different façades of the urbanization houses. Nodes 1 and 2 were located in the ground floor, nodes 3 and 4 were located in the second floor and node 5 was placed in the first floor. All of them in the façade of the corresponding house.
Figure 3b shows the location of these nodes.
Table 3 shows the statistical values (mean and standard deviation) for each node for the outdoor measurements.
Table 4 shows the correlation matrix of binaural loudness, binaural sharpness and SPL.
As shown in
Table 2 and
Table 4, correlation of the temporal series of the different considered parameters (i.e., SPL, binaural loudness and binaural sharpness) show clear differences. This fact is related to the different nature of the parameters considered. In general, SPL is more related to loudness, but sharpness differs, because it depends mainly on the centroid of the spectral content of the signals in each environment.
4.3. Modelling and Cross-Validation of the Measurement Sets
Figure 4a,b show the results of the descriptive analysis of the BPA in every location in the indoor and the outdoor environment.
In order to perform a validation of the values and the number of selected points for the linear modeling of the set of measured values, a Leave-One-Out-Cross-Validation algorithm in R-Studio software package (with the “Linear Model” method) was used for the whole set of time-averaged measurements, corresponding to the spatial locations. In
Table 5 the results of this test are shown in order to select the nodes in the indoor environment. A linear model is generated to predict the % established in the first column (%test/%training) with the mean value of the BPA in every location. In this table,
Rsquared is the Pearson’s squared correlation coefficient and
MAE is the maximum absolute error.
RMSE is the minimum of the list of results obtained with this algorithm. According to this table, the selection of a validation set with 60% of the nodes is enough for our network.
Table 6 shows the results of the cross-correlation test to predict the temporal values of BPA in the indoor environment. In addition, a linear model is generated to predict the values of BPA in the test percentage established in the first column (as in the previous case). Here, the whole temporal series was used in each location and from these results we can predict with high accuracy the test values training the model with the 50% of the measurements (as shown the RMSE and MAE values).
In
Table 7, the results of the test to select the nodes in the outdoor environment are presented. This prediction is made using a linear model for the % established in the first column (%test/%training) with the mean value of the BPA at each location. According to this table, the selection of a validation set with 60% of the nodes is enough for the network of measuring platforms.
Table 8 shows the results of the cross-correlation test to predict the temporal values of BPA in the outdoor environment and generates a linear model to predict the values of BPA in the test percentage established in the first column (as in the previous case). As in the previous case, we used the whole temporal series in each location and from this results we can predict quite well the test values training the model with the 50% of the measurements (as shown the RMSE and MAE values).
It can be seen that in both cases (the indoor and outdoor environments), the relationship between the BPA and binaural loudness + binaural sharpness is better correlated than the BPA and SPL.
In this case, location 4 in the outdoor environment corresponds to the place of the most exposed node to the traffic noise. Therefore, as the variance of the temporal values for the SPL and the binaural loudness and sharpness is the lowest one (see
Table 3), both for the test set and the training set, the random selection of time samples for the LOOCV gave for the RMSE and MAE a very low value in the cases of 80% and 50% of the training set, making a really good matching for the temporal prediction of the SPL and the BPA (from the binaural loudness and sharpness).
4.4. Spatial Statistical Analysis (Kriging Method)
Spatial statistics allows the use of several methods. The most common methods are Inverse Distance Weighted (IDW), spline and kriging. IDW is a simple and intuitive deterministic interpolation method based on principle that sample values closer to the prediction location have more influence on prediction value than sample values farther apart. The major disadvantage of IDW is “bull’s eye” effect and edgy surface. Spline is also a deterministic interpolation method which fits mathematical function through input data to create smooth surface. Kriging is a method based on spatial autocorrelation [
46].
The computed PA from the measurements of binaural loudness and sharpness establish a data set in relation to different locations with its GPS coordinates, longitude and latitude. By denoting the PA computed with the simplified Zwicker’s model at a location
x as
, this data set is defined as
, where
are all the locations of the modelling sets, following the kriging technique [
47].
In this context, the objective of this proposed model is the prediction of
in any location
, particularly those within the validation set. The annoyance reports contain information of the set of covariables included. Therefore,
is modeled as a tendency function of the covariables better involved in the process which explains its variability in a large extent plus some random error which is explained by the short term variability, i.e.,
where
and
is a stationary Gaussian process with zero mean, whose spatial dependence characterization is given by the variogram
[
48]:
where
denotes the variance and
h is an offset. This variogram represents the main function of the kriging method, which presents different procedures such as simple kriging, ordinary kriging, universal kriging, indicator kriging, co-kriging, etc, attending to different statistical aspects considered in the covariable set. Ordinary kriging is the most widely used kriging method. It is used to estimate a value at a point of a region for which a variogram is known, using data in the neighborhood of the estimation location and also can be used to estimate a block value [
49].
In this study, the variogram has been calculated, using ordinary kriging and a spherical model, with the R statistical package for variogram fitting [
50].
Figure 5a shows the spatial distribution of the BPA in the indoor environment and
Figure 5b shows the spatial distribution of the BPA in the outdoor environment. The map shows the evaluation of the equation
3 with the kriging method showing the points between the measuring points. Also
Figure 6a,b show the relative error of the kriging estimation showing that the error of the Kriging estimation is kept bounded inside the region between the measuring points below 25% and grows up as the estimated point is in the outer part and far away from the centroid of the measurement point.
Figure 6 shows the relative error distribution obtained in the Kriging estimation. The estimation absolute error range for the BPA in the indoor environment (see
Figure 6a) is [0.0064, 0.1082] which is reasonably low and is bellow the BPA obtained from measurements. The greater error is focused in the region outside the building where no measurements are done. For the outdoor environment (see
Figure 6b), the estimation error range is [0.9525, 24.2659] which is a larger range compared to the BPA obtained from measurements. The greater error is located in the region next to the highway.
4.5. A Preliminar Subjective Study
In order to check the subjective response in the indoor and outdoor environments, a short poll was conducted to a group of people related to each environment. The poll asked for some social aspects (gender and age) and the valuation of some aspects related to the soundscape, following the Swedish Soundscape-Quality Protocol which assess the perceived affective quality using the following eight adjectives: Pleasant, Unpleasant, Eventful, Uneventful, Monotonous, Exciting, Calm, and Noisy [
51].
An Android application for mobile has been developed with the short poll (translated into Spanish) to allow a quick data gathering of the surveyants. In order to locate the response of every person answering the poll, the GPS coordinates have been registered with the same application.
The analysis of the results of this poll is shown in
Table 9.
According to this values the response in the indoor environment is generally qualified as pleasant and mostly uneventful (calm and nearly monotonous), although at certain hours students make the soundscape a bit exciting.
Taking into account the values in the outdoor environment, in general, the values are a little bit higher for noisy and unpleasant, which means that the people living in this area seems to like the place but stand the noise from the highway and like the noise made by children activities.
5. Conclusions
In this work, an IoT system collecting information from each acoustic environment in different environments (indoor and outdoor) is deployed. The nodes of the system gather audio information and compute psycho-acoustic parameters (loudness and sharpness) performing the binaural synthesis by considering the HRTFs. In the indoor environment, the nodes were placed in different rooms at the Computer Science Department of the University of Valencia and in the outdoor environment, the nodes were located in the outdoor of different houses in a residential area in Murcia.
From the binaural psycho-acoustic parameters gathered and the SPL, different statistical analysis were done. The correlation between these parameters was performed, showing that the binaural loudness and the SPL correlated well in nearly every location (indoor and outdoor, at a 99% significance level). Also, the binaural sharpness worked fine with the SPL, but with smaller Pearson coefficient values.
From the simplified version of the Zwicker’s Psycho-acoustic Annoyance model, the values of the Binaural Psycho-acoustic Annoyance were computed to determine the subjective annoyance at each node location in the indoor and outdoor environment. At this point, different cross-validation analysis were done in order to determine the validity of the BPA spatial and temporal values. To this end, a Leave-One-Out-Cross-Validation algorithm was used to determine the best modelling set (a good training set was established with 50% of the values). In all cases (indoor and outdoor), the binaural loudness and sharpness explained better the relationship with BPA than the sound pressure level, both for spatial averages and for temporal series. Finally, a spatial statistical analysis was done in the indoor and the outdoor environment by using an Ordinary Kriging technique. The error of the Kriging estimation is kept bounded inside the region between the measuring points below 25% and grows up outside this region. Also a short on-site subjective study has been done. According to the results, the BPA model has been validated for the indoor and outdoor environment. Further work is needed for the spatial validation of the subjective assessment.
Currently, further work is being done with the implementation of the complete Zwicker’s Psycho-acoustic Annoyance model, adding the Roughness and Fluctuation Strength algorithms to the implementation of the binaural model.