On the Arcsecant Hyperbolic Normal Distribution. Properties, Quantile Regression Modeling and Applications
Article

by Mustafa Ç. Korkmaz 1, Christophe Chesneau 2,* and Zehra Sedef Korkmaz 3
1 Department of Measurement and Evaluation, Artvin Çoruh University, City Campus, Artvin 08000, Turkey
2 Laboratoire de Mathématiques Nicolas Oresme, University of Caen-Normandie, 14032 Caen, France
3 Department of Curriculum and Instruction Program, Artvin Çoruh University, City Campus, Artvin 08000, Turkey
* Author to whom correspondence should be addressed.
Symmetry 2021, 13(1), 117; https://doi.org/10.3390/sym13010117
Submission received: 12 December 2020 / Revised: 6 January 2021 / Accepted: 8 January 2021 / Published: 12 January 2021

Abstract

This work proposes a new distribution defined on the unit interval. It is obtained by a novel transformation of a normal random variable involving the hyperbolic secant function and its inverse. The use of such a function in distribution theory has not received much attention in the literature, and may be of interest for theoretical and practical purposes. Basic statistical properties of the newly defined distribution are derived, including moments, skewness, kurtosis and order statistics. For the related model, the parametric estimation is examined through different methods. We assess the performance of the obtained estimates by two complementary simulation studies. Also, the quantile regression model based on the proposed distribution is introduced. Applications to three real datasets show that the proposed models are quite competitive in comparison to well-established models.

1. Introduction

Over the past twenty years, many statisticians and researchers have focused on proposing new extended or generalized distributions by adding parameters to the basic probability distributions. The common aim of these studies is to obtain better inferences than those of the baseline probability distributions. In this context, modeling approaches on the unit interval have recently multiplied, since they are related to specific measures such as the recovery rate, mortality rate, daily patient rate, etc. The beta distribution is the best-known distribution defined over the unit interval for modeling such measures. It has great flexibility in the shapes of the probability density function (pdf) and hazard rate function (hrf). Although it has very flexible forms for data modeling, it is sometimes not sufficient for modeling and explaining unit datasets. For this reason, new alternative unit models have been proposed in the statistical distribution literature, including the Johnson $S_B$ [1], Topp-Leone [2], Kumaraswamy [3], standard two-sided power [4], log-Lindley [5], log-xgamma [6], unit Birnbaum-Saunders [7], unit Weibull [8], unit Lindley [9], unit inverse Gaussian [10], unit Gompertz [11], second degree unit Lindley [12], log-weighted exponential [13], logit slash [14], unit generalized half normal [15], unit Johnson $S_U$ [16], trapezoidal beta [17] and unit Rayleigh [18] distributions. Many of the above distributions were obtained by transforming a baseline distribution, and they performed better than the beta distribution in terms of data modeling. For instance, the Johnson $S_B$ distribution was created via a logistic transformation of the ordinary normal distribution. In this way, a very flexible unit normal distribution was obtained over the unit interval. The other mentioned unit distributions introduced over the last decade can also be seen as alternatives to the well-known beta, Johnson $S_B$, Topp-Leone and Kumaraswamy distributions.
On the other hand, ordinary regression models explain the response variable, for given values of the covariates, through the conditional mean. However, the mean may be affected by a skewed distribution or by outliers in the measurements. Possible solutions are provided by the quantile regression models proposed by [19], which are particularly popular for being less sensitive to outliers than ordinary regression models.
In line with the above, the aim of this study is to introduce a new alternative unit probability distribution based on the normal distribution. More precisely, we use a new transformation of the normal distribution based on the hyperbolic secant function. As a matter of fact, the use of hyperbolic functions has not received enough attention in the published literature on distribution theory, despite the great interest among students and practitioners in the few distributions based on them. Examples include the famous hyperbolic secant distribution and its generalizations as presented in [20]. In a sense, we show that the proposed methodology allows us to transport the applicability and working capacity of the normal distribution to the unit interval. In particular, we develop a new quantile regression model by re-parameterizing the new probability distribution in terms of any quantile. All these aspects are developed in the article through mathematical, graphical and numerical approaches.
The paper is organized as follows. We define the proposed distribution in Section 2. Its basic distributional properties are described in Section 3. Section 4 is devoted to the different parametric estimation methods. Two simulation studies assessing the performance of the different estimates of the model parameters are given in Section 5. The new quantile regression model based on the proposed distribution and its residual analysis are introduced in Section 6. Three real data illustrations, one of which relates to quantile modeling and the others to univariate data modeling, are presented in Section 7. Finally, the paper ends with conclusions in Section 8.

2. The New Unit Distribution and Its Properties

The new unit distribution is defined as follows: Let $Y$ be a random variable such that $Y \sim N(\mu, \sigma^2)$, where $\mu \in \mathbb{R}$ and $\sigma > 0$, and let $X$ be the random variable defined by
$$ X = \operatorname{sech} Y, $$
where $\operatorname{sech} y = 2 / (e^{y} + e^{-y}) = 2 e^{y} / (e^{2y} + 1) \in (0, 1)$ is the hyperbolic secant function for $y \in \mathbb{R}$, also known as the reciprocal of the hyperbolic cosine function. Then the distribution of $X$ is called the “arcsech” normal distribution, and it is denoted by $ASHN$, or $ASHN(\mu, \sigma)$ when $\mu$ and $\sigma$ are required. To our knowledge, it constitutes a new unit distribution; it is unlisted in the literature. Before stating the motivations for the ASHN distribution, the corresponding cumulative distribution function (cdf) and pdf are presented in the following proposition.
Proposition 1.
The cdf and pdf of the $ASHN(\mu, \sigma)$ distribution are given as
$$ F(x, \mu, \sigma) = 2 - \Phi\left(\frac{\operatorname{arcsech} x + \mu}{\sigma}\right) - \Phi\left(\frac{\operatorname{arcsech} x - \mu}{\sigma}\right) \tag{1} $$
and
$$ f(x, \mu, \sigma) = \frac{1}{\sigma x \sqrt{1 - x^2}} \left[\phi\left(\frac{\operatorname{arcsech} x + \mu}{\sigma}\right) + \phi\left(\frac{\operatorname{arcsech} x - \mu}{\sigma}\right)\right], \tag{2} $$
respectively, for $x \in (0, 1)$, where $\operatorname{arcsech} z = \log\left[\left(1 + \sqrt{1 - z^2}\right) / z\right] > 0$ is the hyperbolic arcsecant function (or inverse hyperbolic secant function) for $z \in (0, 1)$, and $\Phi(x)$ and $\phi(x)$ are the cdf and pdf of the $N(0, 1)$ distribution, respectively. For $x \notin (0, 1)$, standard completions on these functions are performed.
For the sake of presentation, the proof of this result and those of the results to come are given in Appendix A.
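As an illustration of Proposition 1, the cdf and pdf can be sketched in a few lines of Python using only the standard library. The function names (`arcsech`, `ashn_cdf`, `ashn_pdf`) are hypothetical helpers chosen for this sketch and are not taken from the article; the final lines compare the pdf with a finite-difference derivative of the cdf as a sanity check.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)


def arcsech(x: float) -> float:
    """Inverse hyperbolic secant, log((1 + sqrt(1 - x^2)) / x), for 0 < x <= 1."""
    return math.log((1.0 + math.sqrt(1.0 - x * x)) / x)


def ashn_cdf(x: float, mu: float, sigma: float) -> float:
    """Equation (1), completed by 0 below the unit interval and 1 above it."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    a = arcsech(x)
    return 2.0 - STD_NORMAL.cdf((a + mu) / sigma) - STD_NORMAL.cdf((a - mu) / sigma)


def ashn_pdf(x: float, mu: float, sigma: float) -> float:
    """Equation (2), set to 0 outside the open unit interval."""
    if not 0.0 < x < 1.0:
        return 0.0
    a = arcsech(x)
    bracket = STD_NORMAL.pdf((a + mu) / sigma) + STD_NORMAL.pdf((a - mu) / sigma)
    return bracket / (sigma * x * math.sqrt(1.0 - x * x))


# Sanity check: a central finite difference of the cdf should match the pdf.
x0, mu0, sigma0, h = 0.4, 0.5, 1.0, 1e-6
finite_diff = (ashn_cdf(x0 + h, mu0, sigma0) - ashn_cdf(x0 - h, mu0, sigma0)) / (2.0 * h)
print(ashn_pdf(x0, mu0, sigma0), finite_diff)
```

The two printed numbers should agree to several decimal places; the parameter values are arbitrary.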
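Based on Proposition 1, as a first property, note that, for $\mu = 0$ and $x \in (0, 1)$, the cdf and pdf reduce to the quite manageable functions
$$ F(x, 0, \sigma) = 2\left[1 - \Phi\left(\frac{1}{\sigma} \operatorname{arcsech} x\right)\right] \tag{3} $$
and
$$ f(x, 0, \sigma) = \frac{2}{\sigma x \sqrt{1 - x^2}} \, \phi\left(\frac{1}{\sigma} \operatorname{arcsech} x\right). \tag{4} $$
In full generality, for $x \in (0, 1)$, an alternative formulation for the pdf is
$$ f(x, \mu, \sigma) = \sqrt{\frac{2}{\pi}} \, \frac{1}{\sigma x \sqrt{1 - x^2}} \, e^{-\frac{(\operatorname{arcsech} x)^2 + \mu^2}{2 \sigma^2}} \cosh\left(\frac{\mu}{\sigma^2} \operatorname{arcsech} x\right), \tag{5} $$
where $\cosh y = (e^{y} + e^{-y}) / 2$ is the hyperbolic cosine function for $y \in \mathbb{R}$. Eventually, we can express the cosh term in Equation (5) as
$$ \cosh\left(\frac{\mu}{\sigma^2} \operatorname{arcsech} x\right) = \frac{1}{2} \left[\left(\frac{1}{x} + \sqrt{\frac{1}{x^2} - 1}\right)^{\frac{\mu}{\sigma^2}} + \left(\frac{1}{x} + \sqrt{\frac{1}{x^2} - 1}\right)^{-\frac{\mu}{\sigma^2}}\right]. $$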
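The equivalence between the two-term normal form of the pdf in Equation (2), the exponential-cosh form in Equation (5) and the power form of the cosh term can be checked numerically. The following is a minimal sketch with hypothetical function names; it uses `math.acosh(1 / x)`, which equals $\operatorname{arcsech} x$ on $(0, 1)$.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def arcsech(x: float) -> float:
    # arcsech(x) = arccosh(1 / x) for 0 < x <= 1
    return math.acosh(1.0 / x)


def pdf_two_normal_terms(x: float, mu: float, sigma: float) -> float:
    """Equation (2): sum of two standard normal densities."""
    a = arcsech(x)
    bracket = STD_NORMAL.pdf((a + mu) / sigma) + STD_NORMAL.pdf((a - mu) / sigma)
    return bracket / (sigma * x * math.sqrt(1.0 - x * x))


def pdf_cosh_form(x: float, mu: float, sigma: float) -> float:
    """Equation (5): exponential term times a hyperbolic cosine."""
    a = arcsech(x)
    scale = math.sqrt(2.0 / math.pi) / (sigma * x * math.sqrt(1.0 - x * x))
    return scale * math.exp(-(a * a + mu * mu) / (2.0 * sigma ** 2)) * math.cosh(mu * a / sigma ** 2)


def cosh_power_form(x: float, mu: float, sigma: float) -> float:
    """Power expression of cosh(mu * arcsech(x) / sigma^2)."""
    base = 1.0 / x + math.sqrt(1.0 / x ** 2 - 1.0)
    p = mu / sigma ** 2
    return 0.5 * (base ** p + base ** (-p))


x0, mu0, sigma0 = 0.35, 0.8, 1.3
print(pdf_two_normal_terms(x0, mu0, sigma0), pdf_cosh_form(x0, mu0, sigma0))
print(math.cosh(mu0 * arcsech(x0) / sigma0 ** 2), cosh_power_form(x0, mu0, sigma0))
```

Each printed pair should coincide up to floating-point rounding.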
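Let us now focus on the behavior of $f(x, \mu, \sigma)$ at the boundaries.
  • When $x$ tends to 0, since $\operatorname{arcsech} x \sim -\log x \to +\infty$ and it appears squared in the exponential term, we have $f(x, \mu, \sigma) \to 0$.
  • When $x$ tends to 1, since $\operatorname{arcsech} 1 = 0$, we have
    $$ f(x, \mu, \sigma) \sim \frac{1}{\sqrt{\pi}} \, \frac{1}{\sigma \sqrt{1 - x}} \, e^{-\frac{\mu^2}{2 \sigma^2}} \to +\infty. $$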
    If σ is large and μ 2 2 σ 2 , or μ 2 / 2 σ 2 is large, the point x = 1 appears as a “special singularity” in the following sense: The function f ( x , μ , σ ) can decrease to 0 in the neighborhood of x = 1 , then suddenly explodes at x = 1 . This phenomenon is only punctual; this is not a particular disadvantage for statistical modeling purposes.
Also, from Equation (2), it can be seen that
$$ f(x, -\mu, \sigma) = \frac{1}{\sigma x \sqrt{1 - x^2}} \left[\phi\left(\frac{\operatorname{arcsech} x - \mu}{\sigma}\right) + \phi\left(\frac{\operatorname{arcsech} x + \mu}{\sigma}\right)\right] = f(x, \mu, \sigma). $$
This means that the pdf shapes of the $ASHN(-\mu, \sigma)$ distribution coincide with those of the $ASHN(\mu, \sigma)$ distribution. Another remark is that the ASHN distribution can have one mode in $(0, 1)$, and it corresponds to the $x$ satisfying the following equation:
2 σ 2 x 2 σ 2 arctanh ( x ) + 1 x 2 arcsech x = 0 .
This equation is complex and needs a numerical treatment to determine the value of the mode, if it exists.
The hrf of the $ASHN(\mu, \sigma)$ distribution is given by
$$ h(x, \mu, \sigma) = \frac{\phi\left(\frac{\operatorname{arcsech} x + \mu}{\sigma}\right) + \phi\left(\frac{\operatorname{arcsech} x - \mu}{\sigma}\right)}{\sigma x \sqrt{1 - x^2} \left[\Phi\left(\frac{\operatorname{arcsech} x + \mu}{\sigma}\right) + \Phi\left(\frac{\operatorname{arcsech} x - \mu}{\sigma}\right) - 1\right]}. $$
Some plots of $f(x, \mu, \sigma)$ and $h(x, \mu, \sigma)$ are shown in Figure 1.
From Figure 1, the flexibility of the obtained curves is evident; J, reversed J, U and bell shapes are observed for the pdf, whereas U, N and reversed J shapes are observed for the hrf. This panel of shapes is a plus for the ASHN distribution, motivating its use for statistical modeling.

3. Distributional Properties

This section is devoted to some mathematical properties satisfied by the ASHN distribution.

3.1. A Likelihood Ratio Order Result

The proposition below shows that the ASHN distribution satisfies a strong intrinsic stochastic order result.
Proposition 2.
Let $X \sim ASHN(\mu, \sigma_1)$ and $Y \sim ASHN(\mu, \sigma_2)$ with $\mu = 0$ and $\sigma_1 > \sigma_2$. Then $X$ is smaller than $Y$ in likelihood ratio order.
In the general case where $\mu \neq 0$, there is no actual proof of such stochastic ordering properties. Further, let us mention that the likelihood ratio order is a strong property, implying various stochastic orders such as the usual stochastic, hazard rate, reversed mean inactivity time, mean residual life and harmonic mean residual life orders, among others. We may refer the reader to [21] for all the theory and details about the concept of stochastic ordering.

3.2. Quantile Function

The theoretical definition of the quantile function (qf) of the $ASHN(\mu, \sigma)$ distribution is the inverse function of Equation (1), that is
$$ Q(y, \mu, \sigma) = F^{-1}(y, \mu, \sigma), \quad y \in (0, 1). $$
In full generality, due to the complexity of $F(x, \mu, \sigma)$, it is not possible to have a closed-form expression of this qf. However, in the case $\mu = 0$, we arrive at
$$ Q(y, \mu, \sigma) = \operatorname{sech}\left[\sigma \Phi^{-1}\left(1 - \frac{y}{2}\right)\right], \quad y \in (0, 1), $$
where $\Phi^{-1}(x)$ denotes the inverse function of $\Phi(x)$, which also corresponds to the qf of the $N(0, 1)$ distribution. In this case, the first quartile is obtained as $Q_1 = Q(1/4, \mu, \sigma) \approx \operatorname{sech}(\sigma \times 1.150349)$, the median is given by $M = Q(1/2, \mu, \sigma) \approx \operatorname{sech}(\sigma \times 0.6744898)$, and the third quartile is defined by $Q_3 = Q(3/4, \mu, \sigma) \approx \operatorname{sech}(\sigma \times 0.3186394)$. Further, from $Q(y, \mu, \sigma)$, one can generate values from the ASHN distribution through basic simulation methods.
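The closed-form qf for $\mu = 0$ lends itself to inverse-transform sampling. The sketch below is one possible implementation using only the Python standard library; the function names are hypothetical. It relies on the identity $\Phi^{-1}(1 - y/2) = -\Phi^{-1}(y/2)$, which is numerically safer for very small $y$. For $\mu \neq 0$, where no closed-form qf is available, the sketch falls back on the defining transformation $X = \operatorname{sech} Y$, which is a choice made here for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def ashn_quantile_mu0(y: float, sigma: float) -> float:
    """Q(y) = sech(sigma * Phi^{-1}(1 - y/2)) for mu = 0 and 0 < y <= 1."""
    z = -STD_NORMAL.inv_cdf(y / 2.0)  # equals Phi^{-1}(1 - y/2)
    return 1.0 / math.cosh(sigma * z)


def ashn_sample_mu0(n: int, sigma: float, rng: random.Random) -> list:
    """Inverse-transform sampling with uniforms drawn on (0, 1]."""
    return [ashn_quantile_mu0(1.0 - rng.random(), sigma) for _ in range(n)]


def ashn_sample(n: int, mu: float, sigma: float, rng: random.Random) -> list:
    """General case via the definition X = sech(Y), Y ~ N(mu, sigma^2)."""
    return [1.0 / math.cosh(rng.gauss(mu, sigma)) for _ in range(n)]


rng = random.Random(2021)
sigma = 1.0
print("quartiles:", [ashn_quantile_mu0(p, sigma) for p in (0.25, 0.5, 0.75)])
print("first draws:", ashn_sample_mu0(3, sigma, rng))
```

The quartiles printed for `sigma = 1.0` correspond to $\operatorname{sech}(1.150349)$, $\operatorname{sech}(0.6744898)$ and $\operatorname{sech}(0.3186394)$.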

3.3. Moments

Let $X \sim ASHN(\mu, \sigma)$. As a primary definition, for any integer $r$, denoting by $E$ the expectation operator, the $r$th ordinary moment of $X$ is defined by
$$ m_r = E(X^r) = \int_0^1 x^r f(x, \mu, \sigma) \, dx = \frac{1}{\sigma} \int_0^1 \frac{x^{r-1}}{\sqrt{1 - x^2}} \left[\phi\left(\frac{\operatorname{arcsech} x + \mu}{\sigma}\right) + \phi\left(\frac{\operatorname{arcsech} x - \mu}{\sigma}\right)\right] dx. $$
For the special case $\mu = 0$, one can express it via the qf as
$$ m_r = \int_0^1 [Q(y, \mu, \sigma)]^r \, dy = \int_0^1 \left\{\operatorname{sech}\left[\sigma \Phi^{-1}\left(1 - \frac{y}{2}\right)\right]\right\}^r dy. $$
Clearly, there is no simple expression for $m_r$. When the parameters are fixed, it can be calculated numerically through standard numerical integration techniques. As the main analytical approach, one can consider a series expansion for $m_r$ as stated in the result below.
Proposition 3.
The $r$th moment of $X \sim ASHN(\mu, \sigma)$ has the following expansion:
$$ m_r = 2^r \left[\sum_{k=0}^{+\infty} \binom{-r}{k} e^{-(2k+r)\mu} M\left(-\sigma(2k+r), -\frac{\mu}{\sigma}\right) + \sum_{k=0}^{+\infty} \binom{-r}{k} e^{(2k+r)\mu} M\left(-\sigma(2k+r), \frac{\mu}{\sigma}\right)\right], $$
where $M(x, a) = E[e^{xU} I(U > a)]$ with $U \sim N(0, 1)$, $x \in \mathbb{R}$ and $a \in \mathbb{R}$, and $I(\cdot)$ denotes the indicator function.
The function $M(x, a)$ introduced in Proposition 3 can be viewed as the upper incomplete version of the moment generating function of the $N(0, 1)$ distribution. Naturally, it can be bounded from above as $M(x, a) \leq E(e^{xU}) = e^{\frac{x^2}{2}}$ for $x \in \mathbb{R}$ and $a \in \mathbb{R}$. By applying the Markov inequality, a lower bound is obtained as $M(x, a) \geq e^{xa} (1 - \Phi(a))$ for $x \geq 0$ and $a \in \mathbb{R}$.
Proposition 3 gives an analytical approach for mathematical manipulations or computations of $m_r$. Further, the following finite sum approximation is an immediate consequence:
$$ m_r \approx 2^r \left[\sum_{k=0}^{K} \binom{-r}{k} e^{-(2k+r)\mu} M\left(-\sigma(2k+r), -\frac{\mu}{\sigma}\right) + \sum_{k=0}^{K} \binom{-r}{k} e^{(2k+r)\mu} M\left(-\sigma(2k+r), \frac{\mu}{\sigma}\right)\right], $$
where $K$ denotes a reasonably large integer.
From the moments, we can derive other measures of interest for $X$. For instance, the mean of $X$ is just $m_1$, the variance of $X$ can be determined through the Koenig-Huyghens formula involving $m_1$ and $m_2$, that is $V = m_2 - m_1^2$, the $r$th central moment defined by $m_r^c = E[(X - m_1)^r]$ can be expressed via $m_1, \ldots, m_r$ by using the binomial formula, the skewness coefficient of $X$ is defined by $S = m_3^c / V^{3/2}$ and the kurtosis coefficient of $X$ is given by $K = m_4^c / V^2$. These coefficients evaluate the asymmetry and “tailedness” of the ASHN distribution, respectively. Figure 2 represents these coefficients while varying the values for $\mu$ and $\sigma$.
From Figure 2, we see that the skewness coefficient can be negative or positive, and the kurtosis coefficient can be either very small or very large. Both have a complex non-monotonic structure. These facts attest to the ability of the ASHN distribution to adapt to various situations arising from heterogeneous unit data.
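Since the moments have no simple closed form, they are typically computed numerically. A minimal sketch is given below, assuming the representation $m_r = E[\operatorname{sech}^r(\mu + \sigma U)]$ with $U \sim N(0, 1)$, which follows directly from the definition $X = \operatorname{sech} Y$, and a plain composite trapezoidal rule. The function names, integration range and step count are arbitrary choices for illustration, not the authors' procedure.

```python
import math


def ashn_raw_moment(r: int, mu: float, sigma: float,
                    half_width: float = 10.0, steps: int = 20000) -> float:
    """m_r = E[sech(mu + sigma * U)^r], U ~ N(0, 1), by the trapezoidal rule."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        u = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * (1.0 / math.cosh(mu + sigma * u)) ** r * math.exp(-0.5 * u * u)
    return total * h / math.sqrt(2.0 * math.pi)


def ashn_summary(mu: float, sigma: float):
    """Mean, variance, skewness and kurtosis from the first four raw moments."""
    m1, m2, m3, m4 = (ashn_raw_moment(r, mu, sigma) for r in (1, 2, 3, 4))
    var = m2 - m1 ** 2                                   # Koenig-Huyghens formula
    m3c = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3             # third central moment
    m4c = m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4
    return m1, var, m3c / var ** 1.5, m4c / var ** 2


print(ashn_summary(0.5, 1.0))
```

The default integration range suits moderate values of $\sigma$; very large $\sigma$ would call for a guard against overflow in `math.cosh`.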

3.4. Order Statistics

The order statistics are important since they are involved in many statistical models and methods. Here, their basics in the context of the ASHN distribution are described. Let $X_1, X_2, \ldots, X_n$ be a random sample from $X \sim ASHN(\mu, \sigma)$, and $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the corresponding order statistics, that is $X_{(1)} \leq X_{(2)} \leq \cdots \leq X_{(n)}$. Then, the pdf of $X_{(i)}$ has the following general expression:
$$ f_{X_{(i)}}(x, \mu, \sigma) = \frac{n!}{(i-1)! \, (n-i)!} \, f(x, \mu, \sigma) \, [F(x, \mu, \sigma)]^{i-1} \, [1 - F(x, \mu, \sigma)]^{n-i}. $$
Owing to Equations (1) and (2), for $x \in (0, 1)$, we obtain
$$ f_{X_{(i)}}(x, \mu, \sigma) = \frac{n!}{(i-1)! \, (n-i)!} \, \frac{1}{\sigma x \sqrt{1 - x^2}} \left[\phi\left(\frac{\operatorname{arcsech} x + \mu}{\sigma}\right) + \phi\left(\frac{\operatorname{arcsech} x - \mu}{\sigma}\right)\right] \times \left[2 - \Phi\left(\frac{\operatorname{arcsech} x + \mu}{\sigma}\right) - \Phi\left(\frac{\operatorname{arcsech} x - \mu}{\sigma}\right)\right]^{i-1} \times \left[\Phi\left(\frac{\operatorname{arcsech} x + \mu}{\sigma}\right) + \Phi\left(\frac{\operatorname{arcsech} x - \mu}{\sigma}\right) - 1\right]^{n-i}. $$
The pdfs of the extreme order statistics $X_{(1)}$ and $X_{(n)}$ are derived by substituting $i = 1$ and $i = n$ in the above equation, respectively. Other important results are that
$$ E[F(X_{(i)}, \mu, \sigma)] = \frac{i}{n+1}, \quad V[F(X_{(i)}, \mu, \sigma)] = \frac{i (n - i + 1)}{(n+2)(n+1)^2}. \tag{8} $$
The order statistics, as well as their mean and variance, will be useful in the next section.

4. Different Methods of the Parameter Estimation

In this section, we present several estimators of the parameters of the ASHN model. More precisely, the maximum likelihood, maximum product spacings, least squares, weighted least squares, Anderson-Darling and Cramér-von Mises estimates are derived.

4.1. Maximum Likelihood Estimation

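Let $X_1, X_2, \ldots, X_n$ be a random sample from the ASHN distribution with observed values $x_1, x_2, \ldots, x_n$, and $\Theta = (\mu, \sigma)^T$ be the vector of the model parameters. Then, the log-likelihood function is given by
$$ \ell = \ell(\Theta) = -n \log \sigma - \frac{n}{2} \log(2\pi) - \sum_{i=1}^{n} \log\left(x_i \sqrt{1 - x_i^2}\right) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left[\operatorname{arcsech}(x_i) - \mu\right]^2 + \sum_{i=1}^{n} \log\left[1 + e^{-\frac{2\mu}{\sigma^2} \operatorname{arcsech}(x_i)}\right]. \tag{9} $$
Based on $\ell(\Theta)$, the maximum likelihood estimates (MLEs) of $\mu$ and $\sigma$, say $\hat{\mu}$ and $\hat{\sigma}$, respectively, are obtained as
$$ (\hat{\mu}, \hat{\sigma}) = \operatorname{argmax}_{\Theta \in \mathbb{R} \times (0, +\infty)} \ell(\Theta). $$
Mathematically, this is equivalent to solving the following equations with respect to the parameters:
$$ \frac{\partial \ell(\Theta)}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} \left[\operatorname{arcsech}(x_i) - \mu\right] - \frac{2}{\sigma^2} \sum_{i=1}^{n} \frac{\operatorname{arcsech}(x_i) \, e^{-\frac{2\mu}{\sigma^2} \operatorname{arcsech}(x_i)}}{1 + e^{-\frac{2\mu}{\sigma^2} \operatorname{arcsech}(x_i)}} = 0 \tag{10} $$
and
$$ \frac{\partial \ell(\Theta)}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^{n} \left[\operatorname{arcsech}(x_i) - \mu\right]^2 + \frac{4\mu}{\sigma^3} \sum_{i=1}^{n} \frac{\operatorname{arcsech}(x_i) \, e^{-\frac{2\mu}{\sigma^2} \operatorname{arcsech}(x_i)}}{1 + e^{-\frac{2\mu}{\sigma^2} \operatorname{arcsech}(x_i)}} = 0. \tag{11} $$
From Equation (10), we have
$$ \frac{1}{2} \sum_{i=1}^{n} \left[\operatorname{arcsech}(x_i) - \mu\right] = \sum_{i=1}^{n} \frac{\operatorname{arcsech}(x_i) \, e^{-\frac{2\mu}{\sigma^2} \operatorname{arcsech}(x_i)}}{1 + e^{-\frac{2\mu}{\sigma^2} \operatorname{arcsech}(x_i)}}. \tag{12} $$
Substituting the right-hand side of Equation (12) in Equation (11), the following equation is obtained for the desired solution for $\sigma^2$:
$$ \sigma^2 = \frac{1}{n} \left\{\sum_{i=1}^{n} \left[\operatorname{arcsech}(x_i) - \mu\right]^2 + 2\mu \sum_{i=1}^{n} \left[\operatorname{arcsech}(x_i) - \mu\right]\right\} = \frac{1}{n} \sum_{i=1}^{n} \operatorname{arcsech}(x_i)^2 - \mu^2. \tag{13} $$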
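Then, substituting Equation (13) in Equation (9), we obtain the profile log-likelihood according to $\mu$ as
$$ \ell(\mu) = -\frac{n}{2} \log\left[\frac{1}{n} \sum_{j=1}^{n} \operatorname{arcsech}(x_j)^2 - \mu^2\right] - \frac{n}{2} \log(2\pi) - \sum_{i=1}^{n} \log\left(x_i \sqrt{1 - x_i^2}\right) + \sum_{i=1}^{n} \log\left\{1 + \exp\left(-2\mu \operatorname{arcsech}(x_i) \left[\frac{1}{n} \sum_{j=1}^{n} \operatorname{arcsech}(x_j)^2 - \mu^2\right]^{-1}\right)\right\} - \frac{\sum_{i=1}^{n} \left[\operatorname{arcsech}(x_i) - \mu\right]^2}{2 \left[\frac{1}{n} \sum_{j=1}^{n} \operatorname{arcsech}(x_j)^2 - \mu^2\right]}. $$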
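Following the normal routine of parameter estimation based on the profile log-likelihood function, we have
$$ \frac{\partial \ell(\mu)}{\partial \mu} = \frac{n\mu}{\frac{1}{n} \sum_{j=1}^{n} \operatorname{arcsech}(x_j)^2 - \mu^2} - \frac{\mu \sum_{i=1}^{n} \left[\operatorname{arcsech}(x_i) - \mu\right]^2 - \sum_{i=1}^{n} \left[\operatorname{arcsech}(x_i) - \mu\right] \left[\frac{1}{n} \sum_{j=1}^{n} \operatorname{arcsech}(x_j)^2 - \mu^2\right]}{\left[\frac{1}{n} \sum_{j=1}^{n} \operatorname{arcsech}(x_j)^2 - \mu^2\right]^2} - \sum_{i=1}^{n} \frac{2 \operatorname{arcsech}(x_i) \left[\frac{1}{n} \sum_{j=1}^{n} \operatorname{arcsech}(x_j)^2 + \mu^2\right] \exp\left(-2\mu \operatorname{arcsech}(x_i) \left[\frac{1}{n} \sum_{j=1}^{n} \operatorname{arcsech}(x_j)^2 - \mu^2\right]^{-1}\right)}{\left[\frac{1}{n} \sum_{j=1}^{n} \operatorname{arcsech}(x_j)^2 - \mu^2\right]^2 \left\{1 + \exp\left(-2\mu \operatorname{arcsech}(x_i) \left[\frac{1}{n} \sum_{j=1}^{n} \operatorname{arcsech}(x_j)^2 - \mu^2\right]^{-1}\right)\right\}}. $$
Hence, numerical methods are needed to obtain $\hat{\mu}$. Once $\hat{\mu}$ is obtained, the MLE $\hat{\sigma}$ is obtained by taking the square root of $\hat{\sigma}^2$ as given by Equation (13).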
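The well-known theory of the maximum likelihood method states that, under mild regularity conditions, one can use the bivariate normal distribution with mean vector $\boldsymbol{\mu} = (\mu, \sigma)$ and covariance matrix $I^{-1}$, where
$$ I = -\begin{pmatrix} \frac{\partial^2}{\partial \mu^2} \ell(\Theta) & \frac{\partial^2}{\partial \mu \, \partial \sigma} \ell(\Theta) \\ \frac{\partial^2}{\partial \mu \, \partial \sigma} \ell(\Theta) & \frac{\partial^2}{\partial \sigma^2} \ell(\Theta) \end{pmatrix} \Bigg|_{\Theta = \hat{\Theta}}, $$
to construct confidence intervals or likelihood ratio tests on the parameters. The components of $I$ can be derived through standard differentiation formulas. Then, approximate $100(1 - \vartheta)\%$ confidence intervals for $\mu$ and $\sigma$ can be determined by $\hat{\mu} \pm z_{\vartheta/2} \, s_{\hat{\mu}}$ and $\hat{\sigma} \pm z_{\vartheta/2} \, s_{\hat{\sigma}}$, respectively, where $z_{\vartheta/2}$ is the upper $(\vartheta/2)$th percentile of the standard normal distribution, $s_{\hat{\mu}}$ is the square root of the first diagonal element of $I^{-1}$ and $s_{\hat{\sigma}}$ is the square root of its second diagonal element. Thus defined, they are the (asymptotic) standard errors (SEs) of $\hat{\mu}$ and $\hat{\sigma}$, respectively.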
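The profile-likelihood route to the MLEs can be sketched as follows, using only the Python standard library. This is an illustrative sketch, not the authors' implementation: the function name `ashn_mle`, the coarse grid followed by a golden-section refinement, and the restriction to $\mu \geq 0$ (justified by $f(x, -\mu, \sigma) = f(x, \mu, \sigma)$, so the sign of $\mu$ is not identifiable) are assumptions made here. The observations are assumed to lie strictly inside $(0, 1)$.

```python
import math
import random


def ashn_mle(x, grid=200):
    """Maximize the profile log-likelihood over mu >= 0; sigma follows Equation (13)."""
    n = len(x)
    a = [math.acosh(1.0 / v) for v in x]              # arcsech(x_i)
    m2 = sum(v * v for v in a) / n                    # (1/n) * sum of arcsech(x_i)^2
    const = 0.5 * n * math.log(2.0 * math.pi) + sum(
        math.log(v * math.sqrt(1.0 - v * v)) for v in x)

    def profile(mu):
        s2 = m2 - mu * mu                             # sigma^2 as a function of mu
        if s2 <= 0.0:
            return -math.inf
        quad = sum((v - mu) ** 2 for v in a) / (2.0 * s2)
        mix = sum(math.log1p(math.exp(-2.0 * mu * v / s2)) for v in a)
        return -0.5 * n * math.log(s2) - const - quad + mix

    upper = math.sqrt(m2)                             # sigma^2 > 0 requires mu < upper
    step = upper / grid
    best = max((k * step for k in range(grid)), key=profile)
    lo, hi = max(0.0, best - step), min(upper, best + step)
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(60):                               # golden-section search (maximum)
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
        if profile(c) < profile(d):
            lo = c
        else:
            hi = d
    mu_hat = 0.5 * (lo + hi)
    return mu_hat, math.sqrt(max(m2 - mu_hat * mu_hat, 0.0))


rng = random.Random(12345)
data = [1.0 / math.cosh(rng.gauss(2.0, 0.5)) for _ in range(500)]
print(ashn_mle(data))
```

With data simulated from $\mu = 2$ and $\sigma = 0.5$, the printed estimates should be close to these values. Standard errors could then be approximated from a numerical Hessian of the log-likelihood, which this sketch does not include.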

4.2. Maximum Product Spacing Estimation

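Cheng and Amin [22] have proposed the maximum product spacing (MPS) method as an alternative to the maximum likelihood method. It is based on the idea that differences (spacings) between the values of the cdf at consecutive data points should be identically distributed. Now, let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the order statistics from the ASHN distribution with sample size $n$, and $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ be the ordered observed values. Then, the MPS estimates (MPSEs) of $\mu$ and $\sigma$, say $\hat{\mu}_{MPS}$ and $\hat{\sigma}_{MPS}$, respectively, are given as
$$ (\hat{\mu}_{MPS}, \hat{\sigma}_{MPS}) = \operatorname{argmax}_{\Theta \in \mathbb{R} \times (0, +\infty)} MPS(\Theta), $$
where
$$ MPS(\Theta) = \frac{1}{n+1} \sum_{i=1}^{n+1} \log\left[F(x_{(i)}, \mu, \sigma) - F(x_{(i-1)}, \mu, \sigma)\right]. \tag{14} $$
They are also given as the simultaneous solutions of the following equations:
$$ \frac{\partial MPS(\Theta)}{\partial \mu} = \frac{1}{n+1} \sum_{i=1}^{n+1} \frac{F'_{\mu}(x_{(i)}, \mu, \sigma) - F'_{\mu}(x_{(i-1)}, \mu, \sigma)}{F(x_{(i)}, \mu, \sigma) - F(x_{(i-1)}, \mu, \sigma)} = 0 $$
and
$$ \frac{\partial MPS(\Theta)}{\partial \sigma} = \frac{1}{n+1} \sum_{i=1}^{n+1} \frac{F'_{\sigma}(x_{(i)}, \mu, \sigma) - F'_{\sigma}(x_{(i-1)}, \mu, \sigma)}{F(x_{(i)}, \mu, \sigma) - F(x_{(i-1)}, \mu, \sigma)} = 0, $$
where
$$ F'_{\mu}(x, \mu, \sigma) = \frac{1}{\sigma} \left[\phi\left(\frac{\operatorname{arcsech} x - \mu}{\sigma}\right) - \phi\left(\frac{\operatorname{arcsech} x + \mu}{\sigma}\right)\right] $$
and
$$ F'_{\sigma}(x, \mu, \sigma) = \frac{1}{\sigma^2} \left[(\operatorname{arcsech} x - \mu) \, \phi\left(\frac{\operatorname{arcsech} x - \mu}{\sigma}\right) + (\operatorname{arcsech} x + \mu) \, \phi\left(\frac{\operatorname{arcsech} x + \mu}{\sigma}\right)\right]. $$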

4.3. Least Squares Estimation

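The least squares estimates (LSEs) of $\mu$ and $\sigma$, say $\hat{\mu}_{LSE}$ and $\hat{\sigma}_{LSE}$, respectively, are obtained as
$$ (\hat{\mu}_{LSE}, \hat{\sigma}_{LSE}) = \operatorname{argmin}_{\Theta \in \mathbb{R} \times (0, +\infty)} LSE(\Theta), $$
where
$$ LSE(\Theta) = \sum_{i=1}^{n} \left\{F(x_{(i)}, \mu, \sigma) - E\left[F(X_{(i)}, \mu, \sigma)\right]\right\}^2, \tag{15} $$
where, by Equation (8), $E[F(X_{(i)}, \mu, \sigma)] = i / (n+1)$ for $i = 1, 2, \ldots, n$. Then, $\hat{\mu}_{LSE}$ and $\hat{\sigma}_{LSE}$ are solutions of the following equations:
$$ \frac{\partial LSE(\Theta)}{\partial \mu} = 2 \sum_{i=1}^{n} F'_{\mu}(x_{(i)}, \mu, \sigma) \left[F(x_{(i)}, \mu, \sigma) - \frac{i}{n+1}\right] = 0 $$
and
$$ \frac{\partial LSE(\Theta)}{\partial \sigma} = 2 \sum_{i=1}^{n} F'_{\sigma}(x_{(i)}, \mu, \sigma) \left[F(x_{(i)}, \mu, \sigma) - \frac{i}{n+1}\right] = 0, $$
where $F'_{\mu}(x_{(i)}, \mu, \sigma)$ and $F'_{\sigma}(x_{(i)}, \mu, \sigma)$ are given in Section 4.2.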

4.4. Weighted Least Squares Estimation

Similarly to the LSEs, the weighted least squares estimates (WLSEs) of $\mu$ and $\sigma$, say $\hat{\mu}_{WLSE}$ and $\hat{\sigma}_{WLSE}$, respectively, are given as
$$ (\hat{\mu}_{WLSE}, \hat{\sigma}_{WLSE}) = \operatorname{argmin}_{\Theta \in \mathbb{R} \times (0, +\infty)} WLSE(\Theta), $$
where
$$ WLSE(\Theta) = \sum_{i=1}^{n} \frac{1}{V\left[F(X_{(i)}, \mu, \sigma)\right]} \left\{F(x_{(i)}, \mu, \sigma) - E\left[F(X_{(i)}, \mu, \sigma)\right]\right\}^2, \tag{16} $$
where, by Equation (8), $E[F(X_{(i)}, \mu, \sigma)] = i / (n+1)$ and $V[F(X_{(i)}, \mu, \sigma)] = i (n - i + 1) / [(n+2)(n+1)^2]$ for $i = 1, 2, \ldots, n$. Then, $\hat{\mu}_{WLSE}$ and $\hat{\sigma}_{WLSE}$ are solutions of the following equations:
$$ \frac{\partial WLSE(\Theta)}{\partial \mu} = 2 \sum_{i=1}^{n} \frac{(n+2)(n+1)^2}{i (n - i + 1)} \left[F(x_{(i)}, \mu, \sigma) - \frac{i}{n+1}\right] F'_{\mu}(x_{(i)}, \mu, \sigma) = 0 $$
and
$$ \frac{\partial WLSE(\Theta)}{\partial \sigma} = 2 \sum_{i=1}^{n} \frac{(n+2)(n+1)^2}{i (n - i + 1)} \left[F(x_{(i)}, \mu, \sigma) - \frac{i}{n+1}\right] F'_{\sigma}(x_{(i)}, \mu, \sigma) = 0. $$

4.5. Anderson-Darling Estimation

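The Anderson-Darling minimum distance estimates (ADEs) of $\mu$ and $\sigma$, say $\hat{\mu}_{AD}$ and $\hat{\sigma}_{AD}$, respectively, are determined as
$$ (\hat{\mu}_{AD}, \hat{\sigma}_{AD}) = \operatorname{argmin}_{\Theta \in \mathbb{R} \times (0, +\infty)} AD(\Theta), $$
where
$$ AD(\Theta) = -n - \sum_{i=1}^{n} \frac{2i - 1}{n} \left\{\log\left[F(x_{(i)}, \mu, \sigma)\right] + \log\left[1 - F(x_{(n+1-i)}, \mu, \sigma)\right]\right\}. \tag{17} $$
Therefore, $\hat{\mu}_{AD}$ and $\hat{\sigma}_{AD}$ can be obtained as the solutions of the following system of equations:
$$ \frac{\partial AD(\Theta)}{\partial \mu} = -\sum_{i=1}^{n} \frac{2i - 1}{n} \left[\frac{F'_{\mu}(x_{(i)}, \mu, \sigma)}{F(x_{(i)}, \mu, \sigma)} - \frac{F'_{\mu}(x_{(n+1-i)}, \mu, \sigma)}{1 - F(x_{(n+1-i)}, \mu, \sigma)}\right] = 0 $$
and
$$ \frac{\partial AD(\Theta)}{\partial \sigma} = -\sum_{i=1}^{n} \frac{2i - 1}{n} \left[\frac{F'_{\sigma}(x_{(i)}, \mu, \sigma)}{F(x_{(i)}, \mu, \sigma)} - \frac{F'_{\sigma}(x_{(n+1-i)}, \mu, \sigma)}{1 - F(x_{(n+1-i)}, \mu, \sigma)}\right] = 0. $$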

4.6. The Cramér-von Mises Estimation

The Cramér-von Mises minimum distance estimates (CVMEs) of $\mu$ and $\sigma$, say $\hat{\mu}_{CVM}$ and $\hat{\sigma}_{CVM}$, respectively, are specified as
$$ (\hat{\mu}_{CVM}, \hat{\sigma}_{CVM}) = \operatorname{argmin}_{\Theta \in \mathbb{R} \times (0, +\infty)} CVM(\Theta), $$
where
$$ CVM(\Theta) = \frac{1}{12n} + \sum_{i=1}^{n} \left[F(x_{(i)}, \mu, \sigma) - \frac{2i - 1}{2n}\right]^2. \tag{18} $$
Therefore, the estimates $\hat{\mu}_{CVM}$ and $\hat{\sigma}_{CVM}$ can be obtained as the solutions of the following system of equations:
$$ \frac{\partial CVM(\Theta)}{\partial \mu} = 2 \sum_{i=1}^{n} \left[F(x_{(i)}, \mu, \sigma) - \frac{2i - 1}{2n}\right] F'_{\mu}(x_{(i)}, \mu, \sigma) = 0 $$
and
$$ \frac{\partial CVM(\Theta)}{\partial \sigma} = 2 \sum_{i=1}^{n} \left[F(x_{(i)}, \mu, \sigma) - \frac{2i - 1}{2n}\right] F'_{\sigma}(x_{(i)}, \mu, \sigma) = 0. $$
All the presented equations contain complex non-linear functions; it is not possible to obtain explicit forms of the estimates. Therefore, they need to be solved through numerical methods such as the Newton-Raphson and quasi-Newton algorithms. Alternatively, the functions $\ell(\Theta)$, $MPS(\Theta)$, $LSE(\Theta)$, $WLSE(\Theta)$, $AD(\Theta)$ and $CVM(\Theta)$ in Equations (9) and (14)–(18) can be optimized directly by using software such as R (constrOptim and optim), S-Plus and Matlab.
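To make the five distance-type criteria concrete, the sketch below codes the objective functions of Equations (14)–(18) in Python with the standard library only. It is not the authors' R code. The function names are hypothetical, the MPS criterion assumes the usual conventions $F(x_{(0)}) = 0$ and $F(x_{(n+1)}) = 1$, and the small constant `EPS` is a numerical guard against $\log 0$ added for this sketch. Any general-purpose optimizer could then be applied to these functions.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf
EPS = 1e-12  # numerical guard, not part of the original definitions


def ashn_cdf(x, mu, sigma):
    a = math.acosh(1.0 / x)  # arcsech(x)
    return 2.0 - PHI((a + mu) / sigma) - PHI((a - mu) / sigma)


def sorted_cdf(x, mu, sigma):
    return [ashn_cdf(v, mu, sigma) for v in sorted(x)]


def mps(x, mu, sigma):
    """Equation (14); to be maximized."""
    f = [0.0] + sorted_cdf(x, mu, sigma) + [1.0]
    spacings = (max(f[i] - f[i - 1], EPS) for i in range(1, len(f)))
    return sum(math.log(s) for s in spacings) / (len(x) + 1)


def lse(x, mu, sigma):
    """Equation (15); to be minimized."""
    n, f = len(x), sorted_cdf(x, mu, sigma)
    return sum((f[i - 1] - i / (n + 1)) ** 2 for i in range(1, n + 1))


def wlse(x, mu, sigma):
    """Equation (16); weights are the reciprocal variances from Equation (8)."""
    n, f = len(x), sorted_cdf(x, mu, sigma)
    return sum((n + 2) * (n + 1) ** 2 / (i * (n - i + 1)) * (f[i - 1] - i / (n + 1)) ** 2
               for i in range(1, n + 1))


def ad(x, mu, sigma):
    """Equation (17); to be minimized."""
    n = len(x)
    f = [min(max(v, EPS), 1.0 - EPS) for v in sorted_cdf(x, mu, sigma)]
    return -n - sum((2 * i - 1) / n * (math.log(f[i - 1]) + math.log(1.0 - f[n - i]))
                    for i in range(1, n + 1))


def cvm(x, mu, sigma):
    """Equation (18); to be minimized."""
    n, f = len(x), sorted_cdf(x, mu, sigma)
    return 1.0 / (12 * n) + sum((f[i - 1] - (2 * i - 1) / (2 * n)) ** 2
                                for i in range(1, n + 1))


rng = random.Random(3)
data = [1.0 / math.cosh(rng.gauss(0.5, 0.5)) for _ in range(200)]
for name, crit in (("MPS", mps), ("LSE", lse), ("WLSE", wlse), ("AD", ad), ("CVM", cvm)):
    print(name, crit(data, 0.5, 0.5), crit(data, 1.5, 0.5))
```

For data simulated from $(\mu = 0.5, \sigma = 0.5)$, each criterion to be minimized should be smaller at the true parameters than at the distant pair $(1.5, 0.5)$, and the MPS criterion larger.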

5. Empirical Simulations

In this section, we perform two graphical simulation studies to assess the performance of the above estimates with varying sample size $n$. We generate $N = 1000$ samples of size $n = 20, 25, \ldots, 1000$ from the ASHN distribution based on the following parameter values: $(\mu = 2, \sigma = 2)$ and $(\mu = 0.5, \sigma = 0.5)$ for the first and second simulation studies, respectively. The random numbers are generated by means of the qf of the model. All the estimates are obtained by using the constrOptim function in the R program. Further, we calculate the empirical mean, bias and mean square error (MSE) of the estimates for comparisons between the methods. For $\epsilon = \mu$ or $\epsilon = \sigma$, the bias and MSE associated with $\epsilon$ are calculated by
$$ Bias_{\epsilon}(n) = \frac{1}{N} \sum_{i=1}^{N} (\epsilon - \hat{\epsilon}_i), \quad MSE_{\epsilon}(n) = \frac{1}{N} \sum_{i=1}^{N} (\epsilon - \hat{\epsilon}_i)^2, $$
respectively, where $i$ is related to the $i$th sample. We expect the empirical means to be close to the true values when the MSEs and biases are near zero. The results of this simulation study are shown in Figure 3 and Figure 4.
Figure 3 and Figure 4 show that all estimates are consistent, since the MSE and bias decrease to zero with increasing sample size, as expected. One can state that all estimates are asymptotically unbiased. According to these two simulation studies, the biases and MSEs of the MLE method are smaller than those of the other methods for both parameters. Therefore, the MLE method can be chosen as more reliable than the other methods for the newly defined model. Generally, the performances of all estimates are close when the sample size increases. Similar results can be seen for different parameter values.
Moreover, we also give a simulation study of the MLEs based on their 95% confidence intervals. In this regard, we use the coverage probability (CP) criterion defined by
$$ CP_{\epsilon}(n) = \frac{1}{N} \sum_{i=1}^{N} I\left(\epsilon \in \left[\hat{\epsilon}_i - 1.95996 \, s_{\hat{\epsilon}_i}, \ \hat{\epsilon}_i + 1.95996 \, s_{\hat{\epsilon}_i}\right]\right), $$
where $s_{\hat{\epsilon}_i}$ is the SE of the MLE $\hat{\epsilon}_i$. Figure 5 displays the obtained simulation results. From Figure 5, as expected, for each parameter, the CPs converge to the nominal value, that is 0.95, when the sample size increases. The simulation results verify the consistency property of the MLEs.

6. A New Quantile Regression Model Based on the Special ASHN Distribution

6.1. Motivation

The quantile regression has been developed in the seminal work of [19] as a way to model the conditional quantiles of an outcome variable as a function of covariates (regressors). Since this analysis aims to model the conditional quantiles of the response variable, it is a good robust alternative model to the ordinary LSE model, which estimates the conditional mean of the response variable. This is because the mean is affected by a skewed distribution or outliers in the measurements. Hence, the quantile response regression model will be less sensitive to outliers than the mean response regression model.
On the other hand, if the support of the response variable is the unit interval, one can use a unit regression model based on a unit distribution for modeling the conditional mean or quantiles of the response variable via covariates. The beta regression model [23] first comes to mind for relating a continuous mean response in the unit interval to covariates. One may also see [12,13,24,25,26] for alternative unit mean response regression models to beta regression models. If the conditional dependent variable is skewed or has outliers, quantile response modeling may be more appropriate than mean response modeling. The model is also motivated by the natural idea of replacing the mean by the median as a central tendency measure when the response data are severely asymmetric [27].
Furthermore, by re-parameterizing the probability distribution as a function of a quantile, the Kumaraswamy [28,29] and unit Weibull [30] quantile regression models have been proposed for modeling the conditional quantiles of the unit response. One may also refer to [27,31,32,33,34] for alternative quantile response regression models. On the basis of these references, we propose an alternative quantile regression model based on a parameterization of the ASHN distribution in terms of any of its quantiles. More precisely, the re-parameterization is applied via the scale parameter, which is expressed in terms of a quantile of the ASHN distribution.

6.2. Proposed Quantile Regression Model

Now, we can focus on introducing an alternative quantile regression model based on a special $ASHN(\mu, \sigma)$ distribution. Since the ASHN distribution does not have an explicit qf, we propose another distribution based on a special ASHN distribution. We call it the exponentiated ASHN ($EASHN$) distribution. Its cdf and pdf are given by
$$ G(y, \alpha, \sigma) = \left[2 - 2\Phi\left(\frac{1}{\sigma} \operatorname{arcsech} y\right)\right]^{\alpha} \tag{19} $$
and
$$ g(y, \alpha, \sigma) = \frac{2\alpha}{\sigma y \sqrt{1 - y^2}} \, \phi\left(\frac{1}{\sigma} \operatorname{arcsech} y\right) \left[2 - 2\Phi\left(\frac{1}{\sigma} \operatorname{arcsech} y\right)\right]^{\alpha - 1}, $$
respectively, where $y \in (0, 1)$ and $\alpha, \sigma > 0$. For $y \notin (0, 1)$, standard completions on these functions are performed. The cdf in Equation (19) is obtained as the exponentiated $ASHN(0, \sigma)$ distribution, that is $G(y, \alpha, \sigma) = F(y, 0, \sigma)^{\alpha}$. We can also call this model the Lehmann type I ASHN model.
The qf of the EASHN distribution is given by
$$ Q(u, \alpha, \sigma) = \operatorname{sech}\left[\sigma \Phi^{-1}\left(1 - \frac{u^{\frac{1}{\alpha}}}{2}\right)\right], $$
where $u \in (0, 1)$. This manageable qf is a further motivation for quantile regression modeling. Then, the pdf of the EASHN distribution can be re-parameterized in terms of its $u$th quantile as $\eta = Q(u, \alpha, \sigma)$. Let $\sigma = \operatorname{arcsech} \eta / \Phi^{-1}\left[(2 - u^{1/\alpha}) / 2\right]$. Then, the cdf and pdf of the re-parameterized distribution are given by
$$ G(y, \alpha, \eta) = \left\{2 - 2\Phi\left[\frac{\Phi^{-1}\left(1 - u^{\frac{1}{\alpha}} / 2\right)}{\operatorname{arcsech} \eta} \operatorname{arcsech} y\right]\right\}^{\alpha} \tag{20} $$
and
$$ g(y, \alpha, \eta) = \frac{2\alpha \, \Phi^{-1}\left(1 - u^{\frac{1}{\alpha}} / 2\right)}{\operatorname{arcsech} \eta \; y \sqrt{1 - y^2}} \, \phi\left[\frac{\Phi^{-1}\left(1 - u^{\frac{1}{\alpha}} / 2\right)}{\operatorname{arcsech} \eta} \operatorname{arcsech} y\right] \times \left\{2 - 2\Phi\left[\frac{\Phi^{-1}\left(1 - u^{\frac{1}{\alpha}} / 2\right)}{\operatorname{arcsech} \eta} \operatorname{arcsech} y\right]\right\}^{\alpha - 1}, \tag{21} $$
respectively, where $\alpha > 0$ is the shape parameter. The parameter $\eta \in (0, 1)$ represents the quantile parameter and it is assumed that $u$ is known. A random variable $Y$ having the pdf in Equation (21) is denoted by $Y \sim EASHN(\alpha, \eta, u)$. Some possible shapes of the re-parameterized model are shown in Figure 6. We see that the possible pdf shapes of the EASHN distribution include skewed shapes as well as U-shapes, N-shapes and increasing shapes.
We present the quantile regression model based on the EASHN distribution with pdf in Equation (21). Let $y_1, y_2, \ldots, y_n$ be $n$ random observations from the re-parameterized distribution such that, for $i = 1, \ldots, n$, $y_i$ is a realization of $Y \sim EASHN(\alpha, \eta_i, u)$, with unknown parameters $\eta_i$ and $\boldsymbol{\beta}$, recalling that the parameter $u$ is known. Then the EASHN quantile regression model is defined as
$$ g(\eta_i) = \mathbf{x}_i \boldsymbol{\beta}^T, $$
where $\boldsymbol{\beta} = (\beta_0, \beta_1, \beta_2, \ldots, \beta_p)^T$ and $\mathbf{x}_i = (1, x_{i1}, x_{i2}, x_{i3}, \ldots, x_{ip})$ are the unknown regression parameter vector and the known $i$th vector of the covariates, respectively. Thus defined, $g(x)$ is the link function, which is used to link the covariates to the conditional quantile of the response variable. For instance, when $u = 0.5$, the covariates are linked to the conditional median of the response variable. The choice of the appropriate link function should be made considering the domain of the distribution.
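The re-parameterization can be illustrated with a short Python sketch (standard library only). The function names are hypothetical; the code simply transcribes Equations (19)–(21) and the qf, and then checks that $\eta$ is indeed the $u$th quantile of the re-parameterized distribution.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def arcsech(x: float) -> float:
    return math.acosh(1.0 / x)


def eashn_cdf(y: float, alpha: float, sigma: float) -> float:
    """Equation (19): G(y) = F(y, 0, sigma)^alpha for 0 < y < 1."""
    return (2.0 - 2.0 * STD_NORMAL.cdf(arcsech(y) / sigma)) ** alpha


def eashn_quantile(u: float, alpha: float, sigma: float) -> float:
    """Q(u) = sech(sigma * Phi^{-1}(1 - u^(1/alpha) / 2))."""
    return 1.0 / math.cosh(sigma * STD_NORMAL.inv_cdf(1.0 - u ** (1.0 / alpha) / 2.0))


def sigma_from_quantile(eta: float, alpha: float, u: float) -> float:
    """Scale parameter making eta the u-th quantile."""
    return arcsech(eta) / STD_NORMAL.inv_cdf(1.0 - u ** (1.0 / alpha) / 2.0)


def eashn_cdf_q(y: float, alpha: float, eta: float, u: float) -> float:
    """Equation (20): cdf in the (alpha, eta, u) parameterization."""
    return eashn_cdf(y, alpha, sigma_from_quantile(eta, alpha, u))


def eashn_pdf_q(y: float, alpha: float, eta: float, u: float) -> float:
    """Equation (21): pdf in the (alpha, eta, u) parameterization."""
    c = STD_NORMAL.inv_cdf(1.0 - u ** (1.0 / alpha) / 2.0) / arcsech(eta)
    z = c * arcsech(y)
    return (2.0 * alpha * c / (y * math.sqrt(1.0 - y * y)) * STD_NORMAL.pdf(z)
            * (2.0 - 2.0 * STD_NORMAL.cdf(z)) ** (alpha - 1.0))


alpha, eta, u = 2.0, 0.6, 0.25
print(eashn_cdf_q(eta, alpha, eta, u))  # should be close to u
print(eashn_quantile(u, alpha, sigma_from_quantile(eta, alpha, u)))  # should be close to eta
```

The parameter values above are arbitrary and only serve the check.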

6.3. Parameter Estimation

The unknown parameters of the EASHN quantile regression model are estimated by means of the MLE method. Since the EASHN distribution is defined on the unit interval, we use the logit-link function, that is
$$ g(\eta_i) = \operatorname{logit}(\eta_i) = \log\left(\frac{\eta_i}{1 - \eta_i}\right) = \mathbf{x}_i \boldsymbol{\beta}^T, $$
implying that
$$ \eta_i = \frac{\exp\left(\mathbf{x}_i \boldsymbol{\beta}^T\right)}{1 + \exp\left(\mathbf{x}_i \boldsymbol{\beta}^T\right)}. \tag{22} $$
By putting Equation (22) into Equation (21), the log-likelihood function of the EASHN quantile regression model is
$$ \ell(\Omega) = n \log 2 - \frac{n}{2} \log(2\pi) + n \log \alpha + n \log\left[\Phi^{-1}\left(1 - \frac{u^{\frac{1}{\alpha}}}{2}\right)\right] - \sum_{i=1}^{n} \log\left[\operatorname{arcsech}(\eta_i) \, y_i \sqrt{1 - y_i^2}\right] - \frac{1}{2} \left[\Phi^{-1}\left(1 - \frac{u^{\frac{1}{\alpha}}}{2}\right)\right]^2 \sum_{i=1}^{n} \left[\frac{\operatorname{arcsech}(y_i)}{\operatorname{arcsech}(\eta_i)}\right]^2 + (\alpha - 1) \sum_{i=1}^{n} \log\left\{2 - 2\Phi\left[\frac{\operatorname{arcsech}(y_i)}{\operatorname{arcsech}(\eta_i)} \Phi^{-1}\left(1 - \frac{u^{\frac{1}{\alpha}}}{2}\right)\right]\right\}, \tag{23} $$
where $\Omega = (\alpha, \boldsymbol{\beta})^T$ denotes the unknown parameter vector. Since Equation (23) involves nonlinear functions of the model parameters, it can be maximized directly by software such as R, S-Plus and Mathematica. Note that, when $u = 0.5$, this is equivalent to modeling the conditional median. Under mild regularity conditions, the asymptotic distribution of $\hat{\Omega} - \Omega$ is multivariate normal $N_{p+1}(0, J^{-1})$, where the variance-covariance matrix $J^{-1}$ is defined by the inverse of the expected information matrix. For practical purposes, we can use the observed information matrix instead of $J$. The elements of this observed information matrix are evaluated numerically by the software. We use the maxLik function implemented in the R software to maximize Equation (23) (see [35]). This function also gives the asymptotic SEs numerically, which are obtained from the observed information matrix.
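For readers who want to experiment outside R, the log-likelihood in Equation (23) with the logit link of Equation (22) can be coded directly. The sketch below, in Python with the standard library only, is an illustration rather than the authors' maxLik code: the function name, the argument layout (each covariate row is assumed to start with the constant 1) and the tiny floor inside the last logarithm are choices made here. The result can be passed to any numerical optimizer.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def eashn_qr_loglik(alpha, beta, rows, y, u=0.5):
    """Log-likelihood of the EASHN quantile regression model, Equation (23).

    alpha : shape parameter (> 0)
    beta  : regression coefficients (beta_0, ..., beta_p)
    rows  : covariate rows (1, x_i1, ..., x_ip)
    y     : responses, each strictly inside (0, 1)
    u     : known quantile level in (0, 1)
    """
    n = len(y)
    c = STD_NORMAL.inv_cdf(1.0 - u ** (1.0 / alpha) / 2.0)
    total = (n * math.log(2.0) - 0.5 * n * math.log(2.0 * math.pi)
             + n * math.log(alpha) + n * math.log(c))
    for row, yi in zip(rows, y):
        linear = sum(b * v for b, v in zip(beta, row))
        eta = 1.0 / (1.0 + math.exp(-linear))            # Equation (22), logit link
        a_eta = math.acosh(1.0 / eta)                    # arcsech(eta_i)
        ratio = math.acosh(1.0 / yi) / a_eta             # arcsech(y_i) / arcsech(eta_i)
        total -= math.log(a_eta * yi * math.sqrt(1.0 - yi * yi))
        total -= 0.5 * c * c * ratio * ratio
        # the floor 1e-300 only avoids log(0) in extreme tails
        total += (alpha - 1.0) * math.log(max(2.0 - 2.0 * STD_NORMAL.cdf(ratio * c), 1e-300))
    return total


rows = [(1.0, 0.2), (1.0, -0.5), (1.0, 1.0)]
y = [0.3, 0.7, 0.55]
print(eashn_qr_loglik(1.7, (0.1, 0.4), rows, y, u=0.5))
```

The toy data are purely illustrative and are not taken from the article.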

6.4. Residual Analysis

Residual analysis is needed to check whether the regression model is suitable. For this purpose, we work with the randomized quantile residuals [36] and the Cox-Snell residuals [37].
For $i = 1, \ldots, n$, the $i$th randomized quantile residual is defined by
$$\hat{r}_i = \Phi^{-1}\left[G(y_i, \hat{\alpha}, \hat{\eta}_i)\right],$$
where $G(y, \alpha, \eta)$ is the cdf of the re-parameterized EASHN distribution specified by Equation (20) and $\hat{\eta}_i$ is defined by Equation (22) with $\hat{\boldsymbol{\beta}}$ replacing $\boldsymbol{\beta}$. If the fitted model describes the dataset adequately, the randomized quantile residuals follow the standard normal distribution.
Alternatively, for $i = 1, \ldots, n$, the $i$th Cox and Snell residual is given by
$$\hat{e}_i = -\log\left[1 - G(y_i, \hat{\alpha}, \hat{\eta}_i)\right].$$
If the model fits the data adequately, the Cox and Snell residuals follow the standard exponential distribution, that is, the exponential distribution with scale parameter 1.
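The two residual definitions can be sketched in Python as follows. The cdf `eashn_cdf` used here is an assumption: it is the form of $G$ implied by the log-likelihood of Equation (23), namely $G(y, \alpha, \eta) = \{2 - 2\Phi[\operatorname{arcsech}(y)\,\Phi^{-1}(1 - u^{1/\alpha}/2)/\operatorname{arcsech}(\eta)]\}^{\alpha}$, and should be checked against Equation (20). All function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def arcsech(x):
    """Inverse hyperbolic secant for x in (0, 1]."""
    return math.acosh(1.0 / x)


def eashn_cdf(y, alpha, eta, u=0.5):
    """Assumed cdf G(y, alpha, eta) of the re-parameterized EASHN model."""
    q = -STD_NORMAL.inv_cdf(u ** (1.0 / alpha) / 2.0)
    z = q * arcsech(y) / arcsech(eta)
    return (2.0 * STD_NORMAL.cdf(-z)) ** alpha


def quantile_residual(y, alpha, eta, u=0.5):
    """Randomized quantile residual: Phi^{-1}(G(y))."""
    return STD_NORMAL.inv_cdf(eashn_cdf(y, alpha, eta, u))


def cox_snell_residual(y, alpha, eta, u=0.5):
    """Cox-Snell residual: -log(1 - G(y))."""
    return -math.log(1.0 - eashn_cdf(y, alpha, eta, u))
```

A quick sanity check of the parameterization: an observation equal to its fitted $u$th quantile has $G = u$, so with $u = 0.5$ its quantile residual is 0 and its Cox-Snell residual is $\log 2$.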

7. Data Analysis

To illustrate the modeling ability of the ASHN distribution, this section presents three real data applications, covering both univariate data modeling and quantile regression modeling.

7.1. Univariate Real Data Modeling

Here, we provide applications to two real datasets to show empirically the potential of the ASHN model. The proposed model is compared with some well-known two-parameter unit distributions in the literature, namely:
  • Beta distribution.
    The two-parameter beta pdf is given by
    $$f_{Beta}(x, \mu, \sigma) = \frac{1}{B(\mu, \sigma)}\, x^{\mu-1}(1-x)^{\sigma-1}, \quad x \in (0, 1),$$
    where $\mu > 0$ and $\sigma > 0$ are shape parameters, and $B(\mu, \sigma)$ is the classical beta function.
  • Kumaraswamy (Kw) distribution (see [3]).
    The two-parameter Kw pdf is expressed as
    $$f_{Kw}(x, \mu, \sigma) = \mu\sigma\, x^{\mu-1}\left(1-x^{\mu}\right)^{\sigma-1}, \quad x \in (0, 1),$$
    where $\mu > 0$ and $\sigma > 0$ are shape parameters.
  • Johnson $S_B$ distribution (see [1]).
    The two-parameter Johnson $S_B$ pdf is given by
    $$f_{S_B}(x, \mu, \sigma) = \frac{\sigma}{x(1-x)}\, \phi\left[\sigma\log\left(\frac{x}{1-x}\right) + \mu\right], \quad x \in (0, 1),$$
    where $\mu \in \mathbb{R}$ and $\sigma > 0$ are shape parameters, and $\phi(\cdot)$ is the pdf of the standard normal distribution. For each model, we estimate the unknown parameters using the maximum likelihood approach.
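For readers who wish to experiment with the three competing densities, the following Python sketch codes them exactly as displayed above and adds a crude midpoint-rule check that each integrates to 1. The function names and the helper `integrate` are hypothetical and are not part of the paper's R workflow.

```python
import math
from statistics import NormalDist


def beta_pdf(x, mu, sigma):
    """Two-parameter beta pdf with shape parameters mu and sigma."""
    log_b = math.lgamma(mu) + math.lgamma(sigma) - math.lgamma(mu + sigma)
    return math.exp((mu - 1) * math.log(x) + (sigma - 1) * math.log(1 - x) - log_b)


def kw_pdf(x, mu, sigma):
    """Kumaraswamy pdf with shape parameters mu and sigma."""
    return mu * sigma * x ** (mu - 1) * (1 - x ** mu) ** (sigma - 1)


def johnson_sb_pdf(x, mu, sigma):
    """Johnson S_B pdf with parameters mu (real) and sigma > 0."""
    z = sigma * math.log(x / (1 - x)) + mu
    return sigma / (x * (1 - x)) * NormalDist().pdf(z)


def integrate(pdf, *params, n=20000):
    """Midpoint rule on (0, 1); adequate for smooth, bounded densities."""
    h = 1.0 / n
    return sum(pdf((i + 0.5) * h, *params) for i in range(n)) * h
```

For example, `integrate(kw_pdf, 2.0, 3.0)` should be close to 1; densities that are unbounded near 0 or 1 would need a more careful quadrature.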
Two datasets are considered. To determine the best model for each of them, we compute the estimated log-likelihood value $\hat{\ell}$, the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and the Kolmogorov-Smirnov (KS), Anderson-Darling ($A^*$) and Cramér-von Mises ($W^*$) goodness-of-fit statistics for all models. In general, the best model is the one with the smallest values of the AIC, BIC, KS, $A^*$ and $W^*$ statistics and the largest value of $\hat{\ell}$. The p-value of the KS test is also considered: the closer it is to 1, the better the model. All computations are performed by using the maxLik [35] and goftest routines in the R software.
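The two information criteria are not restated in the text; the sketch below assumes their standard definitions, AIC $= 2k - 2\hat{\ell}$ and BIC $= k\log n - 2\hat{\ell}$, where $k$ is the number of estimated parameters and $n$ the sample size. The function names are hypothetical.

```python
import math


def aic(loglik, k):
    """Akaike Information Criterion for k estimated parameters."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian Information Criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * loglik


if __name__ == "__main__":
    # ASHN row of Table 1: estimated log-likelihood 33.2443, k = 2, n = 20
    print(aic(33.2443, 2), bic(33.2443, 2, 20))
```

With the ASHN values of Table 1, these formulas agree with the reported AIC and BIC up to rounding of $\hat{\ell}$, which supports the assumed definitions.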

7.1.1. Data Analysis I

First, we consider an application to a real dataset to show the modeling ability of the proposed distribution. The dataset contains the failure times of 20 mechanical components given in [38]. The data are: 0.067, 0.068, 0.076, 0.081, 0.084, 0.085, 0.085, 0.086, 0.089, 0.098, 0.098, 0.114, 0.114, 0.115, 0.121, 0.125, 0.131, 0.149, 0.160, 0.485. Recently, these data were analyzed via different approaches in [15,39,40].
Table 1 lists the MLEs of the parameters and their SEs for the fitted models, together with $\hat{\ell}$, AIC, BIC and the $A^*$, $W^*$ and KS goodness-of-fit statistics. The ASHN model has the smallest values of AIC, BIC, $A^*$, $W^*$ and KS and the largest value of $\hat{\ell}$ among the fitted models, and can therefore be considered the best model. The p-value of the KS test confirms this claim.
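The ASHN row of Table 1 can be cross-checked with a short Python sketch. It assumes the cdf $F(x) = 2 - \Phi[(\operatorname{arcsech} x + \mu)/\sigma] - \Phi[(\operatorname{arcsech} x - \mu)/\sigma]$ derived in Appendix A and the pdf obtained by differentiating it; the function names are hypothetical, and the KS statistic is the usual maximum distance between the fitted cdf and the empirical cdf.

```python
import math
from statistics import NormalDist

PHI = NormalDist()

# Failure times of the 20 mechanical components (first dataset)
DATA = [0.067, 0.068, 0.076, 0.081, 0.084, 0.085, 0.085, 0.086, 0.089, 0.098,
        0.098, 0.114, 0.114, 0.115, 0.121, 0.125, 0.131, 0.149, 0.160, 0.485]


def arcsech(x):
    return math.acosh(1.0 / x)


def ashn_cdf(x, mu, sigma):
    a = arcsech(x)
    # 2 - Phi((a + mu)/sigma) - Phi((a - mu)/sigma), written via upper tails
    return PHI.cdf(-(a + mu) / sigma) + PHI.cdf(-(a - mu) / sigma)


def ashn_pdf(x, mu, sigma):
    a = arcsech(x)
    dens = PHI.pdf((a + mu) / sigma) + PHI.pdf((a - mu) / sigma)
    return dens / (sigma * x * math.sqrt(1.0 - x * x))


def loglik(data, mu, sigma):
    return sum(math.log(ashn_pdf(x, mu, sigma)) for x in data)


def ks_statistic(data, mu, sigma):
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf_value = ashn_cdf(x, mu, sigma)
        d = max(d, (i + 1) / n - cdf_value, cdf_value - i / n)
    return d


if __name__ == "__main__":
    print(loglik(DATA, 2.9179, 0.4322), ks_statistic(DATA, 2.9179, 0.4322))
```

Evaluated at the reported estimates $\hat{\mu} = 2.9179$ and $\hat{\sigma} = 0.4322$, the sketch should return values close to the $\hat{\ell}$ and KS entries of Table 1, up to rounding of the estimates.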
The plots of the fitted pdfs and cdfs are displayed in Figure 7. These plots show that the ASHN model fits these data more closely than the other models. Further, it captures the skewness and kurtosis of the data better than the other models.
Figure 8 shows plots of the profile log-likelihood (PLL) functions for the parameters $\mu$ and $\sigma$ based on the first dataset. We observe that the likelihood equations have unique solutions for the MLEs.

7.1.2. Data Analysis II

Here, the Better Life Index (BLI) dataset is used to illustrate the usefulness of the ASHN distribution. The dataset is available at https://stats.oecd.org/index.aspx?DataSetCode=BLI2015. The BLI dataset is used to classify the OECD (Organisation for Economic Co-operation and Development) countries, as well as non-OECD economies such as Brazil and Russia, with 11 indicators and 24 variables. Here, we use the indicator entitled Job security, which gives the probability of becoming unemployed. Recently, these data were analyzed in [14]. The summary statistics of the dataset are given in Table 2. The data are right-skewed and have high kurtosis.
Table 3 lists the MLEs, their SEs, $\hat{\ell}$ and the goodness-of-fit statistics of the fitted models for this dataset. Table 3 shows that the proposed model can be chosen as the best among the fitted models, since it has the lowest values of the AIC, BIC, $A^*$, $W^*$ and KS statistics and the largest value of $\hat{\ell}$. It also has the largest p-value of the KS test.
The plots of the fitted pdfs and cdfs are displayed in Figure 9. These plots show that the ASHN model provides a good fit to these data compared to the other models.
Figure 10 shows plots of the PLL functions for the parameters $\mu$ and $\sigma$ based on the second dataset. From this figure, we see that the likelihood equations have unique solutions for the MLEs.

7.2. The Quantile Modeling Application of the Reading Accuracy with the Dyslexia and Intelligence Quotient

Here, a real data application is given to show the applicability of the newly defined regression model. We compare its results with those of the unit Weibull quantile regression model [30]. The quantile parameter $u$ is set to 0.5, so that both regression models describe the median. The pdf of the unit Weibull quantile distribution is given by
$$f(y, \alpha, \eta) = \frac{\alpha\log 0.5}{y\log\eta}\left(\frac{\log y}{\log\eta}\right)^{\alpha-1} 0.5^{\left(\log y/\log\eta\right)^{\alpha}}, \quad y \in (0, 1),$$
where $\eta \in (0, 1)$ is the median and $\alpha > 0$ is the shape parameter. The dataset consists of the reading accuracy of nondyslexic and dyslexic Australian children and contains 44 observations on 3 variables. The variable of interest is accuracy, the score on a test of reading accuracy taken by the 44 children, which is predicted by two regressors: dyslexia and nonverbal intelligence quotient (IQ). The dataset was collected by [41] and analyzed by [42,43] via beta regression, which models the mean of the data. Note that the original reading accuracy score was transformed by [43] so that accuracy lies in the open unit interval. Further, this dataset is available through the betareg package [42] in the R software.
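The unit Weibull density above can be checked numerically with a small Python sketch. The cdf used here, $0.5^{(\log y/\log\eta)^{\alpha}}$, is not stated in the text; it is an assumption chosen because its derivative is the displayed pdf and because it equals 0.5 at $y = \eta$, as required for $\eta$ to be the median. The function names are hypothetical.

```python
import math

U = 0.5  # quantile level used in the application (median)


def unit_weibull_cdf(y, alpha, eta):
    """Assumed cdf whose derivative is the displayed pdf."""
    return U ** ((math.log(y) / math.log(eta)) ** alpha)


def unit_weibull_pdf(y, alpha, eta):
    """Unit Weibull pdf parameterized by its median eta and shape alpha."""
    ratio = math.log(y) / math.log(eta)
    return (alpha * math.log(U) / (y * math.log(eta))
            * ratio ** (alpha - 1) * U ** (ratio ** alpha))


if __name__ == "__main__":
    # The cdf evaluated at the median should be 0.5
    print(unit_weibull_cdf(0.8, 1.5, 0.8))
```

Comparing `unit_weibull_pdf` with a finite-difference derivative of `unit_weibull_cdf` is a simple way to confirm that the reconstructed formula is internally consistent.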
The aim is to relate the reading accuracy values ($y$) to the covariates. The response variable and covariates are:
  • $y$: reading score;
  • $x_1$: Is the child dyslexic? (0 for no, 1 for yes);
  • $x_2$: nonverbal intelligence quotient (IQ, converted to z scores).
The regression model based on $\eta_i$ is given by
$$\operatorname{logit}(\eta_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}, \quad i = 1, 2, \ldots, 44,$$
where $\eta_i$ denotes the median for the unit Weibull and EASHN models.
The results of the EASHN and unit Weibull regression models, with model selection criteria, are given in Table 4. The AIC and BIC values of the proposed regression model are lower than those of the unit Weibull regression model, so the EASHN regression model exhibits better modeling ability than the unit Weibull regression model. Additionally, according to the estimates of the EASHN regression model, the parameters $\beta_1$ and $\beta_2$ are statistically significant at any usual level. Hence, when IQ increases, the reading accuracy also increases. Moreover, as expected, the reading accuracy of the children with no dyslexia is higher than that of the children with dyslexia.
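To read the fitted coefficients on the original scale, the logit link can be inverted. The Python sketch below plugs the EASHN estimates of Table 4 into $\eta = 1/[1 + \exp(-\beta_0 - \beta_1 x_1 - \beta_2 x_2)]$; the function name `predicted_median` is hypothetical, and the covariate values used are illustrative rather than taken from the dataset.

```python
import math

# EASHN estimates from Table 4: beta_0, beta_1 (dyslexia), beta_2 (IQ z score)
BETA_EASHN = (2.2810, -1.0490, 0.5918)


def predicted_median(dyslexic, iq_z, beta=BETA_EASHN):
    """Fitted conditional median of reading accuracy via the inverse logit."""
    lin = beta[0] + beta[1] * dyslexic + beta[2] * iq_z
    return 1.0 / (1.0 + math.exp(-lin))


if __name__ == "__main__":
    # Illustrative covariate values: average IQ (z = 0), without and with dyslexia
    print(predicted_median(0, 0.0), predicted_median(1, 0.0))
```

At an average IQ (z score 0), the fitted median accuracy is roughly 0.91 for a nondyslexic child and roughly 0.77 for a dyslexic child, which matches the direction of the effects discussed above.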
Figure 11 and Figure 12 display the QQ plots of the randomized quantile residuals and the PP plots of the Cox-Snell residuals for both regression models, respectively. These figures indicate that the fit of the EASHN regression model is better than that of the unit Weibull model.
Since the randomized quantile residuals of an adequate model follow the standard normal distribution, we can test whether they fit this distribution. The KS, $A^*$ and $W^*$ results are given in Table 5. For all three tests, the randomized quantile residuals of the EASHN quantile regression model give smaller statistics and larger p-values than those of the unit Weibull regression model, so they are closer to the standard normal distribution.

8. Conclusions

We define a new unit model, called the “arcsech” normal distribution, in order to model percentage, proportion and rate measurements. The idea is to take advantage of the hyperbolic arcsecant function to transpose the modeling capacities of the normal distribution to data defined on the unit interval. We investigate general structural properties of the new distribution. The model parameters are estimated by six different methods, and simulation studies are performed to assess the performance of these estimates. The empirical findings indicate that the proposed model provides better fits than well-known unit distributions in the literature, for both univariate data modeling and regression modeling. It is hoped that the new distribution will attract attention in other disciplines.

Author Contributions

M.Ç.K., C.C. and Z.S.K. have contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

The authors are grateful to the two anonymous referees for their helpful comments, which improved the paper.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

The proofs of our main results are contained in this appendix.
Proof of Proposition 1.
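First, note that the hyperbolic secant function is symmetric on the interval $(-\infty, +\infty)$, increasing on the interval $(-\infty, 0)$ and decreasing on the interval $(0, +\infty)$. Based on the representation $X = \operatorname{sech} Y$, where $Y \sim N(\mu, \sigma^2)$, the cdf of $X$ can be determined as
$$\begin{aligned} F(x) = P(X \le x) &= P(Y \le -\operatorname{arcsech} x) + P(\operatorname{arcsech} x \le Y) \\ &= \Phi\left(\frac{-\operatorname{arcsech} x - \mu}{\sigma}\right) + 1 - \Phi\left(\frac{\operatorname{arcsech} x - \mu}{\sigma}\right) \\ &= 2 - \Phi\left(\frac{\operatorname{arcsech} x + \mu}{\sigma}\right) - \Phi\left(\frac{\operatorname{arcsech} x - \mu}{\sigma}\right). \end{aligned}$$
We get the stated definition of $F(x, \mu, \sigma)$. By differentiation of $F(x, \mu, \sigma)$ with respect to $x$, since $\partial(\operatorname{arcsech} x)/\partial x = -\left[x\sqrt{1-x^2}\right]^{-1}$, the pdf $f(x, \mu, \sigma)$ follows, ending the proof.  □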
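The representation $X = \operatorname{sech} Y$ used in this proof also gives a direct way to simulate from the distribution and to check the cdf empirically. The following Python sketch does so with a fixed seed; the function names, the seed and the parameter values are hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()


def ashn_cdf(x, mu, sigma):
    """cdf obtained in the proof: 2 - Phi((a + mu)/sigma) - Phi((a - mu)/sigma)."""
    a = math.acosh(1.0 / x)  # arcsech x
    return 2.0 - PHI.cdf((a + mu) / sigma) - PHI.cdf((a - mu) / sigma)


def simulate_ashn(n, mu, sigma, seed=0):
    """Draw n values of X = sech(Y) with Y ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    return [1.0 / math.cosh(rng.gauss(mu, sigma)) for _ in range(n)]


def empirical_cdf(sample, x):
    """Proportion of simulated values not exceeding x."""
    return sum(v <= x for v in sample) / len(sample)


if __name__ == "__main__":
    sample = simulate_ashn(20000, 1.0, 0.8)
    for x in (0.2, 0.5, 0.8):
        print(x, empirical_cdf(sample, x), ashn_cdf(x, 1.0, 0.8))
```

With a sample of this size, the empirical and theoretical values should agree to about two decimal places.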
Proof of Proposition 2.
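In the case $\mu = 0$, owing to Equation (4), some simplifications give
$$\frac{f(x, 0, \sigma_1)}{f(x, 0, \sigma_2)} = \frac{2\phi\left[(\operatorname{arcsech} x)/\sigma_1\right]/\left(\sigma_1 x\sqrt{1-x^2}\right)}{2\phi\left[(\operatorname{arcsech} x)/\sigma_2\right]/\left(\sigma_2 x\sqrt{1-x^2}\right)} = \frac{\sigma_2}{\sigma_1}\exp\left\{\frac{1}{2}\,\frac{(\sigma_1-\sigma_2)(\sigma_1+\sigma_2)}{\sigma_1^2\sigma_2^2}\,(\operatorname{arcsech} x)^2\right\}.$$
Since $\operatorname{arcsech} x$ is a positive decreasing function, $(\operatorname{arcsech} x)^2$ is also decreasing. Hence, if $\sigma_1 > \sigma_2$, the above ratio is decreasing with respect to $x$, as the composition of an increasing exponential function and a decreasing function. This proves the desired result.  □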
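The monotonicity of the density ratio can also be verified numerically. The Python sketch below compares the ratio of the two pdfs at $\mu = 0$ with the closed form obtained in the proof and checks that it decreases on a grid when $\sigma_1 > \sigma_2$. The function names and the values $\sigma_1 = 1.5$, $\sigma_2 = 0.7$ are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist()


def ashn_pdf_mu0(x, sigma):
    """ASHN pdf at mu = 0: 2 phi(arcsech(x)/sigma) / (sigma x sqrt(1 - x^2))."""
    a = math.acosh(1.0 / x)
    return 2.0 * PHI.pdf(a / sigma) / (sigma * x * math.sqrt(1.0 - x * x))


def ratio_closed_form(x, s1, s2):
    """Closed form of f(x, 0, s1) / f(x, 0, s2) from the proof."""
    a = math.acosh(1.0 / x)
    coef = 0.5 * (s1 - s2) * (s1 + s2) / (s1 ** 2 * s2 ** 2)
    return (s2 / s1) * math.exp(coef * a * a)


if __name__ == "__main__":
    grid = [0.05 * k for k in range(1, 20)]
    ratios = [ashn_pdf_mu0(x, 1.5) / ashn_pdf_mu0(x, 0.7) for x in grid]
    # True when the ratio decreases along the grid
    print(all(r1 > r2 for r1, r2 in zip(ratios, ratios[1:])))
```

A grid check is of course not a proof; it only illustrates the likelihood ratio ordering for one pair of parameter values.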
Proof of Proposition 3.
We exploit the following characterization of the ASHN distribution: $X$ can be expressed as $X = \operatorname{sech} Y$ with $Y \sim N(\mu, \sigma^2)$. Then, through the use of the generalized version of the binomial formula, we obtain
$$\begin{aligned} X^r = (\operatorname{sech} Y)^r &= 2^r e^{rY}\left(1+e^{2Y}\right)^{-r} I(Y<0) + I(Y=0) + 2^r e^{-rY}\left(1+e^{-2Y}\right)^{-r} I(Y>0) \\ &= 2^r\sum_{k=0}^{+\infty}\binom{-r}{k} e^{(2k+r)Y} I(Y<0) + I(Y=0) + 2^r\sum_{k=0}^{+\infty}\binom{-r}{k} e^{-(2k+r)Y} I(Y>0). \end{aligned}$$
Therefore, since $P(Y = 0) = 0$, we get
$$m_r = 2^r\left\{\sum_{k=0}^{+\infty}\binom{-r}{k} E\left[e^{(2k+r)Y} I(Y<0)\right] + \sum_{k=0}^{+\infty}\binom{-r}{k} E\left[e^{-(2k+r)Y} I(Y>0)\right]\right\}.$$
In the distribution sense, one can write $Y = \mu + \sigma U$ with $U \sim N(0, 1)$, implying that
$$E\left[e^{(2k+r)Y} I(Y<0)\right] = e^{(2k+r)\mu} E\left[e^{\sigma(2k+r)U} I\left(U < -\frac{\mu}{\sigma}\right)\right] = e^{(2k+r)\mu} M\left(\sigma(2k+r), -\frac{\mu}{\sigma}\right)$$
and
$$E\left[e^{-(2k+r)Y} I(Y>0)\right] = e^{-(2k+r)\mu} M\left(\sigma(2k+r), \frac{\mu}{\sigma}\right).$$
The stated result follows by combining the equations above.  □
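As a numerical cross-check of the moments, one can bypass the series and integrate $E[(\operatorname{sech} Y)^r]$ directly against the standard normal density, using $Y = \mu + \sigma U$. The Python sketch below does this with a plain midpoint rule; it is a swapped-in quadrature, not the series expansion of the proof, and the function name, grid size and truncation width are hypothetical choices.

```python
import math
from statistics import NormalDist


def ashn_moment(r, mu, sigma, n=4000, width=10.0):
    """Approximate m_r = E[sech(mu + sigma U)^r], U ~ N(0, 1), by the midpoint rule.

    The integral over U is truncated to [-width, width], outside of which
    the standard normal density is negligible.
    """
    phi = NormalDist().pdf
    h = 2.0 * width / n
    total = 0.0
    for i in range(n):
        u_val = -width + (i + 0.5) * h
        total += (1.0 / math.cosh(mu + sigma * u_val)) ** r * phi(u_val)
    return total * h


if __name__ == "__main__":
    # First two moments for illustrative parameter values
    print(ashn_moment(1, 1.0, 0.5), ashn_moment(2, 1.0, 0.5))
```

Because $0 < \operatorname{sech} y \le 1$, the moments computed this way must lie in $(0, 1)$ and decrease with $r$, which gives a simple sanity check on any implementation of the series.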

References

  1. Johnson, N.L. Systems of frequency curves generated by methods of translation. Biometrika 1949, 36, 149–176.
  2. Topp, C.W.; Leone, F.C. A family of J-shaped frequency functions. J. Am. Stat. Assoc. 1955, 50, 209–219.
  3. Kumaraswamy, P. A generalized probability density function for double-bounded random processes. J. Hydrol. 1980, 46, 79–88.
  4. Van Dorp, J.R.; Kotz, S. The standard two-sided power distribution and its properties: With applications in financial engineering. Am. Stat. 2002, 56, 90–99.
  5. Gómez-Déniz, E.; Sordo, M.A.; Calderín-Ojeda, E. The log–Lindley distribution as an alternative to the beta regression model with applications in insurance. Insur. Math. Econ. 2014, 54, 49–57.
  6. Altun, E.; Hamedani, G.G. The log-xgamma distribution with inference and application. J. Soc. Fr. Stat. 2018, 159, 40–55.
  7. Mazucheli, J.; Menezes, A.F.; Dey, S. The unit-Birnbaum-Saunders distribution with applications. Chil. J. Stat. 2018, 9, 47–57.
  8. Mazucheli, J.; Menezes, A.F.B.; Ghitany, M.E. The unit-Weibull distribution and associated inference. J. Appl. Probab. Stat. 2018, 13, 1–22.
  9. Mazucheli, J.; Menezes, A.F.B.; Chakraborty, S. On the one parameter unit-Lindley distribution and its associated regression model for proportion data. J. Appl. Stat. 2019, 46, 700–714.
  10. Ghitany, M.E.; Mazucheli, J.; Menezes, A.F.B.; Alqallaf, F. The unit-inverse Gaussian distribution: A new alternative to two-parameter distributions on the unit interval. Commun. Stat. Theory Methods 2019, 48, 3423–3438.
  11. Mazucheli, J.; Menezes, A.F.; Dey, S. Unit-Gompertz distribution with applications. Statistica 2019, 79, 25–43.
  12. Altun, E. The log-weighted exponential regression model: Alternative to the beta regression model. Commun. Stat. Theory Methods 2021.
  13. Altun, E.; Cordeiro, G.M. The unit-improved second-degree Lindley distribution: Inference and regression modeling. Comput. Stat. 2020, 35, 259–279.
  14. Korkmaz, M.Ç. A new heavy-tailed distribution defined on the bounded interval: The logit slash distribution and its application. J. Appl. Stat. 2020, 47, 2097–2119.
  15. Korkmaz, M.Ç. The unit generalized half normal distribution: A new bounded distribution with inference and application. Univ. Politeh. Buchar. Sci. Bull. Ser. Appl. Math. Phys. 2020, 82, 133–140.
  16. Gündüz, S.; Korkmaz, M.Ç. A New Unit Distribution Based On The Unbounded Johnson Distribution Rule: The Unit Johnson SU Distribution. Pak. J. Stat. Oper. Res. 2020, 16, 471–490.
  17. Figueroa-Zu, J.I.; Niklitschek-Soto, S.A.; Leiva, V.; Liu, S. Modeling heavy-tailed bounded data by the trapezoidal beta distribution with applications. Revstat 2021. Available online: https://www.ine.pt/revstat/pdf/ModelingBoundedDataWithHeavyTails.pdf (accessed on 10 January 2021).
  18. Bantan, R.A.R.; Chesneau, C.; Jamal, F.; Elgarhy, M.; Tahir, M.H.; Aqib, A.; Zubair, M.; Anam, S. Some new facts about the unit-Rayleigh distribution with applications. Mathematics 2020, 8, 1954.
  19. Koenker, R.; Bassett, G., Jr. Regression quantiles. Econom. J. Econom. Soc. 1978, 46, 33–50.
  20. Fischer, M.J. Generalized Hyperbolic Secant Distributions: With Applications to Finance; Springer-Verlag Berlin and Heidelberg GmbH & Co. KG: Berlin, Germany, 2013.
  21. Shaked, M.; Shanthikumar, J.G. Stochastic Orders; Wiley: New York, NY, USA, 2007.
  22. Cheng, R.C.H.; Amin, N.A.K. Maximum Product of Spacings Estimation with Application to the Lognormal Distribution; Math Report; University of Wales Institute of Science and Technology: Cardiff, Wales, 1979; p. 79-1.
  23. Ferrari, S.; Cribari-Neto, F. Beta regression for modelling rates and proportions. J. Appl. Stat. 2004, 31, 799–815.
  24. Bayes, C.L.; Bazán, J.L.; García, C. A new robust regression model for proportions. Bayesian Anal. 2012, 7, 841–866.
  25. Kieschnick, R.; McCullough, B.D. Regression analysis of variates observed on (0, 1): Percentages, proportions and fractions. Stat. Model. 2003, 3, 193–213.
  26. Migliorati, S.; Di Brisco, A.M.; Ongaro, A. A new regression model for bounded responses. Bayesian Anal. 2018, 13, 845–872.
  27. Galarza, C.E.; Zhang, P.; Lachos, V.H. Logistic quantile regression for bounded outcomes using a family of heavy-tailed distributions. Sankhya B 2020, 1–25.
  28. Bayes, C.L.; Bazán, J.L.; De Castro, M. A quantile parametric mixed regression model for bounded response variables. Stat. Its Interface 2017, 10, 483–493.
  29. Mitnik, P.A.; Baek, S. The Kumaraswamy distribution: Median-dispersion re-parameterizations for regression modeling and simulation-based estimation. Stat. Pap. 2013, 54, 177–192.
  30. Mazucheli, J.; Menezes, A.F.B.; Fernandes, L.B.; de Oliveira, R.P.; Ghitany, M.E. The unit-Weibull distribution as an alternative to the Kumaraswamy distribution for the modeling of quantiles conditional on covariates. J. Appl. Stat. 2020, 47, 954–974.
  31. Gallardo, D.I.; Gómez-Déniz, E.; Gómez, H.W. Discrete generalized half-normal distribution and its applications in quantile regression. Sort-Stat. Oper. Res. Trans. 2020, 265–284.
  32. Jodra, P.; Jiménez-Gamero, M.D. A quantile regression model for bounded responses based on the exponential-geometric distribution. Revstat-Stat. J. 2020, 18, 415–436.
  33. Korkmaz, M.Ç.; Chesneau, C.; Korkmaz, Z.S. Transmuted unit Rayleigh quantile regression model: Alternative to beta and Kumaraswamy quantile regression models. Univ. Politeh. Buchar. Sci. Bull. Ser. Appl. Math. Phys. 2021. to appear.
  34. Sánchez, L.; Leiva, V.; Galea, M.; Saulo, H. Birnbaum-Saunders quantile regression models with application to spatial data. Mathematics 2020, 8, 1000.
  35. Henningsen, A.; Toomet, O. maxLik: A package for maximum likelihood estimation in R. Comput. Stat. 2011, 26, 443–458.
  36. Dunn, P.K.; Smyth, G.K. Randomized quantile residuals. J. Comput. Graph. Stat. 1996, 5, 236–244.
  37. Cox, D.R.; Snell, E.J. A general definition of residuals. J. R. Stat. Soc. Ser. B (Methodol.) 1968, 30, 248–265.
  38. Murthy, D.P.; Xie, M.; Jiang, R. Weibull Models; John Wiley & Sons: Hoboken, NJ, USA, 2004; Volume 505.
  39. Silva, R.B.; Bourguignon, M.; Dias, C.R.; Cordeiro, G.M. The compound class of extended Weibull power series distributions. Comput. Stat. Data Anal. 2013, 58, 352–367.
  40. Genç, A.A.; Korkmaz, M.Ç.; Kus, C. The Beta Moyal-Slash Distribution. J. Selçuk Univ. Nat. Appl. Sci. 2014, 3, 88–104.
  41. Pammer, K.; Kevan, A. The Contribution of Visual Sensitivity, Phonological Processing and Non-Verbal IQ to Children’s Reading; The Australian National University: Canberra, Australia, 2004; Unpublished manuscript.
  42. Cribari-Neto, F.; Zeileis, A. Beta regression in R. J. Stat. Softw. 2010, 34, 1–24.
  43. Smithson, M.; Verkuilen, J. A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables. Psychol. Methods 2006, 11, 54.
Figure 1. The possible pdf and hrf shapes of the ASHN distribution.
Figure 2. The skewness and kurtosis plots of the ASHN distribution.
Figure 3. The results related to μ (top) and σ (bottom) for the first simulation study.
Figure 4. The results related to μ (top) and σ (bottom) for the second simulation study.
Figure 5. Estimated CPs for the first (left) and second (right) simulation studies.
Figure 6. The pdf shapes of the re-parameterized EASHN distribution.
Figure 7. The fitted plots for the first dataset.
Figure 8. The plots of the PLL functions for the first dataset.
Figure 9. The fitted plots for the second dataset.
Figure 10. The plots of the PLL functions for the second dataset.
Figure 11. The QQ plot of the randomized quantile residuals.
Figure 12. The PP plots of the Cox-Snell residuals based on the regression application.
Table 1. MLEs, SEs of the estimates (in parentheses), $\hat{\ell}$ and goodness-of-fit statistics for the first dataset (the p-value of the KS test is given in [·]).

| Model | $\hat{\mu}$ | $\hat{\sigma}$ | $\hat{\ell}$ | AIC | BIC | $A^*$ | $W^*$ | KS |
|---|---|---|---|---|---|---|---|---|
| ASHN | 2.9179 (0.0966) | 0.4322 (0.0684) | 33.2443 | −62.4885 | −60.4970 | 1.1850 | 0.1664 | 0.1746 [0.5754] |
| Beta | 3.1126 (1.0287) | 21.8245 (7.7997) | 27.8813 | −51.7626 | −49.7711 | 2.2611 | 0.3726 | 0.2537 [0.1521] |
| Kw | 1.5877 (0.3966) | 21.8673 (17.9755) | 25.6484 | −47.2968 | −45.3054 | 2.6889 | 0.4681 | 0.2626 [0.1265] |
| Johnson $S_B$ | 3.8952 (0.6554) | 1.8605 (0.2942) | 31.3599 | −58.7198 | −56.7283 | 1.5531 | 0.2307 | 0.2039 [0.3765] |

Table 2. Some summary statistics of the second dataset.

| Minimum | Mean | Median | Maximum | Variance | Skewness | Kurtosis | n |
|---|---|---|---|---|---|---|---|
| 0.0240 | 0.0567 | 0.0515 | 0.1780 | 0.0007 | 2.7117 | 12.0173 | 36 |

Table 3. MLEs, SEs of the estimates (in parentheses), $\hat{\ell}$ and goodness-of-fit statistics for the second dataset (the p-value of the KS test is given in [·]).

| Model | $\hat{\mu}$ | $\hat{\sigma}$ | $\hat{\ell}$ | AIC | BIC | $A^*$ | $W^*$ | KS |
|---|---|---|---|---|---|---|---|---|
| ASHN | 3.6422 (0.0632) | 0.3791 (0.0447) | 90.1076 | −176.2152 | −173.0481 | 0.5963 | 0.0895 | 0.1261 [0.6162] |
| Beta | 5.8569 (0.5166) | 97.1458 (6.2564) | 86.9760 | −169.9519 | −166.7848 | 1.1152 | 0.1768 | 0.1636 [0.2903] |
| Kw | 2.1577 (0.0648) | 373.3878 (8.4525) | 82.0487 | −160.0975 | −156.9305 | 2.2041 | 0.3651 | 0.1916 [0.1422] |
| Johnson $S_B$ | 7.1149 (0.8440) | 2.4608 (0.2864) | 89.6573 | −175.3146 | −172.1476 | 0.6666 | 0.1008 | 0.1322 [0.5554] |

Table 4. The results of the EASHN and unit Weibull regression models with model selection criteria.

| Parameters | EASHN Estimate | EASHN SE | EASHN p-Value | Unit-Weibull Estimate | Unit-Weibull SE | Unit-Weibull p-Value |
|---|---|---|---|---|---|---|
| $\beta_0$ | 2.2810 | 0.0025 | <0.001 | 2.4045 | 0.2589 | <0.001 |
| $\beta_1$ | −1.0490 | 0.0028 | <0.001 | −1.3362 | 0.3751 | 0.0003 |
| $\beta_2$ | 0.5918 | 0.00001 | <0.001 | 0.4837 | 0.2453 | 0.0486 |
| $\alpha$ | 0.1260 | 0.00001 | <0.001 | 0.9795 | 0.1193 | <0.001 |
| $\hat{\ell}$ | 37.9466 | | | 37.3185 | | |
| AIC | −67.8934 | | | −66.6369 | | |
| BIC | −60.7566 | | | −59.5001 | | |

Table 5. The goodness-of-fit results of the randomized quantile residuals for the regression models.

| Models | KS | p-Value | $A^*$ | p-Value | $W^*$ | p-Value |
|---|---|---|---|---|---|---|
| EASHN | 0.0849 | 0.9093 | 0.4211 | 0.8267 | 0.0502 | 0.8775 |
| Unit-Weibull | 0.1159 | 0.5955 | 0.4989 | 0.7470 | 0.0720 | 0.7419 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
