
Journal of Statistical Theory and Applications, Vol. 14, No. 1 (March 2015), 28-44

A generalization of Tukey's g − h family of distributions

J.A. Jiménez (a), V. Arunachalam (b) and G.M. Serna (c)

(a) Department of Mathematics, Universidad Nacional de Colombia, Bogotá, Colombia.
(b) Department of Statistics, Universidad Nacional de Colombia, Bogotá, Colombia.
(c) Department of Business Studies, University of Alcalá de Henares, Spain.

Emails: (a) [email protected], (b) [email protected], (c) [email protected]

Received 30 May 2013 Accepted 12 December 2014

A new class of distribution functions based on symmetric densities is introduced. The underlying transformations produce non-normal distributions whose pdf and cdf can be expressed in parametric form. This class of distributions depends on two parameters, g and h, which control the skewness and the elongation of the tails, respectively, and it generalizes Tukey's g − h family of distributions. In this paper, we derive closed-form expressions for the density and distribution functions of the generalized Tukey g − h family, which allow probabilities, moments and related measures to be computed easily.

Keywords: Tukey's g − h family of distributions, generalized error distribution, Lambert's function, Fourier transform.

MSC 2010: 60E05, 62E15

Published by Atlantis Press. Copyright: the authors.

1. Introduction

Statistical data frequently show asymmetry, indicating some kind of skewness. This is the case for actuarial and financial data, which have characteristically asymmetric structures with extreme values that yield heavier tails. For example, the probability distributions of financial asset returns are not normal, but usually exhibit asymmetry and leptokurtosis. The most important and useful characteristic of Tukey's g − h family of distributions is that it covers most of the Pearsonian family of distributions and can also generate several known distributions, for example the lognormal, Cauchy, exponential and chi-squared (see Martínez & Iglewicz (1984)). Tukey's g − h family of distributions has been used in statistical and simulation studies on topics such as financial markets: Badrinath & Chatterjee (1988), Mills (1995) and Badrinath & Chatterjee (1991) used g and h to model the return on a stock index and the return on shares in several markets. Dutta & Babbel (2004) showed that the skewed and leptokurtic behavior of LIBOR was modeled effectively using the g − h distribution. Dutta & Babbel (2005) used g and h to model interest rates and options on interest rates, while Dutta & Perry (2007) used the g − h distribution to estimate operational risk; Tang & Wu (2006) studied portfolio management. Jiménez & Arunachalam (2011) provided explicit expressions of skewness and kurtosis for VaR and CVaR calculations. They proposed the use of Tukey's classical g and h transformations applied to the normal distribution to capture these distributional features.

In this paper, we propose a generalization of Tukey's g − h family of distributions in which the standard normal variate is replaced by a continuous random variable U with mean 0 and variance 1. The attraction of this family is that, starting from a symmetric variate with a given probability density function (pdf), a large class of distributions can be generated, with the parameters g and h controlling the skewness and the elongation of the tails. This new class of distributions allows us to build models with large kurtosis measures and will be useful in financial and other applications involving asymmetric distributions.

The paper is organized as follows. Section 2 presents Tukey's g − h family of generalized distributions. Section 3 presents its statistical properties: the pdf, the cumulative distribution function (cdf), expressions for the n-th moment, and quantile-based measures of skewness and kurtosis are derived. Section 4 introduces very briefly the g generalized distribution and its moments. Section 5 explains the fitting methodology based on real data, i.e., we demonstrate how the g − h family can be used to simulate or model combined data sets when only the mean, variance, skewness and kurtosis associated with the underlying individual data sets are available. Finally, conclusions are presented.

2. Tukey's g − h family of generalized distributions

Tukey (1977) introduced a family of distributions by two nonlinear transformations called the g − h distributions, which is defined by

$$Y = T_{g,h}(Z) = \frac{1}{g}\left(\exp\{gZ\} - 1\right)\exp\{hZ^2/2\}, \qquad g \neq 0,\ h \in \mathbb{R}, \tag{2.1}$$

where the distribution of Z is standard normal. When these transformations are applied to a normalized continuous random variable U, i.e., one with mean 0 and variance 1, whose pdf $f_U(\cdot)$ is symmetric about the origin and whose cdf is $F_U(\cdot)$, the transformation $T_{g,h}(U)$ is obtained, which henceforth will be termed Tukey's g − h generalized distribution:

$$Y = T_{g,h}(U) = \frac{1}{g}\left(\exp\{gU\} - 1\right)\exp\{hU^2/2\}, \qquad g \neq 0,\ h \in \mathbb{R}. \tag{2.2}$$
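As a rough illustration of (2.2), the transformation can be coded directly. The sketch below is not taken from the paper: the function name `tukey_gh` is hypothetical, and the branch for g = 0 assumes the limiting form $U\exp\{hU^2/2\}$ that the transformation takes as g goes to 0.

```python
import math


def tukey_gh(u: float, g: float, h: float) -> float:
    """Tukey g-h transform T_{g,h}(u) of equation (2.2)."""
    tail = math.exp(h * u * u / 2.0)
    if g == 0.0:
        # Limiting case g -> 0, where (exp(g u) - 1) / g tends to u.
        return u * tail
    # expm1 keeps precision when g * u is close to zero.
    return math.expm1(g * u) / g * tail


# Transformed values for a few arguments of U.
values = [tukey_gh(u, g=0.5, h=0.1) for u in (-2.0, -1.0, 0.0, 1.0, 2.0)]
```

With g > 0 the positive arguments are stretched more than the negative ones, which is how the transformation introduces skewness; h > 0 stretches both tails.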

The parameters g and h represent the skewness and the elongation of the tails of Tukey's g − h generalized distribution, respectively. In this paper, for h ≠ 0, we assume that the random variable U has a Generalized Error Distribution with parameter α, denoted U ∼ GED(α), with pdf given by

$$f_U(u, \alpha) = \frac{1}{2\lambda\,\Gamma(\alpha + 1)}\exp\left\{-\left|\frac{u}{\lambda}\right|^{1/\alpha}\right\}, \qquad u \in \mathbb{R},\ 0 < \alpha \le 1, \tag{2.3}$$

where $\lambda = \sqrt{\Gamma(\alpha)/\Gamma(3\alpha)}$, Γ(·) is the gamma function and α is a tail-thickness parameter. When α = 1/2, U ∼ N(0, 1), and when α = 1, U ∼ Laplace(0, √2/2); both are symmetric, with standardized skewness of zero and standardized kurtosis of 3 and 6, respectively. For h = 0 we also present five special cases of Tukey's g − h distributions: U ∼ GED(1/2), U ∼ GED(1), U ∼ Logistic(0, √3/π), the hyperbolic secant (HyperSec) and the hyperbolic cosecant (HyperCsc).
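The normalization in (2.3) can be checked numerically. The following sketch is illustrative only: the helper names (`ged_lambda`, `ged_pdf`, `simpson`, `ged_variance`) are hypothetical, the integration range [0, 40] is an assumption that is adequate for α = 1/2 and α = 1, and Simpson's rule is used simply because it needs nothing beyond the standard library.

```python
import math


def ged_lambda(alpha: float) -> float:
    """Scale constant lambda = sqrt(Gamma(alpha) / Gamma(3 alpha)) from (2.3)."""
    return math.sqrt(math.gamma(alpha) / math.gamma(3.0 * alpha))


def ged_pdf(u: float, alpha: float) -> float:
    """Density (2.3) of the unit-variance GED(alpha), 0 < alpha <= 1."""
    lam = ged_lambda(alpha)
    return math.exp(-abs(u / lam) ** (1.0 / alpha)) / (2.0 * lam * math.gamma(alpha + 1.0))


def simpson(f, a: float, b: float, n: int = 20000) -> float:
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * step)
    return total * step / 3.0


def ged_variance(alpha: float, upper: float = 40.0) -> float:
    """Numerical variance, using the symmetry of the density about zero."""
    return 2.0 * simpson(lambda u: u * u * ged_pdf(u, alpha), 0.0, upper)
```

For α = 1/2 the density at zero equals $1/\sqrt{2\pi}$ and for α = 1 it equals $1/\sqrt{2}$, which matches the standard normal and the unit-variance Laplace densities.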


When we assume h = 0 in (2.2), Tukey's g − h generalized distribution reduces to

$$T_{g,0}(U) = \frac{1}{g}\left(\exp(gU) - 1\right), \tag{2.4}$$

which is said to be Tukey's g generalized distribution. When U ∼ GED(1/2), this distribution is also known as the family of lognormal distributions; these distributions are skewed and have longer tails than the standard normal distribution. Similarly, when g goes to 0, Tukey's g − h generalized distribution is given by

$$T_{0,h}(U) = U\exp\{hU^2/2\}, \tag{2.5}$$

known as Tukey's h generalized distribution. This distribution is symmetric, with tails that become heavier than those of the random variable U as the value of the parameter h increases. If we wish to model an arbitrary random variable X using the transformation given in (2.2), we introduce two new parameters, A (location) and B (scale), and propose the following model:

$$X = A + BY \qquad \text{with} \qquad Y = T_{g,h}(U). \tag{2.6}$$

We must estimate four parameters that satisfy either of the following relationships:

$$x_p = A + B y_p \qquad \text{and} \qquad x_{1-p} = A - B\exp\{-g u_p\}\, y_p, \tag{2.7}$$

where p > 0.5 and $x_p$ is the p-th quantile of the random variable X, such that

$$x_p = \inf\{x \mid P[X \le x] > p\} = \sup\{x \mid P[X < x] \le p\}.$$

Particular p-th quantiles are the median, the quartiles, the eighths, and so on. Hoaglin et al. (1985) refer to them as the letter values: M (median), F (fourths), E (eighths), etc. The parameters of Tukey's g − h family of generalized distributions can be estimated using the method of moments (Majumder & Ali (2008)) or the method of quantiles proposed by Hoaglin (1985).

3. Statistical properties of the Tukey's g − h family

In this section we discuss the statistical properties of Tukey's g − h family of generalized distributions.

3.1. Density function

Using the inverse function theorem, Jiménez (2004) provides the following relation:

$$\frac{d}{dp}F_U^{-1}\left(F_U(u_p)\right) = u_p' = \frac{1}{F_U'(u_p)} = \frac{1}{f_U(u_p)}, \tag{3.1}$$

where p is the only number that satisfies $F_U(u_p) = p$ and $f_U(\cdot)$ is the pdf of the continuous random variable U. The pdf of Tukey's g − h generalized distribution is obtained by using the following result:

$$t_{g,h}(y_p) = \frac{f_U(u_p)}{T'_{g,h}(u_p)} \qquad \text{whenever} \qquad |h|\,u_p\,\frac{e^{-g u_p} - 1}{g} < 1, \tag{3.2}$$

where $y_p$ and $u_p$ denote the p-th quantiles of the transformation $Y = T_{g,h}(U)$ and of the continuous random variable U, respectively. Using equation (2.7) and expression (3.1), Jiménez & Martínez (2006) obtained the pdf of the random variable X as follows:

$$f_X(x_p) = f_X(A + B y_p) = \frac{1}{|B|}\,t_{g,h}(y_p). \tag{3.3}$$
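Expressions (3.2) and (3.3) describe the density parametrically: each value u of U yields one point (x, f_X(x)). A minimal sketch of that idea follows, assuming a standard normal U and g ≠ 0; the function names are hypothetical, and the derivative of the transformation is written out by hand from (2.2).

```python
import math


def normal_pdf(u: float) -> float:
    """Standard normal density, used here as an example of f_U."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def gh_transform(u: float, g: float, h: float) -> float:
    """T_{g,h}(u) from (2.2), for g != 0."""
    return math.expm1(g * u) / g * math.exp(0.5 * h * u * u)


def gh_transform_derivative(u: float, g: float, h: float) -> float:
    """Derivative of T_{g,h} with respect to u, for g != 0."""
    return math.exp(0.5 * h * u * u) * (math.exp(g * u) + h * u * math.expm1(g * u) / g)


def density_point(u, g, h, a=0.0, b=1.0, f_u=normal_pdf):
    """Return (x_p, f_X(x_p)) for the quantile level p at which u_p = u."""
    y = gh_transform(u, g, h)
    t = f_u(u) / gh_transform_derivative(u, g, h)  # relation (3.2)
    return a + b * y, t / abs(b)                   # relation (3.3)
```

Evaluating `density_point` over a grid of u values traces the density curve of X without ever inverting the transformation.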

The parameter g controls the skewness: positive values of g generate positive skewness, negative values generate negative skewness, and g = 0 corresponds to symmetry.

3.2. Cumulative distribution function

We now proceed to find the cdf of Tukey's g − h family of generalized distributions, denoted by $F_{g,h}(y)$. The following equality can be easily verified:

$$\int_a^b t_{g,h}(u)\,du = \int_{T_{g,h}^{-1}(a)}^{T_{g,h}^{-1}(b)} f_U(v)\,dv = F_U\left(T_{g,h}^{-1}(b)\right) - F_U\left(T_{g,h}^{-1}(a)\right), \tag{3.4}$$

where $T_{g,h}^{-1}(\cdot)$ is the inverse of the transformation given in (2.2) and $F_U(\cdot)$ is the cdf of the continuous random variable U. There is no explicit form for the inverse of the transformation $T_{g,h}(U)$. However, we get the inverse transformation when h = 0 or g = 0, as given below.

• If h = 0 then $T_{g,0}(U)$ is given by (2.4) and

$$T_{g,0}^{-1}(y) = \frac{1}{g}\ln(1 + gy), \qquad gy > -1. \tag{3.5}$$

• If g = 0 then $T_{0,h}(U)$ is given by (2.5), and it must hold that

$$hY^2 = h\left[T_{0,h}(U)\right]^2 = hU^2\exp\{hU^2\}. \tag{3.6}$$

Expression (3.6) is of the form $z = w\exp\{w\}$, where $w = W(z)$ is Lambert's function. The solution of (3.6) is then given by

$$hU^2 = W\left(hy^2\right), \qquad T_{0,h}^{-1}(y) = \sqrt{\frac{1}{h}W\left(hy^2\right)}. \tag{3.7}$$
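The inversion in (3.7) can be sketched in code as follows. This is only an illustration under stated assumptions: h > 0, a hand-written Newton iteration for the principal branch of Lambert's function (the name `lambert_w0` is hypothetical), and a sign correction via `copysign`, which is an addition justified by the fact that $T_{0,h}$ is an odd function, so negative y must map back to negative u.

```python
import math


def lambert_w0(z: float, tol: float = 1e-14, max_iter: int = 100) -> float:
    """Principal branch W(z) for z >= 0, solving w * exp(w) = z by Newton's method."""
    if z < 0.0:
        raise ValueError("this sketch only handles z >= 0")
    w = math.log1p(z)  # starts at or above the root, so the iterates decrease to it
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


def h_transform(u: float, h: float) -> float:
    """T_{0,h}(u) = u * exp(h u^2 / 2) from (2.5)."""
    return u * math.exp(0.5 * h * u * u)


def h_transform_inverse(y: float, h: float) -> float:
    """Invert (2.5) through (3.7) for h > 0, restoring the sign of y."""
    if h <= 0.0:
        raise ValueError("this sketch assumes h > 0")
    return math.copysign(math.sqrt(lambert_w0(h * y * y) / h), y)
```

A round trip such as `h_transform_inverse(h_transform(0.8, 0.2), 0.2)` should return a value close to 0.8.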

The basic properties of the function W(z) are given in Olver et al. (2010). Although the inverse of the transformation $T_{g,h}(U)$ cannot be obtained analytically in general, it can be evaluated numerically.
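One possible way to carry out this numerical evaluation is bisection, sketched below. The sketch assumes h ≥ 0, so that the transformation is strictly increasing on the search bracket, and a standard normal U for the cdf; the names `gh_inverse` and `gh_cdf` and the bracket [−10, 10] are illustrative choices, not part of the paper.

```python
import math
from statistics import NormalDist


def gh_transform(u: float, g: float, h: float) -> float:
    """T_{g,h}(u) from (2.2), with the g = 0 limit (2.5)."""
    tail = math.exp(0.5 * h * u * u)
    return u * tail if g == 0.0 else math.expm1(g * u) / g * tail


def gh_inverse(y, g, h, lo=-10.0, hi=10.0, tol=1e-12):
    """Solve T_{g,h}(u) = y by bisection; assumes h >= 0 (T increasing on [lo, hi])."""
    if not gh_transform(lo, g, h) <= y <= gh_transform(hi, g, h):
        raise ValueError("y is outside the range covered by the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gh_transform(mid, g, h) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gh_cdf(y, g, h, cdf_u=NormalDist().cdf):
    """F_{g,h}(y) = F_U(T^{-1}_{g,h}(y)), which follows from (3.4)."""
    return cdf_u(gh_inverse(y, g, h))
```

For h = 0 the bisection result can be compared with the closed form (3.5), which gives a simple consistency check.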

3.3. Measures of skewness and kurtosis

Since the transformation given in (2.2) is simply a quantile-based distribution, we use quantile-based measures of skewness (SK) and kurtosis (KR). For 0.5 < p < 1 the measure proposed by Hinkley (1975) is given by (see footnote a)

$$SK_2(p) = \frac{UHS_p/LHS_p - 1}{UHS_p/LHS_p + 1} = \frac{\exp\{g u_p\} - 1}{\exp\{g u_p\} + 1} = \tanh\left\{\frac{g}{2}u_p\right\}, \tag{3.8}$$

where $UHS_p = x_p - x_{0.5}$ and $LHS_p = x_{0.5} - x_{1-p}$ denote the p-th upper half-spread and lower half-spread, respectively (Hoaglin et al. (1985)). Note that this expression depends only on the parameter g. For fixed p, the values of $SK_2(p)$ vary with g as illustrated in Figure 1.

[Figure 1: coefficient of skewness for p = 0.975; values of SK_2(p) plotted against values of g, with curves for the Normal and Laplace cases.]

Fig. 1. Measure of skewness $SK_2(p)$
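Because (3.8) depends on g only through $\exp\{g u_p\} = UHS_p/LHS_p$, the ratio of half-spreads can be inverted for g. The sketch below illustrates this with synthetic quantiles; it assumes a standard normal U for the quantile $u_p$, and all function names are hypothetical.

```python
import math
from statistics import NormalDist


def sk2_from_quantiles(x_lower: float, x_median: float, x_upper: float) -> float:
    """Hinkley's measure (3.8) computed from x_{1-p}, x_{0.5} and x_p."""
    ratio = (x_upper - x_median) / (x_median - x_lower)
    return (ratio - 1.0) / (ratio + 1.0)


def g_from_half_spreads(x_lower, x_median, x_upper, p,
                        u_quantile=NormalDist().inv_cdf):
    """Solve UHS_p / LHS_p = exp(g * u_p) for g, with u_p the p-th quantile of U."""
    u_p = u_quantile(p)
    return math.log((x_upper - x_median) / (x_median - x_lower)) / u_p


def gh_quantile(p, a, b, g, h, u_quantile=NormalDist().inv_cdf):
    """x_p = A + B * T_{g,h}(u_p), used here only to build synthetic quantiles."""
    u_p = u_quantile(p)
    return a + b * math.expm1(g * u_p) / g * math.exp(0.5 * h * u_p * u_p)


# Synthetic example: quantiles generated with g = 0.3 are mapped back to g.
x_lo = gh_quantile(0.1, 10.0, 2.0, 0.3, 0.05)
x_md = gh_quantile(0.5, 10.0, 2.0, 0.3, 0.05)
x_hi = gh_quantile(0.9, 10.0, 2.0, 0.3, 0.05)
g_hat = g_from_half_spreads(x_lo, x_md, x_hi, 0.9)
```

The factor involving h cancels in the ratio of half-spreads, which is why the recovered value does not depend on h in this synthetic setting.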

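When U ∼ GED(1/2), we use the measure of skewness given in Groeneveld & Meeden (1984) to obtain

$$SK_3 = \frac{1 - \exp\left\{-\frac{1}{2}\frac{g^2}{1-h}\right\}}{2\Phi\left(\frac{g}{\sqrt{1-h}}\right) - 1} \approx \frac{1 - \exp\left\{-\frac{1}{2}\frac{g^2}{1-h}\right\}}{\tanh\left\{\sqrt{\frac{2}{\pi}}\,\frac{g}{\sqrt{1-h}}\right\}}.$$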

Here the second expression uses the approximation to Φ given in Tocher (1964). Note that this last expression depends on the two parameters g and h and is zero when g = 0. Groeneveld & Meeden (1984) also present four properties that any reasonable coefficient of skewness must satisfy. Furthermore, assuming that U ∼ GED(1/2), the measure of kurtosis presented in Hogg (1974) reads

$$KR_2(p; q) = \frac{U_p - L_p}{U_q - L_q},$$

where

$$\begin{aligned}
U_s - L_s &= \frac{1}{s}\left[\mu_{g,h}\Phi(\delta_{2s}) + \frac{\Phi(\delta_{2s}) - \Phi(\delta_{1s})}{(1-h)(\delta_{2s} - \delta_{1s})} - \mu_{g,h}\Phi(\delta^*_{2s}) + \frac{\Phi(\delta_{1s}) - \Phi(\delta^*_{2s})}{(1-h)(\delta_{1s} - \delta^*_{2s})}\right] \\
&= \frac{\mu_{g,h}}{s}\left(\Phi(\delta_{2s}) - \Phi(\delta^*_{2s})\right) + \frac{1/s}{1-h}\left[\frac{\Phi(\delta_{2s}) - \Phi(\delta_{1s})}{\delta_{2s} - \delta_{1s}} + \frac{\Phi(\delta_{1s}) - \Phi(\delta^*_{2s})}{\delta_{1s} - \delta^*_{2s}}\right],
\end{aligned}$$

Footnote a: $SK_1$ and $KR_1$ are the standardized values for skewness and kurtosis, respectively.

where

$$\delta_{1s} = \sqrt{1-h}\,z_s, \qquad \delta_{2s} = \delta_{1s} + \frac{g}{\sqrt{1-h}}, \qquad \delta^*_{2s} = \delta_{1s} - \frac{g}{\sqrt{1-h}}.$$

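Making use of the measure of kurtosis in Crow & Siddiqui (1967), for p > q > 0.5 we have

$$KR_3(p; q) = \begin{cases} \dfrac{\sinh(g u_p)}{\sinh(g u_q)}\exp\left\{\dfrac{h}{2}\left(u_p^2 - u_q^2\right)\right\} & \text{if } g \neq 0, \\[2ex] \dfrac{u_p}{u_q}\exp\left\{\dfrac{h}{2}\left(u_p^2 - u_q^2\right)\right\} & \text{if } g = 0. \end{cases}$$

3.4. Moments of the Tukey's g − h family of generalized distribution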

The next two propositions spell out the moments of Tukey's g − h family of generalized distributions. The corresponding proofs are given in Appendix A.

Proposition 3.1. The m-th power of Tukey's g − h family of generalized distributions is given by

$$Y^m = T^m_{g,h}(U) = \frac{m}{g^{m-1}}\sum_{k=0}^{m-1}(-1)^k\binom{m-1}{k}T_{\tilde g,\tilde h}(U), \qquad m \ge 1, \tag{3.9}$$

where $\tilde g = (m-k)g$ and $\tilde h = mh$.

Proposition 3.2. Let U be a continuous random variable with pdf $f_U(u)$ and cdf $F_U(u)$. If $F_U'(u)$ is never zero, then $F_U^{-1}(u)$ is differentiable and satisfies

$$\mu_n' = E(U^n) = \int_{-\infty}^{\infty} w^n f_U(w)\,dw = \int_0^1 \left[F_U^{-1}(q)\right]^n dq, \tag{3.10}$$

where q is the unique value that satisfies $F_U(u_q) = q$.

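Proposition 3.3. Let $Y = T_{g,h}(U)$ be the transformation given in (2.2). Then the n-th moments of the random variable Y are given by

$$\mu_n' = \begin{cases} \dfrac{2}{g^n}\displaystyle\sum_{k=0}^{n}(-1)^k\binom{n}{k}\int_0^\infty \cosh(\tilde g u)\exp\left\{\tfrac{1}{2}\tilde h u^2\right\} f_U(u)\,du & \text{if } g \neq 0, \\[2ex] \left[1 + (-1)^n\right]\displaystyle\int_0^\infty u^n\exp\left\{\tfrac{1}{2}\tilde h u^2\right\} f_U(u)\,du & \text{if } g = 0, \end{cases} \tag{3.11}$$

where $\tilde g = (n-k)g$ and $\tilde h = nh$.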
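Proposition 3.3 lends itself to a numerical sanity check. The sketch below assumes a standard normal U, g ≠ 0 and nh < 1; it compares a direct quadrature of $E(Y^n)$ with the cosh-based sum in (3.11), and with the closed-form mean (3.18) for n = 1. The helper names, the Simpson rule and the truncation of the integrals at |u| = 12 are illustrative assumptions.

```python
import math


def phi(u: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def simpson(f, a: float, b: float, n: int = 8000) -> float:
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * step)
    return total * step / 3.0


def moment_direct(n: int, g: float, h: float, upper: float = 12.0) -> float:
    """E(Y^n) by integrating T_{g,h}(u)^n against the normal density."""
    def integrand(u):
        t = math.expm1(g * u) / g * math.exp(0.5 * h * u * u)
        return t ** n * phi(u)
    return simpson(integrand, -upper, upper)


def moment_prop33(n: int, g: float, h: float, upper: float = 12.0) -> float:
    """E(Y^n) from the g != 0 branch of (3.11)."""
    total = 0.0
    for k in range(n + 1):
        gt, ht = (n - k) * g, n * h
        integral = simpson(
            lambda u: math.cosh(gt * u) * math.exp(0.5 * ht * u * u) * phi(u),
            0.0, upper)
        total += (-1) ** k * math.comb(n, k) * integral
    return 2.0 * total / g ** n


def mean_closed_form(g: float, h: float) -> float:
    """Closed-form mean (3.18) for normal U and h < 1."""
    return (math.exp(0.5 * g * g / (1.0 - h)) - 1.0) / (g * math.sqrt(1.0 - h))
```

Agreement between the two routes for small n gives some confidence in the signs and binomial weights of (3.11).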

Proof. Using expression (3.10) when g ≠ 0 we obtain

$$E(Y^n) = \int_0^1 Y_q^n\,dq = \int_{-\infty}^{\infty} y^n t_{g,h}(y)\,dy = \int_{-\infty}^{\infty} y^n\,\frac{f_U\left(T^{-1}_{g,h}(y)\right)}{T'_{g,h}\left(T^{-1}_{g,h}(y)\right)}\,dy.$$

Making the following change of variable

$$u = T^{-1}_{g,h}(y), \qquad du = \frac{dy}{T'_{g,h}\left(T^{-1}_{g,h}(y)\right)}, \tag{3.12}$$

and using expression (3.9) we have

$$\begin{aligned}
E(Y^n) &= \frac{n}{g^{n-1}}\sum_{k=0}^{n-1}(-1)^k\binom{n-1}{k}\int_{-\infty}^{\infty} T_{\tilde g,\tilde h}(u)\,f_U(u)\,du \\
&= \frac{2}{g^n}\sum_{k=0}^{n}(-1)^k\binom{n}{k}\int_0^\infty \cosh(\tilde g u)\exp\left\{\tfrac{1}{2}\tilde h u^2\right\} f_U(u)\,du,
\end{aligned}$$

where $\tilde g = (n-k)g$ and $\tilde h = nh$. In the last step we used the fact that $f_U(u)$ is symmetric about the origin.

3.4.1. Special cases of moments

In general, when the continuous random variable U is symmetrically distributed about the origin, the moment generating function (mgf) can be written as follows:

$$M_U(t) = E\left(e^{tU}\right) = 2\int_0^\infty \cosh(tu)\,f_U(u)\,du, \tag{3.13}$$

and the characteristic function of the random variable U is given by

$$\Psi_U(t) = E\left(e^{itU}\right) = 2\int_0^\infty \cos(tu)\,f_U(u)\,du, \tag{3.14}$$

where i is the imaginary unit, $i = \sqrt{-1}$. Since $f_U(u)$ is an even function, the Fourier integral representation of $f_U(u)$ may be written as

$$f_U(u) = \int_0^\infty A(t)\cos(ut)\,dt, \qquad \text{with} \qquad A(t) = \frac{1}{\pi}\Psi_U(t).$$

Using the Fourier frequency convolution theorem we can write

$$2\int_0^\infty \cos(gt)\,f_U(t)\,e^{-\frac{|h|}{2}t^2}\,dt = \mathcal{F}\left[f_U(t)\exp\left\{-\frac{|h|}{2}t^2\right\}\right] = \frac{1}{\sqrt{2|h|\pi}}\exp\left\{-\frac{g^2}{2|h|}\right\} * \mathcal{F}\left[f_U(t)\right],$$

where ∗ denotes convolution. Expression (3.13) allows us to obtain the moments of Tukey's g − h distribution. However, moments of some orders do not exist for a certain range of values of the parameter h, and we have the following cases:

(1) Supposing that U ∼ GED(1/2) and h < 1/n, we have

$$E(Y^n) = \begin{cases} \dfrac{1}{g^n\sqrt{1-nh}}\displaystyle\sum_{k=0}^{n}(-1)^k\binom{n}{k}M_U\left(\dfrac{(n-k)\,g}{\sqrt{1-nh}}\right), & g \neq 0, \\[2ex] \dfrac{\left[1 + (-1)^n\right]\Gamma(n)}{2^{n/2}\,\Gamma(n/2)\left[1 - \tilde h\right]^{\frac{n+1}{2}}}, & g = 0, \end{cases} \tag{3.15}$$

where $M_U(t)$ is the mgf of a standard normal random variable and Γ(·) is the gamma function. This expression is consistent with those obtained by Martínez & Iglewicz (1984).

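(2) When U ∼ GED(1) and h < 0, we have (see footnote b)

$$\mu_n' = \begin{cases} \dfrac{1}{g^n}\sqrt{\dfrac{\pi}{n|h|}}\left[\displaystyle\sum_{k=0}^{n-1}(-1)^k\binom{n}{k}\left(\exp\left\{\dfrac{\alpha_{n,k}^2}{2}\right\}\Phi(-\alpha_{n,k}) + \exp\left\{\dfrac{\beta_{n,k}^2}{2}\right\}\Phi(\beta_{n,k})\right) + 2(-1)^n e^{\frac{1}{n|h|}}\Phi\left(-\sqrt{\dfrac{2}{n|h|}}\right)\right], & g \neq 0, \\[3ex] \dfrac{1 + (-1)^n}{2\sqrt{n|h|}}\left(\dfrac{\sqrt{2}}{n|h|}\right)^{n} e^{\frac{1}{n|h|}}\displaystyle\sum_{k=0}^{n}\binom{n}{k}\left(-\sqrt{n|h|}\right)^{k}\left[\Gamma\left(\dfrac{k+1}{2}\right) - \int_0^{\frac{1}{n|h|}} u^{\frac{1}{2}(k-1)}e^{-u}\,du\right], & g = 0, \end{cases} \tag{3.16}$$

where $\alpha_{n,k}$ and $\beta_{n,k}$ are the larger and smaller roots, respectively, of the quadratic equation

$$n|h|\,r^2 - 2(n-k)\sqrt{n|h|}\;g\,r + (n-k)^2 g^2 - 2 = 0. \tag{3.17}$$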

Expression (3.16) was calculated incorrectly in Klein & Fischer (2002). From the preceding equations we obtain the expected value μ for g ≠ 0:

(1) Assume that U ∼ GED(1/2). Using expression (3.15) with n = 1 to calculate E[Y], we obtain

$$E[Y] = \frac{1}{g\sqrt{1-h}}\left(e^{\frac{1}{2}\frac{g^2}{1-h}} - 1\right). \tag{3.18}$$

(2) Assuming in expression (2.6) that U ∼ GED(1) and h < 0, and using expression (3.16) with n = 1, we obtain

$$\mu^L_{g,h} = \frac{1}{g}\sqrt{\frac{\pi}{|h|}}\left[e^{\frac{1}{2}\alpha_{1,0}^2}\,\Phi(-\alpha_{1,0}) + e^{\frac{1}{2}\beta_{1,0}^2}\,\Phi(\beta_{1,0}) - 2e^{\frac{1}{|h|}}\,\Phi\left(-\sqrt{\frac{2}{|h|}}\right)\right], \tag{3.19}$$

where $\alpha_{1,0}$ and $\beta_{1,0}$ are the larger and smaller roots, respectively, of the quadratic equation given in (3.17).

4. The g generalized distribution

The g generalized distribution given by equation (2.4) is a nonlinear transform of a continuous random variable U and is parameterized by g. This subfamily contains distributions whose skewness increases as the value of the parameter g increases, which makes it an important and suitable tool for the statistical analysis of skewed distributions. Its distributional form includes only the parameter g, which fixes the amount and direction of skewness.

Footnote b: Appendix B contains the respective proof of this expression.

Now, we give an empirical rule for a random variable X which can be expressed as (2.6) with $Y = T_{g,0}(U)$:

$$\frac{x_p - \theta}{x_{0.5} - \theta} = \frac{\theta - x_{0.5}}{\theta - x_{1-p}} \qquad \text{for all } p > 0.5. \tag{4.1}$$

In particular, expression (4.1) is satisfied if

$$\theta = A - \operatorname{sgn}(g)\,\frac{B}{|g|}, \tag{4.2}$$

where sgn(·) denotes the signum function. The constant θ, which relates the location and scale parameters, is known as the "threshold parameter" and was given by Hoaglin et al. (1985). Taking h = 0 in expression (3.2) and substituting expression (3.5), we get

$$t_{g,0}(y) = \frac{1}{1 + gy}\,f_U\left(\frac{\ln(1 + gy)}{g}\right), \qquad gy > -1. \tag{4.3}$$

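Moreover, if we solve for the variable y in equation (2.7) and substitute into the expression given in (4.3), we obtain

$$t_{g,0}\left(\frac{x - A}{B}\right) = f_U\left(\frac{1}{g}\ln\left(1 + \frac{x - A}{B/g}\right)\right)\left(1 + \frac{x - A}{B/g}\right)^{-1}.$$

Since g ∈ ℝ, the condition $\frac{x - A}{B/g} > -1$ gives

$$t_{g,0}\left(\frac{x - A}{B}\right) = \begin{cases} f_U\left(\dfrac{1}{g}\left[\ln(x - \theta) - \ln\left(\dfrac{B}{g}\right)\right]\right)\dfrac{B/g}{x - \theta} & \text{if } g > 0, \\[2ex] f_U\left(\dfrac{1}{|g|}\left[\ln\left(\dfrac{B}{|g|}\right) - \ln(\theta - x)\right]\right)\dfrac{B/|g|}{\theta - x} & \text{if } g < 0, \end{cases} \tag{4.4}$$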

where $\frac{B}{|g|} > 0$. For simplicity and without loss of generality we assume g > 0. Substituting expression (4.2) and using the result given in (3.3), which relates the pdfs of X and $Y = T_{g,h}(U)$ at the quantiles, we can rewrite (4.4) as follows:

$$f_X(x) = \frac{1}{g\,(x - \theta)}\,f_U\left(\frac{1}{g}\left(\ln(x - \theta) - \mu^*\right)\right), \qquad x > \theta, \tag{4.5}$$

where $\mu^* = \ln\left(\frac{B}{g}\right)$. We say that the random variable X has a log-symmetric distribution with threshold parameter θ, scale parameter μ* and shape parameter g, denoted by X ∼ LS(μ*, g, θ). If θ = 0 we write X ∼ LS(μ*, g). The cdf of the random variable X is given by

$$F_X(x) = F_U\left(\frac{1}{g}\left(\ln(x - \theta) - \mu^*\right)\right), \qquad x > \theta. \tag{4.6}$$

Expression (4.5) allows us to obtain the following pdfs associated with Tukey's g function.
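Before turning to specific choices of U, (4.5) and (4.6) can be evaluated generically by passing the pdf or cdf of U as an argument. The sketch below assumes g > 0, as in the text; the function names and the example Laplace density (unit variance, i.e. GED(1)) are illustrative, and the normal case relies on `statistics.NormalDist` from the Python standard library.

```python
import math
from statistics import NormalDist


def log_symmetric_pdf(x, mu_star, g, theta=0.0, pdf_u=NormalDist().pdf):
    """Density (4.5) of X for g > 0; zero at or below the threshold theta."""
    if x <= theta:
        return 0.0
    u = (math.log(x - theta) - mu_star) / g
    return pdf_u(u) / (g * (x - theta))


def log_symmetric_cdf(x, mu_star, g, theta=0.0, cdf_u=NormalDist().cdf):
    """Distribution function (4.6) of X for g > 0."""
    if x <= theta:
        return 0.0
    return cdf_u((math.log(x - theta) - mu_star) / g)


def laplace_pdf(u: float) -> float:
    """Unit-variance Laplace density, usable as pdf_u for the GED(1) case."""
    return math.exp(-math.sqrt(2.0) * abs(u)) / math.sqrt(2.0)


# Example: density of the Laplace-based member at the point theta + exp(mu_star).
example = log_symmetric_pdf(2.0 + math.exp(1.0), 1.0, 0.4, theta=2.0, pdf_u=laplace_pdf)
```

Swapping `pdf_u` and `cdf_u` is all that is needed to move between members of the log-symmetric family, which mirrors how the special cases are derived from a single expression.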

4.1. Special cases

(1) If U ∼ GED(1/2) and g ≠ 0, we have that

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,g\,(x - \theta)}\exp\left\{-\frac{1}{2}\left(\frac{\ln(x - \theta) - \mu^*}{g}\right)^2\right\}, \tag{4.7}$$

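where $\mu^* = \ln(\mu_X - \theta) - \frac{1}{2}g^2$ and x > θ. Note that when θ = 0 the last expression coincides with the pdf of the classic lognormal random variable. In this case, we say that X is lognormally distributed with three parameters $\mu_X$, g and θ. Many practical applications of this distribution are discussed in the literature, for example in Aitchison & Brown (1963) and Crow & Shimizu (1988).

(2) When U ∼ GED(1) and $0 < g < \frac{\sqrt{2}}{n}$, the resulting distribution is given by the pdf

$$f_X(x) = \begin{cases} \dfrac{\beta}{2(\varepsilon - \theta)}\left(\dfrac{x - \theta}{\varepsilon - \theta}\right)^{\beta - 1}, & \theta < x < \varepsilon, \\[2ex] \dfrac{\beta}{2(\varepsilon - \theta)}\left(\dfrac{x - \theta}{\varepsilon - \theta}\right)^{-\beta - 1}, & x \ge \varepsilon, \end{cases} \tag{4.8}$$

where $\beta = \frac{\sqrt{2}}{g}$ and $(\varepsilon - \theta) = (\mu_X - \theta)\left(1 - \frac{1}{\beta^2}\right)$. Note again that this expression coincides with the pdf of the log-Laplace distribution with three parameters $\mu_X$, g and θ.

(3) If U ∼ Logistic(0, $\lambda^{-1}$), $0 < g < \frac{\lambda}{n}$ and $\lambda = \frac{\pi}{\sqrt{3}}$, then the pdf of X can be expressed as

$$f_X(x) = \frac{\pi}{\varepsilon - \theta}\left(\frac{\pi}{\alpha}\,\frac{x - \theta}{\varepsilon - \theta}\right)^{\alpha - 1}\left[1 + \left(\frac{\pi}{\alpha}\,\frac{x - \theta}{\varepsilon - \theta}\right)^{\alpha}\right]^{-2}, \tag{4.9}$$

where $\alpha = \frac{\lambda}{g}$ and $(\varepsilon - \theta) = (\mu_X - \theta)\sin\left(\sqrt{3}\,g\right)$. Note that this expression coincides with the pdf of the three-parameter log-logistic distribution ($\mu_X$, g and θ).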

Taking the expectation of the linear transformation given in equation (2.6) we obtain

$$E(X - \theta) = \frac{B}{g}E\left(e^{gU}\right) = \frac{B}{g}M_U(g), \qquad \text{so that} \qquad B = g\,\frac{E(X) - \theta}{M_U(g)},$$

where $M_U(g)$ is the mgf of the random variable U. The n-th central moment of the random variable X can be obtained using the formula

$$E\left[(X - E[X])^n\right] = \mu_n(X) = \exp\{n\mu^*\}\sum_{k=0}^{n}(-1)^k\binom{n}{k}M_U(\tilde g)\,M_U^k(g),$$

with $\tilde g = (n-k)g$ as before; note that these expressions do not depend on the parameter θ. Thus, the standardized values for skewness and kurtosis corresponding to the linear transformation given by equation (2.6) with $Y = T_{g,0}(U)$ can be expressed as

$$SK_1(X) = \frac{M_U(3g) - 3M_U(2g)M_U(g) + 2M_U^3(g)}{\left[M_U(2g) - M_U^2(g)\right]^{3/2}}, \tag{4.10}$$

$$KR_1(X) = \frac{M_U(4g) - 4M_U(3g)M_U(g) + 6M_U(2g)M_U^2(g) - 3M_U^4(g)}{\left[M_U(2g) - M_U^2(g)\right]^{2}}. \tag{4.11}$$

Note that the above expressions depend only on the parameter g.
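Since (4.10) and (4.11) only require the mgf of U, they are straightforward to evaluate. The sketch below is illustrative: the function names are hypothetical, the normal mgf $e^{t^2/2}$ and the unit-variance Laplace mgf $1/(1 - t^2/2)$ (valid for $|t| < \sqrt{2}$) are standard results rather than formulas quoted from the paper.

```python
import math


def normal_mgf(t: float) -> float:
    """mgf of the standard normal distribution."""
    return math.exp(0.5 * t * t)


def laplace_mgf(t: float) -> float:
    """mgf of the unit-variance Laplace distribution; requires |t| < sqrt(2)."""
    if abs(t) >= math.sqrt(2.0):
        raise ValueError("the Laplace mgf does not exist for |t| >= sqrt(2)")
    return 1.0 / (1.0 - 0.5 * t * t)


def sk1(g: float, mgf) -> float:
    """Standardized skewness (4.10) of X when Y = T_{g,0}(U)."""
    m1, m2, m3 = mgf(g), mgf(2.0 * g), mgf(3.0 * g)
    return (m3 - 3.0 * m2 * m1 + 2.0 * m1 ** 3) / (m2 - m1 ** 2) ** 1.5


def kr1(g: float, mgf) -> float:
    """Standardized kurtosis (4.11) of X when Y = T_{g,0}(U)."""
    m1, m2, m3, m4 = mgf(g), mgf(2.0 * g), mgf(3.0 * g), mgf(4.0 * g)
    numerator = m4 - 4.0 * m3 * m1 + 6.0 * m2 * m1 ** 2 - 3.0 * m1 ** 4
    return numerator / (m2 - m1 ** 2) ** 2
```

For the normal case the results can be compared with the well-known lognormal skewness $(w + 2)\sqrt{w - 1}$ and kurtosis $w^4 + 2w^3 + 3w^2 - 3$, where $w = e^{g^2}$.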


The n-th moment of the random variable X − θ is given by

$$E\left[(X - \theta)^n\right] = \left(\frac{B}{g}\right)^n M_U(ng). \tag{4.12}$$

When we rewrite expression (4.12) and use properties of the mgf, we obtain

$$E\left(e^{n\ln(X - \theta)}\right) = M_V(n) = e^{n\ln\left(\frac{B}{g}\right)}M_U(ng) = M_{\ln\left(\frac{B}{g}\right) + gU}(n), \tag{4.13}$$

where V = ln(X − θ), so that $E(V) = \mu_V = \ln\left(\frac{B}{g}\right)$ and $Var(V) = \sigma_V^2 = g^2$. When relation (4.1) is satisfied, h = 0, and if we assume that $\theta < x_{\min}$, we can conclude that the value of g is estimated by $g = \operatorname{sgn}(SK_1(X))\,\sigma_V$. Here $SK_1(X)$ denotes the coefficient of skewness of the variable we want to approximate. The scale parameter is estimated by $B = g\exp\{E(V)\}$.

4.2. Approximations

We first assume the value of θ to be negligibly small in (4.12) to obtain

$$E(X^n) = \left(\frac{B}{g}\right)^n M_U(ng). \tag{4.14}$$

The above expression allows us to obtain the moments about the origin of the random variable X when the distribution of U is normal, hyperbolic secant, hyperbolic cosecant, logistic or Laplace, which are all symmetric with standardized skewness of zero. If in (4.14) we let U ∼ GED(1/2) and g > 0, we obtain

$$E(X^n) = \left(\frac{B}{g}\right)^n\exp\left\{\frac{1}{2}n^2g^2\right\} = \exp\left\{n\ln\left(\frac{B}{g}\right) + \frac{1}{2}n^2g^2\right\}. \tag{4.15}$$

This expression coincides with the mgf of a normal random variable with parameters $\mu = \ln\left(\frac{B}{g}\right)$ and σ = g. By the uniqueness of the mgf, we conclude that $V = \ln(X) \sim N\left(\ln\left(\frac{B}{g}\right), g\right)$, i.e., X is a lognormal random variable with parameters $\mu = \ln\left(\frac{B}{g}\right)$ and σ = g. Similarly, the relation between the random variables X and U is presented in Table 1 for a selected set of well-known symmetric distributions.

| Distribution of the r.v. U | μ, a | σ, b | Range of g (g ≠ 0) | Distribution of the r.v. X | μ, a of V | σ, b of V |
|---|---|---|---|---|---|---|
| Laplace | 0 | $\frac{\sqrt{2}}{2}$ | $0 < g < \frac{\sqrt{2}}{n}$ | Log-Laplace | $\ln\left(\frac{B}{g}\right)$ | $\frac{\sqrt{2}}{2}\lvert g\rvert$ |
| Logistic | 0 | $\frac{\sqrt{3}}{\pi}$ | $0 < g < \frac{\pi}{\sqrt{3}\,n}$ | Loglogistic | $\ln\left(\frac{B}{g}\right)$ | $\frac{\sqrt{3}}{\pi}\lvert g\rvert$ |
| Normal | 0 | 1 | $g > 0$ | Lognormal | $\ln\left(\frac{B}{g}\right)$ | $g$ |
| HyperSec | 0 | $\frac{2}{\pi}$ | $0 < g < \frac{\pi}{2n}$ | LoghyperSec | $\ln\left(\frac{B}{g}\right)$ | $\frac{2}{\pi}\lvert g\rvert$ |
| HyperCsc | 0 | $\frac{\sqrt{2}}{\pi}$ | $0 < g < \frac{\pi}{\sqrt{2}\,n}$ | LoghyperCsc | $\ln\left(\frac{B}{g}\right)$ | $\frac{\sqrt{2}}{\pi}\lvert g\rvert$ |

Table 1. Parameters of the pdf of the random variable V = ln(X)

5. An Illustration

We now consider data on circumference measurements (in centimeters) taken from the ankle, chest, hip and neck of 252 adult men. The data have been previously analyzed in Headrick (2010) and are available for download at http://lib.stat.cmu.edu/datasets/bodyfat. The following table presents the statistics for these data.

| Variable | Mean | St. Dev. | SK1 | KR1 | JB test |
|---|---|---|---|---|---|
| Ankle | 23.1024 | 1.6949 | 2.2417 | 14.6858 | 1631.8565 |
| Chest | 100.8242 | 8.4305 | 0.6775 | 3.9441 | 28.4092 |
| Hip | 99.9048 | 7.1641 | 1.4882 | 10.3002 | 647.4181 |
| Neck | 37.9921 | 2.4309 | 0.5493 | 5.6422 | 85.2964 |

Table 2. Summary Descriptive Statistics

Using the test proposed by Jarque & Bera (1987), the statistics in Table 2 clearly indicate that none of the variables can be normally distributed. When U ∼ GED(1/2), the g and h parameter estimates result in a fitted distribution matching the sample moments:

| Variable | A | B | g | h | Mean | St. Dev. | SK1 | KR1 |
|---|---|---|---|---|---|---|---|---|
| Ankle | 22.7282 | 1.2843 | 0.5125 | 0.0376 | 23.1016 | 1.6915 | 2.2417 | 14.6858 |
| Chest | 99.9523 | 8.0301 | 0.2117 | 0.0082 | 100.8225 | 8.4138 | 0.6775 | 3.9441 |
| Hip | 98.9181 | 5.7427 | 0.2933 | 0.0846 | 99.9028 | 7.1498 | 1.4882 | 10.3003 |
| Neck | 37.8553 | 2.0760 | 0.1143 | 0.0871 | 37.9918 | 2.4261 | 0.5493 | 5.6422 |

Table 3. Estimation results

When U ∼ GED(1), the g and h parameter estimates result in a fitted distribution matching the sample moments:

| Variable | A | B | g | h | Mean | St. Dev. | SK1 | KR1 |
|---|---|---|---|---|---|---|---|---|
| Ankle | 22.8330 | 1.5613 | 0.3349 | -0.0273 | 23.0878 | 1.6915 | 2.2417 | 14.6858 |
| Chest | 100.0895 | 9.6635 | 0.1771 | -0.0721 | 100.8069 | 8.4137 | 0.6775 | 3.9441 |
| Hip | 99.1886 | 6.9025 | 0.2040 | -0.0098 | 99.8867 | 7.1498 | 1.4882 | 10.3001 |
| Neck | 37.8884 | 2.4850 | 0.0856 | -0.0122 | 37.9914 | 2.4261 | 0.5493 | 5.6422 |

Table 4. Estimation results

Inspection of these tables indicates that both the Normal g − h and Laplace g − h pdfs provide good approximations to the empirical data. Figure 2 shows, for Hip and Neck respectively, a histogram together with the fitted pdfs, and indicates that the two transformations produce similar approximations for this particular set of sample statistics. Since the value of h for the variable Chest when U ∼ GED(1/2) is very small, we set this parameter equal to zero in order to illustrate the fitting process using Tukey's g generalized family of distributions.

[Figure 2: histograms of (a) Hip (centimeters) and (b) Neck (centimeters) of 252 men, with frequency on the vertical axis, overlaid with the Normal, Normal g − h, Laplace g − h and Kernel curves.]

Fig. 2. (a) Hip vs. Normal Distribution and estimated pdf's Tukey's g − h. (b) Neck vs. Normal Distribution and estimated pdf's Tukey's g − h

To pursue elongation in these data, we first verify whether they satisfy the condition given in (4.1). The value of θ turns out to be −66.5955. Setting the parameter h equal to zero, the mean and standard deviation of the variable V = ln(X − θ) are 5.1193 and 0.04969, respectively. Expression (2.6) reduces to

$$X = \frac{B}{g}\exp\{gU\} + \theta, \tag{5.1}$$

where g = 0.04969 and B = 8.3088.
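The estimates quoted above follow from the rules $g = \operatorname{sgn}(SK_1(X))\,\sigma_V$ and $B = g\exp\{E(V)\}$ of Section 4. A minimal sketch of that computation, and of evaluating quantiles through (5.1), is given below; the function names are hypothetical, a standard normal U is assumed for the quantile function, and small differences from the published figures can arise because the inputs are rounded.

```python
import math
from statistics import NormalDist


def fit_tukey_g(mean_v: float, sd_v: float, skew_sign: float = 1.0):
    """Return (g, B) with g = sgn(SK1) * sigma_V and B = g * exp(E[V]), V = ln(X - theta)."""
    g = math.copysign(sd_v, skew_sign)
    b = g * math.exp(mean_v)
    return g, b


def quantile_tukey_g(p, g, b, theta, u_quantile=NormalDist().inv_cdf):
    """p-th quantile of X from (5.1): x_p = (B / g) * exp(g * u_p) + theta."""
    return b / g * math.exp(g * u_quantile(p)) + theta


# Inputs reported in the text for the Chest variable.
g_hat, b_hat = fit_tukey_g(mean_v=5.1193, sd_v=0.04969)
median_hat = quantile_tukey_g(0.5, g_hat, b_hat, theta=-66.5955)
```

Passing a different `u_quantile` (for example the quantile function of a unit-variance Laplace or logistic variable) would give the corresponding fitted quantiles for other choices of U.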

Figure 3 shows the corresponding histogram, and it is evident that the data have a slight degree of skewness to the left, are leptokurtic and do not follow the normal distribution. As shown in Figure 3, there is a marked difference between the empirical distribution of the data (represented by the histogram) and the normal distribution. Tukey's g − h family of generalized distributions approximates the empirical distribution better.

[Figure 3: density functions Tukey(g, h) with g = 0.04969, h = 0; histogram of Chest (centimeters), with frequency on the vertical axis, overlaid with the Normal, Normal g − h, Laplace g − h, Logistic g − h, Hyperbolic Secant g − h, Hyperbolic Cosecant g − h and Kernel curves.]

Fig. 3. Chest vs. Normal Distribution and estimated pdf's Tukey's g − h

In order to determine how well the fitted distribution agrees with the data, we use the methodology described by Hoaglin et al. (1985) to determine the sample quantiles of the form $p = 2^{-k}$, k = 1, 2, . . . , 8. In Table 5 we present these quantiles along with their estimates, calculated using (5.1) by varying the distribution of the variable U.

| p | X(1) | X(2) | X(3) | X(4) | X(5) |
|---|---|---|---|---|---|
| 1/256 | 83.4 | 81.0038 | 76.7796 | 78.7795 | 77.6844 |
| 1/128 | 85.1 | 82.3667 | 79.1386 | 80.8365 | 80.0255 |
| 1/64 | 86.7 | 84.0673 | 82.1685 | 83.2546 | 82.7684 |
| 1/32 | 88.2 | 85.6874 | 84.8543 | 85.4107 | 85.1965 |
| 1/16 | 89.2 | 88.2914 | 88.7569 | 88.5973 | 88.7254 |
| 1/8 | 92.1 | 91.3309 | 92.6691 | 91.9394 | 92.2918 |
| 1/4 | 94.2 | 95.0540 | 96.5452 | 95.6102 | 95.9715 |
| 1/2 | 99.6 | 100.5735 | 100.5883 | 100.5753 | 100.5787 |
| 3/4 | 105.3 | 106.2579 | 104.6864 | 105.6713 | 105.2917 |
| 7/8 | 110.1 | 110.2843 | 108.7677 | 109.5978 | 109.1998 |
| 15/16 | 115.3 | 113.6521 | 112.9997 | 113.2513 | 113.0733 |
| 31/32 | 118.5 | 116.5300 | 117.2418 | 116.7229 | 116.9047 |
| 63/64 | 119.8 | 119.0230 | 121.4055 | 120.0352 | 120.6461 |
| 127/128 | 121.6 | 121.1539 | 125.3393 | 123.1144 | 124.1705 |
| 255/256 | 128.3 | 122.8980 | 128.8264 | 125.8167 | 127.2879 |

Table 5. Observed and estimated values by the expression (5.1) for the chest circumference (centimeters) of 252 men

The columns of Table 5 provide the following information:

X(1): Sample quantiles.
X(2): Values obtained using equation (5.1) with U ∼ GED(1/2).
X(3): Values obtained using equation (5.1) with U ∼ GED(1).
X(4): Values obtained using equation (5.1) with U ∼ Logistic(0, √3/π).
X(5): Values obtained using equation (5.1) with U ∼ sech(0, 2/π).

Note that these fits are satisfactory for the four distributions used in expression (5.1). Table 6 summarizes the statistical results for the pdf of each estimated g − h.

| Fitted distribution | Mean | Stan. Dev. | SK1 | KR1 |
|---|---|---|---|---|
| Normal g − h | 100.8222 | 8.3189 | 0.1426 | 2.9492 |
| Laplace g − h | 100.8194 | 8.2724 | 0.3109 | 5.3597 |
| Logistic g − h | 100.8207 | 8.2891 | 0.2087 | 3.8893 |
| HyperSec g − h | 100.8199 | 8.2811 | 0.2534 | 4.5306 |

Table 6. Results for the estimation of Chest taken from 252 men.

These results indicate the importance of selecting a distribution for the g − h transformation: when U ∼ Logistic(0, √3/π), the sample moments are closer to the theoretical moments.

6. Conclusion This paper presents a generalization of the well-known Tukey’s g − h family of distributions for fitting skewed data. We calculate explictly the cdf and pdf, and also the set of regularity properties obtained with respect to the expected values and variances. We also present a simulation procedure to estimate the value of the paramater g, that is, the standard deviation of the random variable ln (X − θ ), when the parameter h goes to zero. The proposed generalization is also used to generated a large class distributions from a symmetric density of the parameters g and h which controls the skewness and the elongation of the tails, respectively. References Aitchison, J. & Brown, J. (1963), The Lognormal Distribution, Cambridge University Press, United Kingdom. Badrinath, S. & Chatterjee, S. (1988), ‘On measuring skewness and elongation in common stock return distributions: The case of the market index’, The Journal of Business 61(4), 451–472. Badrinath, S. & Chatterjee, S. (1991), ‘A data analytic look at skeness and elongation in common stock return distributions’, Journal of Business & Economic Statistics 9(2), 223–233. Crow, E. L. & Shimizu, K. (1988), Lognormal distributions: Theory and applications, Statistics, textbooks and monographs, CRC Press, New York. Crow, E. L. & Siddiqui, M. M. (1967), ‘Robust estimation of location’, Journal of the American Statistical Association 62(318), 353–389. Dutta, K. K. & Babbel, D. F. (2004), ‘On measuring skewness and kurtosis in short rate distributions: The case of the us dollar london inter bank offer rates’, Working paper 02 − 25, Wharton School Financial Institutions Center . Dutta, K. K. & Babbel, D. F. (2005), ‘Extracting probabilistic information from the prices of interest rate options: Tests of distributional assumptions’, Journal of Business 78(3), 841–870. Dutta, K. K. & Perry, J. 
(2007), ‘A tale of tails: An empirical analysis of loss distribution models for estimating operational risk capital’, Federal Reserve Bank of Boston, Working Paper No. 06-13 . Groeneveld, R. A. & Meeden, G. (1984), ‘Measuring skewness and kurtosis’, Journal of the Royal Statistical Society. Series D (The Statistician) 33(4), 391–399. Headrick, T. C. (2010), Statistical Simulation: Power Method Polynomials and Other Transformations, Taylor & Francis Group, LLC, Chapman & Hall/CRC Press, Boca Raton, FL, USA. Hinkley, D. V. (1975), ‘On power transformations to symmetry’, Biometrika 62(1), 101–111. Hoaglin, D. C. (1985), ‘Summarizing shape numerically: the g−and−h distributions’, In: Hoaglin, D. C., Mosteller, F., Tukey, J. W. (Eds.), Exploring Data Tables, Trends, and Shapes. pp. 461–513. John Wiley & Sons. Hoaglin, D. C., Mosteller, F. & Tukey, J. W. (1985), Exploring Data Tables, Trends, and Shapes, John Wiley & Sons, New York. Hogg, R. V. (1974), ‘Adaptive robust procedures: A partial review and some suggestions for future applications and theory’, Journal of the American Statistical Association 69(348), 909–923. Jarque, C. M. & Bera, A. K. (1987), ‘A test for normality of observations and regression residuals’, International Statistical Review 55(2), 163–172. Jim´enez, J. A. (2004), Aproximaciones de las funciones de riesgo del tiempo de sobrevivencia mediante la distribuci´on g − h de tukey, Especialista en actuar´ıa, Facultad de Ciencias. Departamento de Matem´aticas. Universidad Nacional de Colombia. Sede Bogot´a. Published by Atlantis Press Copyright: the authors 42


Jiménez, J. A. & Arunachalam, V. (2011), ‘Using Tukey’s g and h family of distributions to calculate value at risk and conditional value at risk’, Journal of Risk 13(4), 95–116.
Jiménez, J. A. & Martínez, J. (2006), ‘An estimation of the parameter Tukey’s g distribution’, Colombian Journal of Statistics 29(1), 1–16.
Klein, I. & Fischer, M. (2002), ‘gh-transformation of symmetrical distributions’, In: Mittnik, Stefan, and Klein, Ingo (Eds.), Contribution to Modern Econometrics, pp. 119–134. Kluwer Academic Publishers.
Majumder, M. M. A. & Ali, M. M. (2008), ‘A comparison of methods of estimation of parameters of Tukey’s gh family of distributions’, Pakistan Journal of Statistics 24(2), 135–144.
Martínez, J. & Iglewicz, B. (1984), ‘Some properties of the Tukey g and h family of distributions’, Communications in Statistics - Theory and Methods 13(3), 353–369.
Mills, T. C. (1995), ‘Modelling skewness and kurtosis in the London Stock Exchange FT-SE index return distributions’, The Statistician 44(3), 323–332.
Oberhettinger, F. (1973), Fourier Transforms of Distributions and Their Inverses: A Collection of Tables, Academic Press, New York.
Olver, F. W. J., Lozier, D. W., Boisvert, R. F. & Clark, C. W. (2010), Handbook of Mathematical Functions, Cambridge University Press, New York.
Tang, X. & Wu, X. (2006), ‘A new method for the decomposition of portfolio VaR’, Journal of Systems Science and Information 4(4), 721–727.
Tocher, K. D. (1964), The Art of Simulation, Electrical engineering series, D. Van Nostrand Company, Inc, Princeton, NJ.
Tukey, J. W. (1977), Modern techniques in data analysis, NSF-sponsored regional research conference at Southeastern Massachusetts University, North Dartmouth, Massachusetts.

Appendix A: Proof of propositions 3.1 and 3.2

Proof.
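(Proposition 3.1) We consider the m-th power of the expression (2.2),

\[
Y^m = \frac{1}{g^m}\sum_{k=0}^{m}\binom{m}{k}(-1)^k \exp\left(\tilde{g}U + \tfrac{1}{2}\tilde{h}U^2\right)
    = \frac{m}{g^{m-1}}\left[\sum_{k=0}^{m-1}\binom{m-1}{k}\frac{(-1)^k}{\tilde{g}} \exp\left(\tilde{g}U + \tfrac{1}{2}\tilde{h}U^2\right) + \frac{(-1)^m}{mg}\, e^{\frac{1}{2}\tilde{h}U^2}\right],
\]

where $\tilde{g} = (m-k)g$ and $\tilde{h} = mh$. Since

\[
(-1)^m = -\sum_{k=0}^{m-1}\binom{m}{k}(-1)^k,
\]

then

\[
Y^m = \frac{m}{g^{m-1}}\sum_{k=0}^{m-1}\binom{m-1}{k}\frac{(-1)^k}{\tilde{g}}\left[\exp\left(\tilde{g}U + \tfrac{1}{2}\tilde{h}U^2\right) - e^{\frac{1}{2}\tilde{h}U^2}\right]
    = \frac{m}{g^{m-1}}\sum_{k=0}^{m-1}\binom{m-1}{k}\frac{(-1)^k}{\tilde{g}}\left[\exp(\tilde{g}U) - 1\right]\exp\left(\tilde{h}U^2/2\right),
\]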

which is the required result.
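As an informal numerical check of Proposition 3.1 (not part of the original proof), the following Python sketch compares Y^m computed directly from the transformation with the right-hand side of the final expansion. It assumes that expression (2.2) is Y = (1/g)[exp(gU) − 1] exp(hU²/2), as the last line of the proof implies; the function names and the test values of u, g and h are illustrative choices only.

```python
import math


def power_direct(u, g, h, m):
    """Y**m, with Y = (exp(g*u) - 1)/g * exp(h*u**2/2) (assumed form of (2.2))."""
    y = (math.exp(g * u) - 1.0) / g * math.exp(h * u * u / 2.0)
    return y ** m


def power_expanded(u, g, h, m):
    """Right-hand side of the final expansion in the proof of Proposition 3.1."""
    h_tilde = m * h  # h~ = m h does not depend on k
    total = 0.0
    for k in range(m):  # k = 0, ..., m - 1
        g_tilde = (m - k) * g  # g~ = (m - k) g
        total += (math.comb(m - 1, k) * (-1) ** k / g_tilde
                  * (math.exp(g_tilde * u) - 1.0))
    return m / g ** (m - 1) * total * math.exp(h_tilde * u * u / 2.0)


if __name__ == "__main__":
    # Print both sides for a few powers m at one illustrative point.
    for m in range(1, 6):
        print(m, power_direct(0.7, 0.5, -0.1, m), power_expanded(0.7, 0.5, -0.1, m))
```

The two columns printed for each m should agree up to floating-point rounding; the agreement degrades for large m or very small g because the alternating sum cancels heavily.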


Proof. (Proposition 3.2) Suppose that $u_q$ is the smallest number satisfying $F_U(u_q) = q$, i.e. the q-th quantile of U. Making the change of variable

\[
w = u_q = F_U^{-1}(q), \qquad dw = du_q = \frac{dq}{F_U'(u_q)},
\]

where we use the expression given in (3.1). Since $F_U'(w) = f_U(w)$ and

\[
\lim_{u\to-\infty} F_U(u) = 0, \qquad \lim_{u\to\infty} F_U(u) = 1,
\]

and since $f_U(w)$ is a function whose domain is the real line and whose values lie in the infinite interval $[0, \infty)$, we solve for $dq$ and obtain

\[
\int_0^1 \left[F_U^{-1}(q)\right]^n dq = \int_{-\infty}^{\infty} w^n f_U(w)\, dw.
\]

Appendix B: Proof of formula given in (3.16)

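In this Appendix, we present the calculation details of the equation given in (3.16). Using Table I of Fourier transforms (Oberhettinger (1973), expression (79)), after some calculations and simplifying, we get

\[
\int_0^{\infty} 2\cos(\tilde{g}t)\, f_U(t) \exp\left(-\frac{|\tilde{h}|}{2}t^2\right) dt
= \sqrt{\frac{\pi}{n|h|}} \left\{ \exp\left[\left(\frac{\sqrt{2} - i\tilde{g}}{\sqrt{2n|h|}}\right)^2\right] \Phi\left(\frac{i\tilde{g} - \sqrt{2}}{\sqrt{n|h|}}\right)
+ \exp\left[\left(\frac{\sqrt{2} + i\tilde{g}}{\sqrt{2n|h|}}\right)^2\right] \Phi\left(-\frac{i\tilde{g} + \sqrt{2}}{\sqrt{n|h|}}\right) \right\},
\]

where $i$ is the imaginary unit and $\Phi(\cdot)$ is the cdf of a standard normal variable. Then

\[
\int_0^{\infty} 2\cosh(\tilde{g}t)\, f_U(t)\, e^{-\frac{|\tilde{h}|}{2}t^2} dt
= \sqrt{\frac{\pi}{n|h|}} \left\{ \exp\left[\left(\frac{\sqrt{2} + \tilde{g}}{\sqrt{2n|h|}}\right)^2\right] \Phi\left(-\frac{\tilde{g} + \sqrt{2}}{\sqrt{n|h|}}\right)
+ \exp\left[\left(\frac{\sqrt{2} - \tilde{g}}{\sqrt{2n|h|}}\right)^2\right] \Phi\left(\frac{\tilde{g} - \sqrt{2}}{\sqrt{n|h|}}\right) \right\}.
\]

Substituting the above expression in (3.11) and simplifying, we get

\[
\mu_n' = \frac{1}{g^n}\sqrt{\frac{\pi}{n|h|}} \sum_{k=0}^{n} \binom{n}{k}(-1)^k \left\{ \exp\left[\frac{1}{2}\left(\frac{\tilde{g} + \sqrt{2}}{\sqrt{n|h|}}\right)^2\right] \Phi\left(-\frac{\tilde{g} + \sqrt{2}}{\sqrt{n|h|}}\right)
+ \exp\left[\frac{1}{2}\left(\frac{\tilde{g} - \sqrt{2}}{\sqrt{n|h|}}\right)^2\right] \Phi\left(\frac{\tilde{g} - \sqrt{2}}{\sqrt{n|h|}}\right) \right\}.
\]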
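The closed form for the cosh integral can be checked numerically. The sketch below is only illustrative: it assumes that U follows the Laplace density with unit variance, f_U(t) = exp(−√2|t|)/√2 (an assumption suggested by the √2 terms, not stated in this appendix), writes c for n|h| and g_tilde for g̃, and approximates the integral with a hand-written composite Simpson rule. All function names are hypothetical.

```python
import math


def std_normal_cdf(x):
    """Phi(x), the standard normal cdf, expressed through erfc."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def laplace_pdf(t):
    """Assumed density of U: Laplace with unit variance."""
    return math.exp(-math.sqrt(2.0) * abs(t)) / math.sqrt(2.0)


def cosh_integral_quadrature(g_tilde, c, upper=40.0, steps=20000):
    """Composite Simpson approximation of
    int_0^upper 2 cosh(g~ t) f_U(t) exp(-c t^2 / 2) dt  (steps must be even)."""
    def integrand(t):
        return (2.0 * math.cosh(g_tilde * t) * laplace_pdf(t)
                * math.exp(-0.5 * c * t * t))

    width = upper / steps
    total = integrand(0.0) + integrand(upper)
    for j in range(1, steps):
        total += (4.0 if j % 2 else 2.0) * integrand(j * width)
    return total * width / 3.0


def cosh_integral_closed_form(g_tilde, c):
    """Closed form with c = n|h|, as in the cosh identity of this appendix."""
    root2 = math.sqrt(2.0)
    a = (g_tilde + root2) / math.sqrt(c)
    b = (g_tilde - root2) / math.sqrt(c)
    return math.sqrt(math.pi / c) * (
        math.exp(0.5 * a * a) * std_normal_cdf(-a)
        + math.exp(0.5 * b * b) * std_normal_cdf(b)
    )


if __name__ == "__main__":
    for g_tilde, c in [(0.5, 0.4), (1.0, 1.5)]:
        print(g_tilde, c,
              cosh_integral_quadrature(g_tilde, c),
              cosh_integral_closed_form(g_tilde, c))
```

For very small c the factors exp(a²/2) become large while Φ(−a) becomes tiny, so a production implementation would likely need a scaled complementary error function rather than this direct product.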

When g = 0 and h < 0, we have

\[
\mu_n' = \frac{1 + (-1)^n}{2\sqrt{n|h|}} \left(\frac{\sqrt{2}}{n|h|}\right)^n e^{\frac{1}{n|h|}} \sum_{k=0}^{n} \binom{n}{k} \left(-\sqrt{n|h|}\right)^k \left[\Gamma\left(\frac{k+1}{2}\right) - \int_0^{\frac{1}{n|h|}} u^{\frac{1}{2}(k-1)} e^{-u}\, du\right].
\]
