Jurnal Matematika Vol. 2 No. 1, December 2011. ISSN: 1693-1394

Model-Check Based on Residual Partial Sums Process of Heteroscedastic Spatial Linear Regression Models

Wayan Somayasa

Jurusan Matematika FMIPA Universitas Haluoleo Kendari 93232

e-mail: [email protected]

Abstract: It is common in practice to evaluate the correctness of an assumed linear regression model by a model-check method in which the residuals of the observations are investigated. In the asymptotic context, instead of observing the vector of residuals directly, one investigates their partial sums. In this paper we derive a functional central limit theorem for a sequence of residual partial sums processes when the observations come from heteroscedastic spatial linear regression models. Under a mild condition it is shown that the limit process is a function of the Brownian sheet. Several examples of the limit process are also discussed. The limit theorem is then applied to establish an asymptotic Kolmogorov type test for the adequacy of the fitted model. The critical regions of the test for finite sample sizes are constructed by Monte Carlo simulation.

Keywords: heteroscedastic linear regression model, least squares residual, partial sums process, Brownian sheet, asymptotic model-check.

1. Introduction

Let us consider an experiment performed under n × n experimental conditions taken from a regular lattice given by

$$\Xi_n := \{(\ell/n, k/n) : 1 \le \ell, k \le n\}, \quad n \in \mathbb{N}.$$

Without loss of generality we consider the unit square $I := [0,1] \times [0,1]$ as the experimental region instead of an arbitrary compact subset of $\mathbb{R}^2$. For convenience we take the observations carried out in $\Xi_n$ row-wise, initializing at the point $(1/n, 1/n)$, and put them together in an $n \times n$ matrix $Y^{(n)} := (Y_{\ell k})_{\ell,k=1}^{n} \in \mathbb{R}^{n\times n}$, where the observation at the point $(\ell/n, k/n)$ is denoted by $Y_{\ell k}$, $1 \le \ell, k \le n$. Consequently, we have a sequence of observable random matrices $(Y^{(n)})_{n\ge 1} \subset \mathbb{R}^{n\times n}$. As usual we furnish the vector space $\mathbb{R}^{n\times n}$ with the Euclidean inner product

$$\langle A, B\rangle_{\mathbb{R}^{n\times n}} := \operatorname{trace}(A^\top B), \quad A, B \in \mathbb{R}^{n\times n}.$$

Let $f_1, \dots, f_p : I \to \mathbb{R}$ be known, real-valued regression functions defined on $I$. For a real-valued function $f$ defined on $I$, let $f^{(n)} := (f(\ell/n, k/n))_{\ell,k=1}^{n} \in \mathbb{R}^{n\times n}$. Our aim is to construct an asymptotic test procedure for the hypothesis

$$H_0 : Y^{(n)} = \sum_{i=1}^{p} \beta_i f_i^{(n)} + E_n \quad \text{vs.} \quad H_1 : Y^{(n)} = g^{(n)} + E_n, \qquad (1)$$

where $\beta := (\beta_1, \dots, \beta_p)^\top \in \mathbb{R}^p$ is a vector of unknown parameters, $E_n := (\varepsilon_{\ell k})_{\ell,k=1}^{n}$ is an $n \times n$ random matrix whose components are independent, real-valued random variables $\varepsilon_{\ell k}$, $1 \le \ell, k \le n$, defined on a common probability space $(\Omega, \mathcal{F}, P)$, having mean $0$ and variance $\sigma_{\ell k}^2$, $0 < \sigma_{\ell k}^2 < \infty$, $1 \le \ell, k \le n$, and $g : I \to \mathbb{R}$ is the unknown true regression function. Thus, under the null hypothesis we consider a heteroscedastic linear model, while under the alternative we assume a non-parametric heteroscedastic regression model. It is worth mentioning that under $H_0$ and $H_1$ we need not assume any specific distribution for the random errors $\varepsilon_{\ell k}$, $1 \le \ell, k \le n$. Under the assumption that $f_1^{(n)}, \dots, f_p^{(n)}$ are linearly independent in $\mathbb{R}^{n\times n}$, the corresponding matrix of least squares residuals of the observations under $H_0$ is given by

$$R_n := (r_{\ell k})_{\ell,k=1}^{n} = E_n - \sum_{i=1}^{p} \frac{\langle f_i^{(n)}, E_n\rangle_{\mathbb{R}^{n\times n}}}{\langle f_i^{(n)}, f_i^{(n)}\rangle_{\mathbb{R}^{n\times n}}}\, f_i^{(n)}.$$
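To make the projection concrete, here is a minimal numerical sketch (Python with NumPy; the function name and the general least squares form are our own illustration, not the paper's notation). It computes $R_n$ by projecting the vectorized observation matrix onto the span of the vectorized regressor matrices, which agrees with the displayed formula when the $f_i^{(n)}$ are orthogonal in $\mathbb{R}^{n\times n}$.

```python
import numpy as np

def residual_matrix(Y, regressors):
    """Least squares residual matrix R_n of the observations Y (n x n),
    given a list of n x n regressor matrices f_i^{(n)}.
    Uses the general projection Y - X (X^T X)^{-1} X^T Y on vec(Y)."""
    n = Y.shape[0]
    X = np.column_stack([f.ravel() for f in regressors])     # columns: vec(f_i^{(n)})
    beta_hat = np.linalg.lstsq(X, Y.ravel(), rcond=None)[0]  # least squares fit
    return Y - (X @ beta_hat).reshape(n, n)

# Example: constant model f_1 = 1 on a 5 x 5 lattice.
rng = np.random.default_rng(1)
Y = 2.0 + rng.normal(size=(5, 5))            # beta = 2 plus noise
R = residual_matrix(Y, [np.ones((5, 5))])
print(abs(R.sum()) < 1e-10)                  # residuals sum to zero: True
```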

Recently, for a fixed $n \ge 1$, MacNeill and Jandhyala [7] and Xie and MacNeill [11] defined an operator $T_n : \mathbb{R}^{n\times n} \to C(I)$, given by

$$T_n(A)(z_1, z_2) := \sum_{k=1}^{[nz_2]} \sum_{\ell=1}^{[nz_1]} a_{\ell k} + (nz_1 - [nz_1]) \sum_{k=1}^{[nz_2]} a_{[nz_1]+1,\,k} + (nz_2 - [nz_2]) \sum_{\ell=1}^{[nz_1]} a_{\ell,\,[nz_2]+1} + (nz_1 - [nz_1])(nz_2 - [nz_2])\, a_{[nz_1]+1,\,[nz_2]+1}, \quad (z_1, z_2) \in I,$$
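For illustration, the following sketch (Python/NumPy; the function name is ours) evaluates $T_n(A)(z_1, z_2)$ directly from the definition, including the two linear interpolation terms and the bilinear corner term.

```python
import numpy as np

def partial_sums(A, z1, z2):
    """Evaluate the partial sums functional T_n(A)(z1, z2) for an n x n
    matrix A with A[l-1, k-1] = a_{lk} and (z1, z2) in [0, 1]^2."""
    n = A.shape[0]
    i, j = int(n * z1), int(n * z2)      # [n z1], [n z2]
    u, v = n * z1 - i, n * z2 - j        # fractional parts
    val = A[:i, :j].sum()                # sum over l <= [n z1], k <= [n z2]
    if i < n:
        val += u * A[i, :j].sum()        # (n z1 - [n z1]) * sum_k a_{[n z1]+1, k}
    if j < n:
        val += v * A[:i, j].sum()        # (n z2 - [n z2]) * sum_l a_{l, [n z2]+1}
    if i < n and j < n:
        val += u * v * A[i, j]           # bilinear corner term
    return val
```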

for every $A = (a_{\ell k})_{\ell,k=1}^{n}$, where $[t] := \max\{m \in \mathbb{N} : m \le t\}$, $t \in \mathbb{R}$, and $T_n(A)(t,s) = 0$ if $t = 0$ or $s = 0$. Here $C(I)$ is the space of continuous functions on $I$, furnished with the supremum norm. By the operator $T_n$, the matrix of least squares residuals is carried into a stochastic process $\{T_n(R_n)(t,s) : (t,s) \in I\}$ having sample paths in $C(I)$; we call this process the residual partial sums process. It is common in practice to test (1) by investigating a functional of the residual partial sums process, such as the Kolmogorov type statistic $K_n := \max_{1\le \ell,k\le n} T_n(R_n)(\ell/n, k/n)$. To establish this test we therefore need the limit process of the sequence $\{T_n(R_n)(t,s) : (t,s) \in I\}_{n\ge 1}$ under $H_0$ as well as under $H_1$. In MacNeill and Jandhyala [7] and Xie and MacNeill [11] the limit process of this sequence was derived explicitly under the assumption of homoscedasticity, i.e. $\sigma_{\ell k}^2 = \sigma^2$ for $1 \le \ell, k \le n$. It was shown there that, when the regression functions are continuously differentiable, the limit process is a complicated function of the Brownian sheet. In Somayasa [10] the limit process of such a sequence was also derived, by generalizing the approach of Bischoff [4] from one dimension to higher dimensions. In contrast to the results of MacNeill and Jandhyala [7] and Xie and MacNeill [11], Somayasa [10] obtained the structure of the limit process as a projection of the Brownian sheet onto its reproducing kernel Hilbert space. In this paper we establish the limit process for the heteroscedastic linear regression model defined above, see Section 2. In Section 3 we discuss examples of the limit process corresponding to polynomial models. In Section 4 we construct the critical region of the Kolmogorov type test.

2. Residual partial sums limit process

In the sequel we characterize the heteroscedasticity of the regression model by a function $h : I \to \mathbb{R}_{>0}$ such that $\sigma_{\ell k}^2 = h(\ell/n, k/n)$, $1 \le \ell, k \le n$, $n \in \mathbb{N}$, where $h$ is assumed to be of bounded variation in the sense of Vitali, see Clarkson and Adams [5].

Definition 2.1. A stochastic process $\{B_h(t,s) : (t,s) \in I\}$ is called an $h$-Brownian sheet on $C(I)$ if:

1. $B_h(t,s) = 0$ almost surely (a.s.) if $t = 0$ or $s = 0$.

2. For every rectangle $[t_1,t_2] \times [s_1,s_2] \subseteq I$, $0 \le t_1 \le t_2 \le 1$, $0 \le s_1 \le s_2 \le 1$,
$$\Delta_{[t_1,t_2]\times[s_1,s_2]} B_h \sim \mathcal{N}\Big(0, \int_{[t_1,t_2]\times[s_1,s_2]} h \, d\lambda_I\Big),$$
where $\Delta_{[t_1,t_2]\times[s_1,s_2]} B_h := B_h(t_2,s_2) - B_h(t_1,s_2) - B_h(t_2,s_1) + B_h(t_1,s_1)$ and $\lambda_I$ is the Lebesgue measure on $I$. The random variable $\Delta_{[t_1,t_2]\times[s_1,s_2]} B_h$ is called the increment of $B_h$ over $[t_1,t_2] \times [s_1,s_2]$.

3. For any two rectangles $I_1 \subseteq I$ and $I_2 \subseteq I$ with $I_1 \cap I_2 = \emptyset$, $\Delta_{I_1} B_h$ and $\Delta_{I_2} B_h$ are mutually independent.

We refer the reader to MacNeill et al. [8] for the existence of such a process. In case $h$ is a constant function, $B_h$ is the Brownian sheet, whose existence has been studied by Yeh [12], Kuelbs [6], and Park [9]. As a consequence of Definition 2.1, the covariance function of $B_h$ is given by
$$K_{B_h}(t_1,s_1; t_2,s_2) := \operatorname{Cov}\big(B_h(t_1,s_1), B_h(t_2,s_2)\big) = \int_{[0,\,t_1\wedge t_2]\times[0,\,s_1\wedge s_2]} h \, d\lambda_I,$$
$(t_1,s_1), (t_2,s_2) \in I$, where $x \wedge y$ stands for the minimum of $x$ and $y$.

Theorem 2.1. Let $(E_n)_{n\ge 1}$, $E_n := (\varepsilon_{\ell k})_{\ell,k=1}^{n}$, be a sequence of $n \times n$ random matrices such that the $\varepsilon_{\ell k}$ are mutually independent with $E(\varepsilon_{\ell k}) = 0$ and $\operatorname{Var}(\varepsilon_{\ell k}) = h(\ell/n, k/n)$, $1 \le \ell, k \le n$, $n \ge 1$. Then $\frac{1}{n} T_n(E_n) \xrightarrow{\mathcal{D}} B_h$, as $n \to \infty$, in $C(I)$. Here $\xrightarrow{\mathcal{D}}$ stands for convergence in distribution (weak convergence), see Billingsley [2], p. 23.

Proof. See MacNeill et al. [8].
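Theorem 2.1 also suggests a simple way to generate approximate sample paths of $B_h$: evaluate $(1/n) T_n(E_n)$ at the lattice points, i.e. take scaled double cumulative sums of independent $\mathcal{N}(0, h(\ell/n, k/n))$ errors. A sketch (Python/NumPy; the function name and the choice $h(t,s) = ts$ are illustrative):

```python
import numpy as np

def h_brownian_sheet(n, h, rng):
    """Approximate B_h at the lattice {(l/n, k/n)} by the scaled partial
    sums (1/n) T_n(E_n) of Theorem 2.1."""
    grid = np.arange(1, n + 1) / n
    tt, ss = np.meshgrid(grid, grid, indexing="ij")
    E = rng.normal(0.0, np.sqrt(h(tt, ss)))        # Var(eps_lk) = h(l/n, k/n)
    return np.cumsum(np.cumsum(E, axis=0), axis=1) / n

# Sanity check: Var(B_h(1,1)) = int_I h dlambda_I = 1/4 for h(t,s) = ts.
rng = np.random.default_rng(0)
vals = [h_brownian_sheet(100, lambda t, s: t * s, rng)[-1, -1] for _ in range(2000)]
print(np.var(vals))   # approximately 0.25
```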

Theorem 2.2. Let $f_1, \dots, f_p$ be continuous and of bounded variation in the sense of Hardy (Clarkson and Adams [5]) on $I$. If $f_1, \dots, f_p$ are linearly independent in $L_2(\lambda_I)$, the Hilbert space of square-integrable functions on $I$ with respect to $\lambda_I$, then

$$\frac{1}{n} T_n(R_n) \xrightarrow{\mathcal{D}} B_{h,\tilde f}, \quad \text{as } n \to \infty, \text{ in } C(I),$$

where, with $\tilde f := (f_1, \dots, f_p)^\top$,

$$B_{h,\tilde f}(t,s) := B_h(t,s) - \Big(\int_{[0,t]\times[0,s]} \tilde f^\top \, d\lambda_I\Big)\, W^{-1} \int_I^{(R)} \tilde f \, dB_h, \quad (t,s) \in I,$$

$$\int_{[0,t]\times[0,s]} \tilde f^\top \, d\lambda_I := \Big(\int_{[0,t]\times[0,s]} f_1 \, d\lambda_I, \dots, \int_{[0,t]\times[0,s]} f_p \, d\lambda_I\Big), \qquad \int_I^{(R)} \tilde f \, dB_h := \Big(\int_I^{(R)} f_1 \, dB_h, \dots, \int_I^{(R)} f_p \, dB_h\Big)^\top.$$

Here $W := \big(\int_I f_i f_j \, d\lambda_I\big)_{i,j=1}^{p} \in \mathbb{R}^{p\times p}$ is invertible. Furthermore, $B_{h,\tilde f}$ is a process with covariance function

$$K_{B_{h,\tilde f}}(t,s; t',s') := \operatorname{Cov}\big(B_{h,\tilde f}(t,s), B_{h,\tilde f}(t',s')\big)$$
$$= \int_{[0,\,t\wedge t']\times[0,\,s\wedge s']} h \, d\lambda_I - \Big(\int_{[0,t']\times[0,s']} \tilde f^\top \, d\lambda_I\Big) W^{-1} \int_{[0,t]\times[0,s]} \tilde f\, h \, d\lambda_I - \Big(\int_{[0,t]\times[0,s]} \tilde f^\top \, d\lambda_I\Big) W^{-1} \int_{[0,t']\times[0,s']} \tilde f\, h \, d\lambda_I + \Big(\int_{[0,t]\times[0,s]} \tilde f^\top \, d\lambda_I\Big) W^{-1} \Big(\int_I f_i f_j\, h \, d\lambda_I\Big)_{i,j=1}^{p} W^{-1} \Big(\int_{[0,t']\times[0,s']} \tilde f \, d\lambda_I\Big).$$
Here and in the sequel $\int^{(R)}$ denotes the Riemann-Stieltjes integral, see Young [13] and Somayasa [10], p. 115.

Proof. The proof of Theorem 2.2 in Bischoff [3] and the result of Bischoff [4] can be extended to higher-dimensional experimental regions.

3. Examples

In this section we discuss several examples of the residual partial sums limit process for the constant, first-order and second-order regression models.

3.1. Constant regression model

As the simplest case we consider the constant model $Y^{(n)} = \beta f_1^{(n)} + E_n$, where $\beta$ is an unknown parameter and $f_1(t,s) = 1$ for $(t,s) \in I$. The residual partial sums limit process of this model is given by

$$B_{h,\tilde f_0}(t,s) := B_h(t,s) - ts\, B_h(1,1), \quad (t,s) \in I,$$

which is the standard Brownian bridge when $h$ is constant, see e.g. MacNeill and Jandhyala [7] and Somayasa [10], p. 20.
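This can be read off directly from Theorem 2.2; a short check (our own computation, using $p = 1$ and $f_1 \equiv 1$, so that $W = 1$):

```latex
% For f_1 \equiv 1:  W = \int_I f_1^2 \, d\lambda_I = 1, and
\int_{[0,t]\times[0,s]} f_1 \, d\lambda_I = ts,
\qquad
\int_I^{(R)} f_1 \, dB_h = B_h(1,1),
% so Theorem 2.2 gives
B_{h,\tilde f_0}(t,s) = B_h(t,s) - ts \, W^{-1} B_h(1,1) = B_h(t,s) - ts\, B_h(1,1).
```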

3.2. First-order regression model

Let us consider the first-order regression model
$$Y^{(n)} = \beta_1 f_1^{(n)} + \beta_2 f_2^{(n)} + \beta_3 f_3^{(n)} + E_n,$$
where $\beta_1$, $\beta_2$ and $\beta_3$ are unknown parameters and $f_1(t,s) = 1$, $f_2(t,s) = t$, $f_3(t,s) = s$, for $(t,s) \in I$. Associated with this model we have
$$W = \begin{pmatrix} 1 & 1/2 & 1/2 \\ 1/2 & 1/3 & 1/4 \\ 1/2 & 1/4 & 1/3 \end{pmatrix}, \qquad W^{-1} = \begin{pmatrix} 7 & -6 & -6 \\ -6 & 12 & 0 \\ -6 & 0 & 12 \end{pmatrix},$$
and the residual partial sums limit process
$$B_{h,\tilde f_1}(t,s) := B_h(t,s) - y_1(t,s)\, B_h(1,1) - y_2(t,s)\Big[B_h(1,1) - \int_{[0,1]} B_h(u,1)\, du\Big] - y_3(t,s)\Big[B_h(1,1) - \int_{[0,1]} B_h(1,v)\, dv\Big],$$
$(t,s) \in I$, where $y_1(t,s) := 7ts - 3t^2 s - 3ts^2$, $y_2(t,s) := -6ts + 6t^2 s$ and $y_3(t,s) := -6ts + 6ts^2$.
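The $3 \times 3$ inverse above can be verified numerically; a minimal sketch (Python/NumPy, our own check):

```python
import numpy as np

# W_ij = int_I f_i f_j dlambda_I for f = (1, t, s).
W1 = np.array([[1.0, 1/2, 1/2],
               [1/2, 1/3, 1/4],
               [1/2, 1/4, 1/3]])
print(np.round(np.linalg.inv(W1)).astype(int))
# [[ 7 -6 -6]
#  [-6 12  0]
#  [-6  0 12]]
```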



3.3. Second-order regression model

Next we consider the second-order regression model
$$Y^{(n)} = \beta_1 f_1^{(n)} + \beta_2 f_2^{(n)} + \beta_3 f_3^{(n)} + \beta_4 f_4^{(n)} + \beta_5 f_5^{(n)} + \beta_6 f_6^{(n)} + E_n,$$
where $\beta_1, \dots, \beta_6$ are unknown parameters and $f_1(t,s) = 1$, $f_2(t,s) = t$, $f_3(t,s) = s$, $f_4(t,s) = t^2$, $f_5(t,s) = ts$, $f_6(t,s) = s^2$, for $(t,s) \in I$. Accordingly, the matrices $W$ and $W^{-1}$ are given by

$$W = \begin{pmatrix} 1 & 1/2 & 1/2 & 1/3 & 1/4 & 1/3 \\ 1/2 & 1/3 & 1/4 & 1/4 & 1/6 & 1/6 \\ 1/2 & 1/4 & 1/3 & 1/6 & 1/6 & 1/4 \\ 1/3 & 1/4 & 1/6 & 1/5 & 1/8 & 1/9 \\ 1/4 & 1/6 & 1/6 & 1/8 & 1/9 & 1/8 \\ 1/3 & 1/6 & 1/4 & 1/9 & 1/8 & 1/5 \end{pmatrix}, \qquad W^{-1} = \begin{pmatrix} 26 & -54 & -54 & 30 & 36 & 30 \\ -54 & 228 & 36 & -180 & -72 & 0 \\ -54 & 36 & 228 & 0 & -72 & -180 \\ 30 & -180 & 0 & 180 & 0 & 0 \\ 36 & -72 & -72 & 0 & 144 & 0 \\ 30 & 0 & -180 & 0 & 0 & 180 \end{pmatrix}.$$
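The entries of $W$ are the moments $\int_I t^a s^b \, d\lambda_I = 1/((a+1)(b+1))$, so the printed inverse can be checked numerically; a sketch (Python/NumPy, our own check):

```python
import numpy as np

# Exponent pairs (a, b) of the monomial basis f = (1, t, s, t^2, ts, s^2).
powers = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]
# W_ij = int_I t^(a_i + a_j) s^(b_i + b_j) = 1 / ((a_i + a_j + 1)(b_i + b_j + 1)).
W = np.array([[1.0 / ((a1 + a2 + 1) * (b1 + b2 + 1)) for (a2, b2) in powers]
              for (a1, b1) in powers])
print(np.round(np.linalg.inv(W)).astype(int))  # reproduces the integer matrix W^{-1}
```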

Let $y_1, \dots, y_6 : I \to \mathbb{R}$ be functions on $I$ defined by
$$y_1(t,s) := 26ts - 27t^2 s - 27ts^2 + 10t^3 s + 9t^2 s^2 + 10ts^3,$$
$$y_2(t,s) := -54ts + 114t^2 s + 18ts^2 - 60t^3 s - 18t^2 s^2,$$
$$y_3(t,s) := -54ts + 18t^2 s + 114ts^2 - 18t^2 s^2 - 60ts^3,$$
$$y_4(t,s) := 30ts - 90t^2 s + 60t^3 s,$$
$$y_5(t,s) := 36ts - 36t^2 s - 36ts^2 + 36t^2 s^2,$$
$$y_6(t,s) := 30ts - 90ts^2 + 60ts^3.$$
The residual partial sums limit process of this model can be expressed as
$$B_{h,\tilde f_2}(t,s) := B_h(t,s) - y_1(t,s)\, B_h(1,1) - y_2(t,s)\Big[B_h(1,1) - \int_{[0,1]} B_h(u,1)\, du\Big] - y_3(t,s)\Big[B_h(1,1) - \int_{[0,1]} B_h(1,v)\, dv\Big] - y_4(t,s)\Big[B_h(1,1) - \int_{[0,1]} 2u\, B_h(u,1)\, du\Big] - y_5(t,s)\Big[B_h(1,1) - \int_{[0,1]} B_h(u,1)\, du - \int_{[0,1]} B_h(1,v)\, dv + \int_I B_h(u,v)\, du\, dv\Big] - y_6(t,s)\Big[B_h(1,1) - \int_{[0,1]} 2v\, B_h(1,v)\, dv\Big], \quad (t,s) \in I.$$
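The $y_i$ are exactly the coordinates of $\big(\int_{[0,t]\times[0,s]} \tilde f^\top \, d\lambda_I\big) W^{-1}$ from Theorem 2.2, which gives a quick numerical consistency check; a self-contained sketch (Python/NumPy, our own verification, here evaluated at one arbitrary point):

```python
import numpy as np

# y(t,s) = A(t,s)^T W^{-1}, with A_i(t,s) = int_{[0,t]x[0,s]} f_i dlambda_I
# for the basis f = (1, t, s, t^2, ts, s^2).
powers = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]
W = np.array([[1.0 / ((a1 + a2 + 1) * (b1 + b2 + 1)) for (a2, b2) in powers]
              for (a1, b1) in powers])
t, s = 0.3, 0.7
A = np.array([t**(a + 1) * s**(b + 1) / ((a + 1) * (b + 1)) for a, b in powers])
y = A @ np.linalg.inv(W)
y2 = -54*t*s + 114*t**2*s + 18*t*s**2 - 60*t**3*s - 18*t**2*s**2
print(np.isclose(y[1], y2))  # True: matches y_2 as given above
```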

4. Kolmogorov type test

The Kolmogorov type test for the hypotheses (1) is a test based on the statistic
$$K_{n,f} := \max_{0\le \ell,k\le n} \frac{1}{n} \sum_{i=0}^{\ell} \sum_{j=0}^{k} r_{ij},$$
where we put $r_{ij} = 0$ if $i = 0$ or $j = 0$. We note that by the properties of the partial sums it holds that
$$K_{n,f} = \sup_{0\le t,s\le 1} \frac{1}{n}\, T_n(R_n)(t,s).$$
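At the lattice points the partial sums are plain double cumulative sums, so $K_{n,f}$ can be computed in a few lines; a sketch (Python/NumPy, hypothetical function name):

```python
import numpy as np

def kolmogorov_statistic(R):
    """K_{n,f} = max_{0 <= l,k <= n} (1/n) * sum_{i <= l, j <= k} r_ij
    for an n x n residual matrix R; boundary terms (l = 0 or k = 0) are 0."""
    n = R.shape[0]
    S = np.cumsum(np.cumsum(R, axis=0), axis=1)   # S[l-1, k-1] = partial sum
    return max(S.max() / n, 0.0)                  # include the zero boundary
```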

Theorem 4.1. For fixed $\alpha \in (0,1)$, let $\tilde c_\alpha$ be the $\alpha$-quantile of $\sup_{0\le t,s\le 1} B_{h,\tilde f}(t,s)$, i.e. a constant such that $P\{\sup_{0\le t,s\le 1} B_{h,\tilde f}(t,s) \le \tilde c_\alpha\} = \alpha$. Then an asymptotically size-$\alpha$ test based on $K_{n,f}$ is given by:

reject $H_0$ if and only if $K_{n,f} \ge \tilde c_{1-\alpha}$.

Proof. Let $\mathcal{X} \subseteq \mathbb{R}^{n\times n}$ be the sample space of the model. We define a sequence of non-randomized tests $(\delta_n)_{n\ge 1}$, $\delta_n : \mathcal{X} \to \{0,1\}$, such that for $Y_n \in \mathcal{X}$,
$$\delta_n(Y_n) := 1_{\big\{Y_n :\ \sup_{0\le t,s\le 1} \frac{1}{n}\, T_n\big(Y_n - X_n(X_n^\top X_n)^{-1} X_n^\top Y_n\big)(t,s)\ \ge\ \tilde c_{1-\alpha}\big\}},$$

where $1_A$ is the indicator function of $A$ and $X_n$ denotes the design matrix of the model under $H_0$. Then, by Theorem 2.2 and the continuity of the supremum functional, we have
$$\lim_{n\to\infty} E_0(\delta_n) = \lim_{n\to\infty} P\Big\{\sup_{0\le t,s\le 1} \frac{1}{n}\, T_n\big(E_n - X_n(X_n^\top X_n)^{-1} X_n^\top E_n\big)(t,s) \ge \tilde c_{1-\alpha}\Big\} = P\Big\{\sup_{0\le t,s\le 1} B_{h,\tilde f}(t,s) \ge \tilde c_{1-\alpha}\Big\} = \alpha,$$
where $E_0$ is the expectation operator under $H_0$. The proof is complete because $\lim_{n\to\infty} E_0(\delta_n) = \alpha$ holds uniformly under $H_0$. □

Since the quantile $\tilde c_{1-\alpha}$ of $\sup_{0\le t,s\le 1} B_{h,\tilde f}(t,s)$ cannot be calculated analytically, we approximate the finite-sample quantile of $K_{n,f}$ by Monte Carlo simulation, according to the following algorithm.

Step 1: Fix $n_0 \in \mathbb{N}$.

Step 2: Generate $M$ i.i.d. pseudo-random matrices $E_{n_0}^{(j)} := (\varepsilon_{\ell k}^{(j)})_{\ell,k=1}^{n_0}$, $j = 1, \dots, M$, with independent components generated from $\mathcal{N}(0, h(\ell/n_0, k/n_0))$, $1 \le \ell, k \le n_0$.

Step 3: Calculate the matrix of residuals $R_{n_0}^{(j)}$ by the equation
$$R_{n_0}^{(j)} = E_{n_0}^{(j)} - \sum_{i=1}^{p} \frac{\langle f_i^{(n_0)}, E_{n_0}^{(j)}\rangle_{\mathbb{R}^{n_0\times n_0}}}{\langle f_i^{(n_0)}, f_i^{(n_0)}\rangle_{\mathbb{R}^{n_0\times n_0}}}\, f_i^{(n_0)}.$$

Step 4: Calculate the statistic $K_{n_0,f}^{(j)} := \max_{0\le \ell,k\le n_0} \frac{1}{n_0}\, T_{n_0}(R_{n_0}^{(j)})(\ell/n_0, k/n_0)$.

Step 5: Calculate the simulated $(1-\alpha)$-quantile of $\sup_{0\le t,s\le 1} B_{h,\tilde f}(t,s)$: let $K_{n_0,f}^{(M,j)}$ be the $j$-th smallest observation, i.e. $K_{n_0,f}^{(M,1)} \le \dots \le K_{n_0,f}^{(M,j)} \le K_{n_0,f}^{(M,j+1)} \le \dots \le K_{n_0,f}^{(M,M)}$; then the simulated $(1-\alpha)$-quantile is given by
$$\tilde c_{1-\alpha} = \begin{cases} K_{n_0,f}^{(M,\, M(1-\alpha))}, & \text{if } M(1-\alpha) \in \mathbb{N}, \\[2pt] K_{n_0,f}^{(M,\, [M(1-\alpha)]+1)}, & \text{otherwise}, \end{cases}$$
where $[M(1-\alpha)] = \max\{k \in \mathbb{N} : k \le M(1-\alpha)\}$.
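A self-contained sketch of Steps 1-5 (in Python/NumPy rather than the paper's R; the names, the choice $h(t,s) = ts$ and the constant-model basis are illustrative, and the basis list can be extended to the first- and second-order models):

```python
import numpy as np

def simulated_quantile(n0=30, M=100_000, alpha=0.05, h=lambda t, s: t * s, seed=0):
    """Monte Carlo approximation of the (1 - alpha)-quantile of K_{n0,f}."""
    rng = np.random.default_rng(seed)
    grid = np.arange(1, n0 + 1) / n0
    tt, ss = np.meshgrid(grid, grid, indexing="ij")
    basis = [np.ones_like(tt)]                        # constant model; append tt, ss, ... for larger models
    X = np.column_stack([f.ravel() for f in basis])   # columns: vec(f_i^{(n0)})
    P = X @ np.linalg.solve(X.T @ X, X.T)             # projection onto span{vec(f_i^{(n0)})}
    sd = np.sqrt(h(tt, ss))                           # heteroscedastic std deviations
    stats = np.empty(M)
    for j in range(M):                                # Steps 2-4
        E = rng.normal(0.0, sd)
        R = (E.ravel() - P @ E.ravel()).reshape(n0, n0)
        S = np.cumsum(np.cumsum(R, axis=0), axis=1)
        stats[j] = max(S.max() / n0, 0.0)             # K_{n0,f}^{(j)}
    stats.sort()                                      # Step 5: order statistics
    m = M * (1 - alpha)
    idx = int(m) - 1 if float(m).is_integer() else int(np.floor(m))
    return stats[idx]

# For comparison with Table 1 below (constant model, h(t,s) = ts): c_0.95 is about 0.69.
print(simulated_quantile())
```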

The simulation results, obtained using the statistical software package R 2.0.1, are presented in Table 1 for $\alpha$ = 0.005, 0.010, 0.025, 0.050, 0.100, 0.150, 0.200, 0.250, 0.350 and 0.500, with sample size $n_0 = 30$ and $M = 10^6$ replications.

Models       | $\tilde c_{0.500}$ | $\tilde c_{0.650}$ | $\tilde c_{0.750}$ | $\tilde c_{0.800}$ | $\tilde c_{0.850}$
Constant     | 0.3740 | 0.4409 | 0.4839 | 0.5139 | 0.5200
First order  | 0.3691 | 0.3963 | 0.4101 | 0.4179 | 0.4306
Second order | 0.3295 | 0.3674 | 0.3873 | 0.3982 | 0.4233

Models       | $\tilde c_{0.900}$ | $\tilde c_{0.950}$ | $\tilde c_{0.975}$ | $\tilde c_{0.990}$ | $\tilde c_{0.995}$
Constant     | 0.6342 | 0.6857 | 0.7381 | 0.8209 | 0.8540
First order  | 0.4631 | 0.4798 | 0.5107 | 0.5425 | 0.5538
Second order | 0.4380 | 0.4479 | 0.4867 | 0.4976 | 0.4977

Table 1. The simulated quantiles $\tilde c_{1-\alpha}$ for $h(t,s) = ts$, $(t,s) \in I$.

References

[1] Alexander, K.S. and Pyke, R. (1986). A uniform central limit theorem for set-indexed partial-sum processes with finite variance. Annals of Probability, 14, 582-597.

[2] Billingsley, P. (1968). Convergence of Probability Measures. John Wiley & Sons, Inc., New York.

[3] Bischoff, W. (1998). A functional central limit theorem for regression models. Ann. Stat., 26 (4), 1398-1410.

[4] Bischoff, W. (2002). The structure of residual partial sums limit processes of linear regression models. Theory of Stochastic Processes, 2 (24), 23-28.

[5] Clarkson, J.A. and Adams, C.R. (1933). On definitions of bounded variation for functions of two variables. Transactions of the American Mathematical Society, 35 (4), 824-854.

[6] Kuelbs, J. (1968). The invariance principle for a lattice of random variables. Ann. Math. Stat., 39 (2), 382-389.

[7] MacNeill, I.B. and Jandhyala, V.K. (1993). Change point methods for spatial data. In: Multivariate Environmental Statistics, eds. G.P. Patil and C.R. Rao. Elsevier Science Publishers B.V., 298-306.

[8] MacNeill, I.B., Mao, Y. and Xie, L. (1994). Modeling heteroscedastic age-period-cohort cancer data. The Canadian Journal of Statistics, 22 (4), 529-539.

[9] Park, W.J. (1971). Weak convergence of probability measures on the function space. J. of Multivariate Analysis, 1, 433-444.

[10] Somayasa, W. (2007). Model-checks based on least squares residual partial sums processes. Ph.D. Thesis, Faculty of Mathematics, Karlsruhe Institute of Technology.

[11] Xie, L. and MacNeill, I.B. (2006). Spatial residual processes and boundary detection. South African Statist. J., 40 (1), 33-53.

[12] Yeh, J. (1960). Wiener measure in a space of functions of two variables. Trans. Amer. Math. Soc., 95, 433-450.

[13] Young, W.H. (1917). On multiple integration by parts and the second theorem of the mean. Proc. London Math. Soc., Series 2, 16, 273-293.