CURVE ESTIMATION AND ESTIMATOR PROPERTIES OF THE NONPARAMETRIC REGRESSION TRUNCATED SPLINE WITH A MATRIX APPROACH
E-Jurnal Matematika Vol. 11(1), Januari 2022, pp. 64-69
DOI: https://doi.org/10.24843/MTK.2022.v11.i01.p362
ISSN: 2303-1751
Nurul Fitriyani¹§, I Nyoman Budiantara²
¹Department of Mathematics, Universitas Mataram, Mataram, Nusa Tenggara Barat, Indonesia [Email: [email protected]]
²Department of Statistics, Institut Teknologi Sepuluh Nopember, Surabaya, Jawa Timur, Indonesia [Email: [email protected]]
§Corresponding Author
ABSTRACT
Regression analysis is a statistical method used to estimate the relationship between predictor and response variables. Here the data are given in pairs, and the relationship between the predictor and the response is assumed to follow a nonparametric regression model. Such a model is flexible for estimating the regression curve when the data do not follow a specific pattern. The nonparametric regression curve is approximated by a truncated spline function with several knots. The truncated spline estimator in nonparametric regression is linear in the observations and depends strongly on the knot points. The random errors of the regression model are assumed to be independent and normally distributed with zero mean and equal variance. The truncated spline curve estimate is obtained by minimizing the sum of squared errors through least squares optimization. The truncated spline estimator is linear and unbiased, and if the errors are normally distributed, the estimator is also normally distributed.
Keywords: curve estimation, estimator properties, matrix approach, nonparametric regression, truncated spline.
1. INTRODUCTION
Nonparametric regression has received considerable attention from researchers. The approach relaxes assumptions about linearity and about the functional form of the regression curve, which allows the data to be explored more flexibly. It is particularly useful for estimating the regression curve when the data do not follow a specific pattern (Mahmoud, 2019).
There are several approaches to nonparametric regression, one of which is the truncated spline. Among nonparametric regression models, spline regression stands out for its good statistical and visual interpretability. A spline can model data whose pattern changes over certain sub-intervals because it is a piecewise polynomial with a segmented structure. This segmented structure provides more flexibility than an ordinary polynomial, making it possible to adapt more effectively to the local characteristics of a function or data (Budiantara et al., 2015; Nurcahayani et al., 2019; Wening et al., 2020).
Previous studies have applied truncated spline nonparametric regression to various data, including data on poverty, population, education, and health (Budiantara et al., 2012; Fitriyani et al., 2016; Chamidah et al., 2019; Murbarani et al., 2019; Nurcahayani et al., 2019). This study addresses the curve estimation and the estimator properties of the truncated spline nonparametric regression model using a matrix approach. It is expected to provide insight into the curve estimation process and into the properties of the truncated spline estimator.
2. CURVE ESTIMATION OF THE NONPARAMETRIC REGRESSION TRUNCATED SPLINE
The spline function is the sum of a polynomial function and truncated functions. Consider a nonparametric regression model in which the regression curve is estimated using a spline. Given paired data $(x_j, y_j)$, the relationship between $x_j$ and $y_j$ is assumed to follow the nonparametric regression model
$$y_j = f(x_j) + \varepsilon_j, \quad j = 1, 2, \ldots, n. \qquad (1)$$
In this study, the regression curve $f$ is approximated by a spline function $g$ with $r$ knots $K_1, K_2, \ldots, K_r$, so that the spline regression model is obtained from equation (1) with $g$ in place of $f$. Written in matrix form, the model is
$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} g(x_1) \\ g(x_2) \\ \vdots \\ g(x_n) \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}. \qquad (2)$$
The form of the regression curve $g(x_j)$ in the above equation is assumed to be unknown, while the errors $\varepsilon_j$, $j = 1, 2, \ldots, n$, are mutually independent with zero mean and variance $\sigma^2$. The spline function in equation (2) can be written in the following form:
$$g(x_j) = \alpha_0 + \alpha_1 x_j + \alpha_2 x_j^2 + \cdots + \alpha_m x_j^m + \beta_1 (x_j - K_1)_+^m + \cdots + \beta_r (x_j - K_r)_+^m, \qquad (3)$$
where $\alpha_i$, $i = 0, 1, \ldots, m$, and $\beta_k$, $k = 1, 2, \ldots, r$, are real constants, and $(x_j - K_k)_+^m$ is the truncated function, equal to $(x_j - K_k)^m$ if $x_j \geq K_k$ and to $0$ otherwise. If the truncated spline regression model is presented in matrix form, we obtain:
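$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^m & (x_1 - K_1)_+^m & \cdots & (x_1 - K_r)_+^m \\ 1 & x_2 & x_2^2 & \cdots & x_2^m & (x_2 - K_1)_+^m & \cdots & (x_2 - K_r)_+^m \\ \vdots & \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^m & (x_n - K_1)_+^m & \cdots & (x_n - K_r)_+^m \end{pmatrix} \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_m \\ \beta_1 \\ \vdots \\ \beta_r \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$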
or it can be written as:
$$Y = X[K_1, K_2, \ldots, K_r]\,\beta + \varepsilon$$
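The matrix form above can be illustrated with a minimal NumPy sketch. The function name `truncated_spline_design`, the sample points, and the knot values are illustrative assumptions, not part of the original study; the columns follow the order $1, x, \ldots, x^m, (x - K_1)_+^m, \ldots, (x - K_r)_+^m$.

```python
import numpy as np

def truncated_spline_design(x, knots, degree):
    """Return X[K] with columns 1, x, ..., x^m, (x-K_1)_+^m, ..., (x-K_r)_+^m.

    Hypothetical helper written for illustration only.
    """
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    # Polynomial part: powers 0..m of x, shape (n, m + 1).
    poly = np.vander(x, N=degree + 1, increasing=True)
    # Truncated part: (x - K_k)^m where x >= K_k, and 0 otherwise; shape (n, r).
    diff = x[:, None] - knots[None, :]
    trunc = np.where(diff >= 0.0, diff ** degree, 0.0)
    return np.hstack([poly, trunc])

# Toy data: n = 5 points, linear spline (m = 1) with r = 2 knots.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X = truncated_spline_design(x, knots=[1.5, 3.0], degree=1)
print(X.shape)  # n rows and m + 1 + r columns
print(X)
```

With these toy values the matrix has $n = 5$ rows and $m + 1 + r = 4$ columns, matching the dimensions of $X[K]$ stated in the text.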
Furthermore, the estimate of the parameter vector $\beta$ is obtained with the least squares method by solving the following optimization problem.
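$$\min_{\beta} \left\{ \varepsilon^{T} \varepsilon \right\}$$
with
$$\varepsilon = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} - \begin{pmatrix} 1 & x_1 & \cdots & x_1^m & (x_1 - K_1)_+^m & \cdots & (x_1 - K_r)_+^m \\ 1 & x_2 & \cdots & x_2^m & (x_2 - K_1)_+^m & \cdots & (x_2 - K_r)_+^m \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 1 & x_n & \cdots & x_n^m & (x_n - K_1)_+^m & \cdots & (x_n - K_r)_+^m \end{pmatrix} \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_m \\ \beta_1 \\ \vdots \\ \beta_r \end{pmatrix}$$
and $\varepsilon^{T}$ the transpose of this vector.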
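Stated more compactly, the estimator
$$\hat{\beta} = (\hat{\alpha}_0 \;\; \hat{\alpha}_1 \;\; \cdots \;\; \hat{\alpha}_m \;\; \hat{\beta}_1 \;\; \cdots \;\; \hat{\beta}_r)^{T}$$
is obtained by minimizing the sum of squared errors with respect to the vector $\beta$, that is, by differentiating the sum of squared errors with respect to $\beta$ and setting the result equal to zero.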
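$$\min_{\beta} \left\{ \varepsilon^{T} \varepsilon \right\} = \min_{\beta} \left\{ (Y - X[K_1, K_2, \ldots, K_r]\beta)^{T} (Y - X[K_1, K_2, \ldots, K_r]\beta) \right\}$$
where $Y = (y_1 \;\; y_2 \;\; \cdots \;\; y_n)^{T}$ is a vector of size $n \times 1$ and $X[K_1, K_2, \ldots, K_r] = X[K]$ is a matrix of size $n \times (m + 1 + r)$. The sum of squared errors is given as follows.
$$\sum_{j=1}^{n} \varepsilon_j^2 = \varepsilon^{T} \varepsilon = (Y - X[K]\beta)^{T} (Y - X[K]\beta) = Y^{T} Y - 2\beta^{T} X[K]^{T} Y + \beta^{T} X[K]^{T} X[K] \beta$$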
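Differentiating the above expression with respect to the vector $\beta$ and setting the result equal to zero, $\partial \varepsilon^{T} \varepsilon / \partial \beta = 0$, gives the following:
$$\frac{\partial \left( Y^{T} Y - 2\beta^{T} X[K]^{T} Y + \beta^{T} X[K]^{T} X[K] \beta \right)}{\partial \beta} = 0$$
$$-2 X[K]^{T} Y + 2 X[K]^{T} X[K] \hat{\beta} = 0$$
$$X[K]^{T} X[K] \hat{\beta} = X[K]^{T} Y$$
$$\hat{\beta} = (X[K]^{T} X[K])^{-1} X[K]^{T} Y$$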
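As a numerical illustration of the least squares solution, the sketch below solves the normal equations $X[K]^{T} X[K] \hat{\beta} = X[K]^{T} Y$ on simulated data and compares the result with NumPy's general least squares routine. The helper `design`, the knots, the coefficient values, and the noise level are all assumptions made for illustration.

```python
import numpy as np

def design(x, knots, m):
    # Hypothetical helper: X[K] with columns 1, x, ..., x^m, (x - K_k)_+^m.
    d = x[:, None] - np.asarray(knots, dtype=float)[None, :]
    return np.hstack([np.vander(x, m + 1, increasing=True),
                      np.where(d >= 0.0, d ** m, 0.0)])

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
X = design(x, knots=[3.0, 7.0], m=1)

# Illustrative "true" coefficients (alpha_0, alpha_1, beta_1, beta_2).
beta_true = np.array([1.0, 0.5, -1.5, 2.0])
y = X @ beta_true + rng.normal(scale=0.3, size=x.size)

# Normal equations: X'X beta = X'y, solved without forming the inverse.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Same estimate from NumPy's least squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_normal)
print(np.allclose(beta_normal, beta_lstsq))
```

Solving the linear system instead of explicitly inverting $X[K]^{T} X[K]$ is a common numerical choice; both routes give the same $\hat{\beta}$ when $X[K]$ has full column rank.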
Consequently, the estimate of the spline regression curve with knots $K$ is given by
$$\hat{g}(x) = X[K] \hat{\beta} = X[K] (X[K]^{T} X[K])^{-1} X[K]^{T} Y.$$
With $A[K] = X[K] (X[K]^{T} X[K])^{-1} X[K]^{T}$, a matrix that is a function of the knot points $K = (K_1, K_2, \ldots, K_r)^{T}$, the equation can be written in the following form.
$$\hat{g}(x) = A[K]\, Y$$
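The matrix $A[K]$ can be formed explicitly for small problems. The following sketch, under assumed toy data and knots, builds $A[K]$ and checks three standard facts about such a projection matrix: it is symmetric, it is idempotent, and its trace equals the number of columns of $X[K]$. The helper `design` and the sine-shaped response are illustrative assumptions.

```python
import numpy as np

def design(x, knots, m):
    # Hypothetical helper: X[K] with columns 1, x, ..., x^m, (x - K_k)_+^m.
    d = x[:, None] - np.asarray(knots, dtype=float)[None, :]
    return np.hstack([np.vander(x, m + 1, increasing=True),
                      np.where(d >= 0.0, d ** m, 0.0)])

x = np.linspace(0.0, 1.0, 30)
X = design(x, knots=[0.3, 0.6], m=1)

# A[K] = X (X'X)^{-1} X', computed through a linear solve.
A = X @ np.linalg.solve(X.T @ X, X.T)

# Toy response used only to show that A[K] Y reproduces X[K] beta_hat.
y = np.sin(2.0 * np.pi * x)
g_hat = A @ y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

print(np.allclose(A, A.T))             # symmetry
print(np.allclose(A @ A, A))           # idempotence
print(np.trace(A), X.shape[1])         # trace versus m + 1 + r
print(np.allclose(g_hat, X @ beta_hat))
```

Changing the `knots` argument changes $A[K]$ and therefore the fitted curve, which is one way to see the dependence of the estimator on the knot points.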
3. ESTIMATOR PROPERTIES OF THE NONPARAMETRIC REGRESSION TRUNCATED SPLINE
Further study is conducted regarding the properties of the truncated spline estimator in nonparametric regression.
a. Linear Estimator
This section examines the linearity of the spline estimator in nonparametric regression. With $K = (K_1, K_2, \ldots, K_r)^{T}$, the spline nonparametric regression model can be written in the following matrix form:
$$Y = X[K] \beta + \varepsilon$$
If $X[K] \beta$ is denoted by $g$, where $X[K]$ is a matrix that depends on $K$, we get:
$$Y = g + \varepsilon$$
and,
$$\hat{g} = X[K] \hat{\beta} = X[K] (X[K]^{T} X[K])^{-1} X[K]^{T} Y = A[K]\, Y$$
The above equation shows that the spline estimator $\hat{g}$ is linear in the observations $Y$, since for fixed knots the matrix $A[K]$ does not depend on $Y$. This linearity makes it easier to construct statistical inference for the spline approach.
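Linearity in the observations can be checked numerically: applying $A[K]$ to a linear combination of two response vectors gives the same linear combination of the two fitted curves. The sketch below uses assumed toy data, a hypothetical helper `design`, and arbitrary constants `a` and `b`. It also shows that each fitted value is a weighted sum of the observations whose weights (one row of $A[K]$) sum to one, which holds here because $X[K]$ contains a column of ones.

```python
import numpy as np

def design(x, knots, m):
    # Hypothetical helper: X[K] with columns 1, x, ..., x^m, (x - K_k)_+^m.
    d = x[:, None] - np.asarray(knots, dtype=float)[None, :]
    return np.hstack([np.vander(x, m + 1, increasing=True),
                      np.where(d >= 0.0, d ** m, 0.0)])

rng = np.random.default_rng(42)
x = np.linspace(0.0, 5.0, 40)
X = design(x, knots=[2.0, 3.5], m=1)
A = X @ np.linalg.solve(X.T @ X, X.T)

# Two arbitrary response vectors and two arbitrary constants.
y1 = rng.normal(size=x.size)
y2 = rng.normal(size=x.size)
a, b = 2.0, -3.0

# Fitting the combination equals combining the fits.
lhs = A @ (a * y1 + b * y2)
rhs = a * (A @ y1) + b * (A @ y2)
print(np.allclose(lhs, rhs))

# Row j of A[K] holds the weights that turn Y into the j-th fitted value.
row_sums = A.sum(axis=1)
print(np.allclose(row_sums, 1.0))
```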
b. Unbiased Estimator
In this section, the expected value of the spline estimator $\hat{g}$ is computed to determine whether the estimator is biased. We have
$$E(\hat{g}) = E(A[K] Y) = A[K] E(Y)$$
where,
$$E(Y) = E(X[K] \beta + \varepsilon) = E(X[K] \beta) + E(\varepsilon)$$
Here $\varepsilon$ is the vector of random errors $\varepsilon_j$, $j = 1, 2, \ldots, n$, which are mutually independent with zero mean, $\mu_\varepsilon = E(\varepsilon) = 0$, and variance $\sigma_\varepsilon^2 = \sigma^2$. Since $X[K] \beta$ is nonrandom, the following form is obtained,
$$E(Y) = E(X[K] \beta) = X[K] \beta$$
Therefore, the expectation of the estimator $\hat{g}$ is given by the following equation,
$$E(\hat{g}) = A[K] X[K] \beta = X[K] (X[K]^{T} X[K])^{-1} X[K]^{T} X[K] \beta = X[K] \beta = g$$
The result $E(\hat{g}) = g$ indicates that the estimator $\hat{g}$ is unbiased.
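The unbiasedness result rests on the identity $A[K] X[K] = X[K]$. The sketch below checks this identity and then approximates $E(\hat{g})$ by averaging the fitted curve over many simulated samples. The helper `design`, the coefficient vector, the noise level, and the number of replications are illustrative assumptions, and the Monte Carlo average is only approximately equal to $g$.

```python
import numpy as np

def design(x, knots, m):
    # Hypothetical helper: X[K] with columns 1, x, ..., x^m, (x - K_k)_+^m.
    d = x[:, None] - np.asarray(knots, dtype=float)[None, :]
    return np.hstack([np.vander(x, m + 1, increasing=True),
                      np.where(d >= 0.0, d ** m, 0.0)])

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 30)
X = design(x, knots=[0.3, 0.6], m=1)
A = X @ np.linalg.solve(X.T @ X, X.T)

# Algebraic step used in the derivation: A[K] X[K] = X[K].
print(np.allclose(A @ X, X))

# Monte Carlo approximation of E(g_hat) under the model Y = X beta + eps.
beta = np.array([1.0, -2.0, 3.0, 1.5])   # illustrative coefficients
sigma = 0.5                               # illustrative error std. dev.
g = X @ beta
n_rep = 2000
Y = g + rng.normal(scale=sigma, size=(n_rep, x.size))  # one sample per row
G_hat = Y @ A.T                                        # row i is A[K] Y_i
max_dev = np.max(np.abs(G_hat.mean(axis=0) - g))
print(max_dev)  # small, and shrinks as n_rep grows
```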
c. Normally Distributed Estimator
This section examines the distribution of the spline estimator in nonparametric regression. For statistical inference, the random error $\varepsilon$ is assumed to follow a normal distribution with zero mean, $E(\varepsilon) = \mu_\varepsilon = 0$, and covariance matrix $\sigma^2 I$, written as $N(0, \sigma^2 I)$. Since $g = X[K] \beta$ is nonrandom, the Moment Generating Function (MGF) of the vector $Y$ is given as follows.
$$M_Y(t) = M_{g + \varepsilon}(t) = M_g(t) \cdot M_\varepsilon(t) = E(\exp(t^{T} g)) \cdot M_\varepsilon(t) = \exp(t^{T} g) \cdot M_\varepsilon(t) = \exp(t^{T} X[K] \beta) \cdot M_\varepsilon(t)$$
Since the random error of the spline nonparametric regression model is normally distributed, the MGF of the vector $\varepsilon$ is given as follows.
$$M_\varepsilon(t) = E(\exp(t^{T} \varepsilon)) = \exp\left(t^{T} \mu_\varepsilon + \tfrac{1}{2} t^{T} \sigma^2 I t\right) = \exp\left(t^{T} (0) + \tfrac{1}{2} t^{T} \sigma^2 I t\right) = \exp\left(\tfrac{1}{2} t^{T} \sigma^2 I t\right)$$
So the MGF of the vector $Y$ is the following.
$$M_Y(t) = \exp(t^{T} X[K] \beta) \cdot \exp\left(\tfrac{1}{2} t^{T} \sigma^2 I t\right) = \exp\left(t^{T} X[K] \beta + \tfrac{1}{2} t^{T} \sigma^2 I t\right)$$
This is the MGF of the normal distribution with mean $X[K] \beta$ and covariance matrix $\sigma^2 I$. Hence the vector $Y$ of the spline nonparametric regression model follows a normal distribution with mean $X[K] \beta$ and covariance matrix $\sigma^2 I$.
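The next step in determining the properties of the spline estimator in nonparametric regression is to find the distribution of the estimator $\hat{\beta}$. Since $\hat{\beta} = (X[K]^{T} X[K])^{-1} X[K]^{T} Y$ is a linear transformation of $Y$, and $M_{CY}(t) = E(\exp(t^{T} C Y)) = M_Y(C^{T} t)$ for any fixed matrix $C$, the MGF of the estimator $\hat{\beta}$ is given as follows.
$$M_{\hat{\beta}}(t) = M_{(X[K]^{T} X[K])^{-1} X[K]^{T} Y}(t)$$
$$= M_Y\left(X[K] (X[K]^{T} X[K])^{-1} t\right)$$
$$= \exp\left\{ \left(X[K] (X[K]^{T} X[K])^{-1} t\right)^{T} X[K] \beta + \tfrac{1}{2} \left(X[K] (X[K]^{T} X[K])^{-1} t\right)^{T} \sigma^2 I \, X[K] (X[K]^{T} X[K])^{-1} t \right\}$$
$$= \exp\left\{ t^{T} (X[K]^{T} X[K])^{-1} X[K]^{T} X[K] \beta + \tfrac{1}{2} t^{T} (X[K]^{T} X[K])^{-1} X[K]^{T} \sigma^2 I \, X[K] (X[K]^{T} X[K])^{-1} t \right\}$$
$$= \exp\left\{ t^{T} \beta + \tfrac{1}{2} t^{T} \left[ \sigma^2 (X[K]^{T} X[K])^{-1} \right] t \right\}$$
Here the symmetry of $(X[K]^{T} X[K])^{-1}$ has been used. It is found that $M_{\hat{\beta}}(t)$ is the MGF of the normal distribution with mean and variance given by,
$$\text{Mean} = \beta$$
$$\text{Variance} = \sigma^2 (X[K]^{T} X[K])^{-1}$$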
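Furthermore, using the equation $\hat{g} = A[K] Y$, the distribution of the estimator $\hat{g}$ is determined. Because $A[K]$ is symmetric, the MGF gives the following results.
$$M_{\hat{g}}(t) = M_{A[K] Y}(t) = M_Y(A[K] t)$$
In the previous section, we obtained $M_Y(t) = \exp\left\{ t^{T} X[K] \beta + \tfrac{1}{2} t^{T} \sigma^2 I t \right\}$, so,
$$M_{\hat{g}}(t) = \exp\left\{ (A[K] t)^{T} X[K] \beta + \tfrac{1}{2} (A[K] t)^{T} \sigma^2 I A[K] t \right\}$$
$$= \exp\left\{ t^{T} (A[K])^{T} X[K] \beta + \tfrac{1}{2} t^{T} (A[K])^{T} \sigma^2 I A[K] t \right\}$$
$$= \exp\left\{ t^{T} \left[ (A[K])^{T} X[K] \beta \right] + \tfrac{1}{2} t^{T} \left[ (A[K])^{T} \sigma^2 A[K] \right] t \right\}$$
$M_{\hat{g}}(t)$ is the MGF of the normal distribution, with the mean (expected value) and variance given respectively by the following formulas.
$$\text{Mean} = (A[K])^{T} X[K] \beta = \left( X[K] (X[K]^{T} X[K])^{-1} X[K]^{T} \right)^{T} X[K] \beta = X[K] (X[K]^{T} X[K])^{-1} X[K]^{T} X[K] \beta = X[K] \beta$$
and,
$$\text{Variance} = \left( X[K] (X[K]^{T} X[K])^{-1} X[K]^{T} \right)^{T} \sigma^2 X[K] (X[K]^{T} X[K])^{-1} X[K]^{T}$$
$$= X[K] (X[K]^{T} X[K])^{-1} X[K]^{T} X[K] (X[K]^{T} X[K])^{-1} X[K]^{T} \sigma^2$$
$$= X[K] (X[K]^{T} X[K])^{-1} X[K]^{T} \sigma^2 = \sigma^2 A[K]$$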
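The variance of the fitted curve, $\sigma^2 A[K]$, can be checked in two ways: algebraically, by evaluating $A[K]\, \sigma^2 I\, A[K]^{T}$, and empirically, by comparing the sample covariance of simulated fitted curves with $\sigma^2 A[K]$. The sketch below does both under assumed values of the coefficients, knots, $\sigma$, and number of replications; the helper `design` is hypothetical, and the empirical comparison is only approximate.

```python
import numpy as np

def design(x, knots, m):
    # Hypothetical helper: X[K] with columns 1, x, ..., x^m, (x - K_k)_+^m.
    d = x[:, None] - np.asarray(knots, dtype=float)[None, :]
    return np.hstack([np.vander(x, m + 1, increasing=True),
                      np.where(d >= 0.0, d ** m, 0.0)])

rng = np.random.default_rng(7)
x = np.linspace(0.0, 1.0, 20)
X = design(x, knots=[0.4, 0.7], m=1)
A = X @ np.linalg.solve(X.T @ X, X.T)

beta = np.array([1.0, -2.0, 3.0, 1.5])   # illustrative coefficients
sigma = 0.5                               # illustrative error std. dev.

# Algebraic check: A (sigma^2 I) A' reduces to sigma^2 A.
exact_cov = A @ (sigma ** 2 * np.eye(x.size)) @ A.T
print(np.allclose(exact_cov, sigma ** 2 * A))

# Empirical check: covariance of g_hat over simulated normal samples.
n_rep = 20000
Y = X @ beta + rng.normal(scale=sigma, size=(n_rep, x.size))
G_hat = Y @ A.T                           # row i is A[K] Y_i
emp_cov = np.cov(G_hat, rowvar=False)
max_err = np.max(np.abs(emp_cov - sigma ** 2 * A))
print(max_err)  # small relative to sigma^2, shrinking as n_rep grows
```

The diagonal of $\sigma^2 A[K]$ gives the pointwise variance of the fitted curve, which is the quantity typically needed for pointwise confidence intervals.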
4. CONCLUSIONS
From the studies that have been done, it can be seen that the spline estimator in nonparametric regression is linear in the observations $Y = (y_1 \;\; y_2 \;\; \cdots \;\; y_n)^{T}$ and depends strongly on the knot points $K_1, K_2, \ldots, K_r$. The curve estimate for the truncated spline is $\hat{g}(x) = A[K] Y$, where $A[K]$ is a matrix function of the knot points $K = (K_1, K_2, \ldots, K_r)^{T}$. The estimator is linear and unbiased, and if the errors are normally distributed, the estimator is also normally distributed.
REFERENCES
Budiantara, I. N., Ratna, M., Zain, I., & Wibowo, W. 2012. Modeling the Percentage of Poor People in Indonesia Using Spline Nonparametric Regression Approach. International Journal of Basic & Applied Sciences IJBAS-IJENS, 12(06), 119–124.
Budiantara, I. N., Ratnasari, V., Ratna, M., & Zain, I. 2015. The Combination of Spline and Kernel Estimator for Nonparametric Regression and its Properties. Applied Mathematical Sciences, 9(122), 6083–6094.
Chamidah, N., Lestari, B., & Saifudin, T. 2019. Modeling of Blood Pressures Based on Stress Score using Least Square Spline Estimator in Bi-response Non-parametric Regression. International Journal of Innovation, Creativity and Change, 5(3), 1200–1216.
Fitriyani, N., Budiantara, I. N., Zain, I., & Ratnasari, V. 2016. Nonparametric Regression Spline in The Estimation of The Average Number of Children Born Alive Per Woman. The 1st International Conference on Science and Technology (ICST), 169–172.
Mahmoud, H. F. F. 2019. Parametric versus Semi and Nonparametric Regression Models. Preprint, 1–24. https://arxiv.org/abs/1906.10221
Murbarani, N., Swastika, Y., Dwi, A., Aris, B., & Chamidah, N. 2019. Modeling of the Percentage of AIDS Sufferers in East Java Province using Nonparametric Regression Approach based on Truncated Spline Estimator. Indonesian Journal of Statistics and Its Applications, 3(2), 139–147.
Nurcahayani, H., Budiantara, I. N., & Zain, I. 2019. Nonparametric Truncated Spline Regression on Modelling Mean Years Schooling of Regencies in Java. The 2nd International Conference on Science, Mathematics, Environment, and Education, AIP Conf. Proc. 2194, 1–8.
Wening, A. W., Budiantara, I. N., & Zain, I. 2020. Semiparametric Regression Curve Estimation for Longitudinal Data using Mixed Spline Truncated and Fourier Series Estimator. ICCGANT 2019, Journal of Physics: Conference Series, 1538, 1–10.