-main
(-main & args)
Fits a multiple linear regression model whose error terms follow
an unknown distribution, using OLS and resampling.
Saves the results to a set of CSV files
in the root execution directory.
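Arguments: path - path to the CSV file with the original sample;
n-rep - number of resampling replications (permutations and bootstrap samples).
The original sample is read from the CSV file at [path].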
The first row should contain variable labels.
The first column contains values of the response y.
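The input layout described above can be illustrated with a minimal Python sketch. The function name `parse_sample` is hypothetical and not part of this project. The sketch assumes every cell after the header row is numeric.

```python
import csv


def parse_sample(lines):
    """Split CSV text into (labels, y, rows).

    labels: variable labels from the first row.
    y:      response values taken from the first column.
    rows:   remaining columns, one list of explanatory values per observation.
    """
    reader = csv.reader(lines)
    labels = next(reader)            # the first row holds variable labels
    y, rows = [], []
    for record in reader:
        if not record:               # skip blank lines
            continue
        values = [float(v) for v in record]
        y.append(values[0])          # the first column is the response
        rows.append(values[1:])      # the rest are explanatory variables
    return labels, y, rows
```

To read from disk, pass an open file object: `with open(path, newline="") as f: labels, y, rows = parse_sample(f)`.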
Model:
y=Xb+eps,
eps ~ F(0,sigma^2).
y: [n x 1] vector of the response.
X: [n x (p+1)] matrix of the explanatory variables
(a column of ones for the intercept b_0 and the p variables).
b: [(p+1) x 1] vector of the regression coefficients.
eps: [n x 1] vector of the independent and
identically distributed errors with common distribution F
having mean 0 and finite variance sigma^2.
n: number of observations.
p: number of explanatory variables in the input file.
Assumptions: error terms are independent and identically distributed.
Regression coefficients are estimated using ordinary least squares (OLS).
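As an illustration of the OLS step, here is a minimal Python sketch that solves the normal equations (X'X)b = X'y and reports R-square and MSE. It is not the project's implementation. The helper names `solve` and `ols` are hypothetical. Taking MSE as SSE/(n-p-1) and placing the intercept column first are assumptions.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular (no collinear explanatory variables).
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                       # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, y):
    """Return (b, r2, mse) for y = Xb + eps.

    rows holds the p explanatory values of each observation;
    a column of ones is prepended for the intercept b_0.
    """
    X = [[1.0] + list(r) for r in rows]
    n, k = len(X), len(X[0])                           # k = p + 1
    xtx = [[sum(X[i][a] * X[i][c] for i in range(n)) for c in range(k)]
           for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    b = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    # Assumption: MSE uses the residual degrees of freedom n - p - 1.
    return b, 1.0 - sse / sst, sse / (n - k)
```

For example, `ols([[0.0], [1.0], [2.0], [3.0]], [1.1, 2.9, 5.2, 6.8])` fits one explanatory variable with an intercept.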
Hypothesis testing (permutation tests):
output: lu-regression/permutation_tests.csv
lu-regression/permutation_r2_sample.csv
1) Overall model significance - exact permutation test on R-square.
H0: b_1=b_2=...=b_p=0.
out: approximate p-value, calculated after [n-rep] permutations,
with a 95% normal-approximation confidence interval.
2) Significance of the i-th coefficient - approximate permutation
test (Freedman & Lane, 1983) on t-statistic.
H0: b_i=0.
out: approximate p-value, calculated after [n-rep] permutations,
with a 95% normal-approximation confidence interval.
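The resampling steps of the two tests can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's code, and all function names are hypothetical. The p-value is assumed to be the share of resampled statistics at least as large as the observed one, and the interval is assumed to be p ± 1.96*sqrt(p(1-p)/n-rep). Fitting the model and computing R-square or the t-statistic for each resampled response is left to the caller.

```python
import math
import random


def p_value_with_interval(observed, resampled_stats):
    """Approximate p-value with a 95% normal-approximation interval."""
    n_rep = len(resampled_stats)
    p = sum(1 for s in resampled_stats if s >= observed) / n_rep
    half = 1.96 * math.sqrt(p * (1.0 - p) / n_rep)
    return p, max(0.0, p - half), min(1.0, p + half)


def permuted_responses(y, n_rep, rng):
    """Test 1: under H0 all slopes are zero, so y itself is shuffled."""
    for _ in range(n_rep):
        yield rng.sample(y, len(y))


def freedman_lane_responses(fitted_reduced, resid_reduced, n_rep, rng):
    """Test 2 (Freedman & Lane): shuffle residuals of the reduced model.

    fitted_reduced and resid_reduced come from regressing y on every
    explanatory variable except the i-th one; each new response is the
    reduced-model fit plus a permuted residual.
    """
    for _ in range(n_rep):
        shuffled = rng.sample(resid_reduced, len(resid_reduced))
        yield [f + e for f, e in zip(fitted_reduced, shuffled)]
```

A caller would refit the full model on each generated response, collect the statistic, and pass the results to `p_value_with_interval`, for example with `rng = random.Random(42)`. For a two-sided test on a t-statistic, one common choice is to compare absolute values.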
Confidence intervals (bootstrapping):
output: lu-regression/regression-stat-bootstrap.csv
estimates:
b_0, b_1, ..., b_p;
R-square, MSE (mean squared error).
out: mean with 95% percentile confidence interval.
bootstrap scheme: percentile bootstrap (Efron & Tibshirani, 1993),
applied to the [n-rep] bootstrap estimates sorted in ascending order:
lower bound - value at the position given by the largest
integer not greater than alpha/2*[n-rep],
upper bound - value at the position given by the smallest
integer not less than (1-alpha/2)*[n-rep].
significance level: alpha=0.05 (confidence level 1-alpha=0.95).
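The percentile rule above can be sketched in Python as follows. Treating positions as 1-based within the sorted estimates is an assumption, and the function names are hypothetical. The sample mean stands in for a regression estimate to keep the example short; it is not how the project resamples.

```python
import math
import random


def percentile_interval(estimates, alpha=0.05):
    """Percentile bounds from bootstrap estimates.

    Lower position: largest integer not greater than alpha/2 * n_rep.
    Upper position: smallest integer not less than (1 - alpha/2) * n_rep.
    Positions are assumed 1-based and are clamped to [1, n_rep].
    """
    ordered = sorted(estimates)
    n_rep = len(ordered)
    lower_pos = max(1, math.floor(alpha / 2 * n_rep))
    upper_pos = min(n_rep, math.ceil((1 - alpha / 2) * n_rep))
    return ordered[lower_pos - 1], ordered[upper_pos - 1]


def bootstrap_mean_interval(sample, n_rep, alpha=0.05, seed=0):
    """Mean of the bootstrap estimates with a percentile interval.

    Illustrative stand-in: the bootstrapped statistic is the sample mean.
    """
    rng = random.Random(seed)
    n = len(sample)
    estimates = []
    for _ in range(n_rep):
        # Resample n observations with replacement.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        estimates.append(sum(resample) / n)
    lower, upper = percentile_interval(estimates, alpha)
    return sum(estimates) / n_rep, lower, upper
```

With 100 sorted estimates and alpha=0.05, the rule selects positions 2 and 98.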
## Usage
$ lein run sample-all.csv 10000
$ lein run sample-x1-x2-x3-x9-x10.csv 10000
$ lein run sample-x1-x2-x3.csv 10000
## References
[1] Anderson, M. (2001). Permutation tests for univariate or multivariate analysis of variance and regression.
Canadian Journal of Fisheries and Aquatic Sciences, 58(3): 626-639. DOI: 10.1139/f01-004.
[2] Freedman, D., & Lane, D. (1983). A Nonstochastic Interpretation of Reported Significance Levels.
Journal of Business & Economic Statistics, 1(4): 292-298. DOI: 10.2307/1391660.
[3] Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife.
Annals of Statistics, 7(1): 1-26. DOI: 10.1214/aos/1176344552.
[4] Efron, B., & Tibshirani, R. (1993). An Introduction to the Bootstrap.
New York: Chapman and Hall.