lu-regression.core

-main

(-main & args)
Fits a multiple linear regression model whose error terms have an
unknown distribution, using OLS and resampling.

Saves the results to several csv files
in the root execution directory.

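Arguments: path   - path to the input csv file.
           n-rep  - number of resampling replications
                    (permutations and bootstrap samples).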


The original sample is read from the csv file at [path].
The first row must contain the variable labels.
The first column contains the values of the response y;
the remaining columns contain the explanatory variables.
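
A minimal sketch of the input layout described above, written in Python for illustration only. The labels `y`, `x1`, `x2`, the numbers, and the helper name `read_sample` are made up and are not taken from the tool.

```python
import csv
import io

# Hypothetical file content: first row holds labels, first column holds y.
SAMPLE = """y,x1,x2
3.1,1.0,0.5
4.9,2.0,0.1
7.2,3.0,0.9
"""


def read_sample(text):
    """Split csv text into labels, the response y, and rows of explanatory variables."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    labels = rows[0]                                   # variable labels
    data = [[float(v) for v in row] for row in rows[1:]]
    y = [row[0] for row in data]                       # first column: response
    xs = [row[1:] for row in data]                     # remaining columns: X
    return labels, y, xs


labels, y, xs = read_sample(SAMPLE)
```

Here n is `len(y)` and p is `len(xs[0])`.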


Model:
       y=Xb+eps,
       eps ~ F(0,sigma^2).

       y: [n x 1] vector of the response.
       X: [n x (p+1)] matrix of the explanatory variables
          (including a column of ones for the intercept b_0).
       b: [(p+1) x 1] vector of the regression coefficients.
       eps: [n x 1] vector of independent and identically
            distributed errors with common distribution F
            having mean 0 and finite variance sigma^2.
       n: number of observations.
       p: number of explanatory variables in the input file.

Assumptions: error terms are independent and identically distributed.

Regression coefficients are estimated using ordinary least squares (OLS).
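
As an illustration, the OLS estimate solves the normal equations (X'X)b = X'y. The sketch below does this in plain Python with Gaussian elimination. The function names and the choice of solver are assumptions made for this example and need not match what the tool does internally.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                       # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(xs, y):
    """Return [b_0, ..., b_p] for rows xs (without intercept column) and response y."""
    design = [[1.0] + list(row) for row in xs]         # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def r_square(xs, y, b):
    """R-square = 1 - SS_res / SS_tot for coefficients b."""
    fitted = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in xs]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot
```

The sketch assumes X has full column rank. A production implementation would more likely use a QR or similar decomposition for numerical stability.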


Hypothesis testing (permutation tests):
   output: lu-regression/permutation_tests.csv
           lu-regression/permutation_r2_sample.csv

   1) Overall model significance - exact permutation test on R-square.
       H0: b_1=b_2=...=b_p=0.

       out: approximate p-value, estimated from [n-rep] random
            permutations, with a 95% normal-approximation
            confidence interval.
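
The test in 1) can be sketched as follows. For brevity the example uses a single explanatory variable, so R-square is the squared correlation. The p-value is taken as the share of permutations whose R-square is at least the observed one, and the interval is p +/- 1.96*sqrt(p(1-p)/n-rep). Both conventions, and all names, are assumptions of this sketch rather than details confirmed by the tool.

```python
import math
import random


def r_square(x, y):
    """R-square of the simple linear regression of y on x (squared correlation)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def permutation_p_value(x, y, n_rep, seed=0):
    """Monte Carlo permutation p-value with a 95% normal-approximation interval."""
    rng = random.Random(seed)
    observed = r_square(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_rep):
        rng.shuffle(y_perm)                 # under H0 the responses are exchangeable
        if r_square(x, y_perm) >= observed:
            hits += 1
    p = hits / n_rep
    half = 1.96 * math.sqrt(p * (1.0 - p) / n_rep)
    return p, max(0.0, p - half), min(1.0, p + half)
```

The interval shrinks at rate 1/sqrt(n-rep), which is why a large [n-rep] such as 10000 gives a tighter estimate of the p-value.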

   2) Significance of the i-th coefficient - approximate permutation
      test (Freedman & Lane, 1983) on t-statistic.
       H0: b_i=0.

       out: approximate p-value, estimated from [n-rep] random
            permutations, with a 95% normal-approximation
            confidence interval.
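
A sketch of the Freedman & Lane scheme in 2): fit the reduced model without the tested variable, permute its residuals, add them back to the reduced-model fitted values, and recompute the t-statistic of the full model on the resulting pseudo-response. To stay short, the example assumes one tested variable `x` and one nuisance variable `z`, and obtains the t-statistic by partialling out `z`. All names and the two-sided comparison on |t| are assumptions of this sketch.

```python
import math
import random


def residualize(v, z):
    """Return (fitted, residuals) of the simple regression of v on z with intercept."""
    n = len(v)
    mz, mv = sum(z) / n, sum(v) / n
    slope = (sum((a - mz) * (b - mv) for a, b in zip(z, v))
             / sum((a - mz) ** 2 for a in z))
    fitted = [mv + slope * (a - mz) for a in z]
    return fitted, [b - f for b, f in zip(v, fitted)]


def t_statistic(x, z, y):
    """t-statistic of the coefficient of x in y ~ 1 + x + z, via partialling out z."""
    _, rx = residualize(x, z)
    _, ry = residualize(y, z)
    sxx = sum(a * a for a in rx)
    b = sum(a * c for a, c in zip(rx, ry)) / sxx
    rss = sum((c - b * a) ** 2 for a, c in zip(rx, ry))
    dof = len(y) - 3                        # n minus (intercept, x, z)
    return b / math.sqrt(rss / dof / sxx)


def freedman_lane_p_value(x, z, y, n_rep, seed=0):
    """Approximate permutation p-value for H0: coefficient of x is 0."""
    rng = random.Random(seed)
    t_obs = abs(t_statistic(x, z, y))
    fitted, resid = residualize(y, z)       # reduced model: y ~ 1 + z
    hits = 0
    for _ in range(n_rep):
        rng.shuffle(resid)                  # permute reduced-model residuals
        y_star = [f + r for f, r in zip(fitted, resid)]
        if abs(t_statistic(x, z, y_star)) >= t_obs:
            hits += 1
    return hits / n_rep
```

With more than one nuisance variable the same steps apply, with the reduced and full models fitted by multiple regression.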


Confidence intervals (bootstrapping):
   output: lu-regression/regression-stat-bootstrap.csv

   estimates:
           b_0, b_1, ..., b_p;
           R-square, MSE (mean square error).
   out: mean with 95% percentile confidence interval.

   bootstrap scheme: percentile bootstrap (Efron & Tibshirani, 1993).
                     The [n-rep] bootstrap values are sorted;
                     left border - value at position
                     floor(alpha/2*[n-rep]), i.e. the largest
                     integer not greater than alpha/2*[n-rep],
                     right border - value at position
                     ceil((1-alpha/2)*[n-rep]), i.e. the smallest
                     integer not less than (1-alpha/2)*[n-rep].

   significance level: alpha=0.05 (confidence level 1-alpha=0.95).
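
The interval borders described above can be sketched as follows. Positions are treated as 1-based, and the replicates are produced by resampling whole rows with replacement. Both choices, like the function names, are assumptions of this example, since the text does not state them.

```python
import math
import random


def percentile_interval(replicates, alpha=0.05):
    """Percentile interval from sorted replicates, using 1-based positions."""
    ordered = sorted(replicates)
    b = len(ordered)
    left = max(1, math.floor(alpha / 2 * b))           # largest integer <= alpha/2*B
    right = min(b, math.ceil((1 - alpha / 2) * b))     # smallest integer >= (1-alpha/2)*B
    return ordered[left - 1], ordered[right - 1]


def bootstrap_mean_and_ci(rows, statistic, n_rep, alpha=0.05, seed=0):
    """Resample rows with replacement n_rep times; return mean and percentile interval."""
    rng = random.Random(seed)
    n = len(rows)
    replicates = [
        statistic([rows[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_rep)
    ]
    lo, hi = percentile_interval(replicates, alpha)
    return sum(replicates) / n_rep, lo, hi
```

For example, with [n-rep] = 10000 and alpha = 0.05 the borders are taken around positions 250 and 9750 of the sorted replicates. Here `statistic` stands for any of the estimates listed above (a coefficient, R-square, or MSE) computed on one resampled data set.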


## Usage

     $ lein run sample-all.csv 10000
     $ lein run sample-x1-x2-x3-x9-x10.csv 10000
     $ lein run sample-x1-x2-x3.csv 10000


## References
     [1] Anderson, M. (2001). Permutation tests for univariate or multivariate analysis of variance and regression.
         Canadian Journal of Fisheries and Aquatic Sciences, 58(3): 626-639. DOI: 10.1139/f01-004.
     [2] Freedman, D., & Lane, D. (1983). A Nonstochastic Interpretation of Reported Significance Levels.
         Journal of Business & Economic Statistics, 1(4): 292-298. DOI: 10.2307/1391660.
     [3] Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife.
         The Annals of Statistics, 7(1): 1-26. DOI: 10.1214/aos/1176344552.
     [4] Efron, B., & Tibshirani, R. (1993). An Introduction to the Bootstrap.
         New York: Chapman and Hall.