public class SurfaceGwr extends Object
A Note on the Suitability of This Implementation: Anyone who values his own time should respect the time of others. In that regard, I believe it appropriate to make this note about the current state of the Tinfour GWR implementation. While I believe that the code is implemented correctly, it is not complete. Statistics such as R values and F scores are not yet available. The Tinfour GWR classes also lack tools for detecting multicollinearity in model coefficients. These classes were developed with a specific application in mind: the modeling of terrain and bathymetry. While they can be applied to many other problems, potential users should consider whether the tool is suitable for their particular requirements.
Usage Notes: This class is optimized for computing values at specific points where a set of irregularly spaced sample points is available in the vicinity of the point of interest. Each value calculation involves a separate regression operation.
Regression techniques are used as a way of dealing with uncertainty in the observed values passed into the calculation. As such, they provide methods for evaluating the uncertainty in the computed results. In terrain-based applications, it is common to treat points nearest a query position as more significant than those farther away (in accordance with the precept of "spatial autocorrelation"). To support that, a criterion for inverse-distance weighting of the regression is provided based on the Gaussian kernel described in the references cited below.
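As an illustration of the weighting idea, the sketch below computes Gaussian-kernel weights for samples around a query point. It is not the Tinfour code: the class GaussianKernelSketch and its methods are hypothetical, and the kernel form exp(-d^2 / (2 * bandwidth^2)) is an assumption based on the common definition in the GWR literature. The handling of an infinite bandwidth follows the behavior this page describes for initWeightsUsingGaussianKernel.

```java
/** Sketch only: Gaussian-kernel weights for samples around a query point. */
final class GaussianKernelSketch {

    /**
     * Weight for a sample at the given distance from the query point.
     * Assumed kernel form: exp(-d^2 / (2 * bandwidth^2)).
     */
    static double weight(double distance, double bandwidth) {
        if (Double.isInfinite(bandwidth)) {
            // Uniform weights, equivalent to an ordinary least squares regression.
            return 1.0;
        }
        double ratio = distance / bandwidth;
        return Math.exp(-0.5 * ratio * ratio);
    }

    /**
     * Fills weights[i] for the first nSamples rows of samples.
     * Each row is assumed to start with the x and y coordinates.
     */
    static void fillWeights(double x, double y, double[][] samples, int nSamples,
                            double bandwidth, double[] weights) {
        for (int i = 0; i < nSamples; i++) {
            double dx = samples[i][0] - x;
            double dy = samples[i][1] - y;
            weights[i] = weight(Math.sqrt(dx * dx + dy * dy), bandwidth);
        }
    }
}
```

With this form, a sample at the query point receives a weight of 1.0, and the weight falls off smoothly as distance grows relative to the bandwidth.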
Given a set of sample points in the vicinity of the point of interest, (x,y), the class solves for the coefficients treating (x,y) as the origin. The resulting algebraic simplifications mean that the parameter b0 gives the surface height at point (x,y). Furthermore, the ordering of coefficients is specified for all models so that the coefficients b1 and b2 give the partial derivatives of the surface when evaluated at (x,y): b1 is the partial derivative with respect to x, and b2 is the partial derivative with respect to y. Applications requiring information about slope or surface normal may derive it from these two coefficients.
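The origin-shift idea can be sketched for the simplest case, a planar model z = b0 + b1*(x - xq) + b2*(y - yq). The code below is a minimal illustration under stated assumptions, not the Tinfour implementation: the class PlanarFitSketch and its methods are hypothetical, only the planar model is handled, and the 3x3 weighted normal equations are solved by plain Gauss-Jordan elimination.

```java
/** Sketch only: weighted least-squares plane fit with the query point as origin. */
final class PlanarFitSketch {

    /**
     * Fits z = b0 + b1*(x - xq) + b2*(y - yq) by weighted least squares.
     * samples[i] = {x, y, z}; weights[i] is the weight for sample i.
     * Returns {b0, b1, b2}: b0 is the height at (xq, yq), b1 = dz/dx, b2 = dz/dy.
     */
    static double[] fit(double xq, double yq, double[][] samples, double[] weights) {
        // Augmented normal equations: (X'WX | X'Wz).
        double[][] a = new double[3][4];
        for (int i = 0; i < samples.length; i++) {
            double[] row = {1.0, samples[i][0] - xq, samples[i][1] - yq};
            double w = weights[i];
            for (int r = 0; r < 3; r++) {
                for (int c = 0; c < 3; c++) {
                    a[r][c] += w * row[r] * row[c];
                }
                a[r][3] += w * row[r] * samples[i][2];
            }
        }
        // Gauss-Jordan elimination with partial pivoting.
        for (int col = 0; col < 3; col++) {
            int pivot = col;
            for (int r = col + 1; r < 3; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            if (Math.abs(a[col][col]) < 1e-12) {
                throw new ArithmeticException("Singular normal equations");
            }
            for (int r = 0; r < 3; r++) {
                if (r != col) {
                    double f = a[r][col] / a[col][col];
                    for (int c = col; c < 4; c++) {
                        a[r][c] -= f * a[col][c];
                    }
                }
            }
        }
        return new double[] {a[0][3] / a[0][0], a[1][3] / a[1][1], a[2][3] / a[2][2]};
    }

    /** Upward-pointing unit surface normal from the partial derivatives b1 and b2. */
    static double[] unitNormal(double b1, double b2) {
        double len = Math.sqrt(b1 * b1 + b2 * b2 + 1.0);
        return new double[] {-b1 / len, -b2 / len, 1.0 / len};
    }
}
```

Because the coordinates are shifted before the fit, no extra evaluation step is needed: the first coefficient is already the estimated height at the query point, and the slope magnitude is sqrt(b1^2 + b2^2).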
The calculations used to derive regression coefficients are adapted from "Probability and Statistics for Engineers and Scientists (4th ed.)", 1989, by Ronald E. Walpole and Raymond H. Myers, Macmillan Publishing Company, New York City, NY, Chapter 10, "Multiple Linear Regression". Walpole and Myers provide an excellent introduction to the problem of multiple linear regression, its general statistics (particularly the prediction interval), and their use. The calculations for weighted regression are not covered in their work, but were derived from the information they provided. Because these calculations are not taken from published literature, they have not been vetted by expert statisticians.
Details of the residual variance and other statistics specific to a weighted regression are taken from Leung, Yee; Mei, Chang-Lin; and Zhang, Wen-Xiu (2000). "Statistical tests for spatial nonstationarity based on the geographically weighted regression model", Environment and Planning A 2000, volume 32, pp. 9-32.
Information related to the AICc criterion as applied to a GWR was found in "Geographically Weighted Regression" by David C Wheeler and Antonio Paez, a white paper I found on the web. It appears to be a chapter from "Handbook of Applied Spatial Analysis: Software Tools, Methods, and Applications", Springer Verlag, Berlin (2010). I also found information in Charlton, M. and Fotheringham, A. (2009) "Geographically Weighted Regression -- White Paper", National Center for Geocomputation, National University of Ireland Maynooth, a white paper downloaded from the web. A number of other papers by Brunsdon, Fotheringham and Charlton (BRC), which provide slightly different perspectives on the same material, can be found on the web.
A Note on Safe Coding: This class maintains references to its most recent inputs as member elements. For efficiency purposes, it does not make copies of the input arrays, but uses them directly. Therefore, it is imperative that the calling application not modify these elements until it is done with the results from a computation. Also, some of the getter methods in the class expose internal arrays. The results obtained from these methods should not be modified by application code, and they are subject to modification by subsequent interpolation operations. While this approach violates well-known safe-coding practices, it is necessary in this case for efficiency reasons. Instances of this class are often used in raster-processing operations that require millions of interpolations in tight loops where the overhead of repeatedly creating arrays would be detrimental to processing.
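An application that needs to keep a result beyond the next computation can take its own copy. The sketch below illustrates the hazard and the remedy with a hypothetical stand-in class, ReusingCalculator, which reuses and exposes one internal array in the way this note describes. Neither class is part of Tinfour.

```java
/** Hypothetical stand-in for a class that reuses and exposes an internal array. */
final class ReusingCalculator {
    private final double[] result = new double[2];

    /** Overwrites the shared internal array on every call and returns it directly. */
    double[] compute(double a, double b) {
        result[0] = a + b;
        result[1] = a * b;
        return result;
    }
}

/** Sketch only: snapshot an exposed array before the next computation overwrites it. */
final class DefensiveCopySketch {
    static double[] snapshot(double[] exposed) {
        return exposed.clone();
    }
}
```

The copy is made only when a value must outlive the next call, so a tight raster-processing loop that consumes each result immediately pays no allocation cost.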
Development Notes
The current implementation of this class supports a family of surface
models based on polynomials p(x, y) of order 3 or less. While this approach
is appropriate for the original intent of this class, modeling terrain,
there is no reason why the class cannot be adapted to support arbitrary
models.
Originally, I felt that users interested in other problems might
be better served by R, GWR4, or even the Apache Commons Math
GLSMultipleLinearRegression class. But this implementation has
demonstrated sufficient utility that it may be worth
expanding its capabilities in future development.
One of the special considerations in terrain modeling is "mass production". Creating a raster grid from unstructured data can involve literally millions of interpolation operations. The design of this class reflects that requirement. In particular, it features the reuse of Java objects and arrays to avoid the cost of constructing or allocating new instances. However, recent improvements in Java's handling of short-lived objects (through escape analysis) have made some of these considerations less pressing. So future work may not be tied to the same approach as the existing implementation.
| Constructor and Description |
|---|
| SurfaceGwr() Standard constructor. |
| Modifier and Type | Method and Description |
|---|---|
| void | clear() Clear all state variables and external references that may have been set in previous operations. |
| double[] | computeRegression(SurfaceModel model, double xQuery, double yQuery, int nSamples, double[][] samples, double[] weights, double[][] sampleWeightsMatrix) Computes the elevation for a point at the specified query coordinates by performing a multiple-linear regression using the observed values. |
| void | computeVarianceAndHat() Compute the variance and hat matrix for the sample data. |
| org.apache.commons.math3.linear.RealMatrix | computeXWX(double xQuery, double yQuery, int nSamples, double[][] samples, double[] weights) |
| double | getAdjustedR2() Gets the adjusted R2 value. |
| double | getAICc() Get the Akaike information criterion (corrected) organized so that the minimum value is preferred. |
| double[] | getCoefficients() Gets the computed polynomial coefficients from the regression (the "beta" parameters). |
| int | getDegreesOfFreedom() Gets the number of degrees of freedom for the most recent computation based on an ordinary least squares treatment (weighting neglected). |
| double | getEffectiveDegreesOfFreedom() Get the effective degrees of freedom for a chi-squared distribution which approximates the distribution of the GWR. |
| double | getEstimatedValue(double xQuery, double yQuery) |
| double | getF() Gets the F statistic for the regression result which may be used in hypothesis testing for evaluating the regression. |
| org.apache.commons.math3.linear.RealMatrix | getHatMatrix() |
| double | getLeungDelta1() Get Leung's delta1 parameter. |
| double | getLeungDelta2() Get Leung's delta2 parameter. |
| int | getMinimumRequiredSamples(SurfaceModel sm) Get the minimum number of samples required to perform a regression for the specified surface model. |
| SurfaceModel | getModel() Get the surface model associated with this instance. |
| double[] | getPredictionInterval(double alpha) Gets the prediction interval at the interpolation coordinates on the observed response for the most recent call to computeRegression(). |
| double | getPredictionIntervalHalfRange(double alpha) Gets a value equal to one half of the range of the prediction interval on the observed response at the interpolation coordinates for the most recent call to computeRegression(). |
| double[] | getQueryCoordinates() Get the coordinates used for the initial query. |
| double | getR2() Get the R-squared value, the coefficient of multiple determination. |
| double[] | getResiduals() Gets the residuals from the most recent regression calculation. |
| double | getResidualSumOfTheSquares() Gets the residual sum of the squared errors (residuals) for the predicted versus the observed values at the sample locations. |
| int | getSampleCount() Gets the number of samples from the most recent computation. |
| double[][] | getSamples() Gets the samples from the most recent computation. |
| double | getSigmaML() Gets the ML Sigma value used in the AICc calculation. |
| double | getStandardDeviation() Gets an unbiased estimate of the standard deviation of the residuals for the predicted values for all samples. |
| double | getVariance() Gets an unbiased estimate of the variance of the residuals for the predicted values for all samples. |
| double[] | getWeights() Gets an array of weights from the most recent computation. |
| void | initWeightsMatrixUsingGaussianKernel(double[][] samples, int nSamples, double bandwidth, double[][] matrix) Initializes a square matrix of weights based on the distance between samples using the Gaussian kernel. |
| void | initWeightsUsingGaussianKernel(double x, double y, double[][] samples, int nSamples, double bandwidth, double[] weights) Initializes an array of weights based on the distance of samples from a specified pair of coordinates by using the Gaussian kernel. |
| boolean | isSampleWeightsMatrixSet() Indicates whether a sample weights matrix was set. |
| void | printSummary(PrintStream ps) Print a summary of the parameters and correlation results for the most recent interpolation. |
| void | setSampleWeightsMatrix(double[][] sampleWeightsMatrix) Allows an application to set the sample weights matrix. |
| String | toString() |
public double[] computeRegression(SurfaceModel model, double xQuery, double yQuery, int nSamples, double[][] samples, double[] weights, double[][] sampleWeightsMatrix)
Note: For efficiency purposes, the arrays for samples and weights passed to this method are stored in the class directly.
The sample weights matrix is a two-dimensional array giving weights based on the distance between samples. It is used when performing calculations for general statistics such as standard deviation, confidence intervals, etc. Because of the high cost of initializing this array, it can be treated as optional in cases where only the regression coefficients are required.
A convenience routine for populating the sample weights matrix is supplied by the initWeightsMatrixUsingGaussianKernel method.
model - the model to be used for the regression
xQuery - x coordinate of the query position
yQuery - y coordinate of the query position
nSamples - the number of sample points to be used for regression
samples - an array of dimension [n][3] giving at least nSamples points with the x, y, and z values for the regression.
weights - an array of weighting factors for samples
sampleWeightsMatrix - an optional array of weights based on the distances between different samples; if general statistics are not required, pass a null value for this argument.
public org.apache.commons.math3.linear.RealMatrix computeXWX(double xQuery, double yQuery, int nSamples, double[][] samples, double[] weights)
public void computeVarianceAndHat()
org.apache.commons.math3.linear.SingularMatrixException - if the data gives rise to an unsolvable or numerically ill-conditioned matrix.
public void printSummary(PrintStream ps)
ps - a valid print stream to receive the output of this method.
public double[] getCoefficients()
public double getR2()
public double getAdjustedR2()
public double getF()
public double getVariance()
public double getStandardDeviation()
public double getSigmaML()
public double getResidualSumOfTheSquares()
public double getLeungDelta1()
public double getLeungDelta2()
public double getEffectiveDegreesOfFreedom()
The definition of this method is based on Leung (2000).
public double getPredictionIntervalHalfRange(double alpha)
alpha - the significance level (typically 0.05, etc.).
public double[] getPredictionInterval(double alpha)
alpha - the significance level (typically 0.05, etc.).
public int getDegreesOfFreedom()
public org.apache.commons.math3.linear.RealMatrix getHatMatrix()
public final int getMinimumRequiredSamples(SurfaceModel sm)
sm - the surface model to be evaluated
public double[] getQueryCoordinates()
public SurfaceModel getModel()
public double getAICc()
public double getEstimatedValue(double xQuery, double yQuery)
public void clear()
public double[] getResiduals()
public double[][] getSamples()
public double[] getWeights()
public int getSampleCount()
public void initWeightsUsingGaussianKernel(double x, double y, double[][] samples, int nSamples, double bandwidth, double[] weights)
If Double.POSITIVE_INFINITY is passed as the bandwidth parameter, all weights will be set uniformly to 1.0, which is equivalent to an Ordinary Least Squares regression.
x - the x coordinate of the query point
y - the y coordinate of the query point
samples - a two-dimensional array giving (x,y) coordinates of the samples
nSamples - the number of samples
bandwidth - the bandwidth parameter
weights - an array to store the resulting weights
public void initWeightsMatrixUsingGaussianKernel(double[][] samples, int nSamples, double bandwidth, double[][] matrix)
samples - a two-dimensional array giving (x,y) coordinates of the samples
nSamples - the number of samples
bandwidth - the bandwidth parameter specification
matrix - a square matrix of dimension nSamples to store the computed weights.
public boolean isSampleWeightsMatrixSet()
public void setSampleWeightsMatrix(double[][] sampleWeightsMatrix)
sampleWeightsMatrix - a valid two-dimensional array dimensioned to the same size as the number of samples.
Copyright © 2019. All rights reserved.