Title: | Functional Conditional Independence Testing with the GHCM |
---|---|
Description: | A statistical hypothesis test for conditional independence. Given residuals from a sufficiently powerful regression, it tests whether the covariance of the residuals is vanishing. It can be applied to both discretely-observed functional data and multivariate data. Details of the method can be found in Anton Rask Lundborg, Rajen D. Shah and Jonas Peters (2022) <doi:10.1111/rssb.12544>. |
Authors: | Anton Rask Lundborg [aut, cre], Rajen D. Shah [aut], Jonas Peters [aut] |
Maintainer: | Anton Rask Lundborg <[email protected]> |
License: | MIT + file LICENSE |
Version: | 3.0.1 |
Built: | 2025-03-07 05:06:31 UTC |
Source: | https://github.com/arlundborg/ghcm |
To learn more about ghcm, start with the vignette: 'browseVignettes(package = "ghcm")'
A simulated dataset containing a combination of functional and scalar variables. Y_1 and Y_2 are scalar random variables and are both functions of Z. X, Z and W are functional, Z is a function of X and W is a function of Z.
ghcm_sim_data ghcm_sim_data_irregular
ghcm_sim_data ghcm_sim_data_irregular
ghcm_sim_data
is a data frame with 500 rows of 5 variables:
Numeric vector.
Numeric vector.
500 x 101 matrix.
500 x 101 matrix.
500 x 101 matrix.
ghcm_sim_data_irregular
is a list with 5 elements:
Numeric vector.
Numeric vector.
500 x 101 matrix.
A data frame with
Integer between 1 and 500 indicating which curve the row corresponds to.
Function argument that the curve is evaluated at.
Value of the function.
A data frame with
Integer between 1 and 500 indicating which curve the row corresponds to.
Function argument that the curve is evaluated at.
Value of the function.
In ghcm_sim_data
the functional variables each consists of 101 observations on
an equidistant grid on [0, 1].
In ghcm_sim_data_irregular
the functional variables X and W are instead only observed on
a subsample of the original equidistant grid.
The generation script can be found in the data-raw
folder of
the package.
Test whether X is independent of Y given Z using the Generalised Hilbertian Covariance Measure. The function is applied to residuals from regressing each of X and Y on Z respectively. Its validity is contingent on the performance of the regression methods. For a more in-depth explanation see the package vignette or the paper mentioned in the references.
ghcm_test( resid_X_on_Z, resid_Y_on_Z, X_limits = NULL, Y_limits = NULL, alpha = 0.05 )
ghcm_test( resid_X_on_Z, resid_Y_on_Z, X_limits = NULL, Y_limits = NULL, alpha = 0.05 )
resid_X_on_Z , resid_Y_on_Z
|
Residuals from regressing X (Y) on Z with a suitable regression method. If X (Y) is uni- or multivariate or functional on a constant, fixed grid, the residuals should be supplied as a vector or matrix with no missing values. If instead X (Y) is functional and observed on varying grids or with missing values, the residuals should be supplied as a "melted" data frame with
Note that in the irregular case, a minimum of 4 observations per curve is required. |
X_limits , Y_limits
|
The minimum and maximum values of the function argument of the X (Y) curves. Ignored if X (Y) is not functional. |
alpha |
Numeric in the unit interval. Significance level of the test. |
An object of class ghcm
containing:
test_statistic
Numeric, test statistic of the test.
p
Numeric in the unit interval, estimated p-value of the test.
alpha
Numeric in the unit interval, significance level of the test.
reject
TRUE
if p
< alpha
, FALSE
otherwise.
Please cite the following paper: Anton Rask Lundborg, Rajen D. Shah and Jonas Peters: "Conditional Independence Testing in Hilbert Spaces with Applications to Functional Data Analysis" Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2022 doi:10.1111/rssb.12544.
if (require(refund)) { set.seed(1) data(ghcm_sim_data) grid <- seq(0, 1, length.out = 101) # Test independence of two scalars given a functional variable m_1 <- pfr(Y_1 ~ lf(Z), data=ghcm_sim_data) m_2 <- pfr(Y_2 ~ lf(Z), data=ghcm_sim_data) ghcm_test(resid(m_1), resid(m_2)) # Test independence of a regularly observed functional variable and a # scalar variable given a functional variable m_X <- pffr(X ~ ff(Z), data=ghcm_sim_data, chunk.size=31000) ghcm_test(resid(m_X), resid(m_1)) # Test independence of two regularly observed functional variables given # a functional variable m_W <- pffr(W ~ ff(Z), data=ghcm_sim_data, chunk.size=31000) ghcm_test(resid(m_X), resid(m_W)) data(ghcm_sim_data_irregular) n <- length(ghcm_sim_data_irregular$Y_1) Z_df <- data.frame(.obs=1:n) Z_df$Z <- ghcm_sim_data_irregular$Z # Test independence of an irregularly observed functional variable and a # scalar variable given a functional variable m_1 <- pfr(Y_1 ~ lf(Z), data=ghcm_sim_data_irregular) m_X <- pffr(X ~ ff(Z), ydata = ghcm_sim_data_irregular$X, data=Z_df, chunk.size=31000) ghcm_test(resid(m_X), resid(m_1), X_limits=c(0, 1)) # Test independence of two irregularly observed functional variables given # a functional variable m_W <- pffr(W ~ ff(Z), ydata = ghcm_sim_data_irregular$W, data=Z_df, chunk.size=31000) ghcm_test(resid(m_X), resid(m_W), X_limits=c(0, 1), Y_limits=c(0, 1)) }
if (require(refund)) { set.seed(1) data(ghcm_sim_data) grid <- seq(0, 1, length.out = 101) # Test independence of two scalars given a functional variable m_1 <- pfr(Y_1 ~ lf(Z), data=ghcm_sim_data) m_2 <- pfr(Y_2 ~ lf(Z), data=ghcm_sim_data) ghcm_test(resid(m_1), resid(m_2)) # Test independence of a regularly observed functional variable and a # scalar variable given a functional variable m_X <- pffr(X ~ ff(Z), data=ghcm_sim_data, chunk.size=31000) ghcm_test(resid(m_X), resid(m_1)) # Test independence of two regularly observed functional variables given # a functional variable m_W <- pffr(W ~ ff(Z), data=ghcm_sim_data, chunk.size=31000) ghcm_test(resid(m_X), resid(m_W)) data(ghcm_sim_data_irregular) n <- length(ghcm_sim_data_irregular$Y_1) Z_df <- data.frame(.obs=1:n) Z_df$Z <- ghcm_sim_data_irregular$Z # Test independence of an irregularly observed functional variable and a # scalar variable given a functional variable m_1 <- pfr(Y_1 ~ lf(Z), data=ghcm_sim_data_irregular) m_X <- pffr(X ~ ff(Z), ydata = ghcm_sim_data_irregular$X, data=Z_df, chunk.size=31000) ghcm_test(resid(m_X), resid(m_1), X_limits=c(0, 1)) # Test independence of two irregularly observed functional variables given # a functional variable m_W <- pffr(W ~ ff(Z), ydata = ghcm_sim_data_irregular$W, data=Z_df, chunk.size=31000) ghcm_test(resid(m_X), resid(m_W), X_limits=c(0, 1), Y_limits=c(0, 1)) }
Computes the matrix of L2 inner products of the splines given in list_of_splines as produced by splines::interpSpline. The splines are assumed to be functions on the interval [from, to].
inner_product_matrix_splines(list_of_splines, from, to)
inner_product_matrix_splines(list_of_splines, from, to)
list_of_splines |
list of interpSpline objects. |
from , to
|
limits of integration. |
matrix of inner products.