A helper function that exposes the adjacency matrix A, normalized
graph Laplacian L, and regularized graph Laplacian L_tau to
model formulas for convenient network regression. Primarily designed
to work with tidygraph::tbl_graph() objects, but can also be used
with a matrix representation of a graph together with a data.frame()
of nodal covariates.
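One way to picture "exposing a matrix to a model formula" is the base R sketch below. It is an illustration under stated assumptions, not the package's implementation: the toy matrix `A`, the `nodes` data frame, and the stand-in `U()` (taken here to be the first `k` left singular vectors) are all hypothetical, and the package's own embedding specials may scale or rotate differently.

```r
# Hypothetical stand-in for an embedding special: the first k left singular
# vectors of M. The package's own U() may differ (e.g. in scaling or rotation).
U <- function(M, k) svd(M, nu = k, nv = 0)$u

set.seed(1)
n <- 30

# Toy symmetric 0/1 adjacency matrix with an empty diagonal
A <- matrix(rbinom(n * n, 1, 0.2), n, n)
A[lower.tri(A)] <- t(A)[lower.tri(A)]
diag(A) <- 0

# Toy nodal covariates, one row per node
nodes <- data.frame(x = rnorm(n))
nodes$y <- 1 + 2 * nodes$x + rnorm(n)

# Place A and U in the formula's environment so lm() can find them
f <- y ~ x + U(A, 2)
env <- new.env(parent = environment(f))
env$A <- A
env$U <- U
environment(f) <- env

fit <- stats::lm(f, data = nodes)

# Matrix-valued terms get one coefficient per column, suffixed 1, 2, ...
names(coef(fit))
```

Terms found in `data` are used first; anything else, such as `A` and `U` here, is looked up in the environment attached to the formula.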
Arguments
- formula
A regression formula that can include ase_specials and vsp_specials, which encode node embeddings. Data for non-embedding terms can come from the global environment, from data, or from named attributes of an igraph object. It is likely most convenient and intuitive to put nodal covariates in the nodes table of a tidygraph::tbl_graph() object to expose nodal data. See reddit, addhealth and smoking for examples.
- graph
An optional igraph::graph() or tidygraph::tbl_graph() object. If specified, the graph adjacency matrix A, normalized graph Laplacian L, and regularized graph Laplacian L_tau are injected into the environment of formula, so these matrices may be used freely in formula. See igraph::as_adjacency_matrix() for details about the construction of A, and invertiforms::NormalizedLaplacian() and invertiforms::RegularizedLaplacian() for details about the construction of L and L_tau. Note that you can also use node embeddings based on arbitrary matrix representations of a graph; see the examples.
- data
A data.frame() with one row for each node in the graph.
- attr
Either NULL or a character string giving an edge attribute name. If NULL, a traditional adjacency matrix is returned. If not NULL, then the values of the given edge attribute are included in the adjacency matrix. If the graph has multiple edges, the edge attribute of an arbitrarily chosen edge (for the multiple edges) is included. This argument is ignored if edges is TRUE.
Note that this works only for certain attribute types. If the sparse argument is TRUE, then the attribute must be either logical or numeric. If the sparse argument is FALSE, then character is also allowed. The reason for the difference is that the Matrix package does not support character sparse matrices yet.
- ...
Arguments passed on to stats::lm
  - subset
    an optional vector specifying a subset of observations to be used in the fitting process. (See additional details about how this argument interacts with data-dependent bases in the ‘Details’ section of the model.frame documentation.)
  - weights
    an optional vector of weights to be used in the fitting process. Should be NULL or a numeric vector. If non-NULL, weighted least squares is used with weights weights (that is, minimizing sum(w*e^2)); otherwise ordinary least squares is used. See also ‘Details’.
  - na.action
    a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The ‘factory-fresh’ default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful.
  - method
    the method to be used; for fitting, currently only method = "qr" is supported; method = "model.frame" returns the model frame (the same as with model = TRUE, see below).
  - model, x, y, qr
    logicals. If TRUE the corresponding components of the fit (the model frame, the model matrix, the response, the QR decomposition) are returned.
  - singular.ok
    logical. If FALSE (the default in S but not in R) a singular fit is an error.
  - contrasts
    an optional list. See the contrasts.arg of model.matrix.default.
  - offset
    this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector or matrix of extents matching those of the response. One or more offset terms can be included in the formula instead or as well, and if more than one are specified their sum is used. See model.offset.
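As a rough illustration of the three matrices named under `graph`, the sketch below builds them in base R for a tiny undirected graph. It assumes the normalized Laplacian is D^{-1/2} A D^{-1/2} and that the regularized version adds a constant `tau` (taken here to be the mean degree) to every degree before normalizing. Those definitions and the choice of `tau` are assumptions made for this sketch; the invertiforms documentation is the authority on what the package actually computes.

```r
# Toy undirected graph on 4 nodes with edges 1-2, 1-3, 2-3, 3-4
A <- matrix(
  c(0, 1, 1, 0,
    1, 0, 1, 0,
    1, 1, 0, 1,
    0, 0, 1, 0),
  nrow = 4, byrow = TRUE
)

deg <- rowSums(A)  # node degrees (row and column degrees agree: A is symmetric)

# Assumed form of the normalized graph Laplacian: D^{-1/2} A D^{-1/2}
L <- diag(1 / sqrt(deg)) %*% A %*% diag(1 / sqrt(deg))

# Assumed form of the regularized graph Laplacian: inflate every degree by tau
tau <- mean(deg)   # assumption: tau set to the mean degree
L_tau <- diag(1 / sqrt(deg + tau)) %*% A %*% diag(1 / sqrt(deg + tau))

round(L, 3)
round(L_tau, 3)
```

Under these assumptions, regularization shrinks every nonzero entry, and it shrinks entries attached to low-degree nodes the most.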
Value
An object of class lm. See stats::lm() for
details.
Examples
data(addhealth, package = "latentnetmediate")
data(smoking, package = "latentnetmediate")
### some examples where data is specified as a tidygraph
# a regression that does not use any node embeddings
nodelm(grade ~ sex, graph = addhealth[[36]])
#>
#> Call:
#> stats::lm(formula = formula, data = data)
#>
#> Coefficients:
#> (Intercept)      sexmale
#>     9.79575     -0.04575
#>
# a regression including left and right singular embeddings of
# the adjacency matrix and the normalized graph Laplacian
nodelm(grade ~ sex + U(A, 5) + V(L, 3), graph = addhealth[[36]])
#>
#> Call:
#> stats::lm(formula = formula, data = data)
#>
#> Coefficients:
#> (Intercept)      sexmale     U(A, 5)1     U(A, 5)2     U(A, 5)3     U(A, 5)4
#>     9.87934     -0.03804     -2.27030      0.33828    -14.13248      1.85827
#>    U(A, 5)5     V(L, 3)1     V(L, 3)2     V(L, 3)3
#>     5.56796      0.52628      0.85442     47.54782
#>
nodelm(as.integer(smokes) ~ sex + U(A, 5), graph = smoking)
#>
#> Call:
#> stats::lm(formula = formula, data = data)
#>
#> Coefficients:
#> (Intercept)      sexmale     U(A, 5)1     U(A, 5)2     U(A, 5)3     U(A, 5)4
#>     0.96025      0.40651      1.57252      0.64105     -0.17947     -0.09439
#>    U(A, 5)5
#>     1.10853
#>
library(Matrix)
library(tidygraph)
B <- igraph::as_adjacency_matrix(addhealth[[36]], attr = "weight")
node <- addhealth[[36]] |>
  as_tibble() |>
  mutate(level = rowSums(B))
node[5, "sex"] <- NA
node
#> # A tibble: 2,209 × 5
#>    sex    race     grade school level
#>    <fct>  <fct>    <int> <fct>  <dbl>
#>  1 male   hispanic    10 B          5
#>  2 male   hispanic    10 B          0
#>  3 female hispanic     9 B          0
#>  4 male   hispanic    10 B         19
#>  5 NA     hispanic     9 B         16
#>  6 female hispanic    11 B          0
#>  7 female hispanic     9 B          7
#>  8 male   white        9 B         19
#>  9 male   white       11 B          8
#> 10 female white       10 B          6
#> # ℹ 2,199 more rows
fit <- nodelm(level ~ sex + grade + race + U(sign(B), 10), data = node)
summary(fit)
#>
#> Call:
#> stats::lm(formula = formula, data = data)
#>
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -29.221  -6.024  -1.228   4.687  34.829
#>
#> Coefficients:
#>                    Estimate Std. Error t value Pr(>|t|)
#> (Intercept)         3.70846    1.62265   2.285 0.022385 *
#> sexmale            -2.87030    0.33382  -8.598  < 2e-16 ***
#> grade               0.48221    0.14445   3.338 0.000858 ***
#> raceblack           0.09635    0.94038   0.102 0.918404
#> racehispanic        1.66458    0.79837   2.085 0.037191 *
#> racemixed/other     2.48689    1.11826   2.224 0.026260 *
#> racewhite           1.43935    0.84573   1.702 0.088919 .
#> U(sign(B), 10)1   194.77884    8.38885  23.219  < 2e-16 ***
#> U(sign(B), 10)2   119.71510    8.71828  13.732  < 2e-16 ***
#> U(sign(B), 10)3   185.53943    8.51792  21.782  < 2e-16 ***
#> U(sign(B), 10)4   -89.70215    7.91825 -11.329  < 2e-16 ***
#> U(sign(B), 10)5   -76.51199    9.25422  -8.268 2.37e-16 ***
#> U(sign(B), 10)6  -155.05165    8.64150 -17.943  < 2e-16 ***
#> U(sign(B), 10)7    19.97444    8.13131   2.456 0.014110 *
#> U(sign(B), 10)8    89.03533    8.31167  10.712  < 2e-16 ***
#> U(sign(B), 10)9    -8.72362    7.86190  -1.110 0.267294
#> U(sign(B), 10)10  -28.35095    8.06069  -3.517 0.000445 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 7.711 on 2132 degrees of freedom
#> (60 observations deleted due to missingness)
#> Multiple R-squared: 0.4453, Adjusted R-squared: 0.4411
#> F-statistic: 106.9 on 16 and 2132 DF, p-value: < 2.2e-16
#>
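The summary above reports observations deleted due to missingness, and the `na.action` argument is passed through to `stats::lm`. The base R sketch below, on hypothetical toy data rather than the package's datasets, shows one reason `na.exclude` can be useful with nodal data: residuals stay aligned with the original rows, so they can be matched back to nodes.

```r
set.seed(2)

# Toy nodal data with one missing covariate value (hypothetical, 10 "nodes")
nodes <- data.frame(x = rnorm(10))
nodes$y <- 1 + nodes$x + rnorm(10)
nodes$x[5] <- NA

# na.omit drops the incomplete row entirely
fit_omit <- lm(y ~ x, data = nodes, na.action = na.omit)

# na.exclude fits the same model but pads residuals and fitted values with NA
fit_excl <- lm(y ~ x, data = nodes, na.action = na.exclude)

length(residuals(fit_omit))  # one fewer than the number of rows
length(residuals(fit_excl))  # same as the number of rows
which(is.na(residuals(fit_excl)))  # position of the excluded row
```

Both fits give identical coefficients; only the shape of the extracted residuals and fitted values differs.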