A helper function that exposes the adjacency matrix A, normalized
graph Laplacian L, and regularized graph Laplacian L_tau to
model formulas for convenient network regression. Primarily designed
to work with tidygraph::tbl_graph() objects, but can also be used
with a matrix representation of a graph together with a data.frame()
of nodal covariates.
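The matrices L and L_tau are built from A by degree normalization. The base R sketch below shows one common way to do this for an undirected graph. It assumes the definitions L = D^{-1/2} A D^{-1/2} and L_tau = D_tau^{-1/2} A D_tau^{-1/2}, with D_tau = D + tau * I and tau defaulting to the mean degree. invertiforms::NormalizedLaplacian() and invertiforms::RegularizedLaplacian() may differ in details, so treat normalized_laplacian() and regularized_laplacian() as hypothetical helpers written only for illustration.

```r
# Minimal sketch, ASSUMING these definitions (invertiforms may differ, e.g. in
# how it regularizes rows and columns of directed graphs):
#   L     = D^{-1/2} A D^{-1/2}            D     = diag(degrees)
#   L_tau = D_tau^{-1/2} A D_tau^{-1/2}    D_tau = D + tau * I
normalized_laplacian <- function(A) {
  deg <- rowSums(A)
  # Isolated nodes (degree 0) keep all-zero rows and columns.
  inv_sqrt <- ifelse(deg > 0, 1 / sqrt(deg), 0)
  S <- diag(inv_sqrt, nrow = nrow(A))
  S %*% A %*% S
}

# tau defaults to the mean degree here, an assumption of this sketch.
regularized_laplacian <- function(A, tau = mean(rowSums(A))) {
  S <- diag(1 / sqrt(rowSums(A) + tau), nrow = nrow(A))
  S %*% A %*% S
}

# Path graph 1 - 2 - 3: degrees are 1, 2, 1 and the mean degree is 4/3.
A <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0), nrow = 3, byrow = TRUE)
L <- normalized_laplacian(A)        # L[1, 2] = 1 / sqrt(1 * 2)
L_tau <- regularized_laplacian(A)   # L_tau[1, 2] = 1 / sqrt((7/3) * (10/3))
```

Regularization adds tau to every degree, which shrinks the entries attached to low-degree nodes and keeps isolated nodes from producing infinite weights.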
Arguments
- formula
A regression formula that can include ase_specials and vsp_specials, which encode node embeddings. Data for non-embedding terms can come from the global environment, from data, or from named attributes of an igraph object. It is likely most convenient and intuitive to put nodal covariates in the nodes table of a tidygraph::tbl_graph() object. See reddit, addhealth and smoking for examples.
- graph
An optional igraph::graph() or tidygraph::tbl_graph() object. If specified, the graph adjacency matrix A, normalized graph Laplacian L, and regularized graph Laplacian L_tau are injected into the environment of formula, so these matrices may be used freely in formula. See igraph::as_adjacency_matrix() for details about the construction of A, and invertiforms::NormalizedLaplacian() and invertiforms::RegularizedLaplacian() for details about the construction of L and L_tau. Note that you can also use node embeddings based on arbitrary matrix representations of a graph; see the examples.
- data
A data.frame() with one row for each node in the graph.
- attr
Either NULL or a character string giving an edge attribute name. If NULL, a traditional adjacency matrix is returned. If not NULL, the values of the given edge attribute are included in the adjacency matrix. If the graph has multiple edges, the edge attribute of an arbitrarily chosen edge (for the multiple edges) is included. This argument is ignored if edges is TRUE. Note that edges and sparse refer to arguments of igraph::as_adjacency_matrix(), from which this description is inherited.
Note that this works only for certain attribute types. If the sparse argument is TRUE, then the attribute must be either logical or numeric. If the sparse argument is FALSE, then character is also allowed. The reason for the difference is that the Matrix package does not yet support character sparse matrices.
- ...
Arguments passed on to estimatr::lm_robust():
  - weights
    The bare (unquoted) name of the weights variable in the supplied data.
  - subset
    An optional bare (unquoted) expression specifying a subset of observations to be used.
  - clusters
    An optional bare (unquoted) name of the variable that corresponds to the clusters in the data.
  - fixed_effects
    An optional right-sided formula containing the fixed effects that will be projected out of the data, such as ~ blockID. Do not pass multiple fixed effects with intersecting groups. Speed gains are greatest for variables with large numbers of groups and when using "HC1" or "stata" standard errors. See 'Details' in estimatr::lm_robust().
  - se_type
    The sort of standard error sought. If clusters is not specified, the options are "HC0", "HC1" (or "stata", the equivalent), "HC2" (default), "HC3", or "classical". If clusters is specified, the options are "CR0", "CR2" (default), or "stata". Can also specify "none", which may speed up estimation of the coefficients.
  - ci
    logical. Whether to compute and return p-values and confidence intervals, TRUE by default.
  - alpha
    The significance level, 0.05 by default.
  - return_vcov
    logical. Whether to return the variance-covariance matrix for later usage, TRUE by default.
  - try_cholesky
    logical. Whether to try using a Cholesky decomposition to solve least squares instead of a QR decomposition, FALSE by default. Using a Cholesky decomposition may result in speed gains, but should only be used if users are sure their model is full-rank (i.e., there is no perfect multicollinearity).
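The graph argument works by placing A, L and L_tau in the environment of formula. An embedding term such as U(A, 5) then expands into one regressor per embedding dimension, which is why the examples report coefficients named U(A, 5)1 through U(A, 5)5. The base R sketch below imitates that mechanism under loud assumptions. inject_matrices() and U() are hypothetical stand-ins written for illustration; U() simply returns the first k left singular vectors from svd(). They are not the package's own functions, which may scale or compute the embeddings differently.

```r
# Hypothetical stand-in for an embedding special: the first k left singular
# vectors of M. The real U() may scale or compute embeddings differently.
U <- function(M, k) svd(M, nu = k, nv = 0)$u

# Hypothetical helper: returns a copy of `formula` whose environment is a new
# child environment holding `A`, so `A` can be used freely inside the formula.
inject_matrices <- function(formula, A) {
  env <- new.env(parent = environment(formula))
  assign("A", A, envir = env)
  environment(formula) <- env
  formula
}

# Small random undirected graph and one nodal covariate (simulated data).
set.seed(1)
n <- 20
A <- matrix(rbinom(n * n, 1, 0.3), n, n)
A[lower.tri(A)] <- t(A)[lower.tri(A)]  # symmetrize
diag(A) <- 0                           # no self-loops
node <- data.frame(y = rnorm(n))

f <- inject_matrices(y ~ U(A, 2), A)
rm(A)  # A is now reachable only through the formula's environment
fit <- lm(f, data = node)
coef(fit)  # an intercept plus one coefficient per embedding column
```

Attaching the matrices to the formula's environment, rather than to data, lets a term like U(A, 2) see the whole n-by-n matrix instead of a single column of nodal data.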
Value
An object of class lm_robust. See estimatr::lm_robust() for
details.
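When clusters is not supplied, se_type defaults to "HC2", so the standard errors in the returned lm_robust object are HC2 unless you ask otherwise. The formula behind them can be sketched in base R as below. hc2_se() is a hypothetical helper written for illustration, not part of estimatr or this package, and estimatr computes the same quantity with its own, faster code.

```r
# HC2 sandwich variance (illustrative sketch, not estimatr's code):
#   V = (X'X)^{-1} X' diag(e_i^2 / (1 - h_ii)) X (X'X)^{-1}
# e_i are OLS residuals and h_ii are leverages (diagonal of the hat matrix).
hc2_se <- function(X, y) {
  XtX_inv <- solve(crossprod(X))
  beta <- XtX_inv %*% crossprod(X, y)
  e <- as.vector(y - X %*% beta)
  h <- rowSums((X %*% XtX_inv) * X)            # h_ii = x_i' (X'X)^{-1} x_i
  meat <- crossprod(X * (e^2 / (1 - h)), X)    # X' diag(e^2 / (1 - h)) X
  V <- XtX_inv %*% meat %*% XtX_inv
  sqrt(diag(V))
}

# Intercept-only check: HC2 reduces to the usual standard error of the mean,
# sd(y) / sqrt(n), because every leverage equals 1 / n.
y <- c(2, 4, 4, 5, 7, 9)
X <- matrix(1, nrow = length(y), ncol = 1)
hc2_se(X, y)
sd(y) / sqrt(length(y))
```

Dividing each squared residual by 1 - h_ii inflates the contribution of high-leverage observations, whose residuals are otherwise biased toward zero.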
Examples
### some examples where data is specified as a tidygraph
data(addhealth, package = "latentnetmediate")
data(smoking, package = "latentnetmediate")
# a regression that does not use any node embeddings
nodelm_robust(grade ~ sex, graph = addhealth[[36]])
#> Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper
#> (Intercept) 9.79574861 0.04440072 220.6213812 0.0000000 9.708676 9.88282122
#> sexmale -0.04574861 0.06299104 -0.7262717 0.4677509 -0.169278 0.07778078
#> DF
#> (Intercept) 2160
#> sexmale 2160
# a regression including left and right singular embeddings of
# the adjacency matrix and the normalized graph Laplacian
nodelm_robust(grade ~ sex + U(A, 5) + V(L, 3), graph = addhealth[[36]])
#> Estimate Std. Error t value Pr(>|t|) CI Lower
#> (Intercept) 9.87934123 0.03360119 294.0176290 0.000000e+00 9.8134471
#> sexmale -0.03804066 0.04422524 -0.8601570 3.897983e-01 -0.1247693
#> U(A, 5)1 -2.27029557 0.58329181 -3.8922123 1.023563e-04 -3.4141699
#> U(A, 5)2 0.33828108 1.24915054 0.2708089 7.865640e-01 -2.1113868
#> U(A, 5)3 -14.13248367 0.67382758 -20.9734419 5.097222e-89 -15.4539047
#> U(A, 5)4 1.85826748 0.46787645 3.9717055 7.370557e-05 0.9407304
#> U(A, 5)5 5.56795840 1.32618364 4.1984822 2.796484e-05 2.9672235
#> V(L, 3)1 0.52627629 1.64213265 0.3204834 7.486330e-01 -2.6940558
#> V(L, 3)2 0.85441741 1.53932913 0.5550583 5.789125e-01 -2.1643101
#> V(L, 3)3 47.54782188 1.54149212 30.8453227 2.555583e-173 44.5248526
#> CI Upper DF
#> (Intercept) 9.94523541 2152
#> sexmale 0.04868801 2152
#> U(A, 5)1 -1.12642127 2152
#> U(A, 5)2 2.78794893 2152
#> U(A, 5)3 -12.81106267 2152
#> U(A, 5)4 2.77580454 2152
#> U(A, 5)5 8.16869331 2152
#> V(L, 3)1 3.74660835 2152
#> V(L, 3)2 3.87314490 2152
#> V(L, 3)3 50.57079113 2152
nodelm_robust(as.integer(smokes) ~ sex + U(A, 5), graph = smoking)
#> Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper
#> (Intercept) 0.96024972 0.09895098 9.7042975 1.083297e-17 0.7647832 1.1557162
#> sexmale 0.40650863 0.07779719 5.2252357 5.533918e-07 0.2528291 0.5601882
#> U(A, 5)1 1.57251864 1.50442927 1.0452593 2.975305e-01 -1.3993116 4.5443489
#> U(A, 5)2 0.64105255 0.52974082 1.2101249 2.280732e-01 -0.4053907 1.6874958
#> U(A, 5)3 -0.17947427 0.40233025 -0.4460869 6.561570e-01 -0.9742323 0.6152837
#> U(A, 5)4 -0.09438519 0.33939577 -0.2780977 7.813080e-01 -0.7648232 0.5760529
#> U(A, 5)5 1.10853421 0.33943585 3.2658135 1.343659e-03 0.4380170 1.7790514
#> DF
#> (Intercept) 155
#> sexmale 155
#> U(A, 5)1 155
#> U(A, 5)2 155
#> U(A, 5)3 155
#> U(A, 5)4 155
#> U(A, 5)5 155
library(Matrix)
library(tidygraph)
B <- igraph::as_adjacency_matrix(addhealth[[36]], attr = "weight")
node <- addhealth[[36]] |>
  as_tibble() |>
  mutate(level = rowSums(B))
node[5, "sex"] <- NA
node
#> # A tibble: 2,209 × 5
#> sex race grade school level
#> <fct> <fct> <int> <fct> <dbl>
#> 1 male hispanic 10 B 5
#> 2 male hispanic 10 B 0
#> 3 female hispanic 9 B 0
#> 4 male hispanic 10 B 19
#> 5 NA hispanic 9 B 16
#> 6 female hispanic 11 B 0
#> 7 female hispanic 9 B 7
#> 8 male white 9 B 19
#> 9 male white 11 B 8
#> 10 female white 10 B 6
#> # ℹ 2,199 more rows
fit <- nodelm_robust(level ~ sex + grade + race + U(sign(B), 10), data = node)
summary(fit)
#>
#> Call:
#> estimatr::lm_robust(formula = formula, data = data)
#>
#> Standard error type: HC2
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper
#> (Intercept) 3.70846 1.5597 2.3776 1.751e-02 0.6497 6.7672
#> sexmale -2.87030 0.3345 -8.5816 1.771e-17 -3.5262 -2.2144
#> grade 0.48221 0.1375 3.5081 4.607e-04 0.2126 0.7518
#> raceblack 0.09635 0.8779 0.1098 9.126e-01 -1.6252 1.8179
#> racehispanic 1.66458 0.7881 2.1121 3.480e-02 0.1190 3.2102
#> racemixed/other 2.48689 1.0766 2.3099 2.099e-02 0.3756 4.5982
#> racewhite 1.43935 0.8365 1.7207 8.545e-02 -0.2011 3.0797
#> U(sign(B), 10)1 194.77884 9.1004 21.4032 3.198e-92 176.9322 212.6255
#> U(sign(B), 10)2 119.71510 9.2763 12.9055 9.586e-37 101.5236 137.9066
#> U(sign(B), 10)3 185.53943 8.8289 21.0150 2.911e-89 168.2253 202.8536
#> U(sign(B), 10)4 -89.70215 8.2256 -10.9052 5.536e-27 -105.8333 -73.5710
#> U(sign(B), 10)5 -76.51199 7.7617 -9.8577 1.902e-22 -91.7332 -61.2908
#> U(sign(B), 10)6 -155.05165 10.6856 -14.5104 1.437e-45 -176.0069 -134.0964
#> U(sign(B), 10)7 19.97444 6.5063 3.0700 2.168e-03 7.2151 32.7338
#> U(sign(B), 10)8 89.03533 11.4875 7.7506 1.405e-14 66.5074 111.5633
#> U(sign(B), 10)9 -8.72362 9.0506 -0.9639 3.352e-01 -26.4725 9.0252
#> U(sign(B), 10)10 -28.35095 8.0183 -3.5358 4.153e-04 -44.0755 -12.6264
#> DF
#> (Intercept) 2132
#> sexmale 2132
#> grade 2132
#> raceblack 2132
#> racehispanic 2132
#> racemixed/other 2132
#> racewhite 2132
#> U(sign(B), 10)1 2132
#> U(sign(B), 10)2 2132
#> U(sign(B), 10)3 2132
#> U(sign(B), 10)4 2132
#> U(sign(B), 10)5 2132
#> U(sign(B), 10)6 2132
#> U(sign(B), 10)7 2132
#> U(sign(B), 10)8 2132
#> U(sign(B), 10)9 2132
#> U(sign(B), 10)10 2132
#>
#> Multiple R-squared: 0.4453 , Adjusted R-squared: 0.4411
#> F-statistic: 91.57 on 16 and 2132 DF, p-value: < 2.2e-16
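The last example builds level from the weighted adjacency matrix B and embeds sign(B), which replaces every positive edge weight by 1. A toy base R version of those two transformations is shown below. It uses a made-up 4-node weighted, directed matrix whose rows index the sending node. U2 is only a hypothetical stand-in for an embedding term like U(sign(B), 2), assuming such a term holds leading left singular vectors.

```r
# Toy weighted, directed adjacency matrix (made-up weights). Entry [i, j]
# holds the weight of the edge from node i to node j.
B <- matrix(c(0, 3, 0, 1,
              2, 0, 5, 0,
              0, 0, 0, 0,
              4, 0, 1, 0), nrow = 4, byrow = TRUE)

level <- rowSums(B)          # weighted out-degree, like the `level` column
A_bin <- sign(B)             # 1 where an edge exists, 0 otherwise
out_degree <- rowSums(A_bin) # unweighted out-degree

# Hypothetical stand-in for an embedding term such as U(sign(B), 2):
# the first two left singular vectors of the binarized matrix.
U2 <- svd(A_bin, nu = 2, nv = 0)$u
```

Here level is 4, 7, 0, 5 while out_degree is 2, 2, 0, 2: the weights change the response, but the embedding of sign(B) only sees which edges exist.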