egc.model.node_embedding package
Submodules
egc.model.node_embedding.SENet module
SENet Kmeans
- egc.model.node_embedding.SENet.get_improved_graph(adj: ndarray, lam: float) ndarray[source]
Get adjacency matrix of the improved graph.
- Parameters:
adj (np.ndarray) – the adjacency matrix of the graph.
lam (float) – weighting hyper-parameter.
- Returns:
improved graph.
- Return type:
np.ndarray
\(S_{ij} = \dfrac{|N(v_i) \cap N(v_j)|}{\min\{|N(v_i)|, |N(v_j)|\}}\)
\(S'_{ij} = \begin{cases} S_{ij}, & S_{ij} \ge \min\{S_{iq} \mid q \in N(v_i)\} \\ 0, & \text{otherwise} \end{cases}\)
\(A' = A + \lambda S'\)
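Read together, the three formulas amount to the following NumPy sketch (`improved_graph_sketch` is a hypothetical name; the library's own implementation may differ in details):

```python
import numpy as np

def improved_graph_sketch(adj: np.ndarray, lam: float) -> np.ndarray:
    """Hypothetical re-implementation of the formulas above; not the library code."""
    n = adj.shape[0]
    common = adj @ adj.T                       # |N(v_i) ∩ N(v_j)| for all pairs
    deg = adj.sum(axis=1)                      # |N(v_i)|
    denom = np.minimum(deg[:, None], deg[None, :])
    S = np.divide(common, denom, out=np.zeros((n, n)), where=denom > 0)
    S_prime = np.zeros_like(S)
    for i in range(n):
        neighbors = np.flatnonzero(adj[i])
        if neighbors.size == 0:
            continue
        # keep S_ij only if it reaches the smallest similarity among v_i's neighbors
        threshold = S[i, neighbors].min()
        keep = S[i] >= threshold
        S_prime[i, keep] = S[i, keep]
    return adj + lam * S_prime                 # A' = A + λ S'
```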
- class egc.model.node_embedding.SENet.SENetEmbed(feature: FloatTensor, labels: IntTensor, adj: array, n_clusters: int, hidden0: int = 16, hidden1: int = 16, lr: float = 0.03, epochs: int = 50, weight_decay: float = 0.0, lam: float = 1.0, n_iter: int = 3)[source]
Bases: Module
SENet Embedding
- Parameters:
feature (FloatTensor) – node features.
labels (IntTensor) – node labels.
adj (ndarray) – the graph's adjacency matrix.
n_clusters (int) – number of clusters.
hidden0 (int, optional) – hidden units size of gnn layer 1. Defaults to 16.
hidden1 (int, optional) – hidden units size of gnn layer 2. Defaults to 16.
lr (float, optional) – learning rate. Defaults to 3e-2.
epochs (int, optional) – number of embedding training epochs. Defaults to 50.
weight_decay (float, optional) – weight decay. Defaults to 0.0.
lam (float, optional) – used to construct the improved graph. Defaults to 1.0.
n_iter (int, optional) – number of feature convolution iterations. Defaults to 3.
seed (int, optional) – random seed. Defaults to 20.
- forward()[source]
Get the embedding through the three networks
- Returns:
(torch.FloatTensor, torch.FloatTensor, torch.FloatTensor): \(Z_1 = \tanh(D'^{-1} A' X W_1)\), \(Z_2 = \tanh(D'^{-1} A' Z_1 W_2)\), \(F = Z_2 W_3\), \(F^T F = Q Q^T\), \(Z_3 = F (Q^{-1})^T\)
- get_imporved_feature(n_iter, features)[source]
Get the improved feature after n_iter graph convolutions
- Parameters:
n_iter (int) – number of convolution iterations
features (tensor) – original graph features
- Returns:
(tensor) \(X' = (D'^{-1} A')^{n_{iter}} X\)
- get_normalized_kernel_martix(feature)[source]
Get the kernel matrix
- Parameters:
feature (tensor) – improved graph features
- Returns:
(tensor) \(K = \mathrm{ReLU}(X' X'^T)\), symmetrized as \(K = (K + K^T)/2\)
- training: bool
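The two helper methods above reduce to a few tensor operations. A minimal sketch of the documented formulas behind `get_imporved_feature` and `get_normalized_kernel_martix` (hypothetical function names, dense tensors assumed):

```python
import torch

def improved_feature_sketch(adj_norm: torch.Tensor, x: torch.Tensor, n_iter: int = 3) -> torch.Tensor:
    # X' = (D'^-1 A')^{n_iter} X: repeated smoothing with the normalized improved adjacency
    for _ in range(n_iter):
        x = adj_norm @ x
    return x

def normalized_kernel_sketch(x_improved: torch.Tensor) -> torch.Tensor:
    # K = ReLU(X' X'^T), then symmetrize: K = (K + K^T) / 2
    k = torch.relu(x_improved @ x_improved.t())
    return (k + k.t()) / 2
```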
egc.model.node_embedding.ae module
AE Embedding
- class egc.model.node_embedding.ae.AE(n_input: int, n_clusters: int, hidden1: int = 500, hidden2: int = 500, hidden3: int = 2000, hidden4: int = 2000, hidden5: int = 500, hidden6: int = 500, lr: float = 0.0005, epochs: int = 100, n_z: int = 10, activation: str = 'relu', early_stop: int = 20, if_eva: bool = False, if_early_stop: bool = False)[source]
Bases: Module
AutoEncoder Model
- Parameters:
n_input (int) – dim of features
n_clusters (int) – cluster num.
hidden1 (int) – hidden units size of encoder layer 1.
hidden2 (int) – hidden units size of encoder layer 2.
hidden3 (int) – hidden units size of encoder layer 3.
hidden4 (int) – hidden units size of decoder layer 1.
hidden5 (int) – hidden units size of decoder layer 2.
hidden6 (int) – hidden units size of decoder layer 3.
lr (float, optional) – learning rate. Defaults to 5e-4.
epochs (int, optional) – number of embedding training epochs. Defaults to 100.
n_z (int) – number of dimensions of Z. Defaults to 10.
activation (str, optional) – activation function. Defaults to 'relu'.
early_stop (int) – number of early-stopping steps. Defaults to 20.
if_eva (bool) – whether to use k-means to judge the embedding quality.
if_early_stop (bool) – whether to use early stopping.
- forward(x)[source]
Forward Propagation
- Parameters:
x (torch.Tensor) – node’s features
- Returns:
x_hat (torch.Tensor): reconstructed attribute matrix generated by the AE decoder
z_ae (torch.Tensor): latent embedding of AE
- fit(data, train_loader, label) None[source]
Fit an AE clustering model.
- Parameters:
data (torch.Tensor) – node’s features
train_loader (DataLoader) – DataLoader of AE train
label (torch.Tensor) – node’s label
- training: bool
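A usage sketch built only from the signatures documented above (toy shapes; assumes fit takes the raw features, a DataLoader, and labels exactly as listed):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

from egc.model.node_embedding.ae import AE

x = torch.randn(1000, 128)                    # toy node features
labels = torch.randint(0, 7, (1000,))         # toy labels, used for evaluation only
loader = DataLoader(TensorDataset(x), batch_size=256, shuffle=True)

model = AE(n_input=128, n_clusters=7)
model.fit(x, loader, labels)                  # train the autoencoder
x_hat, z_ae = model(x)                        # reconstruction and latent embedding
```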
- class egc.model.node_embedding.ae.AE_encoder(n_input: int, hidden1: int, hidden2: int, hidden3: int, n_z: int, activation: object)[source]
Bases: Module
Encoder for AE
- Parameters:
n_input (int) – dim of features.
hidden1 (int) – hidden units size of encoder layer 1.
hidden2 (int) – hidden units size of encoder layer 2.
hidden3 (int) – hidden units size of encoder layer 3.
n_z (int) – number of dimensions of Z.
activation (object) – activation function.
- forward(x)[source]
Forward Propagation
- Parameters:
x (torch.Tensor) – node’s features
- Returns:
Latent embedding of AE
- Return type:
z_ae (torch.Tensor)
- training: bool
egc.model.node_embedding.agc module
AGC Embedding
- class egc.model.node_embedding.agc.AGCEmbed(adj: Tensor, feature: Tensor, labels: Tensor, epochs: int = 60, n_clusters: int = 7, rep: int = 10)[source]
Bases: Module
AGC Embedding
- forward()[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool
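The AGC paper's core operation is k-order low-pass filtering of node features; a hedged NumPy sketch of that filter (not necessarily how AGCEmbed organizes the computation):

```python
import numpy as np

def agc_filter_sketch(adj: np.ndarray, x: np.ndarray, k: int) -> np.ndarray:
    # AGC paper: X_bar = (I - L_sym / 2)^k X, a low-pass filter on node features
    n = adj.shape[0]
    a = adj + np.eye(n)                            # add self-loops
    d_inv_sqrt = np.diag(a.sum(axis=1) ** -0.5)
    lap = np.eye(n) - d_inv_sqrt @ a @ d_inv_sqrt  # symmetric normalized Laplacian
    g = np.eye(n) - 0.5 * lap                      # filter G = I - L_sym / 2
    for _ in range(k):
        x = g @ x
    return x
```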
egc.model.node_embedding.age module
AGE Model
- class egc.model.node_embedding.age.AGE(dims: list | None = None, feat_dim: int | None = None, gnnlayers_num: int = 3, linlayers_num: int = 1, lr: float = 0.001, upth_st: float = 0.0015, upth_ed: float = 0.001, lowth_st: float = 0.1, lowth_ed: float = 0.5, upd: float = 10, bs: int = 10000, epochs: int = 400, norm: str = 'sym', renorm: bool = True, estop_steps: int = 5)[source]
Bases: Module
AGE. Paper: Adaptive Graph Encoder for Attributed Graph Embedding
- Parameters:
dims (list, optional) – number of units in each hidden layer.
feat_dim (int, optional) – input feature dimension.
gnnlayers_num (int) – number of gnn layers.
linlayers_num (int, optional) – number of linear layers.
lr (float, optional) – learning rate. Defaults to 0.001.
upth_st (float, optional) – upper threshold start value.
upth_ed (float, optional) – upper threshold end value.
lowth_st (float, optional) – lower threshold start value.
lowth_ed (float, optional) – lower threshold end value.
upd (float, optional) – threshold update interval in epochs.
bs (int, optional) – batch size. Defaults to 10000.
epochs (int, optional) – number of epochs to train.
norm (str, optional) – normalization mode of the Laplacian matrix.
renorm (bool, optional) – whether to use the renormalization trick.
estop_steps (int, optional) – number of early-stopping steps.
- forward(x, y)[source]
Forward Propagation
- Parameters:
x (torch.Tensor) – Sample node embedding for x-axis
y (torch.Tensor) – Sample node embedding for y-axis
- Returns:
prediction of adj
- Return type:
batch_pred (torch.Tensor)
- fit(adj: csr_matrix, features: Tensor) None[source]
Fit an AGE model
- Parameters:
adj (sp.csr_matrix) – 2D sparse adj.
features (torch.Tensor) – features.
- training: bool
- class egc.model.node_embedding.age.LinTrans(layers, dims)[source]
Bases: Module
Linear Transform Model
- Parameters:
layers (int) – number of linear layers.
dims (list) – Number of units in hidden layers.
- forward(x)[source]
Forward Propagation
- Parameters:
x (torch.Tensor) – feature embedding
- Returns:
hidden embedding
- Return type:
out (torch.Tensor)
- training: bool
- class egc.model.node_embedding.age.SampleDecoder(act=torch.sigmoid)[source]
Bases: Module
Inner-product decoder model
- Parameters:
act (object, optional) – activation of the decoder. Defaults to torch.sigmoid.
- forward(zx, zy)[source]
Forward Propagation
- Parameters:
zx (torch.Tensor) – Sample node embedding for x-axis
zy (torch.Tensor) – Sample node embedding for y-axis
- Returns:
prediction of adj
- Return type:
sim (torch.Tensor)
- training: bool
- egc.model.node_embedding.age.loss_function(adj_preds, adj_labels)[source]
Compute loss
- Parameters:
adj_preds (torch.Tensor) – reconstructed adj
adj_labels (torch.Tensor) – ground-truth adjacency labels
- Returns:
loss
- Return type:
torch.Tensor
- egc.model.node_embedding.age.update_similarity(z, upper_threshold, lower_treshold, pos_num, neg_num)[source]
update similarity
- Parameters:
z (numpy.ndarray) – hidden embedding
upper_threshold (float) – upper threshold
lower_treshold (float) – lower threshold
pos_num (int) – number of positive samples
neg_num (int) – number of negative samples
- Returns:
numpy.ndarray: list of positive indices
numpy.ndarray: list of negative indices
- egc.model.node_embedding.age.update_threshold(upper_threshold, lower_treshold, up_eta, low_eta)[source]
update threshold
- Parameters:
upper_threshold (float) – upper threshold
lower_treshold (float) – lower threshold
up_eta (float) – update step size of the upper threshold
low_eta (float) – update step size of the lower threshold
- Returns:
upth (float): updated upper threshold
lowth (float): updated lower threshold
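A hedged sketch of how these two helpers typically cooperate in AGE's adaptive training loop; the ranking-based pair selection is an assumption about how the thresholds translate into the pos_num/neg_num counts:

```python
import numpy as np

def update_similarity_sketch(z: np.ndarray, pos_num: int, neg_num: int):
    # rank all pairwise cosine similarities of the row-normalized embeddings;
    # the most similar pairs become positives, the least similar negatives
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T).ravel()
    order = np.argsort(sim)
    return order[-pos_num:], order[:neg_num]

def update_threshold_sketch(upth: float, lowth: float, up_eta: float, low_eta: float):
    # anneal both thresholds by their step sizes after each update period
    return upth + up_eta, lowth + low_eta
```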
- egc.model.node_embedding.age.preprocess_graph(adj: csr_matrix, layer: int, norm: str = 'sym', renorm: bool = True) Tensor[source]
Generalized Laplacian Smoothing Filter
- Parameters:
adj (sp.csr_matrix) – 2D sparse adj.
layer (int) – number of linear layers
norm (str) – normalize mode of Laplacian matrix
renorm (bool) – If with the renormalization trick
- Returns:
Laplacian Smoothing Filter
- Return type:
adjs (sp.csr_matrix)
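A sketch of the smoothing filter stack, assuming the common AGE formulation \(H = I - k L_{sym}\) with k = 2/3 (the coefficient is borrowed from the SGC variant later on this page, not read from this function):

```python
import numpy as np
import scipy.sparse as sp

def smoothing_filters_sketch(adj: sp.csr_matrix, layer: int, renorm: bool = True, k: float = 2 / 3):
    n = adj.shape[0]
    ident = sp.eye(n, format="csr")
    a = adj + ident if renorm else adj             # renormalization trick: add self-loops
    d_inv_sqrt = sp.diags(np.asarray(a.sum(axis=1)).ravel() ** -0.5)
    lap = ident - d_inv_sqrt @ a @ d_inv_sqrt      # symmetric normalized Laplacian
    h = ident - k * lap                            # smoothing filter H = I - k * L_sym
    return [h] * layer                             # one filter per gnn layer
```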
egc.model.node_embedding.dgi module
Embedding By DGI
Adapted from: https://github.com/PetarV-/DGI
- egc.model.node_embedding.dgi.avg_readout(h: Tensor, mask: Tensor | None = None)[source]
Average readout of the whole graph
- Parameters:
h (torch.Tensor) – embeddings of all nodes in graph.
mask (torch.Tensor, optional) – node mask. Defaults to None.
- Returns:
Average readout of the whole graph.
- Return type:
(torch.Tensor)
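A minimal sketch of the readout, assuming mask holds per-node weights:

```python
import torch

def avg_readout_sketch(h: torch.Tensor, mask: torch.Tensor = None) -> torch.Tensor:
    # unweighted mean over nodes, or a mask-weighted mean when a node mask is given
    if mask is None:
        return h.mean(dim=0)
    w = mask.float().unsqueeze(-1)
    return (h * w).sum(dim=0) / w.sum()
```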
- class egc.model.node_embedding.dgi.DGIEmbed(in_feats: int, out_feats_list: List[int], n_epochs: int = 10000, early_stopping_epoch: int = 20, batch_size: int = 1024, neighbor_sampler_fanouts: List[int] = -1, lr: float = 0.001, l2_coef: float = 0.0, activation: str = 'prelu', model_filename: str = 'dgi')[source]
Bases: Module
DGI Embedding
- Parameters:
in_feats (int) – input feature dimension.
out_feats_list (List[int]) – List of hidden units dimensions.
n_epochs (int, optional) – number of embedding training epochs. Defaults to 10000.
early_stopping_epoch (int, optional) – early stopping threshold. Defaults to 20.
batch_size (int, optional) – batch size. Defaults to 1024.
neighbor_sampler_fanouts (List[int] or int, optional) –
List of neighbors to sample for each GNN layer, with the i-th element being the fanout for the i-th GNN layer. Defaults to -1.
If only a single integer is provided, DGL assumes that every layer will have the same fanout.
If -1 is provided on one layer, then all inbound edges will be included.
lr (float, optional) – learning rate. Defaults to 0.001.
l2_coef (float, optional) – weight decay. Defaults to 0.0.
activation (str, optional) – activation of gcn layer. Defaults to prelu.
model_filename (str, optional) – path to save best model parameters. Defaults to dgi.
- forward(block, input_feats) Tensor[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- fit(graph: DGLGraph, device: device = device(type='cpu')) None[source]
Fit on a specific graph
- Parameters:
graph (dgl.DGLGraph) – dgl graph.
device (torch.device, optional) – torch device. Defaults to torch.device(‘cpu’).
- get_embedding(graph: DGLGraph, device: device = device(type='cpu'), model_filename: str | None = None) Tensor[source]
Get the embeddings.
- Parameters:
graph (dgl.DGLGraph) – dgl graph.
device (torch.device, optional) – torch device. Defaults to torch.device(‘cpu’).
model_filename (str, optional) – Model file to load. Defaults to None.
- Returns:
Embeddings.
- Return type:
torch.Tensor
- training: bool
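A usage sketch from the documented signatures; the assumption that node features are stored under g.ndata['feat'] is ours, not the library's documented contract:

```python
import dgl
import torch

from egc.model.node_embedding.dgi import DGIEmbed

g = dgl.rand_graph(500, 4000)
g.ndata["feat"] = torch.randn(500, 64)   # assumption: features live under 'feat'

model = DGIEmbed(in_feats=64, out_feats_list=[256])
model.fit(g, device=torch.device("cpu"))
embeddings = model.get_embedding(g)      # node embeddings, one row per node
```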
egc.model.node_embedding.gae module
GAE embedding
- class egc.model.node_embedding.gae.DGL_GAE(epochs: int, n_clusters: int, fead_dim: int, n_nodes: int, hidden_dim1: int = 32, dropout: float = 0.0, lr: float = 0.01, early_stop: int = 10, activation: str = 'relu')[source]
Bases: Module
An implementation of "GAE"
- Parameters:
epochs (int) – number of embedding training epochs.
n_clusters (int) – cluster num.
fead_dim (int) – dim of features.
n_nodes (int) – number of nodes.
hidden_dim1 (int) – hidden units size of gcn_1. Defaults to 32.
dropout (float, optional) – dropout rate (1 - keep probability). Defaults to 0.0.
lr (float, optional) – learning rate. Defaults to 0.01.
early_stop (int, optional) – early stopping threshold. Defaults to 10.
activation (str, optional) – activation of gcn layer_1. Defaults to 'relu'.
- Encode(graph, features)[source]
Encoder for GAE
- Parameters:
graph (dgl.DGLGraph) – Graph data in dgl
features (torch.Tensor) – node’s features
- Returns:
Latent embedding of GAE
- Return type:
h1 (torch.Tensor)
- Decode(z)[source]
Decoder for GAE
- Parameters:
z (torch.Tensor) – latent embedding of GAE
- Returns:
reconstructed adjacency matrix
- Return type:
(torch.Tensor)
- forward()[source]
Forward Propagation
- Returns:
Graph_Reconstruction (torch.Tensor): reconstructed adj matrix
Latent_Representation (torch.Tensor): latent embedding of GAE
- fit(adj_csr: csr_matrix, features: Tensor, device: device = device(type='cpu')) None[source]
Fit a GAE model
- Parameters:
adj_csr (sp.csr_matrix) – 2D sparse adjacency matrix.
features (torch.Tensor) – node’s features.
device (torch.device, optional) – torch device. Defaults to torch.device(‘cpu’).
- training: bool
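GAE's decoder is conventionally the inner product of the latent embeddings; a sketch under that assumption (the library's Decode may differ):

```python
import torch

def inner_product_decode_sketch(z: torch.Tensor) -> torch.Tensor:
    # A_hat = sigmoid(Z Z^T): edge probability from latent similarity
    return torch.sigmoid(z @ z.t())
```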
egc.model.node_embedding.gmi module
Embedding By GMI
Adapted From: https://github.com/zpeng27/GMI
- egc.model.node_embedding.gmi.mi_loss_jsd(pos: Tensor, neg: Tensor) Tensor[source]
Jensen-Shannon MI Estimator
- Parameters:
pos (torch.Tensor) – \(D_w(h_i, x_i)\) or \(D_w(h_i, x_j)\).
neg (torch.Tensor) – \(D_w(h_i, x'_i)\) or \(D_w(h_i, x'_j)\).
- Returns:
JSD loss.
\[\begin{split}& sp(-D_w(h_i,x_i))+E(sp(D_w(h_i,x'_i)))\\ & \textbf{or} \\ & sp(-D_w(h_i,x_j))+E(sp(D_w(h_i,x'_j))). \\\end{split}\]
- Return type:
(torch.Tensor)
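The estimator above maps directly onto softplus terms; a minimal sketch:

```python
import torch
import torch.nn.functional as F

def mi_loss_jsd_sketch(pos: torch.Tensor, neg: torch.Tensor) -> torch.Tensor:
    # sp(-D_w(h, x)) + E[sp(D_w(h, x'))], with sp the softplus function
    return F.softplus(-pos).mean() + F.softplus(neg).mean()
```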
- egc.model.node_embedding.gmi.reconstruct_loss(pred: Tensor, gnd: Tensor) Tensor[source]
Loss of Rebuilt Adj
- Parameters:
pred (torch.Tensor) – \(w_{ij}\).
gnd (torch.Tensor) – \(a_{ij}\).
- Returns:
reconstruction loss.
\[\begin{split}\text{reconstruct}_{loss} = & \frac{n^2}{n^2 - |E|} * AVG(\frac{-(n^2-|E|)}{|E|} * a_{ij} * \log(w_{ij} + e^{-10}) \\ & - (1 - a_{ij}) * \log(1 - w_{ij} + e^{-10})).\end{split}\]
- Return type:
(torch.Tensor)
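A direct transcription of the reconstruction formula (dense tensors assumed):

```python
import math
import torch

def reconstruct_loss_sketch(pred: torch.Tensor, gnd: torch.Tensor) -> torch.Tensor:
    # weighted BCE over the dense adjacency, following the formula above
    n_sq = gnd.numel()                     # n^2
    e = gnd.sum()                          # |E|
    eps = math.e ** -10                    # the e^{-10} stabilizer from the formula
    term = (-(n_sq - e) / e) * gnd * torch.log(pred + eps) \
           - (1 - gnd) * torch.log(1 - pred + eps)
    return (n_sq / (n_sq - e)) * term.mean()
```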
- egc.model.node_embedding.gmi.preprocess_adj(adj_orig: csr_matrix) Tuple[Tensor, Tensor][source]
Preprocess the adjacency matrix: row average and self-loops
- Parameters:
adj_orig (sp.csr_matrix) – input original adjacency matrix.
- Returns:
row-averaged adj and self-loop adj
- Return type:
adj_orig, adj_target (sp.csr_matrix, np.matrix)
- class egc.model.node_embedding.gmi.GMIEmbed(in_features: int, hidden_units: int = 512, n_epochs: int = 550, early_stopping_epoch: int = 20, lr: float = 0.001, l2_coef: float = 0.0, alpha: float = 0.8, beta: float = 1.0, gamma: float = 1.0, activation: str = 'prelu', gcn_depth: int = 2)[source]
Bases: Module
GMI Embedding
- Parameters:
in_features (int) – input feature dimension.
hidden_units (int, optional) – hidden units size of gcn. Defaults to 512.
n_epochs (int, optional) – number of embedding training epochs. Defaults to 550.
early_stopping_epoch (int, optional) – early stopping threshold. Defaults to 20.
lr (float, optional) – learning rate. Defaults to 0.001.
l2_coef (float, optional) – weight decay. Defaults to 0.0.
alpha (float, optional) – parameter for \(I(h_i; x_i)\). Defaults to 0.8.
beta (float, optional) – parameter for \(I(h_i; x_j)\). Defaults to 1.0.
gamma (float, optional) – parameter for \(I(w_{ij}; a_{ij})\). Defaults to 1.0.
activation (str, optional) – activation of gcn layer. Defaults to “prelu”.
- calc_loss(mi_pos: Tensor, mi_neg: Tensor, local_mi_pos: Tensor, local_mi_neg: Tensor, adj_rebuilt: Tensor) Tensor[source]
Calculate Loss
- Parameters:
mi_pos (torch.Tensor) – \(D_w(h_i, x_i)\).
mi_neg (torch.Tensor) – \(D_w(h_i, x'_i)\).
local_mi_pos (torch.Tensor) – \(D_w(h_i, x_j)\).
local_mi_neg (torch.Tensor) – \(D_w(h_i, x'_j)\).
adj_rebuilt (torch.Tensor) – \(w_{ij}\)
- Returns:
loss.
\[\begin{split}loss = & \alpha * sp(-D_w(h_i,x_i))+E(sp(D_w(h_i,x'_i))) \\ & + \beta * sp(-D_w(h_i,x_j))+E(sp(D_w(h_i,x'_j))) \\ & + \gamma * \text{reconstruct}_{loss} \\\end{split}\]
- Return type:
(torch.Tensor)
- forward(neg_sample_list: Tensor) Tensor[source]
Forward Propagation
- Parameters:
neg_sample_list (torch.Tensor) – negative sample list.
- Returns:
loss.
- Return type:
torch.Tensor
- fit(features: lil_matrix, adj_orig: csr_matrix, neg_list_num: int = 5) None[source]
Fit on a specific graph
- Parameters:
features (sp.lil_matrix) – 2D sparse features.
adj_orig (sp.csr_matrix) – 2D sparse adj.
neg_list_num (int, optional) – negative sample times. Defaults to 5.
- set_features_norm(features_norm) None[source]
Set the features row normalized
- Parameters:
features_norm (torch.Tensor) – normalized 3D features tensor in shape of [1, xx, xx]
- set_adj_norm(adj_norm) None[source]
Set the adjacency symmetrically normalized
- Parameters:
adj_norm (torch.Tensor) – symmetrically normalized 2D adjacency tensor
- get_features_norm() Tensor[source]
Get the features row normalized
- Returns:
normalized 3D features tensor in shape of [1, xx, xx]
- Return type:
features_norm (torch.Tensor)
- get_adj_norm() Tensor[source]
Get the adjacency symmetrically normalized
- Returns:
symmetrically normalized 2D adjacency tensor
- Return type:
adj_norm (torch.Tensor)
- get_embedding() Tensor[source]
Get the embeddings (graph or node level).
- Returns:
embedding.
- Return type:
(torch.Tensor)
- training: bool
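A usage sketch from the fit and get_embedding signatures above (toy random graph):

```python
import scipy.sparse as sp

from egc.model.node_embedding.gmi import GMIEmbed

features = sp.lil_matrix(sp.random(300, 64, density=0.1))   # 2D sparse features
adj = sp.random(300, 300, density=0.02, format="csr")
adj = (adj + adj.T).tocsr()                                 # symmetrize the toy graph

model = GMIEmbed(in_features=64)
model.fit(features, adj, neg_list_num=5)
embeddings = model.get_embedding()
```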
egc.model.node_embedding.igae module
IGAE Embedding
- class egc.model.node_embedding.igae.IGAE(args: Namespace, device)[source]
Bases: Module
A symmetric improved graph autoencoder (IGAE). The network reconstructs both the weighted attribute matrix and the adjacency matrix simultaneously.
- Parameters:
args (argparse.Namespace) – all parameters
- forward(g, feat)[source]
Forward Propagation
- Parameters:
g (dgl.DGLGraph) – Graph data in dgl
feat (torch.Tensor) – node’s features
- Returns:
z_igae (torch.Tensor): latent embedding of IGAE
z_hat (torch.Tensor): reconstructed weighted attribute matrix generated by the IGAE decoder
adj_hat (torch.Tensor): reconstructed adjacency matrix generated by the IGAE decoder
- fit(g, data, adj)[source]
Fit an IGAE clustering model.
- Parameters:
g (dgl.DGLGraph) – Graph data in dgl
data (torch.Tensor) – node’s features
adj (sp.csr.csr_matrix) – adjacency matrix
- training: bool
- class egc.model.node_embedding.igae.IGAE_encoder(args: Namespace)[source]
Bases: Module
Encoder for IGAE
- Parameters:
args (argparse.Namespace) – all parameters
- forward(g, feat)[source]
Forward Propagation
- Parameters:
g (dgl.DGLGraph) – Graph data in dgl
feat (torch.Tensor) – node’s features
- Returns:
z_igae (torch.Tensor): latent embedding of IGAE
z_igae_adj (torch.Tensor): reconstructed adjacency matrix generated by the IGAE encoder
- training: bool
- class egc.model.node_embedding.igae.IGAE_decoder(args: Namespace)[source]
Bases: Module
Decoder for IGAE
- Parameters:
args (argparse.Namespace) – all parameters
- forward(g, z_igae)[source]
Forward Propagation
- Parameters:
g (dgl.DGLGraph) – Graph data in dgl
z_igae (torch.Tensor) – Latent embedding of IGAE
- Returns:
z_hat (torch.Tensor): reconstructed weighted attribute matrix generated by the IGAE decoder
z_hat_adj (torch.Tensor): reconstructed adjacency matrix generated by the IGAE decoder
- training: bool
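A hedged sketch of the dual reconstruction objective described above; the MSE terms and the gamma weighting are assumptions, as the actual loss lives in the training code:

```python
import torch.nn.functional as F

def igae_loss_sketch(feat_weighted, z_hat, adj, adj_hat, gamma: float = 0.1):
    # reconstruct the weighted attribute matrix and the adjacency simultaneously
    loss_w = F.mse_loss(z_hat, feat_weighted)   # weighted attribute reconstruction
    loss_a = F.mse_loss(adj_hat, adj)           # adjacency reconstruction
    return loss_w + gamma * loss_a
```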
egc.model.node_embedding.mvgrl module
Contrastive Multi-View Representation Learning on Graphs https://arxiv.org/abs/2006.05582
- class egc.model.node_embedding.mvgrl.Readout[source]
Bases: Module
Read out
- static forward(seq, msk)[source]
Forward Propagation
- Parameters:
seq (torch.Tensor) – features tensor.
msk (torch.Tensor) – node mask.
- Returns:
graph-level representation
- Return type:
(torch.Tensor)
- training: bool
- class egc.model.node_embedding.mvgrl.MVGRL(in_feats: int, n_clusters: int, n_h: int = 512, model_filename: str = 'mvgrl', sparse: bool = False, nb_epochs: int = 3000, patience: int = 20, lr: float = 0.001, weight_decay: float = 0.0, sample_size: int = 2000, batch_size: int = 4, dataset: str = 'Citeseer')[source]
Bases: Module
MVGRL: Contrastive Multi-View Representation Learning on Graphs
- Parameters:
in_feats (int) – Input feature size.
n_clusters (int) – Num of clusters.
n_h (int, optional) – hidden units dimension. Defaults to 512.
model_filename (str,optional) – Path to store model parameters. Defaults to ‘mvgrl’.
sparse (bool,optional) – Use sparse tensor. Defaults to False.
nb_epochs (int,optional) – Maximum training epochs. Defaults to 3000.
patience (int,optional) – Early stopping patience. Defaults to 20.
lr (float,optional) – Learning rate. Defaults to 0.001.
weight_decay (float,optional) – Weight decay. Defaults to 0.0.
sample_size (int,optional) – Sample size. Defaults to 2000.
batch_size (int,optional) – Batch size. Defaults to 4.
dataset (str,optional) – Dataset. Defaults to ‘Citeseer’.
- forward(seq1, seq2, adj, diff, sparse, msk)[source]
Forward Propagation
- Parameters:
seq1 (torch.Tensor) – features of the raw graph
seq2 (torch.Tensor) – shuffled features of the diffused graph
adj (torch.Tensor) – adj matrix of the raw graph
diff (torch.Tensor) – ppr matrix of the diffused graph
sparse (bool) – whether tensors are sparse
msk (torch.Tensor) – node mask
- Returns:
ret (torch.Tensor): probability of a positive or negative node
h_1 (torch.Tensor): node embedding of the raw graph from one gcn layer
h_2 (torch.Tensor): node embedding of the diffused graph from one gcn layer
- fit(adj_csr, features)[source]
Fit the model
- Parameters:
adj_csr (sp.csr_matrix) – sparse adjacency matrix.
features (torch.Tensor) – features.
- get_embedding()[source]
Get the embeddings (graph or node level).
- Returns:
(torch.Tensor): embedding of each node
(torch.Tensor): embedding of the graph representations
- training: bool
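MVGRL's second view is a graph diffusion; a sketch of the personalized-PageRank (PPR) diffusion the paper uses, with the teleport probability alpha as an assumed hyper-parameter:

```python
import numpy as np

def ppr_diffusion_sketch(adj: np.ndarray, alpha: float = 0.2) -> np.ndarray:
    # PPR diffusion: alpha * (I - (1 - alpha) * D^-1/2 (A + I) D^-1/2)^-1
    n = adj.shape[0]
    a = adj + np.eye(n)                            # add self-loops
    d_inv_sqrt = np.diag(a.sum(axis=1) ** -0.5)
    a_norm = d_inv_sqrt @ a @ d_inv_sqrt
    return alpha * np.linalg.inv(np.eye(n) - (1 - alpha) * a_norm)
```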
egc.model.node_embedding.saif module
A structure and attribute information fusion (SAIF) module
- class egc.model.node_embedding.saif.SAIF(adj_orig_graph: DGLGraph, data: Tensor, train_loader: DataLoader, label: Tensor, adj: csr_matrix, n_clusters: int, n_node: int, device: device, args: Namespace)[source]
Bases: Module
A structure and attribute information fusion (SAIF) module
- Parameters:
adj_orig_graph (dgl.DGLGraph) – Graph data in dgl
data (torch.Tensor) – node’s features
train_loader (DataLoader) – DataLoader of AE train
label (torch.Tensor) – node’s label
adj (sp.csr.csr_matrix) – adjacency matrix
n_clusters (int) – number of clusters
n_node (int) – number of nodes
device (torch.device) – device
args (argparse.Namespace) – all parameters
- forward()[source]
Forward Propagation
- Returns:
x_hat (torch.Tensor): reconstructed attribute matrix generated by the AE decoder
z_hat (torch.Tensor): reconstructed weighted attribute matrix generated by the IGAE decoder
z_tilde (torch.Tensor): clustering embedding
adj_hat (torch.Tensor): reconstructed adjacency matrix generated by the IGAE decoder
- fit(epochs)[source]
Fit a SAIF clustering model.
- Parameters:
epochs (int) – number of training epochs
- training: bool
egc.model.node_embedding.sgc module
SGC
- egc.model.node_embedding.sgc.eliminate_zeros(adj: spmatrix) spmatrix[source]
Remove self-loops and zero-valued edges.
- Parameters:
adj (sp.spmatrix) – adjacency matrix.
- Returns:
adjacency matrix.
- Return type:
sp.spmatrix
- egc.model.node_embedding.sgc.scale(z)[source]
Feature scaling
- Parameters:
z (torch.Tensor) – hidden embedding
- Returns:
scaled embedding
- Return type:
z_scaled (torch.Tensor)
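A plausible reading of scale is row-wise min-max scaling; this is an assumption, so check the source for the exact variant:

```python
import torch

def scale_sketch(z: torch.Tensor) -> torch.Tensor:
    # row-wise min-max scaling of the hidden embedding (assumed variant)
    zmin = z.min(dim=1, keepdim=True).values
    zmax = z.max(dim=1, keepdim=True).values
    return (z - zmin) / (zmax - zmin)
```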
- class egc.model.node_embedding.sgc.LinTrans(layers, dims)[source]
Bases: Module
Linear Transform Model
- Parameters:
layers (int) – number of linear layers.
dims (list) – Number of units in hidden layers.
- forward(x)[source]
Forward Propagation
- Parameters:
x (torch.Tensor) – feature embedding
- Returns:
hidden embedding
- Return type:
out (torch.Tensor)
- training: bool
- class egc.model.node_embedding.sgc.SGC(in_feats: int, hidden_units: ~typing.List, n_lin_layers: int = 1, n_gnn_layers: int = 10, lr: float = 0.001, n_epochs: int = 400, inner_act: ~typing.Callable = <function SGC.<lambda>>, early_stop: int = 10)[source]
Bases: Module
- preprocess_graph(adj: csr_matrix, layer: int, norm: str = 'sym', renorm: bool = True, lbd: float = 0.6666666666666666) Tensor[source]
Generalized Laplacian Smoothing Filter
- Parameters:
adj (sp.csr_matrix) – 2D sparse adj without self-loops
layer (int) – number of linear layers
norm (str) – normalize mode of Laplacian matrix
renorm (bool) – whether to use the renormalization trick
lbd (float, optional) – filter coefficient. Defaults to 2/3.
- Returns:
Laplacian Smoothing Filter
- Return type:
adjs (sp.csr_matrix)
- forward()[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- fit(graph: DGLGraph, device: device) Tuple[Tensor, Tensor][source]
Fit the model
- Parameters:
graph (dgl.DGLGraph) – dgl graph.
device (torch.device) – torch device.
- training: bool
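SGC's defining simplification is to drop nonlinearities and pre-smooth features with repeated applications of the filter before any linear layer; a hedged sketch:

```python
import torch

def sgc_precompute_sketch(filter_mat: torch.Tensor, x: torch.Tensor, n_gnn_layers: int = 10) -> torch.Tensor:
    # X_smooth = H^{n_gnn_layers} X: all graph convolutions collapse into one precomputation
    for _ in range(n_gnn_layers):
        x = filter_mat @ x
    return x
```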
egc.model.node_embedding.vgae module
GAE & VGAE
- class egc.model.node_embedding.vgae.Encoder(in_features: int, hidden_units_1: int = 32, hidden_units_2: int = 16, activation: str = 'relu')[source]
Bases: Module
Encoder for VGAE
- Parameters:
in_features (int) – input feature dimension.
hidden_units_1 (int) – hidden units size of gcn_1. Defaults to 32.
hidden_units_2 (int) – hidden units size of gcn_2. Defaults to 16.
activation (str, optional) – activation of gcn layer_1. Defaults to ‘relu’.
- forward(features_norm: Tensor, adj_norm: Tensor) Tuple[Tensor][source]
- Parameters:
features_norm (torch.Tensor) – normalized features
adj_norm (torch.Tensor) – normalized adjacency matrix
- Returns:
(mu, log_sigma, feat_hidden)
- Return type:
Tuple[torch.Tensor]
- training: bool
- class egc.model.node_embedding.vgae.Decoder[source]
Bases: Module
Decoder for VGAE
- forward(mu: Tensor, log_sigma: Tensor, training: bool = True) Tensor[source]
Decoder
- Parameters:
mu (torch.Tensor) – latent mean
log_sigma (torch.Tensor) – latent log standard deviation
training (bool) – whether in training mode
- Returns:
A_hat
- Return type:
(torch.Tensor)
- training: bool
- class egc.model.node_embedding.vgae.VGAE(in_features: int, hidden_units_1: int = 32, hidden_units_2: int = 16, n_epochs: int = 200, early_stopping_epoch: int = 20, lr: float = 0.01, l2_coef: float = 0.0, activation: str = 'relu', model_filename: str = 'vgae')[source]
Bases: Module
- Parameters:
in_features (int) – input feature dimension.
hidden_units_1 (int) – hidden units size of gcn_1. Defaults to 32.
hidden_units_2 (int) – hidden units size of gcn_2. Defaults to 16.
n_epochs (int, optional) – number of embedding training epochs. Defaults to 200.
early_stopping_epoch (int, optional) – early stopping threshold. Defaults to 20.
lr (float, optional) – learning rate. Defaults to 0.01.
l2_coef (float, optional) – weight decay. Defaults to 0.0.
activation (str, optional) – activation of gcn layer_1. Defaults to ‘relu’.
model_filename (str, optional) – path to save best model parameters. Defaults to vgae.
- fit(features: lil_matrix, adj_orig: csr_matrix) None[source]
- Parameters:
features (sp.lil_matrix) – 2D sparse features.
adj_orig (sp.csr_matrix) – 2D sparse adj.
- get_embedding(model_filename: str | None = None) Tensor[source]
Get the embeddings (graph or node level).
- Parameters:
model_filename (str, optional) – Model file to load. Defaults to None.
- Returns:
embedding.
- Return type:
(torch.Tensor)
- training: bool
- egc.model.node_embedding.vgae.loss_function(preds, labels, mu, logvar, n_nodes, norm, pos_weight)[source]
Weighted adjacency reconstruction loss plus the KL divergence between the latent distribution and the unit Gaussian prior (see the sketch at the end of this module).
- class egc.model.node_embedding.vgae.DGL_VGAE(epochs: int, n_clusters: int, fead_dim: int, n_nodes: int, hidden_dim1: int = 32, hidden_dim2: int = 16, dropout: float = 0.0, lr: float = 0.01, early_stop: int = 10, activation: str = 'relu')[source]
Bases: Module
- Parameters:
epochs (int) – number of embedding training epochs.
n_clusters (int) – cluster num.
fead_dim (int) – dim of features.
n_nodes (int) – number of nodes.
hidden_dim1 (int) – hidden units size of gcn_1. Defaults to 32.
hidden_dim2 (int) – hidden units size of gcn_2. Defaults to 16.
dropout (float, optional) – dropout rate (1 - keep probability). Defaults to 0.0.
lr (float, optional) – learning rate. Defaults to 0.01.
early_stop (int, optional) – early stopping threshold. Defaults to 10.
activation (str, optional) – activation of gcn layer_1. Defaults to 'relu'.
- encode(g, feat)[source]
Encoder for VGAE
- Parameters:
g (dgl.DGLGraph) – Graph data in dgl
feat (torch.Tensor) – node’s features
- Returns:
self.gc2(g, hidden1) (torch.Tensor): latent mean
self.gc3(g, hidden1) (torch.Tensor): latent log variance
- reparameterize(mu, logvar)[source]
reparameterization trick
- Parameters:
mu (torch.Tensor) – latent mean
logvar (torch.Tensor) – latent log variance
- Returns:
latent variable after the reparameterization trick
- Return type:
(torch.Tensor)
- forward()[source]
Forward Propagation
- Returns:
self.dc(z) (torch.Tensor): reconstructed adj matrix
mu (torch.Tensor): latent mean
logvar (torch.Tensor): latent log variance
- fit(adj_csr, features)[source]
Fit a VGAE model
- Parameters:
adj_csr (sp.csr_matrix) – 2D sparse adjacency matrix.
features (torch.Tensor) – node's features
- training: bool
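For reference, the reparameterization trick and a plausible reading of loss_function (the standard VGAE objective matching the signature above; a sketch, not necessarily the library's exact code):

```python
import torch
import torch.nn.functional as F

def reparameterize_sketch(mu: torch.Tensor, logvar: torch.Tensor, training: bool = True) -> torch.Tensor:
    # z = mu + eps * sigma with eps ~ N(0, I); return the mean outside training
    if not training:
        return mu
    std = torch.exp(0.5 * logvar)
    return mu + torch.randn_like(std) * std

def vgae_loss_sketch(preds, labels, mu, logvar, n_nodes, norm, pos_weight):
    # weighted adjacency reconstruction plus KL divergence to the unit Gaussian prior
    rec = norm * F.binary_cross_entropy_with_logits(preds, labels, pos_weight=pos_weight)
    kl = -0.5 / n_nodes * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))
    return rec + kl
```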
Module contents
Node Embedding Methods