egc.model.graph_clustering.disjoint package
Submodules
egc.model.graph_clustering.disjoint.ComE module
Model of ComE
- class egc.model.graph_clustering.disjoint.ComE.ComE(graph, n_clusters=7, size=2, down_sampling=0, table_size=100000000, labels=None, batch_size=50, num_workers=10, negative=5, lr=0.025, window_size=10, num_walks=10, walk_length=80, num_iter=1, output_file='Cora', alpha=0.1, beta=0.1, reg_covar=1e-05)[source]
Bases: object
Class that keeps track of all the parameters used during the learning of the embedding.
- Parameters:
nodes_degree – Dict with node_id: degree of node
size – dimensionality of the projection space
down_sampling – down-sampling rate for common nodes
table_size – size of the negative table to generate
path_labels – location of the file containing the ground truth (a label for each node)
input_file – name of the file containing the ground truth (a label for each node)
- build_vocab_(nodes_degree)[source]
Build the vocabulary from a sequence of paths (can be a once-only generator stream), sorted by node id.
- reset_weights()[source]
Reset all projection weights to an initial (untrained) state, but keep the existing vocabulary.
- reset_communities_weights()[source]
Reset the community weights to an initial (untrained) state, but keep the existing vocabulary.
egc.model.graph_clustering.disjoint.SEComm module
SEComm implementation
- class egc.model.graph_clustering.disjoint.SEComm.SEComm(n_clusters: int, n_nodes: int, num_features: int, activation: str, base_model: str, batch_size: int, num_hidden: int, num_layers: int, num_proj_hidden: int, tau: float, num_cl_hidden: int, dropout: float, pretrain_epochs: int, learning_rate: float, weight_decay: float, drop_edge_rate_1: float, drop_edge_rate_2: float, drop_feature_rate_1: float, drop_feature_rate_2: float, x_norm: bool, iterations: int, threshold: float, se_epochs: int, se_alpha: float, se_patience: int, se_lr: float, cluster_epochs: int, cluster_alpha: float, final_beta: float, cluster_patience: int)[source]
Bases: Base, Module
SEComm model.
- Parameters:
see the _SEComm_subparser function in utils/argparser.py for descriptions of all parameters.
- forward()[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- fit(graph, features, label)[source]
Fitting a SEComm model
- Parameters:
graph (dgl.DGLGraph) – data graph.
features (torch.Tensor) – features.
label (torch.Tensor) – cluster label of each node.
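Example: a minimal usage sketch, not taken from the library's documentation. The toy graph, feature sizes, and every hyperparameter value below are illustrative assumptions (the project's defaults live in utils/argparser.py under _SEComm_subparser); only the constructor and fit signatures above come from this page.

    import dgl
    import torch
    from egc.model.graph_clustering.disjoint.SEComm import SEComm

    # Toy inputs (assumptions): 100 nodes, 32-dim features, 7 clusters.
    graph = dgl.rand_graph(100, 500)
    features = torch.randn(100, 32)
    labels = torch.randint(0, 7, (100,))

    model = SEComm(
        n_clusters=7, n_nodes=100, num_features=32,
        activation="prelu", base_model="GCNConv", batch_size=100,
        num_hidden=256, num_layers=2, num_proj_hidden=256, tau=0.5,
        num_cl_hidden=64, dropout=0.5, pretrain_epochs=200,
        learning_rate=5e-4, weight_decay=1e-5,
        drop_edge_rate_1=0.2, drop_edge_rate_2=0.4,
        drop_feature_rate_1=0.2, drop_feature_rate_2=0.4,
        x_norm=True, iterations=3, threshold=0.9,
        se_epochs=100, se_alpha=0.5, se_patience=20, se_lr=1e-3,
        cluster_epochs=100, cluster_alpha=0.5, final_beta=1.0,
        cluster_patience=20,
    )
    model.fit(graph, features, labels)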
egc.model.graph_clustering.disjoint.SENet_kmeans module
SENet Kmeans
- class egc.model.graph_clustering.disjoint.SENet_kmeans.SENetKmeans(feature: FloatTensor, labels: IntTensor, adj: FloatTensor, n_clusters: int, hidden0: int = 16, hidden1: int = 16, lr: float = 0.03, epochs: int = 50, weight_decay: float = 0.0, lam: float = 1.0, n_iter: int = 3)[source]
Bases: Base
SENet Kmeans.
- Parameters:
feature (FloatTensor) – node’s feature.
labels (IntTensor) – node’s label.
adj (FloatTensor) – graph's adjacency matrix.
n_clusters (int) – number of clusters.
hidden0 (int, optional) – hidden units size of gnn layer1. Defaults to 16.
hidden1 (int, optional) – hidden units size of gnn layer2. Defaults to 16.
lr (float, optional) – learning rate. Defaults to 3e-2.
epochs (int, optional) – number of embedding training epochs. Defaults to 50.
weight_decay (float, optional) – weight decay. Defaults to 0.0.
lam (float, optional) – used to construct the improved graph. Defaults to 1.0.
n_iter (int, optional) – number of feature convolution iterations. Defaults to 3.
egc.model.graph_clustering.disjoint.agc_kmeans module
AGC Kmeans
- class egc.model.graph_clustering.disjoint.agc_kmeans.AGC(adj: Tensor, feature: Tensor, labels: Tensor, epochs: int = 60, n_clusters: int = 7, rep: int = 10)[source]
Bases: Base
AGC Kmeans.
- Parameters:
feature (FloatTensor) – node’s feature.
labels (IntTensor) – node’s label.
adj (FloatTensor) – graph’s adjacency matrix
n_clusters (int) – number of clusters. Defaults to 7.
epochs (int, optional) – number of embedding training epochs. Defaults to 60.
rep (int, optional) – number of repetitions when computing intra(c). Defaults to 10.
egc.model.graph_clustering.disjoint.agcn module
AGCN implementation
- class egc.model.graph_clustering.disjoint.agcn.MLP_L(n_mlp)[source]
Bases: Module
MLP used to reduce the feature dimension.
- forward(mlp_in)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool
- class egc.model.graph_clustering.disjoint.agcn.MLP_1(n_mlp)[source]
Bases: Module
MLP used to reduce the feature dimension.
- forward(mlp_in)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool
- class egc.model.graph_clustering.disjoint.agcn.MLP_2(n_mlp)[source]
Bases: Module
MLP used to reduce the feature dimension.
- forward(mlp_in)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool
- class egc.model.graph_clustering.disjoint.agcn.MLP_3(n_mlp)[source]
Bases: Module
MLP used to reduce the feature dimension.
- forward(mlp_in)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool
- class egc.model.graph_clustering.disjoint.agcn.AGCN(graph: DGLGraph, X: FloatTensor, labels: IntTensor, n_input, n_clusters, hidden1: int = 500, hidden2: int = 500, hidden3: int = 2000, lr: float = 0.0001, epochs: int = 200, pretrain_lr: float = 0.001, pretrain_epochs: int = 100, n_z: int = 10, v: int = 1, gpu: int = 0)[source]
Bases: Module
- forward(graph, x)[source]
Calculate the distributions of p, q, and z.
- Parameters:
graph (dgl.DGLGraph) – graph
x (torch.FloatTensor) – node features
- Returns:
x_bar (torch.FloatTensor): node features after AE reconstruction
q (torch.FloatTensor): q-distribution
predict (torch.FloatTensor): z-distribution (label prediction)
p (torch.FloatTensor): p-distribution
- init_cluster_layer_parameter(features, n_init)[source]
Initialize the cluster center
- Parameters:
features (torch.FloatTensor) – node feature
n_init (int) – Number of kmeans iterations
- fit()[source]
Train the model.
- Returns:
the model's predicted labels
- Return type:
label_predict (ndarray)
- get_memberships()[source]
Get the predicted labels.
- Returns:
predicted cluster label of each node.
- training: bool
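Example: a minimal usage sketch. The toy graph, feature width, and cluster count are illustrative assumptions; the constructor and method signatures are those documented above.

    import dgl
    import torch
    from egc.model.graph_clustering.disjoint.agcn import AGCN

    graph = dgl.rand_graph(100, 500)            # toy graph (assumption)
    X = torch.randn(100, 32)                    # node features
    labels = torch.randint(0, 7, (100,)).int()  # ground-truth labels

    model = AGCN(graph, X, labels, n_input=32, n_clusters=7)
    label_predict = model.fit()            # ndarray of predicted labels
    memberships = model.get_memberships()  # same predictions after training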
egc.model.graph_clustering.disjoint.age_cluster module
AGE clustering model
- class egc.model.graph_clustering.disjoint.age_cluster.age_cluster(dims: list | None = None, feat_dim: int | None = None, gnnlayers_num: int = 3, linlayers_num: int = 1, lr: float = 0.001, upth_st: float = 0.0015, upth_ed: float = 0.001, lowth_st: float = 0.1, lowth_ed: float = 0.5, upd: float = 10, bs: int = 10000, epochs: int = 400, norm: str = 'sym', renorm: bool = True, estop_steps: int = 5, n_clusters: int | None = None)[source]
Bases: Base
AGE cluster implementation.
- Parameters:
dims (list, optional) – list of hidden layer dimensions.
feat_dim (int, optional) – input feature dimension.
gnnlayers_num (int) – number of gnn layers.
linlayers_num (int, optional) – number of hidden layers.
lr (float, optional) – learning rate. Defaults to 0.001.
upth_st (float, optional) – upper threshold start.
upth_ed (float, optional) – upper threshold end.
lowth_st (float, optional) – lower threshold start.
lowth_ed (float, optional) – lower threshold end.
upd (float, optional) – update epoch.
bs (int, optional) – batch size.
epochs (int, optional) – number of epochs to train.
norm (str, optional) – normalization mode of the Laplacian matrix.
renorm (bool, optional) – whether to apply the renormalization trick.
estop_steps (int, optional) – number of early_stop steps.
n_clusters (int, optional) – number of clusters.
egc.model.graph_clustering.disjoint.cc module
Contrastive Clustering
Adapted from https://github.com/Yunfan-Li/Contrastive-Clustering
- class egc.model.graph_clustering.disjoint.cc.ContrastiveClustering(in_feats: int, out_feats_list: List[int], n_clusters: int, aggregator_type: str = 'gcn', bias: bool = True, batch_size: int = 1024, instance_temperature: float = 0.5, cluster_temperature: float = 1.0, aug_types: List | None = None, n_epochs: int = 1000, lr: float = 0.001, l2_coef: float = 0.0, early_stopping_epoch: int = 20, model_filename: str = 'cc')[source]
Bases: Base, Module
- Parameters:
in_feats (int) – Input feature size.
out_feats_list (List[int]) – List of hidden units dimensions.
n_clusters (int) – Num of clusters.
aggregator_type (str, optional) – Aggregator type to use (mean, gcn, pool, lstm). Defaults to 'gcn'.
bias (bool, optional) – If True, adds a learnable bias to the output. Defaults to True.
batch_size (int, optional) – Batch size. Defaults to 1024.
instance_temperature (float, optional) – Instance Contrastive Head temperature. Defaults to 0.5.
cluster_temperature (float, optional) – Cluster Contrastive Head temperature. Defaults to 1.0.
aug_types (List, optional) – Augmentation types list. Defaults to [‘edge’, ‘edge’].
n_epochs (int, optional) – Maximum training epochs. Defaults to 1000.
lr (float, optional) – Learning Rate. Defaults to 0.001.
l2_coef (float, optional) – Weight decay. Defaults to 0.0.
early_stopping_epoch (int, optional) – Early stopping threshold. Defaults to 20.
model_filename (str, optional) – Path to store model parameters. Defaults to ‘cc’.
- forward(blocks_i, blocks_j)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- fit(graph: DGLGraph, device: device = device(type='cpu')) None[source]
- Parameters:
graph (dgl.DGLGraph) – graph.
device (torch.device, optional) – torch device. Defaults to torch.device(‘cpu’).
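Example: a minimal usage sketch. Storing the features in graph.ndata["feat"] is an assumption about the expected input format, as are the toy sizes; the constructor and fit signatures are those documented above.

    import dgl
    import torch
    from egc.model.graph_clustering.disjoint.cc import ContrastiveClustering

    graph = dgl.rand_graph(100, 500)
    graph.ndata["feat"] = torch.randn(100, 32)  # assumed feature field

    model = ContrastiveClustering(in_feats=32, out_feats_list=[256, 64],
                                  n_clusters=7)
    model.fit(graph, device=torch.device("cpu"))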
egc.model.graph_clustering.disjoint.clusternet module
ClusterNet
Paper: https://proceedings.neurips.cc/paper/2019/file/8bd39eae38511daad6152e84545e504d-Paper.pdf
Source Code: https://github.com/bwilder0/clusternet
- class egc.model.graph_clustering.disjoint.clusternet.ClusterNet(in_feats: int, out_feats_list: List[int], n_clusters: int, cluster_temp: float = 30, aggregator_type: str = 'gcn', bias: bool = True, dropout: float = 0.5, n_epochs: int = 1000, lr: float = 0.01, l2_coef: float = 1e-05, early_stopping_epoch: int = 20, model_filename: str = 'clusternet')[source]
Bases: Module
GCN ClusterNet. The ClusterNet architecture. The first step is a 2-layer GCN to generate embeddings. The output is the cluster means mu and soft assignments r, along with the embeddings and the node similarities (output only for debugging purposes).
The forward pass inputs are x, a feature matrix for the nodes, and adj, a sparse adjacency matrix. The optional parameter num_iter determines how many steps to run the k-means updates for.
- Parameters:
in_feats (int) – Input feature size.
out_feats_list (List[int]) – List of hidden units dimensions.
n_clusters (int) – Num of clusters.
cluster_temp (float, optional) – softmax temperature. Defaults to 30.
aggregator_type (str, optional) – Aggregator type to use (mean, gcn, pool, lstm). Defaults to 'gcn'.
bias (bool, optional) – If True, adds a learnable bias to the output. Defaults to True.
dropout (float, optional) – Percentage for dropping in GCN. Defaults to 0.5.
n_epochs (int, optional) – Maximum training epochs. Defaults to 1000.
lr (float, optional) – Learning Rate. Defaults to 0.01.
l2_coef (float, optional) – Weight decay. Defaults to 1e-05.
early_stopping_epoch (int, optional) – Early stopping threshold. Defaults to 20.
model_filename (str, optional) – Path to store model parameters. Defaults to ‘clusternet’.
- forward(blocks) Tuple[Tensor][source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- fit(graph: DGLGraph, num_iter: int = 10, device: device = device(type='cpu')) None[source]
- Parameters:
graph (dgl.DGLGraph) – graph.
num_iter (int, optional) – clustering iteration. Defaults to 10.
device (torch.device, optional) – torch device. Defaults to torch.device(‘cpu’).
- get_embedding(graph: DGLGraph, device: device = device(type='cpu')) Tensor[source]
Get the embeddings (graph or node level).
- Returns:
embedding.
- Return type:
(torch.Tensor)
- get_memberships(graph: DGLGraph, device: device = device(type='cpu')) Tensor[source]
Get the memberships.
- Parameters:
graph (dgl.DGLGraph) – dgl graph.
device (torch.device, optional) – torch device. Defaults to torch.device(‘cpu’).
- Returns:
Memberships.
- Return type:
torch.Tensor
- training: bool
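Example: a minimal usage sketch. The toy graph and the graph.ndata["feat"] feature field are assumptions; fit, get_embedding, and get_memberships are used with the signatures documented above.

    import dgl
    import torch
    from egc.model.graph_clustering.disjoint.clusternet import ClusterNet

    graph = dgl.rand_graph(100, 500)
    graph.ndata["feat"] = torch.randn(100, 32)  # assumed feature field

    model = ClusterNet(in_feats=32, out_feats_list=[64, 32], n_clusters=7)
    model.fit(graph, num_iter=10, device=torch.device("cpu"))
    embedding = model.get_embedding(graph)      # node embeddings
    memberships = model.get_memberships(graph)  # cluster assignments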
egc.model.graph_clustering.disjoint.daegc module
DAEGC implementation. Reference: https://github.com/kouyongqi/DAEGC
- class egc.model.graph_clustering.disjoint.daegc.DAEGC(num_features: int, hidden_size: int, embedding_size: int, alpha: float, num_clusters: int, pretrain_lr: float, lr: float, weight_decay: float, pre_epochs: int, epochs: int, update_interval: int, estop_steps: int, t: int, v: int = 1)[source]
Bases: Base, Module
- Parameters:
num_features (int) – input feature dimension.
hidden_size (int) – number of units in the hidden layer.
embedding_size (int) – number of output emb dim.
alpha (float) – Alpha for the leaky_relu.
num_clusters (int) – cluster num.
pretrain_lr (float) – learning rate of pretrain model.
lr (float) – learning rate of final model.
weight_decay (float) – weight decay.
pre_epochs (int) – number of pretraining epochs.
epochs (int) – number of training epochs for the final model.
update_interval (int) – update interval of DAEGC.
estop_steps (int) – Number of early_stop steps.
v (int, optional) – degrees of freedom of the Student's t-distribution. Defaults to 1.
- forward(x, adj, M)[source]
Forward Propagation
- Parameters:
x (torch.Tensor) – features of nodes
adj (torch.Tensor) – adj matrix
M (torch.Tensor) – the topological relevance of node j to node i up to t orders.
- Returns:
A_pred (torch.Tensor): reconstructed adjacency matrix
z (torch.Tensor): latent representation
q (torch.Tensor): soft assignments
- fit(adj, feats, label)[source]
Fitting a DAEGC model
- Parameters:
adj (sp.lil_matrix) – adj sparse matrix.
feats (torch.Tensor) – features.
label (torch.Tensor) – cluster label of each node.
- get_Q(z)[source]
Get the soft clustering assignment distribution.
- Parameters:
z (torch.Tensor) – node embedding
- Returns:
Soft assignments
- Return type:
torch.Tensor
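Example: a minimal usage sketch. The toy sparse adjacency, features, and hyperparameter values are illustrative assumptions; the constructor and fit signatures are those documented above.

    import scipy.sparse as sp
    import torch
    from egc.model.graph_clustering.disjoint.daegc import DAEGC

    adj = sp.lil_matrix(sp.random(100, 100, density=0.05))  # toy sparse adjacency
    feats = torch.randn(100, 32)
    labels = torch.randint(0, 7, (100,))

    model = DAEGC(num_features=32, hidden_size=256, embedding_size=16,
                  alpha=0.2, num_clusters=7, pretrain_lr=5e-3, lr=1e-4,
                  weight_decay=5e-3, pre_epochs=100, epochs=100,
                  update_interval=1, estop_steps=5, t=2)
    model.fit(adj, feats, labels)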
egc.model.graph_clustering.disjoint.danmf module
DANMF
Adapted from: https://github.com/benedekrozemberczki/DANMF
- class egc.model.graph_clustering.disjoint.danmf.DANMF(graph, args)[source]
Bases: object
Deep autoencoder-like non-negative matrix factorization class.
- Parameters:
graph – Networkx graph.
args – Arguments object.
egc.model.graph_clustering.disjoint.dfcn module
An implementation of “DFCN” from the AAAI’21 paper “Deep Fusion Clustering Network”.
An interdependency learning-based Structure and Attribute Information Fusion (SAIF) module is proposed to explicitly merge the representations learned by an autoencoder and a graph autoencoder for consensus representation learning.
Also, a reliable target distribution generation measure and a triplet self-supervision strategy, which facilitate cross-modality information exploitation, are designed for network training.
- class egc.model.graph_clustering.disjoint.dfcn.DFCN(graph: DGLGraph, data: Tensor, label: Tensor, n_clusters: int, n_node: int, device: device, args: Namespace)[source]
Bases: Module, Base
DFCN.
- Parameters:
graph (dgl.DGLGraph) – Graph data in dgl
data (torch.Tensor) – node’s features
label (torch.Tensor) – node’s label
n_clusters (int) – numbers of clusters
n_node (int, optional) – number of nodes. Defaults to None.
device (torch.device, optional) – device. Defaults to None.
args (argparse.Namespace) – all parameters
- forward()[source]
Forward Propagation
- Returns:
x_hat (torch.Tensor): reconstructed attribute matrix generated by the AE decoder
z_hat (torch.Tensor): reconstructed weighted attribute matrix generated by the IGAE decoder
adj_hat (torch.Tensor): reconstructed adjacency matrix generated by the IGAE decoder
q (torch.Tensor): soft assignment distribution of the fused representations
q1 (torch.Tensor): soft assignment distribution of IGAE
q2 (torch.Tensor): soft assignment distribution of AE
z_tilde (torch.Tensor): clustering embedding
- fit(epochs)[source]
Fitting a DFCN clustering model.
- Parameters:
epochs (int) – number of training epochs
- training: bool
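Example: a minimal usage sketch. DFCN takes its remaining hyperparameters through an argparse.Namespace; the fields it expects are defined by the project's argparser and are not listed on this page, so the Namespace below is left to be populated. The toy graph and sizes are assumptions.

    from argparse import Namespace
    import dgl
    import torch
    from egc.model.graph_clustering.disjoint.dfcn import DFCN

    graph = dgl.rand_graph(100, 500)
    data = torch.randn(100, 32)
    label = torch.randint(0, 7, (100,))
    # Populate with the DFCN hyperparameters from the project's argparser.
    args = Namespace()

    model = DFCN(graph, data, label, n_clusters=7, n_node=100,
                 device=torch.device("cpu"), args=args)
    model.fit(epochs=200)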
egc.model.graph_clustering.disjoint.dgc module
Deep Graph Clustering
- class egc.model.graph_clustering.disjoint.dgc.DGC(in_feats: int, out_feats_list: List[int], n_clusters: int, aggregator_type: str = 'gcn', bias: bool = True, encoder_act: List[str] | None = None, dropout: float = 0.0, batch_size: int = 1024, n_epochs: int = 1000, lr: float = 0.01, l2_coef: float = 0.0, early_stopping_epoch: int = 20, model_filename: str = 'dgc')[source]
Bases: Base, Module
Deep Graph Clustering.
- forward(blocks)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
egc.model.graph_clustering.disjoint.dgc_mlp module
Deep Graph Clustering
- class egc.model.graph_clustering.disjoint.dgc_mlp.DGC(in_feats: int, out_feats_list: List[int], n_clusters: int, classifier_hidden_list: List[int] | None = None, aggregator_type: str = 'gcn', bias: bool = True, encoder_act: List[str] | None = None, classifier_act: List[str] | None = None, dropout: float = 0.0, n_epochs: int = 1000, lr: float = 0.01, l2_coef: float = 0.0, early_stopping_epoch: int = 20, model_filename: str = 'dgc_mlp')[source]
Bases: Base, Module
Deep Graph Clustering.
- forward(blocks)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
egc.model.graph_clustering.disjoint.dgc_mlp_gsl module
Deep Graph Clustering
- class egc.model.graph_clustering.disjoint.dgc_mlp_gsl.DGC(in_feats: int, out_feats_list: List[int], n_clusters: int, classifier_hidden_list: List[int] | None = None, aggregator_type: str = 'gcn', bias: bool = True, k: int = 20, tau: float = 0.9999, encoder_act: List[str] | None = None, classifier_act: List[str] | None = None, dropout: float = 0.0, n_epochs: int = 1000, n_pretrain_epochs: int = 800, lr: float = 0.01, l2_coef: float = 0.0, early_stopping_epoch: int = 20, model_filename: str = 'dgc_mlp_gsl')[source]
Bases: Base, Module
Deep Graph Clustering.
- forward(blocks)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- pretrain(data_loader: NaiveDataLoader, adj_label: Tensor) None[source]
egc.model.graph_clustering.disjoint.dgi_kmeans module
DGI + Kmeans Graph Clustering
- class egc.model.graph_clustering.disjoint.dgi_kmeans.DGIKmeans(in_feats: int, out_feats_list: List[int], n_epochs: int = 10000, early_stopping_epoch: int = 20, batch_size: int = 1024, neighbor_sampler_fanouts: List[int] = -1, lr: float = 0.001, l2_coef: float = 0.0, activation: str = 'prelu', model_filename: str = 'dgi')[source]
Bases: Base
DGI + Kmeans.
- Parameters:
in_feats (int) – input feature dimension.
out_feats_list (List[int]) – List of hidden units dimensions.
n_epochs (int, optional) – number of embedding training epochs. Defaults to 10000.
early_stopping_epoch (int, optional) – early stopping threshold. Defaults to 20.
batch_size (int, optional) – batch size. Defaults to 1024.
neighbor_sampler_fanouts (List[int] or int, optional) –
List of neighbors to sample for each GNN layer, with the i-th element being the fanout for the i-th GNN layer. Defaults to -1.
If only a single integer is provided, DGL assumes that every layer will have the same fanout.
If -1 is provided on one layer, then all inbound edges will be included.
lr (float, optional) – learning rate. Defaults to 0.001.
l2_coef (float, optional) – weight decay. Defaults to 0.0.
activation (str) – activation of gcn layer. Defaults to prelu.
model_filename (str, optional) – path to save best model parameters. Defaults to dgi.
- fit(graph: DGLGraph, n_clusters: int, device: device = device(type='cpu'))[source]
Fit the model on a specific graph.
- Parameters:
graph (dgl.DGLGraph) – dgl graph.
n_clusters (int) – cluster num.
device (torch.device, optional) – torch device. Defaults to torch.device(‘cpu’).
- get_embedding(graph: DGLGraph, device: device = device(type='cpu'), model_filename: str | None = None) Tensor[source]
Get the embeddings.
- Parameters:
graph (dgl.DGLGraph) – dgl graph.
device (torch.device, optional) – torch device. Defaults to torch.device(‘cpu’).
model_filename (str, optional) – Model file to load. Defaults to None.
- Returns:
Embeddings.
- Return type:
torch.Tensor
- get_memberships(graph: DGLGraph, device: device = device(type='cpu'), model_filename: str | None = None) Tensor[source]
Get the memberships.
- Parameters:
graph (dgl.DGLGraph) – dgl graph.
device (torch.device, optional) – torch device. Defaults to torch.device(‘cpu’).
model_filename (str, optional) – Model file to load. Defaults to None.
- Returns:
Memberships.
- Return type:
torch.Tensor
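Example: a minimal usage sketch. The toy graph and the graph.ndata["feat"] feature field are assumptions; fit, get_embedding, and get_memberships are used with the signatures documented above.

    import dgl
    import torch
    from egc.model.graph_clustering.disjoint.dgi_kmeans import DGIKmeans

    graph = dgl.rand_graph(100, 500)
    graph.ndata["feat"] = torch.randn(100, 32)  # assumed feature field

    model = DGIKmeans(in_feats=32, out_feats_list=[512])
    model.fit(graph, n_clusters=7, device=torch.device("cpu"))
    embedding = model.get_embedding(graph)
    memberships = model.get_memberships(graph)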
egc.model.graph_clustering.disjoint.gae_kmeans module
gae_kmeans
- class egc.model.graph_clustering.disjoint.gae_kmeans.DGL_GAEKmeans(epochs: int, n_clusters: int, fead_dim: int, n_nodes: int, hidden_dim1: int = 32, dropout: float = 0.0, lr: float = 0.01, early_stop: int = 10, activation: str = 'relu')[source]
Bases: Base
GAE Kmeans implementation using DGL.
- Parameters:
epochs (int, optional) – number of embedding training epochs. Defaults to 200.
n_clusters (int) – cluster num.
fead_dim (int) – dim of features
n_nodes (int) – number of nodes
hidden_dim1 (int) – hidden units size of gcn_1. Defaults to 32.
dropout (float, optional) – Dropout rate (1 - keep probability). Defaults to 0.0.
lr (float, optional) – learning rate. Defaults to 0.01.
early_stop (int, optional) – early stopping threshold. Defaults to 10.
activation (str, optional) – activation of gcn layer_1. Defaults to ‘relu’.
egc.model.graph_clustering.disjoint.gala module
GALA
- class egc.model.graph_clustering.disjoint.gala.GALA(adj: Tensor, X: Tensor, lr: float = 0.0001, epochs: int = 1000, hidden1: int = 800, hidden2: int = 700, n_clusters: int = 7)[source]
Bases: Module
- forward()[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool
egc.model.graph_clustering.disjoint.gdcl module
Graph Debiased Contrastive Learning with Joint Representation Clustering
https://www.ijcai.org/proceedings/2021/0473.pdf
- class egc.model.graph_clustering.disjoint.gdcl.Readout[source]
Bases: Module
Readout layer.
- static forward(seq, msk)[source]
Forward Propagation
- Parameters:
seq (torch.Tensor) – features tensor.
msk (torch.Tensor) – node mask.
- Returns:
graph-level representation
- Return type:
(torch.Tensor)
- training: bool
- class egc.model.graph_clustering.disjoint.gdcl.GDCL(in_feats, n_clusters, n_h: int = 512, nb_epochs: int = 1500, lr: float = 5e-05, alpha=0.0001, mask_num: int = 100, batch_size: int = 4, update_interval: int = 10, model_filename: str = 'gdcl', beta: float = 0.001, weight_decay: float = 0.0, pt_n_h: int = 512, pt_model_filename: str = 'mvgrl', pt_nb_epochs: int = 3000, pt_patience: int = 20, pt_lr: float = 0.001, pt_weight_decay: float = 0.0, pt_sample_size: int = 2000, pt_batch_size: int = 4, sparse: bool = False, dataset: str = 'Citeseer', device: device = device(type='cpu'))[source]
Bases: Module
GDCL: Graph Debiased Contrastive Learning with Joint Representation Clustering.
- Parameters:
in_feats (int) – Input feature size.
n_clusters (int) – Num of clusters.
n_h (int) – hidden units dimension. Defaults to 512.
nb_epochs – number of GDCL training epochs. Defaults to 1500.
lr – learning rate of GDCL. Defaults to 0.00005.
alpha – alpha parameter of distribution. Defaults to 0.0001.
mask_num – mask number. Defaults to 100.
batch_size – batch size of GDCL. Defaults to 4.
update_interval – update interval of GDCL. Defaults to 10.
model_filename – model filename of GDCL. Defaults to ‘gdcl’.
beta – balance factor. Defaults to 0.001.
weight_decay – weight decay of GDCL. Defaults to 0.0.
pt_n_h – hidden units dimension of pretrained MVGRL. Defaults to 512.
pt_model_filename – model filename of pretrained MVGRL. Defaults to ‘mvgrl’.
pt_nb_epochs – epoch number of pretrained MVGRL. Defaults to 3000.
pt_patience – patience of pretrained MVGRL. Defaults to 20.
pt_lr – learning rate of pretrained MVGRL. Defaults to 0.001.
pt_weight_decay – weight decay of pretrained MVGRL. Defaults to 0.0.
pt_sample_size – sample size of pretrained MVGRL. Defaults to 2000.
pt_batch_size – batch size of pretrained MVGRL. Defaults to 4.
sparse – if sparse. Defaults to False.
dataset – dataset name. Defaults to ‘Citeseer’.
device – device. Defaults to torch.device(‘cpu’).
- embed(seq, adj, diff, sparse)[source]
Embed.
- Parameters:
seq (torch.Tensor) – features of the raw graph
adj (torch.Tensor) – adjacency matrix of the raw graph
diff (torch.Tensor) – PPR matrix of the diffused graph
sparse (bool) – whether inputs are sparse
- Returns:
node embedding
- Return type:
(torch.Tensor)
- forward(bf, mask_fts, bd, sparse)[source]
Forward Propagation
- Parameters:
bf (torch.Tensor) – features of the raw graph
mask_fts (torch.Tensor) – masked features
bd (torch.Tensor) – PPR matrix of the diffused graph
sparse (bool) – whether inputs are sparse
- Returns:
h_mask (torch.Tensor): node embedding of the masked-features graph
h (torch.Tensor): node embedding of the raw graph
q (torch.Tensor): soft assignment
- fit(graph, labels)[source]
Fitting
- Parameters:
graph (dgl.DGLGraph) – graph.
labels (torch.Tensor) – label of each node
- get_embedding()[source]
Get the embeddings (graph or node level).
- Returns:
(torch.Tensor): embedding of each node
(torch.Tensor): embedding of the graph representations
- training: bool
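Example: a minimal usage sketch. The toy graph, the graph.ndata["feat"] feature field, and reliance on the constructor defaults are assumptions; fit and get_embedding are used with the signatures documented above.

    import dgl
    import torch
    from egc.model.graph_clustering.disjoint.gdcl import GDCL

    graph = dgl.rand_graph(100, 500)
    graph.ndata["feat"] = torch.randn(100, 32)  # assumed feature field
    labels = torch.randint(0, 7, (100,))

    model = GDCL(in_feats=32, n_clusters=7)  # other arguments keep their defaults
    model.fit(graph, labels)
    embeddings = model.get_embedding()  # node- and graph-level embeddings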
egc.model.graph_clustering.disjoint.gmi_kmeans module
GMI Kmeans Graph Clustering
- class egc.model.graph_clustering.disjoint.gmi_kmeans.GMIKmeans(in_features: int, hidden_units: int = 512, n_epochs: int = 550, early_stopping_epoch: int = 20, lr: float = 0.001, l2_coef: float = 0.0, alpha: float = 0.8, beta: float = 1.0, gamma: float = 1.0, activation: str = 'prelu', gcn_depth: int = 2)[source]
Bases: Base
GMI Kmeans.
- Parameters:
in_features (int) – input feature dimension.
hidden_units (int, optional) – hidden units size of gcn. Defaults to 512.
n_epochs (int, optional) – number of embedding training epochs. Defaults to 550.
early_stopping_epoch (int, optional) – early stopping threshold. Defaults to 20.
lr (float, optional) – learning rate. Defaults to 0.001.
l2_coef (float, optional) – weight decay. Defaults to 0.0.
alpha (float, optional) – parameter for I(h_i; x_i). Defaults to 0.8.
beta (float, optional) – parameter for I(h_i; x_j). Defaults to 1.0.
gamma (float, optional) – parameter for I(w_ij; a_ij). Defaults to 1.0.
activation (str, optional) – activation of gcn layer. Defaults to “prelu”.
- fit(features_lil: lil_matrix, adj_csr: csr_matrix, n_clusters: int, neg_list_num: int = 5)[source]
Fit the model on a specific graph.
- Parameters:
features_lil (sp.lil_matrix) – 2D sparse features.
adj_csr (sp.csr_matrix) – 2D sparse adjacency matrix.
n_clusters (int) – cluster num.
neg_list_num (int, optional) – negative sample times. Defaults to 5.
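Example: a minimal usage sketch. The random sparse features and adjacency are toy assumptions; the constructor and fit signatures are those documented above.

    import numpy as np
    import scipy.sparse as sp
    from egc.model.graph_clustering.disjoint.gmi_kmeans import GMIKmeans

    features = sp.lil_matrix(np.random.rand(100, 32))      # 2D sparse features
    adj = sp.random(100, 100, density=0.05, format="csr")  # 2D sparse adjacency

    model = GMIKmeans(in_features=32)
    model.fit(features, adj, n_clusters=7, neg_list_num=5)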
egc.model.graph_clustering.disjoint.idec module
DEC / IDEC
Paper: Unsupervised Deep Embedding for Clustering Analysis
Code of the paper author: https://github.com/piiswrong/dec
Code for reference: https://github.com/XifengGuo/IDEC
- class egc.model.graph_clustering.disjoint.idec.IDEC(in_feats: int, out_feats_list: List[int], n_clusters: int, aggregator_type: str = 'gcn', bias: bool = True, batch_size: int = 1024, alpha: float = 1.0, beta: float = 10.0, n_epochs: int = 1000, n_pretrain_epochs: int = 400, lr: float = 0.001, l2_coef: float = 0.0, early_stopping_epoch: int = 20, model_filename: str = 'dec')[source]
Bases: Base, Module
DEC / IDEC. Set beta to 0.0 for DEC or to a nonzero value for IDEC.
- Parameters:
in_feats (int) – Input feature size.
out_feats_list (List[int]) – List of hidden units dimensions.
n_clusters (int) – Num of clusters.
aggregator_type (str, optional) – Aggregator type to use (mean, gcn, pool, lstm). Defaults to 'gcn'.
bias (bool, optional) – If True, adds a learnable bias to the output. Defaults to True.
batch_size (int, optional) – Batch size. Defaults to 1024.
alpha (float, optional) – Alpha of student-T distribution. Defaults to 1.0.
beta (float, optional) – Coefficient of reconstruction loss: 0.0 for DEC, nonzero for IDEC. Defaults to 10.0.
n_epochs (int, optional) – Maximum training epochs. Defaults to 1000.
n_pretrain_epochs (int, optional) – Maximum pretraining epochs. Defaults to 400.
lr (float, optional) – Learning Rate. Defaults to 0.001.
l2_coef (float, optional) – Weight decay. Defaults to 0.0.
early_stopping_epoch (int, optional) – Early stopping threshold. Defaults to 20.
model_filename (str, optional) – Path to store model parameters. Defaults to ‘dec’.
- clustering(h: Tensor, device: device = device(type='cpu')) None[source]
Clustering by MiniBatchKMeans.
- Parameters:
h (torch.Tensor) – features.
device (torch.device, optional) – torch device. Defaults to torch.device(‘cpu’).
- get_distance(h: Tensor) Tensor[source]
Get the sum of distances from all points to each center.
- Parameters:
h (torch.Tensor) – features.
- Returns:
sum of distances from all points to each center.
- Return type:
distance (torch.Tensor)
- get_t_distribution(h: Tensor) Tuple[Tensor, Tensor][source]
Student t-distribution, the same as used in the t-SNE algorithm: q_ij = 1/(1+dist(x_i, u_j)^2), then normalized.
- Parameters:
h (torch.Tensor) – features.
- Returns:
(distance, q)
- Return type:
Tuple[torch.Tensor, torch.Tensor]
- pretrain(train_loader: NaiveDataLoader, features: Tensor) None[source]
- forward(blocks) Tuple[Tensor, Tensor][source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- fit(graph: DGLGraph, device: device = device(type='cpu')) None[source]
- Parameters:
graph (dgl.DGLGraph) – graph.
device (torch.device, optional) – torch device. Defaults to torch.device(‘cpu’).
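Example: a minimal usage sketch. The toy graph and the graph.ndata["feat"] feature field are assumptions; the constructor and fit signatures are those documented above.

    import dgl
    import torch
    from egc.model.graph_clustering.disjoint.idec import IDEC

    graph = dgl.rand_graph(100, 500)
    graph.ndata["feat"] = torch.randn(100, 32)  # assumed feature field

    model = IDEC(in_feats=32, out_feats_list=[256, 16], n_clusters=7,
                 beta=10.0)  # beta=0.0 would recover plain DEC
    model.fit(graph, device=torch.device("cpu"))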
egc.model.graph_clustering.disjoint.mnmf module
MNMF implementation
- class egc.model.graph_clustering.disjoint.mnmf.MNMF(dimensions=128, clusters=10, lambd=0.2, alpha=0.05, beta=0.05, iterations=200, lower_control=1e-15, eta=5.0)[source]
Bases: Base
An implementation of "M-NMF" from the AAAI '17 paper "Community Preserving Network Embedding". The procedure uses joint non-negative matrix factorization with modularity based regularization in order to learn a cluster membership distribution over nodes. The method can be used in an overlapping and non-overlapping way.
- Parameters:
dimensions (int) – Number of dimensions. Default is 128.
clusters (int) – Number of clusters. Default is 10.
lambd (float) – KKT penalty. Default is 0.2.
alpha (float) – Clustering penalty. Default is 0.05.
beta (float) – Modularity regularization penalty. Default is 0.05.
iterations (int) – Number of power iterations. Default is 200.
lower_control (float) – Floating point overflow control. Default is 10**-15.
eta (float) – Similarity mixing parameter. Default is 5.0.
- get_memberships()[source]
Getting the cluster membership of nodes.
- Return type:
memberships (dict) – Node cluster memberships.
- get_embedding()[source]
Getting the node embedding.
- Return type:
embedding (Numpy array) – The embedding of nodes.
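Example: a construction sketch. Only the constructor and the two getters above are documented on this page; the training entry point is not shown here, so it is left as a comment.

    from egc.model.graph_clustering.disjoint.mnmf import MNMF

    model = MNMF(dimensions=128, clusters=10, lambd=0.2,
                 alpha=0.05, beta=0.05, iterations=200)
    # ... train the model on a graph (entry point not documented on this page) ...
    memberships = model.get_memberships()  # dict of node cluster memberships
    embedding = model.get_embedding()      # numpy array of node embeddings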
egc.model.graph_clustering.disjoint.pca_kmeans module
pca_kmeans
- egc.model.graph_clustering.disjoint.pca_kmeans.pca_kmeans(X: ndarray, n_clusters: int, n_components: int | None = None) ndarray[source]
Principal component analysis (PCA) followed by k-means clustering.
- Parameters:
X (np.ndarray) – array-like of shape (n_samples, n_features) Training data, where n_samples is the number of samples and n_features is the number of features.
n_clusters (int) – num of clusters.
n_components (int or float or str) – Number of components to keep. Defaults to None.
- Returns:
Community memberships.
- Return type:
np.ndarray
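Example: a minimal usage sketch with toy random data; the function signature is exactly the one documented above.

    import numpy as np
    from egc.model.graph_clustering.disjoint.pca_kmeans import pca_kmeans

    X = np.random.rand(100, 32)  # (n_samples, n_features)
    memberships = pca_kmeans(X, n_clusters=7, n_components=16)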
egc.model.graph_clustering.disjoint.sdcn module
SDCN implementation
- class egc.model.graph_clustering.disjoint.sdcn.SDCN(graph: DGLGraph, X: FloatTensor, labels: IntTensor, n_input, n_clusters, hidden1: int = 500, hidden2: int = 500, hidden3: int = 200, lr: float = 0.0001, epochs: int = 200, pretrain_lr: float = 0.0005, pretrain_epochs: int = 100, n_z: int = 10, v: int = 1, gpu: int = 0)[source]
Bases: Module
- forward(graph, x)[source]
Calculate the distributions of p, q, and z.
- Parameters:
graph (dgl.DGLGraph) – graph
x (torch.FloatTensor) – node features
- Returns:
x_bar (torch.FloatTensor): node features after AE reconstruction
q (torch.FloatTensor): q-distribution
predict (torch.FloatTensor): z-distribution (label prediction)
p (torch.FloatTensor): p-distribution
- init_cluster_layer_parameter(features, n_init)[source]
Initialize the cluster center
- Parameters:
features (torch.FloatTensor) – node feature
n_init (int) – Number of kmeans iterations
- fit()[source]
Train the model.
- Returns:
the model's predicted labels
- Return type:
label_predict (ndarray)
- get_memberships()[source]
Get the predicted labels.
- Returns:
predicted cluster label of each node.
- training: bool
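Example: a minimal usage sketch, analogous to the AGCN example above. The toy graph and sizes are assumptions; the constructor and fit signatures are those documented above.

    import dgl
    import torch
    from egc.model.graph_clustering.disjoint.sdcn import SDCN

    graph = dgl.rand_graph(100, 500)
    X = torch.randn(100, 32)
    labels = torch.randint(0, 7, (100,)).int()

    model = SDCN(graph, X, labels, n_input=32, n_clusters=7)
    label_predict = model.fit()  # ndarray of predicted labels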
egc.model.graph_clustering.disjoint.sgc_kmeans module
sgc_kmeans
- class egc.model.graph_clustering.disjoint.sgc_kmeans.SGCKmeans(in_feats: int, n_epochs: int = 400, hidden_units: ~typing.List = [500], lr: float = 0.01, early_stop: int = 10, inner_act: ~typing.Callable = <function SGCKmeans.<lambda>>, n_lin_layers: int = 1, n_gnn_layers: int = 10)[source]
Bases: Base
SGC Kmeans implementation using DGL.
- Parameters:
in_feats (int) – input feature dimension.
n_epochs (int, optional) – number of embedding training epochs. Defaults to 400.
hidden_units (List, optional) – hidden layer sizes. Defaults to [500].
lr (float, optional) – learning rate. Defaults to 0.01.
early_stop (int, optional) – early stopping threshold. Defaults to 10.
inner_act (Callable, optional) – inner activation function.
n_lin_layers (int, optional) – number of linear layers. Defaults to 1.
n_gnn_layers (int, optional) – number of gnn layers. Defaults to 10.
egc.model.graph_clustering.disjoint.vgae_kmeans module
vgae_kmeans
- class egc.model.graph_clustering.disjoint.vgae_kmeans.DGL_VGAEKmeans(epochs: int, n_clusters: int, fead_dim: int, n_nodes: int, hidden_dim1: int = 32, hidden_dim2: int = 16, dropout: float = 0.0, lr: float = 0.01, early_stop: int = 10, activation: str = 'relu')[source]
Bases: Base
VGAE Kmeans implementation using DGL.
- Parameters:
epochs (int, optional) – number of embedding training epochs. Defaults to 200.
n_clusters (int) – cluster num.
fead_dim (int) – dim of features
n_nodes (int) – number of nodes
hidden_dim1 (int) – hidden units size of gcn_1. Defaults to 32.
hidden_dim2 (int) – hidden units size of gcn_2. Defaults to 16.
dropout (float, optional) – Dropout rate (1 - keep probability). Defaults to 0.0.
lr (float, optional) – learning rate. Defaults to 0.01.
early_stop (int, optional) – early stopping threshold. Defaults to 10.
activation (str, optional) – activation of gcn layer_1. Defaults to ‘relu’.
- class egc.model.graph_clustering.disjoint.vgae_kmeans.VGAEKmeans(in_features: int, hidden_units_1: int = 32, hidden_units_2: int = 16, n_epochs: int = 400, early_stopping_epoch: int = 20, lr: float = 0.001, l2_coef: float = 0.0, activation: str = 'relu', model_filename: str = 'vgae_kmeans')[source]
Bases: Base
VGAE Kmeans.
- Parameters:
in_features (int) – input feature dimension.
hidden_units_1 (int, optional) – gcn_1 hidden units. Defaults to 32.
hidden_units_2 (int, optional) – gcn_2 hidden units. Defaults to 16.
n_epochs (int, optional) – node embedding epochs. Defaults to 400.
early_stopping_epoch (int, optional) – early stopping epoch number. Defaults to 20.
lr (float, optional) – learning rate. Defaults to 0.001.
l2_coef (float, optional) – l2 weight decay. Defaults to 0.0.
activation (str, optional) – activation of gcn layer. Defaults to ‘relu’.
model_filename (str, optional) – path to save model parameters. Defaults to 'vgae_kmeans'.
- fit(features_lil: lil_matrix, adj_csr: csr_matrix, n_clusters: int)[source]
Fit the model on a specific graph.
- Parameters:
features_lil (sp.lil_matrix) – 2D sparse features.
adj_csr (sp.csr_matrix) – 2D sparse adjacency matrix.
n_clusters (int) – cluster num.
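Example: a minimal usage sketch. The random sparse features and adjacency are toy assumptions; the constructor and fit signatures are those documented above.

    import numpy as np
    import scipy.sparse as sp
    from egc.model.graph_clustering.disjoint.vgae_kmeans import VGAEKmeans

    features = sp.lil_matrix(np.random.rand(100, 32))      # 2D sparse features
    adj = sp.random(100, 100, density=0.05, format="csr")  # 2D sparse adjacency

    model = VGAEKmeans(in_features=32)
    model.fit(features, adj, n_clusters=7)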
egc.model.graph_clustering.disjoint.vgaecd module
VGAECD
- class egc.model.graph_clustering.disjoint.vgaecd.VGAECD(in_features: int, n_clusters: int, alpha: float = 25.0, beta: float = 1.0, hidden_units_1: int = 32, hidden_units_2: int = 16, n_epochs: int = 800, early_stopping_epoch: int = 20, n_epochs_pretrain: int = 200, lr: float = 0.01, l2_coef: float = 0.0, activation: str = 'relu')[source]
Bases: Base, Module
- Parameters:
in_features (int) – input feature dimension.
n_clusters (int) – cluster num.
alpha (float) – coefficient of reconstruction loss. Defaults to 25.0.
beta (float) – coefficient of the loss except reconstruction loss. Defaults to 1.0.
hidden_units_1 (int) – hidden units size of gcn_1. Defaults to 32.
hidden_units_2 (int) – hidden units size of gcn_2. Defaults to 16.
n_epochs (int, optional) – number of embedding training epochs. Defaults to 800.
n_epochs_pretrain (int, optional) – number of pretraining epochs. Defaults to 200.
early_stopping_epoch (int, optional) – early stopping threshold. Defaults to 20.
lr (float, optional) – learning rate. Defaults to 0.01.
l2_coef (float, optional) – weight decay. Defaults to 0.0.
activation (str, optional) – activation of gcn layer_1. Defaults to ‘relu’.
- forward() Tuple[Tensor, Tensor, Tensor][source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- fit(features: lil_matrix, adj_orig: csr_matrix) None[source]
- Parameters:
features (sp.lil_matrix) – 2D sparse features.
adj_orig (sp.csr_matrix) – 2D sparse adj.
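Example: a minimal usage sketch. The random sparse features and adjacency are toy assumptions; the constructor and fit signatures are those documented above.

    import numpy as np
    import scipy.sparse as sp
    from egc.model.graph_clustering.disjoint.vgaecd import VGAECD

    features = sp.lil_matrix(np.random.rand(100, 32))      # 2D sparse features
    adj = sp.random(100, 100, density=0.05, format="csr")  # 2D sparse adjacency

    model = VGAECD(in_features=32, n_clusters=7)
    model.fit(features, adj)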
Module contents
Graph Clustering Models