egc.utils package

Subpackages

Submodules

egc.utils.ComE_utils module

Utils for ComE model

egc.utils.ComE_utils.chunkize_serial(iterable, chunksize, as_numpy=False)[source]

Return elements from the iterable in chunksize-ed lists. The last returned element may be smaller (if length of collection is not divisible by chunksize).

>>> print(list(grouper(range(10), 3)))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
egc.utils.ComE_utils.prepare_sentences(model, paths)[source]
Parameters:
  • model – current model containing the vocabulary and the index

  • paths – list of random walks; each node is translated to its vocabulary index and dropout is applied

Returns:

generator over the paths, with dropout applied and nodes mapped to the correct indices

egc.utils.ComE_utils.batch_generator(iterable, batch_size=1)[source]

Same as chunkize_serial, but without using an infinite while loop.

Parameters:
  • iterable – list that we want to convert in batches

  • batch_size – batch size
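The batching behavior described above can be sketched in a few lines. This is a hypothetical reimplementation for illustration, not the library code:

```python
def batch_generator(iterable, batch_size=1):
    # Walk the list in strides of batch_size and yield each slice;
    # the final batch may be shorter than batch_size.
    for i in range(0, len(iterable), batch_size):
        yield iterable[i:i + batch_size]
```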

class egc.utils.ComE_utils.RepeatCorpusNTimes(corpus, n)[source]

Bases: object

Class used to repeat n-times the same corpus of paths

Parameters:
  • corpus – list of paths that we want to repeat

  • n – number of times we want to repeat our corpus

class egc.utils.ComE_utils.Vocab(**kwargs)[source]

Bases: object

A single vocabulary item, used internally for constructing binary trees (incl. both word leaves and inner nodes).

egc.utils.ComE_utils.xavier_normal(size, as_type=<class 'numpy.float32'>, gain=1)[source]
class egc.utils.ComE_utils.WriteWalksToDisk[source]

Bases: object

Used for writing random walks to disk

write_walks_to_disk(G, filebase, num_paths, path_length, alpha=0, rand=<random.Random object>, num_workers=56)[source]

Save the random walks to files so that the walks need not be recomputed on every execution

Parameters:
  • G – graph to walk on

  • filebase – location where the final walks are saved

  • num_paths – number of walks per node

  • path_length – length of each walk

  • alpha – restart probability for the random walks

  • rand – random number generator

  • num_workers – number of threads used to execute the job

Returns:

egc.utils.ComE_utils.combine_files_iter(file_list)[source]
egc.utils.ComE_utils.count_lines(f)[source]
egc.utils.ComE_utils.build_deepwalk_corpus_iter(G, num_paths, path_length, alpha=0, rand=<random.Random object>)[source]
egc.utils.ComE_utils.count_textfiles(files, workers=1)[source]
egc.utils.ComE_utils.count_words(file)[source]

Counts the word frequencies in a list of sentences.

Note

This is a helper function for parallel execution of Vocabulary.from_text method.

egc.utils.ComE_utils.grouper(n, iterable, padvalue=None)[source]

grouper(3, 'abcdefg', 'x') --> ('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'x', 'x')
egc.utils.ComE_utils.judgeExist(utils_dir)[source]
egc.utils.ComE_utils.initComeEnv()[source]
egc.utils.ComE_utils.getFile(build_dir)[source]

egc.utils.SEComm_utils module

SEComm utils

egc.utils.SEComm_utils.enhance_sim_matrix(C: ndarray, K: int, d: int, alpha: float) ndarray[source]

Enhance similarity matrix.

Parameters:
  • C (np.ndarray) – coefficient matrix.

  • K (int) – number of clusters.

  • d (int) – dimension of each subspace.

  • alpha (float) – coefficient.

Returns:

enhanced similarity matrix

Return type:

np.ndarray
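The enhancement step itself is not spelled out here. A common post-processing in subspace clustering (symmetrize the coefficient matrix, keep the top K*d+1 singular directions, row-normalize, sharpen with alpha) can be sketched as follows; this is an assumption about the method, not the library's exact code:

```python
import numpy as np

def enhance_sim_matrix(C, K, d, alpha):
    # Assumed post-processing: symmetrize the coefficient matrix,
    # project onto the top K*d+1 singular directions, row-normalize,
    # and sharpen entries with exponent alpha.
    C = 0.5 * (np.abs(C) + np.abs(C).T)
    r = min(d * K + 1, C.shape[0])
    U, S, _ = np.linalg.svd(C)
    U = U[:, :r] * np.sqrt(S[:r])
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
    L = np.abs(U @ U.T)
    return (L / L.max()) ** alpha
```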

egc.utils.SEComm_utils.drop_feature(x, drop_prob)[source]
egc.utils.SEComm_utils.dropout_adj0(g, num_nodes, p=0.5)[source]
egc.utils.SEComm_utils.repeat(n_times)[source]
egc.utils.SEComm_utils.prob_to_one_hot(y_pred)[source]
egc.utils.SEComm_utils.print_statistics(statistics, function_name)[source]
egc.utils.SEComm_utils.label_classification(embeddings, y, ratio)[source]

egc.utils.argparser module

Parse All Model Args

egc.utils.argparser.models: Dict = {
    'AGC': {'description': 'AGC', 'name': 'AGC', 'paper url': 'https://dl.acm.org/doi/abs/10.1145/3474085.3475276', 'source code': 'https://github.com/karenlatong/AGC-master'},
    'AGCN': {'description': 'AGCN', 'name': 'AGCN', 'paper url': '', 'source code': 'https://github.com/ZhihaoPENG-CityU/MM21---AGCN'},
    'AGE': {'description': 'AGE', 'name': 'AGE', 'paper url': 'https://dl.acm.org/doi/pdf/10.1145/3394486.3403140', 'source code': 'https://github.com/thunlp/AGE'},
    'ComE': {'description': 'ComE', 'name': 'ComE', 'paper url': 'https://dl.acm.org/doi/pdf/10.1145/3132847.3132925', 'source code': 'https://github.com/andompesta/ComE'},
    'CommunityGAN': {'description': 'CommunityGAN', 'name': 'CommunityGAN', 'paper url': 'https://dl.acm.org/doi/pdf/10.1145/3308558.3313564', 'source code': 'https://github.com/SamJia/CommunityGAN'},
    'DAEGC': {'description': 'DAEGC', 'name': 'DAEGC', 'paper url': 'https://www.ijcai.org/Proceedings/2019/0509.pdf', 'source code': 'https://github.com/Tiger101010/DAEGC'},
    'DANMF': {'description': 'DANMF', 'name': 'DANMF', 'paper url': 'https://dl.acm.org/doi/pdf/10.1145/3269206.3271697', 'source code': 'https://github.com/benedekrozemberczki/DANMF'},
    'DFCN': {'description': 'DFCN', 'name': 'DFCN', 'paper url': 'https://arxiv.org/pdf/2012.09600.pdf', 'source code': 'https://github.com/WxTu/DFCN'},
    'GALA': {'description': 'GALA', 'name': 'GALA', 'paper url': 'https://arxiv.org/pdf/1908.02441v1.pdf', 'source code': 'https://github.com/sseung0703/GALA_TF2.0'},
    'GDCL': {'description': 'GDCL', 'name': 'GDCL', 'paper url': 'https://www.ijcai.org/proceedings/2021/0473.pdf', 'source code': 'https://github.com/hzhao98/GDCL'},
    'MNMF': {'description': 'MNMF', 'name': 'MNMF', 'paper url': 'https://ojs.aaai.org/index.php/AAAI/article/view/10488', 'source code': 'https://github.com/AnryYang/M-NMF'},
    'MVGRL': {'description': 'MVGRL', 'name': 'MVGRL', 'paper url': 'https://arxiv.org/abs/2006.05582', 'source code': 'https://github.com/kavehhassani/mvgrl'},
    'SDCN': {'description': 'SDCN', 'name': 'SDCN', 'paper url': 'https://arxiv.org/pdf/2002.01633.pdf', 'source code': 'https://github.com/bdy9527/SDCN'},
    'SEComm': {'description': 'SEComm', 'name': 'SEComm', 'paper url': 'https://proceedings.mlr.press/v161/bandyopadhyay21a/bandyopadhyay21a.pdf', 'source code': 'https://github.com/viz27/SEComm'},
    'SENet_kmeans': {'description': 'SENet with kmeans', 'name': 'SENet', 'paper url': 'https://www.sciencedirect.com/science/article/pii/S0893608021002227?via%3Dihub', 'source code': ''},
    'SUBLIME': {'description': 'SUBLIME', 'name': 'SUBLIME', 'paper url': 'https://arxiv.org/pdf/2201.06367.pdf', 'source code': 'https://github.com/GRAND-Lab/SUBLIME'},
    'VGAECD': {'description': 'VGAECD', 'name': 'VGAECD', 'paper url': 'https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8594831', 'source code': ''},
    'cc': {'description': 'Contrastive Clustering', 'name': 'CC', 'paper url': 'https://arxiv.org/pdf/2009.09687.pdf', 'source code': 'https://github.com/Yunfan-Li/Contrastive-Clustering'},
    'clusternet': {'description': 'ClusterNet', 'name': 'clusternet', 'paper url': 'https://arxiv.org/abs/1905.13732', 'source code': 'https://github.com/bwilder0/clusternet'},
    'dgi_kmeans': {'description': 'DGI with Kmeans', 'name': 'DGI', 'paper url': 'https://arxiv.org/abs/1809.10341', 'source code': 'https://github.com/PetarV-/DGI'},
    'gae_kmeans': {'description': 'GAE with Kmeans', 'name': 'GAE', 'paper url': 'https://arxiv.org/pdf/1611.07308.pdf', 'source code': 'https://github.com/tkipf/gae'},
    'gmi_kmeans': {'description': 'GMI with Kmeans', 'name': 'GMI', 'paper url': 'https://arxiv.org/pdf/1809.10341.pdf', 'source code': 'https://github.com/zpeng27/GMI'},
    'idec': {'description': 'IDEC', 'name': 'idec', 'paper url': 'https://dl.acm.org/doi/10.5555/3045390.3045442', 'source code': 'https://github.com/piiswrong/dec'},
    'pca_kmeans': {'description': 'PCA with Kmeans.', 'name': 'PCA', 'paper url': '', 'source code': ''},
    'sgc_kmeans': {'description': 'SGC with Kmeans.', 'name': 'SGC', 'paper url': 'https://arxiv.org/pdf/1902.07153.pdf', 'source code': 'https://github.com/Tiiiger/SGC'},
    'vgae_kmeans': {'description': 'VGAE with Kmeans', 'name': 'VGAE', 'paper url': 'https://arxiv.org/pdf/1611.07308.pdf', 'source code': 'https://github.com/tkipf/gae'}}

Info of the models supported.

egc.utils.argparser.parse_all_args() Namespace[source]
egc.utils.argparser.get_default_args(model: str) Dict[source]

Get default args of any model supported.

Parameters:

model (str) – name of the model.

Returns:

the default args of the model.

Return type:

Dict

egc.utils.clustering module

Clustering Methods.

egc.utils.clustering.sk_clustering(X: Tensor, n_clusters: int, name: str = 'kmeans') ndarray[source]

sklearn clustering.

Parameters:
  • X (torch.Tensor) – data embeddings.

  • n_clusters (int) – num of clusters.

  • name (str, optional) – clustering method name. Defaults to ‘kmeans’.

Raises:

NotImplementedError – clustering method not implemented.

Returns:

cluster assignments.

Return type:

np.ndarray

egc.utils.clustering.soft_kmeans_clustering(data: Tensor, miu: Tensor, num_iter: int = 1, cluster_temp: float = 5, dist_type: str = 'cosine_similarity') Tuple[Tensor, Tensor, Tensor][source]

PyTorch (differentiable) implementation of soft k-means clustering.

Parameters:
  • data (torch.Tensor) – data embeddings.

  • miu (torch.Tensor, optional) – cluster centers.

  • num_iter (int, optional) – num of iterations. Defaults to 1.

  • cluster_temp (float, optional) – softmax temperature. Defaults to 5.

  • dist_type (str, optional) – distance type. Defaults to ‘cosine_similarity’.

Returns:

[cluster_centers, soft_assignment_matrix, distance]

Return type:

Tuple[torch.Tensor, torch.Tensor, torch.Tensor]
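A non-differentiable numpy sketch of the same soft k-means loop (the library version operates on torch tensors and stays differentiable): points are assigned softly via temperature-scaled cosine similarity, then centers are recomputed as assignment-weighted means.

```python
import numpy as np

def soft_kmeans(data, mu, num_iter=1, cluster_temp=5.0):
    # Hedged numpy sketch of soft k-means with cosine-similarity
    # assignments; `mu` holds the initial cluster centers.
    for _ in range(num_iter):
        dn = data / np.linalg.norm(data, axis=1, keepdims=True)
        mn = mu / np.linalg.norm(mu, axis=1, keepdims=True)
        dist = dn @ mn.T                               # n x k similarities
        e = np.exp(cluster_temp * dist)
        r = e / e.sum(axis=1, keepdims=True)           # soft assignments
        mu = (r.T @ data) / r.sum(axis=0)[:, None]     # weighted centers
    return mu, r, dist
```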

egc.utils.common module

common utils

egc.utils.common.sparse_mx_to_torch_sparse_tensor(sparse_mx: spmatrix) Tensor[source]

Convert a scipy sparse matrix to a torch sparse tensor

Parameters:

sparse_mx (scipy.sparse.spmatrix) – sparse matrix

Returns:

torch sparse tensor

Return type:

(torch.Tensor)

egc.utils.common.MF(X, dim, name='PCA')[source]
egc.utils.common.tab_printer(args: Dict, thead: List[str] | None = None) None[source]

Function to print the logs in a nice tabular format.

Parameters:

args (Dict) – Parameters used for the model.

egc.utils.common.make_parent_dirs(target_path: PurePath) None[source]

make all the parent dirs of the target path.

Parameters:

target_path (PurePath) – target path.

egc.utils.common.refresh_file(target_path: str | None = None) None[source]

clear target path

Parameters:

target_path (str) – file path

egc.utils.common.csv2file(target_path: str, thead: Tuple[str] | None = None, tbody: Tuple | None = None, refresh: bool = False, is_dict: bool = False) None[source]

save csv to target_path

Parameters:
  • target_path (str) – target path

  • thead (Tuple[str], optional) – csv table header, only written into the file when it is not None and file is empty. Defaults to None.

  • tbody (Tuple, optional) – csv table content. Defaults to None.

  • refresh (bool, optional) – whether to clean the file first. Defaults to False.

egc.utils.common.set_seed(seed: int = 4096) None[source]

Set random seed.

Note: DGL's conv and NeighborSampler operators are somewhat nondeterministic.

Seeds are set according to the PyTorch doc: https://pytorch.org/docs/1.9.0/notes/randomness.html, the cudatoolkit doc: https://docs.nvidia.com/cuda/cublas/index.html#cublasApi_reproducibility, and the dgl issue: https://github.com/dmlc/dgl/issues/3302

Parameters:

seed (int, optional) – random seed. Defaults to 4096.
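A minimal sketch of the seeding pattern, covering only the stdlib and numpy generators; the library version also seeds torch/CUDA and configures deterministic backends as described above.

```python
import random
import numpy as np

def set_seed(seed=4096):
    # Seed the stdlib and numpy global generators so repeated runs
    # produce the same random draws. (The library version additionally
    # seeds torch / CUDA and enables deterministic cuDNN.)
    random.seed(seed)
    np.random.seed(seed)
```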

egc.utils.common.set_device(gpu: str = '0') device[source]

Set torch device.

Parameters:

gpu (str) – args.gpu. Defaults to ‘0’.

Returns:

torch device: device(type='cuda:x') or device(type='cpu').

Return type:

torch.device

egc.utils.common.print_model_parameters(model: Module) None[source]

print model parameters.

Parameters:

model (torch.nn.Module) – Torch module.

egc.utils.common.run_subprocess_command(cmd: str, cwd_path: os.path = None) None[source]

run shell command in subprocess.

Parameters:
  • cmd (str) – command string.

  • cwd_path (os.path, optional) – cwd path to run the cmd. Defaults to None.

egc.utils.common.dump_var(filename: str, variable: Any, relative_path: str = 'tmp') None[source]

dump var using pickle.

Parameters:
  • filename (str) – varname.

  • variable (Any) – variable to dump.

  • relative_path (str, optional) – relative path of the dir to save the var. Defaults to ‘tmp’.

egc.utils.common.load_var(filename: str, relative_path: str = 'tmp') Any[source]

load var using pickle.

Parameters:
  • filename (str) – varname.

  • relative_path (str, optional) – relative path of the dir to save the var. Defaults to ‘tmp’.

Returns:

variable.

Return type:

Any

egc.utils.common.load_or_dump(filename: str, func: Callable, args: Dict, relative_path: str = 'tmp') Any[source]

Load and return the variable if it has already been dumped; otherwise compute it, dump it, and return it.

Parameters:
  • filename (str) – varname.

  • func (Callable) – func to calculate the variable.

  • args (Dict) – parameter dict for the func.

  • relative_path (str, optional) – relative path of the dir to save the var. Defaults to ‘tmp’.

Returns:

variable.

Return type:

Any
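The cache-or-compute pattern described above can be sketched as follows (a hypothetical reimplementation; the library stores pickles under a relative_path directory):

```python
import pickle
from pathlib import Path

def load_or_dump(filename, func, args, relative_path="tmp"):
    # If the variable was dumped before, unpickle and return it;
    # otherwise call func(**args), pickle the result, and return it.
    path = Path(relative_path) / filename
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    variable = func(**args)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(variable, f)
    return variable
```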

egc.utils.common.torch_sparse_to_dgl_graph(torch_sparse_mx)[source]

Convert a torch sparse tensor matrix to dgl graph

Parameters:

torch_sparse_mx (torch.Tensor) – torch sparse tensor

Returns:

dgl graph

Return type:

(dgl.graph)

egc.utils.common.dgl_graph_to_torch_sparse(dgl_graph)[source]

egc.utils.construct_DGLgraph module

construct_DGLgraph

egc.utils.construct_DGLgraph.construct_DGLgraph_for_non_graph(x, labels, k=3, method='euclidean')[source]
egc.utils.construct_DGLgraph.construct_DGLgraph_for_non_graph_by_heat(x, labels, k=3)[source]
egc.utils.construct_DGLgraph.construct_DGLgraph_for_graph(x, labels, edges)[source]
egc.utils.construct_DGLgraph.process_edges_info(edges)[source]
egc.utils.construct_DGLgraph.build_graph(features, edges, num_nodes, labels)[source]

egc.utils.danmf_utils module

DANMF implementation. Repository: https://github.com/benedekrozemberczki/DANMF Author: benedekrozemberczki

egc.utils.danmf_utils.read_graph(args)[source]

Method to read the graph and create a target matrix with matrix powers.

Parameters:

args – Arguments object.

egc.utils.danmf_utils.loss_printer(losses)[source]

Print the losses for each iteration.

Parameters:

losses – List of losses in each iteration.

egc.utils.evaluation module

Evaluation Metrics for Graph Clustering: ACC, NMI, ARI, F1 Score. Author: Zhou Sheng

egc.utils.evaluation.purity(y_true, y_pred)[source]
egc.utils.evaluation.best_mapping(labels_true: list, labels_pred: list) Tuple[array, array][source]

Get best mapping between labels_true and labels_pred.

Parameters:
  • labels_true (list or np.array) – gnd labels.

  • labels_pred (list or np.array) – pred labels.

Raises:

ValueError – Labels must be in numpy format!

Returns:

best mapping.

Return type:

Tuple[np.array,np.array]
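One standard way to compute such a mapping is the Hungarian algorithm over the cluster-confusion matrix. A sketch under the assumption that this is what best_mapping does (the library's exact return order may differ):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def best_mapping(labels_true, labels_pred):
    # Build a confusion matrix w[p, t] counting how often predicted
    # cluster p coincides with true label t, then find the permutation
    # of predicted ids that maximizes agreement (Hungarian algorithm).
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    D = max(labels_true.max(), labels_pred.max()) + 1
    w = np.zeros((D, D), dtype=np.int64)
    for t, p in zip(labels_true, labels_pred):
        w[p, t] += 1
    row, col = linear_sum_assignment(-w)   # negate to maximize matches
    mapping = dict(zip(row, col))
    remapped = np.array([mapping[p] for p in labels_pred])
    return remapped, labels_true
```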

egc.utils.evaluation.evaluation(labels_true: Tensor, labels_pred: Tensor) Tuple[float][source]

Clustering evaluation.

Parameters:
  • labels_true (torch.Tensor or np.ndarray) – Ground Truth Community.

  • labels_pred (torch.Tensor or np.ndarray) – Predicted Community.

Returns:

(ARI, NMI, AMI, ACC, Micro-F1, Macro-F1, purity)

Return type:

Tuple[float]

egc.utils.graph_diffusion module

utils of MVGRL

egc.utils.graph_diffusion.compute_ppr(adj: ndarray, alpha: float = 0.2, self_loop: bool = True)[source]

Compute Personalized PageRank (PPR) matrix

Parameters:
  • adj (np.ndarray) – adjacency matrix

  • alpha (float) – restart probability. Defaults to 0.2.

  • self_loop (bool) – add self loop. Defaults to True.

Returns:

diffusion graph adjacency matrix

Return type:

(np.ndarray)
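PPR has the closed form alpha * (I - (1 - alpha) * Â)^{-1}, where Â is the symmetrically normalized adjacency (with self-loops when self_loop is True). A numpy sketch of this computation, not necessarily the library's exact code:

```python
import numpy as np

def compute_ppr(adj, alpha=0.2, self_loop=True):
    # Closed-form PPR diffusion:
    #   alpha * (I - (1 - alpha) * D^{-1/2} (A + I) D^{-1/2})^{-1}
    a = adj + np.eye(adj.shape[0]) if self_loop else adj
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a.sum(axis=1)))
    a_hat = d_inv_sqrt @ a @ d_inv_sqrt
    return alpha * np.linalg.inv(np.eye(a.shape[0]) - (1 - alpha) * a_hat)
```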

egc.utils.graph_statistics module

Graph Statistics

egc.utils.graph_statistics.count_label(label: Tensor) Dict[source]

count label

Parameters:

label (torch.Tensor) – label list Tensor

Returns:

label cnt dict

Return type:

Dict

egc.utils.graph_statistics.get_intra_class_edges(edges: Tuple[ndarray, ndarray], label: List) Dict[source]

Get the Dict of intra-class edges index list

Parameters:
  • edges (Tuple[np.ndarray, np.ndarray]) – edges in the format [(v1, v2, …, vn), (u1, u2, …, un)]

  • label (List or np.ndarray) – label list

Returns:

edges index list indexed by label

Return type:

Dict

egc.utils.graph_statistics.get_intra_class_mean_distance(embedding: Tensor, label: List) Dict[source]

Get intra-class Mean distance between node embeddings and community embeddings

Parameters:
  • embedding (torch.Tensor) – node embedding matrix

  • label (List or np.ndarray) – label

Returns:

mean distance matrix

Return type:

torch.Tensor

egc.utils.graph_statistics.get_neighbor_set(edges: Tuple[Tensor, Tensor]) Dict[source]

get neighbor set from edges tuple

Parameters:

edges (Tuple[torch.Tensor, torch.Tensor]) – edges list

Returns:

neighbor set indexed by node id

Return type:

Dict

egc.utils.graph_statistics.get_motifs_with_one_more_node(motifs: Set[Tuple], neighbor_set: Dict) Set[Tuple][source]

get motifs recursively

Parameters:
  • motifs (Set[Tuple]) – motifs set

  • neighbor_set (Dict) – neighbor set indexed by node id

Returns:

motifs set enlarged with one more node for each motif

Return type:

Set[Tuple]

egc.utils.graph_statistics.get_undireced_motifs(n_nodes: int, motif_size: int, edges: Tuple[Tensor, Tensor]) Tuple[List[List[Tuple]], Dict, Set[Tuple]][source]

get motifs (n-cliques) of an undirected graph

Parameters:
  • n_nodes (int) – node num

  • motif_size (int) – motif size

  • edges (Tuple[torch.Tensor, torch.Tensor]) – edges tuple

Returns:

(motif list indexed by node id, neighbor set indexed by node id, set of motifs)

Return type:

Tuple[List[List[Tuple]], Dict, Set[Tuple]]

egc.utils.initialization module

Initialization

egc.utils.initialization.init_weights(module: Module) None[source]

Init Module Weights

from utils import init_weights
# inside your module, do:
for module in self.modules():
    init_weights(module)
Parameters:

module (nn.Module) –

egc.utils.load_data module

Load datasets with DGL for Graph Clustering. Author: Sheng Zhou

egc.utils.load_data.load_data(dataset_name: str, directory='./data') Tuple[DGLGraph, Tensor, int][source]

Load datasets.

Parameters:
  • dataset_name (str) – Name of the dataset. Check README.md for supported datasets.

  • directory (str, optional) – path for the dataset to save. Defaults to ‘./data’.

Raises:

NotImplementedError – dataset not supported

Returns:

graph, label, n_clusters

Return type:

Tuple[dgl.DGLGraph, torch.Tensor, int]

egc.utils.load_data.load_ogb_data(dataset_name, directory='./data')[source]

graph: DGL graph object. label: torch tensor of shape (num_nodes, num_tasks).

egc.utils.load_data.load_dgl_data(dataset_name, directory='./data')[source]

graph: DGL graph object. label: from graph.ndata['label'].

egc.utils.load_data.allclose(a: Tensor, b: Tensor, rtol: float = 0.0001, atol: float = 0.0001) bool[source]

This function checks if a and b satisfy the condition: |a - b| <= atol + rtol * |b|

Parameters:
  • a (torch.Tensor) – first tensor to compare

  • b (torch.Tensor) – second tensor to compare

  • rtol (float, optional) – relative tolerance. Defaults to 1e-4.

  • atol (float, optional) – absolute tolerance. Defaults to 1e-4.

Returns:

True for close, False for not

Return type:

bool

egc.utils.load_data.is_bidirected(g: DGLGraph) bool[source]

Return whether the graph is a bidirected graph. A graph is bidirected if for any edge \((u, v)\) in \(G\) with weight \(w\), there exists an edge \((v, u)\) in \(G\) with the same weight.

Parameters:

g (dgl.DGLGraph) – dgl.DGLGraph

Returns:

True for bidirected, False for not

Return type:

bool

class egc.utils.load_data.AE_LoadDataset(data)[source]

Bases: Dataset

egc.utils.load_data.load_mat_data2dgl(data_path, verbose=True)[source]

load data from .mat file

Parameters:
  • data_path (str) – the file to read in

  • verbose (bool, optional) – print info, by default True

Returns:

the graph read from data_path; label (torch.Tensor): node class labels; num_classes (int): number of node classes

Return type:

graph (DGL.graph)

egc.utils.load_data.bar_progress(current, total, _)[source]

Progress-bar callback invoked automatically by wget during download.

egc.utils.load_data.load_BlogCatalog(raw_dir='./data')[source]

load BlogCatalog dgl graph

Parameters:

raw_dir (str) – Data path. Supports user customization.

Returns:

the graph read from data_path; label (torch.Tensor): node class labels; num_classes (int): number of node classes

Return type:

graph (DGL.graph)

Examples

>>> graph, label, n_clusters = load_BlogCatalog()
egc.utils.load_data.load_Flickr(raw_dir='./data')[source]

load Flickr dgl graph

Parameters:

raw_dir (str) – Data path. Supports user customization.

Returns:

the graph read from data_path; label (torch.Tensor): node class labels; num_classes (int): number of node classes

Return type:

graph (DGL.graph)

Examples

>>> graph, label, n_clusters = load_Flickr()
egc.utils.load_data.load_ACM(raw_dir='./data', verbose=True)[source]

load ACM dgl graph

Parameters:
  • raw_dir (str) – Data path. Supports user customization.

  • verbose (bool, optional) – print info, by default True

Returns:

the graph read from data_path; label (torch.Tensor): node class labels; num_classes (int): number of node classes

Return type:

graph (DGL.graph)

Examples

>>> graph, label, n_clusters = load_ACM()
egc.utils.load_data.load_DBLP(raw_dir='./data', verbose=True)[source]

load DBLP dgl graph

Parameters:
  • raw_dir (str) – Data path. Supports user customization.

  • verbose (bool, optional) – print info, by default True

Returns:

the graph read from data_path; label (torch.Tensor): node class labels; num_classes (int): number of node classes

Return type:

graph (DGL.graph)

Examples

>>> graph, label, n_clusters = load_DBLP()

egc.utils.metrics module

Metrics

egc.utils.metrics.get_soft_assignment_matrix(data: Tensor, miu: Tensor, cluster_temp: float = 30, dist_type: str = 'cosine_similarity') Tensor[source]

Get soft assignment matrix from data points and cluster centers.

Parameters:
  • data (torch.Tensor) – data embeddings.

  • miu (torch.Tensor) – cluster center embeddings.

  • cluster_temp (float, optional) – softmax temperature. Defaults to 30.

  • dist_type (str, optional) – distance type. Defaults to ‘cosine_similarity’.

Returns:

soft assignment matrix.

Return type:

torch.Tensor

egc.utils.metrics.get_modularity_matrix(adj_nodia: Tensor) Tensor[source]

Get Modularity Matrix.

\[A_{vw} - \frac{k_v k_w}{2m}\]
Parameters:

adj_nodia (torch.Tensor) – adjacency matrix without the diagonal.

Returns:

modularity matrix.

Return type:

torch.Tensor

egc.utils.metrics.get_modularity_value(bin_adj_nodiag: Tensor, r: Tensor, mod: Tensor) Tensor[source]

Get Modularity.

\[Q(r)=\frac{1}{2m}\sum_{u,v\in V}\sum_{k=1}^K[A_{uv}-\frac{d_ud_v}{2m}]r_{uk}r_{vk}\]
Parameters:
  • bin_adj_nodiag (torch.Tensor) – n x n. Boolean adj matrix without diag.

  • r (torch.Tensor) – n x k. Soft assignment probability matrix.

  • mod (torch.Tensor) – n x n. Modularity matrix.

Returns:

Modularity value.

Return type:

torch.Tensor
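Both quantities can be sketched together in numpy (the library versions operate on torch tensors): the modularity matrix is B = A - k k^T / 2m, and the soft-assignment modularity is Q = Tr(r^T B r) / 2m.

```python
import numpy as np

def get_modularity(adj, r):
    # adj: n x n adjacency without self-loops; r: n x k soft assignments.
    # B = A - k k^T / 2m  (modularity matrix)
    # Q = Tr(r^T B r) / 2m
    k = adj.sum(axis=1)
    two_m = k.sum()
    B = adj - np.outer(k, k) / two_m
    return np.trace(r.T @ B @ r) / two_m
```

For two disjoint triangles with the correct two-community assignment, this yields Q = 0.5.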

egc.utils.model_management module

Model Management

egc.utils.model_management.get_checkpoint_path(model_filename: str) PurePath[source]
egc.utils.model_management.save_model(model_filename: str, model: Module, optimizer: Optimizer, current_epoch: int, loss: float) None[source]

Save model, optimizer, current_epoch, loss to checkpoints/${model_filename}.pt.

Parameters:
  • model_filename (str) – filename to save model.

  • model (torch.nn.Module) – model.

  • optimizer (torch.optim.Optimizer) – optimizer.

  • current_epoch (int) – current epoch.

  • loss (float) – loss.

egc.utils.model_management.load_model(model_filename: str, model: Module, optimizer: Optimizer) Tuple[Module, Optimizer, int, float][source]

Load model from checkpoints/${model_filename}.pt.

Parameters:
  • model_filename (str) – filename to load model.

  • model (torch.nn.Module) – model.

  • optimizer (torch.optim.Optimizer) – optimizer.

Returns:

[model, optimizer, epoch, loss]

Return type:

Tuple[torch.nn.Module, torch.optim.Optimizer, int, float]

egc.utils.normalization module

Normalization Utils

egc.utils.normalization.normalize_feature(features: lil_matrix) array[source]

Row-normalize feature matrix.

Parameters:

features (scipy.sparse.lil.lil_matrix) – 2D sparse features

Returns:

2D row-normalized features

Return type:

features_norm (numpy.matrix)

egc.utils.normalization.symmetrically_normalize_adj(adj: csr_matrix) coo_matrix[source]

Symmetrically normalize adjacency matrix.

Parameters:

adj (scipy.sparse.csr.csr_matrix) – 2D sparse adjacency matrix

Returns:

2D Symmetrically normalized sparse adjacency matrix

Return type:

adj_norm (scipy.sparse.coo.coo_matrix)
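Symmetric normalization computes D^{-1/2} A D^{-1/2}. A scipy sketch (zero-degree rows are mapped to zero), not necessarily the library's exact code:

```python
import numpy as np
import scipy.sparse as sp

def symmetrically_normalize_adj(adj):
    # D^{-1/2} A D^{-1/2} on a scipy sparse matrix.
    adj = sp.coo_matrix(adj)
    deg = np.asarray(adj.sum(axis=1)).flatten()
    d_inv_sqrt = np.zeros_like(deg, dtype=float)
    nz = deg > 0
    d_inv_sqrt[nz] = deg[nz] ** -0.5
    D = sp.diags(d_inv_sqrt)
    return (D @ adj @ D).tocoo()
```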

egc.utils.normalization.asymmetric_normalize_adj(adj, loop=True)[source]

Get the convolution operator

Parameters:
  • adj (ndarray) – the adjacency matrix of the improved graph

  • loop – whether to add self loops

egc.utils.normalization.normalize_sublime(adj, mode, sparse=False)[source]

Normalize adjacency matrix for SUBLIME model

Parameters:
  • adj – adjacency matrix

  • mode (str) – normalization mode for the adjacency matrix

  • sparse (bool, optional) – whether to use sparse operations. Defaults to False.

Returns:

the normalized adjacency matrix

egc.utils.sampling module

Sample Method

egc.utils.sampling.get_repeat_shuffle_nodes_list(n_nodes, sample_times)[source]

Get Negative Sample Nodes List By Repeatable Shuffle

Parameters:
  • n_nodes (int) – total number of nodes.

  • sample_times (int) – sample times.

Returns:

list of multiple repeatable nodes index shuffle lists.

Return type:

(List)

egc.utils.sampling.normal_reparameterize(mu: Tensor, logvar: Tensor, training: bool = True) Tensor[source]

Reparameterization trick for normal distribution

Parameters:
  • mu (torch.Tensor) – mean

  • logvar (torch.Tensor) – log variance

  • training (bool) – whether in training mode

Returns:

(torch.Tensor)
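The reparameterization trick draws z = mu + eps * exp(0.5 * logvar) with eps ~ N(0, I), so gradients can flow through mu and logvar. A numpy sketch (the library version uses torch; at evaluation time the mean is returned directly):

```python
import numpy as np

def normal_reparameterize(mu, logvar, training=True, rng=None):
    # z = mu + eps * sigma, where sigma = exp(0.5 * logvar) and
    # eps ~ N(0, I). Outside training, return the mean deterministically.
    if not training:
        return mu
    rng = rng or np.random.default_rng(0)
    std = np.exp(0.5 * logvar)
    return mu + rng.standard_normal(mu.shape) * std
```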

egc.utils.sampling.agm(x: ndarray) ndarray[source]

AGM probability

Parameters:

x (np.ndarray) – 1-d array

Returns:

AGM probability

Return type:

np.ndarray

egc.utils.sampling.choice(samples: List[int], weight: ndarray) int[source]

choose next node

Parameters:
  • samples (List[int]) – neighbors

  • weight (np.ndarray) – weights

Returns:

node chosen

Return type:

int
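Weighted sampling of the next node can be sketched with numpy's choice (an assumption about the implementation, not the library's exact code):

```python
import numpy as np

def choice(samples, weight):
    # Draw one node from `samples` with probability proportional
    # to its entry in `weight`.
    p = np.asarray(weight, dtype=float)
    return int(np.random.choice(samples, p=p / p.sum()))
```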

class egc.utils.sampling.CommunityGANSampling(n_threads: int, args: Tuple[int, int, bool], motif_size: int, total_motifs: List[List[Tuple]], theta_g: ndarray, neighbor_set: Dict)[source]

Bases: object

CommunityGAN Sampling

Parameters:
  • n_threads (int) – cores of multiprocessing.

  • args (Tuple[int, int, bool]) – (root, n_sample, only_neg): root (int) – root node id; n_sample (int) – number of motifs to sample; only_neg (bool) – return only negative samples

  • motif_size (int) – motif size.

  • total_motifs (List[List[Tuple]]) – list of all motifs indexed by node id.

  • theta_g (np.ndarray) – node embedding of generator.

  • neighbor_set (Dict) – neighbor set Dict indexed by node id.

g_v(roots: List[int]) Tuple[int, List[int]][source]

get next node

Parameters:

roots (List[int]) – list of nodes sampled so far

Returns:

current_node, path walked

Return type:

Tuple[int, List[int]]

g_s(args: Tuple[int, int, bool]) Tuple[List[Tuple], List[List[int]]][source]

sampling for community gan generator

Parameters:

args (Tuple[int, int, bool]) – (root, n_sample, only_neg): root (int) – root node id; n_sample (int) – number of motifs to sample; only_neg (bool) – return only negative samples

Returns:

motifs, paths

Return type:

Tuple[List[Tuple], List[List[int]]]

run() Tuple[List[Tuple], List[List[int]]][source]

sampling for community gan

Returns:

motifs, paths.

Return type:

Tuple[List[Tuple], List[List[int]]]

egc.utils.sublime_utils module

Utils for SUBLIME model

egc.utils.sublime_utils.nearest_neighbors_pre_elu(X, k, metric, i)[source]
egc.utils.sublime_utils.knn_fast(X, k, b)[source]
egc.utils.sublime_utils.apply_non_linearity(tensor, non_linearity, i)[source]
egc.utils.sublime_utils.cal_similarity_graph(node_embeddings)[source]
egc.utils.sublime_utils.top_k(raw_graph, K)[source]
egc.utils.sublime_utils.get_feat_mask(features, mask_rate)[source]
egc.utils.sublime_utils.symmetrize(adj)[source]
egc.utils.sublime_utils.split_batch(init_list, batch_size)[source]

Module contents

Utils