egc.utils package

Subpackages

Submodules

egc.utils.ComE_utils module

Utils for ComE model

egc.utils.ComE_utils.chunkize_serial(iterable, chunksize, as_numpy=False)[source]

Return elements from the iterable in chunksize-ed lists. The last returned element may be smaller (if length of collection is not divisible by chunksize).

>>> print(list(grouper(range(10), 3)))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
egc.utils.ComE_utils.prepare_sentences(model, paths)[source]
Parameters:
  • model – current model containing the vocabulary and the index

  • paths – list of random walks; each node is translated to its vocabulary index and dropout is applied

Returns:

generator over the paths, with dropout applied and nodes mapped to the correct indices

egc.utils.ComE_utils.batch_generator(iterable, batch_size=1)[source]

Same as chunkize_serial, but without using an infinite while loop.

Parameters:
  • iterable – list that we want to convert in batches

  • batch_size – batch size
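The batching behavior described above can be sketched in a few lines. This is a hypothetical reimplementation for illustration, not the library code:

```python
def batch_generator(iterable, batch_size=1):
    # Walk the list in strides of batch_size and yield each slice;
    # the final batch may be shorter than batch_size.
    for i in range(0, len(iterable), batch_size):
        yield iterable[i:i + batch_size]
```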

class egc.utils.ComE_utils.RepeatCorpusNTimes(corpus, n)[source]

Bases: object

Class used to repeat n-times the same corpus of paths

Parameters:
  • corpus – list of paths that we want to repeat

  • n – number of times we want to repeat our corpus

class egc.utils.ComE_utils.Vocab(**kwargs)[source]

Bases: object

A single vocabulary item, used internally for constructing binary trees (incl. both word leaves and inner nodes).

egc.utils.ComE_utils.xavier_normal(size, as_type=<class 'numpy.float32'>, gain=1)[source]
class egc.utils.ComE_utils.WriteWalksToDisk[source]

Bases: object

Used for writing random walks to disk

write_walks_to_disk(G, filebase, num_paths, path_length, alpha=0, rand=<random.Random object>, num_workers=56)[source]

Save the random walks to files so that the walks need not be recomputed on every execution

Parameters:
  • G – graph to walk on

  • filebase – location where the final walks are saved

  • num_paths – number of walks per node

  • path_length – length of each walk

  • alpha – restart probability for the random walks

  • rand – random number generator

  • num_workers – number of threads used to execute the job

Returns:

egc.utils.ComE_utils.combine_files_iter(file_list)[source]
egc.utils.ComE_utils.count_lines(f)[source]
egc.utils.ComE_utils.build_deepwalk_corpus_iter(G, num_paths, path_length, alpha=0, rand=<random.Random object>)[source]
egc.utils.ComE_utils.count_textfiles(files, workers=1)[source]
egc.utils.ComE_utils.count_words(file)[source]

Counts the word frequencies in a list of sentences.

Note

This is a helper function for parallel execution of Vocabulary.from_text method.

egc.utils.ComE_utils.grouper(n, iterable, padvalue=None)[source]

grouper(3, 'abcdefg', 'x') --> ('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'x', 'x')
egc.utils.ComE_utils.judgeExist(utils_dir)[source]
egc.utils.ComE_utils.initComeEnv()[source]
egc.utils.ComE_utils.getFile(build_dir)[source]

egc.utils.SEComm_utils module

SEComm utils

egc.utils.SEComm_utils.enhance_sim_matrix(C: ndarray, K: int, d: int, alpha: float) ndarray[source]

Enhance similarity matrix.

Parameters:
  • C (np.ndarray) – coefficient matrix.

  • K (int) – number of clusters.

  • d (int) – dimension of each subspace.

  • alpha (float) – coefficient.

Returns:

enhanced similarity matrix

Return type:

np.ndarray
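The enhancement step itself is not spelled out here. A common post-processing in subspace clustering (symmetrize the coefficient matrix, keep the top K*d+1 singular directions, row-normalize, sharpen with alpha) can be sketched as follows; this is an assumption about the method, not the library's exact code:

```python
import numpy as np

def enhance_sim_matrix(C, K, d, alpha):
    # Assumed post-processing: symmetrize the coefficient matrix,
    # project onto the top K*d+1 singular directions, row-normalize,
    # and sharpen entries with exponent alpha.
    C = 0.5 * (np.abs(C) + np.abs(C).T)
    r = min(d * K + 1, C.shape[0])
    U, S, _ = np.linalg.svd(C)
    U = U[:, :r] * np.sqrt(S[:r])
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
    L = np.abs(U @ U.T)
    return (L / L.max()) ** alpha
```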

egc.utils.SEComm_utils.drop_feature(x, drop_prob)[source]
egc.utils.SEComm_utils.dropout_adj0(g, num_nodes, p=0.5)[source]
egc.utils.SEComm_utils.repeat(n_times)[source]
egc.utils.SEComm_utils.prob_to_one_hot(y_pred)[source]
egc.utils.SEComm_utils.print_statistics(statistics, function_name)[source]
egc.utils.SEComm_utils.label_classification(embeddings, y, ratio)[source]

egc.utils.argparser module

Parse All Model Args

egc.utils.argparser.models: Dict = {
    'AGC': {'description': 'AGC', 'name': 'AGC', 'paper url': 'https://dl.acm.org/doi/abs/10.1145/3474085.3475276', 'source code': 'https://github.com/karenlatong/AGC-master'},
    'AGCN': {'description': 'AGCN', 'name': 'AGCN', 'paper url': '', 'source code': 'https://github.com/ZhihaoPENG-CityU/MM21---AGCN'},
    'AGE': {'description': 'AGE', 'name': 'AGE', 'paper url': 'https://dl.acm.org/doi/pdf/10.1145/3394486.3403140', 'source code': 'https://github.com/thunlp/AGE'},
    'ComE': {'description': 'ComE', 'name': 'ComE', 'paper url': 'https://dl.acm.org/doi/pdf/10.1145/3132847.3132925', 'source code': 'https://github.com/andompesta/ComE'},
    'CommunityGAN': {'description': 'CommunityGAN', 'name': 'CommunityGAN', 'paper url': 'https://dl.acm.org/doi/pdf/10.1145/3308558.3313564', 'source code': 'https://github.com/SamJia/CommunityGAN'},
    'DAEGC': {'description': 'DAEGC', 'name': 'DAEGC', 'paper url': 'https://www.ijcai.org/Proceedings/2019/0509.pdf', 'source code': 'https://github.com/Tiger101010/DAEGC'},
    'DANMF': {'description': 'DANMF', 'name': 'DANMF', 'paper url': 'https://dl.acm.org/doi/pdf/10.1145/3269206.3271697', 'source code': 'https://github.com/benedekrozemberczki/DANMF'},
    'DFCN': {'description': 'DFCN', 'name': 'DFCN', 'paper url': 'https://arxiv.org/pdf/2012.09600.pdf', 'source code': 'https://github.com/WxTu/DFCN'},
    'GALA': {'description': 'GALA', 'name': 'GALA', 'paper url': 'https://arxiv.org/pdf/1908.02441v1.pdf', 'source code': 'https://github.com/sseung0703/GALA_TF2.0'},
    'GDCL': {'description': 'GDCL', 'name': 'GDCL', 'paper url': 'https://www.ijcai.org/proceedings/2021/0473.pdf', 'source code': 'https://github.com/hzhao98/GDCL'},
    'MNMF': {'description': 'MNMF', 'name': 'MNMF', 'paper url': 'https://ojs.aaai.org/index.php/AAAI/article/view/10488', 'source code': 'https://github.com/AnryYang/M-NMF'},
    'MVGRL': {'description': 'MVGRL', 'name': 'MVGRL', 'paper url': 'https://arxiv.org/abs/2006.05582', 'source code': 'https://github.com/kavehhassani/mvgrl'},
    'SDCN': {'description': 'SDCN', 'name': 'SDCN', 'paper url': 'https://arxiv.org/pdf/2002.01633.pdf', 'source code': 'https://github.com/bdy9527/SDCN'},
    'SEComm': {'description': 'SEComm', 'name': 'SEComm', 'paper url': 'https://proceedings.mlr.press/v161/bandyopadhyay21a/bandyopadhyay21a.pdf', 'source code': 'https://github.com/viz27/SEComm'},
    'SENet_kmeans': {'description': 'SENet with kmeans', 'name': 'SENet', 'paper url': 'https://www.sciencedirect.com/science/article/pii/S0893608021002227?via%3Dihub', 'source code': ''},
    'SUBLIME': {'description': 'SUBLIME', 'name': 'SUBLIME', 'paper url': 'https://arxiv.org/pdf/2201.06367.pdf', 'source code': 'https://github.com/GRAND-Lab/SUBLIME'},
    'VGAECD': {'description': 'VGAECD', 'name': 'VGAECD', 'paper url': 'https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8594831', 'source code': ''},
    'cc': {'description': 'Contrastive Clustering', 'name': 'CC', 'paper url': 'https://arxiv.org/pdf/2009.09687.pdf', 'source code': 'https://github.com/Yunfan-Li/Contrastive-Clustering'},
    'clusternet': {'description': 'ClusterNet', 'name': 'clusternet', 'paper url': 'https://arxiv.org/abs/1905.13732', 'source code': 'https://github.com/bwilder0/clusternet'},
    'dgi_kmeans': {'description': 'DGI with Kmeans', 'name': 'DGI', 'paper url': 'https://arxiv.org/abs/1809.10341', 'source code': 'https://github.com/PetarV-/DGI'},
    'gae_kmeans': {'description': 'GAE with Kmeans', 'name': 'GAE', 'paper url': 'https://arxiv.org/pdf/1611.07308.pdf', 'source code': 'https://github.com/tkipf/gae'},
    'gmi_kmeans': {'description': 'GMI with Kmeans', 'name': 'GMI', 'paper url': 'https://arxiv.org/pdf/1809.10341.pdf', 'source code': 'https://github.com/zpeng27/GMI'},
    'idec': {'description': 'IDEC', 'name': 'idec', 'paper url': 'https://dl.acm.org/doi/10.5555/3045390.3045442', 'source code': 'https://github.com/piiswrong/dec'},
    'pca_kmeans': {'description': 'PCA with Kmeans.', 'name': 'PCA', 'paper url': '', 'source code': ''},
    'sgc_kmeans': {'description': 'SGC with Kmeans.', 'name': 'SGC', 'paper url': 'https://arxiv.org/pdf/1902.07153.pdf', 'source code': 'https://github.com/Tiiiger/SGC'},
    'vgae_kmeans': {'description': 'VGAE with Kmeans', 'name': 'VGAE', 'paper url': 'https://arxiv.org/pdf/1611.07308.pdf', 'source code': 'https://github.com/tkipf/gae'}}

Info of the models supported.

egc.utils.argparser.parse_all_args() Namespace[source]
egc.utils.argparser.get_default_args(model: str) Dict[source]

Get default args of any model supported.

Parameters:

model (str) – name of the model.

Returns:

the default args of the model.

Return type:

Dict

egc.utils.clustering module

Clustering Methods.

egc.utils.clustering.sk_clustering(X: Tensor, n_clusters: int, name: str = 'kmeans') ndarray[source]

sklearn clustering.

Parameters:
  • X (torch.Tensor) – data embeddings.

  • n_clusters (int) – num of clusters.

  • name (str, optional) – clustering method name. Defaults to ‘kmeans’.

Raises:

NotImplementedError – clustering method not implemented.

Returns:

cluster assignments.

Return type:

np.ndarray

egc.utils.clustering.soft_kmeans_clustering(data: Tensor, miu: Tensor, num_iter: int = 1, cluster_temp: float = 5, dist_type: str = 'cosine_similarity') Tuple[Tensor, Tensor, Tensor][source]

PyTorch (differentiable) implementation of soft k-means clustering.

Parameters:
  • data (torch.Tensor) – data embeddings.

  • miu (torch.Tensor, optional) – cluster centers.

  • num_iter (int, optional) – num of iterations. Defaults to 1.

  • cluster_temp (float, optional) – softmax temperature. Defaults to 5.

  • dist_type (str, optional) – distance type. Defaults to ‘cosine_similarity’.

Returns:

[cluster_centers, soft_assignment_matrix, distance]

Return type:

Tuple[torch.Tensor, torch.Tensor, torch.Tensor]
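A non-differentiable numpy sketch of the same soft k-means loop (the library version operates on torch tensors and stays differentiable): points are assigned softly via temperature-scaled cosine similarity, then centers are recomputed as assignment-weighted means.

```python
import numpy as np

def soft_kmeans(data, mu, num_iter=1, cluster_temp=5.0):
    # Hedged numpy sketch of soft k-means with cosine-similarity
    # assignments; `mu` holds the initial cluster centers.
    for _ in range(num_iter):
        dn = data / np.linalg.norm(data, axis=1, keepdims=True)
        mn = mu / np.linalg.norm(mu, axis=1, keepdims=True)
        dist = dn @ mn.T                               # n x k similarities
        e = np.exp(cluster_temp * dist)
        r = e / e.sum(axis=1, keepdims=True)           # soft assignments
        mu = (r.T @ data) / r.sum(axis=0)[:, None]     # weighted centers
    return mu, r, dist
```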

egc.utils.common module

common utils

egc.utils.common.sparse_mx_to_torch_sparse_tensor(sparse_mx: spmatrix) Tensor[source]

Convert a scipy sparse matrix to a torch sparse tensor

Parameters:

sparse_mx (scipy.sparse.spmatrix) – sparse matrix

Returns:

torch sparse tensor

Return type:

(torch.Tensor)

egc.utils.common.MF(X, dim, name='PCA')[source]
egc.utils.common.tab_printer(args: Dict, thead: List[str] | None = None) None[source]

Function to print the logs in a nice tabular format.

Parameters:

args (Dict) – Parameters used for the model.

egc.utils.common.make_parent_dirs(target_path: PurePath) None[source]

make all the parent dirs of the target path.

Parameters:

target_path (PurePath) – target path.

egc.utils.common.refresh_file(target_path: str | None = None) None[source]

clear target path

Parameters:

target_path (str) – file path

egc.utils.common.csv2file(target_path: str, thead: Tuple[str] | None = None, tbody: Tuple | None = None, refresh: bool = False, is_dict: bool = False) None[source]

save csv to target_path

Parameters:
  • target_path (str) – target path

  • thead (Tuple[str], optional) – csv table header, only written into the file when it is not None and file is empty. Defaults to None.

  • tbody (Tuple, optional) – csv table content. Defaults to None.

  • refresh (bool, optional) – whether to clean the file first. Defaults to False.

egc.utils.common.set_seed(seed: int = 4096) None[source]

Set random seed.

Note: DGL's conv and NeighborSampler operators are somewhat nondeterministic.

Seeds are set according to the PyTorch doc: https://pytorch.org/docs/1.9.0/notes/randomness.html, the cudatoolkit doc: https://docs.nvidia.com/cuda/cublas/index.html#cublasApi_reproducibility, and the dgl issue: https://github.com/dmlc/dgl/issues/3302

Parameters:

seed (int, optional) – random seed. Defaults to 4096.
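A minimal sketch of the seeding pattern, covering only the stdlib and numpy generators; the library version also seeds torch/CUDA and configures deterministic backends as described above.

```python
import random
import numpy as np

def set_seed(seed=4096):
    # Seed the stdlib and numpy global generators so repeated runs
    # produce the same random draws. (The library version additionally
    # seeds torch / CUDA and enables deterministic cuDNN.)
    random.seed(seed)
    np.random.seed(seed)
```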

egc.utils.common.set_device(gpu: str = '0') device[source]

Set torch device.

Parameters:

gpu (str) – args.gpu. Defaults to ‘0’.

Returns:

torch device: device(type='cuda:x') or device(type='cpu').

Return type:

torch.device

egc.utils.common.print_model_parameters(model: Module) None[source]

print model parameters.

Parameters:

model (torch.nn.Module) – Torch module.

egc.utils.common.run_subprocess_command(cmd: str, cwd_path: os.path = None) None[source]

run shell command in subprocess.

Parameters:
  • cmd (str) – command string.

  • cwd_path (os.path, optional) – cwd path to run the cmd. Defaults to None.

egc.utils.common.dump_var(filename: str, variable: Any, relative_path: str = 'tmp') None[source]

dump var using pickle.

Parameters:
  • filename (str) – varname.

  • variable (Any) – variable to dump.

  • relative_path (str, optional) – relative path of the dir to save the var. Defaults to ‘tmp’.

egc.utils.common.load_var(filename: str, relative_path: str = 'tmp') Any[source]

load var using pickle.

Parameters:
  • filename (str) – varname.

  • relative_path (str, optional) – relative path of the dir to save the var. Defaults to ‘tmp’.

Returns:

variable.

Return type:

Any

egc.utils.common.load_or_dump(filename: str, func: Callable, args: Dict, relative_path: str = 'tmp') Any[source]

Load and return the variable if it has already been dumped; otherwise compute it, dump it, and return it.

Parameters:
  • filename (str) – varname.

  • func (Callable) – func to calculate the variable.

  • args (Dict) – parameter dict for the func.

  • relative_path (str, optional) – relative path of the dir to save the var. Defaults to ‘tmp’.

Returns:

variable.

Return type:

Any
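The cache-or-compute pattern described above can be sketched as follows (a hypothetical reimplementation; the library stores pickles under a relative_path directory):

```python
import pickle
from pathlib import Path

def load_or_dump(filename, func, args, relative_path="tmp"):
    # If the variable was dumped before, unpickle and return it;
    # otherwise call func(**args), pickle the result, and return it.
    path = Path(relative_path) / filename
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    variable = func(**args)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(variable, f)
    return variable
```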

egc.utils.common.torch_sparse_to_dgl_graph(torch_sparse_mx)[source]

Convert a torch sparse tensor matrix to dgl graph

Parameters:

torch_sparse_mx (torch.Tensor) – torch sparse tensor

Returns:

dgl graph

Return type:

(dgl.graph)

egc.utils.common.dgl_graph_to_torch_sparse(dgl_graph)[source]

egc.utils.construct_DGLgraph module

construct_DGLgraph

egc.utils.construct_DGLgraph.construct_DGLgraph_for_non_graph(x, labels, k=3, method='euclidean')[source]
egc.utils.construct_DGLgraph.construct_DGLgraph_for_non_graph_by_heat(x, labels, k=3)[source]
egc.utils.construct_DGLgraph.construct_DGLgraph_for_graph(x, labels, edges)[source]
egc.utils.construct_DGLgraph.process_edges_info(edges)[source]
egc.utils.construct_DGLgraph.build_graph(features, edges, num_nodes, labels)[source]

egc.utils.danmf_utils module

DANMF implementation. Repository: https://github.com/benedekrozemberczki/DANMF Author: benedekrozemberczki

egc.utils.danmf_utils.read_graph(args)[source]

Method to read the graph and create a target matrix with matrix powers.

Parameters:

args – Arguments object.

egc.utils.danmf_utils.loss_printer(losses)[source]

Print the losses for each iteration.

Parameters:

losses – List of losses in each iteration.

egc.utils.evaluation module

Evaluation Metrics for Graph Clustering: ACC, NMI, ARI, F1 Score. Author: Zhou Sheng

egc.utils.evaluation.purity(y_true, y_pred)[source]
egc.utils.evaluation.best_mapping(labels_true: list, labels_pred: list) Tuple[array, array][source]

Get best mapping between labels_true and labels_pred.

Parameters:
  • labels_true (list or np.array) – gnd labels.

  • labels_pred (list or np.array) – pred labels.

Raises:

ValueError – Labels must be in numpy format!

Returns:

best mapping.

Return type:

Tuple[np.array,np.array]
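One standard way to compute such a mapping is the Hungarian algorithm over the cluster-confusion matrix. A sketch under the assumption that this is what best_mapping does (the library's exact return order may differ):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def best_mapping(labels_true, labels_pred):
    # Build a confusion matrix w[p, t] counting how often predicted
    # cluster p coincides with true label t, then find the permutation
    # of predicted ids that maximizes agreement (Hungarian algorithm).
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    D = max(labels_true.max(), labels_pred.max()) + 1
    w = np.zeros((D, D), dtype=np.int64)
    for t, p in zip(labels_true, labels_pred):
        w[p, t] += 1
    row, col = linear_sum_assignment(-w)   # negate to maximize matches
    mapping = dict(zip(row, col))
    remapped = np.array([mapping[p] for p in labels_pred])
    return remapped, labels_true
```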

egc.utils.evaluation.evaluation(labels_true: Tensor, labels_pred: Tensor) Tuple[float][source]

Clustering evaluation.

Parameters:
  • labels_true (torch.Tensor or np.ndarray) – Ground Truth Community.

  • labels_pred (torch.Tensor or np.ndarray) – Predicted Community.

Returns:

(ARI, NMI, AMI, ACC, Micro-F1, Macro-F1, purity)

Return type:

Tuple[float]

egc.utils.graph_diffusion module

utils of MVGRL

egc.utils.graph_diffusion.compute_ppr(adj: ndarray, alpha: float = 0.2, self_loop: bool = True)[source]

Compute Personalized PageRank (PPR) matrix

Parameters:
  • adj (np.ndarray) – adjacency matrix

  • alpha (float) – restart probability. Defaults to 0.2.

  • self_loop (bool) – add self loop. Defaults to True.

Returns:

diffusion graph adjacency matrix

Return type:

(np.ndarray)
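PPR has the closed form alpha * (I - (1 - alpha) * Â)^{-1}, where Â is the symmetrically normalized adjacency (with self-loops when self_loop is True). A numpy sketch of this computation, not necessarily the library's exact code:

```python
import numpy as np

def compute_ppr(adj, alpha=0.2, self_loop=True):
    # Closed-form PPR diffusion:
    #   alpha * (I - (1 - alpha) * D^{-1/2} (A + I) D^{-1/2})^{-1}
    a = adj + np.eye(adj.shape[0]) if self_loop else adj
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a.sum(axis=1)))
    a_hat = d_inv_sqrt @ a @ d_inv_sqrt
    return alpha * np.linalg.inv(np.eye(a.shape[0]) - (1 - alpha) * a_hat)
```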

egc.utils.graph_statistics module

Graph Statistics

egc.utils.graph_statistics.count_label(label: Tensor) Dict[source]

count label

Parameters:

label (torch.Tensor) – label list Tensor

Returns:

label cnt dict

Return type:

Dict

egc.utils.graph_statistics.get_intra_class_edges(edges: Tuple[ndarray, ndarray], label: List) Dict[source]

Get the Dict of intra-class edges index list

Parameters:
  • edges (Tuple[np.ndarray, np.ndarray]) – edges in the format [(v1, v2, …, vn), (u1, u2, …, un)]

  • label (List or np.ndarray) – label list

Returns:

edges index list indexed by label

Return type:

Dict

egc.utils.graph_statistics.get_intra_class_mean_distance(embedding: Tensor, label: List) Dict[source]

Get intra-class Mean distance between node embeddings and community embeddings

Parameters:
  • embedding (torch.Tensor) – node embedding matrix

  • label (List or np.ndarray) – label

Returns:

mean distance matrix

Return type:

torch.Tensor

egc.utils.graph_statistics.get_neighbor_set(edges: Tuple[Tensor, Tensor]) Dict[source]

get neighbor set from edges tuple

Parameters:

edges (Tuple[torch.Tensor, torch.Tensor]) – edges list

Returns:

neighbor set indexed by node id

Return type:

Dict

egc.utils.graph_statistics.get_motifs_with_one_more_node(motifs: Set[Tuple], neighbor_set: Dict) Set[Tuple][source]

get motifs recursively

Parameters:
  • motifs (Set[Tuple]) – motifs set

  • neighbor_set (Dict) – neighbor set indexed by node id

Returns:

motifs set enlarged with one more node for each motif

Return type:

Set[Tuple]

egc.utils.graph_statistics.get_undireced_motifs(n_nodes: int, motif_size: int, edges: Tuple[Tensor, Tensor]) Tuple[List[List[Tuple]], Dict, Set[Tuple]][source]

get motifs (n-cliques) of an undirected graph

Parameters:
  • n_nodes (int) – node num

  • motif_size (int) – motif size

  • edges (Tuple[torch.Tensor, torch.Tensor]) – edges tuple

Returns:

(motif list indexed by node id, neighbor set indexed by node id, set of motifs)

Return type:

Tuple[List[List[Tuple]], Dict, Set[Tuple]]

egc.utils.initialization module

Initialization

egc.utils.initialization.init_weights(module: Module) None[source]

Init Module Weights

from utils import init_weights
# inside your module, do:
for module in self.modules():
    init_weights(module)
Parameters:

module (nn.Module) –

egc.utils.load_data module

Load datasets with DGL for Graph Clustering. Author: Sheng Zhou

egc.utils.load_data.load_data(dataset_name: str, directory='./data') Tuple[DGLGraph, Tensor, int][source]

Load datasets.

Parameters:
  • dataset_name (str) – Name of the dataset. Check README.md for supported datasets.

  • directory (str, optional) – path for the dataset to save. Defaults to ‘./data’.

Raises:

NotImplementedError – dataset not supported

Returns:

graph, label, n_clusters

Return type:

Tuple[dgl.DGLGraph, torch.Tensor, int]

egc.utils.load_data.load_ogb_data(dataset_name, directory='./data')[source]

graph: DGL graph object. label: torch tensor of shape (num_nodes, num_tasks).

egc.utils.load_data.load_dgl_data(dataset_name, directory='./data')[source]

graph: DGL graph object. label: from graph.ndata['label'].

egc.utils.load_data.allclose(a: Tensor, b: Tensor, rtol: float = 0.0001, atol: float = 0.0001) bool[source]

This function checks if a and b satisfy the condition: |a - b| <= atol + rtol * |b|

Parameters:
  • a (torch.Tensor) – first tensor to compare

  • b (torch.Tensor) – second tensor to compare

  • rtol (float, optional) – relative tolerance. Defaults to 1e-4.

  • atol (float, optional) – absolute tolerance. Defaults to 1e-4.

Returns:

True for close, False for not

Return type:

bool

egc.utils.load_data.is_bidirected(g: DGLGraph) bool[source]

Return whether the graph is a bidirected graph. A graph is bidirected if for any edge \((u, v)\) in \(G\) with weight \(w\), there exists an edge \((v, u)\) in \(G\) with the same weight.

Parameters:

g (dgl.DGLGraph) – dgl.DGLGraph

Returns:

True for bidirected, False for not

Return type:

bool

class egc.utils.load_data.AE_LoadDataset(data)[source]

Bases: Dataset

egc.utils.load_data.load_mat_data2dgl(data_path, verbose=True)[source]

load data from .mat file

Parameters:
  • data_path (str) – the file to read in

  • verbose (bool, optional) – print info, by default True

Returns:

the graph read from data_path; label (torch.Tensor): node class labels; num_classes (int): number of node classes

Return type:

graph (DGL.graph)

egc.utils.load_data.bar_progress(current, total, _)[source]

Progress-bar callback invoked automatically by wget during download.

egc.utils.load_data.load_BlogCatalog(raw_dir='./data')[source]

load BlogCatalog dgl graph

Parameters:

raw_dir (str) – Data path. Supports user customization.

Returns:

the graph read from data_path; label (torch.Tensor): node class labels; num_classes (int): number of node classes

Return type:

graph (DGL.graph)

Examples

>>> graph, label, n_clusters = load_BlogCatalog()
egc.utils.load_data.load_Flickr(raw_dir='./data')[source]

load Flickr dgl graph

Parameters:

raw_dir (str) – Data path. Supports user customization.

Returns:

the graph read from data_path; label (torch.Tensor): node class labels; num_classes (int): number of node classes

Return type:

graph (DGL.graph)

Examples

>>> graph, label, n_clusters = load_Flickr()
egc.utils.load_data.load_ACM(raw_dir='./data', verbose=True)[source]

load ACM dgl graph

Parameters:
  • raw_dir (str) – Data path. Supports user customization.

  • verbose (bool, optional) – print info, by default True

Returns:

the graph read from data_path; label (torch.Tensor): node class labels; num_classes (int): number of node classes

Return type:

graph (DGL.graph)

Examples

>>> graph, label, n_clusters = load_ACM()
egc.utils.load_data.load_DBLP(raw_dir='./data', verbose=True)[source]

load DBLP dgl graph

Parameters:
  • raw_dir (str) – Data path. Supports user customization.

  • verbose (bool, optional) – print info, by default True

Returns:

the graph read from data_path; label (torch.Tensor): node class labels; num_classes (int): number of node classes

Return type:

graph (DGL.graph)

Examples

>>> graph, label, n_clusters = load_DBLP()

egc.utils.metrics module

Metrics

egc.utils.metrics.get_soft_assignment_matrix(data: Tensor, miu: Tensor, cluster_temp: float = 30, dist_type: str = 'cosine_similarity') Tensor[source]

Get soft assignment matrix from data points and cluster centers.

Parameters:
  • data (torch.Tensor) – data embeddings.

  • miu (torch.Tensor) – cluster center embeddings.

  • cluster_temp (float, optional) – softmax temperature. Defaults to 30.

  • dist_type (str, optional) – distance type. Defaults to ‘cosine_similarity’.

Returns:

soft assignment matrix.

Return type:

torch.Tensor

egc.utils.metrics.get_modularity_matrix(adj_nodia: Tensor) Tensor[source]

Get Modularity Matrix.

\[A_{vw} - \frac{k_v k_w}{2m}\]
Parameters:

adj_nodia (torch.Tensor) – adjacency matrix without the diagonal.

Returns:

modularity matrix.

Return type:

torch.Tensor

egc.utils.metrics.get_modularity_value(bin_adj_nodiag: Tensor, r: Tensor, mod: Tensor) Tensor[source]

Get Modularity.

\[Q(r)=\frac{1}{2m}\sum_{u,v\in V}\sum_{k=1}^K[A_{uv}-\frac{d_ud_v}{2m}]r_{uk}r_{vk}\]
Parameters:
  • bin_adj_nodiag (torch.Tensor) – n x n. Boolean adj matrix without diag.

  • r (torch.Tensor) – n x k. Soft assignment probability matrix.

  • mod (torch.Tensor) – n x n. Modularity matrix.

Returns:

Modularity value.

Return type:

torch.Tensor
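Both quantities can be sketched together in numpy (the library versions operate on torch tensors): the modularity matrix is B = A - k k^T / 2m, and the soft-assignment modularity is Q = Tr(r^T B r) / 2m.

```python
import numpy as np

def get_modularity(adj, r):
    # adj: n x n adjacency without self-loops; r: n x k soft assignments.
    # B = A - k k^T / 2m  (modularity matrix)
    # Q = Tr(r^T B r) / 2m
    k = adj.sum(axis=1)
    two_m = k.sum()
    B = adj - np.outer(k, k) / two_m
    return np.trace(r.T @ B @ r) / two_m
```

For two disjoint triangles with the correct two-community assignment, this yields Q = 0.5.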

egc.utils.model_management module

Model Management

egc.utils.model_management.get_checkpoint_path(model_filename: str) PurePath[source]
egc.utils.model_management.save_model(model_filename: str, model: Module, optimizer: Optimizer, current_epoch: int, loss: float) None[source]

Save model, optimizer, current_epoch, loss to checkpoints/${model_filename}.pt.

Parameters:
  • model_filename (str) – filename to save model.

  • model (torch.nn.Module) – model.

  • optimizer (torch.optim.Optimizer) – optimizer.

  • current_epoch (int) – current epoch.

  • loss (float) – loss.

egc.utils.model_management.load_model(model_filename: str, model: Module, optimizer: Optimizer) Tuple[Module, Optimizer, int, float][source]

Load model from checkpoints/${model_filename}.pt.

Parameters:
  • model_filename (str) – filename to load model.

  • model (torch.nn.Module) – model.

  • optimizer (torch.optim.Optimizer) – optimizer.

Returns:

[model, optimizer, epoch, loss]

Return type:

Tuple[torch.nn.Module, torch.optim.Optimizer, int, float]

egc.utils.normalization module

Normalization Utils

egc.utils.normalization.normalize_feature(features: lil_matrix) array[source]

Row-normalize feature matrix.

Parameters:

features (scipy.sparse.lil.lil_matrix) – 2D sparse features

Returns:

2D row-normalized features

Return type:

features_norm (numpy.matrix)

egc.utils.normalization.symmetrically_normalize_adj(adj: csr_matrix) coo_matrix[source]

Symmetrically normalize adjacency matrix.

Parameters:

adj (scipy.sparse.csr.csr_matrix) – 2D sparse adjacency matrix

Returns:

2D Symmetrically normalized sparse adjacency matrix

Return type:

adj_norm (scipy.sparse.coo.coo_matrix)
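Symmetric normalization computes D^{-1/2} A D^{-1/2}. A scipy sketch (zero-degree rows are mapped to zero), not necessarily the library's exact code:

```python
import numpy as np
import scipy.sparse as sp

def symmetrically_normalize_adj(adj):
    # D^{-1/2} A D^{-1/2} on a scipy sparse matrix.
    adj = sp.coo_matrix(adj)
    deg = np.asarray(adj.sum(axis=1)).flatten()
    d_inv_sqrt = np.zeros_like(deg, dtype=float)
    nz = deg > 0
    d_inv_sqrt[nz] = deg[nz] ** -0.5
    D = sp.diags(d_inv_sqrt)
    return (D @ adj @ D).tocoo()
```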

egc.utils.normalization.asymmetric_normalize_adj(adj, loop=True)[source]

Get the convolution operator

Parameters:
  • adj (ndarray) – the adjacency matrix of the improved graph

  • loop – whether to add self loops

egc.utils.normalization.normalize_sublime(adj, mode, sparse=False)[source]

Normalize adjacency matrix for SUBLIME model

Parameters:
  • adj – adjacency matrix

  • mode (str) – normalization mode for the adjacency matrix

  • sparse (bool, optional) – whether to use sparse operations. Defaults to False.

Returns:

the normalized adjacency matrix

egc.utils.sampling module

Sample Method

egc.utils.sampling.get_repeat_shuffle_nodes_list(n_nodes, sample_times)[source]

Get Negative Sample Nodes List By Repeatable Shuffle

Parameters:
  • n_nodes (int) – total number of nodes.

  • sample_times (int) – sample times.

Returns:

list of multiple repeatable nodes index shuffle lists.

Return type:

(List)

egc.utils.sampling.normal_reparameterize(mu: Tensor, logvar: Tensor, training: bool = True) Tensor[source]

Reparameterization trick for normal distribution

Parameters:
  • mu (torch.Tensor) – mean

  • logvar (torch.Tensor) – log variance

  • training (bool) – whether in training mode

Returns:

(torch.Tensor)
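The reparameterization trick draws z = mu + eps * exp(0.5 * logvar) with eps ~ N(0, I), so gradients can flow through mu and logvar. A numpy sketch (the library version uses torch; at evaluation time the mean is returned directly):

```python
import numpy as np

def normal_reparameterize(mu, logvar, training=True, rng=None):
    # z = mu + eps * sigma, where sigma = exp(0.5 * logvar) and
    # eps ~ N(0, I). Outside training, return the mean deterministically.
    if not training:
        return mu
    rng = rng or np.random.default_rng(0)
    std = np.exp(0.5 * logvar)
    return mu + rng.standard_normal(mu.shape) * std
```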

egc.utils.sampling.agm(x: ndarray) ndarray[source]

AGM probability

Parameters:

x (np.ndarray) – 1-d array

Returns:

AGM probability

Return type:

np.ndarray

egc.utils.sampling.choice(samples: List[int], weight: ndarray) int[source]

choose next node

Parameters:
  • samples (List[int]) – neighbors

  • weight (np.ndarray) – weights

Returns:

node chosen

Return type:

int
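Weighted sampling of the next node can be sketched with numpy's choice (an assumption about the implementation, not the library's exact code):

```python
import numpy as np

def choice(samples, weight):
    # Draw one node from `samples` with probability proportional
    # to its entry in `weight`.
    p = np.asarray(weight, dtype=float)
    return int(np.random.choice(samples, p=p / p.sum()))
```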

class egc.utils.sampling.CommunityGANSampling(n_threads: int, args: Tuple[int, int, bool], motif_size: int, total_motifs: List[List[Tuple]], theta_g: ndarray, neighbor_set: Dict)[source]

Bases: object

CommunityGAN Sampling

Parameters:
  • n_threads (int) – cores of multiprocessing.

  • args (Tuple[int, int, bool]) – (root, n_sample, only_neg): root (int) – root node id; n_sample (int) – number of motifs to sample; only_neg (bool) – return only negative samples

  • motif_size (int) – motif size.

  • total_motifs (List[List[Tuple]]) – list of all motifs indexed by node id.

  • theta_g (np.ndarray) – node embedding of generator.

  • neighbor_set (Dict) – neighbor set Dict indexed by node id.

g_v(roots: List[int]) Tuple[int, List[int]][source]

get next node

Parameters:

roots (List[int]) – list of nodes sampled so far

Returns:

current_node, path walked

Return type:

Tuple[int, List[int]]

g_s(args: Tuple[int, int, bool]) Tuple[List[Tuple], List[List[int]]][source]

sampling for community gan generator

Parameters:

args (Tuple[int, int, bool]) – (root, n_sample, only_neg): root (int) – root node id; n_sample (int) – number of motifs to sample; only_neg (bool) – return only negative samples

Returns:

motifs, paths

Return type:

Tuple[List[Tuple], List[List[int]]]

run() Tuple[List[Tuple], List[List[int]]][source]

sampling for community gan

Returns:

motifs, paths.

Return type:

Tuple[List[Tuple], List[List[int]]]

egc.utils.sublime_utils module

Utils for SUBLIME model

egc.utils.sublime_utils.nearest_neighbors_pre_elu(X, k, metric, i)[source]
egc.utils.sublime_utils.knn_fast(X, k, b)[source]
egc.utils.sublime_utils.apply_non_linearity(tensor, non_linearity, i)[source]
egc.utils.sublime_utils.cal_similarity_graph(node_embeddings)[source]
egc.utils.sublime_utils.top_k(raw_graph, K)[source]
egc.utils.sublime_utils.get_feat_mask(features, mask_rate)[source]
egc.utils.sublime_utils.symmetrize(adj)[source]
egc.utils.sublime_utils.split_batch(init_list, batch_size)[source]

Module contents

Utils