References
[1] Valejo, Alan and Goes, F. and Romanetto, L. M. and Oliveira, Maria C. F. and Lopes, A. A., A benchmarking tool for the generation of bipartite network models with overlapping communities, in Knowledge and information systems, vol. 62, p. 1641-1669, 2019, doi: https://doi.org/10.1007/s10115-019-01411-9
Warning
-------
The original implementation (i.e. paper version [1]) is deprecated.
There may be divergences between this version and the original algorithm.
If you looking for the original version used in the paper don't hesitate to contact the authors.
This software is a new version, more robust and fast.
It is a beta version and has some bugs and inconsistencies.
The final version of this tool will be released is coming soon.
For now, you can use this version without guarantee of the results.
BNOC: A benchmarking tool to generate bipartite, k-partite and heterogeneous network models with overlapping communities
About
BNOC is a tool for synthesizing bipartite, k-partite and heterogeneous network models with varied features representative of properties from real networks. Multiple input parameters can be manipulated to create networks of varying sizes and with distinct community patterns in terms of number, size, balance, edge distribution intra- and inter-communities, degree of overlapping and cohesion, and degree of noise in the connection patterns.
Usage
BNOC may operate in two modes: 1. using explicit command line parameters (or options) or 2. using a JSON config file (JavaScript Object Notation).
Command line parameters
$ python bnoc.py [options]
Option | Domain | Default | Description |
---|---|---|---|
-dir --directory | str [DIR] | '.' | directory of output file |
-out --output | str [FILE] | 'out' | filename |
-o --output_objects | boolean | False | return python objects dictionary and don't write files |
-cnf --conf | str [FILE] | None | Input parameters in .json format |
-v, --vertices | int array | [10, 10, 10] | number of vertices for each layer |
-d, --dispersion | float array | [0.3, 0.3, 0.3] | dispersion of gamma mixing distribution for each layer |
-m, --mu | float array | [0.3, 0.3, 0.3] | dispersion or range of wieght values for each layer |
-c, --communities | int array | [2, 2, 2] | number of communities |
-x, --x | int array | [1, 1, 1] | number of vertices from V1 that participate of overlaping |
-z, --z | int array | [2, 2, 2] | number of vertices of overlapping communities |
-p, --p | int array of array | [[0.5, 0.5], [0.5, 0.5]] | probability of vertices in each community for each layer |
-e, --scheme | int array of array | [[0, 1], [1,2]] | connections type |
-n, --noise | float array | [0.1, 0.1] | noise for each connections type |
-b, --balanced | boolean | False | boolean balancing flag that suppresses -p parameter |
-u, --unweighted | boolean | False | unweighted networks |
-no, --normalize | boolean | False | scale input vectors individually to unit norm (vector length) |
-hd, --hard | boolean | False | hard noise |
--save_npy | boolean | False | save numpy object |
--save_ncol | boolean | False | save ncol format |
--save_gml | boolean | False | save gml format |
--save_arff | boolean | False | save arff format |
--save_cover | boolean | False | save communities in cover form |
--save_membership | boolean | False | save communities in a membership format |
--save_type | boolean | False | save vertex type |
--save_overlap | boolean | False | save save overlap vertices |
--show_timing | boolean | False | show timing |
--save_timing_json | boolean | False | save timing in json |
--save_timing_csv | boolean | False | save timing in csv |
--unique_key | boolean | False | output date and time as unique_key |
Parameters -d
, -m
, -c
, -x
, -y
and -z
are arrays of size L, where L is the number of layers. Parameter p
is an array of array the probability of vertices in each community for each layer. Parameter e
define the scheme of
the networks, i.e., the connections type.
JSON option
$ python bnoc.py -cnf options.json
JSON format: Data is in name/value pairs, separated by commas, curly braces hold objects and square brackets hold arrays.
{
"option": "value"
}
Examples
You can use a config file (.json) to specify the parameters, for instance:
A bipartite network with communities, overlapping and a small level of noise:
$ python bnoc/bnoc.py -cnf input/bipartite-1.json
To help you visualize the networks generated by BNOC you can use the PyNetViewer software. Then, it is possible to plot the network using a bipartite layout. Line widths reflect the corresponding edge weights; red squares depict overlapping vertices; and colored circles indicate non-overlapping vertices and their assigned community. Layout file is in the PyNetViewer repository.
$ python pynetviewer.py -cnf input/bipartite-1-layout-1.json
The same network with standard layout. Only overlapping vertices are highlighted.
$ python pynetviewer.py -cnf input/bipartite-1-layout-2.json
A bipartite network with hard level of noise, unbalanced community sizes and no overlapping.
$ python bnoc.py -cnf input/bipartite-2.json
$ python pynetviewer.py -cnf input/bipartite-2.json
A bipartite network with small level of noise, balanced community sizes, no overlapping and many communities.
$ python bnoc.py -cnf input/bipartite-3.json
$ python pynetviewer.py -cnf input/bipartite-3.json
A k-partite network with k=4 and overlapping.
$ python bnoc.py -cnf input/kpartite.json
$ python pynetviewer.py -cnf input/kpartite.json
A heterogeneous network with k=3 layers and no overlapping.
$ python bnoc.py -cnf input/heterogeneous.json
$ python pynetviewer.py -cnf input/heterogeneous.json
Notes
A graph is called disconnected if it has more than one component, i.e. if it is not connected. Following this concept, consider two observations:
-
Dispersion (
d
andm
parameters) controls the number of edges in the network based on the number of vertices. Large networks, with tens of thousands of vertices, can be connected with a low dispersion, e.g., d = 0.3. In contrast, networks with a few hundred vertices need a higher dispersion to be connected, e.g., d = 0.6. The number of communities need be considered too. E.g., a network with 1,000 vertices and 100 communities can have a low number of valid edges, since edges between communities is unlikely. Therefore, in this scenario, the graph needs a higher dispersion to be connected. -
The noise (
n
parameter) depends on the number of edges in the network. E.g., n = 0.01 is low and easy if applied in a small network with a few hundreds or thousands of edges and vertices. In contrast, n = 0.01 is high and hard if applied in a large network, with tens of thousands of vertices.
Scalability
BNOC can generate large-scale bipartite networks with tens or even hundreds of thousands of vertices and hundreds of millions of edges in a timely manner. See the original article [1] for details about complexity and scalability.
Important, save the output files in text format is slow. It is recommended save the result with numpy .npy
object,
see numpy.save for details.
For instance, a bipartite network with twenty thousand vertices (use --show_timing
to print timing values):
$ python bnoc.py -cnf input/input_bipartite_time_ncol.json
$ Snippet Time [m] Time [s]
$ Pre-processing 0.0 0.0213
$ Build BNOC 0.0 2.6187
$ Save 1.0 34.596
Note, the bottleneck of the Bnoc execution time is to save the output in a text format. To suppress this limitation you
can process the network directly in the memory or save a .npy
object using --output_npy
or -onpy
parameter.
$ python bnoc.py -cnf input/input_bipartite_time_npy.json
$ Snippet Time [m] Time [s]
$ Pre-processing 0.0 0.0223
$ Build BNOC 0.0 2.4116
$ Save 1.0 0.2627
Install
Pip
$ pip install -r requirements.txt
Or Anaconda env
$ conda env create -f environment.yml
$ conda activate bnoc
Or Anaconda create
$ conda create --name bnoc python=3.7.2
$ conda activate bnoc
$ conda install -c anaconda pyyaml
$ conda install -c conda-forge pypdf2
$ conda install -c anaconda scipy
Testing
To check that routines are (probably) working properly:
$ make test
Writing files is tested by exercising the script mode described in the Usage section above.
Created objects should probably be better tested, as directions of Bnoc usage.
File I/O can be tested by e.g. appropriatelly replacing the bnoc.open
function.
Release History
- 0.1.0
- The first proper release
- 0.0.1
- Work in progress
Contributing
- Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
- Please make sure to update tests as appropriate.
- Fork it (https://github.com/alanvalejo/bnoc/fork)
- Create your feature branch (git checkout -b feature/fooBar)
- Commit your changes (git commit -am 'Add some fooBar')
- Push to the branch (git push origin feature/fooBar)
- Create a new Pull Request
Known Bugs
- Please contact the author for problems and bug report.
Contact
- Alan Valejo
- [email protected]
- www.alanvalejo.com.br
- Postdoctoral research fellow at the University of São Paulo (USP), Brazil
License and credits
- Giving credit to the author by citing the papers [1]
- The GNU General Public License v3.0
- This program comes with ABSOLUTELY NO WARRANTY. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.
- Owner or contributors are not liable for any direct, indirect, incidental, special, exemplary, or consequential damages, (such as loss of data or profits, and others) arising in any way out of the use of this software, even if advised of the possibility of such damage.
- This program is free software and distributed in the hope that it will be useful: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
To-do list
- Explicitly seed a global variable or parameter to achieve reproducibility
- Improve usage section