Scikit-network: Graph analysis in Python


Scikit-network is a Python package for the analysis of large graphs such as social networks, Web graphs and relational data, developed at Télécom Paris since May 2018. The package offers state-of-the-art algorithms for processing these graphs, understanding their structure, and extracting their main clusters and most representative nodes. It also includes visualization tools for exporting vector images of graphs in SVG format.

The scikit-network project is guided by two requirements that often conflict in practice: ease of use and performance. Regarding ease of use, the package can be installed with the standard Python package manager (pip) and follows the same API as scikit-learn, the standard Python package for machine learning. Regarding performance, the code relies on SciPy's efficient sparse matrix-vector products, on compiled code written in Cython, and on parallel processing. The result is a package that is both easy to use and efficient, which makes it an attractive alternative to its main competitors (NetworkX, graph-tool and iGraph).
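To illustrate how these two requirements fit together, here is a minimal sketch of a scikit-learn-style estimator (parameters set in `__init__`, a `fit` method returning `self`, results stored in a trailing-underscore attribute) that ranks nodes by PageRank using only SciPy sparse matrix-vector products. The class `SimplePageRank` and its parameters are hypothetical and are not scikit-network's API; the sketch assumes the graph is given as a SciPy sparse adjacency matrix and that nodes without outgoing edges redistribute their score uniformly.

```python
import numpy as np
from scipy import sparse


class SimplePageRank:
    """Hypothetical scikit-learn-style estimator (not scikit-network's class).

    PageRank by power iteration: each step is one sparse matrix-vector product.
    """

    def __init__(self, damping_factor=0.85, n_iter=200, tol=1e-10):
        self.damping_factor = damping_factor
        self.n_iter = n_iter
        self.tol = tol

    def fit(self, adjacency):
        adjacency = sparse.csr_matrix(adjacency, dtype=float)
        n = adjacency.shape[0]
        out_degrees = np.asarray(adjacency.sum(axis=1)).ravel()
        dangling = out_degrees == 0
        # Inverse out-degrees, set to 0 for dangling nodes (no outgoing edge).
        inv_deg = np.zeros(n)
        inv_deg[~dangling] = 1.0 / out_degrees[~dangling]
        # Transpose of the row-stochastic transition matrix P = D^{-1} A.
        transition_t = (sparse.diags(inv_deg) @ adjacency).T.tocsr()
        scores = np.full(n, 1.0 / n)
        for _ in range(self.n_iter):
            # Follow an edge with probability d (dangling mass spread uniformly),
            # otherwise jump to a uniformly random node.
            new = self.damping_factor * (transition_t @ scores + scores[dangling].sum() / n)
            new += (1.0 - self.damping_factor) / n
            converged = np.abs(new - scores).sum() < self.tol
            scores = new
            if converged:
                break
        self.scores_ = scores / scores.sum()
        return self


# Directed graph: nodes 1, 2, 3 point to node 0; node 0 points to node 1.
row = [1, 2, 3, 0]
col = [0, 0, 0, 1]
adjacency = sparse.csr_matrix((np.ones(4), (row, col)), shape=(4, 4))
pagerank = SimplePageRank().fit(adjacency)
print(pagerank.scores_.round(3))
```

Because the only heavy operation is `transition_t @ scores`, the cost per iteration is linear in the number of edges, which is what makes this style of implementation scale to large sparse graphs.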

The applications of scikit-network are very diverse: content recommendation, document classification, cohort selection for medical research, etc. The corresponding graphs (between users and contents, documents and words, and patients and medical codes, respectively) can have thousands or even millions of nodes. With scikit-network, a few lines of code are enough to understand their structure and extract relevant information.
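Graphs such as users and contents are bipartite, so they can be stored as a sparse biadjacency matrix (one row per user, one column per content). As a minimal sketch of content recommendation on such a matrix, the function below scores unseen contents for one user by item co-occurrence. This is a plain co-occurrence heuristic chosen for illustration, not one of scikit-network's algorithms, and the function name `recommend` is hypothetical.

```python
import numpy as np
from scipy import sparse


def recommend(biadjacency, user, n_items=2):
    """Hypothetical helper: rank unseen items for `user` by co-occurrence.

    biadjacency: sparse (n_users, n_items) matrix, nonzero if the user consumed the item.
    """
    biadjacency = sparse.csr_matrix(biadjacency, dtype=float)
    # C[i, j] = number of users who consumed both item i and item j.
    cooccurrence = (biadjacency.T @ biadjacency).tocsr()
    # Remove self co-occurrences (the diagonal).
    cooccurrence = cooccurrence - sparse.diags(cooccurrence.diagonal())
    seen = biadjacency[user].toarray().ravel()
    # Score of item j = sum of its co-occurrences with the items the user has seen.
    scores = cooccurrence @ seen
    scores[seen > 0] = -np.inf  # never recommend an item already seen
    return [int(i) for i in np.argsort(-scores, kind="stable")[:n_items]]


# 4 users x 4 contents; entry (u, i) = 1 if user u consumed content i.
rows = [0, 0, 1, 1, 1, 2, 2, 3, 3]
cols = [0, 1, 0, 1, 2, 1, 3, 2, 3]
biadjacency = sparse.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(4, 4))
print(recommend(biadjacency, user=0))  # → [2, 3]
```

Content 2 ranks first for user 0 because it co-occurs with both contents that user has seen (through user 1), while content 3 co-occurs with only one of them (through user 2).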

The scikit-network package is developed at Télécom Paris in the DIG team by Professor Thomas Bonald and his PhD students Nathan de Lara and Quentin Lutz. The documentation includes tutorials for quickly understanding the main algorithms and their applications. In line with the school's open-source policy, the library is now also used in teaching, for both students and professionals.