CDLib – Letizia Milli
Community Discovery – i.e., how to decompose a graph into densely connected clusters of nodes – is one of the most critical issues in network analysis and a mandatory step for countless data analysis tasks. Although many algorithms have been proposed to address this task, the absence of a framework standardizing their use makes it time-consuming to identify the best one for the specific data to analyze. To overcome this shortcoming and to support researchers, we developed CDlib (Community Discovery library). It exposes more than 80 algorithms along with tools to evaluate, compare and visualize the clusterings they produce.
CDlib's distinctive features are: a large number of community discovery algorithms; integrated evaluation and visualization tools; interoperability with the graph data structures offered by the main Python network libraries (networkx, igraph, graph-tool); a standardized input/output/analysis pipeline.
The research has been conducted within the project SoBigData++.
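To make "evaluating and comparing clusterings" concrete, the following is a minimal standard-library sketch that scores two candidate partitions of a toy graph with Newman modularity. It does not use CDlib's API; the function name `modularity` and the toy graph are illustrative assumptions.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Newman modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs; communities: list of disjoint node sets.
    Illustrative helper only, not part of CDlib.
    """
    m = len(edges)
    community_of = {node: cid for cid, nodes in enumerate(communities) for node in nodes}
    internal = defaultdict(int)    # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
q_two_triangles = modularity(edges, [{0, 1, 2}, {3, 4, 5}])
q_single_block = modularity(edges, [{0, 1, 2, 3, 4, 5}])
print(q_two_triangles, q_single_block)
```

The partition that separates the two triangles scores higher than the trivial single-community partition, which is the kind of comparison a standardized evaluation pipeline automates across many algorithms and metrics.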
DeSR – Giuseppe Attardi
DeSR is a Dependency Shift Reduce parser for multiple languages. It generates dependency parse trees for natural language sentences. The parser has been trained on all 18 languages of the CoNLL-X Shared task and CoNLL Shared task 2007. Dependency structures are built by scanning the input from left to right and deciding at each step whether to perform a shift or to create a dependency between two adjacent tokens. The parsing algorithm is deterministic and highly efficient while still achieving state-of-the-art accuracy. More details and the software for download are available at the DeSR Parser home page.
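The left-to-right shift/attach loop described above can be sketched as follows. This is a generic arc-standard style illustration, not DeSR's actual transition system or code: in a real parser a trained classifier chooses each action, whereas here the action sequence is supplied by hand, and the names `parse`, `shift`, `left_arc` and `right_arc` are assumptions made for the example.

```python
from collections import deque


def parse(tokens, actions):
    """Apply shift-reduce transitions and return the head index of each token.

    A head of -1 marks a token that was never attached (the root).
    The action sequence stands in for the decisions of a trained classifier.
    """
    stack = []
    buffer = deque(range(len(tokens)))
    heads = [-1] * len(tokens)
    for action in actions:
        if action == "shift":
            # Move the next input token onto the stack.
            stack.append(buffer.popleft())
        elif action == "left_arc":
            # The token below the top becomes a dependent of the top token.
            dependent = stack.pop(-2)
            heads[dependent] = stack[-1]
        elif action == "right_arc":
            # The top token becomes a dependent of the token below it.
            dependent = stack.pop()
            heads[dependent] = stack[-1]
        else:
            raise ValueError(f"unknown action: {action}")
    return heads


tokens = ["She", "reads", "books"]
actions = ["shift", "shift", "left_arc", "shift", "right_arc"]
heads = parse(tokens, actions)
for token, head in zip(tokens, heads):
    print(token, "<-", tokens[head] if head >= 0 else "ROOT")
```

Each token is shifted once and attached at most once, so the loop runs in time linear in the sentence length, which is what makes deterministic parsers of this family efficient.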
FastFlow – Massimo Torquati
FastFlow is a C++ parallel programming library currently targeting multi-/many-core platforms and clusters of multi-cores. It offers both a set of high-level ready-to-use parallel pattern implementations and a set of mechanisms and components (called building blocks) to support low-latency and high-throughput data-flow streaming networks. FastFlow simplifies the development of parallel applications modeled as a structured, directed graph of processing nodes. The graph of concurrent nodes is constructed by the assembly of sequential and parallel building blocks and higher-level components (i.e., parallel patterns) modeling recurrent schemas of parallel computations (e.g., pipeline, task-farm, parallel-for, map-reduce, etc.). FastFlow efficiency stems from the optimized implementation of the base communication and synchronization mechanisms and its layered software design. FastFlow has been used in two distinct FP7 STREP projects (ParaPhrase and Repara) and one H2020 project (RePhrase), as well as in some software projects in different application domains (numerical computation, image processing, financial applications).
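As a rough illustration of the pipeline pattern mentioned above, here is a small Python sketch in which each stage is a sequential node running in its own thread and connected to its neighbours by FIFO channels. FastFlow itself is a C++ library with its own building blocks; none of the names below (`start_stage`, `run_pipeline`, `STOP`) come from it.

```python
import queue
import threading

STOP = object()  # end-of-stream marker forwarded from stage to stage


def start_stage(fn, inbox, outbox):
    """Start a thread that applies fn to every item read from inbox."""
    def run():
        while True:
            item = inbox.get()
            if item is STOP:
                outbox.put(STOP)
                return
            outbox.put(fn(item))

    thread = threading.Thread(target=run)
    thread.start()
    return thread


def run_pipeline(items, stage_functions):
    """Stream items through a linear chain of concurrently running stages."""
    channels = [queue.Queue() for _ in range(len(stage_functions) + 1)]
    threads = [
        start_stage(fn, channels[i], channels[i + 1])
        for i, fn in enumerate(stage_functions)
    ]
    for item in items:
        channels[0].put(item)
    channels[0].put(STOP)

    results = []
    while True:
        out = channels[-1].get()
        if out is STOP:
            break
        results.append(out)
    for thread in threads:
        thread.join()
    return results


results = run_pipeline(range(5), [lambda x: x + 1, lambda x: x * x])
print(results)
```

Because every stage is a single thread reading from a FIFO channel, output order matches input order. A task-farm would replace one stage with several workers sharing the same input channel, at the cost of that ordering guarantee.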
LA-vector – Giorgio Vinciguerra
LA-vector is a compressed bitvector supporting efficient random access and rank queries. It uses novel ways of compressing and accessing data by learning and adapting to data regularities.
The research has been conducted within the projects Multicriteria data structures, SoBigData++, HumanE-AI-Net.
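To illustrate what rank and random access mean on a bitvector, and how a learned model can stand in for explicit storage, here is a deliberately simplified sketch. It assumes the bitvector is represented by the sorted positions of its 1 bits, approximates them with a single straight line, and stores only the per-position corrections. This is an assumption made for illustration, not LA-vector's actual encoding or API, and the class name `LinearSegmentBitvector` is hypothetical.

```python
class LinearSegmentBitvector:
    """Toy bitvector: the i-th 1 bit is stored as line(i) + a correction."""

    def __init__(self, ones):
        # ones: strictly increasing positions of the 1 bits.
        self.n = len(ones)
        self.intercept = ones[0] if ones else 0
        self.slope = (ones[-1] - ones[0]) / (self.n - 1) if self.n > 1 else 0.0
        # Regular data yields small corrections, which is where compression
        # would come from in a real encoding (they are kept as plain ints here).
        self.corrections = [p - self._predict(i) for i, p in enumerate(ones)]

    def _predict(self, i):
        return round(self.intercept + self.slope * i)

    def select(self, i):
        """Position of the i-th 1 bit (0-based)."""
        return self._predict(i) + self.corrections[i]

    def rank(self, pos):
        """Number of 1 bits at positions strictly smaller than pos."""
        lo, hi = 0, self.n
        while lo < hi:
            mid = (lo + hi) // 2
            if self.select(mid) < pos:
                lo = mid + 1
            else:
                hi = mid
        return lo

    def access(self, pos):
        """Value of the bit at position pos (0 or 1)."""
        return self.rank(pos + 1) - self.rank(pos)


ones = [3, 6, 9, 12, 16]
bv = LinearSegmentBitvector(ones)
print(bv.rank(10), bv.access(9), bv.access(10))
```

Here rank is answered by a binary search over select, so the structure never materializes the individual bits.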
NDLib – Letizia Milli
Modeling and analyzing how diffusive phenomena unfold on top of complex systems is a significant task that attracts growing interest from several fields. So far, few analytical platforms, typically focusing on a narrow set of classic models, have tried to offer out-of-the-box solutions to analysts and researchers. Moreover, such tools are often characterized by complex interfaces that make them unsuitable for non-technical audiences.
To address such an issue, we designed an analytical ecosystem that allows the widest audience to create and perform diffusion-related experiments: NDlib (Network Diffusion library), a library available for Python 3.x.
NDlib consists of three modules: the NDlib core library, a remote RESTful experiment server accessible through API calls, and, finally, a web-oriented visual interface able to abstract from the low-level description and definition of diffusion simulation. Moreover, NDlib is shipped with a SQL-like language – namely NDQL – designed to make it accessible even to non-programmers.
The research has been conducted within the project SoBigData++.
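As an example of the kind of diffusion experiment such a library runs, here is a minimal discrete-time SIR (Susceptible, Infected, Recovered) simulation written with the standard library only. It does not use NDlib's API or NDQL; the function `simulate_sir`, its parameters, and the toy path graph are assumptions made for illustration.

```python
import random
from collections import Counter


def simulate_sir(adjacency, seeds, beta, gamma, steps, seed=0):
    """Run a synchronous SIR process and return the status counts per step.

    adjacency: dict mapping each node to a list of neighbours.
    beta: per-contact infection probability; gamma: recovery probability.
    """
    rng = random.Random(seed)
    status = {node: "S" for node in adjacency}
    for node in seeds:
        status[node] = "I"

    def counts(current):
        tally = Counter(current.values())
        return {state: tally.get(state, 0) for state in "SIR"}

    history = [counts(status)]
    for _ in range(steps):
        new_status = dict(status)  # updates are based on the previous step
        for node, state in status.items():
            if state != "I":
                continue
            for neighbour in adjacency[node]:
                if status[neighbour] == "S" and rng.random() < beta:
                    new_status[neighbour] = "I"
            if rng.random() < gamma:
                new_status[node] = "R"
        status = new_status
        history.append(counts(status))
    return history


# Path graph 0 - 1 - 2 - 3 with a single initial infection at node 0.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
history = simulate_sir(path, seeds=[0], beta=1.0, gamma=1.0, steps=4)
for step, snapshot in enumerate(history):
    print(step, snapshot)
```

With both probabilities set to 1.0 the run is deterministic: the infection advances one hop per step along the path. Lower values make the outcome stochastic, which is why such experiments are usually repeated over many runs.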
Optimization Tools – Antonio Frangioni
This is a (somewhat heterogeneous) collection of C++ codes for solving various kinds of optimization problems. These are typically polynomial-time problems, often with some graph structure and/or at the interface between continuous and combinatorial optimization, that need to be solved very many times within approaches for much more complex (NP-hard) ones. It is therefore important that they be solved efficiently, but comparatively little attention has been paid to the implementation of solution methods for these “easy” problems, leading to the risk of re-inventing the wheel many times over. By releasing reasonable, portable implementations of efficient solvers for these problems under a clean interface, we can spare other researchers and practitioners implementation effort that can be employed more productively elsewhere. Whenever possible the solvers adopt an abstract interface approach, where a pure virtual base C++ class is defined to set up the interface, and then multiple different implementations are provided as derived classes. Also, collections of relevant instance sets for these (and other) optimization problems, tools for reading them, and various testing and debugging tools are often included in the distributions to help prospective users make the best use of them and/or develop their own versions.
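The abstract interface approach can be sketched as follows, using Python's abstract base classes in place of a pure virtual C++ base class. The problem (single-source shortest paths) and all class and method names are hypothetical and chosen only to illustrate the pattern; they are not taken from the distributed codes.

```python
import heapq
from abc import ABC, abstractmethod


class ShortestPathSolver(ABC):
    """Interface: every solver answers the same question in the same way."""

    @abstractmethod
    def solve(self, n, arcs, source):
        """Return the list of distances from source to each of the n nodes.

        arcs is a list of (tail, head, cost) triples with non-negative costs.
        """


class DijkstraSolver(ShortestPathSolver):
    def solve(self, n, arcs, source):
        adjacency = [[] for _ in range(n)]
        for tail, head, cost in arcs:
            adjacency[tail].append((head, cost))
        dist = [float("inf")] * n
        dist[source] = 0
        heap = [(0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry
            for v, cost in adjacency[u]:
                if d + cost < dist[v]:
                    dist[v] = d + cost
                    heapq.heappush(heap, (dist[v], v))
        return dist


class BellmanFordSolver(ShortestPathSolver):
    def solve(self, n, arcs, source):
        dist = [float("inf")] * n
        dist[source] = 0
        for _ in range(n - 1):
            changed = False
            for tail, head, cost in arcs:
                if dist[tail] + cost < dist[head]:
                    dist[head] = dist[tail] + cost
                    changed = True
            if not changed:
                break
        return dist


arcs = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 5)]
solutions = {
    type(solver).__name__: solver.solve(4, arcs, 0)
    for solver in (DijkstraSolver(), BellmanFordSolver())
}
print(solutions)
```

Client code depends only on `ShortestPathSolver`, so a solver can be swapped for a faster one on a given class of instances without touching the enclosing algorithm.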
PGM-index – Giorgio Vinciguerra
The Piecewise Geometric Model index (PGM-index) is a data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes while providing the same worst-case query time guarantees.
The research has been conducted within the projects Multicriteria data structures, SoBigData++.
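The idea of indexing a sorted array with a piecewise linear model can be sketched as below. The sketch assumes an integer error bound `eps` and strictly increasing numeric keys, and it builds segments with a simple quadratic-time greedy scan rather than the optimal construction of the actual PGM-index. The class `PiecewiseLinearIndex` is hypothetical and covers lookups only.

```python
import bisect


class PiecewiseLinearIndex:
    """Toy learned index: each segment predicts positions within +/- eps."""

    def __init__(self, keys, eps):
        self.keys = keys  # strictly increasing
        self.eps = eps
        self.segments = []  # (first_key, slope, start_position)
        start, n = 0, len(keys)
        while start < n:
            end, slope = start + 1, 0.0
            while end < n:
                # Candidate line through the segment's first and newest key.
                candidate = (end - start) / (keys[end] - keys[start])
                fits = all(
                    abs(start + candidate * (keys[i] - keys[start]) - i) <= eps
                    for i in range(start, end + 1)
                )
                if not fits:
                    break
                slope, end = candidate, end + 1
            self.segments.append((keys[start], slope, start))
            start = end
        self.first_keys = [segment[0] for segment in self.segments]

    def lookup(self, key):
        """Return the position of key in keys, or None if it is absent."""
        seg = bisect.bisect_right(self.first_keys, key) - 1
        if seg < 0:
            return None
        first_key, slope, start = self.segments[seg]
        predicted = int(start + slope * (key - first_key))
        n = len(self.keys)
        # Search only a small window around the predicted position.
        lo = min(n, max(0, predicted - self.eps - 1))
        hi = min(n, predicted + self.eps + 2)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < n and self.keys[i] == key else None


keys = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
index = PiecewiseLinearIndex(keys, eps=1)
print(len(index.segments), index.lookup(23), index.lookup(4))
```

The index stores a few numbers per segment rather than one entry per key, and each lookup touches only a window of width proportional to `eps`.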
pysoccer – Paolo Cintia
Pysoccer offers a standardized data model designed to make data-driven soccer analytics easy. It aims to be the base format on which all community-made analyses are built, making them comparable and able to speak the same language.
The research has been conducted within the project SoBigData.
soccerLogger – Paolo Cintia
Event Tagging Interface is a web application that allows a user to define temporal windows of events from soccer video broadcasts using a gamepad. The research has been conducted within the project SoBigData++.
Scikit-Mobility – Roberto Pellungrini
Scikit-mobility is a library for human mobility analysis in Python. The library allows users to: represent trajectories and mobility flows with proper data structures, TrajDataFrame and FlowDataFrame; manage and manipulate mobility data of various formats (call detail records, GPS data, data from social media, survey data, etc.); extract mobility metrics and patterns from data, both at the individual and at the collective level (e.g., length of displacements, characteristic distance, origin-destination matrix, etc.); generate synthetic individual trajectories using standard mathematical models (random walk models, exploration and preferential return model, etc.); generate synthetic mobility flows using standard migration models (gravity model, radiation model, etc.); assess the privacy risk associated with a mobility data set.
The research has been conducted within the project SoBigData++.
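Two of the metrics listed above, the length of displacements and the origin-destination matrix, can be sketched with the standard library alone. This is not scikit-mobility's API (it does not use TrajDataFrame or FlowDataFrame); the function names and the sample coordinates are assumptions made for illustration, and distances use the haversine formula with a mean Earth radius of 6371 km.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def jump_lengths(trajectory):
    """Distances between consecutive points of one individual's trajectory."""
    return [haversine_km(p, q) for p, q in zip(trajectory, trajectory[1:])]


def od_matrix(trips):
    """Count trips per (origin, destination) pair of zone labels."""
    return Counter((origin, destination) for origin, destination in trips)


trajectory = [(0.0, 0.0), (0.0, 1.0), (0.0, 1.0), (1.0, 1.0)]
jumps = jump_lengths(trajectory)
flows = od_matrix([("A", "B"), ("A", "B"), ("B", "C"), ("C", "A")])
print([round(j, 1) for j in jumps], dict(flows))
```

The first function works at the individual level (one trajectory), the second at the collective level (flows aggregated over many trips), mirroring the two levels of analysis named in the description.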