Recent Submissions

  • Exact and sampling methods for mining higher-order motifs in large hypergraphs

    Lotito, Quintino Francesco; Musciotto, Federico; Battiston, Federico; Montresor, Alberto; Department of Network and Data Science (Springer, 2023)
    Network motifs are recurrent, small-scale patterns of interactions observed frequently in a system. They shed light on the interplay between the topology and the dynamics of complex networks across various domains. In this work, we focus on the problem of counting occurrences of small sub-hypergraph patterns in very large hypergraphs, where higher-order interactions connect arbitrary numbers of system units. We show how directly exploiting higher-order structures speeds up the counting process compared to traditional data mining techniques for exact motif discovery. Moreover, with hyperedge sampling, performance is further improved at the cost of small errors in the estimation of motif frequency. We evaluate our method on several real-world datasets describing face-to-face interactions, co-authorship and human communication. We show that our approximated algorithm allows us to extract higher-order motifs faster and on a larger scale, beyond the computational limits of an exact approach.
  • Crowdsourcing Subjective Annotations Using Pairwise Comparisons Reduces Bias and Error Compared to the Majority-vote Method

    Narimanzadeh, Hasti; Badie-Modiri, Arash; Smirnova, Iuliia G.; Chen, Ted Hsuan Yun; Department of Network and Data Science (Association for Computing Machinery, 2023)
    How to better reduce measurement variability and bias introduced by subjectivity in crowdsourced labelling remains an open question. We introduce a theoretical framework for understanding how random error and measurement bias enter into crowdsourced annotations of subjective constructs. We then propose a pipeline that combines pairwise comparison labelling with Elo scoring, and demonstrate that it outperforms the ubiquitous majority-voting method in reducing both types of measurement error. To assess the performance of the labelling approaches, we constructed an agent-based model of crowdsourced labelling that lets us introduce different types of subjectivity into the tasks. We find that under most conditions with task subjectivity, the comparison approach produced higher F1 scores. Further, the comparison approach is less susceptible to inflating bias, which majority voting tends to do. To facilitate applications, we show with simulated and real-world data that the number of required random comparisons for the same classification accuracy scales log-linearly, as O(N log N), with the number of labelled items. We also implemented the Elo system as an open-source Python package.
  • Universal Basic Income in a Blockchain-Based Community Currency

    Avanzo, Sowelu; Criscione, Teodoro; Linares, Julio; Schifanella, Claudio; Department of Network and Data Science (Association for Computing Machinery, New York, 2023)
    Recent advancements of blockchain technologies ensure security and trustability of Community Currency Systems (CCSs), enabling their increasingly widespread adoption. These systems aim at empowering the local economies by virtue of a medium of exchange whose governance and circulation are local. Smart contracts enable the enforcement of token economy policies, which facilitate the experimentation of radically new economic models. Recent studies investigated blockchain-based CCSs. Still, to the best of our knowledge, this is the first study analyzing a CCS providing a token-based Universal Basic Income (UBI). We evaluate the Circles UBI decentralised application utility in delivering an unconditional income to its users, focusing on its main pilot project running in Berlin. We analyse the structural changes in the network, especially in relation to a subsidy program involving local businesses. We also identify prominent users based on centrality measures, and investigate how the UBI was effectively spent. We adopt a method agnostic to the economic context to identify optimal aggregation windows for the temporal network of CCS transactions based on the Causal Fidelity (CF) index. This aims to provide static representations as accurate as possible in terms of sequential order of edges, an aspect that was not considered in previous research on CCSs. Our findings suggest that the pilot project sustained the expansion of the economic network and the system facilitated trade in urban communities in Berlin. Future research is needed to identify methods to ensure sustainability of self-organised CCSs adopting a UBI issuance scheme and to further decentralise their governance.
  • Hyper-cores promote localization and efficient seeding in higher-order processes

    Mancastroppa, Marco; Iacopini, Iacopo; Petri, Giovanni; Barrat, Alain; Department of Network and Data Science (Nature Publishing Group, 2023)
    Going beyond networks, to include higher-order interactions of arbitrary sizes, is a major step to better describe complex systems. In the resulting hypergraph representation, tools to identify structures and central nodes are scarce. We consider the decomposition of a hypergraph in hyper-cores, subsets of nodes connected by at least a certain number of hyperedges of at least a certain size. We show that this provides a fingerprint for data described by hypergraphs and suggests a novel notion of centrality, the hypercoreness. We assess the role of hyper-cores and nodes with large hypercoreness in higher-order dynamical processes: such nodes have large spreading power and spreading processes are localized in central hyper-cores. Additionally, in the emergence of social conventions very few committed individuals with high hypercoreness can rapidly overturn a majority convention. Our work opens multiple research avenues, from comparing empirical data to model validation and study of temporally varying hypergraphs.
  • Content-based comparison of communities in social networks: Ex-Yugoslavian reactions to the Russian invasion of Ukraine

    Evkoski, Bojan; Kralj Novak, Petra; Ljubešić, Nikola; Department of Network and Data Science (Springer Science and Business Media LLC, 2023-06-28)
  • Universal patterns in egocentric communication networks

    Iñiguez, Gerardo; Heydari, Sara; Kertész, János; Saramäki, Jari; Department of Network and Data Science (Springer Science and Business Media LLC, 2023-08-26)
    Tie strengths in social networks are heterogeneous, with strong and weak ties playing different roles at the network and individual levels. Egocentric networks, networks of relationships around an individual, exhibit few strong ties and more weaker ties, as evidenced by electronic communication records. Mobile phone data has also revealed persistent individual differences within this pattern. However, the generality and driving mechanisms of social tie strength heterogeneity remain unclear. Here, we study tie strengths in egocentric networks across multiple datasets of interactions between millions of people during months to years. We find universality in tie strength distributions and their individual-level variation across communication modes, even in channels not reflecting offline social relationships. Via a simple model of egocentric network evolution, we show that the observed universality arises from the competition between cumulative advantage and random choice, two tie reinforcement mechanisms whose balance determines the diversity of tie strengths. Our results provide insight into the driving mechanisms of tie strength heterogeneity in social networks and have implications for the understanding of social network structure and individual behavior.
  • Dynamics of cascades on burstiness-controlled temporal networks

    Unicomb, Samuel; Iñiguez, Gerardo; Gleeson, James P.; Karsai, Márton; Department of Network and Data Science (Springer Nature, 2021)
    Burstiness, the tendency of interaction events to be heterogeneously distributed in time, is critical to information diffusion in physical and social systems. However, an analytical framework capturing the effect of burstiness on generic dynamics is lacking. Here we develop a master equation formalism to study cascades on temporal networks with burstiness modelled by renewal processes. Supported by numerical and data-driven simulations, we describe the interplay between heterogeneous temporal interactions and models of threshold-driven and epidemic spreading. We find that increasing interevent time variance can both accelerate and decelerate spreading for threshold models, but can only decelerate epidemic spreading. When accounting for the skewness of different interevent time distributions, spreading times collapse onto a universal curve. Our framework uncovers a deep yet subtle connection between generic diffusion mechanisms and underlying temporal network structures that impacts a broad class of networked phenomena, from spin interactions to epidemic contagion and language dynamics.
  • Priority areas for protection of plant-pollinator interaction networks in the Atlantic Forest

    Pereira, Juliana; Battiston, Federico; Jordán, Ferenc; Department of Network and Data Science (Elsevier, 2022)
    Quantitative methods of prioritization are necessary to optimize the selection of protected areas for biodiversity conservation. Reserve selection is traditionally based on single species, considers representative habitats or, occasionally, spatial configuration but mostly the needs of the society. However, protecting particular species as independent entities is not enough to ensure effective conservation of ecological communities, since their functioning depends on the interactions between species. We propose a strategy to identify priority areas for protection based on species interaction networks. Similar local networks are grouped according to two different sets of network features: interacting species pairs and overall network structure. These groups or clusters of networks are used to delimitate ecological subregions, which are then compared to current nature reserves. Subregions with a lower proportion of protected area are given higher priority. Results from species pairs and network structure are finally combined to obtain the network protection priority index. We present a case study applying this strategy to the Brazilian Atlantic Forest, using plant-pollinator networks. We found that subregions based on network structure show a more grainy pattern, and approach spatial patterns related to forest formation types, while subregions based on species pairs show more distinct patches and a higher level of detail in the division, especially for interior forests. Highest priority is given to portions of the seasonal semi-deciduous and deciduous forest, especially NE São Paulo, NW Paraná, N Rio Grande do Sul and E Minas Gerais and, secondarily, W São Paulo and the São Francisco region. The approach we suggest here goes beyond the level of species, seeking to perpetuate the ecological interactions and networks that make up biological communities. It is our hope and conviction that this strategy contributes to the development of more effective conservation planning.
  • A systematic comprehensive longitudinal evaluation of dietary factors associated with acute myocardial infarction and fatal coronary heart disease

    Milanlouei, Soodabeh; Menichetti, Giulia; Li, Yanping; Loscalzo, Joseph; Willett, Walter C.; Barabási, Albert-László; Department of Network and Data Science (Springer Nature, 2020)
    Environmental factors, and in particular diet, are known to play a key role in the development of Coronary Heart Disease. Many of these factors were unveiled by detailed nutritional epidemiology studies, focusing on the role of a single nutrient or food at a time. Here, we apply an Environment-Wide Association Study approach to Nurses’ Health Study data to explore comprehensively and agnostically the association of 257 nutrients and 117 foods with coronary heart disease risk (acute myocardial infarction and fatal coronary heart disease). After accounting for multiple testing, we identify 16 food items and 37 nutrients that show statistically significant association – while adjusting for potential confounding and control variables such as physical activity, smoking, calorie intake, and medication use – among which 38 associations were validated in Nurses’ Health Study II. Our implementation of Environment-Wide Association Study successfully reproduces prior knowledge of diet-coronary heart disease associations in the epidemiological literature, and helps us detect new associations that were only marginally studied, opening potential avenues for further extensive experimental validation. We also show that Environment-Wide Association Study allows us to identify a bipartite food-nutrient network, highlighting which foods drive the associations of specific nutrients with coronary heart disease risk.
  • Elites, communities and the limited benefits of mentorship in electronic music

    Janosov, Milán; Musciotto, Federico; Battiston, Federico; Iñiguez, Gerardo; Department of Network and Data Science (Springer Nature, 2020)
    While the emergence of success in creative professions, such as music, has been studied extensively, the link between individual success and collaboration is not yet fully uncovered. Here we aim to fill this gap by analyzing longitudinal data on the co-releasing and mentoring patterns of popular electronic music artists appearing in the annual Top 100 ranking of DJ Magazine. We find that while this ranking list of popularity publishes 100 names, only the top 20 is stable over time, showcasing a lock-in effect on the electronic music elite. Based on the temporal co-release network of top musicians, we extract a diverse community structure characterizing the electronic music industry. These groups of artists are temporally segregated, sequentially formed around leading musicians, and represent changes in musical genres. We show that a major driving force behind the formation of music communities is mentorship: around half of musicians entering the top 100 have been mentored by current leading figures before they entered the list. We also find that mentees are unlikely to break into the top 20, yet have much higher expected best ranks than those who were not mentored. This implies that mentorship helps rising talents, but becoming an all-time star requires more. Our results provide insights into the intertwined roles of success and collaboration in electronic music, highlighting the mechanisms shaping the formation and landscape of artistic elites in electronic music.
  • Bridging the gap between graphs and networks

    Iñiguez, Gerardo; Battiston, Federico; Karsai, Márton; Department of Network and Data Science (Springer Nature, 2020)
    Network science has become a powerful tool to describe the structure and dynamics of real-world complex physical, biological, social, and technological systems. Largely built on empirical observations to tackle heterogeneous, temporal, and adaptive patterns of interactions, its intuitive and flexible nature has contributed to the popularity of the field. With pioneering work on the evolution of random graphs, graph theory is often cited as the mathematical foundation of network science. Despite this narrative, the two research communities are still largely disconnected. In this commentary, we discuss the need for further cross-pollination between fields – bridging the gap between graphs and networks – and how network science can benefit from such influence. A more mathematical network science may clarify the role of randomness in modeling, hint at underlying laws of behavior, and predict yet unobserved complex networked phenomena in nature.
  • Exploring food contents in scientific literature with FoodMine

    Hooton, Forrest; Menichetti, Giulia; Barabási, Albert-László; Department of Network and Data Science (Springer Nature, 2020)
    Thanks to the many chemical and nutritional components it carries, diet critically affects human health. However, the currently available comprehensive databases on food composition cover only a tiny fraction of the total number of chemicals present in our food, focusing on the nutritional components essential for our health. Indeed, thousands of other molecules, many of which have well documented health implications, remain untracked. To explore the body of knowledge available on food composition, we built FoodMine, an algorithm that uses natural language processing to identify papers from PubMed that potentially report on the chemical composition of garlic and cocoa. After extracting from each paper information on the reported quantities of chemicals, we find that the scientific literature carries extensive information on the detailed chemical components of food that is currently not integrated in databases. Finally, we use unsupervised machine learning to create chemical embeddings, finding that the chemicals identified by FoodMine tend to have direct health relevance, reflecting the scientific community’s focus on health-related chemicals in our food.
  • Temporal social network reconstruction using wireless proximity sensors: model selection and consequences

    Dai, Sicheng; Bouchet, Hélène; Nardy, Aurélie; Fleury, Eric; Chevrot, Jean-Pierre; Karsai, Márton; Department of Network and Data Science (Springer, 2020)
    The emerging technologies of wearable wireless devices open entirely new ways to record various aspects of human social interactions in a broad range of settings. Such technologies make it possible to log the temporal dynamics of face-to-face interactions by detecting the physical proximity of participants. However, despite the wide usage of this technology and the collected datasets, precise reconstruction methods transforming the raw recorded communication data packets to social interactions are still missing. In this study we analyse a proximity dataset collected during a longitudinal social experiment aiming to understand the co-evolution of children’s language development and social network. Physical proximity and verbal communication of hundreds of pre-school children and their teachers are recorded over three years using autonomous wearable low power wireless devices. The dataset is accompanied by three annotated ground truth datasets, which record the time, distance, relative orientation, and interaction state of interacting children for validation purposes. We use this dataset to explore several pipelines of dynamical event reconstruction including earlier applied naïve approaches, methods based on Hidden Markov Model, or on Long Short-Term Memory models, some of them combined with supervised pre-classification of interaction packets. We find that while naïve models produce the worst reconstruction, Long Short-Term Memory models provide the most precise way to reconstruct real interactions, with up to ∼90% accuracy. Finally, we simulate information spreading on the reconstructed networks obtained by the different methods. Results indicate that small improvement of network reconstruction accuracy may lead to significantly different spreading dynamics, while sometimes large differences in accuracy have no obvious effects on the dynamics. This demonstrates the importance not only of precise network reconstruction but also of the careful choice of the reconstruction method in relation with the data collected. Missing this initial step in any study may seriously mislead conclusions made about the emerging properties of the observed network or any dynamical process simulated on it.
  • “Born in Rome” or “Sleeping Beauty”: Emergence of hashtag popularity on the Chinese microblog Sina Weibo

    Cui, Hao; Kertész, János; Department of Network and Data Science (Elsevier, 2023)
    To understand the emergence of hashtag popularity in online social networking complex systems, we study the largest Chinese microblogging site Sina Weibo, which has a Hot Search List (HSL) showing in real time the ranking of the 50 most popular hashtags based on search activity. We investigate the prehistory of successful hashtags from 17 July 2020 to 17 September 2020 by mapping out the related interaction network preceding the selection to HSL. We have found that the circadian activity pattern has an impact on the time needed to get to the HSL. When analyzing this time we distinguish two extreme categories: (a) “Born in Rome”, which means hashtags are mostly first created by superhubs or reach superhubs at an early stage during their propagation and thus gain immediate wide attention from the broad public, and (b) “Sleeping Beauty”, meaning the hashtags gain little attention at the beginning and reach system-wide popularity after a considerable time lag. The evolution of the repost networks of successful hashtags before getting to the HSL shows two types of growth patterns: “smooth” and “stepwise”. The former is usually dominated by a superhub and the latter results from consecutive waves of contributions of smaller hubs. The repost networks of unsuccessful hashtags exhibit a simple evolution pattern.
  • Networks beyond pairwise interactions: Structure and dynamics

    Battiston, Federico; Cencetti, Giulia; Iacopini, Iacopo; Latora, Vito; Lucas, Maxime; Patania, Alice; Young, Jean-Gabriel; Petri, Giovanni; Department of Network and Data Science (Elsevier, 2020)
    The complexity of many biological, social and technological systems stems from the richness of the interactions among their units. Over the past decades, a variety of complex systems has been successfully described as networks whose interacting pairs of nodes are connected by links. Yet, from human communications to chemical reactions and ecological systems, interactions can often occur in groups of three or more nodes and cannot be described simply in terms of dyads. Until recently little attention has been devoted to the higher-order architecture of real complex systems. However, a mounting body of evidence is showing that taking the higher-order structure of these systems into account can enhance our modeling capacities and help us understand and predict their dynamical behavior. Here we present a complete overview of the emerging field of networks beyond pairwise interactions. We discuss how to represent higher-order interactions and introduce the different frameworks used to describe higher-order systems, highlighting the links between the existing concepts and representations. We review the measures designed to characterize the structure of these systems and the models proposed to generate synthetic structures, such as random and growing bipartite graphs, hypergraphs and simplicial complexes. We introduce the rapidly growing research on higher-order dynamical systems and dynamical topology, discussing the relations between higher-order interactions and collective behavior. We focus in particular on new emergent phenomena characterizing dynamical processes, such as diffusion, synchronization, spreading, social dynamics and games, when extended beyond pairwise interactions. We conclude with a summary of empirical applications, and an outlook on current modeling and conceptual frontiers.
  • Automating Terror: The Role and Impact of Telegram Bots in the Islamic State’s Online Ecosystem

    Alrhmoun, Abdullah; Winter, Charlie; Kertész, János; Department of Network and Data Science (Taylor & Francis, 2023)
    In this article, we use network science to explore the topology of the Islamic State’s “terrorist bot” network on the online social media platform Telegram, empirically identifying its connections to the Islamic State supporter-run groups and channels that operate across the platform, with which these bots form bipartite structures. As part of this, we examine the diverse activities of the bots to determine the extent to which they operate in synchrony with one another as well as explore their impacts. We show that these bots are mainly clustered around two communities of Islamic State supporters, or “munasirun,” with one community focusing on facilitating discussion and exchange, and the other one augmenting content distribution efforts. Operating as such, this network of bots is used to lubricate and augment the Islamic State’s influence activities, including facilitating content amplification and community cultivation efforts, and connecting people with the movement based on common behaviors, shared interests, and/or ideological proximity while minimizing risk for the broader organization.
  • Stability of Imbalanced Triangles in Gene Regulatory Networks of Cancerous and Normal Cells

    Rizi, Abbas Karimi; Zamani, Mina; Shirazi, Amirhossein; Jafari, G. Reza; Kertész, János; Department of Network and Data Science (Frontiers, 2021)
    Genes communicate with each other through different regulatory effects, which lead to the emergence of complex network structures in cells, and such structures are expected to be different for normal and cancerous cells. To study these differences, we have investigated the Gene Regulatory Network (GRN) of cells as inferred from RNA-sequencing data. The GRN is a signed weighted network corresponding to the inductive or inhibitory interactions. Here we focus on a particular class of motifs in the GRN, the triangles, which are imbalanced if the number of negative interactions is odd. By studying the stability of imbalanced triangles in the GRN, we show that the network of cancerous cells has fewer imbalanced triangles compared to normal cells. Moreover, in the normal cells, imbalanced triangles are isolated from the main part of the network, while such motifs are part of the network's giant component in cancerous cells. Our result demonstrates that, due to genes' collective behavior, the structure of the complex network in cancerous cells differs from that in normal ones.
  • Revealing Consensus and Dissensus between Network Partitions

    Peixoto, Tiago P.; Department of Network and Data Science (American Physical Society, 2021)
    Community detection methods attempt to divide a network into groups of nodes that share similar properties, thus revealing its large-scale structure. A major challenge when employing such methods is that they are often degenerate, typically yielding a complex landscape of competing answers. As an attempt to extract understanding from a population of alternative solutions, many methods exist to establish a consensus among them in the form of a single partition “point estimate” that summarizes the whole distribution. Here, we show that it is, in general, not possible to obtain a consistent answer from such point estimates when the underlying distribution is too heterogeneous. As an alternative, we provide a comprehensive set of methods designed to characterize and summarize complex populations of partitions in a manner that captures not only the existing consensus but also the dissensus between elements of the population. Our approach is able to model mixed populations of partitions, where multiple consensuses can coexist, representing different competing hypotheses for the network structure. We also show how our methods can be used to compare pairs of partitions, how they can be generalized to hierarchical divisions, and how they can be used to perform statistical model selection between competing hypotheses.
  • The anatomy of social dynamics in escape rooms

    O. Szabo, Rebeka; Department of Network and Data Science (Springer Nature, 2022)
    From sport and science production to everyday life, higher-level pursuits demand collaboration. Despite an increase in the number of data-driven studies on human behavior, the social dynamics of collaborative problem solving are still largely unexplored with network science and other computational and quantitative tools. Here we introduce escape rooms as a non-interventional and minimally biased social laboratory, which allows us to capture at a high resolution real-time communications in small project teams. Our analysis portrays a nuanced picture of different dimensions of social dynamics. We reveal how socio-demographic characteristics impact problem solving and the importance of prior relationships for enhanced interactions. We extract key conversation rules from motif analysis and discuss turn-usurping gendered behavior, a phenomenon particularly strong in male-dominated teams. We investigate the temporal evolution of signed and group interactions, finding that a minimum level of tense communication might be beneficial for collective problem solving, and revealing differences in the behavior of successful and failed teams. Our work unveils the innovative potential of escape rooms to study teams in their complexity, contributing to a deeper understanding of the micro-dynamics of collaborative team processes.
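
Illustrative note on the entry "Stability of Imbalanced Triangles in Gene Regulatory Networks of Cancerous and Normal Cells" above: its abstract defines a triangle as imbalanced when it contains an odd number of negative interactions. The following is a minimal Python sketch of that definition only, not the authors' method. It assumes an undirected signed network given as hypothetical (node, node, sign) tuples and ignores edge weights; the function name and the toy edge list are invented for illustration.

```python
from itertools import combinations


def imbalanced_triangles(signed_edges):
    """Return the triangles that contain an odd number of negative edges.

    signed_edges: iterable of (u, v, sign) tuples with sign +1 or -1,
    describing an undirected signed network (hypothetical input format).
    """
    sign_of = {}
    neighbors = {}
    for u, v, sign in signed_edges:
        sign_of[frozenset((u, v))] = sign
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    found = []
    # Brute-force scan over node triples; fine for small toy networks only.
    for a, b, c in combinations(sorted(neighbors), 3):
        if b in neighbors[a] and c in neighbors[a] and c in neighbors[b]:
            pairs = ((a, b), (a, c), (b, c))
            negatives = sum(1 for p in pairs if sign_of[frozenset(p)] < 0)
            if negatives % 2 == 1:  # odd number of negative links
                found.append((a, b, c))
    return found


# Toy network (invented data): three triangles, two of them imbalanced.
example_edges = [
    ("A", "B", +1), ("B", "C", +1), ("A", "C", -1),  # one negative edge
    ("A", "D", -1), ("C", "D", -1),                  # A-C-D: three negatives
    ("B", "E", -1), ("C", "E", -1),                  # B-C-E: two negatives
]
print(imbalanced_triangles(example_edges))
```

The triple scan is cubic in the number of nodes, so a real analysis of a large network would need a neighbor-intersection approach or a dedicated graph library instead.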