# «MODELING AND SIMULATING THE PROPAGATION OF INFECTIOUS DISEASES USING COMPLEX NETWORKS A Thesis Presented to The Academic Faculty by Rick Quax In ...»

## MODELING AND SIMULATING THE PROPAGATION

## OF INFECTIOUS DISEASES USING COMPLEX

NETWORKS

A Thesis

Presented to

The Academic Faculty

by

Rick Quax

In Partial Fulﬁllment

of the Requirements for the Degree

Master of Science in

Computer Science

College of Computing

Georgia Institute of Technology

August 1, 2008

## MODELING AND SIMULATING THE PROPAGATION

## OF INFECTIOUS DISEASES USING COMPLEX

NETWORKS**Approved by:**

David Bader, Ph.D., Committee Chair Computational Science and Engineering Georgia Institute of Technology Peter Sloot, Ph.D., Advisor Section of Computational Science University of Amsterdam Richard Vuduc, Ph.D.

Computational Science and Engineering Georgia Institute of Technology Date Approved: July 7, 2008

## ACKNOWLEDGEMENTS

I would like to thank my thesis advisors, who have guided me in the presentation of my thesis, and Shan Mei, with whom I have had many insightful discussions.iii Contents ACKNOWLEDGEMENTS............................ iii LIST OF TABLES................................. vii LIST OF FIGURES................................ viii SUMMARY..................................... ix I INTRODUCTION.............................. 1

1.1 The thesis................................ 1

1.2 Problem statement........................... 1

1.3 Using complex networks........................ 2 1.3.1 Overview............................ 2 1.3.2 The SEECN framework.................... 2

1.4 Applicability to other domains.................... 3

1.5 Our contribution............................ 4 II PREVIOUS WORK............................. 5

2.1 Graph properties............................ 5

2.2 Kronecker and RMAT......................... 6

2.3 Simulating ep

Appendix A A LISTING OF THE SIMULATION MODEL PARAMETERS 57 Appendix B CALCULATION OF THE KRONECKER PARAMETERS FOR A HOMOSEXUAL COMMUNITY..................... 62

For explanation and prediction of the evolution of infectious diseases in populations, researchers often use simpliﬁed mathematical models for simulation. We believe that the results from these models are often questionable when the epidemic dynamics becomes more complex, and that developing more realistic models is intractable.

In this dissertation we propose to simulate infectious disease propagation using dynamic and complex networks. We present the Simulator of Epidemic Evolution using Complex Networks (SEECN), an expressive and high-performance framework that combines algorithms for graph generation and various operators for modeling temporal dynamics. For graph generation we use the Kronecker algorithm, derive its underlying statistical structure and exploit it for a variety of purposes. Then the epidemic is evolved over the network by simulating the dynamics of the population and the epidemic simultaneously, where each type of dynamics is performed by a separate operator. All dynamics operators can be fully and independently parameterized, facilitating incremental model development and enabling diﬀerent inﬂuences to be toggled for diﬀerential analysis.

As a prototype, we simulate two relatively complex models for the HIV epidemic and ﬁnd a remarkable ﬁt to reported data for AIDS incidence and prevalence. Our most important conclusion is that the mere dynamics of the HIV epidemic is sufcient to produce rather complex trends in the incidence and prevalence statistics, e.g. without the introduction of particularly eﬀective treatments at speciﬁc times.

We show that this invalidates assumptions and conclusions made previously in the literature, and argue that simulations used for explanation and prediction of trends

than is currently done. In addition, we substantiate a previously predicted paradox that the availability of Highly Active Anti-Retroviral Treatment likely causes an increased HIV incidence.

1.1 The thesis Our thesis is that realistic simulations of epidemics over population networks are more easily and more adequately performed using annotated complex networks and independent dynamics operators. Traditional mathematical models are suﬃcient for relatively simple simulations but quickly become intractable as more inﬂuences and interactions are added. In contrast, using an explicit representation of a population network annotated with variables, where an epidemic is evolved through the application of separate operators, models can easily be extended in an incremental way, be speciﬁed in arbitrary detail and diﬀerent eﬀects can be toggled for diﬀerential analysis. We also claim that this method can be practical in terms of running time.

1.2 Problem statement Understanding patterns of prevalence and incidence of infectious diseases is a major challenge for various reasons. Static population networks consist of heterogeneous agents in multiple respects, and are characterized by distinctive global properties such as power-law degree distribution, community-structure and small diameter. In addition, sociological processes change the network over time but depend on the epidemic status, and conversely the propagation of an epidemic depends on both the static and dynamic properties of the population network.

At present, most simulations of epidemics are performed using mathematical models of complex real-world dynamics with only a handful of parameters. Popular methods are coupled diﬀerential equations and discrete Markov models. However, such approaches have signiﬁcant drawbacks: the model complexity quickly becomes intractable for more realistic simulations, incrementally extending existing models is non-trivial and simplifying assumptions may have unexpected but dramatic eﬀects on the simulation results. We argue that these models are adequate in limited cases, but that diﬀerent methods should be used for more detailed and realistic knowledge discovery.

1.3 Using complex networks 1.3.1 Overview We propose to explicitly simulate the dynamics of virus propagation using annotated complex networks. Firstly, a graph representative of the target population is generated where the nodes represent agents and the edges represent possible infection pathways in a time step. A number of operators is then applied to the graph to model temporal epidemiological and sociological dynamics of the population. Nodes and edges are annotated with static and dynamic variables, and operators are parameterized independent of each other. This enables the incremental development of models of an epidemic, toggling of diﬀerent inﬂuences for diﬀerential analysis and allows for models of arbitrary complexity. For instance, an agent’s infection stage and gender could inﬂuence its social behavior, infectivity, connectivity and life expectancy; at the same time, an agent’s behavior and connectivity could inﬂuence the probability of becoming infected and infecting someone else.

1.3.2 The SEECN framework We present SEECN1, a simulation framework that is designed to be expressive, modular and high-performance. The temporal operators support addition and deletion of nodes and edges, propagation of a disease over edges and its local progression within

**nodes. They can be fully parameterized by the variables of the involved entities:**

SEECN is available at www.science.uva.nl/∼rquax and easily pronounced as “season”.

nodes have a static ‘type’ and a dynamic ‘status’, and edges have a static ‘type’ and store the node variables at both sides. Time is discretized in time steps and all model parameters can be changed in each time step, for instance to reﬂect the introduction of a new treatment.

SEECN is comprehensively engineered for performance. It is written in C++ and parallelized with OpenMP, and both the graph generation and temporal dynamics algorithms are optimized for NUMA and cache eﬃciency using novel theory of Kronecker graphs. Even though the algorithms are bounded by memory-bandwidth and have inherently unpredictable access patterns, we were able to achieve a speed-up of one order of magnitude in single-processor performance and almost perfect scalability in the number of threads.

1.4 Applicability to other domains SEECN was developed for simulating infectious diseases, but we expect that it can be used for many other applications as well. The paradigm of studying the spread of some ‘virus’ over a domain of nodes and edges is a general one, although we currently only generate social interaction networks which are characterized by distinctive properties such as power-law degree distribution, small diameter and community structure. Such

**properties and dynamics are found in e.g.:**

• Information dissemination in blog communities;

• Interaction between proteins;

• Connectivity simulations of ﬁle-sharing networks;

• Robustness of autonomous system networks against failures.

1.5 Our contribution The contribution of this dissertation is twofold. In chapter 3 we choose Kronecker graphs for modeling social networks, derive novel and complete statistical properties of the total edge count, degrees of individual nodes and edge conﬁguration, and exploit these in a variety of applications that were impossible or impractical before. RMAT is an algorithm that has been proposed to generate Kronecker graphs eﬃciently, and is described in chapter 2. In particular, we show how the results enable load-balancing and cache eﬃciency of the RMAT algorithm; combined with a refactored graph data structure and the use of the same techniques in the dynamics operators, we expect that SEECN is practical. We also suggest how the posterior probability of a graph can be computed eﬃciently and how a simple mathematical model can incorporate the statistical structure of a Kronecker graph.

Combining the graph generator with modules for network dynamics, infection propagation and local virus progression, we present and validate our framework in chapter 4. In chapter 5 we then showcase its expressiveness by performing a nontrivial simulation of the HIV epidemic. Firstly, we ﬁnd a surprising phenomenon that could have caused the observed drop in AIDS annual incidence that invalidates earlier assumptions and conclusions, namely that the HIV epidemic itself is suﬃcient to produce such complex trends in reported data, and demonstrate that the use of simplistic mathematical models is insuﬃcient as basis for complex predictions.

As a corollary we also add further evidence to support a much-debated paradoxical prediction that the introduction of treatment could be counter eﬀective and ﬁnd indeed that it likely results in higher stabilized HIV incidence.

Here we introduce some terminology, review important graph properties and summarize previous work on graph generation and modeling and simulation of epidemics.

2.1 Graph properties In this dissertation we deﬁne a graph (or ‘network’) G as a set of nodes and edges V, E, and we deﬁne |V | = N = 2n where n is an integer. All our graphs are undirected, i.e. ∀x,y∈V (x, y) ∈ E ⇒ (y, x) ∈ E; if this does not hold the graph is said to be directed. A graph has a power-law distribution when the number of nodes that has a particular degree k decreases monomially as k increases, i.e. pk ∝ k −γ where γ is the power-law exponent and typically 1 γ 3. For increasing N, a graph follows the densiﬁcation law if |E| ∝ N α, α 1 where typically 1 α 2. In other words, the average degree increases as the graph size increases. This was recently observed to be true for various ubiquitous graphs such as social networks and the World Wide Web [16] and appears to coincide with a shrinking diameter (while it was long debated whether these networks have e.g. O(log N ) or O(log log N ) diameter growth). The diameter of a graph is typically deﬁned as the maximum path length of all possible shortest paths, although in practical situations eﬀective diameter is often used; this states a diameter such that some considerable fraction of shortest paths is shorter than this, where the fraction is typically ≥ 90%. This accommodates isolated nodes and excessively rare long paths.

Figure 1: An illustration of a step in the Kronecker algorithm. The scalar entries of matrix Gi are multiplied with the matrix itself and replaced with the result, to yield Gi+1.

2.2 Kronecker and RMAT Leskovec et al. [17], proposes to adopt the Kronecker matrix multiplication to generate realistic and mathematically tractable graphs. A large body of mathematical theory has already been developed for this operator, which they intended to exploit.

Conceptually, the Kronecker multiplication of a q ×q matrix with itself yields a q 2 ×q 2 matrix where each original (scalar) entry is multiplied by the entire original matrix, and replaced with the result. When a small matrix G1 is repeatedly Kroneckermultiplied with itself, the result can be interpreted as an adjacency matrix; with the right choice of parameters, properties such as a power-law distribution and hierarchical community-structure can be obtained. See ﬁgure 1 for an illustration.

In the stochastic case, a small initiator matrix contains edge probabilities 0 ≤ Θi,j ≤ 1. The result P = Θt after t Kronecker time steps is a probabilistic adjacency

the probability of edge (x, y) being present [15].1 This initiator matrix is typically very small and often of size 2 × 2, so that the bit-representations of two unique ids i, j ∈ [0, N ) of two particular nodes uniquely identify such a permutation with n factors.