Peter S. Dodds, Flint Professor
Department of Mathematics and Statistics
University of Vermont
Complex systems often comprise many kinds of components which vary dramatically in size: numbers of organisms in species in ecologies, populations of cities and towns in countries, individual and corporate wealth in economies, and word frequency in natural language. Comparisons of component size distributions for two complex systems, or a system with itself at different time points, generally employ information-theoretic instruments, such as the Jensen-Shannon divergence. We argue that these methods are poorly motivated for many complex systems, lack transparency and adjustability, and should not be applied when component probabilities are non-sensible or are problematic to estimate. Here, we introduce rank turbulence divergence, a tunable instrument for comparing any two (Zipfian) ranked lists of components. We analytically develop our rank-based divergence in a series of steps, and realize the divergence as a 'rank turbulence divergence graph' which pairs a map-like histogram for rank-rank pairs with an ordered list of components according to divergence contribution. We explore the performance of rank turbulence divergence in four distinct settings: day-scale language use on Twitter; US baby names from 1880 to 2018; market capitalizations of US corporations from 1979 to 2018; and n-gram frequencies from the Google Books corpus. We provide a series of supplementary 'flip books' which demonstrate the tunability and storytelling power of our divergence. For systems where probabilities (or rates) are partially available, we put forward an analogous probability turbulence divergence. Finally, we compare our rank-based divergence to a family of generalized entropy divergences which includes the Jensen-Shannon divergence.
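To give a concrete feel for what a tunable rank-based comparison of two component lists might look like, here is a minimal, hypothetical Python sketch. The abstract does not give the divergence's functional form, so everything below is an illustrative assumption: the `|1/r^alpha - 1/r^alpha|`-style comparison of inverse ranks, the tuning parameter `alpha`, the unnormalized sum, and the convention for components missing from one list are all placeholders, not the speaker's actual definition.

```python
def rank_divergence(counts1, counts2, alpha=1.0):
    """Compare two systems given as {component: count} dicts via their ranks.

    A hypothetical rank-based divergence for illustration only; the published
    rank turbulence divergence differs in form and normalization.
    """
    def ranks(counts):
        # Rank components by descending count; ties receive the mean
        # (fractional) rank of the tied block.
        items = sorted(counts, key=counts.get, reverse=True)
        r, i = {}, 0
        while i < len(items):
            j = i
            while j < len(items) and counts[items[j]] == counts[items[i]]:
                j += 1
            mean_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
            for k in range(i, j):
                r[items[k]] = mean_rank
            i = j
        return r

    r1, r2 = ranks(counts1), ranks(counts2)
    # Components absent from one list get a rank one past that list's last
    # rank -- an assumed convention, chosen only to keep the sketch total.
    fill1, fill2 = len(r1) + 1, len(r2) + 1
    total = 0.0
    for c in set(r1) | set(r2):
        a, b = r1.get(c, fill1), r2.get(c, fill2)
        # Tunable comparison of inverse ranks: small alpha emphasizes
        # differences deep in the list, large alpha the top ranks.
        total += abs(1 / a**alpha - 1 / b**alpha) ** (1 / (alpha + 1))
    return total
```

Two properties worth noting even in this toy form: identical lists give zero, and the measure is symmetric in its two arguments, e.g. `rank_divergence({'a': 5, 'b': 3}, {'b': 5, 'a': 3})` equals the same call with the arguments swapped.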
Host: Joshua Weitz, Ph.D.