3:00pm - 5:00pm Combinatorics and algorithms in decision and reason
Chair(s): Liam Solus (KTH Royal Institute of Technology, Sweden), Svante Linusson (KTH Royal Institute of Technology)
Combinatorial, or discrete, structures are a fundamental tool for modeling decision-making processes in a wide variety of fields including machine learning, biology, economics, sociology, and causality. Within these various contexts, key problems can often be phrased in terms of learning or manipulating a combinatorial object, such as a network, permutation, or directed acyclic graph, that exhibits pre-specified optimal features. In recent decades, major breakthroughs in each of these fields can be attributed to the development of effective algorithms for learning and analyzing combinatorial models. Many of these advancements are tied to new developments connecting combinatorics, algebra, geometry, and statistics, particularly through the introduction of geometric and algebraic techniques to the development of combinatorial algorithms. The goal of this session is to bring together researchers from each of these fields who are using combinatorial or discrete models in data science so as to encourage further breakthroughs in this important area of mathematical research.
(25 minutes for each presentation, including questions, followed by a 5-minute break; in case of x<4 talks, the first x slots are used unless indicated otherwise)
From random forests to regulatory rules: extracting interactions in high-dimensional genomic data
Karl Kumbier
University of California, Berkeley
Individual genomic assays measure elements that interact in vivo as components of larger molecular machines. Understanding the connections between such high-order interactions and complex biological processes, from gene regulation to organ development, presents a substantial statistical challenge: namely, identifying high-quality interaction candidates from combinatorial search spaces in genome-scale data. Building on Random Forests (RFs) and Random Intersection Trees (RITs), and through extensive, biologically inspired simulations, we developed the iterative Random Forest algorithm (iRF). iRF trains a feature-weighted ensemble of decision trees to detect stable, high-order interactions with the same order of computational cost as RF. We define a functional relationship between interacting features and responses that decomposes RF predictions into a collection of interpretable rules, which can be used to evaluate interactions in terms of their stability and predictive accuracy. We demonstrate the utility of iRF for high-order interaction discovery in several genomics problems, where iRF recovers well-known interactions and posits novel, high-order interactions associated with gene regulation. By refining the process of interaction recovery, our approach has the potential to guide mechanistic inquiry into systems whose scale and complexity are beyond human comprehension.
Probabilistic tensors and opportunistic Boolean matrix multiplication
Petteri Kaski
Aalto University
We introduce probabilistic extensions of classical deterministic measures of algebraic complexity of a tensor, such as the rank and the border rank. We show that these probabilistic extensions satisfy various natural and algorithmically serendipitous properties, such as submultiplicativity under taking Kronecker products. Furthermore, the probabilistic extensions enable strictly lower rank than their deterministic counterparts for specific tensors of interest, starting from the tensor <2,2,2> that represents 2-by-2 matrix multiplication. By submultiplicativity, this leads immediately to novel randomized algorithm designs, such as algorithms for Boolean matrix multiplication as well as for detecting and estimating the number of triangles and other subgraphs in graphs. Joint work with Matti Karppa (Aalto University).
Reference: https://doi.org/10.1137/1.9781611975482.31
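The abstract above links matrix multiplication to triangle detection and counting. As a minimal sketch of that link, the following shows the classical deterministic reduction only, not the probabilistic-tensor algorithms of the talk. It assumes a simple undirected graph given as a symmetric 0/1 adjacency matrix with zero diagonal; the function names are illustrative.

```python
def bool_matmul(a, b):
    """Boolean product of two square 0/1 matrices (OR of ANDs)."""
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def has_triangle(adj):
    """A triangle exists iff some edge (i, j) also has a 2-step path i-k-j."""
    n = len(adj)
    two_step = bool_matmul(adj, adj)
    return any(adj[i][j] and two_step[i][j]
               for i in range(n) for j in range(n))


def count_triangles(adj):
    """Number of triangles equals trace(A^3) / 6 for a simple undirected graph."""
    n = len(adj)
    # Integer product A^2, then the trace of A^2 * A without forming A^3.
    a2 = [[sum(adj[i][k] * adj[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    trace_a3 = sum(a2[i][j] * adj[j][i] for i in range(n) for j in range(n))
    return trace_a3 // 6


# Toy inputs: a triangle (K3), a path on three vertices, and K4.
k3 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
k4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
```

The cubic-time products here are only for illustration; any faster matrix multiplication routine could replace them without changing the reduction.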
Discrete Models with Total Positivity
Dane Wilburne
York University
We consider the case of a discrete graphical loglinear model whose underlying distribution is assumed to be multivariate totally positive of order 2. In particular, we study the implications of total positivity on interactions between the random variables, the marginal polytope associated to the model, and model selection through maximum likelihood estimation. We also compare these results to recent work in the Gaussian setting. This is joint work with Helene Massam.
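For a discrete distribution p on a product of ordered state spaces, multivariate total positivity of order 2 means that p(x ∧ y) p(x ∨ y) ≥ p(x) p(y) for all states x and y, where ∧ and ∨ denote the coordinatewise minimum and maximum. The following is a minimal brute-force check of that inequality. The function name, the tolerance parameter, and the two toy distributions are illustrative assumptions and are not taken from the talk.

```python
def is_mtp2(p, tol=1e-12):
    """Check p(x ^ y) * p(x v y) >= p(x) * p(y) for all pairs of states.

    p maps state tuples (e.g. (0, 1)) to probabilities; states missing
    from p are treated as having probability zero.
    """
    states = list(p)
    for x in states:
        for y in states:
            lo = tuple(min(a, b) for a, b in zip(x, y))  # coordinatewise min
            hi = tuple(max(a, b) for a, b in zip(x, y))  # coordinatewise max
            if p.get(lo, 0.0) * p.get(hi, 0.0) + tol < p[x] * p[y]:
                return False
    return True


# Two binary variables that tend to agree: the inequality holds.
positively_associated = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Two binary variables that tend to disagree: the inequality fails
# for x = (0, 1), y = (1, 0).
negatively_associated = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.4, (1, 1): 0.1}
```

The check is quadratic in the number of states, so it is only practical for small toy models.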