Publications | The Adaptive Intelligence Lab

2026

arXiv
Distributional Active Inference

A. Akgül, G. Baykal, M. Haußmann, and 2 more authors

arXiv Preprint, 2026


Optimal control of complex environments with robotic systems faces two complementary and intertwined challenges: efficient organization of sensory state information and far-sighted action planning. Because the reinforcement learning framework addresses only the latter, it tends to deliver sample-inefficient solutions. Active inference is the state-of-the-art process theory that explains how biological brains handle this dual problem. However, its applications to artificial intelligence have thus far been limited to extensions of existing model-based approaches. We present a formal abstraction of reinforcement learning algorithms that spans model-based, distributional, and model-free approaches. This abstraction seamlessly integrates active inference into the distributional reinforcement learning framework, making its performance advantages accessible without transition dynamics modeling.
@article{akgul2026distributional, title = {Distributional Active Inference}, author = {Akg{\"u}l, A. and Baykal, G. and Hau{\ss}mann, M. and {\c{C}}elikok, M. M. and Kandemir, M.}, year = {2026}, journal = {arXiv Preprint}, url = {https://arxiv.org/pdf/2601.20985}, }
ICLR
Bridging the performance-gap between target-free and target-based reinforcement learning

T. Vincent, Y. Tripathi, T. Faust, and 5 more authors

In International Conference on Learning Representations, 2026


The use of target networks in deep reinforcement learning is a widely popular solution to mitigate the brittleness of semi-gradient approaches and stabilize learning. However, target networks notoriously require additional memory and delay the propagation of Bellman updates compared to an ideal target-free approach. In this work, we step out of the binary choice between target-free and target-based algorithms. We introduce a new method that uses a copy of the last linear layer of the online network as a target network, while sharing the remaining parameters with the up-to-date online network. This simple modification enables us to keep the target-free’s low-memory footprint while leveraging the target-based literature. We find that combining our approach with the concept of iterated Q-learning, which consists of learning consecutive Bellman updates in parallel, helps improve the sample-efficiency of target-free approaches. Our proposed method, iterated Shared Q-Learning (iS-QL), bridges the performance gap between target-free and target-based approaches across various problems while using a single Q-network, thus stepping towards resource-efficient reinforcement learning algorithms.
@inproceedings{vincent2026bridging, title = {Bridging the performance-gap between target-free and target-based reinforcement learning}, author = {Vincent, T. and Tripathi, Y. and Faust, T. and Akg{\"u}l, A. and Oren, Y. and Kandemir, M. and Peters, P. and D'Eramo, C.}, booktitle = {International Conference on Learning Representations}, year = {2026}, url = {https://openreview.net/pdf?id=ltcxS7JE0c}, }
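
The target-sharing idea described in the abstract above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the class name, layer sizes, and the one-hidden-layer architecture are hypothetical choices made for illustration. It shows only the structural point that the target keeps a frozen copy of the last linear layer while reusing the online network's current features.

```python
import numpy as np

rng = np.random.default_rng(0)


class SharedFeatureQNet:
    """Toy Q-network: one hidden ReLU layer followed by a linear head.

    Only the last linear layer is duplicated for the target; the feature
    layer is shared with the online network (all names are hypothetical).
    """

    def __init__(self, state_dim, hidden_dim, n_actions):
        self.W1 = rng.normal(scale=0.1, size=(state_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.W_head = rng.normal(scale=0.1, size=(hidden_dim, n_actions))
        self.b_head = np.zeros(n_actions)
        self.sync_target_head()

    def sync_target_head(self):
        # Copy only the last linear layer; the features are never duplicated.
        self.W_target = self.W_head.copy()
        self.b_target = self.b_head.copy()

    def features(self, states):
        return np.maximum(states @ self.W1 + self.b1, 0.0)

    def q_online(self, states):
        return self.features(states) @ self.W_head + self.b_head

    def q_target(self, states):
        # Up-to-date shared features, frozen head.
        return self.features(states) @ self.W_target + self.b_target

    def td_targets(self, rewards, next_states, dones, gamma=0.99):
        max_next_q = self.q_target(next_states).max(axis=1)
        return rewards + gamma * (1.0 - dones) * max_next_q


net = SharedFeatureQNet(state_dim=4, hidden_dim=8, n_actions=3)
batch = rng.normal(size=(5, 4))
targets = net.td_targets(np.ones(5), batch, np.zeros(5))
```

The extra memory here is one head-sized weight matrix and bias rather than a full second network. The iterated Q-learning component of iS-QL is not covered by this sketch.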

2025

arXiv
Directional Ensemble Aggregation for Actor-Critics

N. Werge, Y.S. Wu, B. Tasdighi, and 1 more author

arXiv Preprint, 2025


Off-policy reinforcement learning in continuous control tasks depends critically on accurate Q-value estimates. Conservative aggregation over ensembles, such as taking the minimum, is commonly used to mitigate overestimation bias. However, these static rules are coarse, discard valuable information from the ensemble, and cannot adapt to task-specific needs or different learning regimes. We propose Directional Ensemble Aggregation (DEA), an aggregation method that adaptively combines Q-value estimates in actor-critic frameworks. DEA introduces two fully learnable directional parameters: one that modulates critic-side conservatism and another that guides actor-side policy exploration. Both parameters are learned using ensemble disagreement-weighted Bellman errors, which weight each sample solely by the direction of its Bellman error. This directional learning mechanism allows DEA to adjust conservatism and exploration in a data-driven way, adapting aggregation to both uncertainty levels and the phase of training. We evaluate DEA across continuous control benchmarks and learning regimes, from interactive to sample-efficient, and demonstrate its effectiveness over static ensemble strategies.
@article{werge2025directionalensemble, title = {Directional Ensemble Aggregation for Actor-Critics}, author = {Werge, N. and Wu, Y.S. and Tasdighi, B. and Kandemir, M.}, year = {2025}, journal = {arXiv Preprint}, url = {https://arxiv.org/abs/2507.23501}, }
SciPost
Accurate Surrogate Amplitudes with Calibrated Uncertainties

H. Bahl, E. Nina, L. Favaro, and 3 more authors

SciPost Physics Core, 2025


Neural networks for LHC physics have to be accurate, reliable, and controlled. Using neural surrogates for the prediction of loop amplitudes as a use case, we first show how activation functions are systematically tested with Kolmogorov-Arnold Networks. Then, we train neural surrogates to simultaneously predict the target amplitude and an uncertainty for the prediction. We disentangle systematic uncertainties, learned by a well-defined likelihood loss, from statistical uncertainties, which require the introduction of Bayesian neural networks or repulsive ensembles. We test the coverage of the learned uncertainties using pull distributions to quantify the calibration of cutting-edge neural surrogates.
@article{bahl2025accurate, title = {Accurate Surrogate Amplitudes with Calibrated Uncertainties}, author = {Bahl, H. and Nina, E. and Favaro, L. and Haussmann, M. and Plehn, T. and Winterhalder, R.}, year = {2025}, journal = {SciPost Physics Core}, url = {https://www.scipost.org/SciPostPhysCore.8.4.073}, }
arXiv
ObjectRL: An Object-Oriented Reinforcement Learning Codebase

G. Baykal, A. Akgül, M. Haussmann, and 4 more authors

arXiv Preprint, 2025


ObjectRL is an open-source Python codebase for deep reinforcement learning (RL), designed for research-oriented prototyping with minimal programming effort. Unlike existing codebases, ObjectRL is built on Object-Oriented Programming (OOP) principles, providing a clear structure that simplifies the implementation, modification, and evaluation of new algorithms. ObjectRL lowers the entry barrier for deep RL research by organizing best practices into explicit, clearly separated components, making them easier to understand and adapt. Each algorithmic component is a class with attributes that describe key RL concepts and methods that intuitively reflect their interactions. The class hierarchy closely follows common ontological relationships, enabling data encapsulation, inheritance, and polymorphism, which are core features of OOP. We demonstrate the efficiency of ObjectRL’s design through representative use cases that highlight its flexibility and suitability for rapid prototyping. The documentation and source code are available at https://objectrl.readthedocs.io and https://github.com/adinlab/objectrl.
@article{baykal2025objectrl, title = {ObjectRL: An Object-Oriented Reinforcement Learning Codebase}, author = {Baykal, G. and Akg{\"u}l, A. and Haussmann, M. and Tasdighi, B. and Werge, N. and Wu, Y.S. and Kandemir, M.}, year = {2025}, journal = {arXiv Preprint}, url = {https://arxiv.org/abs/2507.03487}, }
arXiv
Deep Actor-Critics with Tight Risk Certificates

B. Tasdighi, M. Haußmann, Y.S. Wu, and 2 more authors

arXiv Preprint, 2025


After a period of research, deep actor-critic algorithms have reached a level where they influence our everyday lives. They serve as the driving force behind the continual improvement of large language models through user-collected feedback. However, their deployment in physical systems is not yet widely adopted, mainly because no validation scheme exists that quantifies their risk of malfunction. We demonstrate that it is possible to develop tight risk certificates for deep actor-critic algorithms that predict generalization performance from validation-time observations. Our key insight centers on the effectiveness of minimal evaluation data. Surprisingly, a small, feasible number of evaluation roll-outs collected from a pretrained policy suffices to produce accurate risk certificates when combined with a simple adaptation of PAC-Bayes theory. Specifically, we adopt a recently introduced recursive PAC-Bayes approach, which splits validation data into portions and recursively builds PAC-Bayes bounds on the excess loss of each portion’s predictor, using the predictor from the previous portion as a data-informed prior. Our empirical results across multiple locomotion tasks and policy expertise levels demonstrate risk certificates that are tight enough to be considered for practical use.
@article{tasdighi2025deepactorcriticstightrisk, title = {Deep Actor-Critics with Tight Risk Certificates}, author = {Tasdighi, B. and Hau{\ss}mann, M. and Wu, Y.S. and Masegosa, A.R. and Kandemir, M.}, year = {2025}, journal = {arXiv Preprint}, url = {https://arxiv.org/abs/2505.19682}, }
TMLR
Latent mixed-effect models for high-dimensional longitudinal data

Priscilla Ong, Manuel Haußmann, Otto Lönnroth, and 1 more author

Transactions on Machine Learning Research, 2025


Modelling longitudinal data is an important yet challenging task. These datasets can be high-dimensional, contain non-linear effects and feature time-varying covariates. Gaussian process (GP) prior-based variational autoencoders (VAEs) have emerged as a promising approach due to their ability to model time-series data. However, they are costly to train and struggle to fully exploit the rich covariates characteristic of longitudinal data, making them difficult for practitioners to use effectively. In this work, we leverage linear mixed models (LMMs) and amortized variational inference to provide conditional priors for VAEs, and propose LMM-VAE, a scalable, interpretable and identifiable model. We highlight theoretical connections between it and GP-based techniques, providing a unified framework for this class of methods. Our proposal performs competitively compared to existing approaches across simulated and real-world datasets.
@article{ong2025latent, title = {Latent mixed-effect models for high-dimensional longitudinal data}, author = {Ong, Priscilla and Hau{\ss}mann, Manuel and L{\"o}nnroth, Otto and L{\"a}hdesm{\"a}ki, Harri}, year = {2025}, journal = {Transactions on Machine Learning Research}, url = {https://openreview.net/forum?id=7A96yteeF9}, }
JMLR
On Adaptive Stochastic Optimization for Streaming Data: A Newton’s Method with O(dN) Operations

Antoine Godichon-Baggioni and Nicklas Werge

Journal of Machine Learning Research, 2025


Stochastic optimization methods face new challenges in the realm of streaming data, characterized by a continuous flow of large, high-dimensional data. While first-order methods, like stochastic gradient descent, are the natural choice for such data, they often struggle with ill-conditioned problems. In contrast, second-order methods, such as Newton’s method, offer a potential solution but are computationally impractical for large-scale streaming applications. This paper introduces adaptive stochastic optimization methods that effectively address ill-conditioned problems while functioning in a streaming context. Specifically, we present adaptive inversion-free stochastic quasi-Newton methods with computational complexity matching that of first-order methods, O(dN), where d represents the number of dimensions/features and N the number of data points. Theoretical analysis establishes their asymptotic efficiency, and empirical studies demonstrate their effectiveness in scenarios with complex covariance structures and poor initializations. In particular, we demonstrate that our adaptive quasi-Newton methods can outperform or match existing first- and second-order methods.
@article{godichon2025adaptive, title = {On Adaptive Stochastic Optimization for Streaming Data: A Newton's Method with O(dN) Operations}, author = {Godichon-Baggioni, Antoine and Werge, Nicklas}, year = {2025}, journal = {Journal of Machine Learning Research}, url = {https://www.jmlr.org/papers/volume26/23-1565/23-1565.pdf}, }
TMLR
Overcoming Non-stationary Dynamics with Evidential Proximal Policy Optimization

A. Akgül, G. Baykal, M. Haußmann, and 1 more author

Transactions on Machine Learning Research, 2025


Continuous control of non-stationary environments is a major challenge for deep reinforcement learning algorithms. The time-dependency of the state transition dynamics aggravates the notorious stability problems of model-free deep actor-critic architectures. We posit that two properties will play a key role in overcoming non-stationarity in transition dynamics: (i) preserving the plasticity of the critic network and (ii) directed exploration for rapid adaptation to changing dynamics. We show that performing on-policy reinforcement learning with an evidential critic provides both. The evidential design ensures a fast and accurate approximation of the uncertainty around the state value, which maintains the plasticity of the critic network by detecting the distributional shifts caused by changes in dynamics. The probabilistic critic also makes the actor training objective a random variable, enabling the use of directed exploration approaches as a by-product. We name the resulting algorithm Evidential Proximal Policy Optimization (EPPO) due to the integral role of evidential uncertainty quantification in both policy evaluation and policy improvement stages. Through experiments on non-stationary continuous control tasks, where the environment dynamics change at regular intervals, we demonstrate that our algorithm outperforms state-of-the-art on-policy reinforcement learning variants in both task-specific and overall return.
@article{akgul2025overcoming, title = {Overcoming Non-stationary Dynamics with Evidential Proximal Policy Optimization}, author = {Akg{\"u}l, A. and Baykal, G. and Hau{\ss}mann, M. and Kandemir, M.}, year = {2025}, journal = {Transactions on Machine Learning Research}, url = {https://openreview.net/forum?id=KTfTwxsVNE}, }
NeuCom
Disentanglement with Factor Quantized Variational Autoencoders

G. Baykal, M. Kandemir, and G. Unal

Neurocomputing, 2025


Disentangled representation learning aims to represent the underlying generative factors of a dataset in a latent representation independently of one another. In our work, we propose a discrete variational autoencoder (VAE) based model where the ground truth information about the generative factors is not provided to the model. We demonstrate the advantages of learning discrete representations over learning continuous representations in facilitating disentanglement. Furthermore, we propose incorporating an inductive bias into the model to further enhance disentanglement. Precisely, we propose scalar quantization of the latent variables in a latent representation with scalar values from a global codebook, and we add a total correlation term to the optimization as an inductive bias. Our method, called FactorQVAE, is the first method that combines optimization-based disentanglement approaches with discrete representation learning, and it outperforms the former disentanglement methods in terms of two disentanglement metrics (DCI and InfoMEC) while improving the reconstruction performance. Our code can be found at https://github.com/ituvisionlab/FactorQVAE.
@article{baykal2025disentanglement, title = {Disentanglement with Factor Quantized Variational Autoencoders}, author = {Baykal, G. and Kandemir, M. and Unal, G.}, year = {2025}, journal = {Neurocomputing}, url = {https://arxiv.org/abs/2409.14851}, }
ICLR
High-Dimensional Bayesian Optimisation with Gaussian Process Prior Variational Autoencoders

S. Ramchandran, M. Haussmann, and H. Lähdesmäki

In International Conference on Learning Representations, 2025


Bayesian optimisation (BO) using a Gaussian process (GP)-based surrogate model is a powerful tool for solving black-box optimisation problems but does not scale well to high-dimensional data. Previous works have proposed to use variational autoencoders (VAEs) to project high-dimensional data onto a low-dimensional latent space and to implement BO in the inferred latent space. In this work, we propose a conditional generative model for efficient high-dimensional BO that uses a GP surrogate model together with GP prior VAEs. A GP prior VAE extends the standard VAE by conditioning the generative and inference model on auxiliary covariates, capturing complex correlations across samples with a GP. Our model incorporates the observed target quantity values as auxiliary covariates learning a structured latent space that is better suited for the GP-based BO surrogate model. It handles partially observed auxiliary covariates using a unifying probabilistic framework and can also incorporate additional auxiliary covariates that may be available in real-world applications. We demonstrate that our method improves upon existing latent space BO methods on simulated datasets as well as on commonly used benchmarks.
@inproceedings{ramchandran2025highdimensional, title = {High-Dimensional Bayesian Optimisation with Gaussian Process Prior Variational Autoencoders}, author = {Ramchandran, S. and Haussmann, M. and L{\"a}hdesm{\"a}ki, H.}, year = {2025}, booktitle = {International Conference on Learning Representations}, url = {https://openreview.net/forum?id=SIuD7CySb4}, }
ECAI
Deep Exploration with PAC-Bayes

B. Tasdighi, M. Haussmann, N. Werge, and 2 more authors

In European Conference on Artificial Intelligence, 2025


Reinforcement learning (RL) for continuous control under delayed rewards is an under-explored problem despite its significance in real life. Many complex skills build on intermediate ones as prerequisites. For instance, a humanoid locomotor has to learn how to stand before it can learn to walk. To cope with delayed reward, a reinforcement learning agent has to perform deep exploration. However, existing deep exploration methods are designed for small discrete action spaces, and their successful generalization to state-of-the-art continuous control remains unproven. We address the deep exploration problem for the first time from a PAC-Bayesian perspective in the context of actor-critic learning. To do this, we quantify the error of the Bellman operator through a PAC-Bayes bound, where a bootstrapped ensemble of critic networks represents the posterior distribution, and their targets serve as a data-informed function-space prior. We derive an objective function from this bound and use it to train the critic ensemble. Each critic trains an individual soft actor network, implemented as a shared trunk and critic-specific heads. The agent performs deep exploration by acting epsilon-softly on a randomly chosen actor head. Our proposed algorithm, named PAC-Bayesian Actor-Critic (PBAC), is the only algorithm to consistently discover delayed rewards on a diverse set of continuous control tasks with varying difficulty.
@inproceedings{tasdighi2025pbac, title = {Deep Exploration with PAC-Bayes}, author = {Tasdighi, B. and Haussmann, M. and Werge, N. and Wu, Y. and Kandemir, M.}, year = {2025}, booktitle = {European Conference on Artificial Intelligence}, url = {https://arxiv.org/abs/2402.03055}, }
ECAI
Improving Actor-Critic Training with Steerable Action-Value Approximation Errors

B. Tasdighi, N. Werge, Y.S. Wu, and 1 more author

In European Conference on Artificial Intelligence, 2025


Off-policy actor-critic algorithms have shown promise in deep reinforcement learning for continuous control tasks. Their success largely stems from leveraging pessimistic state-action value function updates, which effectively address function approximation errors and improve performance. However, such pessimism can lead to under-exploration, constraining the agent’s ability to explore/refine its policies. Conversely, optimism can counteract under-exploration, but it also carries the risk of excessive risk-taking and poor convergence if not properly balanced. Based on these insights, we introduce Utility Soft Actor-Critic (USAC), a novel framework within the actor-critic paradigm that enables independent control over the degree of pessimism/optimism for both the actor and the critic via interpretable parameters. USAC adapts its exploration strategy based on the uncertainty of critics through a utility function that allows us to balance between optimism and pessimism separately. By going beyond binary choices of optimism and pessimism, USAC represents a significant step towards achieving balance within off-policy actor-critic algorithms. Our experiments across various continuous control problems show that the degree of pessimism or optimism depends on the nature of the task. Furthermore, we demonstrate that USAC can outperform state-of-the-art algorithms for appropriately configured pessimism/optimism parameters.
@inproceedings{tasdighi2025improving, title = {Improving Actor-Critic Training with Steerable Action-Value Approximation Errors}, author = {Tasdighi, B. and Werge, N. and Wu, Y.S. and Kandemir, M.}, year = {2025}, booktitle = {European Conference on Artificial Intelligence}, url = {https://arxiv.org/abs/2406.03890}, }

2024

NeurIPS
Deterministic Uncertainty Propagation for Improved Model-Based Offline Reinforcement Learning

A. Akgül, M. Haussmann, and M. Kandemir

In Neural Information Processing Systems, 2024


Current approaches to model-based offline reinforcement learning often incorporate uncertainty-based reward penalization to address the distributional shift problem. These approaches, commonly known as pessimistic value iteration, use Monte Carlo sampling to estimate the Bellman target to perform temporal-difference-based policy evaluation. We find that the randomness caused by this sampling step significantly delays convergence. We present a theoretical result demonstrating the strong dependency of suboptimality on the number of Monte Carlo samples taken per Bellman target calculation. Our main contribution is a deterministic approximation to the Bellman target that uses progressive moment matching, a method developed originally for deterministic variational inference. The resulting algorithm, which we call Moment Matching Offline Model-Based Policy Optimization (MOMBO), propagates the uncertainty of the next state through a nonlinear Q-network in a deterministic fashion by approximating the distributions of hidden layer activations by a normal distribution. We show that it is possible to provide tighter guarantees for the suboptimality of MOMBO than the existing Monte Carlo sampling approaches. We also observe MOMBO to converge faster than these approaches in a large set of benchmark tasks.
@inproceedings{akgul2024deterministic, title = {Deterministic Uncertainty Propagation for Improved Model-Based Offline Reinforcement Learning}, author = {Akgül, A. and Haussmann, M. and Kandemir, M.}, year = {2024}, booktitle = {Neural Information Processing Systems}, url = {https://arxiv.org/abs/2406.04088}, }
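
The deterministic propagation described in the abstract above can be sketched with standard Gaussian moment-matching formulas for a linear layer and a ReLU. This is a minimal illustration, not the MOMBO code: it assumes independent (diagonal-covariance) activations at every layer, and the network sizes, weights, and function names are hypothetical.

```python
import math

import numpy as np


def linear_moments(mean, var, W, b):
    """Mean and variance of y = W x + b, assuming independent inputs."""
    return W @ mean + b, (W ** 2) @ var


def relu_moments(mean, var):
    """Elementwise mean and variance of max(0, x) for x ~ N(mean, var)."""
    std = np.sqrt(var)
    z = mean / std
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    out_mean = mean * cdf + std * pdf
    second_moment = (mean ** 2 + var) * cdf + mean * std * pdf
    # Clip tiny negative values caused by floating-point cancellation.
    return out_mean, np.maximum(second_moment - out_mean ** 2, 0.0)


# Hypothetical two-layer Q-network with random weights.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 3)), np.zeros(16)
w2, b2 = rng.normal(size=(1, 16)), np.zeros(1)


def q_moments(state_mean, state_var):
    """Push a Gaussian next-state belief through the network in one pass."""
    m, v = linear_moments(state_mean, state_var, W1, b1)
    m, v = relu_moments(m, v)
    return linear_moments(m, v, w2, b2)


s_mean = np.array([0.5, -1.0, 0.2])
s_var = np.array([0.3, 0.1, 0.2])
mm_mean, mm_var = q_moments(s_mean, s_var)
```

A single pass yields a mean and variance for the Q-value without drawing samples. Under the diagonal assumption the variance is only an approximation, since hidden units that share inputs are correlated.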
NeurIPS
Recursive PAC-Bayes: A Frequentist Approach to Sequential Prior Updates with No Information Loss

Y. Wu, Y. Zhang, B. Chérief-Abdellatif, and 1 more author

In Neural Information Processing Systems, 2024


PAC-Bayesian analysis is a frequentist framework for incorporating prior knowledge into learning. It was inspired by Bayesian learning, which allows sequential data processing and naturally turns posteriors from one processing step into priors for the next. However, despite two and a half decades of research, the ability to update priors sequentially without losing confidence information along the way remained elusive for PAC-Bayes. While PAC-Bayes allows construction of data-informed priors, the final confidence intervals depend only on the number of points that were not used for the construction of the prior, whereas confidence information in the prior, which is related to the number of points used to construct the prior, is lost. This limits the possibility and benefit of sequential prior updates, because the final bounds depend only on the size of the final batch. We present a novel and, in retrospect, surprisingly simple and powerful PAC-Bayesian procedure that allows sequential prior updates with no information loss. The procedure is based on a novel decomposition of the expected loss of randomized classifiers. The decomposition rewrites the loss of the posterior as an excess loss relative to a downscaled loss of the prior plus the downscaled loss of the prior, which is bounded recursively. As a side result, we also present a generalization of the split-kl and PAC-Bayes-split-kl inequalities to discrete random variables, which we use for bounding the excess losses, and which can be of independent interest. In empirical evaluation the new procedure significantly outperforms state-of-the-art.
@inproceedings{wu2024recursive, title = {Recursive {PAC}-Bayes: A Frequentist Approach to Sequential Prior Updates with No Information Loss}, author = {Wu, Y. and Zhang, Y. and Ch{\'e}rief-Abdellatif, B. and Seldin, Y.}, year = {2024}, booktitle = {Neural Information Processing Systems}, url = {https://arxiv.org/abs/2405.14681}, }
ICML
Latent variable model for high-dimensional point process with structured missingness

M. Sinelnikov, M. Haussmann, and H. Lähdesmäki

In International Conference on Machine Learning, 2024


Longitudinal data are important in numerous fields, such as healthcare, sociology and seismology, but real-world datasets present notable challenges for practitioners because they can be high-dimensional, contain structured missingness patterns, and measurement time points can be governed by an unknown stochastic process. While various solutions have been suggested, the majority of them have been designed to account for only one of these challenges. In this work, we propose a flexible and efficient latent-variable model that is capable of addressing all these limitations. Our approach utilizes Gaussian processes to capture correlations between samples and their associated missingness masks as well as to model the underlying point process. We construct our model as a variational autoencoder together with deep neural network parameterised decoder and encoder models, and develop a scalable amortised variational inference approach for efficient model training. We demonstrate competitive performance using both simulated and real datasets.
@inproceedings{sinelnikov2024latent, title = {Latent variable model for high-dimensional point process with structured missingness}, author = {Sinelnikov, M. and Haussmann, M. and L{\"a}hdesm{\"a}ki, H.}, year = {2024}, booktitle = {International Conference on Machine Learning}, url = {https://openreview.net/forum?id=g1Gf0hoPSz}, }
SPIGM
Learning high-dimensional mixed models via amortized variational inference

P. Ong, M. Haussmann, and H. Lähdesmäki

In Structured Probabilistic Inference & Generative Modeling, 2024


Modelling longitudinal data is an important yet challenging task. These datasets can be high-dimensional, consist of non-linear effects, and contain time-varying covariates. In this work, we leverage linear mixed models (LMMs) and amortized variational inference to provide conditional priors for VAEs, and propose LMM-VAE, a model that is scalable, interpretable, and shares theoretical connections to the GP-based VAEs. We empirically demonstrate that LMM-VAE performs competitively compared to existing approaches.
@inproceedings{ong2024learning, title = {Learning high-dimensional mixed models via amortized variational inference}, author = {Ong, P. and Haussmann, M. and Lähdesmäki, H.}, year = {2024}, booktitle = {Structured Probabilistic Inference \& Generative Modeling}, url = {https://openreview.net/forum?id=6huQApLcJK}, }
AABI
PAC-Bayesian Soft Actor-Critic Learning

B. Tasdighi, A. Akgül, M. Haussmann, and 2 more authors

In Advances in Approximate Bayesian Inference Symposium, 2024


Actor-critic algorithms address the dual goals of reinforcement learning, policy evaluation and improvement, via two separate function approximators. The practicality of this approach comes at the expense of training instability, caused mainly by the destructive effect of the approximation errors of the critic on the actor. We tackle this bottleneck by employing an existing Probably Approximately Correct (PAC) Bayesian bound for the first time as the critic training objective of the Soft Actor-Critic (SAC) algorithm. We further demonstrate that the online learning performance improves significantly when a stochastic actor explores multiple futures by critic-guided random search. We observe our resulting algorithm to compare favorably to the state of the art on multiple classical control and locomotion tasks in both sample efficiency and asymptotic performance.
@inproceedings{bahareh2024pac4sac, title = {PAC-Bayesian Soft Actor-Critic Learning}, author = {Tasdighi, B. and Akgül, A. and Haussmann, M. and Brink, K.K. and Kandemir, M.}, year = {2024}, booktitle = {Advances in Approximate Bayesian Inference Symposium}, url = {https://arxiv.org/abs/2301.12776}, }
L4DC
Continual Learning of Multi-modal Dynamics with External Memory

A. Akgül, G. Unal, and M. Kandemir

In Learning for Dynamics and Control, 2024


We study the problem of fitting a model to a dynamical environment when new modes of behavior emerge sequentially. The learning model is aware when a new mode appears, but it does not have access to the true modes of individual training sequences. We devise a novel continual learning method that maintains a descriptor of the mode of an encountered sequence in a neural episodic memory. We employ a Dirichlet Process prior on the attention weights of the memory to foster efficient storage of the mode descriptors. Our method performs continual learning by transferring knowledge across tasks by retrieving the descriptors of similar modes of past tasks to the mode of a current sequence and feeding this descriptor into its transition kernel as control input. We observe the continual learning performance of our method to compare favorably to the mainstream parameter transfer approach.
@inproceedings{akgul2024cddp, title = {Continual Learning of Multi-modal Dynamics with External Memory}, author = {Akgül, A. and Unal, G. and Kandemir, M.}, year = {2024}, booktitle = {Learning for Dynamics and Control}, url = {https://arxiv.org/abs/2203.00936}, }
PR
EdVAE: Mitigating codebook collapse with evidential discrete variational autoencoders

G. Baykal, M. Kandemir, and G. Unal

Pattern Recognition, 2024


Codebook collapse is a common problem in training deep generative models with discrete representation spaces like Vector Quantized Variational Autoencoders (VQ-VAEs). We observe that the same problem arises for the alternatively designed discrete variational autoencoders (dVAEs) whose encoder directly learns a distribution over the codebook embeddings to represent the data. We hypothesize that using the softmax function to obtain a probability distribution causes the codebook collapse by assigning overconfident probabilities to the best matching codebook elements. In this paper, we propose a novel way to incorporate evidential deep learning (EDL) through hierarchical Bayesian modeling instead of softmax to combat the codebook collapse problem of dVAE. We evidentially monitor the significance of attaining the probability distribution over the codebook embeddings, in contrast to softmax usage. Our experiments using various datasets show that our model, called EdVAE, mitigates codebook collapse while improving the reconstruction performance, and enhances the codebook usage compared to dVAE and VQ-VAE based models. Our code can be found at https://github.com/ituvisionlab/EdVAE.
@article{baykal2024edvae, title = {EdVAE: Mitigating codebook collapse with evidential discrete variational autoencoders}, author = {Baykal, G. and Kandemir, M. and Unal, G.}, year = {2024}, journal = {Pattern Recognition}, url = {https://www.sciencedirect.com/science/article/pii/S0031320324005430}, }
TMLR
The Cold Posterior Effect Indicates Underfitting, and Cold Posteriors Represent a Fully Bayesian Method to Mitigate It

Y. Zhang, Y. Wu, L.A. Ortega, and 1 more author

Transactions on Machine Learning Research, 2024

Abs Bib HTML

The cold posterior effect (CPE) (Wenzel et al., 2020) in Bayesian deep learning shows that, for posteriors with a temperature T < 1, the resulting posterior predictive could have better performance than the Bayesian posterior (T = 1). As the Bayesian posterior is known to be optimal under perfect model specification, many recent works have studied the presence of CPE as a model misspecification problem, arising from the prior and/or from the likelihood. In this work, we provide a more nuanced understanding of CPE as we show that misspecification leads to CPE only when the resulting Bayesian posterior underfits. In fact, we theoretically show that if there is no underfitting, there is no CPE. Furthermore, we show that these tempered posteriors with T < 1 are indeed proper Bayesian posteriors with a different combination of likelihoods and priors parameterized by T. This observation validates the adjustment of the temperature hyperparameter T as a straightforward approach to mitigate underfitting in the Bayesian posterior. In essence, we show that by fine-tuning the temperature T we implicitly utilize alternative Bayesian posteriors, albeit with less misspecified likelihood and prior distributions. The code for replicating the experiments can be found at https://github.com/pyijiezhang/cpe-underfit.
@article{zhang2024the, title = {The Cold Posterior Effect Indicates Underfitting, and Cold Posteriors Represent a Fully Bayesian Method to Mitigate It}, author = {Zhang, Y. and Wu, Y. and Ortega, L.A. and Masegosa, A.R.}, year = {2024}, journal = {Transactions on Machine Learning Research}, url = {https://openreview.net/forum?id=GZORXGxHHT}, }
ICLR
Calibrating Bayesian UNet++ for Sub-Seasonal Forecasting

B. Asan, A. Akgül, A. Unal, and 2 more authors

In Tackling Climate Change with Machine Learning at ICLR 2024, 2024

Abs Bib HTML

Seasonal forecasting is a crucial task when it comes to detecting the extreme heat and cold that occur due to climate change. Confidence in the predictions should be reliable since a small increase in the temperatures in a year has a big impact on the world. Calibration of the neural networks provides a way to ensure our confidence in the predictions. However, calibrating regression models is an under-researched topic, especially in forecasters. We calibrate a UNet++ based architecture, which was shown to outperform physics-based models in temperature anomalies. We show that with a slight trade-off between prediction error and calibration error, it is possible to get more reliable and sharper forecasts. We believe that calibration should be an important part of safety-critical machine learning applications such as weather forecasters.
@inproceedings{asan2024calibrating, title = {Calibrating Bayesian UNet++ for Sub-Seasonal Forecasting}, author = {Asan, B. and Akg{\"u}l, A. and Unal, A. and Kandemir, M. and Unal, G.}, year = {2024}, booktitle = {Tackling Climate Change with Machine Learning at ICLR 2024}, url = {https://arxiv.org/abs/2403.16612}, }

2023

arXiv
Demystifying the Myths and Legends of Nonconvex Convergence of SGD

A. Dutta, E.H. Bergou, S. Boucherouite, and 3 more authors

arXiv Preprint, 2023

Abs Bib HTML

Stochastic gradient descent (SGD) and its variants are the main workhorses for solving large-scale optimization problems with nonconvex objective functions. Although the convergence of SGDs in the (strongly) convex case is well-understood, their convergence for nonconvex functions stands on weak mathematical foundations. Most existing studies on the nonconvex convergence of SGD show the complexity results based on either the minimum of the expected gradient norm or the functional sub-optimality gap (for functions with extra structural property) by searching the entire range of iterates. Hence the last iterations of SGDs do not necessarily maintain the same complexity guarantee. This paper shows that _an ε-stationary point exists in the final iterates of SGDs,_ given a large enough total iteration budget, T, not just anywhere in the entire range of iterates, which is a much stronger result than the existing one. Additionally, our analyses allow us to measure the _density of the ε-stationary points_ in the final iterates of SGD, and we recover the classical O(1/\sqrt{T}) asymptotic rate under various existing assumptions on the objective function and the bounds on the stochastic gradient. As a result of our analyses, we addressed certain myths and legends related to the nonconvex convergence of SGD and posed some thought-provoking questions that could set new directions for research.
@article{dutta2023demystifying, title = {Demystifying the Myths and Legends of Nonconvex Convergence of SGD}, author = {Dutta, A. and Bergou, E.H. and Boucherouite, S. and Werge, N. and Kandemir, M. and Li, X.}, year = {2023}, journal = {arXiv Preprint}, url = {https://arxiv.org/pdf/2310.12969}, }
arXiv
BOF-UCB: A Bayesian-Optimistic Frequentist Algorithm for Non-Stationary Contextual Bandits

N. Werge, A. Akgül, and M. Kandemir

arXiv Preprint, 2023

Abs Bib HTML

We propose a novel Bayesian-Optimistic Frequentist Upper Confidence Bound (BOF-UCB) algorithm for stochastic contextual linear bandits in non-stationary environments. This unique combination of Bayesian and frequentist principles enhances adaptability and performance in dynamic settings. The BOF-UCB algorithm utilizes sequential Bayesian updates to infer the posterior distribution of the unknown regression parameter, and subsequently employs a frequentist approach to compute the Upper Confidence Bound (UCB) by maximizing the expected reward over the posterior distribution. We provide theoretical guarantees of BOF-UCB’s performance and demonstrate its effectiveness in balancing exploration and exploitation on synthetic datasets and classical control tasks in a reinforcement learning setting. Our results show that BOF-UCB outperforms existing methods, making it a promising solution for sequential decision-making in non-stationary environments.
@article{werge2023bof, title = {BOF-UCB: A Bayesian-Optimistic Frequentist Algorithm for Non-Stationary Contextual Bandits}, author = {Werge, N. and Akg{\"u}l, A. and Kandemir, M.}, year = {2023}, journal = {arXiv Preprint}, url = {https://arxiv.org/abs/2307.03587}, }
arXiv
If there is no underfitting, there is no Cold Posterior Effect

Y. Zhang, Y. Wu, L.A. Ortega, and 1 more author

arXiv Preprint, 2023

Abs Bib HTML

The cold posterior effect (CPE) (Wenzel et al., 2020) in Bayesian deep learning shows that, for posteriors with a temperature T < 1, the resulting posterior predictive could have better performance than the Bayesian posterior (T = 1). As the Bayesian posterior is known to be optimal under perfect model specification, many recent works have studied the presence of CPE as a model misspecification problem, arising from the prior and/or from the likelihood function. In this work, we provide a more nuanced understanding of the CPE as we show that _misspecification leads to CPE only when the resulting Bayesian posterior underfits_. In fact, we theoretically show that if there is no underfitting, there is no CPE.
@article{zhang2023if, title = {If there is no underfitting, there is no Cold Posterior Effect}, author = {Zhang, Y. and Wu, Y. and Ortega, L.A. and Masegosa, A.R.}, year = {2023}, journal = {arXiv Preprint}, url = {https://arxiv.org/abs/2310.01189}, }
NeurIPS
Improved Algorithms for Stochastic Linear Bandits Using Tail Bounds for Martingale Mixtures

H. Flynn, D. Reeb, M. Kandemir, and 1 more author

In Neural Information Processing Systems, 2023

Abs Bib HTML

We present improved algorithms with worst-case regret guarantees for the stochastic linear bandit problem. The widely used "optimism in the face of uncertainty" principle reduces a stochastic bandit problem to the construction of a confidence sequence for the unknown reward function. The performance of the resulting bandit algorithm depends on the size of the confidence sequence, with smaller confidence sets yielding better empirical performance and stronger regret guarantees. In this work, we use a novel tail bound for adaptive martingale mixtures to construct confidence sequences which are suitable for stochastic bandits. These confidence sequences allow for efficient action selection via convex programming. We prove that a linear bandit algorithm based on our confidence sequences is guaranteed to achieve competitive worst-case regret. We show that our confidence sequences are tighter than competitors, both empirically and theoretically. Finally, we demonstrate that our tighter confidence sequences give improved performance in several hyperparameter tuning tasks.
@inproceedings{flynn2023improved, title = {Improved Algorithms for Stochastic Linear Bandits Using Tail Bounds for Martingale Mixtures}, author = {Flynn, H. and Reeb, D. and Kandemir, M. and Peters, J.}, year = {2023}, booktitle = {Neural Information Processing Systems}, url = {https://arxiv.org/abs/2309.14298}, }
ACML
Estimation of Counterfactual Interventions under Uncertainties

J. Weilbach, S. Gerwinn, M. Kandemir, and 1 more author

In Asian Conference on Machine Learning, 2023

Abs Bib HTML

Counterfactual analysis is intuitively performed by humans on a daily basis, e.g., "What should I have done differently to get the loan approved?". Such counterfactual questions also steer the formulation of scientific hypotheses. More formally, it provides insights about potential improvements of a system by inferring the effects of hypothetical interventions into a past observation of the system’s behaviour, which plays a prominent role in a variety of industrial applications. Due to the hypothetical nature of such analysis, counterfactual distributions are inherently ambiguous. This ambiguity is particularly challenging in continuous settings in which a continuum of explanations exists for the same observation. In this paper, we address this problem by following a hierarchical Bayesian approach which explicitly models such uncertainty. In particular, we derive counterfactual distributions for a Bayesian Warped Gaussian Process, thereby allowing for non-Gaussian distributions and non-additive noise. We illustrate the properties of our approach on a synthetic and on a semi-synthetic example and show its performance when used within an algorithmic recourse downstream task.
@inproceedings{weilbach2023estimation, title = {Estimation of Counterfactual Interventions under Uncertainties}, author = {Weilbach, J. and Gerwinn, S. and Kandemir, M. and Fraenzle, M.}, year = {2023}, booktitle = {Asian Conference on Machine Learning}, url = {https://arxiv.org/abs/2309.08332}, }
T-PAMI
PAC-Bayes Bounds for Bandit Problems: A Survey and Experimental Comparison

H. Flynn, D. Reeb, M. Kandemir, and 1 more author

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023

Abs Bib HTML

PAC-Bayes has recently re-emerged as an effective theory with which one can derive principled learning algorithms with tight performance guarantees. However, applications of PAC-Bayes to bandit problems are relatively rare, which is a great misfortune. Many decision-making problems in healthcare, finance and natural sciences can be modelled as bandit problems. In many of these applications, principled algorithms with strong performance guarantees would be very much appreciated. This survey provides an overview of PAC-Bayes performance bounds for bandit problems and an experimental comparison of these bounds. Our experimental comparison has revealed that available PAC-Bayes upper bounds on the cumulative regret are loose, whereas available PAC-Bayes lower bounds on the expected reward can be surprisingly tight. We found that an offline contextual bandit algorithm that learns a policy by optimising a PAC-Bayes bound was able to learn randomised neural network policies with competitive expected reward and non-vacuous performance guarantees.
@article{flyn2022pacbayes, title = {PAC-Bayes Bounds for Bandit Problems: A Survey and Experimental Comparison}, author = {Flynn, H. and Reeb, D. and Kandemir, M. and Peters, J.}, year = {2023}, journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence}, url = {https://arxiv.org/abs/2211.16110}, }
MDPI
ALReg: Registration of 3D Point Clouds Using Active Learning

Y.H. Sahin, O. Karabacak, M. Kandemir, and 1 more author

MDPI Applied Sciences, 2023

Abs Bib HTML

After the success of deep learning in point cloud segmentation and classification tasks, it has also been adopted as common practice in point cloud registration applications. State-of-the-art point cloud registration methods generally deal with this problem as a regression task to find the underlying rotation and translation between two point clouds. However, given two point clouds, the transformation between them could be calculated using only definitive point subsets from each cloud. Furthermore, training time is still a major problem among the current registration networks, whereas using a selective approach to define the informative point subsets can lead to reduced network training times. To that end, we developed ALReg, an active learning procedure to select a limited subset of point clouds to train the network. Each of the point clouds in the training set is divided into superpoints (small pieces of each cloud) and the training process is started with a small amount of them. By actively selecting new superpoints and including them in the training process, only a prescribed amount of data is used, hence the time needed to converge drastically decreases. We used DeepBBS, FMR, and DCP methods as our baselines to prove our proposed ALReg method. We trained DeepBBS and DCP on the ModelNet40 dataset and FMR on the 7Scenes dataset. Using 25% of the training data for ModelNet and 4% for the 7Scenes, better or similar accuracy scores are obtained in less than 20% of their original training times. The trained models are also tested on the 3DMatch dataset and better results are obtained than the original FMR training procedure.
@article{sahin2023alreg, title = {ALReg: Registration of 3D Point Clouds Using Active Learning}, author = {Sahin, Y.H. and Karabacak, O. and Kandemir, M. and Unal, G.}, year = {2023}, journal = {MDPI Applied Sciences}, url = {https://www.mdpi.com/2076-3417/13/13/7422}, }
TMLR
Cheap and Deterministic Inference for Deep State-Space Models of Interacting Dynamical Systems

A. Look, B. Rakitsch, M. Kandemir, and 1 more author

Transactions on Machine Learning Research, 2023

Abs Bib HTML

Graph neural networks are often used to model interacting dynamical systems since they gracefully scale to systems with a varying and high number of agents. While there has been much progress made for deterministic interacting systems, modeling is much more challenging for stochastic systems in which one is interested in obtaining a predictive distribution over future trajectories. Existing methods are either computationally slow since they rely on Monte Carlo sampling or make simplifying assumptions such that the predictive distribution is unimodal. In this work, we present a deep state-space model which employs graph neural networks in order to model the underlying interacting dynamical system. The predictive distribution is multimodal and has the form of a Gaussian mixture model, where the moments of the Gaussian components can be computed via deterministic moment matching rules. Our moment matching scheme can be exploited for sample-free inference leading to more efficient and stable training compared to Monte Carlo alternatives. Furthermore, we propose structured approximations to the covariance matrices of the Gaussian components in order to scale up to systems with many agents. We benchmark our novel framework on two challenging autonomous driving datasets. Both confirm the benefits of our method compared to state-of-the-art methods. We further demonstrate the usefulness of our individual contributions in a carefully designed ablation study and provide a detailed empirical runtime analysis of our proposed covariance approximations.
@article{look2023cheap, title = {Cheap and Deterministic Inference for Deep State-Space Models of Interacting Dynamical Systems}, author = {Look, A. and Rakitsch, B. and Kandemir, M. and Peters, J.}, year = {2023}, journal = {Transactions on Machine Learning Research}, url = {https://openreview.net/forum?id=dqgdBy4Uv5&noteId=xKtcWgwdxX}, }
TMLR
Meta Continual Learning on Graphs with Experience Replay

A. Unal, A. Akgül, M. Kandemir, and 1 more author

Transactions on Machine Learning Research, 2023

Abs Bib HTML

Continual learning is a machine learning approach where the challenge is that a constructed learning model executes incoming tasks while maintaining its performance over the earlier tasks. In order to address this issue, we devise a technique that combines two uniquely important concepts in machine learning, namely "replay buffer" and "meta learning", aiming to exploit the best of both worlds. In this method, the model weights are initially computed by using the current task dataset. Next, the dataset of the current task is merged with the stored samples from the earlier tasks and the model weights are updated using the combined dataset. This aids in preventing the model weights from converging to the optimal parameters of the current task and enables the preservation of information from earlier tasks. We choose to adapt our technique to graph data structure and the task of node classification on graphs. We introduce MetaCLGraph, which outperforms the baseline methods over various graph datasets including Citeseer, Corafull, Arxiv, and Reddit. This method illustrates the potential of combining replay buffer and meta learning in the field of continual learning on graphs.
@article{unal2023meta, title = {Meta Continual Learning on Graphs with Experience Replay}, author = {Unal, A. and Akg{\"u}l, A. and Kandemir, M. and Unal, G.}, year = {2023}, journal = {Transactions on Machine Learning Research}, url = {https://openreview.net/forum?id=8tnrh56P5W}, }

2022

NeurIPS
Learning Interacting Dynamical Systems with Latent Gaussian Process ODEs

C. Yildiz, M. Kandemir, and B. Rakitsch

In Neural Information Processing Systems, 2022

Abs Bib HTML

We study for the first time uncertainty-aware modeling of continuous-time dynamics of interacting objects. We introduce a new model that decomposes independent dynamics of single objects accurately from their interactions. By employing latent Gaussian process ordinary differential equations, our model infers both independent dynamics and their interactions with reliable uncertainty estimates. In our formulation, each object is represented as a graph node and interactions are modeled by accumulating the messages coming from neighboring objects. We show that efficient inference of such a complex network of variables is possible with modern variational sparse Gaussian process inference techniques. We empirically demonstrate that our model improves the reliability of long-term predictions over neural network based alternatives and it successfully handles missing dynamic or static information. Furthermore, we observe that only our model can successfully encapsulate independent dynamics and interaction information in distinct functions and show the benefit from this disentanglement in extrapolation scenarios.
@inproceedings{yildiz2022learning, title = {Learning Interacting Dynamical Systems with Latent Gaussian Process ODEs}, author = {Yildiz, C. and Kandemir, M. and Rakitsch, B.}, year = {2022}, booktitle = {Neural Information Processing Systems}, url = {https://arxiv.org/abs/2205.11894}, }
ICLR
Evidential Turing Processes

M. Kandemir, A. Akgül, M. Haussmann, and 1 more author

In International Conference on Learning Representations, 2022

Abs Bib HTML Code

A probabilistic classifier with reliable predictive uncertainties i) fits successfully to the target domain data, ii) provides calibrated class probabilities in difficult regions of the target domain (e.g. class overlap), and iii) accurately identifies queries coming out of the target domain and rejects them. We introduce an original combination of Evidential Deep Learning, Neural Processes, and Neural Turing Machines capable of providing all three essential properties mentioned above for total uncertainty quantification. We observe our method on three image classification benchmarks to consistently improve the in-domain uncertainty quantification, out-of-domain detection, and robustness against input perturbations with one single model. Our unified solution delivers an implementation-friendly and computationally efficient recipe for safety clearance and provides intellectual economy to an investigation of algorithmic roots of epistemic awareness in deep neural nets.
@inproceedings{kandemir2022evidential, title = {Evidential Turing Processes}, author = {Kandemir, M. and Akg{\"u}l, A. and Haussmann, M. and Unal, G.}, year = {2022}, booktitle = {International Conference on Learning Representations}, url = {https://openreview.net/forum?id=84NMXTHYe-}, }
T-PAMI
A Deterministic Approximation to Neural SDEs

A. Look, M. Kandemir, B. Rakitsch, and 1 more author

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022

Abs Bib HTML

Neural Stochastic Differential Equations (NSDEs) model the drift and diffusion functions of a stochastic process as neural networks. While NSDEs are known to make accurate predictions, their uncertainty quantification properties have remained unexplored so far. We report the empirical finding that obtaining well-calibrated uncertainty estimations from NSDEs is computationally prohibitive. As a remedy, we develop a computationally affordable deterministic scheme which accurately approximates the transition kernel when the dynamics are governed by an NSDE. Our method introduces a bidimensional moment matching algorithm: vertical along the neural net layers and horizontal along the time direction, which benefits from an original combination of effective approximations. Our deterministic approximation of the transition kernel is applicable to both training and prediction. We observe in multiple experiments that the uncertainty calibration quality of our method can be matched by Monte Carlo sampling only after introducing high computation cost. Thanks to the numerical stability of deterministic training, our method also provides improvement in prediction accuracy.
@article{look2022adeterministic, title = {A Deterministic Approximation to Neural SDEs}, author = {Look, A. and Kandemir, M. and Rakitsch, B. and Peters, J.}, year = {2022}, journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence}, url = {https://arxiv.org/abs/2006.08973}, publisher = {IEEE} }
L4DC
Traversing Time with Multi-Resolution Gaussian Process State-Space Models

K. Longi, J. Lindinger, O. Duennbier, and 3 more authors

In Learning for Dynamics and Control, 2022

Abs Bib HTML

Gaussian Process state-space models capture complex temporal dependencies in a principled manner by placing a Gaussian Process prior on the transition function. These models have a natural interpretation as discretized stochastic differential equations, but inference for long sequences with fast and slow transitions is difficult. Fast transitions need tight discretizations whereas slow transitions require backpropagating the gradients over long subtrajectories. We propose a novel Gaussian process state-space architecture composed of multiple components, each trained on a different resolution, to model effects on different timescales. The combined model allows traversing time on adaptive scales, providing efficient inference for arbitrarily long sequences with complex dynamics. We benchmark our novel method on semi-synthetic data and on an engine modeling task. In both experiments, our approach compares favorably against its state-of-the-art alternatives that operate on a single time-scale only.
@inproceedings{lungi2022gpssm, title = {Traversing Time with Multi-Resolution Gaussian Process State-Space Models}, author = {Longi, K. and Lindinger, J. and Duennbier, O. and Kandemir, M. and Klami, A. and Rakitsch, B.}, year = {2022}, booktitle = {Learning for Dynamics and Control}, url = {https://proceedings.mlr.press/v168/longi22a/longi22a.pdf}, }
DMKD
PAC-Bayesian lifelong learning for multi-armed bandits

H. Flynn, D. Reeb, M. Kandemir, and 1 more author

Data Mining and Knowledge Discovery, 2022

Abs Bib HTML

We present a PAC-Bayesian analysis of lifelong learning. In the lifelong learning problem, a sequence of learning tasks is observed one-at-a-time, and the goal is to transfer information acquired from previous tasks to new learning tasks. We consider the case when each learning task is a multi-armed bandit problem. We derive lower bounds on the expected average reward that would be obtained if a given multi-armed bandit algorithm was run in a new task with a particular prior and for a set number of steps. We propose lifelong learning algorithms that use our new bounds as learning objectives. Our proposed algorithms are evaluated in several lifelong multi-armed bandit problems and are found to perform better than a baseline method that does not use generalisation bounds.
@article{flynn2022pac, title = {PAC-Bayesian lifelong learning for multi-armed bandits}, author = {Flynn, H. and Reeb, D. and Kandemir, M. and Peters, J.}, year = {2022}, journal = {Data Mining and Knowledge Discovery}, url = {https://link.springer.com/article/10.1007/s10618-022-00825-4}, publisher = {Springer} }