On Logarithmic Regret for Bandits with Knapsacks

Wenbo Ren,Jia Liu,Ness B. Shroff

On Logarithmic Regret for Bandits with Knapsacks

2021

Wenbo Ren
Jia Liu
Ness B. Shroff

This paper studies bandits with knapsacks (BwK). In a BwK instance, there are a set of $n$ arms and $d$ types of resources with limited budgets. Each pull of an arm returns a noisy reward and consumes some amount of resources in each type according to some latent distribution of this arm. The decision-maker adaptively pulls arms in order to maximize the accumulated reward gained before some type of resources runs out. We investigate logarithmic distribution-dependent regrets for the BwK problem, specifically for the instances with deterministic costs. We propose a new algorithm with regret in the form of O(n log $T$ /Δ) (Δis the gap of rewards similar to that in standard MAB), which to our knowledge, is of the lowest order till now, and has the same order as the standard MAB problem when d = 1. Simulation results also suggest the performance improvements using our algorithms.

Keywords:

Type (model theory)
Upper and lower bounds
Set (abstract data type)
Regret
Mathematics
Minification
Mathematical optimization
Logarithm
Noise measurement
order

Correction
Source
Cite
Save

References

Citations