Tuesday, March 15, 2016

Bandit algorithms 7 - Bandits in the Real World: Complexity and Complications

A/A Testing
You should use the code you develop for assigning users to arms, but define two different arms that actually deliver the same user experience. If you find differences between these two identical arms, you need to substantially temper your claims about differences between other arms. An illusory difference between two A arms may indicate that there's a bug in your code or a mistake in the way you're analyzing your data. But it may also indicate that your test is running in a context that is subtly different from the assumptions we've been implicitly making when setting up our algorithms and simulations.
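As a concrete illustration, here is a minimal sketch of an A/A check, assuming a hypothetical setup where both arms deliver the same experience and a simple two-proportion z statistic is used to compare them (the assignment line is a stand-in for your real assignment code):

import random
from math import sqrt

def aa_test(n_users=10000, true_rate=0.05):
    # Both "arms" are the identical experience; a large z statistic here
    # points at a bug in assignment, logging, or analysis.
    successes = [0, 0]
    counts = [0, 0]
    for _ in range(n_users):
        arm = random.randrange(2)  # stand-in for your real assignment code
        reward = 1 if random.random() < true_rate else 0
        counts[arm] += 1
        successes[arm] += reward
    p1, p2 = successes[0] / counts[0], successes[1] / counts[1]
    pooled = sum(successes) / sum(counts)
    se = sqrt(pooled * (1 - pooled) * (1 / counts[0] + 1 / counts[1]))
    return p1, p2, (p1 - p2) / se

print(aa_test())  # a z value much larger than 2 in magnitude deserves investigation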
Running Concurrent Experiments
In an ideal world, the concurrency issues raised by running multiple experiments at once wouldn't come up. In practice, the best solution is simple: try your best to keep track of all of the experiments each user is a part of and include this information in your analyses of any single experiment.
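One low-tech way to do that bookkeeping (all names here are illustrative, not from the original text) is to record every experiment and arm a user has been exposed to, so that you can join this information back in when you analyze any single experiment:

from collections import defaultdict

user_assignments = defaultdict(dict)  # user_id -> {experiment_name: arm}

def record_assignment(user_id, experiment, arm):
    user_assignments[user_id][experiment] = arm

record_assignment("user_42", "headline_test", 1)
record_assignment("user_42", "checkout_test", 0)

# When analyzing "headline_test", you can now condition on membership in
# "checkout_test" instead of pretending the two experiments are independent.
print(user_assignments["user_42"])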
Continuous Experimentation vs. Periodic Testing
Bandit algorithms look much better than A/B testing when you are willing to let them run for a very long time, because they keep shifting traffic toward the better arms instead of waiting for a discrete test period to end. If you only plan to experiment for short, clearly delimited periods and then lock in a decision, classical A/B testing may serve you just as well.
Bad Metrics of Success
Once you decide to start working with automated metrics, you need to supplement those systems by exercising human judgment and keeping an eye on what happens as the system makes changes to your site. A bandit algorithm will cheerfully maximize a short-term metric like clicks even when doing so quietly erodes something you care about more, such as long-term customer satisfaction.
Scaling Problems with Good Metrics of Success
The algorithms described in this book may not work well unless you rescale your metrics into the 0-1 space we've used in our examples. The reasons for this are quite boring: some of the algorithms are numerically unstable, especially the softmax algorithm, which will break down if you ask it to exponentiate very large values.
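One simple way to stay inside that 0-1 range is to min-max rescale the raw metric before you treat it as a reward. A minimal sketch, where the bounds are assumptions you would choose from domain knowledge or historical data:

def rescale(raw_value, min_value=0.0, max_value=50.0):
    # Map a raw metric (e.g., revenue per visit in dollars) into [0, 1].
    clipped = min(max(raw_value, min_value), max_value)
    return (clipped - min_value) / (max_value - min_value)

print(rescale(12.5))  # 0.25
print(rescale(80.0))  # values above the cap are clipped to 1.0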
Intelligent Initialization of Values
First, you can use historical metrics to initialize your estimate of the control arm's value in your bandit algorithm. Second, you can use the amount of historical data you have to calibrate how much the algorithm thinks it already knows about the historical options.
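A sketch of both ideas at once, using the counts and values fields from the algorithms in this book (the specific numbers are purely illustrative):

# Arm 0 is the existing control experience.
historical_mean = 0.041  # conversion rate observed before the test
historical_n = 5000      # number of historical observations behind that estimate

counts = [historical_n, 0, 0]          # act as if the control has already been seen 5000 times
values = [historical_mean, 0.0, 0.0]   # new arms start from an uninformative value

# Because counts[0] is large, each fresh reward nudges the control estimate
# only slightly, which is exactly how strongly you want history to be weighted.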
Running Better Simulations
In addition to initializing your algorithm with the prior information you have before deployment, you can often run much better simulations if you use historical data to build them.
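For example, instead of simulating arms with invented success probabilities, you can plug in rates estimated from your historical logs, using the same BernoulliArm class as the book's testing framework (the rates below are placeholders):

import random

class BernoulliArm():
    def __init__(self, p):
        self.p = p
    def draw(self):
        # Return a reward of 1.0 with probability p, otherwise 0.0.
        return 1.0 if random.random() < self.p else 0.0

historical_rates = [0.021, 0.034, 0.019]  # placeholders: estimate these from your logs
arms = [BernoulliArm(p) for p in historical_rates]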
Moving Worlds
Arms with changing values can be a very serious problem if you're not careful when you deploy a bandit algorithm. The update rule we've been using computes a plain running average, which weights a reward from months ago exactly as heavily as one from today:

new_value = ((n - 1) / float(n)) * value + (1 / float(n)) * reward
self.values[chosen_arm] = new_value
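One common remedy is to overweight recent rewards. A minimal sketch, assuming a hand-picked learning rate alpha between 0 and 1 (larger values adapt faster but produce noisier estimates), replaces the update above with an exponentially weighted moving average:

alpha = 0.1
# Old rewards decay geometrically, so the estimate can track an arm whose
# true value drifts over time.
new_value = (1 - alpha) * value + alpha * reward
self.values[chosen_arm] = new_value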
Correlated Bandits
If some of your arms are closely related (for example, many slightly different shades of the same blue), information you gather about one arm can tell you something about the others. Look into classical smoothing techniques in statistics to get a sense for how you might deal with correlated arms.
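As a toy illustration (not an algorithm from the book), if your arms lie along an ordered dimension such as price or font size, you could smooth your value estimates across neighboring arms so that evidence about one arm also informs the arms near it:

def smooth_values(values, weight=0.25):
    # Blend each arm's estimate with the average of its neighbors' estimates.
    smoothed = []
    for i, v in enumerate(values):
        neighbors = values[max(i - 1, 0):i] + values[i + 1:i + 2]
        neighbor_mean = sum(neighbors) / len(neighbors) if neighbors else v
        smoothed.append((1 - weight) * v + weight * neighbor_mean)
    return smoothed

print(smooth_values([0.10, 0.30, 0.12, 0.05]))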
Contextual Bandits
"A Contextual-Bandit Approach to Personalized News Article Recommendation" 
"Parametric Bandits: The Generalized Linear Case".
Implementing Bandit Algorithms at Scale
Blocked assignments
 Assign incoming users to arms in advance and draw this information from a fast cache when users actually arrive. Store their responses for batch processing later in another fast cache.
Blocked updates
 Update your estimates of arm values in batches on a regular interval and regenerate your blocked assignments.
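A tiny sketch of how those two pieces might fit together, with an epsilon-Greedy-style rule filling the queue (the queue, buffer, and function names are all made up for illustration):

from collections import deque
import random

assignment_queue = deque()  # blocked assignments: pre-drawn arm choices
pending_rewards = []        # blocked updates: observations buffered for later

def refill_assignments(values, epsilon=0.1, batch_size=1000):
    best_arm = values.index(max(values))
    for _ in range(batch_size):
        if random.random() < epsilon:
            assignment_queue.append(random.randrange(len(values)))
        else:
            assignment_queue.append(best_arm)

def assign_user():
    return assignment_queue.popleft()      # constant-time work on the request path

def record_reward(arm, reward):
    pending_rewards.append((arm, reward))  # processed later, in a batch

def run_blocked_update(counts, values):
    # On a regular interval, fold the buffered rewards into the estimates
    # and regenerate the blocked assignments from the new values.
    for arm, reward in pending_rewards:
        counts[arm] += 1
        n = counts[arm]
        values[arm] = ((n - 1) / n) * values[arm] + reward / n
    pending_rewards.clear()
    assignment_queue.clear()
    refill_assignments(values)

counts, values = [0, 0, 0], [0.0, 0.0, 0.0]
refill_assignments(values)
record_reward(assign_user(), 1.0)
run_blocked_update(counts, values)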
Here are the most salient lessons:
Trade-offs, trade-offs, trade-offs
In the real world, you always have to trade off between gathering data and acting on that data. Pure experimentation in the form of exploration is always a short-term loss, but pure profit-making in the form of exploitation is always blind to the long-term benefits of curiosity and open-mindedness. You can be clever about the compromises you make, but you will have to make some compromises.
God does play dice
Randomization is the key to the good life. Controlled experiments online won’t work without randomization. If you want to learn from your experiences, you need to be in complete control of those experiences. While the UCB algorithms we’ve used in this book aren’t truly randomized, they behave at least partially like randomized algorithms from the perspective of your users. Ultimately what matters most is that you make sure that end-users can’t self-select into the arms you want to experiment with.
Defaults matter a lot
The way in which you initialize an algorithm can have a powerful effect on its long-term success. You need to figure out whether your biases are helping you or hurting you. No matter what you do, you will be biased in some way or another. What matters is that you spend some time learning whether your biases help or hurt. Part of the genius of the UCB family of algorithms is that they make a point to do this initialization in a very systematic way right at the start.
Take a chance
You should try everything at the start of your explorations to ensure that you know a little bit about the potential value of every option. Don't close your mind without giving something a fair shot. At the same time, just one experience should be enough to convince you that some mistakes aren't worth repeating.
Everybody’s gotta grow up sometime
You should make sure that you explore less over time. No matter what you’re doing, it’s important that you don’t spend your whole life trying out every crazy idea that comes into your head. In the bandit algorithms we’ve tried, we’ve seen this lesson play out when we’ve implemented annealing. The UCB algorithms achieve similar effects to annealing by explicitly counting their experiences with different arms. Either strategy is better than not taking any steps to become more conservative over time.
Leave your mistakes behind
You should direct your exploration to focus on the second-best option, the third-best option and a few other options that are just a little bit further away from the best. Don’t waste much or any of your time on options that are clearly losing bets. Naive experimentation of the sort that occurs in A/B testing is often a deadweight loss if some of the ideas you’re experimenting with are disasters waiting to happen.
Don’t be cocky
You should keep track of how confident you are about your evaluations of each of the options available to you. Don’t be close-minded when you don’t have evidence to support your beliefs. At the same time, don’t be so unsure of yourself that you forget how much you already know. Measuring one’s confidence explicitly is what makes UCB so much more effective than either the epsilon-Greedy algorithm or the Softmax algorithm in some settings.
Context matters
You should use any and every piece of information you have available to you about the context of your experiments. Don't simplify the world too much and pretend you've got things figured out: there's more to optimizing your business than comparing A with B. If you can figure out a way to exploit context using strategies like those seen in the contextual bandit algorithms we briefly discussed, use them. And if there are ways to generalize your experiences across arms, take advantage of them.
