Friday, March 11, 2016

Bandit algorithms 2 - Multiarmed Bandit Algorithms

What Are We Trying to Do?

Traffic
Conversions
Sales
CTRs (click-through rates)

The Business Scientist: Web-Scale A/B Testing

Standard A/B testing consists of:

1) A short period of pure exploration, in which you assign equal numbers of users to Groups A and B.
2) A long period of pure exploitation, in which you send all of your users to the more successful version of your site and never return to the option that seemed inferior.

Why might this be a bad strategy?

1) It jumps abruptly from exploration to exploitation, when you might be able to transition smoothly between the two.
2) During the purely exploratory phase, it spends as many users on the inferior option as on the better one in order to gather as much data as possible. But there is little value in collecting more data about an option that is already strikingly inferior.
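
The two-phase procedure described above can be sketched in a few lines of Python. This is only an illustrative simulation, not a production testing framework: the function name `ab_test`, its parameters, and the conversion rates passed to it are all assumptions made for the example.

```python
import random

def ab_test(true_rates, explore_users, total_users, seed=0):
    """Simulate a two-arm A/B test: equal split first, then commit to the winner.

    true_rates holds hypothetical conversion probabilities for A and B.
    """
    rng = random.Random(seed)
    pulls = [0, 0]   # users sent to each version
    wins = [0, 0]    # conversions observed for each version
    winner = None
    for t in range(total_users):
        if t < explore_users:
            # Pure exploration: alternate users between A and B.
            arm = t % 2
        else:
            if winner is None:
                # The discrete jump: pick the better-looking arm once, for good.
                rates = [wins[i] / pulls[i] if pulls[i] else 0.0 for i in range(2)]
                winner = 0 if rates[0] >= rates[1] else 1
            # Pure exploitation: every remaining user goes to the winner.
            arm = winner
        pulls[arm] += 1
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
    return pulls, wins, winner

pulls, wins, winner = ab_test([0.05, 0.10], explore_users=1000, total_users=10000)
print(pulls, wins, winner)
```

However the data turns out, the losing version receives exactly half of the exploration traffic (500 users here), and the choice of winner is never revisited afterwards.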

Bandit algorithms address both of these problems:
1) They smoothly decrease the amount of exploring they do over time instead of requiring you to make a sudden jump.
2) They focus your resources during exploration on the better options instead of wasting time on the inferior options that are over-explored during typical A/B testing.
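
To make these two properties concrete, here is a minimal sketch of one well-known bandit algorithm, Thompson sampling with Beta priors. The post does not name a specific algorithm, so this choice is an assumption made for illustration, as are the function name `thompson_sampling` and the conversion rates used.

```python
import random

def thompson_sampling(true_rates, total_users, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling over several site versions.

    true_rates holds hypothetical conversion probabilities, one per version.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    for _ in range(total_users):
        # Draw a plausible conversion rate for each arm from its Beta posterior.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(n_arms)]
        # Send this user to the arm whose sampled rate is highest.
        arm = max(range(n_arms), key=lambda i: samples[i])
        pulls[arm] += 1
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls

pulls = thompson_sampling([0.1, 0.9], total_users=2000)
print(pulls)
```

There is no separate exploration phase here. Early on the posteriors are wide, so every arm gets tried; as evidence accumulates, an arm that looks clearly worse is sampled less and less often, and most traffic drifts toward the better one.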


