1 INTRODUCTION
Changes over time:
Ad format / ad content (campaigns, creatives, sites, etc.),
Impression delivery rates,
Narrow / broad audience targeting,
Geo targeting changes, etc.
Traditionally, online campaign effectiveness has been measured by clicks because someone who interacts with an ad was affected by it.
Counting only immediate responses, i.e., clicks, misses part of the value of a campaign.
Measuring long-term interest in the brand by counting visitors overstates campaign effectiveness, because some of those users would have visited anyway.
The change in online brand interest that can be attributed to the display ad campaign alone can be a better metric. Brand interest is measured by user searches for brand terms and navigations to brand sites that can be attributed to the online ad campaign.
Propensity scoring and doubly robust estimation methods are used to protect against residual and hidden selection bias. The test statistic for an outcome of interest to the advertiser is judged against a null distribution that is based on a set of outcomes that are irrelevant to the advertiser.
2 Causal Effects
The counterfactual is what would have happened had the campaign not been run. It is fundamental to the analysis of observational data found in the databases.
Y0: user outcome if not shown a campaign ad (control group)
Y1: user outcome if shown a campaign ad (test group)
Selection bias: more active web users are more likely to be exposed, and less active web users are more likely to be unexposed and hence potential controls, even if they satisfy the targeting conditions for the campaign.
In short, the ad serving process leads to selection bias: differences between the control and exposed groups that are related to the distribution of the outcomes (Y0,Y1).
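To make the selection-bias point concrete, here is a minimal sketch with a made-up population. The strata, user counts, and outcome rates are all hypothetical, and the campaign is assumed to have no effect at all (Y1 = Y0). The naive exposed-minus-control difference is still far from zero, only because active users are over-represented among the exposed.

```python
# Hypothetical toy population: (activity level, exposed?, user count, P(Y0 = 1)).
# The campaign is assumed to have no effect, so Y1 = Y0 for every user.
strata = [
    ("active",   True,  80, 0.5),
    ("active",   False, 20, 0.5),
    ("inactive", True,  20, 0.1),
    ("inactive", False, 80, 0.1),
]


def mean_outcome(rows):
    """Expected outcome rate over the given strata, weighted by user count."""
    total = sum(n for _, _, n, _ in rows)
    return sum(n * rate for _, _, n, rate in rows) / total


exposed = [row for row in strata if row[1]]
control = [row for row in strata if not row[1]]

# Naive comparison that ignores how users were selected for exposure.
naive_difference = mean_outcome(exposed) - mean_outcome(control)
print(round(naive_difference, 2))  # 0.24, although the true effect is 0
```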
3 The Controls
The first ad served to an exposed user breaks that user's timeline into before-exposure and after-exposure periods. Controls and exposed users are compared on features observed before their first exposure to the campaign.
Advertiser’s own campaign information
Start dt, end dt, etc.
Ad serving logs
Ranking, frequency cap, etc.
Demo info & Geo Info
Country, language
Historical activity before exposure
# of navigations
# of navigations to advertiser’s site before exposure
First exposure time
# of campaign ads served to the user during the analysis period
4 Propensity Scores
If a control and exposed user were identical before exposure, then it is reasonable to assume that they would have had the same probability of showing interest in the brand after exposure if the campaign had not been run.
P(Y0 =1 |X, exposed) = P(Y0=1 | X, control)
4.1 Propensity Matching
P(X) = P(exposed | X)
Matching on P(X) removes selection bias whenever matching on X itself removes selection bias.
Matching on a consistent estimate Phat(X) can remove selection bias better than matching on X or on the true P(X).
Controls and exposed with similar propensities can be grouped together and mean differences within groups averaged.
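The grouping idea can be sketched as follows. This is an illustration and not the paper's implementation: the function name, the equal-width propensity bins, and the choice to weight each bin by its number of exposed users are all assumptions made here.

```python
def stratified_effect(phat, exposed, outcome, n_bins=5):
    """Average the within-bin (exposed mean - control mean) differences.

    Users are grouped into equal-width bins of estimated propensity.
    Each bin's difference is weighted by its number of exposed users
    (an assumption). Bins that lack either group are skipped.
    """
    bins = {}
    for p, z, y in zip(phat, exposed, outcome):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        groups = bins.setdefault(b, {True: [], False: []})
        groups[bool(z)].append(y)

    weighted_sum = 0.0
    weight_total = 0.0
    for groups in bins.values():
        exp_y, ctl_y = groups[True], groups[False]
        if exp_y and ctl_y:
            diff = sum(exp_y) / len(exp_y) - sum(ctl_y) / len(ctl_y)
            weighted_sum += len(exp_y) * diff
            weight_total += len(exp_y)

    if weight_total == 0:
        raise ValueError("no bin contains both exposed and control users")
    return weighted_sum / weight_total
```

With phat = [0.1, 0.1, 0.9, 0.9], exposed = [1, 0, 1, 0], and outcome = [1, 0, 1, 1], the low-propensity bin contributes a difference of 1 and the high-propensity bin a difference of 0, so the function returns 0.5.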
4.2 Inverse Propensity Weighting
There is selection bias because a feature is correlated with both sample selection and the outcome, but re-weighting the data with weights inversely proportional to the probability of selection removes the bias.
When the goal is to estimate the campaign effect on exposed users:
Control: Wi = Phat(Xi)/(1-Phat(Xi))
Exposed: Wi = 1
When the goal is to estimate the potential campaign effect on all users:
Control: Wi = 1/(1-Phat(Xi))
Exposed: Wi = 1/Phat(Xi)
Delta(IPW)
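These notes do not record a formula for Delta(IPW). A common form, assumed here and not taken from the source, is the difference between the weighted mean outcome of the exposed and the weighted mean outcome of the controls. The sketch below implements that form for both weighting schemes; the function name and the `target` parameter are hypothetical.

```python
def ipw_effect(phat, exposed, outcome, target="exposed"):
    """Difference of weighted outcome means (exposed minus control).

    target="exposed": exposed weight 1, control weight p / (1 - p).
    target="all":     exposed weight 1 / p, control weight 1 / (1 - p).
    """
    if target not in ("exposed", "all"):
        raise ValueError("target must be 'exposed' or 'all'")

    # Per group: [sum of w * y, sum of w].
    sums = {True: [0.0, 0.0], False: [0.0, 0.0]}
    for p, z, y in zip(phat, exposed, outcome):
        if not 0.0 < p < 1.0:
            raise ValueError("propensities must be strictly between 0 and 1")
        if target == "exposed":
            w = 1.0 if z else p / (1.0 - p)
        else:
            w = 1.0 / p if z else 1.0 / (1.0 - p)
        sums[bool(z)][0] += w * y
        sums[bool(z)][1] += w

    if sums[True][1] == 0 or sums[False][1] == 0:
        raise ValueError("need at least one exposed and one control user")
    return sums[True][0] / sums[True][1] - sums[False][0] / sums[False][1]
```

Normalizing by the sum of weights within each group keeps both terms on the scale of an outcome mean, even when the estimated propensities are imperfect.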
5 Doubly Robust Estimation
The estimate Delta(DR) is doubly robust: it remains consistent if the outcome models are wrong but the propensity model is right, or if the propensity model is wrong but the outcome models are right. In those cases, however, there is no guarantee that Delta(DR) has minimum asymptotic variance.
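The double robustness property can be illustrated with the standard augmented IPW form for the effect on all users. This is a textbook estimator used here as an assumption; it is not necessarily the exact Delta(DR) of the paper. The arguments `m0` and `m1` are hypothetical names for each user's predicted outcome without and with exposure, produced by whatever outcome models are fitted.

```python
def doubly_robust_effect(phat, exposed, outcome, m0, m1):
    """Augmented IPW estimate of the average effect over all users.

    Each user's term is the outcome-model prediction m1 - m0 plus a
    propensity-weighted residual that corrects the outcome model
    for the arm that was actually observed.
    """
    terms = []
    for p, z, y, mu0, mu1 in zip(phat, exposed, outcome, m0, m1):
        if not 0.0 < p < 1.0:
            raise ValueError("propensities must be strictly between 0 and 1")
        if z:
            correction = (y - mu1) / p
        else:
            correction = -(y - mu0) / (1.0 - p)
        terms.append(mu1 - mu0 + correction)
    if not terms:
        raise ValueError("no users supplied")
    return sum(terms) / len(terms)
```

If the outcome models reproduce every observed outcome exactly, all residuals vanish and the estimate equals the mean of m1 - m0 whatever the propensities are. If the outcome models predict zero everywhere, the estimate reduces to an unnormalized inverse propensity weighted difference.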
6 Hypothesis Testing
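Under H0 (no campaign effect) and with a large sample, T(DR) = Delta(DR)/S(DR) approximately follows a standard normal(0,1), where S(DR) is the estimated standard error of Delta(DR).
A fully non-parametric test is obtained by rejecting H0 only if T(DR) is larger than the chosen upper quantile of the empirical null distribution, which is built from test statistics for outcomes that are irrelevant to the advertiser.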
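A minimal sketch of the non-parametric test, assuming the test statistics for the irrelevant outcomes have already been computed and collected in a list. The function name, the `alpha` parameter, and the order-statistic definition of the quantile are assumptions made for illustration.

```python
import math


def empirical_null_test(t_stat, null_stats, alpha=0.05):
    """Return (reject, threshold) for a one-sided empirical-null test.

    threshold is the smallest value in null_stats with at least a
    (1 - alpha) share of the null statistics at or below it.
    H0 is rejected only if t_stat is strictly larger than threshold.
    """
    if not null_stats:
        raise ValueError("null_stats must not be empty")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    ordered = sorted(null_stats)
    k = max(math.ceil((1.0 - alpha) * len(ordered)) - 1, 0)
    threshold = ordered[k]
    return t_stat > threshold, threshold
```

For example, with null statistics [3.0, 1.0, 2.0, 4.0] and alpha = 0.5, the threshold is 2.0, so a test statistic of 2.5 is rejected and 1.5 is not.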
7 Model Selections and Validation
The primary goal is to balance X across controls and exposed, in the sense that the distribution of X conditional on Phat(X) should be independent of the exposure indicator Z (exposed or control), even for features of X that are not included in the fitted propensity model.
ANOVA F-test
GAM model
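As a simpler stand-in for the ANOVA F-test and GAM checks listed above, balance can be inspected with a standardized mean difference of one feature within bins of estimated propensity. This is a swapped-in diagnostic and not the method of the paper; the function name and the equal-width binning are assumptions.

```python
import math
import statistics


def balance_by_bin(phat, exposed, feature, n_bins=5):
    """Standardized (exposed - control) mean difference of a feature per bin.

    Returns {bin index: standardized difference}. Values near 0 suggest
    the feature is balanced among users with similar propensity.
    Bins with fewer than two users in either group are skipped.
    """
    bins = {}
    for p, z, x in zip(phat, exposed, feature):
        b = min(int(p * n_bins), n_bins - 1)
        bins.setdefault(b, {True: [], False: []})[bool(z)].append(x)

    result = {}
    for b, groups in sorted(bins.items()):
        exp_x, ctl_x = groups[True], groups[False]
        if len(exp_x) < 2 or len(ctl_x) < 2:
            continue  # too few users to estimate a spread
        diff = statistics.mean(exp_x) - statistics.mean(ctl_x)
        pooled_sd = math.sqrt(
            (statistics.variance(exp_x) + statistics.variance(ctl_x)) / 2
        )
        if pooled_sd > 0:
            result[b] = diff / pooled_sd
        elif diff == 0:
            result[b] = 0.0
        else:
            result[b] = math.copysign(math.inf, diff)
    return result
```

Running this on a feature that was left out of the propensity model is one way to probe the stronger balance requirement stated above.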
8 Experiments
9 Discussions