Problem setup:
1 Ad exchange system: real-time bidding.
How much to bid? If we win, which ad to show?
2 Observe view-through conversion rate.
Group content, reduce dimensionality.
Quality control for incoming data.
Attribution question: even assuming independent effects, who gets credit if multiple targeters show ads?
Goal:
Evaluate every opportunity (a browser showing up as inventory, i.e., the URL the user is currently on, where we can try to show an ad), and, given a campaign, decide how many dollars to bid.
Procedure:
1 Depends on the deal type (CPM, CPA). Measure performance by conversion rate (CVR).
2 Data: 100M users across 100M URLs, 50K inventory buckets, 300 concurrent campaigns.
How to reduce dimensionality? 100M URLs with low base rates. Real-time scoring. Must be scalable. How to sample?
0:15:10 User churn rate is high: half of the users disappear within a day, similar to our results in the user churn analysis.
0:17:27 Three models:
Model 1: How to get signals from 100M URLs (dimensionality reduction)? ~20 URLs per user on average.
Unconditional brand process: what is Pr(conv)? Conversion probability independent of ads.
Take every single positive event; augment with URL visits to build the negative set, because we can't sample by user.
Naïve Bayes or sparse logistic regression to get scores.
Adding new, seemingly good data can cause problems. Timing is important: facts with 30 days of history and facts with 10 days of history contain different volumes and signals. Match them by sampling.
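The Model 1 scoring step above can be sketched as a Naive Bayes log-odds score over a user's URL visits. All counts, URL names, and the add-one smoothing are illustrative assumptions, not the actual system:

```python
import math

# Hypothetical per-URL visit counts among converters vs. sampled negatives.
pos_counts = {"news.example": 30, "shop.example": 80, "blog.example": 5}
neg_counts = {"news.example": 900, "shop.example": 200, "blog.example": 400}
n_pos, n_neg = 100, 10_000  # converters vs. augmented negative set

def nb_score(urls):
    """Naive Bayes log-odds score from a user's URL visits (add-one smoothing)."""
    score = math.log(n_pos / n_neg)  # prior log-odds
    for u in urls:
        p = (pos_counts.get(u, 0) + 1) / (n_pos + 2)
        q = (neg_counts.get(u, 0) + 1) / (n_neg + 2)
        score += math.log(p / q)  # per-URL log-likelihood ratio
    return score

print(nb_score(["shop.example", "news.example"]))
```

A user who visits URLs over-represented among converters gets a higher score; the same score can also feed a sparse logistic regression as a feature.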
Model 2: Conditional on showing an ad, what is Pr(conv|imp)?
Frequency × scores as inputs, ~20-dimensional variables.
Show ads to RON (run-of-network) users at random. Keep targeters out (i.e., exclude retargeting user sets).
Performance comparisons against RON.
Cluster URLs. Pre-filter URLs, then calculate correlations based on campaigns and scores. Works for small campaigns.
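The URL-clustering idea can be sketched by correlating per-campaign score vectors and greedily grouping similar URLs. The URL names, score values, and the 0.7 threshold are invented for illustration:

```python
# Hypothetical per-campaign conversion scores for each URL (3 campaigns here).
scores = {
    "sports.example":  [0.9, 0.1, 0.2],
    "scores.example":  [0.8, 0.2, 0.1],
    "recipes.example": [0.1, 0.9, 0.8],
}

def pearson(a, b):
    """Pearson correlation of two equal-length score vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

def cluster(urls, threshold=0.7):
    """Greedy pass: join a URL to the first cluster whose representative
    correlates above the threshold, else start a new cluster."""
    clusters = []
    for u in urls:
        for c in clusters:
            if pearson(scores[u], scores[c[0]]) > threshold:
                c.append(u)
                break
        else:
            clusters.append([u])
    return clusters

print(cluster(list(scores)))
```

Clustered URLs pool their sparse conversion counts, which is what makes this useful for small campaigns.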
Model 3: Bidding.
First, with no ads involved, reinforce belief in the match.
Second, current intention.
Third, certain sites indicate the cookie is likely to be deleted soon. Ads shown to those users won't earn credit: survival rate is low, so lower the probability of showing ads.
Fourth, some websites are invisible.
Fifth, perceptiveness: the gorilla experiment. When focused on a specific task, perception of everything else drops out entirely. http://www.theinvisiblegorilla.com/gorilla_experiment.html
0:47:48 The real-time bidding world uses the second-price rule (highest bidder wins, pays the second-highest bid price). Theory suggests truthfully bidding your true value.
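The second-price (Vickrey) rule itself is simple to state in code; the bidder names and amounts below are hypothetical:

```python
def second_price_auction(bids):
    """Highest bidder wins but pays the second-highest bid (Vickrey rule)."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1] if len(ranked) > 1 else ranked[0][1]
    return winner, price

print(second_price_auction({"A": 2.00, "B": 1.50, "C": 0.75}))  # → ('A', 1.5)
```

Because the winner's payment depends only on the others' bids, shading your own bid below true value can only lose auctions you would have profited from, which is why truthful bidding is the theoretical optimum.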
Experience: bid $2, because of pay volume; no true value exists. The account manager (AM) decides, adjusted by a baseline probability.
$2 CPM = $5 × (1 − 60% margin).
Locally adjust the preferred number by CVR; e.g., with good reason, bid twice as much.
Strategy 1: price set by AM.
Strategy 2: AM price × ratio, where ratio = Pr(conv|segment, inventory) / Pr(conv|segment); the ratio measures inventory value, separating good inventory from bad. Economically efficient model: pay as much as the inventory is worth to you.
Strategy 3: greedy bid. Only bid when it is good, but then bid a lot.
Conclusion: both strategies 2 and 3 achieved higher CVRs. Strategy 2 paid roughly the same price as strategy 1: don't pay more, but look better. Strategy 3 has a higher CPA.
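Strategy 2's lift ratio can be sketched from cell-level counts; the segment/inventory names and counts below are invented for illustration, not real data:

```python
# Hypothetical conversion and impression counts per (segment, inventory) cell.
conv = {("auto", "news"): 40, ("auto", "social"): 5}
imps = {("auto", "news"): 10_000, ("auto", "social"): 10_000}

def lift_ratio(segment, inventory):
    """Pr(conv | segment, inventory) / Pr(conv | segment)."""
    seg_conv = sum(v for (s, _), v in conv.items() if s == segment)
    seg_imps = sum(v for (s, _), v in imps.items() if s == segment)
    base_rate = seg_conv / seg_imps          # Pr(conv | segment)
    cell_rate = conv[(segment, inventory)] / imps[(segment, inventory)]
    return cell_rate / base_rate

def strategy2_bid(am_price, segment, inventory):
    """Strategy 2: scale the AM-set price by the inventory lift ratio."""
    return am_price * lift_ratio(segment, inventory)

print(strategy2_bid(2.00, "auto", "news"))    # bids above $2 on good inventory
print(strategy2_bid(2.00, "auto", "social"))  # bids below $2 on bad inventory
```

A ratio above 1 means the inventory converts better than the segment baseline, so the bid rises proportionally; below 1, it falls.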
Positive signals: site visits, clicks, purchases. Site visits almost always work better than clicks. Site visits beat purchases about 70% of the time, because of the bias-variance tradeoff (purchases are much rarer, so their estimates have higher variance).
Control trafficking by segment: Pr(conv|segment, inventory) / Pr(conv|segment).
0:55:19 The worst place to go for inventory is social media, e.g., Facebook.
Lift of about 3-5× over RON.
An ROC plot shows sensitivity (% of correctly classified positive observations) and specificity (% of correctly classified negative observations) as the output threshold moves over the range of all possible values. In the ROC context, the area under the curve (AUC) measures the performance of a classifier and is frequently used for method comparison.
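A minimal AUC computation via the rank-sum (Mann-Whitney) identity, assuming no tied scores; the labels and scores are a toy example:

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative
    (Mann-Whitney rank formulation; assumes no tied scores)."""
    pairs = sorted(zip(scores, labels))
    pos = sum(labels)
    neg = len(labels) - pos
    rank_sum = 0.0
    for rank, (_, y) in enumerate(pairs, start=1):
        if y == 1:
            rank_sum += rank
    return (rank_sum - pos * (pos + 1) / 2) / (pos * neg)

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

AUC of 0.5 is random ranking and 1.0 is a perfect separation, which is what makes it a convenient single number for comparing classifiers.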