Here’s the third installment of my series on Facebook ad demographic optimization. In the first post, we defined the problem of Facebook advertisers misspecifying demographics. In the second post, we examined the results of Facebook ad clusters derived from text mining. This installment picks up where we left off by showing a method for determining the optimal demographics within a homogeneous ad cluster. Once the overall distribution of responses has been charted for each demographic, we explore a way to determine whether a single ad shares the same response rate distributions. Then, and only then, can we determine whether an ad has been misspecified.
Once clusters of similar ads are identified, the relationship between the probability of a click and user demographics can be explored. Certain levels of the demographic predictors would exhibit a higher click probability than others; for example, men are more likely than women to click on "get six pack abs". Candidate variables for a click model would include, but not be limited to: location, age, sex, education, relationship and interested in. If the probability of a click by demographic can be modeled for each ad group, Facebook would be able to identify the target range for an ad group based on the number of clicks per day the advertiser wants to receive. If the advertiser’s budget were small, Facebook could serve the ad to the most targeted range. If the budget were large, Facebook would have to widen the range, serving the ad to progressively less targeted user groups.
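The budget-driven widening of the target range described above can be sketched as a greedy selection: rank demographic segments by modeled click probability and keep adding segments until the expected daily clicks reach the advertiser's goal. This is only a minimal illustration, not how Facebook actually serves ads. The `Segment` class, the `select_target_range` function, the segment names, the probabilities and the impression counts are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str               # hypothetical demographic cell label
    click_prob: float       # modeled click probability per impression (made up)
    daily_impressions: int  # assumed impressions available per day (made up)


def select_target_range(segments, clicks_wanted):
    """Add segments, most responsive first, until the expected number of
    daily clicks reaches the advertiser's goal."""
    chosen, expected = [], 0.0
    for seg in sorted(segments, key=lambda s: s.click_prob, reverse=True):
        if expected >= clicks_wanted:
            break
        chosen.append(seg.name)
        expected += seg.click_prob * seg.daily_impressions
    return chosen, expected


# Illustrative numbers only; they are not taken from any real ad cluster.
segments = [
    Segment("men 18-24", 0.020, 10_000),
    Segment("men 25-34", 0.015, 20_000),
    Segment("women 18-24", 0.005, 15_000),
    Segment("women 25-34", 0.004, 25_000),
]

small, small_clicks = select_target_range(segments, clicks_wanted=150)
large, large_clicks = select_target_range(segments, clicks_wanted=550)
print(small, small_clicks)
print(large, large_clicks)
```

With a small click goal, only the most responsive segment is selected. With a larger goal, the selection spills over into segments with lower click probabilities, which is the trade-off between budget and targeting described above.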
Logistic regression with a variable selection method can be employed to determine the significance and contribution of each demographic in predicting the probability of a click. The interested in variable may need to be dropped to prevent quasi-separation, because its frequency counts (a small number of LGBT responders) are insufficient at different levels of the other demographic predictors. Other variables, such as keywords and workplaces, have too many response levels to be considered in the model without binning or clustering the data. Additionally, location will need to be binned by region or clustered using census data to reduce the number of predictor levels. The empirical logit will need to be plotted against age to determine whether age should be entered as a linear, quadratic or cubic predictor.
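As a rough sketch of the age check mentioned above, the empirical logit can be computed per age bin and then plotted against age. The function name `empirical_logits`, the 10-year bin width, the 0.5 continuity correction and the toy records are all assumptions made for illustration; real click data would replace the made-up list.

```python
import math
from collections import defaultdict


def empirical_logits(records, bin_width=10):
    """Return {age_bin_start: empirical logit} from (age, clicked) pairs.

    Adds 0.5 to the click and non-click counts so that bins with zero
    clicks (or zero non-clicks) still give a finite value.
    """
    clicks = defaultdict(int)
    totals = defaultdict(int)
    for age, clicked in records:
        start = (age // bin_width) * bin_width
        totals[start] += 1
        clicks[start] += int(clicked)
    return {
        start: math.log(
            (clicks[start] + 0.5) / (totals[start] - clicks[start] + 0.5)
        )
        for start in sorted(totals)
    }


# Made-up records: (age, clicked). Not real Facebook data.
records = (
    [(22, True)] * 30 + [(22, False)] * 70
    + [(35, True)] * 20 + [(35, False)] * 80
    + [(47, True)] * 10 + [(47, False)] * 90
)

logits = empirical_logits(records)
for start, value in logits.items():
    print(f"ages {start}-{start + 9}: empirical logit = {value:.3f}")
```

If the plotted values fall roughly on a straight line, a linear age term is probably enough. One visible bend would suggest adding a quadratic term, and two bends a cubic term.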
Below is the equation for the proposed logistic model: