Facebook Ad Optimization: Ad Clustering

Posted by bkloss | facebook | Monday 4 May 2009 2:10 pm

This is the second post in my series on Facebook ad serving optimization. If you haven’t read the first post in the series, check it out for a fleshed-out version of the problem statement. To recap, we are looking at the problem of identifying Facebook ads that are misspecified, then giving marketers a gentle nudge to help them get the highest CTR. This post employs text mining to discover groups of similar ads. Such groups would give Facebook a basis of comparison for determining whether an ad is not optimally targeted.

Ad Text Clustering

The first step in correcting over-targeting is to identify groups of similar ads. Once these groups are identified, Facebook can amass demographic and response data into a large data set for later analysis.

Ads are composed of text and pictures describing a product or service. To evaluate the worth of ad clustering based on textual attributes, a sample of 47 distinct Facebook ads was copied from search result pages. SAS Enterprise Miner (EM) was used to create and describe ad clusters. Below is an outline of the text clustering process flow:

[Figure: upper-level overview of the text mining process]

Initial clustering was performed on scores extracted with Singular Value Decomposition (SVD). SVD is most easily thought of as a dimension reduction technique for text mining: it extracts a small set of scores for each document that capture as much as possible of the latent structure in how terms are used across documents. The process is analogous to what principal component analysis does for a set of predictor variables.
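To make the PCA analogy concrete, here is a minimal sketch of pulling SVD scores out of a tiny document-term matrix. It is not what Enterprise Miner does internally: it approximates the leading singular triplets with plain power iteration and deflation, and the four toy documents, their counts and the function names are all invented for illustration. A real pipeline would more likely call a library SVD routine.

```python
import math
import random

def matvec(A, v):
    """Multiply matrix A (a list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def leading_svd(A, k, iters=500, seed=0):
    """Approximate the k leading singular triplets (sigma, u, v) of A.

    u is indexed by document (row), v by term (column).
    """
    rng = random.Random(seed)
    R = [[float(x) for x in row] for row in A]  # residual, deflated as we go
    triplets = []
    for _ in range(k):
        Rt = transpose(R)
        v = [rng.random() + 0.1 for _ in R[0]]  # arbitrary non-zero start
        for _ in range(iters):
            w = matvec(Rt, matvec(R, v))        # power iteration on R^T R
            n = norm(w)
            if n == 0.0:
                break
            v = [x / n for x in w]
        u = matvec(R, v)
        sigma = norm(u)
        if sigma == 0.0:
            break                               # nothing left to extract
        u = [x / sigma for x in u]
        triplets.append((sigma, u, v))
        # Deflate: remove the component just found from the residual.
        R = [[R[i][j] - sigma * u[i] * v[j] for j in range(len(v))]
             for i in range(len(u))]
    return triplets

# Hypothetical counts. Rows are ads; columns are the terms
# credit, score, insurance, quote.
doc_term = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 1],
]

triplets = leading_svd(doc_term, k=2)
# Each ad's score on a component is sigma * u[ad]; signs are arbitrary.
scores = [[sigma * u[i] for sigma, u, _ in triplets]
          for i in range(len(doc_term))]
for i, row in enumerate(scores):
    print(i, [round(s, 3) for s in row])
```

In this toy matrix the two credit ads load only on the first component and the two insurance ads only on the second, so two scores per ad stand in for four term counts. That compression is what makes the subsequent clustering tractable on a real vocabulary.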

In the first round of clustering, only a few words (and, of, the, etc.) were excluded from the list of considered terms. All document terms minus the excluded terms constituted the start list. Only the terms in the start list were used to extract the SVD scores used to form the document clusters.
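The start-list idea can be sketched in a few lines. This is an illustration rather than EM's implementation: the two ad texts are made up, the tokenizer is a deliberately crude regular expression, and the function names are hypothetical.

```python
import re
from collections import Counter

# The excluded words named above; everything else goes on the start list.
STOP_TERMS = {"and", "of", "the"}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_start_list(docs, stop_terms):
    """All terms seen in the documents, minus the excluded terms."""
    vocab = set()
    for doc in docs:
        vocab.update(tokenize(doc))
    return sorted(vocab - set(stop_terms))

def term_counts(docs, start_list):
    """Document-term count matrix restricted to start-list terms."""
    allowed = set(start_list)
    matrix = []
    for doc in docs:
        counts = Counter(t for t in tokenize(doc) if t in allowed)
        matrix.append([counts[t] for t in start_list])
    return matrix

# Invented ad texts, for illustration only.
ads = [
    "Check the credit score and report",
    "Free quote of the day for car insurance",
]

start_list = build_start_list(ads, STOP_TERMS)
matrix = term_counts(ads, start_list)
print(start_list)
print(matrix)
```

The resulting matrix, with one row per ad and one column per start-list term, is the input from which SVD scores would be extracted.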

EM offers a number of tools to interact with and explore the resulting cluster sets. It is common to iterate on the clustering process by refining the start list through further term exclusion and the stemming of similar terms. Additional terms (free, most, etc.) were removed from the Facebook ad start list to further refine the ability of the algorithm to differentiate between ad groups. The resulting ad clusters are shown below with a few descriptive terms listed for each cluster:

[Figure: breakdown and description of the Facebook ad clusters]
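For readers without EM, one simple way to produce descriptive terms is to rank each cluster's terms by how often they occur in its member ads. This is an assumption on my part and not necessarily how EM picks its descriptive terms; the tokenized ads and cluster assignments below are invented.

```python
from collections import Counter, defaultdict

def descriptive_terms(docs_tokens, cluster_ids, top_n=3):
    """Return the top_n most frequent terms for each cluster.

    Ties are broken alphabetically so the output is deterministic.
    """
    totals = defaultdict(Counter)
    for tokens, cid in zip(docs_tokens, cluster_ids):
        totals[cid].update(tokens)
    described = {}
    for cid, counts in totals.items():
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        described[cid] = [term for term, _ in ranked[:top_n]]
    return described

# Hypothetical tokenized ads and the cluster each one was assigned to.
docs_tokens = [
    ["credit", "score", "report"],
    ["credit", "score", "check"],
    ["insurance", "quote", "car"],
    ["insurance", "quote", "home"],
]
cluster_ids = [2, 2, 7, 7]

terms = descriptive_terms(docs_tokens, cluster_ids)
for cid in sorted(terms):
    print(cid, terms[cid])
```

Raw frequency favors terms that are common everywhere, which is one reason the start list needs pruning before the descriptions become useful.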

Clustering Results

Clustering was performed on both ad body text and a combination of ad body and ad headline text.  The clusters resulting from only ad body text were more homogeneous and are described below.

As a whole, the final start list did a fairly good job of separating different ad groups. The words report, credit and score drive cluster two membership. This cluster contains every credit score ad as well as an ad for graduate school tests and continuing legal education. Cluster seven is represented by the terms quote, insurance and work. It contains both insurance ads from the sample as well as an ad for stock quotes and health and wellness. Cluster nine contains three out of four ads for home buying or improvement. Cluster five lacks SVD values and is composed of all unclassifiable ads.

If a larger sample were obtained, the purity of the ad clusters might improve. Additionally, with a larger sample, methods such as decision tree classification could be applied to the members of an ad cluster to increase the purity of the final ad segments. For example, if after text mining a cluster contained ads for home, car and life insurance, a decision tree could be trained to use a small set of categorical values (home, life, car, protection) as splitting criteria, resulting in smaller, purer segments.
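The insurance example can be sketched as follows. This is not a trained decision tree; it hard-codes the kind of keyword split a tree might learn, and it measures purity as the share of ads that belong to the majority category of their segment. The ad texts, category labels and function names are all invented for illustration.

```python
from collections import Counter, defaultdict

def purity(segment_ids, true_labels):
    """Fraction of ads that match the majority label of their segment."""
    by_segment = defaultdict(Counter)
    for seg, label in zip(segment_ids, true_labels):
        by_segment[seg][label] += 1
    majority = sum(max(counts.values()) for counts in by_segment.values())
    return majority / len(true_labels)

def split_on_keywords(texts, keywords, default="other"):
    """Assign each ad to the first keyword that appears in its text."""
    segments = []
    for text in texts:
        words = set(text.lower().split())
        segments.append(next((k for k in keywords if k in words), default))
    return segments

# Hypothetical members of a single "insurance" text cluster,
# each paired with a hand-assigned category.
ads = [
    ("protect your home with insurance", "home"),
    ("home insurance quote today", "home"),
    ("cheap car insurance quote", "car"),
    ("car insurance for new drivers", "car"),
    ("life insurance for your family", "life"),
    ("term life insurance quote", "life"),
]
texts = [text for text, _ in ads]
labels = [label for _, label in ads]

before = purity(["insurance"] * len(ads), labels)
segments = split_on_keywords(texts, ["home", "car", "life"])
after = purity(segments, labels)
print(round(before, 3), round(after, 3))
```

On real ads the split would be learned from labeled examples and would rarely be this clean, but the same purity measure could be used to compare segments before and after the refinement.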

These initial results, based on a small corpus, are encouraging and suggest that a more refined method with a larger sample could be quite effective at identifying ad groups within Facebook’s inventory. If pure clusters could be obtained, the next task would be to find the optimal demographic range for each ad group. Once the optimal range was known, it could be determined whether a single ad’s demographic specifications were in line with that range or were over-targeted.

In the next post, we look at a method for determining the optimal demographic ranges for Facebook’s ad groups.
