Facebook Ad Optimization: Ad Clustering

Posted by bkloss | facebook | Monday 4 May 2009 2:10 pm

This is the second post in my series on Facebook ad serving optimization. If you haven’t read the first post in the series, check it out for a fleshed-out version of the problem statement. To recap, we are looking at the problem of identifying Facebook ads that are misspecified, then giving marketers a gentle nudge to help them get the highest CTR. This post uses text mining to discover groups of similar ads, which would give Facebook a basis of comparison for determining whether an ad is not optimally targeted.

Ad Text Clustering

The first step in correcting over-targeting is to identify groups of similar ads. Once these groups are identified, Facebook can pool their demographic and response data into a large data set for later analysis.

Ads are composed of text and pictures describing a product or service. To evaluate the worth of clustering ads on their textual attributes, a sample of 47 distinct Facebook ads was copied from search result pages. SAS Enterprise Miner (EM) was used to create and describe the ad clusters. Below is an outline of the text clustering process flow:

[Figure: high-level overview of the text mining process]

Initial clustering was performed on scores produced by Singular Value Decomposition (SVD). SVD is most easily thought of as a dimension reduction technique for text mining. It extracts a small number of scores per document that capture as much as possible of the latent structure in how terms are used across documents. The process is analogous to what principal component analysis does for a set of predictor variables.
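To make the idea of SVD scores concrete, here is a minimal sketch in plain Python. It is not how EM computes its scores (I don't know what EM does internally); it uses power iteration with deflation, a simple textbook way to get the leading singular values. The toy matrix DOCS and all function names are made up for illustration.

```python
import math

def matvec(a, x):
    """Multiply matrix a (list of rows) by vector x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def norm(x):
    return math.sqrt(sum(v * v for v in x))

def top_singular_triplet(a, iterations=200):
    """Power iteration on A^T A; returns (sigma, u, v) for the largest singular value."""
    at = transpose(a)
    v = [1.0 / (i + 1) for i in range(len(a[0]))]  # fixed, deterministic start vector
    for _ in range(iterations):
        w = matvec(at, matvec(a, v))
        n = norm(w)
        if n == 0.0:
            break
        v = [x / n for x in w]
    av = matvec(a, v)
    sigma = norm(av)
    u = [x / sigma for x in av] if sigma else av
    return sigma, u, v

def svd_scores(a, k):
    """Return k SVD scores per document (u_i * sigma_i), found one at a time by deflation."""
    residual = [[float(x) for x in row] for row in a]
    scores = [[] for _ in a]
    for _ in range(k):
        sigma, u, v = top_singular_triplet(residual)
        for i, ui in enumerate(u):
            scores[i].append(ui * sigma)
        # Subtract the component just found so the next pass finds the next one.
        residual = [[rij - sigma * ui * vj for rij, vj in zip(row, v)]
                    for row, ui in zip(residual, u)]
    return scores

# Hypothetical term counts: rows are ads, columns are four start-list terms.
# The first two ads share terms with each other; the third shares none with them.
DOCS = [
    [2, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]

scores = svd_scores(DOCS, 2)
for row in scores:
    print([round(s, 3) for s in row])
```

In this toy case the first score separates the first two ads from the third, and the second score picks up the third ad. A clustering step would then group ads by these scores rather than by the raw term counts.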

In the first round of clustering, only a few words (and, of, the, etc.) were excluded from the list of considered terms. All document terms minus the excluded terms constituted the start list. Only the terms in the start list were used to extract the SVD scores that formed the document clusters.
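The start list idea is easy to sketch outside of EM. The snippet below is only an illustration in plain Python: the ad texts are invented (they are not the sampled ads), the tokenizer is a crude assumption, and the function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical ad bodies; not the ads sampled for this post.
ADS = [
    "Check your credit score and get the free report",
    "Get an insurance quote for the home",
    "See your credit report online",
]

# Terms excluded in the first round, as named above.
EXCLUDED = {"and", "of", "the"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_start_list(docs, excluded):
    """Return the sorted set of all document terms minus the excluded terms."""
    terms = set()
    for doc in docs:
        terms.update(tokenize(doc))
    return sorted(terms - set(excluded))

def term_document_counts(docs, start_list):
    """Return one row per document with a count for each start-list term."""
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[term] for term in start_list])
    return rows

start_list = build_start_list(ADS, EXCLUDED)
matrix = term_document_counts(ADS, start_list)
print(start_list)
print(matrix)
```

Refining the start list in later rounds amounts to adding more terms to EXCLUDED and rebuilding the matrix before extracting scores again.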

EM offers a number of tools to interact with and explore the resulting cluster sets. It is common to iterate on the clustering process by refining the start list through further term exclusion and the stemming of similar terms. Additional terms (free, most, etc.) were removed from the Facebook ad start list to sharpen the algorithm's ability to differentiate between ad groups. The resulting ad clusters are shown below with a few descriptive terms listed for each cluster:

[Figure: breakdown and description of the Facebook ad clusters]

Clustering Results

Clustering was performed on both ad body text and a combination of ad body and ad headline text.  The clusters resulting from only ad body text were more homogeneous and are described below.

As a whole, the final start list did a fairly good job of separating different ad groups. The words report, credit and score drive membership in cluster two. This cluster contains every credit score ad as well as an ad for graduate school tests and continuing legal education. Cluster seven is represented by the terms quote, insurance and work. It contains both insurance ads from the sample as well as an ad for stock quotes and health and wellness. Cluster nine contains three out of four ads for home buying or improvement. Cluster five lacks SVD values and is composed of all unclassifiable ads.

If a larger sample were obtained, the purity of the ad clusters might improve. Additionally, with a larger sample, methods such as decision tree classification could be applied to the members of an ad cluster to increase the purity of the final ad segments. For example, if after text mining a cluster contained ads for home, car and life insurance, a decision tree could be trained to use a small set of categorical values (home, life, car, protection) as the splitting criteria, resulting in smaller, purer segments.
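Purity is used informally above. One common way to put a number on it, assuming each ad has been hand-labeled with a category, is the share of ads in a cluster that belong to its most frequent category. The sketch below uses invented cluster names and labels, not the actual results from this sample.

```python
from collections import Counter

def cluster_purity(labels_by_cluster):
    """Return (per-cluster purity, overall purity).

    Per-cluster purity is the share of ads in the cluster's most common category.
    Overall purity is the total of those majority counts over all ads.
    """
    per_cluster = {}
    majority_total = 0
    ad_total = 0
    for name, labels in labels_by_cluster.items():
        majority = Counter(labels).most_common(1)[0][1]
        per_cluster[name] = majority / len(labels)
        majority_total += majority
        ad_total += len(labels)
    return per_cluster, majority_total / ad_total

# Hypothetical hand labels for two made-up clusters.
clusters = {
    "A": ["credit", "credit", "credit", "education"],
    "B": ["insurance", "insurance", "stocks", "health"],
}

per_cluster, overall = cluster_purity(clusters)
print(per_cluster)  # {'A': 0.75, 'B': 0.5}
print(overall)      # 0.625
```

A measure like this would make it possible to compare start lists, or to check whether a decision tree step really yields purer segments.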

These initial results, based on a small corpus, are encouraging and suggest that a more refined method with a larger sample could be quite effective at identifying ad groups within Facebook’s inventory. If pure clusters could be obtained, the next task would be to find the optimal demographic range for each ad group. Once the optimal range was known, it could be determined whether a single ad’s demographic specifications were in line with that range or were over-targeted.

In the next post, we look at a method for determining the optimal demographic ranges for Facebook’s ad groups.
