Monday, September 03, 2012
I’ve just ended my six-month tour as VP Optimization at LeftBrain DGA, and am now returning full time to my usual consulting, writing, and general shenanigans. It was fun to work again as a hands-on marketer. Here are some insights based on the experience.
- lots of content. We all know that content is king, but sometimes forget the king has a voracious appetite. A serious demand generation program might move contacts through half a dozen stages with several levels within each stage and several messages within each level. This could easily come to forty or fifty messages, each offering a different downloadable asset. The numbers go even higher when you start to create separate streams for different personas. Building these materials is a major undertaking, first to understand what’s appropriate and then to create it. But deploying the initial content is just the start: you then have to monitor performance, test alternatives, and periodically refresh the whole stream. Finding efficient ways to do this is critical to keeping costs and schedules within reason. (Note that I’m talking here about email programs to nurture known contacts, not acquisition programs to attract new names. That takes another massive content collection.)
- content isn’t everything. It’s an old saw among direct marketers that the list determines most of your response rate and the offer controls most of the rest. Actual creative execution (copy, graphics, format, etc.) accounts for maybe 10% of the result. We proved this repeatedly with tests that used different content at the same stage in the campaign flow: basically, results were similar even with content originally designed for different purposes. Conversely, the same piece of content had hugely different results at different places in the flow. What this meant in both cases was that response was primarily driven by the people at each stage, not by the specifics of the materials presented.
- simplicity helps. That results are primarily driven by audience doesn’t mean that content doesn’t matter. We did a fascinating (to me, at least) analysis of 100 emails, logging specific features such as number of words and readability scores and then comparing these against open, click-through, and form submit rates. A clear pattern emerged: simpler emails (shorter, fewer graphics, easier to read) performed better. In fact, the pattern was so clear that there's a danger of over-reaction: at some point, a message can be too short to be effective (think of the mayor in The Simpsons, who just repeats “Vote for Me”). So the real trick is to find an optimal length, and even then to recognize that some messages truly need to be longer than others.
- simplicity isn’t everything, either. We did a lot of testing – it was my favorite part of my job – but the content tests were often inconclusive: sometimes shorter won, sometimes longer won, most often the difference was too small to matter. Given that we were starting with competently-created materials, that’s not too surprising. On the other hand, we consistently found that forms with fewer questions yielded better results, typically by a ratio of 3:1. This is one example of a non-content item with major impact; another was contact frequency (more is better, but, as with simplicity, only up to a point). There were other aspects of program structure that I would have tested had time and resources permitted; the goal was to focus on variables with the potential for a substantial impact on over-all results. This generally meant moving beyond individual content tests to items with larger and more global impact.
- test themes, not details. Don’t misinterpret that last sentence: I’m not against content tests. What I'm against is tests that only teach one small, random lesson, such as whether subject line A is better than subject line B. The way to build more powerful tests is to build them around a hypothesis and then try several simultaneous changes that support or refute that hypothesis. (I’ve shamelessly stolen this insight from Marketing Experiments, whose methodology I hugely admire and highly recommend.) So, if you think simplicity is an issue, create one test with a shorter subject line and less copy and fewer graphics and a simpler call to action, and run that against your control. This is exactly the opposite of the conventional testing advice to change just one thing at a time. That approach made sense back in the days of direct mail when you were running a handful of versions per year, but isn’t an option in the content-intensive environment of modern online marketing. And even if you had the resources to run a gazillion separate tests, you’d still need to see larger patterns to guide your future content creation.
- multivariate tests work. As if the infinite number of potential tests were not enough of a challenge, most B2B marketers also have relatively small program quantities to work with. We multiplied our test volume by applying multivariate test designs, which let us use the same contacts in several different test cells simultaneously. This probably needs a post of its own, but here's a quick example: Let’s say you need 10,000 names per test cell and have 20,000 names total. Traditionally, you could just run one test comparing two choices. But with a multivariate design, you’d create four cells of 5,000 each. Cells 1 and 2 would get the first version of the first test, while cells 3 and 4 would get the second version. But – and here’s the magic – cells 1 and 3 would also get the first version of the second test, while cells 2 and 4 would get the second version of the second test. Thus, each version of each test still reaches the required 10,000 names, and you can still see the impact of each test separately. (Here’s a random article that seems to do a good job of explaining this more fully.) We generally limited ourselves to two or three tests at a time. More complicated structures are possible, but I was always concerned about keeping execution relatively simple since we were doing all our splitting manually.
- metrics matter. As it happens, most of the programs we executed relied heavily on form submissions to move people to the next stage. This meant that form fills were the key success metric, not opens or click-throughs. Although these generally correlate with each other, the relationship is weaker than you might expect. Some exceptions were due to obvious factors such as differences in form length, but the reasons for others were unknown. (I often suspected but could never prove reporting or data capture issues.) Of course, most email marketers are used to looking at open and click rates, so it took some gentle reminding to keep everyone focused on the form fill statistics. The good news is we prevented some pretty serious mistakes by using the right measure. Note that form fills are especially important in acquisition programs: responders are lost altogether if they don't complete a form that lets you add them to your database.
- test results need selling. As you’ve probably guessed by now, I spent much of my time lovingly crafting our tests and analyzing the results. But others were not so engaged: more than once, I was asked what we found in a test whose results I had published weeks before. This wasn’t a complete surprise, since other people had many other items on their mind. But we did eventually conclude that simply publishing the results was not enough, and started to go through the results in person during weekly and monthly status meetings. We also found that reviewing individual results was not enough; when we found larger patterns worth reporting, we had to present them explicitly as well. Again, there’s no surprise in this, but it does bear directly on expectations that managers will find important data if reporting systems simply make it available. Most will not: the systems have to go beyond reporting to highlight what’s new, what it means, why it matters, and what to do next. Although some parts of that analysis can be automated, most of it still relies on skilled human effort.
- reports need context. Reporting was another of my responsibilities, and we made great strides in delivering clearer and more actionable data to our clients. One of the things I already knew, but was reminded really matters, was the importance of putting data in context. It wasn’t enough just to show cumulative quantities or conversion statistics; we needed to compare this data with previous results, targets, and other programs to give a sense of what it meant. To take one example, we reported the winner of a series of email package tests, without realizing until late in the analysis that the response rate for the test as a whole was much lower than previous results. This was a more important issue than the tests themselves. We had other instances where entire waves were missing from reports; we only uncovered this because someone noticed they were missing, whereas a proper comparison against plan would have highlighted it automatically. Again, such comparisons are widely acknowledged as a best practice: my point here is they have immediate practical value, so they shouldn't just be relegated to the list of “nice but not necessary” things that no one ever quite gets around to doing.
- survival is more important than conversion. That phrase has a vaguely religious ring to it, and I suppose it’s also true in a theological sense. But right now I’m talking about reporting of survival rates (how many people who enter a nurture program actually end up as customers) vs. conversion rates (how many people move from one program stage to the next). Marketers tend to focus on conversion rates, and of course it’s true that the survival rate is mathematically the product of the individual conversion rates. But we repeatedly saw changes in program structure or even individual treatments that caused large swings in a single conversion rate, which was often balanced by opposite changes in the following stage. Looking at conversion rates in isolation, it was hard to see those patterns. This was an even bigger problem when each rate was calculated cumulatively, so the impact of a specific change was masked by being merged into a larger average. More important, even when there was an obviously related change in two successive rates, the net combined impact wasn’t self-evident. This is where survival rates come in, since they directly report the cumulative result of all preceding stages. Of course, conversion rates and survival rates are both useful: I'm arguing you need to report them both, not just conversion rates alone.
- throughput matters. Survival and conversion rates show the shape of the funnel, but not the dimension of time. We did report how long it took contacts to move through our programs – in fact, a sophisticated and detailed approach was in place before I arrived – but the information was largely ignored. That was a pity, because it contained some important insights about contact behaviors, opportunities for improvement, and results of particular tests. A greater focus on comparing expected vs. actual results would have helped, since calculating the expectations would have probably required a closer focus on how long it took leads to move through the funnel.
- acceleration is hard. A greater focus on timing would have also forced a harder look at the fundamental premise of many B2B campaigns, which is that they can speed movement of prospects through the sales funnel. The more I think about this, the more doubts I have: B2B purchases move according to their own internal rhythms, driven by things like budget cycles, contract expirations, and management changes. Nurture programs can educate potential buyers and build a favorable attitude towards the seller, thereby increasing the likelihood of making a sale once the buyer is ready. They can also track, through lead scoring, when a buyer seems ready to act and is thus ripe for contact by sales. That’s all good and valuable and should more than justify the nurture program’s existence. But expectations of acceleration are dangerous because they may not be met, and could unfairly make a successful program look like a failure.
- drip needs attention. Like that leaky faucet you never quite get around to fixing, drip programs often don't get the attention they deserve. In practice, the vast majority of people who enter a nurture program will not move quickly to the purchase stage; most will stall somewhere along the way. This is where the drip program must work hard to keep them engaged. Again, every marketer knows this, but it’s easy to focus attention on the fascinating and complicated stage progressions (remember all that content?) and relegate the drip campaigns to a simple newsletter. Big mistake. Put as much effort into segmenting your drip communications and encouraging response as you put into stage conversions. If you want a practical reason for this, look at your mail quantities: chances are, you’re actually sending more drip emails than all your active stages combined.
- proving value is the ultimate challenge. It’s relatively easy to track contacts as they move through the marketing funnel, but it’s much harder to connect them to actual revenue in the sales or accounting systems. I whined about this at length in June, so I won’t repeat the discussion. Suffice it to say that some sort of revenue measurement, however imperfect, is necessary for your testing, reporting, and program execution to be complete.
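For readers who want to see the multivariate split from the "multivariate tests work" item in concrete form, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function name `assign_cells`, the version labels "A" and "B", and the made-up contact IDs are all hypothetical, and the splitting described in the post was done manually, not with code like this.

```python
import random
from collections import Counter

def assign_cells(contacts, seed=0):
    """Randomly split contacts into four equal cells for two simultaneous tests.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = list(contacts)
    rng.shuffle(shuffled)

    # Cell layout from the example in the post:
    # cells 1 and 2 get version A of test 1, cells 3 and 4 get version B;
    # cells 1 and 3 get version A of test 2, cells 2 and 4 get version B.
    layout = {1: ("A", "A"), 2: ("A", "B"), 3: ("B", "A"), 4: ("B", "B")}

    assignments = {}
    for i, contact in enumerate(shuffled):
        cell = i % 4 + 1           # deal contacts round-robin into cells 1..4
        test1, test2 = layout[cell]
        assignments[contact] = {"cell": cell, "test1": test1, "test2": test2}
    return assignments

# 20,000 made-up contact IDs, matching the quantities in the example.
contacts = [f"contact_{n}" for n in range(20_000)]
assignments = assign_cells(contacts)

cell_sizes = Counter(a["cell"] for a in assignments.values())
test1_sizes = Counter(a["test1"] for a in assignments.values())
test2_sizes = Counter(a["test2"] for a in assignments.values())

print("per cell:", dict(cell_sizes))
print("test 1 versions:", dict(test1_sizes))
print("test 2 versions:", dict(test2_sizes))
```

Each cell holds 5,000 contacts, yet each version of each test is seen by 10,000, because every contact counts toward both tests at once. To read the results, compare cells 1+2 against 3+4 for the first test and cells 1+3 against 2+4 for the second.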
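The distinction between survival and conversion rates in the "survival is more important than conversion" item is easy to show with arithmetic. The sketch below is a rough Python illustration; the stage counts are invented purely for the example (they are not results from any program I worked on), and the function names are hypothetical.

```python
def conversion_rates(stage_counts):
    """Share of contacts moving from each stage to the next."""
    return [nxt / cur for cur, nxt in zip(stage_counts, stage_counts[1:])]

def survival_rates(stage_counts):
    """Share of the original entrants still present at each later stage."""
    entered = stage_counts[0]
    return [count / entered for count in stage_counts[1:]]

# Invented counts of contacts at three successive stages.
# In "after", a change doubles the first conversion rate,
# but the second conversion rate falls by half.
before = [10_000, 2_000, 400]
after = [10_000, 4_000, 400]

for label, counts in (("before", before), ("after", after)):
    print(label,
          "conversion:", conversion_rates(counts),
          "survival:", survival_rates(counts))
```

In this made-up case the first conversion rate doubles and the second halves, which looks dramatic stage by stage, while the final survival rate is the same in both scenarios. Reporting both sets of numbers side by side makes that offsetting pattern visible.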
Whew, it’s good to have all that out of my system. As I said at the beginning, I did enjoy my little visit to the marketing trenches. Now, it’s goodbye to that world and hello to what’s next.