How a structured e-commerce test plan leads to rapid and steady gains

If you were Jeff Bezos, CEO of Amazon, how would you structure your testing and experimentation process to spur growth?

Let's look at what Bezos says about experimentation (emphasis mine):

"One area where I think we are especially distinctive is failure. I believe we are the best place in the world to fail (we have plenty of practice!), and failure and invention are inseparable twins. To invent you have to experiment, and if you know in advance that it's going to work, it's not an experiment. Most large organizations embrace the idea of invention, but are not willing to suffer the string of failed experiments necessary to get there.

Outsized returns often come from betting against conventional wisdom, and conventional wisdom is usually right. Given a ten percent chance of a 100 times payoff, you should take that bet every time. But you're still going to be wrong nine times out of ten. We all know that if you swing for the fences, you're going to strike out a lot, but you're also going to hit some home runs. The difference between baseball and business, however, is that baseball has a truncated outcome distribution. When you swing, no matter how well you connect with the ball, the most runs you can get is four. In business, every once in a while, when you step up to the plate, you can score 1,000 runs. This long-tailed distribution of returns is why it's important to be bold. Big winners pay for so many experiments."

As CEO of what is, if not the world's first, certainly its largest and most successful e-commerce company (one now involved in industries far beyond retail), Bezos makes the case for a testing culture in any e-commerce environment.

In this article, we'll see how you can structure your internal e-commerce CRO program and create a test plan that evolves with your organization.

You may not be Amazon … but why not swing for the fences?

Plan to fail (and learn from it)

The conversion rate optimization process, or CRO, aims to make e-commerce businesses more profitable by increasing the proportion of buyers compared to the total number of visitors.

A structured process (covering research and hypothesis creation, the tests themselves, and the prioritization and documentation of those tests) is crucial for creating a test culture that produces long-term, sustainable results.

In most of these stages, the need for a plan is obvious. But most people do not plan a test phase. In fact, tests are often considered an end in themselves.

However, testing is only the culmination of the whole process behind it. Its ultimate goal is to increase revenues.

Just as it is not possible to formulate and create tests without prior research, it is not possible to run tests well without planning. Moving from an individual test or a short sequence of tests to a full-scale, always-on testing program is what separates a one-off CRO sprint from a thoughtful, deliberate CRO program.

Guess which approach is best for establishing a test culture that allows companies to grow while absorbing their mistakes?

Accepting mistakes and failures as part of growth means embracing the main components of any learning process. Each experiment, whether successful or not, is a learning opportunity for you and your organization. Applying and integrating the knowledge gained from your tests is one of the main tasks of an effective CRO testing program.

Just a few reasons why you should structure and document your test program …

  • Testing all aspects of your website lets you question your earlier assumptions and ground new ones in data instead of opinions or wild guesses.
  • Experimentation lets you estimate the results of each improvement in near real time, without having to wait until the end of the quarter to see an improvement (or the lack of one).
  • By applying a deliberate structure to the test process, you facilitate tracking, learning, and repetition.

All this makes conversion optimization testing a decisive factor for any company with growth ambitions. One of the most effective ways to set yourself up for success with CRO is to establish a continuous process within your organization, run by a dedicated team.

This forces you to treat CRO not as an à la carte service provided by an agency, but as an opportunity to institutionalize and embrace the CRO process. And that requires learning to conduct tests yourself.

Why is a test program a necessity?

Note: If you want to test one hypothesis at a time, you can go ahead and skip this section.

Why? If you run one test at a time, your test plan and program will be identical to the hypothesis prioritization list (discussed later). There is just one small problem that may bother you: the time it takes to put all your hypotheses to the test.

If you choose to take the route of one test at a time, be prepared for a long ride. In the best case, if you have 25 hypotheses to test, you are looking at roughly two years of testing. Why would it take two years? The recommended practice is to run each experiment for at least one month (or until the test reaches significance and/or covers a few purchase cycles) to ensure valid test results.
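The arithmetic behind that estimate is easy to check. The snippet below is a minimal sketch, assuming strictly one test at a time and one month per test; the function name is made up for illustration.

```python
def sequential_duration_months(num_hypotheses: int, months_per_test: float = 1.0) -> float:
    """Total calendar time when tests run strictly one after another."""
    return num_hypotheses * months_per_test


months = sequential_duration_months(25)  # 25 hypotheses, one month each
years = months / 12
print(f"{months:.0f} months of testing, roughly {years:.1f} years")
# prints: 25 months of testing, roughly 2.1 years
```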

"Significance" is a statistical concept that lets you conclude that the result of an experiment was most likely caused by the changes made in the variation rather than by random chance. This is the key to ensuring that tests are truly valid and that their results are durable and reproducible.

Alex Birkett, content editor for Conversion XL, explains the concept of significance in a little more depth:

"What we care about is the representativeness of our sample. How can we do that in basic terms? Your test should run for two business cycles, so that it includes everything external that is going on:

– Every day of the week (and tested one week at a time because your daily traffic can vary a lot)

– Different sources of traffic (unless you want to customize the experience for a dedicated source)

– Your blog post and newsletter schedule

– People who visited your site, thought about it, then returned 10 days later to buy [your product]

– Any external event that may affect purchasing (e.g. payday)"

The one-month rule above applies to most websites. Those with exceptionally high traffic (in the millions of unique visits) will likely be able to reach significant results in shorter time frames. Even so, to rule out external influences, it is best to let tests run for at least a week or two.

Suppose you have 37 different hypotheses to test. Your ideal would probably be to build all 37 tests and run them at the same time, instead of going through the test process one by one.

Unfortunately, this is not possible either, for a different reason. Sometimes the experiments themselves conflict with each other, limiting their usefulness or even invalidating each other's results.

Since none of us wants to be old and gray by the time our conversion optimization efforts pay off, we need an alternative. This is where the concept of test velocity comes in. Test velocity is the number of tests you run in a given period, such as a month. It is one measure of the effectiveness of a test program: the higher the velocity you reach, the faster your program will grow your revenue. Provided, of course, that you do everything right.

Here is the simplified process of creating a test program

The building blocks of your test program

The main elements that will determine the dynamics of your test program are:

  1. Volume of traffic
  2. Interdependence of tests
  3. Capacity to handle the simultaneous design and implementation of several tests (operational constraint)

Let's quickly see what each of these elements means.

Traffic volume

The volume of traffic is an obvious barrier because your website traffic will not only influence the types of tests you can run, but also the number of concurrent tests and pages that will attract enough traffic to support the tests.

Traffic volume is the reason for prioritizing the tests that have the greatest projected effect. Tests with a higher expected lift need a much smaller sample size (and therefore less traffic) to reach statistical significance.

In practice, this means that if we expect a test to lift conversions by, say, more than 25%, we will need fewer observations to confirm that expectation than if we expected a 10% lift. This is a consequence of using a t-test as the statistical engine for experiments: the smaller the effect of a change, the larger the sample must be to separate it from random variation and reach statistical significance and confidence.
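To get a feel for how strongly the expected lift drives the required sample, here is a rough sketch. It uses the standard normal-approximation formula for comparing two conversion rates rather than a t-test, and the 3% baseline conversion rate, 5% significance level and 80% power are illustrative assumptions, not figures from this article.

```python
import math
from statistics import NormalDist


def sample_size_per_variation(baseline_rate: float, relative_lift: float,
                              alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed in EACH variation (two-sided test)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)       # expected rate in the variation
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Assumed 3% baseline conversion rate (hypothetical).
n_small_lift = sample_size_per_variation(0.03, 0.10)  # expecting a 10% lift
n_large_lift = sample_size_per_variation(0.03, 0.25)  # expecting a 25% lift
print(f"10% lift: {n_small_lift} visitors per variation")
print(f"25% lift: {n_large_lift} visitors per variation")
```

Under these assumptions the 10% lift needs several times more visitors per variation than the 25% lift, which is why high-impact hypotheses are the natural candidates for lower-traffic pages.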

Interdependence of tests

Whether you can run experiments simultaneously depends on how much each experiment depends on the others. What does that mean?

The basic principle is that we want to test a new page treatment on the maximum number of visitors. If you set up an experiment that filters people out of a subsequent experiment, you violate this basic premise.

If your visitors are split 50/50 on an initial page in a way that keeps half of them from ever reaching the test on the next page, the result of that second test will not be valid.

For example, say you want to improve your funnel. You create experimental treatments (variations) that run on two different steps of the funnel. This can mean that visitors who see the treatment on one page never see the other, because the outcome of the first experiment has influenced how many people reach the second.

Your sample will automatically be reduced by 50%, which means that the test will have to run twice as long to reach significance.
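The effect on test duration can be sketched in a few lines. The required sample and daily traffic below are made-up numbers, and `share_exposed` is a hypothetical parameter for the fraction of visitors who actually reach the experiment.

```python
import math


def days_to_reach_sample(required_sample: int, daily_visitors: int,
                         share_exposed: float = 1.0) -> int:
    """Days needed to collect the required sample at a given exposure share."""
    return math.ceil(required_sample / (daily_visitors * share_exposed))


full_exposure = days_to_reach_sample(20_000, 1_000)       # every visitor is eligible
half_exposure = days_to_reach_sample(20_000, 1_000, 0.5)  # an upstream test filters out half
print(full_exposure, half_exposure)  # prints: 20 40
```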

The simultaneous execution of experiments can cause problems of interdependence

To avoid this problem, consider the risk of interdependence before creating an experiment, and run interdependent experiments separately. You can sometimes solve the problem with multivariate (MVT) tests, but your traffic volume may rule that out. In addition, too many variants in an MVT may invalidate the results of the experiment.

Operational Capacity – How many tests can you design and run actively?

In an ideal world, we would test all the hypotheses we have created as soon as the research is complete!

However, creating and running an experiment is hard work. It takes the efforts of many people to create a viable, functional test. Even once the research results are in and you have formulated your hypothesis, the experiment does not simply spring into existence.

Doing an experiment requires preparation. At a minimum, you must:

  1. Sketch an updated visual design, which you will use to create a high-fidelity wireframe or mockup
  2. Create the actual design based on the mockup
  3. Code the design and copy changes
  4. Run a quality assurance check and a dry run before launching the test

All of this requires time and effort from a team of people, and some steps cannot even begin before the previous ones are finished. This is your operational limitation.

You can overcome operational limitations by hiring more people or limiting the number of tests you run.

Adjusting tests for external influences

Although it would be great if each experiment took place in a vacuum, that is not the case. Website experiments run for conversion optimization purposes will never enjoy the controlled environment of scientific experiments, where the experimenter can control every influence besides the one that is intentionally changed.

However, we can at least account for obvious or expected influences, such as holidays that affect our customers' shopping habits or other predictable events that may change buyer behavior. By taking these factors into account when developing your plan, you can adjust the schedule and run experiments at times when the risk of external influence is lower.

Even more benefits to creating a test plan

Having a test plan not only makes your CRO process faster and more efficient, but it also presents a number of important additional benefits.

Let's start with the biggest benefit in the long run. A test plan structures and standardizes your approach, making it reproducible and predictable.

An active, structured test process with no expiry date essentially creates a positive feedback loop, so even when your test plan comes to an end you will feel encouraged to take on new challenges and perform more tests.

In the long run, this leads to an authentic culture of testing within your organization.

A structured process also helps you evaluate results better. At the end of each phase, you can review the results, update your expectations for the next phase, or adjust the experiments that failed in the previous phase. In effect, you "learn as you go".

Finally, a simple, clear test plan allows for better reporting and makes a more convincing argument for conversion optimization as an organizational necessity. If you can report progress in monthly increments, with results clearly attributed to the experiments (which were built on hypotheses, which were derived from the research), you are much more likely to get organizational support for your CRO program.

A test plan creates clear milestones and allows the research team to track progress accurately, plan future activities, and eliminate potential bottlenecks in the deployment and implementation of experiments. This keeps the risk of the test process getting out of control to a minimum and makes the role of each team member clear.

How to structure your test plan

We have just explored why you need to establish a test plan before the testing itself. Call it step zero, if you like. Now let's talk about the nuts and bolts of creating this plan.

First, determine the type of test (A/B, MVT or bandit test) that you will run. The type of test determines how much traffic you need, as well as the development effort needed to deploy the experiments.

Next, carefully estimate the interdependence of your tests and adjust your priority list if tests conflict with each other.

Finally, to determine the number of experiments you can run, estimate how many you can effectively support with the available staff. Bear in mind that you need researchers to formulate hypotheses, and designers and front-end developers to create variants and configure the experiment itself. Since each of these groups will have a number of tasks to perform, make sure that you only run as many tests as your staff can handle.

To do that, start by going through your list of hypotheses. If you have prioritized them accurately, including the effort required to deploy each one, you will already have many of the inputs for your test plan.

In the end, your test plan should take the form of Gantt charts, which are very useful for indicating the period of time for each test phase.

A test program is generally presented in the form of a Gantt chart

A "test phase" contains all the tests that can be run simultaneously. For example, if you find that you can run four tests simultaneously and you have 22 tests to run based on your hypotheses, you will need six test phases: five full phases and a sixth for the remaining two tests.
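Grouping tests into phases can be sketched as a simple greedy pass over the prioritized list. This is only an illustration, not a prescribed method: the test names, the conflict pairs and the concurrency limit of two are all hypothetical.

```python
from typing import FrozenSet, List, Set


def build_phases(tests: List[str], conflicts: Set[FrozenSet[str]],
                 max_concurrent: int) -> List[List[str]]:
    """Place each test (in priority order) into the first phase that has room
    and contains no test it conflicts with; otherwise open a new phase."""
    phases: List[List[str]] = []
    for test in tests:
        for phase in phases:
            has_room = len(phase) < max_concurrent
            clashes = any(frozenset((test, other)) in conflicts for other in phase)
            if has_room and not clashes:
                phase.append(test)
                break
        else:  # no existing phase could take this test
            phases.append([test])
    return phases


# Hypothetical backlog, already sorted by priority.
tests = ["homepage_hero", "category_filters", "product_page_cta",
         "cart_upsell", "checkout_form"]
# Pairs that sit on consecutive funnel steps and should not run together.
conflicts = {frozenset(("product_page_cta", "cart_upsell")),
             frozenset(("cart_upsell", "checkout_form"))}

phases = build_phases(tests, conflicts, max_concurrent=2)
for number, phase in enumerate(phases, start=1):
    print(f"Phase {number}: {', '.join(phase)}")
```

When no tests conflict, the number of phases is simply the number of tests divided by your concurrency limit, rounded up. Conflicts can only add phases, never remove them.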

Your test plan should also list all the proposed tests and provide the following concise information for each:

  • Related hypothesis (the "why" of the test)
  • Sample size required
  • Expected effect
  • Who will be tested (target audience or segment)
  • Where it will run (URL of the page)
  • When (the period during which it will run)
  • Approximate description of the changes (the "what" of the test)
  • How success will be measured (which metrics the experiment should improve or affect to be considered a success)
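If you keep your plan in a script or spreadsheet export, each entry could be modeled along these lines. This is a sketch only: the field names simply mirror the list above, and the sample values (including the example.com URL) are invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class PlannedTest:
    hypothesis: str             # the "why" of the test
    required_sample_size: int   # visitors needed per variation
    expected_effect: float      # expected relative lift, e.g. 0.15 for 15%
    audience: str               # who will be tested
    page_url: str               # where the test will run
    start: date                 # first day of the test
    end: date                   # last day of the test
    change_description: str     # the "what" of the test
    success_metrics: List[str]  # how success will be measured

    def duration_days(self) -> int:
        """Number of calendar days the test runs, both ends included."""
        return (self.end - self.start).days + 1


entry = PlannedTest(
    hypothesis="A shorter checkout form reduces abandonment",
    required_sample_size=12_000,
    expected_effect=0.15,
    audience="New visitors on mobile",
    page_url="https://www.example.com/checkout",
    start=date(2024, 3, 1),
    end=date(2024, 3, 28),
    change_description="Remove optional fields and merge the address steps",
    success_metrics=["checkout completion rate", "revenue per visitor"],
)
print(entry.duration_days())  # prints: 28
```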

If you structure your test plan in this way, you will maximize both your test velocity and the efficiency of your optimization program.

How to prioritize and assign test tasks

Once you have created and structured a plan, the only remaining ingredient for success is the process itself.

Both to earn the most revenue possible and to build early confidence, the first tests you run should be the ones you expect to have the greatest effect. Select hypotheses that address important problems (for example, issues that affect how users move through the funnel), that you are most confident will work, and that require the least effort to implement.

You can choose a prioritization model to apply to your hypotheses during the research process. Apply the model correctly and, if your estimates are right, you will almost certainly get the results you are looking for.
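As an illustration of how such a model can be applied, the sketch below scores each hypothesis on the three criteria mentioned above (importance, confidence and ease of implementation) and sorts the backlog. The 1 to 10 scale, the equal weighting and the sample hypotheses are assumptions made for this example, not a model prescribed by this article.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    name: str
    importance: int  # 1-10: how much the underlying problem matters
    confidence: int  # 1-10: how sure you are the change will work
    ease: int        # 1-10: 10 means almost no effort to implement

    @property
    def score(self) -> float:
        """Equal-weight average of the three criteria (an assumed weighting)."""
        return (self.importance + self.confidence + self.ease) / 3


def prioritize(backlog: List[Hypothesis]) -> List[Hypothesis]:
    """Return hypotheses ordered from highest to lowest score."""
    return sorted(backlog, key=lambda h: h.score, reverse=True)


backlog = [
    Hypothesis("Redesign category navigation", importance=8, confidence=5, ease=2),
    Hypothesis("Shorten checkout form", importance=9, confidence=7, ease=5),
    Hypothesis("Add trust badges to cart", importance=6, confidence=5, ease=9),
]

ranked = prioritize(backlog)
for h in ranked:
    print(f"{h.score:.1f}  {h.name}")
```

Whatever weighting you choose, the value lies in scoring every hypothesis the same way so that the ordering can be explained and revisited later.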

For every experiment to succeed, you need to translate hypothetical solutions into web page designs as accurately as possible.

When you have a mental picture of the variation you want to test, turn it into a visual one with the help of a wireframe or a mockup. Give this to your designers, who can turn it into an actual web page.

While the visual design is being prepared, your front-end developers need to check whether additional coding will be needed to implement the variation.

The most important part of implementing an experiment is making sure that it is set up without technical problems. To do this, make quality assurance protocols and checks part of your testing program.

Once a given stage of the development cycle of the experiment is completed, the staff involved in this step can immediately start working on the next experiment. Having a plan allows them to advance further without delay, and adds to the efficiency of your conversion optimization effort.

Establishment of a culture of experimentation

Building a test culture is the main goal of a structured CRO process. A test culture requires the business to move from a risk-averse mindset and slow decision-making to a faster, bolder approach. This is possible because testing lets you make decisions based on measurable, known quantities, which reduces your risk.

In-depth research is a prerequisite for successful A/B testing (hopefully already understood by most people involved in testing)! Suffice it to say that the role of research is well documented, and there are plenty of articles about it.

We will also assume that you now know how to formulate a hypothesis from this research. The process of creating a hypothesis is just as important to the ultimate success of your CRO effort as running the tests themselves. Only strong, properly framed hypotheses will result in conclusive A/B tests.

In a structured CRO effort, nothing should be left to chance. Give the actual tests the same meticulous treatment you give research and hypothesis creation. Once you have correctly prioritized your hypotheses by the effort each will take, their importance and their expected effect, you must prepare your tests with the same foresight.

The way you approach the setup of your test program will have a huge impact on your end results. The goal of any good test program is to reach maximum test velocity and see significant test results in the shortest time possible.

About the author: Edin Šabanović is a senior CRO consultant working for Objeqt. He helps e-commerce stores improve their conversion rates through analytics, scientific research and A/B testing. Edin is passionate about analytics and conversion rate optimization, but for fun he likes to read history books. He can help you grow your e-commerce business with Objeqt's personalized, data-driven CRO methodology. Get in touch if you want someone to take care of your CRO efforts.