University of Illinois
Purpose – This paper tests whether Bayesian A/B testing yields better decisions than traditional Neyman-Pearson hypothesis testing. It proposes a model and tests it using a large, multiyear Google Analytics (GA) dataset.
Design/methodology/approach – This paper is an empirical study. Competing A/B testing models were used to analyze a large, multiyear GA dataset for a firm that relies entirely on its website and online transactions for customer engagement and sales.
Findings – Bayesian A/B tests of the data not only yielded a clear delineation of the timing and impact of the intellectual property (IP) fraud, but also calculated the loss of sales dollars, traffic and time on the firm’s website, with precise confidence limits. Frequentist A/B testing identified fraud in bounce rate at 5% significance and in bounces at 10% significance, but could not detect the fraud in the remaining metrics at the significance cutoffs standard in scientific studies.
Research limitations/implications – None within the scope of the research plan.
Practical implications – Bayesian A/B tests of the data not only yielded a clear delineation of the timing and impact of the IP fraud, but also calculated the loss of sales dollars, traffic and time on the firm’s website, with precise confidence limits.
Social implications – Bayesian A/B testing can derive economically meaningful statistics, whereas frequentist A/B testing provides only p-values, whose meaning may be hard to grasp and whose widespread misuse has been a major topic in metascience. While misuse of p-values in scholarly articles may simply be grist for academic debate, the uncertainty surrounding the meaning of p-values in business analytics can actually cost firms money.
Originality/value – There is very little empirical research in e-commerce that uses Bayesian A/B testing. Almost all corporate testing is done via frequentist Neyman-Pearson methods.
James Christopher Westland
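The contrast the abstract draws can be illustrated with a toy Beta-Binomial sketch. The conversion counts below are hypothetical, not taken from the paper's GA dataset: a Bayesian A/B test returns a posterior probability and an interval expressed in the units of the business metric, while the frequentist two-proportion z-test returns only a p-value.

```python
import math
import numpy as np

# Hypothetical conversion counts for two periods of site traffic
# (illustrative only; NOT the paper's GA data).
conv_a, n_a = 120, 2400   # baseline period: conversions, sessions
conv_b, n_b = 90, 2400    # suspected-fraud period

# --- Bayesian A/B test: Beta-Binomial model with uniform Beta(1,1) priors ---
rng = np.random.default_rng(42)
post_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, size=100_000)
post_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, size=100_000)
lift = post_b - post_a                               # posterior of the rate difference
prob_b_worse = float(np.mean(lift < 0))              # P(period B converts worse)
ci_low, ci_high = np.percentile(lift, [2.5, 97.5])   # 95% credible interval for the lift

# --- Frequentist two-proportion z-test (pooled variance) ---
p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se
p_value = math.erfc(abs(z) / math.sqrt(2))           # two-sided p-value

print(f"P(B worse than A) = {prob_b_worse:.3f}")
print(f"95% credible interval for lift: [{ci_low:.4f}, {ci_high:.4f}]")
print(f"frequentist z = {z:.2f}, two-sided p = {p_value:.3f}")
```

The Bayesian output is directly interpretable as a business quantity (the probability and plausible size of a drop in conversion), which is the sense in which the paper calls it economically meaningful; the z-test's p-value answers only whether the observed difference is surprising under the null.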