By Omri Goldberg, Data Scientist — Liquidity Capital
Average Revenue Per Account (ARPA) is one of the fundamental metrics of any recurring-revenue business. The basic notion is that a company with a good grasp of customer acquisition, retention, and gross margin has a viable business model.
Following this logic, one would assume that an increase in ARPA is a strong indicator of success. But does an increase always reflect a positive trend, and does a decrease always mean that customers are paying you less?
Real-life experience shows that the answer to this question isn't always straightforward. When a data observation does not give a clear view, it comes down to what “the averages” mean and when we can use them effectively. If a call is inconclusive for humans, it is going to be nearly impossible to train a machine to make it, so the real question is when averages should be used as features in our models. Have you heard about the statistician who drowned crossing a river that was three feet deep on average?
Revenue per customer is characterized, in many cases, by an asymmetric distribution. The largest portion of customers may contribute little to total revenue, while a few big customers contribute most of it. In some cases, the number of customers in different customer pools, or the revenue they generate, can differ by an order of magnitude. Such a skewed distribution cannot be described by a simple central metric like the average or the median. So what is the correct way to describe the distribution? More importantly, how would we know if things are getting better or worse?
Simple metrics like averages and medians are prone to instability: the revenue-per-customer distribution can change because of a handful of accounts or, alternatively, because of many customers with a negligible revenue contribution. These effects do not necessarily represent trends and may point to wrong conclusions. Put in layman's terms, a few customers with unique deals can skew our average and lead us to the wrong assumptions.
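To make this instability concrete, here is a minimal Python sketch. All revenue figures and scenario names are invented for illustration and are not taken from any real portfolio. It compares the mean and median before and after two events, neither of which involves an existing customer paying less.

```python
from statistics import mean, median

# Hypothetical monthly revenue per account: eight small accounts and two large ones.
before = [100] * 8 + [5_000, 20_000]

# Scenario A: the single largest account churns; everyone else is unchanged.
after_churn = [100] * 8 + [5_000]

# Scenario B: twelve new small accounts sign up; nobody leaves or downgrades.
after_growth = before + [50] * 12

scenarios = [
    ("before", before),
    ("one big churn", after_churn),
    ("many small adds", after_growth),
]

for label, revenues in scenarios:
    print(
        f"{label:>16}: total={sum(revenues):>6} "
        f"mean={mean(revenues):8.1f} median={median(revenues):6.1f}"
    )
```

In the growth scenario, total revenue goes up while the mean falls by more than half and the median halves, which is exactly the kind of misleading signal described above.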
To alleviate this, we suggest a new central-distribution metric that accounts for changes both per customer and per revenue by using a blended method of checks and balances between the two extremes. Singular cases are toned down by clustering accounts into bins and assigning them labels, which minimizes the skewing effect of extreme values.
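The exact formula is not spelled out here, so the following is only a rough sketch of what a binned, blended average could look like. The bin edges, the integer labels, the 50/50 blend, and the names `BIN_EDGES`, `bin_label`, and `binned_average` are all assumptions made for illustration, not the method used in production.

```python
from bisect import bisect_right

# Hypothetical bin edges in revenue units:
# below 100 -> label 0, 100 to 999 -> 1, 1,000 to 9,999 -> 2, 10,000 and up -> 3.
BIN_EDGES = [100, 1_000, 10_000]


def bin_label(revenue, edges=BIN_EDGES):
    """Return the integer label of the bin that a revenue value falls into."""
    return bisect_right(edges, revenue)


def binned_average(revenues, edges=BIN_EDGES, blend=0.5):
    """Blend a customer-weighted and a revenue-weighted average of bin labels.

    blend=1.0 counts every account equally; blend=0.0 weights each account
    by its revenue. The default 0.5 is an arbitrary midpoint (an assumption).
    """
    total = sum(revenues)
    if not revenues or total <= 0:
        raise ValueError("need at least one account with positive revenue")
    labels = [bin_label(r, edges) for r in revenues]
    by_count = sum(labels) / len(labels)
    by_revenue = sum(l * r for l, r in zip(labels, revenues)) / total
    return blend * by_count + (1 - blend) * by_revenue


# Usage with invented numbers: eight small accounts and two large ones.
print(binned_average([100] * 8 + [5_000, 20_000]))
```

Because an account's label changes only when it crosses a bin edge, one account growing from 20,000 to 80,000 leaves the customer-weighted term unchanged, and the revenue-weighted term can never exceed the top label. That is one simple way to tone down extreme values.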
Another important advantage of the method is that it weighs the skewness of the revenue trend, allowing a quick and intuitive interpretation of trends between periods. For example, a negative trend can be explained by the addition of many small customers, by the churn of one big client, or by a consistent shift of clients from a high-payment bin to a lower one.
The role of the trend skewness in the final estimation is to distinguish between the latter case, considered as a “real” shift in customer behavior, and the first two.
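The three explanations can be told apart by looking at how individual accounts move between bins from one period to the next. The sketch below is not the trend-skewness score itself, which is not defined in this post; it only illustrates the underlying distinction. The bin edges, account IDs, revenue figures, and the `classify_changes` helper are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bin edges in revenue units (same convention as a 4-bin labeling).
BIN_EDGES = [100, 1_000, 10_000]


def bin_label(revenue):
    """Return the integer label of the bin that a revenue value falls into."""
    return bisect_right(BIN_EDGES, revenue)


def classify_changes(previous, current):
    """Count account-level changes between two periods.

    previous and current map account id -> revenue for each period.
    """
    counts = Counter()
    for account in previous.keys() | current.keys():
        if account not in previous:
            counts["new"] += 1
        elif account not in current:
            counts["churned"] += 1
        else:
            before = bin_label(previous[account])
            after = bin_label(current[account])
            if after < before:
                counts["moved_down"] += 1
            elif after > before:
                counts["moved_up"] += 1
            else:
                counts["same_bin"] += 1
    return counts


# Usage with invented accounts: two small sign-ups and one downgrade.
previous = {"a": 150, "b": 200, "c": 12_000}
current = {"a": 150, "b": 90, "c": 12_000, "d": 40, "e": 60}
print(classify_changes(previous, current))
```

A drop in the average that comes mostly with `new` accounts in the lowest bin, or with a single `churned` account, tells a different story from one dominated by `moved_down`, which is the "real" shift in customer behavior.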
Defining the “typical account” by a single number is an impossible challenge. However, by carefully applying a set of weights and restrictions (without throwing the baby out with the bathwater), we were able to extract meaningful information about trends in the customer population.
Trend in revenue per customer over 12 months. The upper-left panel shows the trends of the median, mean, and binned average; the binned average has the smoothest trend and is the least affected by extreme values. The upper-right panel shows that the number of distinct paying customers quadrupled. The lower panel shows the evolution of the revenue distribution, indicating that the decrease in average revenue originates from the addition of new small customers and not from price downgrades. The extreme skewness towards small customers should eliminate the negative trend in the binned average and result in a flat trend score that does not penalize the expansion in revenue sources.