picture of Prof. Dr. Nick Gehrke

written by

Prof. Dr. Nick Gehrke

Part II of this series shows you how to use simple statistical considerations to determine which invoices from a vendor are exceptionally high and thus represent an statistical outlier.

 

Part 2 of the series: "Very high incoming invoice amount"

1. Where Topmanagers should keep an eye on
2. Getting serious: What are high incoming invoices?
3. Analytics of high incoming invoices
4. Two graphics for evaluating high incoming invoices

 

Outlier Detection made easy or what is a statistical outlier?

As mentioned in my last article, it is not the sorting of incoming invoices with high amounts but the consideration of statistical outliers. In order to assess what a statistical outlier is, one must look at the variation of a vendor's invoices by their mean value. The variation around the mean value is measured by the standard deviation. Now comes a formula and I hope I do not lose half the readership, as is often said when formulas come. But auditors are quite accustomed:

 

stddev.jpg

 

Where:

n is the number of invoices and
Xi is the i-th invoice of the vendor
X the mean value of the vendor's invoice amounts

 

The simple idea for a statistical outlier is that an invoice with an amount that is greater than the average plus n times the standard deviation is considered an outlier and is thus unusually high. 

How high is n?

Depending on n, you are going to classify an outlier. Luckily, the statistics is helping here too, because intuitively one can judge n hard. In order to approach the solution, we assume for the time being that the invoice amounts of the incoming invoices are normally distributed. In the normal distribution one can always deduce how the frequency of calculation amounts within an interval of mean value +/- n times the standard deviation is:

 

normal.pngσ is determined as the standard deviation of the vendor invoice amounts. If you look at the graph, you see how many of the invoices would have to lie in an interval +/- n times standard deviation:

68.2% of the invoices in the interval mean value +/- one standard deviation

95.4 % of the invoices in the interval mean value +/- two standard deviations

99,7 % of the invoices in the interval mean value +/- three standard deviations

E.g. we could choose n = 3. That means: outliers would have to be (100-99.7) / 2 = 0.15% percent of the vendor's invoices. This provides an individual measure for outliers of a specific vendor.

But watch out! Invoice amounts are often not distributed normally! Then the statements of the normal distribution with respect to n are also not valid. Finally, one must say that the probability distribution of the invoicing amounts is not always known.

Can you still specify a rule for n even though you do not know the probability distribution of the invoice amounts? So to speak, a stable rule that is "immune" against the ignorance of the probability distribution?
Yes we can!

 

The so-called inequality of Chebyshev helps us to estimate probabilities without the assumption of distribution. The inequality of Chebyshev tells us how large the probability of invoice amounts is "at most for invoice amounts larger mean + n times the standard deviation". This expresses the following formula:

tscheb.png

Where:

σ is the determined standard deviation

λ stands for the n-fold

This means the following:

n or λ

The probability that an invoice amount is above n times the standard deviation above the mean value is at most

3

11,11%

4

6,25%

5

4,00%

6

2,77%

7

2,04%

 

The right column indicates the probability that invoice amounts are above the mean plus n times standard deviation. However, this value is the highest possible probability. That is, depending on the actual (unknown) probability distribution, this probability is generally much smaller.

My recommendation:

I would take 6 times the standard deviation.

So we come to our definition, which is an unusually high invoices amount:

An incoming invoice from a vendor is unusually high if the invoice amount is above the mean value of the vendor's invoice amounts plus six times the standard deviation of the vendor's invoice amounts.

 

How to get data for analytics of high input invoices?

If you want to analyse unusually high incoming invoices in your SAP, you need the necessary data structures and data from your SAP system. You can perform a data download from your SAP system using zapliance. Downloading zapliance is free of charge.

The analytics of high incoming invoices are part of zapliance. You can download zapliance here:

 

Download

 

In the next part of this series, we will show you how these analytics can be done in Excel and prepare a template for your analytics. You have any questions or suggestions? Then leave a comment. 

Comments