Summary Statistics PART I
As we have already learned, it is difficult to learn anything from raw data unless the data are arranged in a proper manner. Once the data have been arranged into a frequency distribution, the information they contain can be understood more easily. Now we will move a step ahead and find a single value that represents all the values of the distribution in some particular way. Values used in this way to represent a distribution are called summary statistics.
The most commonly used summary statistics can be classified according to their purpose into the following categories:
1- LOCATION
2- SPREAD
3- SHAPE
4- DEPENDENCE
1- LOCATION
1.1 MEAN
i. Pythagorean Mean
a. Arithmetic Mean
b. Geometric Mean
c. Harmonic Mean
ii. Weighted arithmetic mean
iii. Truncated mean
iv. Interquartile mean
1.2 - Median
1.3 - Mode
The mean is one type of summary statistic for location. It is further divided into four kinds: the Pythagorean means, the weighted arithmetic mean, the truncated mean and the interquartile mean. These kinds are explained as follows:
Arithmetic Mean:
The arithmetic mean is the most commonly used average. It is generally referred to as the average or simply the mean. It is defined as the value obtained by dividing the sum of the values by their number. It is denoted by X̄ and read as X-bar. Therefore, the mean for the values X1, X2, X3, …, Xn is given by the following formula:
X̄ = (X1 + X2 + X3 + … + Xn) / n = ∑X / n
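A minimal Python sketch of this definition (the sum of the values divided by their count). The function name and the sample numbers are made up for illustration.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("need at least one value")
    return sum(values) / len(values)


# Illustrative data only, not taken from the article.
sample = [4, 8, 15, 16, 23, 42]
print(arithmetic_mean(sample))  # 108 / 6 = 18.0
```

Python's standard library offers the same calculation as `statistics.mean`.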
Arithmetic Mean for Grouped Data:
The formula provided above is used when the number of values is small. If the number of values is large, they are grouped into a frequency distribution. For grouped data, all the values falling in a class are assumed to be equal to the class mark or midpoint. If X1, X2, …, Xk are the class marks with f1, f2, …, fk as the corresponding class frequencies, the sum of the values in the first class would be f1X1, in the second class f2X2, and so on, so that the sum of the values in the kth class would be fkXk. Hence, the sum of the values in all the k classes would be:
f1X1 + f2X2 + … + fkXk = ∑fX
Dividing this sum by the total number of values, n = ∑f, gives the arithmetic mean for grouped data:
X̄ = ∑fX / ∑f = ∑fX / n
For example
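As a small illustration, the grouped calculation can be sketched in Python. The class marks and frequencies below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def grouped_mean(class_marks, frequencies):
    """Arithmetic mean of grouped data: sum(f * X) / sum(f)."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("total frequency must be positive")
    total = sum(f * x for x, f in zip(class_marks, frequencies))
    return total / n


# Hypothetical classes 0-10, 10-20, 20-30, 30-40 with midpoints as class marks.
class_marks = [5, 15, 25, 35]
frequencies = [2, 5, 8, 5]

# sum(fX) = 10 + 75 + 200 + 175 = 460 and sum(f) = 20
print(grouped_mean(class_marks, frequencies))  # 23.0
```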
Geometric Mean:
The geometric mean is often skipped over in statistics classes, and I think that is unfortunate for several reasons. One, it is actually pretty interesting, and there are some insights we will pick up as we go. It is also extremely useful, especially in business when you are dealing with rates of return on investments or other types of financial instruments, and in other disciplines such as biology, medicine and agriculture, or anywhere else you are dealing with growth rates over periods of time. So let's go ahead and learn about the geometric mean. The geometric mean, G, of a set of n positive values X1, X2, …, Xn is the nth root of the product of the values. Mathematically, the formula for the geometric mean is as follows:
G = (X1 × X2 × … × Xn)^(1/n)
In practice, it is difficult to extract higher roots. The geometric mean is, therefore, computed using logarithms. Mathematically, it is represented as follows:
log G = (log X1 + log X2 + … + log Xn) / n = ∑ log X / n
so that G = antilog(∑ log X / n).
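The logarithm route can be sketched in Python as follows. This is a minimal illustration that uses natural logarithms (any base works, as long as the antilog uses the same base), and the sample values are made up.

```python
import math


def geometric_mean(values):
    """Geometric mean via logs: exp of the average of log(X)."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    log_sum = sum(math.log(v) for v in values)
    return math.exp(log_sum / len(values))


# Illustrative data: the product is 27 and the cube root of 27 is 3.
values = [1, 3, 9]
print(round(geometric_mean(values), 6))
```

The standard library exposes the same idea as `statistics.geometric_mean` (Python 3.8 and later).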
Geometric Mean for Grouped Data:
When the data have been arranged into a frequency distribution, each of the original observations in a class is assumed to have a value equal to its class mark. Suppose X1, X2, …, Xk represent the class marks in a frequency distribution with f1, f2, …, fk as the corresponding class frequencies, where f1 + f2 + … + fk = ∑f = n. Since X1 occurs f1 times, X2 occurs f2 times, …, and Xk occurs fk times, the formula for the geometric mean is:
G = (X1^f1 × X2^f2 × … × Xk^fk)^(1/n)
or, using logarithms, log G = ∑(f log X) / n.
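For grouped data, each class mark X is counted f times, so the logarithms are weighted by the frequencies. A short Python sketch, with hypothetical class marks and frequencies:

```python
import math


def grouped_geometric_mean(class_marks, frequencies):
    """Geometric mean of grouped data: exp(sum(f * log X) / sum(f))."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("total frequency must be positive")
    if any(x <= 0 for x in class_marks):
        raise ValueError("class marks must be positive")
    weighted_log_sum = sum(f * math.log(x)
                           for x, f in zip(class_marks, frequencies))
    return math.exp(weighted_log_sum / n)


# Hypothetical data: 2^1 * 4^2 * 8^1 = 256, and the 4th root of 256 is 4.
class_marks = [2, 4, 8]
frequencies = [1, 2, 1]
print(round(grouped_geometric_mean(class_marks, frequencies), 6))
```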
Harmonic Mean:
The harmonic mean is a type of numerical average. It is calculated by dividing the number of observations by the sum of the reciprocals of the numbers in the series. Thus, the harmonic mean is the reciprocal of the arithmetic mean of the reciprocals. For n values X1, X2, …, Xn:
H = n / (1/X1 + 1/X2 + … + 1/Xn) = n / ∑(1/X)
The harmonic mean helps to find multiplicative or divisor relationships between fractions without worrying about common denominators. Harmonic means are often used in averaging things like rates (e.g., the average travel speed over several trips of equal distance). It is the most appropriate measure for ratios and rates because it equalizes the weights of the data points. For instance, the arithmetic mean places a high weight on large data points, while the geometric mean gives a lower weight to the smaller data points.
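The travel-speed case can be sketched in Python. The speeds are made up, and the example assumes both legs of the trip cover the same distance, which is the situation where the harmonic mean gives the true average speed.

```python
import math


def harmonic_mean(values):
    """Number of values divided by the sum of their reciprocals."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return len(values) / sum(1 / v for v in values)


# Hypothetical trip: same distance out at 40 km/h and back at 60 km/h.
speeds = [40, 60]
print(round(harmonic_mean(speeds), 6))  # average speed over the whole trip
print(sum(speeds) / len(speeds))        # arithmetic mean, for comparison
```

Here the harmonic mean works out to 48 km/h, below the arithmetic mean of 50 km/h, because more time is spent travelling at the slower speed. The standard library provides `statistics.harmonic_mean` for the same calculation.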
In finance, the harmonic mean is used to determine the average for financial multiples such as the price-to-earnings (P/E) ratio. Financial multiples should not be averaged using the arithmetic mean because it is biased toward larger values. One of the most common problems in finance that uses the harmonic mean is the calculation of such a ratio for a portfolio that consists of several securities.
Harmonic Mean for Grouped Data:
Suppose X1, X2, …, Xk represent the class marks in a frequency distribution with f1, f2, …, fk as the corresponding class frequencies, where f1 + f2 + … + fk = ∑f = n. Then the reciprocals of the class marks will be 1/X1, 1/X2, …, 1/Xk.
Since the reciprocals occur with frequencies f1, f2, …, fk, the total value of the reciprocals in the first class is f1/X1, in the second class is f2/X2, …, and in the kth class is fk/Xk. The formula for calculating the harmonic mean for grouped data is as follows:
H = n / (f1/X1 + f2/X2 + … + fk/Xk) = ∑f / ∑(f/X)
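A short Python sketch of the grouped calculation, dividing the total frequency by the sum of f/X. The class marks and frequencies are hypothetical.

```python
import math


def grouped_harmonic_mean(class_marks, frequencies):
    """Harmonic mean of grouped data: sum(f) / sum(f / X)."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("total frequency must be positive")
    if any(x <= 0 for x in class_marks):
        raise ValueError("class marks must be positive")
    reciprocal_total = sum(f / x for x, f in zip(class_marks, frequencies))
    return n / reciprocal_total


# Hypothetical data: sum(f / X) = 1/2 + 2/4 + 1/8 = 1.125 and sum(f) = 4.
class_marks = [2, 4, 8]
frequencies = [1, 2, 1]
print(round(grouped_harmonic_mean(class_marks, frequencies), 4))  # 4 / 1.125
```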
Weighted Arithmetic Mean:
The weighted arithmetic mean (or weighted average) is used when one wants to combine average values from samples of the same population with different sample sizes, or when the values are not of equal importance. In the latter case, we assign the values certain numerical values to express their relative importance. These numerical values are called weights. If X1, X2, …, Xk have weights W1, W2, …, Wk, then the weighted arithmetic mean, or weighted mean, is calculated by the following formula:
X̄w = (W1X1 + W2X2 + … + WkXk) / (W1 + W2 + … + Wk) = ∑WX / ∑W
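A minimal Python sketch of the weighted mean. The scores and weights are invented for illustration, for example three assessments that count 2, 3 and 5 parts towards a final grade.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(W * X) / sum(W)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical scores and their relative importance.
scores = [80, 90, 70]
weights = [2, 3, 5]

# sum(WX) = 160 + 270 + 350 = 780 and sum(W) = 10
print(weighted_mean(scores, weights))  # 78.0
```

With equal weights the result reduces to the ordinary arithmetic mean.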
Truncated Mean:
A truncated mean or trimmed mean (similar to an adjusted mean) is a method of averaging that removes a small designated percentage of the largest and smallest values before calculating the mean. After removing the specified outlier observations, the trimmed mean is found using a standard arithmetic averaging formula. The use of a trimmed mean helps eliminate the influence of outliers or data points on the tails that may unfairly affect the traditional mean. A trimmed mean is stated as a mean trimmed by x%, where x is the sum of the percentage of observations removed from both the upper and lower bounds. The trimming points are often arbitrary in that they follow rules of thumb rather than some optimized method of setting those thresholds.
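The trimming procedure can be sketched in Python. Following the convention described above, `total_percent` is the total percentage removed from both ends combined, so half of it is cut from each tail. Rounding the number of removed values down to a whole number is an assumption of this sketch; other conventions exist.

```python
def trimmed_mean(values, total_percent):
    """Mean after dropping total_percent/2 of the values from each end."""
    if not values:
        raise ValueError("need at least one value")
    if not 0 <= total_percent < 100:
        raise ValueError("total_percent must be in [0, 100)")
    data = sorted(values)
    # Number of values to drop from EACH tail, rounded down (assumption).
    k = int(len(data) * total_percent / 100 / 2)
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


# Made-up data with one extreme value in the upper tail.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(trimmed_mean(data, 0))   # 14.5, the ordinary mean
print(trimmed_mean(data, 20))  # 5.5, after dropping 1 and 100
```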
Interquartile Mean:
The interquartile mean (IQM) (or midmean) is a statistical measure of central tendency based on the truncated mean of the interquartile range.
How to Find the Interquartile Mean?
The calculation differs depending on whether the number of values in your data set is divisible by 4.
First: Data is Divisible by Four
Step 1: Sort the data from smallest to largest
Step 2: Discard the bottom 25% and top 25% of numbers. In other words, split the data set into quarters and remove the top and bottom quarters
Step 3: Find the mean of the remaining numbers
Second: Data is NOT Divisible by Four
Step 1: Sort the data from smallest to largest:
1 3 6 8 9 11 13 15 16 17 30 44 55 56 65
Step 2: Divide the number of items in the set by four. The set has 15 items, so 15/4 = 3.75.
Step 3: Remove that whole number of items (from Step 2) from the bottom and from the top of the set. For this example, the whole number is 3 (from 3.75), so remove the three smallest values (1, 3, 6) and the three largest values (55, 56, 65),
which leaves:
8 9 11 13 15 16 17 30 44
Step 4: Figure out how many items are in the interquartile range. The IQR is the middle two quarters, so there would be 3.75 * 2 = 7.5 numbers.
Step 5: Place parentheses around the middle set of numbers using the whole number from Step 4. In this example, the whole number is 7:
8 (9 11 13 15 16 17 30) 44
Step 6: Take the fractional part from Step 4 (.5 in this case) and divide it by two (because there are two numbers on the outside of the parentheses):
.5/2 = .25
This means that the numbers 8 and 44 will each contribute 25% to the IQM.
Step 7: Multiply the two “outside” numbers (8 and 44 in this case) by the fraction in Step 6:
8 * .25 = 2
44 * .25 = 11
Step 8: Replace the two outside numbers by the fractional numbers (Step 7) and find the mean. When dividing by “n”, use the number of items in the IQR from Step 4 (7.5 in this case), not the actual number count (9 in this example):
2 (9 11 13 15 16 17 30) 11
(2 + 9 + 11 + 13 + 15 + 16 + 17 + 30 + 11)/7.5 = 124/7.5 = 16.53.
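The steps above can be sketched as one Python function that handles both cases. The function name and the restriction to at least four values are choices made for this sketch; the first data set is the 15-value example from the steps.

```python
def interquartile_mean(values):
    """Interquartile mean with fractional weights on the two edge values."""
    data = sorted(values)
    n = len(data)
    if n < 4:
        raise ValueError("this sketch expects at least four values")
    whole = int(n / 4)               # whole part of n/4 (3 for n = 15)
    kept = data[whole:n - whole]     # drop that many from each end
    iqr_count = n / 2                # items in the middle two quarters (7.5)
    inner = kept[1:-1]               # values counted in full
    # Share of each edge value: 0.25 for n = 15, and 1 when n divides by 4.
    edge_weight = (iqr_count - len(inner)) / 2
    total = sum(inner) + edge_weight * (kept[0] + kept[-1])
    return total / iqr_count


example = [1, 3, 6, 8, 9, 11, 13, 15, 16, 17, 30, 44, 55, 56, 65]
print(round(interquartile_mean(example), 2))  # 16.53, matching Step 8

# Divisible by four: the middle values 3, 4, 5, 6 are averaged directly.
print(interquartile_mean([1, 2, 3, 4, 5, 6, 7, 8]))  # 4.5
```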
KEY TAKEAWAYS For arithmetic mean
- The arithmetic mean (average) is the sum of a series of numbers divided by the count of that series of numbers.
- In the world of finance, the arithmetic mean is not usually an appropriate method for calculating an average.
- More generally, the arithmetic mean isn’t always ideal, especially when a single outlier can skew the mean by a large amount.
KEY TAKEAWAYS For Geometric mean
- The geometric mean is the average rate of return of a set of values calculated using the products of the terms.
- It is most appropriate for series that exhibit serial correlation. This is especially true for investment portfolios.
- Most returns in finance are correlated, including yields on bonds, stock returns, and market risk premiums.
- For volatile numbers, the geometric average provides a far more accurate measurement of the true return by taking into account year-over-year compounding that smooths the average.
KEY TAKEAWAYS for harmonic mean
- The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals.
- Harmonic means are used in finance to average data like price multiples.
- Harmonic means can also be used by market technicians to identify patterns such as Fibonacci sequences.
KEY TAKEAWAYS for Truncated mean:
- A trimmed mean removes a small designated percentage of the largest and smallest values before calculating the average.
- Using a trimmed mean helps eliminate the influence of outliers or data points on the tails that may unfairly affect the traditional mean.
- Trimmed means are used in reporting economic data in order to smooth the results and paint a more realistic picture.
- Providing a trimmed mean inflation rate, along with other measures, provides a basis for comparison.
Median in Statistics:
Definition
The median divides a frequency distribution into two halves. The median of a set of values arranged in ascending or descending order of magnitude is the middle value.
Explanation
When the number of values in a data set is odd, the median is the middle value. When the number of values is even, the median is the mean of the two middle values. Once the distribution is divided into two halves by the median, the number of values greater than the median is equal to the number of values smaller than it.
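The odd and even cases can be sketched in a few lines of Python. The sample lists are made up for illustration.

```python
def median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    if not values:
        raise ValueError("need at least one value")
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                      # odd count: single middle value
    return (data[mid - 1] + data[mid]) / 2    # even count: average the pair


print(median([7, 1, 5]))     # sorted: 1 5 7   -> 5
print(median([7, 1, 5, 3]))  # sorted: 1 3 5 7 -> (3 + 5) / 2 = 4.0
```

The standard library function `statistics.median` follows the same rule.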
Median for Grouped Data
In the case of a frequency distribution, the median is the value of the (n/2)th item from either end. Therefore, if we have 100 items in a frequency distribution, the median will be the value of the 50th item. In order to find the median from a frequency distribution, we need to form a separate column for the cumulative frequency. The median will lie in the class whose cumulative frequency contains (n/2). The formula for the median in the case of a frequency distribution is as follows:
Median = l + (i / f) × (n/2 − C)
where l is the lower class boundary of the median class, i is its class interval size, f is its frequency, and C is the cumulative frequency of the class preceding the median class.
Hint:
- median class is the class with the smallest cumulative frequency greater than or equal to (n/2)
Example:
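A small worked illustration in Python, assuming the usual grouped-median formula, Median = l + (i / f) × (n/2 − C), where l is the lower boundary of the median class, i its class interval size, f its frequency and C the cumulative frequency before it. The classes and frequencies are hypothetical, and equal class widths are assumed.

```python
def grouped_median(lower_boundaries, class_width, frequencies):
    """Median of grouped data with equal class widths."""
    n = sum(frequencies)
    half = n / 2
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_boundaries, frequencies):
        # Median class: first class whose cumulative frequency reaches n/2.
        if f > 0 and cumulative + f >= half:
            return lower + (class_width / f) * (half - cumulative)
        cumulative += f
    raise ValueError("frequencies must contain at least one positive count")


# Hypothetical classes 0-10, 10-20, 20-30, 30-40.
lower_boundaries = [0, 10, 20, 30]
frequencies = [2, 5, 8, 5]   # cumulative: 2, 7, 15, 20, so n/2 = 10

# Median class is 20-30: 20 + (10 / 8) * (10 - 7)
print(grouped_median(lower_boundaries, 10, frequencies))  # 23.75
```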
Mode:
The mode is defined as the value that occurs the greatest number of times in an array of data.
However, if each value occurs the same number of times, there is no mode. There may be a case where two or more values occur the same number of times but more frequently than any other value; in such a case there is more than one mode. In this respect the mode differs from the median and the mean, because there is always a single mean and a single median. A distribution with only one mode is called a uni-modal distribution, a distribution with two modes is called a bi-modal distribution, and a distribution with more than two modes is referred to as a multi-modal distribution.
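These cases (one mode, several modes, no mode) can be sketched in Python with a frequency count. Returning an empty list when every value is equally frequent is a convention chosen for this sketch.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] when there is no mode."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    # Several distinct values that all occur equally often: no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 2, 2, 3]))     # [2]     uni-modal
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  bi-modal
print(modes([1, 2, 3]))        # []      no mode
```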
Mode for Grouped Data
In a frequency distribution with equal class interval sizes, the class with the highest frequency is called the modal class. The formula for calculating the mode for grouped data arranged in a frequency distribution is as follows:
Mode = lbmo + [D1 / (D1 + D2)] × i
Where:
lbmo: lower class boundary of the modal class, i.e. the class with the highest frequency.
D1: Difference between the frequency of the modal class and the frequency of the class preceding the modal class.
D2: Difference between the frequency of the modal class and the frequency of the class following the modal class.
i: class interval size of the modal class.
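Using the quantities defined above, the grouped mode can be sketched in Python as lbmo + D1 / (D1 + D2) × i. The data are hypothetical. Treating a missing neighbouring class as having frequency 0, and taking the first class when several tie for the highest frequency, are assumptions of this sketch.

```python
def grouped_mode(lower_boundaries, class_width, frequencies):
    """Mode of grouped data with equal class widths."""
    # Index of the modal class (first one if several share the top frequency).
    m = max(range(len(frequencies)), key=lambda j: frequencies[j])
    before = frequencies[m - 1] if m > 0 else 0
    after = frequencies[m + 1] if m < len(frequencies) - 1 else 0
    d1 = frequencies[m] - before   # D1: modal class vs. preceding class
    d2 = frequencies[m] - after    # D2: modal class vs. following class
    if d1 + d2 == 0:
        raise ValueError("no single modal class can be identified")
    return lower_boundaries[m] + d1 / (d1 + d2) * class_width


# Hypothetical classes 0-10, 10-20, 20-30, 30-40.
lower_boundaries = [0, 10, 20, 30]
frequencies = [2, 5, 8, 5]

# Modal class is 20-30: D1 = 8 - 5 = 3, D2 = 8 - 5 = 3, so 20 + (3 / 6) * 10
print(grouped_mode(lower_boundaries, 10, frequencies))  # 25.0
```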
Mode for Discrete Data
In case of discrete data, the mode may be found at a glance since it is the most common value i.e. the value with greatest frequency.