Measures of Central Tendency
A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data. As such, measures of central tendency are sometimes called measures of central location. They are also classed as summary statistics. The mean (often called the average) is most likely the measure of central tendency that you are most familiar with, but there are others, such as the median and the mode.
The mean, median and mode are all valid measures of central tendency, but under different conditions, some measures of central tendency become more appropriate to use than others. In the following sections, we will look at the mean, mode and median, and learn how to calculate them and under what conditions they are most appropriate to be used.
The mean (or average) is the most popular and well-known measure of central tendency. It can be used with both discrete and continuous data, although it is used most often with continuous data (see our Types of Variable guide for data types). The mean is equal to the sum of all the values in the data set divided by the number of values in the data set. So, if we have n values in a data set and they have values x1, x2, …, xn, the sample mean, usually denoted by $\bar{x}$ (pronounced “x bar”), is:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
This formula is usually written in a slightly different manner using the Greek capital letter $\Sigma$, pronounced “sigma”, which means “sum of…”:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
You may have noticed that the above formula refers to the sample mean. So, why have we called it a sample mean? This is because, in statistics, samples and populations have very different meanings and these differences are very important, even if, in the case of the mean, they are calculated in the same way. To acknowledge that we are calculating the population mean and not the sample mean, we use the Greek lower case letter “mu”, denoted as µ:
$$\mu = \frac{\sum_{i=1}^{n} x_i}{n}$$
An important property of the mean is that it includes every value in your data set as part of the calculation. In addition, the mean is the only measure of central tendency where the sum of the deviations of each value from the mean is always zero.
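As a minimal sketch of the calculation and of the zero-sum property, the following Python computes a sample mean and the deviations from it. The data values and the name `sample_mean` are made up for illustration, and `Fraction` is used only so that the arithmetic is exact rather than subject to floating-point rounding.

```python
from fractions import Fraction

def sample_mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

# Hypothetical data set, chosen only for illustration.
scores = [Fraction(v) for v in (15, 18, 16, 14, 17)]

x_bar = sample_mean(scores)                # 80 / 5 = 16
deviations = [x - x_bar for x in scores]   # -1, 2, 0, -2, 1
total_deviation = sum(deviations)          # the deviations cancel out

print(x_bar, total_deviation)
```

Every value contributes to `x_bar`, which is why changing any single value changes the mean.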
The median is the middle score for a set of data that has been arranged in order of magnitude.
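The definition can be sketched in Python as follows. The text above does not say what to do when the number of values is even; the sketch assumes the common convention of averaging the two middle values. The data are hypothetical.

```python
def median(values):
    """Return the middle value of the data once sorted by magnitude."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: exactly one middle value.
        return ordered[mid]
    # Even count (assumed convention): average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical scores, chosen only for illustration.
odd_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
even_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45]

print(median(odd_scores))   # middle of 11 sorted values
print(median(even_scores))  # average of the 5th and 6th sorted values
```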
The mode is the most frequent score in our data set. On a bar chart or histogram, it corresponds to the highest bar. You can, therefore, sometimes consider the mode as being the most popular option.
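A short Python sketch of finding the mode by counting occurrences is shown below. The function name `modes` and the data are illustrative assumptions; the function returns a list because two or more values can tie for the highest frequency.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

# Hypothetical responses, chosen only for illustration.
transport = ["bus", "car", "bus", "train", "car", "bus"]
tied_scores = [1, 1, 2, 2, 3]

print(modes(transport))    # a single most popular option
print(modes(tied_scores))  # two values share the highest count
```

Because it only counts occurrences, this works for non-numerical data as well as for numbers.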
Describing and Interpreting Data
The manner in which you analyze data depends on the type of data/variables that you are evaluating. Data can be classified in several different ways.
- A variable is an item of data
- Examples of variables include gender, investment type, test scores, and weight. The values of a variable vary from one observation to another.
Types/Classifications of Variables
- Qualitative: Non-numerical quality
- Quantitative: Numerical, either discrete (counts) or continuous (measures)
- Qualitative data describes the quality of something in a non-numerical format.
- Counts can be applied to qualitative data, but you cannot measure this type of variable, and nominal categories have no inherent order. Examples are gender, marital status, geographical region of an organization, job title…
- Qualitative data is usually treated as Categorical Data.
With categorical data, the observations can be sorted into non-overlapping categories according to a characteristic.
- For example, shirts can be sorted according to color; the characteristic ‘color’ can have non-overlapping categories: white, black, red, etc. People can be sorted by gender with categories male and female.
- Categories should be chosen carefully since a bad choice can prejudice the outcome. Every value of a data set should belong to one and only one category.
- Measurement Scale
- Nominal: classifies with no ranking (e.g. color, investment type…)
- Ordinal: classifies with ranking (e.g. product satisfaction, grades…)
- Analyze qualitative data using:
- Frequency tables, Contingency tables (for 2 variables)
- Modes – most frequently occurring
- Graphs: Bar Charts, Pie Charts, Pareto Charts
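The tabular tools listed above can be sketched with Python's standard library. The observations below (a region and an investment type per record) are hypothetical, and `Counter` is just one convenient way to count; it is not the only way to build these tables.

```python
from collections import Counter

# Hypothetical observations: (region, investment type) for each record.
observations = [
    ("north", "bond"), ("south", "stock"), ("north", "stock"),
    ("north", "bond"), ("south", "bond"), ("north", "stock"),
    ("north", "bond"),
]

# Frequency table for one variable: how often each category occurs.
region_counts = Counter(region for region, _ in observations)

# Contingency table for two variables: counts for each pair of categories.
contingency = Counter(observations)

# Mode: the most frequently occurring category.
mode_region = region_counts.most_common(1)[0][0]

# most_common() lists categories from most to least frequent,
# which is the ordering a Pareto chart uses for its bars.
for category, count in region_counts.most_common():
    print(f"{category:<6} {count}")
```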
- Quantitative or numerical data arise when the observations are frequencies or measurements.
- Discrete Data
- The data are said to be discrete if the measurements are integers (e.g. number of employees of a company, number of incorrect answers on a test, number of participants in a program…)
- Continuous Data
- The data are said to be continuous if the measurements can take on any value, usually within some range (e.g. weight). Age and income are continuous quantitative variables. For continuous variables, arithmetic operations such as differences and averages make sense.
- Analysis can take almost any form:
- Create groups or categories and generate frequency tables.
- Effective graphs include: Histograms, Stem-and-Leaf plots, Dot Plots, Box plots, and XY Scatter Plots (2 variables).
- All descriptive statistics can be applied.
- Measurement Scale
- Interval: ordered, and the difference between values is meaningful (e.g. standardized scores…)
- Ratio: ordered, the difference between values is meaningful, and the scale has a true zero
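Grouping continuous measurements into classes and counting them can be sketched as follows. The weights, the class width of 10 and the helper name `frequency_table` are assumptions made for illustration. Each class is treated as a half-open interval (lower bound included, upper bound excluded) so that every value falls into exactly one class.

```python
def frequency_table(values, start, width, bins):
    """Count how many values fall into each class [lower, upper)."""
    counts = [0] * bins
    for v in values:
        index = int((v - start) // width)
        if not 0 <= index < bins:
            raise ValueError(f"{v} lies outside the classes")
        counts[index] += 1
    return [
        (start + i * width, start + (i + 1) * width, counts[i])
        for i in range(bins)
    ]

# Hypothetical weights in kilograms, chosen only for illustration.
weights = [52.3, 61.0, 58.7, 70.2, 64.9, 55.5, 68.1, 60.0, 73.4, 59.9]

table = frequency_table(weights, start=50, width=10, bins=3)
for lower, upper, count in table:
    print(f"{lower}-{upper}: {count}")
```

The resulting counts are the bar heights a histogram of the same data would show.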
Describing Distributions
Description by Enumeration
One way we can describe the distribution of a variable is by enumeration, that is, by simply listing all the values of the variable. But if the data set or distribution contains more than just a few cases, the list is going to be too complex to be understood or to be communicated effectively. Imagine trying to describe the distribution of a sample of 300 observations by listing all 300 measurements.
Description by Visual Presentation
Another alternative that is frequently used is to present the data in some visual manner, such as with a bar chart, a histogram, a frequency polygon, or a pie chart.
The data for bar charts should consist of a relatively small number of response categories in order to make the visual presentation useful. That is, the variable should consist of only a small number of classes or categories.
In histograms or bar charts, the shape of the distribution can convey a significant amount of information. This is another reason why it is desirable to conduct measurement at an ordinal or interval level, as this allows you to organize the values of a variable in some meaningful sequence. In a histogram, the values on the horizontal axis are ordered from lowest to highest, in a natural sequence of increasing levels of the theoretical concept.
With a nominal variable, the order of the categories is arbitrary, so the shape of the distribution would convey no useful information at all. Bar charts and histograms can be used to compare the relative sizes of nominal categories, but they are more useful when the data graphed are at the ordinal or higher level of measurement.
A frequency polygon is constructed by connecting the points which have heights corresponding with the frequencies on the vertical axis. Another way of thinking of a frequency polygon is as a line which connects the midpoints of the tops of the bars in the histogram.
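The second description translates directly into a small calculation: take the midpoint of each histogram class and pair it with that class's frequency. The sketch below does this in Python; the class boundaries, the frequencies and the name `polygon_points` are hypothetical.

```python
def polygon_points(bin_edges, frequencies):
    """Return (midpoint, frequency) pairs, one per histogram bar."""
    if len(bin_edges) != len(frequencies) + 1:
        raise ValueError("need exactly one more edge than frequencies")
    return [
        ((lower + upper) / 2, freq)
        for lower, upper, freq in zip(bin_edges, bin_edges[1:], frequencies)
    ]

# Hypothetical histogram: three classes with their frequencies.
edges = [50, 60, 70, 80]
frequencies = [4, 4, 2]

points = polygon_points(edges, frequencies)
print(points)
```

Connecting these points in order with straight lines gives the frequency polygon.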
Pie charts are appropriate for presenting the distributions of nominal variables, since the order in which the values of the variable are listed is immaterial. All we need to consider is the size of the “slice” associated with each class of the variable.
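As a rough sketch, the size of each slice is the category's share of the total, which corresponds to the same share of the 360 degrees in the circle. The shirt-color counts and the name `pie_slices` below are made up for illustration.

```python
def pie_slices(counts):
    """Map each category to (share of total, slice angle in degrees)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot draw a pie chart with no observations")
    return {
        category: (count / total, 360 * count / total)
        for category, count in counts.items()
    }

# Hypothetical counts of shirts by color.
shirt_counts = {"white": 5, "black": 3, "red": 2}
slices = pie_slices(shirt_counts)

# Listing the categories in a different order gives the same slices.
reordered = pie_slices({"red": 2, "white": 5, "black": 3})

for color, (share, angle) in slices.items():
    print(f"{color:<6} {share:.0%} {angle:.0f} degrees")
```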
source: cios.org (http://www.cios.org/readbook/rmcs/ch08.pdf)