This is, once again, a somewhat technical post. It highlights some of the shortcomings of focussing on gender pay gaps as a means of identifying biases in pay. It also raises the question: what is a practically important gap? I do not have a good answer to that question. It would be nice to see some discussion of what makes a pay gap important for practical purposes. A related question is: how much effort should employers put into addressing a pay gap that is in some sense ‘large’ but that has a high probability of being due to chance (i.e. is not statistically significant), bearing in mind that resources that are used for one activity are not available for other activities? Again, there is probably not a unique answer to this question but it would be nice to see it discussed.
The principle of equal pay is equal pay for:
• Equal work – work that is the same or broadly similar.
• Equivalent work – work that has been rated as equivalent by a job evaluation scheme.
• Work of equal value – work that places similar demands on those performing it.
As noted by the Equality and Human Rights Commission (EHRC) the overall gender pay gap is not an indicator of unequal pay. They suggest that it should be seen as an ‘equal opportunity gap’. The analysis in my earlier post suggests that it is not a particularly useful measure of that either. Nevertheless, it is popular.
The tool to check for equal pay is an equal pay audit. Equal pay audits should be carried out for ethnicity and disability as well as gender but gender is often easier since the quality of the data is better. Basically the steps in an equal pay audit are:
1. Determine which people are doing equal work, equivalent work or work of equal value. An organisation that has a job evaluation scheme may assume that everyone with a job evaluated to be in the same grade is doing equivalent work or they may split people on the same grade into groups doing similar work. The latter procedure is preferable if there are enough employees since aggregating over all the employees in a grade could mask problems that were specific to one group.
2. Assess whether men and women are equally paid. Two situations might apply: there may be systematic bias against women (or men), or a few individual women (or men) may be disadvantaged. For example, the way in which starting salary is determined may disadvantage people returning from a career break, which tends to disadvantage some, but not all, women. Although an equal pay audit may uncover instances of the latter, it is more usual to concentrate on the former, though both are illegal. The EHRC suggests that the first step is to calculate the average basic pay and the average total pay for men and for women. It recommends further investigation if the difference between the average pay for men and the average pay for women is greater than 5%, or if it is greater than 3% and there is a pattern of gaps favouring one sex over the other.
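The EHRC rule of thumb described in step 2 can be written down as a small check. This is only a sketch: the function name and arguments are made up for illustration, and expressing the gap as a fraction of men's average pay is an assumption, since the rule as quoted does not say which average is the base.

```python
def needs_investigation(avg_men, avg_women, pattern_of_gaps=False):
    """Apply the 5% / 3%-with-pattern rule of thumb.

    Assumption: the gap is measured relative to men's average pay.
    The absolute value is used so that gaps favouring either sex count.
    """
    gap = abs(avg_men - avg_women) / avg_men
    if gap > 0.05:
        return True
    # Between 3% and 5% only matters if gaps consistently favour one sex.
    return pattern_of_gaps and gap > 0.03
```

For example, `needs_investigation(25000, 24000)` is false on its own, but becomes true with `pattern_of_gaps=True` because the 4% gap then falls under the 3% criterion.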
Most scientists asked to determine whether the difference between two means indicated the presence of systematic bias would probably start by trying a t-test. There are some problems with this approach, especially if the means are calculated from a small number of employees, because t-tests are based on the assumption that the estimate of the mean has a normal distribution. Tests of statistical significance calculate the probability of observing a difference at least as large as the one actually observed, on the assumption that there is no underlying difference. The larger this probability, the more plausible it is that the observed effect arose just by chance even though, in fact, there is no difference. It is not necessary to adopt the convention that a result is significant if this probability is less than 0.05 and not otherwise. Depending on the costs and the circumstances, it might be reasonable to take action even if the probability is 0.3 or more. Note that observing an effect that is not statistically significant does not imply that there is no effect. It implies that there is not enough data to say whether or not there is an effect.
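One way to estimate such a probability without leaning on the normality assumption is a permutation test. To be clear, this is a different technique from the t-test mentioned above, swapped in here because it needs nothing beyond the standard library. The sketch below is illustrative only; the function name, the number of resamples and the fixed seed are all assumptions.

```python
import random

def permutation_p_value(men, women, n_resamples=10_000, seed=0):
    """Estimate the one-sided probability of a gap (men minus women)
    at least as large as the observed one if sex were unrelated to pay.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = sum(men) / len(men) - sum(women) / len(women)
    pooled = list(men) + list(women)
    n_men = len(men)
    hits = 0
    for _ in range(n_resamples):
        # Reassign the "men" and "women" labels at random.
        rng.shuffle(pooled)
        fake_men, fake_women = pooled[:n_men], pooled[n_men:]
        diff = sum(fake_men) / n_men - sum(fake_women) / len(fake_women)
        if diff >= observed:
            hits += 1
    # Add-one correction avoids reporting a probability of exactly zero.
    return (hits + 1) / (n_resamples + 1)
```

Calling it with two lists of salaries returns a value between 0 and 1 that can be read in the same way as the probabilities discussed above: small values mean the observed gap would rarely arise by chance alone.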
As noted by the EHRC, statistical significance should not be confused with effect size (see Technical Note 3.5). A large difference can fail to be statistically significant if there are a small number of employees, or a small number of employees of one sex. A small difference can be statistically significant if there are a large number of employees. So, for example, an institution might be more worried about a 15% gender pay gap that had a 10% probability of being exceeded due to chance than about a 1% gender pay gap that had a 4% probability of being exceeded due to chance. In the latter case the institution is saying that they are willing to accept a low probability that the observed result is due to chance since they believe the bias to be too small to be of practical importance. What constitutes a large difference? What is practical importance?
The EHRC criteria are that if, on average, one sex earns more than 5% more than the other for doing particular equivalent jobs, or more than 3% more if there is a pattern, then there is a problem that needs investigating. These criteria could lead to anomalies depending on how pay is determined. Organizations that rely on negotiation by individuals to set pay, or on pay schemes with a significant component dependent on performance evaluation, are open to inadvertent discrimination leading to systematic discrepancies that would be evident in gender pay gaps. Other organizations use a system in which the person identifying the need for a position writes a job specification, which a professional job evaluator uses to assign a grade to the job. The person appointed is placed at a point within the grade on the basis of their qualifications and experience, and then progresses by automatic annual increments up to a point at which he or she needs to apply for promotion to the discretionary or contribution points of the grade. Under this system there is much less scope for discrimination. Even so, possible ways in which bias can occur are:
1. There could be a tendency for women to be appointed at a lower point in the grade.
2. Women could be less likely to apply for promotion to discretionary or contribution points.
3. The job evaluation scheme could result in jobs predominantly done by men being graded higher than jobs predominantly done by women.
4. Women could be less likely to receive, or receive lower amounts of, additional payments such as allowances, payment for additional responsibilities, recruitment incentives or market supplements.
5. There could be differences in the contractual hours of different occupational groups in jobs evaluated to be at the same grade.
6. There could be differences in pension entitlements or retirement ages between different groups with different representations of men and women.
Two factors which could influence gender pay gaps but which are not equal pay issues are:
1. Women might be more likely to leave, giving a greater proportion of women on lower points in the grade, or men might be more likely to leave, for example for higher graded positions, leaving proportionately more women at the top of the grade.
2. Women may have been entering jobs at this level in increasing numbers in recent years leading to a clustering of women at lower points in the grade.
We thus have factors which are equal pay related that will be reflected in gender pay gaps, such as lower starting salaries; factors which are equal pay related that will not be reflected in gender pay gaps, such as biases in the job evaluation scheme; and factors that are not equal pay related but will affect the gender pay gap.
Figure 1 shows a distribution of women on a nine point scale, perhaps arising as a combination of women ending up with lower starting salaries, being less likely to apply for promotion and men leaving for better paid positions. The trend line has a slope of -0.0425, so the proportion of women falls by about 4 percentage points per scale point. If there are the same number of people on each salary point and each salary point is 2.5% higher than the one below then this distribution leads to a gender pay gap of 2.8% and no investigation is required. If, however, each salary point is 5% higher than the one below the gender pay gap is 5.7% and further investigation is required although the underlying biases that led to this situation would be the same in both cases. Both gaps are statistically significant if there are twenty or more people on each salary point.
Figure 2 shows a similar distribution on a fourteen point scale. The trend line has a slope of -0.0312. In this case the gender pay gap is 5.1% when the increment from one scale point to the next is 2.5%. This gap is highly statistically significant if there are at least twenty people on each scale point. Note that if this grade was split into two grades of seven scale points the gender pay gaps would be 1.6% for each grade though the men and women would be being paid the same salaries as before.
These examples show that the same underlying biases can give rise to gaps that may or may not be regarded as practically important depending on the particular salary structure.
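The dependence on salary structure can be explored with a short calculation. The sketch below is hypothetical: it assumes the share of women falls exactly linearly and is centred on 50%, that each scale point holds the same number of people, and that increments compound. The figures' underlying data are not given in the post, so the results need not match the quoted gaps exactly.

```python
def pay_gap(share_women, increment, headcount=20):
    """Gender pay gap, as a fraction of men's mean pay, within one grade.

    share_women[i] is the proportion of women on scale point i. Each point
    holds `headcount` people and pays `increment` more than the one below.
    """
    pay = [(1 + increment) ** i for i in range(len(share_women))]
    women = [headcount * s for s in share_women]
    men = [headcount * (1 - s) for s in share_women]
    mean_w = sum(w * p for w, p in zip(women, pay)) / sum(women)
    mean_m = sum(m * p for m, p in zip(men, pay)) / sum(men)
    return (mean_m - mean_w) / mean_m

# Hypothetical nine-point grade: slope -0.0425, centred on a 50% share.
shares9 = [0.5 - 0.0425 * (i - 4) for i in range(9)]
gap_small = pay_gap(shares9, 0.025)  # 2.5% increments
gap_large = pay_gap(shares9, 0.05)   # 5% increments

# Hypothetical fourteen-point grade, whole and split into two grades of seven.
shares14 = [0.5 - 0.0312 * (i - 6.5) for i in range(14)]
gap_whole = pay_gap(shares14, 0.025)
gap_lower = pay_gap(shares14[:7], 0.025)
gap_upper = pay_gap(shares14[7:], 0.025)
```

With identical proportions of women at each point, the gap moves from one side of the 5% threshold to the other just by changing the increment, and splitting the longer grade in two shrinks the measured gap in each half even though nobody's salary changes.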
As another example, suppose you have 100 men doing a particular job at a particular grade with an average salary of £25,000, and 100 women doing the same job at the same grade who would have the same average salary but for one policy. Taking existing salary into account when determining the starting point in the grade has led to twenty women who returned from a career break being paid 6% less than similarly qualified men or women who had not taken a career break. The average salary for all the women is £24,717, a gap of 1.2%. This is not likely to be statistically significant (on a ten-point scale with an average of twenty people per scale point and a 3% increment, there is a 15% chance of men’s average pay exceeding that of women by at least 1.2%), nor is it large, though the twenty affected women might disagree.
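The dilution effect in this example can be checked with back-of-envelope arithmetic. The sketch assumes the twenty affected women would otherwise have averaged exactly £25,000. That is a simplification not stated in the post, so the result is a round-number approximation and may differ slightly from the average quoted above.

```python
BASE = 25_000       # average salary for men, and for unaffected women
N_WOMEN = 100
N_AFFECTED = 20     # women who returned from a career break
SHORTFALL = 0.06    # they are paid 6% less than comparable colleagues

# Assumption: the affected women would otherwise have averaged BASE.
affected_avg = BASE * (1 - SHORTFALL)
women_avg = ((N_WOMEN - N_AFFECTED) * BASE + N_AFFECTED * affected_avg) / N_WOMEN

overall_gap = (BASE - women_avg) / BASE      # what the headline gap shows
affected_gap = (BASE - affected_avg) / BASE  # what the affected women see
```

The headline gap is one fifth of the shortfall experienced by the affected group, because the other eighty women pull the average back towards the men's figure.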
This example shows that relying on the gender pay gap to identify anomalies could result in failing to detect substantial biases.
This still leaves the question: what is a practically important gap? There does not seem to be a good answer to this question. Is it acceptable for women to be paid one scale point less than comparable men as long as the gap between scale points is less than 3% but not acceptable if the gap between scale points is greater than 3%? Is exactly the same bias acceptable if it occurs over two grades with a small number of steps but not if it occurs over one grade with a larger number of steps? Is it acceptable for 50% of women to be paid one scale point less than comparable men but not for 100% of women? What are the criteria for practical importance? Practically important to whom? The women earning 6% less than they might have been? The employer who might face an equal pay claim?
Does this mean carrying out an equal pay audit is a waste of time? No, it does not. An organization that measures its gender pay gaps for groups identified as doing the same or equivalent work or doing work of equal value is more likely to identify anomalies than one that does not. What it does mean is that organizations should examine their pay schemes and identify how anomalies could occur, for example, that people returning from career breaks tended to be placed on lower starting salaries thus tending to disadvantage women, and monitor those points directly regardless of whether or not they have observed substantial gaps.
An effective equal pay audit would:
• Describe the way the institution sets pay or, at the very least, refer to another document that does so.
• Identify the processes where bias could occur, e.g. setting starting salaries.
• Monitor those processes.
It could be that, in a system where individuals negotiate their own pay or where line managers have a large say in setting performance pay, the best way of monitoring is to measure the gender pay gap. In institutions with job evaluation schemes and set pay scales it would be better to monitor starting salaries and progression directly, to avoid potential bias being masked by other factors that affect the pay gap.
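Monitoring starting salaries directly could be as simple as comparing the average starting scale point of new appointees by grade and sex. The sketch below is illustrative only: the record fields (`grade`, `sex`, `start_point`) and the sample data are hypothetical, and a real audit would also need to account for qualifications and experience.

```python
from collections import defaultdict

def mean_starting_point(appointments):
    """Average starting scale point for each (grade, sex) group."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of points, count]
    for a in appointments:
        key = (a["grade"], a["sex"])
        totals[key][0] += a["start_point"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# Hypothetical appointment records, invented for illustration.
appointments = [
    {"grade": 6, "sex": "F", "start_point": 1},
    {"grade": 6, "sex": "F", "start_point": 2},
    {"grade": 6, "sex": "M", "start_point": 2},
    {"grade": 6, "sex": "M", "start_point": 3},
    {"grade": 7, "sex": "F", "start_point": 1},
    {"grade": 7, "sex": "M", "start_point": 1},
]
averages = mean_starting_point(appointments)
```

A persistent difference between the groups within a grade would point at the appointment process itself, whatever the overall pay gap for that grade happens to be. The same grouping could be repeated with a career-break flag in place of sex.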
Processes that could introduce bias include the job evaluation scheme, if one is in place. The EHRC has pertinent advice on how to check that a job evaluation scheme does not itself inadvertently discriminate against women.
Has your institution carried out an equal pay audit? Does it meet the above criteria? Do you think it should? Is there anything you can do about it if it doesn’t?