Monday, February 21, 2011

More on Quantitative vs Qualitative Evidence


One perceived disadvantage of qualitative studies is that analyzing the data, for example responses to interviews, is inevitably done within some particular interpretative framework. In fact, quantitative data are also interpreted within some framework; it is just less usual to state what that framework is.

For example, the Asset 2010 survey of academic staff in SET departments found that 17% of women compared with 14% of men had no provision for appraisals, 9% of women compared with 7% of men could have an appraisal on request, and 74% of women compared with 80% of men had appraisals as a matter of routine. These differences are statistically significant.
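As a rough illustration of the kind of check that lies behind a claim like "these differences are statistically significant", a chi-squared test of independence could be run on a gender-by-provision contingency table. The counts below are hypothetical: they are back-calculated from the published percentages using assumed sample sizes, since the raw numbers are not reproduced here.

```python
# A minimal sketch of a chi-squared test of independence between gender
# and appraisal provision. The counts are HYPOTHETICAL: back-calculated
# from the published percentages using assumed sample sizes (1,000 women,
# 2,000 men), not the actual Asset 2010 data.
from scipy.stats import chi2_contingency

#        no appraisal, on request, routine
women = [170, 90, 740]      # 17%, 9%, 74% of 1,000
men = [280, 140, 1600]      # 14%, 7%, 80% of 2,000

chi2, p_value, dof, _ = chi2_contingency([women, men])
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
```

With these assumed sample sizes the test comfortably rejects independence (p below 0.01), but note that the same percentages drawn from much smaller samples would not.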

There are a number of ways of interpreting this:

  1. There are departments which discriminate against women by providing men with appraisals but not women.
  2. Women are not uniformly distributed across disciplines. Perhaps some disciplines are more likely to operate appraisal schemes than others.
  3. Perhaps men in departments with routine appraisals were more likely to have been encouraged to complete the survey.
  4. The appraisal process itself discriminates against women so departments with routine appraisals are less likely to attract and retain women.
  5. If men are more likely to receive routine appraisals, it is they who are discriminated against since they have to spend time filling in appraisal forms while their female colleagues get on with their jobs. (Presumably the slightly over a quarter of academic staff who reported that they did not find their appraisal to be either useful or valuable might concur with this interpretation.)

My point is not that any of these interpretations is particularly likely. It is that the same quantitative data can be viewed in different ways depending on your perspective. Those who are keen to establish that women are hard done by might incline to interpretation 1. Those who believe that universities are meritocracies and hence fair to all might incline to interpretation 2 or interpretation 3. Those with a less than positive experience of appraisal might incline to interpretation 4 or interpretation 5.

What then should the response be to findings such as these? The standard scientific response, eliminating incorrect explanations in order to isolate a single best explanation, has the disadvantage that action to correct an unfair situation could be delayed, possibly for years. The other problem with this response is that the holders of strong views, wherever they lie on the spectrum between 'an appraisal is a good thing and everyone should have one whether they want it or not' and 'appraisals are a bureaucratic waste of time foisted on us by HR', rarely base their beliefs on rational analysis of evidence. Consequently, finding more evidence is unlikely to change their minds. The standard top-down response, asserting one interpretation to be correct and labelling anyone who disagrees as 'obstructionist' or 'a dinosaur', has the disadvantage of being ineffective: academics either ignore top-down initiatives or find ways of getting around them.

My own view is that, at around a quarter, the proportion of academic staff finding their appraisals to be neither useful nor valuable is unacceptably high. The course of action I would favour is:
  •  Explicitly recognise that people have different beliefs and experiences. Expecting someone whose experience of appraisal has been negative to be enthusiastic about a new appraisal policy is counter-productive.
  •  Articulate what a departmental staff appraisal policy is supposed to achieve.
  •  Ask staff for suggestions for how best to achieve it.
  •  Give staff feedback on their suggestions. (Too often, people make suggestions only for them to apparently disappear. For example, a suggestion may be perceived as being too resource intensive to implement. If the person making the suggestion is given this feedback they may well be able to think of ways of achieving the same result more efficiently.)
  •  Formulate a policy that is grounded in reality and recognises the constraints on people's time. (In an ideal world everyone would attend training courses in how to appraise/be appraised. In real life they do not, unless they are compelled to do so, in which case they turn up, resent being there and don't learn anything.)
  •  Monitor your procedures, not just by ticking off whether everyone has completed an appraisal but by seeking feedback on whether the procedures are achieving the desired results.

Disclaimer: My own experience of appraisal has been positive, though personally I would not rate appraisal as having been particularly useful to my career development, which may have more to do with my moves between New Zealand and the UK than the process itself.

Tuesday, February 15, 2011

When does evidence become evidence?


There are four broad categories of evidence that can be used to inform actions to improve the position of women in science.

The first category is institutional and national statistics, for example: what proportion of undergraduates, graduate students, post-docs, and staff at each grade are women? Such statistics are essential: without them you cannot even establish that you have a problem, or indeed that you do not. There are a number of difficulties with these data:
  • The categories that were used to present the data may not be helpful. For example, physics and chemistry are often combined as 'physical sciences' although the participation rate of women in physics is much lower than that in chemistry.
  • Aggregating data may obscure issues specific to a particular area or unit, but reporting data at too fine a level makes it difficult to distinguish a real effect from random fluctuation. For example, I would be surprised if many individual departments (units of 20-50 academics) could make meaningful comparisons between the rate at which men are promoted and the rate at which women are promoted. Observations made with a sample size of three are effectively anecdotes, even if they are converted to a percentage and plotted on a graph (see the sketch after this list).
  • Snapshot data can be difficult to interpret. As Gillian Gehring put it in a comment in Physics World (September 2001) on the interaction between gender, level of qualification and pay: “We need to think like an astronomer here: the women of 50+ graduated from an “earlier universe”.” (If you are not an astronomer, the reference is to the fact that, when we observe an object in the universe now, the further away it is the longer ago the light we now see was emitted.) Women who are now 50+ were 20+ thirty years ago. Things were different then.
  • Even if you can demonstrate that there is a problem, the statistics by themselves throw no light on what has caused it. One approach is to try varying some practice while hoping that other factors remain the same and monitoring the statistics to see if they change. The disadvantages of this approach are that other factors do not normally remain the same, it can be difficult to ensure a uniform change in practice, especially in universities, and it can take an unfeasibly long time to have any confidence that you are observing a genuine change rather than a random fluctuation.
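To make the small-numbers point concrete, here is a minimal sketch of the uncertainty attached to a proportion estimated from a handful of observations. The promotion counts are invented purely for illustration; the interval is the standard exact (Clopper-Pearson) binomial interval.

```python
# A sketch of why a sample of three is effectively an anecdote: the 95%
# confidence interval for a proportion observed in three people spans
# most of the range from 0 to 1. The counts are invented for
# illustration; the method is the exact (Clopper-Pearson) interval.
from scipy.stats import beta

def clopper_pearson(successes, n, alpha=0.05):
    """Exact confidence interval for a binomial proportion."""
    lo = beta.ppf(alpha / 2, successes, n - successes + 1) if successes > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, successes + 1, n - successes) if successes < n else 1.0
    return lo, hi

# The observed rate is 33% in every case, but only the larger samples
# pin it down to a usefully narrow range.
for promoted, total in [(1, 3), (10, 30), (100, 300)]:
    lo, hi = clopper_pearson(promoted, total)
    print(f"{promoted}/{total} promoted: 95% CI ({lo:.2f}, {hi:.2f})")
```

With one promotion out of three, the interval runs from roughly 1% to 81%: consistent with almost any underlying promotion rate.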

The second category is surveys. Surveys can be a very useful way of assessing how many women are affected by a particular issue. With the advent of tools like Survey Monkey they are technically very easy to set up, though it still takes some effort to write good questions. Results can often be presented in numerical form and analyzed with conventional statistical tools, which is an advantage for those who are uncomfortable with qualitative data. The disadvantages are:
  • Sample sizes may be small, making it difficult to achieve statistical significance (a rough sample-size calculation follows this list).
  • It is difficult to assess how representative the sample is of the overall population. Respondents to surveys are often atypical, at least in the respect that they have bothered to complete the survey.
  • You only get responses to the questions that you thought of when you designed the survey. If someone raises a new issue in a free text response you have no way of knowing how many others might have agreed, unless you run a follow-up survey.
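The first of these disadvantages can be quantified. A rough sample-size calculation, using the standard normal-approximation formula for comparing two proportions, shows how quickly the required numbers grow as the difference to be detected shrinks. The effect sizes below are illustrative only.

```python
# A rough sketch of how many respondents per group a survey needs in
# order to detect a difference between two proportions (two-sided test,
# 5% significance, 80% power), using the standard normal-approximation
# sample-size formula. The effect sizes are illustrative.
from scipy.stats import norm

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_b * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return numerator / (p1 - p2) ** 2

for p1, p2 in [(0.14, 0.17), (0.10, 0.20), (0.10, 0.30)]:
    print(f"{p1:.0%} vs {p2:.0%}: about {n_per_group(p1, p2):.0f} per group")
```

Detecting a 14% versus 17% difference of the kind reported above needs over 2,000 respondents per group; few departmental surveys come anywhere near that.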

The third category is existing research. Usually you don't start a scientific project from scratch: you review the literature to see what is already known. Adopting a similar approach to tackling issues for women in STEM would be both more efficient and more effective than continually starting from scratch. Furthermore, some issues can only be identified via research projects; for example, the evidence for unconscious bias largely comes from published research studies. Research projects involving multivariate analysis of reams of data can offer useful information about what affects the recruitment and retention of women, but they require considerable time and resources. The principal disadvantage of trying to find out what is already known is lack of time. Also, much experience is recorded in non-peer-reviewed reports published by organizations, for example the reports produced by the Royal Society of Chemistry, and these can be hard to discover. We need more books like Virginia Valian's 'Why So Slow? The Advancement of Women' that pull the research together and present it in an accessible way.

Finally, there are qualitative methods such as interviews and focus groups. Natural scientists tend to be uncomfortable with qualitative data, dismissing it as anecdotal. This is not surprising. You can't run a focus group for electrons to gather their experiences of being accelerated by an electromagnetic field, or interview proteins about how they fold (though think how many person-hours of effort would be saved if you could). However, women are intelligent, articulate human beings, and you can save a lot of time and effort by simply asking them what is important to them. All too often women's lived experiences are dismissed as irrelevant. There is an unfortunate connotation to ignoring qualitative data. Doing so sends the message: ‘Women are not capable of understanding or articulating their concerns. We need our armoury of experimental tools and statistical analysis to determine what is best for them.’

If the purpose of gathering the data is to identify the factors that affect women's progress in a particular institution, and to find ways of ameliorating their effects, we do not need to apply the same criteria of rejection or provisional acceptance that we would apply if we were seeking a general predictive theory of why women are not thriving as well as might have been expected within the institution (assuming such a theory exists, which I doubt). Decisions have to be made using the best information available now, not put off until sufficient data have accumulated for a result to be considered statistically significant at some conventional level. That does not mean we should ignore statistical significance: devoting resources to fixing a random fluctuation is a waste of time and effort, though, of course, it is likely to appear successful.

Quantitative and qualitative methods are complementary. Quantitative methods are good for demonstrating that there is a problem; qualitative methods are good for generating insights into what might be causing it. Decisions should be made on the basis of all the available data.