Tuesday, February 15, 2011

When does evidence become evidence?


There are four broad categories of evidence that can be used to inform actions to improve the position of women in science.

The first category is institutional and national statistics: for example, what proportion of undergraduates, graduate students, post-docs and staff at each grade are women? Such statistics are essential. Without them you cannot even tell whether or not you have a problem. There are a number of difficulties with these data:
  • The categories that were used to present the data may not be helpful. For example, physics and chemistry are often combined as 'physical sciences' although the participation rate of women in physics is much lower than that in chemistry.
  • Aggregating data may obscure issues specific to a particular area or unit, but reporting data at too fine a level makes it difficult to distinguish a real effect from random fluctuation. For example, I would be surprised if many individual departments (units of 20-50 academics) could make meaningful comparisons between the rate at which men are promoted and the rate at which women are promoted. Observations made with a sample size of three are effectively anecdotes, even if they are converted to a percentage and plotted on a graph.
  • Snapshot data can be difficult to interpret. As Gillian Gehring put it in a comment on the interaction between gender, level of qualification and pay in Physics World in September 2001: “We need to think like an astronomer here: the women of 50+ graduated from an ‘earlier universe’.” (If you are not an astronomer, the reference is to the fact that the further away an object in the universe is, the longer ago the light we now see from it was emitted.) Women who are now 50+ were 20+ thirty years ago. Things were different then.
  • Even if you can demonstrate that there is a problem, the statistics by themselves throw no light on what has caused it. One approach is to try varying some practice while hoping that other factors remain the same and monitoring the statistics to see if they change. The disadvantages of this approach are that other factors do not normally remain the same, it can be difficult to ensure a uniform change in practice, especially in universities, and it can take an unfeasibly long time to have any confidence that you are observing a genuine change rather than a random fluctuation.
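
To make the small-sample point concrete, here is a minimal Python sketch. The department and its numbers are invented for illustration (none of 3 women promoted against 10 of 20 men), and Fisher's exact test is just one reasonable choice of test for a table this small, written out by hand so that nothing beyond the standard library is needed.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are groups (women, men); columns are (promoted, not promoted).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x promotions in the first row,
        # given the row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= observed * (1 + 1e-9))

# Hypothetical department: 0 of 3 women promoted, 10 of 20 men promoted.
p_value = fisher_exact_two_sided(0, 3, 10, 10)
print(f"0% vs 50% promoted, p = {p_value:.2f}")  # about 0.23
```

Even a gap as stark as 0% against 50% is, at this sample size, well within what chance alone would produce.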

The second category is surveys. Surveys can be a very useful way of assessing how many women are affected by a particular issue. With the advent of tools like SurveyMonkey they are technically very easy to set up, though it still takes some effort to write good questions. Results can often be presented in numerical form and analyzed with conventional statistical tools, which is an advantage for those who are uncomfortable with qualitative data. The disadvantages are:
  • Sample sizes may be small, making it difficult to achieve statistical significance.
  • It is difficult to assess how representative the sample is of the overall population. Respondents to a survey are often atypical, at least in the respect that they have bothered to complete it.
  • You only get responses to the questions that you thought of when you designed the survey. If someone raises a new issue in a free text response you have no way of knowing how many others might have agreed, unless you run a follow-up survey.

The third category is existing research. Usually you don't start a scientific project from scratch. You review the literature to see what is already known. Adopting a similar approach to tackling issues for women in STEM would be both more efficient and more effective than continually starting from scratch. Furthermore, some issues can only be identified via research projects. For example, the evidence for unconscious bias largely comes from published research studies. Research projects involving multivariate analysis of reams of data can offer useful information about what issues affect the recruitment and retention of women, but they require considerable time and resources. The principal obstacle to finding out what is already known is lack of time. Also, much experience is recorded in non-peer-reviewed reports published by organizations, for example the reports produced by the Royal Society of Chemistry, and these can be hard to discover. We need more books like Virginia Valian's 'Why So Slow? The Advancement of Women' that pull the research together and present it in an accessible way.

Finally, there are qualitative methods such as interviews and focus groups. Natural scientists tend to be uncomfortable with qualitative data, dismissing it as anecdotal. This is not surprising. You can't run a focus group for electrons to gather their experiences of being accelerated by an electromagnetic field or interview proteins about how they fold (though think how many person-hours of effort would be saved if you could). However, women are intelligent, articulate human beings. You can save a lot of time and effort by simply asking them what is important to them. All too often women's lived experiences are dismissed as irrelevant. There is an unfortunate connotation to ignoring qualitative data. Doing so sends the message: ‘Women are not capable of understanding or articulating their concerns. We need our armoury of experimental tools and statistical analysis to determine what is best for them.’

Suppose the purpose of gathering the data is to identify the factors that affect women's progress in a particular institution and to find ways of ameliorating their effects. In that case we do not need to apply the same criteria of rejection or provisional acceptance that we would apply if we were seeking a general predictive theory of why women are not thriving as well as might have been expected within the institution (assuming such a theory exists, which I doubt). Decisions have to be made using the best information available now, not put off until sufficient data have been accumulated for a result to be considered statistically significant at some conventional level. That does not mean we should ignore statistical significance. Devoting resources to fixing a random fluctuation is a waste of time and effort, though, of course, it is likely to appear successful: an unusually bad figure tends to be followed by a more typical one whether or not anything was done.
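
The last point, that a "fix" for a random fluctuation tends to look successful, can be illustrated with a rough simulation. Everything in this sketch is an assumption made for illustration: 1000 imaginary departments that all share the same underlying promotion rate of 0.3, with 10 candidates each per period, and an "intervention" that does nothing at all.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

TRUE_RATE = 0.3     # assumed: identical underlying promotion rate everywhere
CANDIDATES = 10     # assumed: candidates per department per period
DEPARTMENTS = 1000  # assumed: number of imaginary departments

def observed_rate():
    """Observed promotion rate for one department in one period."""
    promoted = sum(rng.random() < TRUE_RATE for _ in range(CANDIDATES))
    return promoted / CANDIDATES

before = [observed_rate() for _ in range(DEPARTMENTS)]
after = [observed_rate() for _ in range(DEPARTMENTS)]  # nothing has changed

# "Intervene" in the 100 departments that looked worst in the first period.
worst = sorted(range(DEPARTMENTS), key=lambda i: before[i])[:100]
mean_before = sum(before[i] for i in worst) / len(worst)
mean_after = sum(after[i] for i in worst) / len(worst)
print(f"worst 100 departments: before {mean_before:.2f}, after {mean_after:.2f}")
```

The departments picked out for looking worst drift back towards the common rate in the second period, so an intervention with no effect whatsoever would still appear to have worked.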

Quantitative and qualitative methods are complementary. Quantitative methods are good for demonstrating that there is a problem; qualitative methods are good for generating insights into what might be causing it. Decisions should be made on the basis of all the available data.
