Guide to Content Analysis

This page is designed primarily to assist ASELL team members in undertaking a qualitative content analysis of responses provided to open-ended survey questions. Such an analysis could be undertaken on the feedback available following workshop testing of each experiment, and will be required as part of the final submission for the evaluation of the student feedback collected during semester. The principles and approach described here are applicable more generally, and so it is also hoped that this guide may provide assistance with qualitative analysis of other data. This guide is written in an “everyday” style and seeks to minimise the use of jargon, in the hope that this makes it more accessible to those unfamiliar with the literature of education. An analysis of delegate feedback concerning the February 2006 ACELL workshop, undertaken by the ASELL Directors in the latter half of 2006, is used as an example of one approach to carrying out a content analysis. A link to an Excel spreadsheet containing these data is provided at the bottom of this page; this spreadsheet will be discussed in the text below, so it is recommended that you download it before reading on.

The first point to keep in mind is that there is no hard and fast set of rules governing how to perform a qualitative data analysis.  The following discussion describes one possible way of proceeding, but this is not meant to mandate that this approach be followed by ASELL submitters.  The reader is directed to references such as Mason (2002), Miles and Huberman (1994), or Silverman (2005), included at the bottom of this page, for a more comprehensive discussion of possible methodological approaches.

As practising academics, most of us will have sets of student evaluations of our teaching administered by our employing institutions. Most commonly, these surveys contain a mixture of data from both Likert-style (usually along a “Strongly Agree” to “Strongly Disagree” scale) and open-ended questions / statements (formally known as items). The responses to the Likert-scale items are an example of quantitative data, and are often summarised in terms of histograms and / or statistical summaries. (This use of some statistical summaries can be controversial, as such scales are usually ordinal, but this point will not be further explored here – see Michell (1986) and Andrich (1978) for further discussion.) Data from the open-ended items are qualitative, and are usually just given to us in raw form. Most of us read through these comments, usually consisting of responses to questions such as, “What is the best aspect of this person’s teaching?”, or “How can this person’s teaching be improved?”, to gain an overall “feel” for what the students think about our teaching, but will generally not undertake any more detailed analysis of these responses.

Qualitative data analysis involves being a little more sophisticated and rigorous in extracting information about what students are saying.

The following discussion is framed as a five-step process that covers the key elements of the approach to qualitative data analysis.

Step 1

The first issue to grapple with in performing qualitative data analysis is to decide what the purpose of the study is – in other words, to consider what you want as an end product.

For example, you may wish to survey your students as part of a pilot study to identify what they perceive as the important aspects of the topic in which you are interested.  The results of this pilot study might then be used to help design a more comprehensive survey instrument, allowing you to tailor the design to explore aspects that are of interest to both you and your students.  In such a case, you would likely be interested in categories of comments made by students, and their frequency, but probably much less interested in the individual comments themselves.

Alternatively, you may already have a survey instrument (as the ASELL Directors had for the February 2006 workshop feedback process), and be interested in learning, in a broad sense, whether respondents feel positively or negatively inclined towards the issues relating to the questions posed. Whilst potentially a crude analysis, consideration of respondents’ positive / negative perceptions can be useful in helping gain some insight into the question(s) at hand, but can also be potentially misleading. As an example, most delegates responded “negatively” when commenting on the structure and format of the February 2006 ACELL workshop. Without knowing more about the content of these responses, it would be possible to form a quite inaccurate picture of the workshop itself. Examination of the actual comments shows that these negative comments referred to aspects such as the quality of the student dormitory accommodation and the lack of air conditioning during the laboratory sessions (during a hot, humid period in Sydney). Very few negative comments were made regarding the educational aspects of the ACELL workshop – in fact, a considerable number of positive comments were made in this area. This illustrates the importance of treating simple analyses, such as whether comments are overall positive or negative, with caution, especially if such an analysis is carried out in isolation.

The positive / negative “perception” analysis lends itself easily to statistical analysis, which can be useful to compare with statistical analyses of quantitative Likert data. Quite often this type of analysis does not require detailed data entry. Rather, a simple count of the number of positive and negative comments may be all that is required. This can get a little more complicated if a single respondent offers more than one comment and these comments are a mixture of positive and negative. If this becomes an issue, some separating of the response into several thematically distinct comments may be required. This process is discussed in Step 2 below.
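The simple counting described above can be sketched in a few lines of Python. This is only an illustration: the sentiment labels are invented placeholders rather than workshop data, and the function name `tally_sentiment` is hypothetical.

```python
from collections import Counter

# Hypothetical sentiment labels assigned by the analyst, one per comment.
# A respondent who made both a positive and a negative remark contributes
# two entries, because the response was split into separate comments.
labels = ["negative", "negative", "positive", "negative", "positive"]

def tally_sentiment(labels):
    """Return the count and percentage share of each sentiment label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.items()}

summary = tally_sentiment(labels)
for label, (n, pct) in sorted(summary.items()):
    print(f"{label}: {n} comments ({pct}%)")
```

Counting per comment rather than per respondent is a design choice; if you prefer one sentiment per respondent, you would need a rule for mixed responses.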

To gain a comprehensive understanding of the views of respondents to the open-ended questions asked, a detailed content analysis is often required.

A third possible goal might be the answers to specific questions asked on the survey. For example, the ASELL student evaluation survey includes the question “What did you think was the main lesson to be learnt from the experiment?”, and it is highly likely that the actual answer to this question will be of interest. This will likely influence choices made in the steps outlined below, as it is likely that some coding will be restricted to responses to that specific item.

In the surveying of delegates about their views of the February 2006 ACELL workshop, we were not interested in the answers provided per se.  What we were interested in was to develop a broad and deep understanding of the experiences of delegates, both staff and students, to the workshop.  The items themselves were merely probes designed to elicit a range of responses that gave us as complete as possible a picture of delegates’ experiences.  As a consequence, our decisions in the steps which follow allowed coding choices which focussed on themes emerging in response to multiple items.

Step 2

Having decided on the purpose of the study, if a detailed content analysis is required, the next step is to undertake data entry, separating comments where they seem thematically distinct. At its simplest, if N respondents each offer one, and only one, comment in response to the item, then a total of N comments are considered. For example, consider the following response from the February 2006 ACELL workshop survey. It is clear that only one comment, regarding the interest of academic staff in student education, is being made.

It is not uncommon for respondents to offer more than one thematically distinct comment to each item asked.  Under these circumstances more than N comments need to be considered.  Sometimes it is relatively straightforward to identify that multiple comments are provided, as in the example below where the respondent has used dot points to indicate that three distinct comments are being made.

However, it is likely that some cases will require judgment to decide how many, if any, multiple comments are being provided.  For example, consider the example below:

How many comments do you think are included in this response? Remember, there is no right or wrong answer to this question, but it is important to try to treat all comments in a consistent way. One might identify three distinct themes: the technical competency of physical chemists, delegate engagement at the workshop, and conduct of the “beer sessions” (debriefs). However, as a general rule of thumb it is undesirable to separate out comments if such separation will rob them of context. Hence, we have chosen to enter the response above as two thematically distinct comments. If you examine the Raw Data sheet in the Excel workbook, you will see that each respondent is placed in a separate row, and that multiple comments have been allowed in response to each open-ended item. The response above can be seen on row 42 (cells W42 and X42).

As a final example, how many distinct comments do you think are contained in the response below?

We have interpreted this response as containing only one thematically distinct comment (cell S13), and the workbook shows that this student’s response to the previous item (concerning the most valuable aspects of the workshop) had three distinct comments (cells O13, P13, and Q13).
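For readers who prefer to work outside a spreadsheet, the layout described in this step (one row per respondent, with room for several comments under each item) might be represented as in the sketch below. The item names, roles, and comment text are invented placeholders, and the structure is an assumption about one convenient representation, not a description of the workbook itself.

```python
# Hypothetical data-entry layout: one dict per respondent; each item maps
# to a list of thematically distinct comments (placeholders, not real data).
raw_data = [
    {"respondent": 1, "role": "staff",
     "responses": {"most_valuable": ["comment A"], "improvements": []}},
    {"respondent": 2, "role": "student",
     "responses": {"most_valuable": ["comment B", "comment C", "comment D"],
                   "improvements": ["comment E"]}},
]

def flatten(raw_data):
    """Yield one record per comment, keeping a link back to the respondent."""
    for row in raw_data:
        for item, comments in row["responses"].items():
            for position, text in enumerate(comments, start=1):
                yield {"respondent": row["respondent"], "role": row["role"],
                       "item": item, "position": position, "text": text}

comments = list(flatten(raw_data))
print(len(raw_data), "respondents,", len(comments), "comments")
```

Keeping the respondent number on every record means that a comment can always be traced back to its original row if its context is needed later.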

Step 3

Having completed the data entry, for example, by transcribing all comments into a spreadsheet or other database, the next step is to code each comment.  That is, each individual comment entered above now needs to be categorised.  Based upon your work already in Steps 1 and 2 you will likely have some idea of what categories are present, and it is likely that some categories could have been predicted in advance.  However, as you read through the comments it is common for other categories raised by the respondents and not anticipated by you to become apparent.

The presentation in the Raw Data sheet is convenient for data entry, especially if you have a large volume of data, and should be kept to allow for later cross-referencing of comments if needed.  However, it is not in a form that is convenient for coding.  One useful approach can be to bring all comments under each item together.  Academic level users of this website will be able to see such a summary in the workshop feedback available with experiments published in the database.  This type of data presentation can also be seen on the Qualitative Coding sheet in the Excel workbook related to this commentary.
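Bringing all comments under each item together can be sketched as follows. The records, item names, and the helper `group_by_item` are hypothetical and shown only to illustrate the regrouping.

```python
from collections import defaultdict

# Hypothetical per-comment records (placeholder text, invented item names).
comments = [
    {"respondent": 1, "item": "most_valuable", "text": "comment A"},
    {"respondent": 2, "item": "improvements", "text": "comment B"},
    {"respondent": 2, "item": "most_valuable", "text": "comment C"},
]

def group_by_item(comments):
    """Collect comments under the item they answer, keeping the respondent id."""
    grouped = defaultdict(list)
    for c in comments:
        grouped[c["item"]].append((c["respondent"], c["text"]))
    return dict(grouped)

by_item = group_by_item(comments)
for item, entries in by_item.items():
    print(item)
    for respondent, text in entries:
        print(f"  [{respondent}] {text}")
```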

Sometimes an individual comment seems to naturally fit into more than one category, and under these circumstances, double coding may be appropriate. However, if you find that a comment seems to fit into three or more categories, then you may need to revise the categories. Moreover, it is undesirable for there to be a large amount of double coding, as this may indicate that the categories themselves are insufficiently distinct. In other words, the categories, or codes, you identify can (and probably should) have some overlap, but if the extent of overlap is too large, then some revision is probably needed. If you are collaborating with another researcher, a second opinion can be extremely helpful in this area.
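One way to keep an eye on the extent of double coding is to count it. The sketch below assumes each comment has been given a set of codes; the comment identifiers and the code “INFRA” are invented for illustration, and no particular threshold for “too much” overlap is implied.

```python
# Hypothetical coded comments: each comment id maps to the set of codes assigned.
coded = {
    "c1": {"SAFE"},
    "c2": {"WD", "SAFE"},
    "c3": {"WD"},
    "c4": {"WD", "SAFE", "INFRA"},
}

def coding_overlap(coded):
    """Return the share of multiply-coded comments and those with 3+ codes."""
    multi = [cid for cid, codes in coded.items() if len(codes) >= 2]
    heavy = sorted(cid for cid, codes in coded.items() if len(codes) >= 3)
    return len(multi) / len(coded), heavy

share, needs_review = coding_overlap(coded)
print(f"{share:.0%} of comments carry more than one code")
print("three or more codes:", needs_review)
```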

When performing the coding process, a common trap for beginners (which can be seen on our worksheet) is to use numbers for each category.  This practice should be avoided as when you’re trying to work with a database of many comments it can become difficult to remember what theme is Category 5, for example.  Rather, it is better practice to use unambiguous descriptions for each code.  As an example, the code “SAFE” might be appropriate for the category related to issues surrounding safety.

Another common trap for beginners is to identify too many codes.  Whilst there are times that a large number of codes might be justified, it is worth considering whether your codes are, in fact, broadly different, or whether they are sub-categories which should be grouped together under a broader theme.

It’s probably best to illustrate the coding process with an example from our spreadsheet.  The worksheet Qualitative Coding shows that the comments have been grouped by item, with numerical codes used and a key included (it starts at row 119).  You can see that both “beginner pitfalls” mentioned above can be seen – there are 18 categories included, but it is questionable that 18 distinct themes are represented, and the numerical codes are confusing (try remembering what category 8 is as you look through the spreadsheet!). One positive aspect of this first pass through the data is that there is not an excessive amount of double coding, although the presence of triple coding is potentially problematic.

Note that this does not mean that the work done to this point is wasted, but it does suggest that some rationalisation may be needed. In this case, the researchers involved separately considered the coding categories present, and then collaboratively concluded that many of these codes could be combined into much broader themes. Codes 8, 12, and 13, for example, were all judged to relate to workshop design, which was then given the broad code WD. In some circumstances, these original codes might usefully be retained as sub-categories of the overall WD code. For example, category 8 related specifically to the program of the workshop, and could easily have been coded as something like WD-prog.
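The consolidation described above amounts to a lookup from first-pass codes to broader themes, which can be sketched as below. Only the grouping of codes 8, 12, and 13 under WD, and the label WD-prog for code 8, come from the text; the sub-category labels for 12 and 13 and the function `recode` are invented placeholders.

```python
# Hypothetical mapping from first-pass numerical codes to broader themes.
# Only 8, 12 and 13 -> WD (with 8 as WD-prog) comes from the text; the
# sub-category names for 12 and 13 are invented placeholders.
CODE_MAP = {
    8: ("WD", "WD-prog"),
    12: ("WD", "WD-sub12"),
    13: ("WD", "WD-sub13"),
}

def recode(numeric_codes, code_map=CODE_MAP):
    """Translate numeric codes into (broad, sub) labels; flag unmapped codes."""
    mapped, unmapped = [], []
    for code in numeric_codes:
        if code in code_map:
            mapped.append(code_map[code])
        else:
            unmapped.append(code)
    return mapped, unmapped

mapped, unmapped = recode([8, 13, 5])
broad_themes = sorted({broad for broad, _ in mapped})
print(mapped, unmapped, broad_themes)
```

Returning the unmapped codes separately, rather than discarding them, makes it obvious which first-pass categories still need a decision.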

It is worth noting that approaches to coding vary between individuals, and disagreement is common and desirable, as the process of coming to a consensus between researchers (known as investigator triangulation) will often produce a better and more robust coding scheme.  In the present example, the two authors of this page took opposite approaches, one developing about five broad categories, which would have needed to be subsequently sub-divided, and the other having eighteen initial categories, which were subsequently consolidated.  There is nothing inherently bad about either approach, and experience will likely lead different people to settle on different approaches.  As a final comment, it should be kept in mind that as the analysis above proceeds, it may be necessary to cycle through Steps 2 to 4 to address any shortcomings in the coding process, including in revising codes – this is not a problem, providing that the coding is ultimately consistent.
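When two researchers code the same comments independently, listing where they differ gives a concrete agenda for the consensus discussion. The sketch below is one simple way to do this, assuming a single code per comment; the data and the function `compare_coders` are hypothetical, and the raw agreement rate it reports makes no correction for agreement expected by chance.

```python
# Hypothetical codes assigned independently by two researchers to the same comments.
coder_a = {"c1": "WD", "c2": "SAFE", "c3": "WD", "c4": "INFRA"}
coder_b = {"c1": "WD", "c2": "WD", "c3": "WD", "c4": "INFRA"}

def compare_coders(a, b):
    """Return the raw agreement rate and the comments on which the coders differ."""
    shared = sorted(set(a) & set(b))
    differing = {cid: (a[cid], b[cid]) for cid in shared if a[cid] != b[cid]}
    agreement = (len(shared) - len(differing)) / len(shared)
    return agreement, differing

agreement, to_discuss = compare_coders(coder_a, coder_b)
print(f"agreement: {agreement:.0%}")
for cid, (code_a, code_b) in to_discuss.items():
    print(f"{cid}: coder A chose {code_a}, coder B chose {code_b}")
```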

Step 4

The fourth step in the process involves collating and interpreting the data in terms of the codes identified. The final worksheet in the spreadsheet, titled “Revised Broader Categories”, re-tabulates the delegate responses into each of the revised broad categories, whilst also maintaining an identifier of (a) whether the comment was made by a staff or student workshop delegate, (b) the question the comment was made in response to (as this can be necessary for understanding the context in which a comment is made), and (c) an interpretation of whether the comment is expressing a positive or negative sentiment. The data are now set up for both a statistical “perception” analysis in terms of the positive / negative sentiments expressed (see our discussion of Step 1 above) as well as for an analysis of the story being told.
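Once each comment carries a broad category, a delegate type, the item it answered, and a sentiment, a cross-tabulation is straightforward. The rows below are invented placeholders (including the item labels Q1 and Q2), and `sentiment_table` is a hypothetical helper rather than anything taken from the workbook.

```python
from collections import Counter

# Hypothetical rows mirroring the identifiers described above.
rows = [
    {"category": "WD", "role": "staff", "item": "Q1", "sentiment": "negative"},
    {"category": "WD", "role": "student", "item": "Q1", "sentiment": "negative"},
    {"category": "WD", "role": "student", "item": "Q2", "sentiment": "positive"},
    {"category": "SAFE", "role": "staff", "item": "Q2", "sentiment": "positive"},
]

def sentiment_table(rows):
    """Count comments for each (category, role, sentiment) combination."""
    table = Counter()
    for r in rows:
        table[(r["category"], r["role"], r["sentiment"])] += 1
    return table

table = sentiment_table(rows)
for (category, role, sentiment), n in sorted(table.items()):
    print(f"{category:5} {role:8} {sentiment:9} {n}")
```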

Step 5

The final step in the process of content analysis is the interpretation of the data. A key consideration here is triangulation of data – in other words, using all of the data that you have related to a particular topic to illuminate that topic. (The book chapter by Sidell (1993) provides some straightforward discussion of triangulation, although the examples given are set in a health / aging context.) In the event that you have any quantitative data (such as in response to Likert-style items), a good place to start might be by comparing those responses with those from the qualitative data. In the event that there is consensus, this might make clearer the nuances in the story the data support. For example, the quantitative data may show that delegates agreed that participating in the workshop had increased their understanding of educational issues. Qualitative data may provide further detail by identifying educational areas that were covered particularly well, or by indicating areas which delegates would appreciate being given further attention in the future.

Another possibility is that the data can appear contradictory. For example, the workshop data above under the Workshop Design category are quite negative in terms of the sentiments expressed by both staff and students. This does not seem to fit with the impression from the Likert data, which suggest that delegates thought that the whole experience was quite worthwhile. Are the quantitative and qualitative data contradicting each other? Well, that’s where the content analysis helps to address this issue.

The important thing to keep in mind is that you are looking for evidence that tells a story. In this example, the story tells us that the delegates were negative about what we might call infrastructure aspects of the workshop; things like the quality of the student dormitory accommodation and the lack of air conditioning within the laboratories. Delegates were not expressing negative sentiments about the structure and design of the educational aspects of the workshop.

A third possibility is that the content analysis can reveal information which may be important and which is not available from other data sources. For example, quantitative data from an item such as “It was clear to me what I was expected to learn from completing this experiment” may show broad agreement, but responses to the open-ended item “What did you think was the main lesson to be learnt from the experiment?” may show that the focus of the experiment from the students’ perspective is different from that intended. Alternatively, such an analysis may reveal that a small group (say 10% of the cohort) are not sure what they were supposed to learn, which would suggest that improvements could be made to the student notes. Finally, such responses could indicate that the cohort is divided into groups, each seeing the experiment as having a different focus. Such information is valuable both to the submitter and to people considering introducing an experiment into their laboratory program.
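The scenarios above (a focus that differs from the one intended, a small group who are unsure, or a divided cohort) all show up in the proportion of students whose “main lesson” response falls under each code. A minimal sketch, using invented code names and counts chosen only to echo the 10% example:

```python
from collections import Counter

# Hypothetical codes for "main lesson" responses, one per student.
main_lesson_codes = ["intended"] * 6 + ["technique"] * 3 + ["unsure"] * 1

def code_shares(codes):
    """Return each code's share of the cohort, most common first."""
    counts = Counter(codes)
    total = len(codes)
    return {code: n / total for code, n in counts.most_common()}

shares = code_shares(main_lesson_codes)
for code, share in shares.items():
    print(f"{code}: {share:.0%}")
```

Small shares are reported rather than filtered out, since an infrequent code may still point to something worth acting on.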

At the bottom of this page is a conference proceeding that the ASELL team have published recently (Read et al., 2006) that includes some of the data from this analysis.  A more complete and detailed discussion of our analysis of the February 2006 ACELL workshop delegate experience is currently being reviewed and will be available on this website in the near future.


The overall goal of the process of content analysis is to look for the story that is being told by those surveyed. Look for the most interesting and informative comments, but be sure to quantify how common these sentiments are. Importantly, you do not necessarily want to ignore infrequent comments; such data may raise important issues that the majority of stakeholders failed to identify. Moreover, from an educational perspective, infrequent comments may highlight important but uncommon misconceptions.

As part of getting the educational analysis of your submitted experiment fully published under the ASELL umbrella, you will be required to undertake a content analysis of student survey data. Such an analysis will assist to identify the educational strengths and potential pitfalls of the experiment, and thereby assist you and others in the ASELL community to provide the best possible student laboratory learning experience. We hope that this “how to” guide assists you in your own analysis. Don’t forget, however, that we are here to help you. Please contact Justin in the first instance if you are struggling.


Andrich, D. (1978). Relationships between the Thurstone and Rasch approaches to item scaling. Applied Psychological Measurement, 2, 449-460.

Mason, J. (2002). Qualitative Researching. London, SAGE Publications.

Michell, J. (1986). Measurement scales and statistics: a clash of paradigms. Psychological Bulletin, 100(3), 398-407.

Miles, M. B. and Huberman, A. M. (1994). Qualitative Data Analysis: An Expanded Sourcebook. London, SAGE Publications.

Read, J. R., Buntine, M. A., Crisp, G. T., Barrie, S. C., George, A. V., Kable, S. H., Bucat, R. B. and Jamie, I. M. (2006). The ACELL project: Student participation, professional development, and improving laboratory learning. In Symposium Proceedings: Assessment in Science Teaching and Learning (pp. 113-119). Sydney, NSW: UniServe Science.

Sidell, M. (1993). Interpreting. In P. Shakespeare, D. Atkinson and S. French (Eds), Reflecting on Research Practice: Issues in Health and Social Welfare (pp. 106-118). Buckingham, UK: Open University Press.

Silverman, D. (2005). Doing Qualitative Research: A Practical Handbook. London, SAGE Publications.
Copyright ©: Mark A. Buntine and Justin R. Read, 2007

This page was last updated on 22 January 2007; it may be referenced as:

Buntine, M. A. and Read, J. R. (2007). Guide to Content Analysis. Available from

Related files:

  1. Data from Survey of February 2006 Workshop Delegates: This file contains the raw data from the survey of delegates at the February 2006 ACELL workshop, plus some of the content analysis of the qualitative data from the open-ended survey items.  It is provided to be read in conjunction with the Guide to Content Analysis provided on the ACELL website. Download Related File
  2. The ACELL Project: Student Participation, Professional Development, and Improving Laboratory Learning: This refereed conference proceeding focusses on the February 2006 Workshop, and the feedback collected at that workshop. Full reference: Read, J. R., Buntine, M. A., Crisp, G. T., Barrie, S. C., George, A. V., Kable, S. H., Bucat, R. B. and Jamie, I. M. (2006). The ACELL project: Student participation, professional development, and improving laboratory learning. In Symposium Proceedings: Assessment in Science Teaching and Learning (pp. 113-119). Sydney, NSW: UniServe Science. Download Related File