Coding data for qualitative analysis

Ross Woods, 2022-24

Thematic coding is an increasingly common way to analyze documentary data, usually transcripts of interviews or focus groups. As a qualitative methodology, it gives researchers a way to interpret data.

Thematic coding is derived from a research approach called grounded theory. In essence, this is a method of using a comprehensive set of examples to identify patterns, from which the researcher can create a theory. The theory is justified by the range of real examples.

Thematic coding is really only a systematic way of analyzing data to reach a conclusion. It has become increasingly popular in recent years, partly because it looks like a set of steps. However, the later stages are less procedural and require more thought. It is not actually a set of steps, but is more like a set of phases that can overlap. For example, you can start transcribing and analyzing data as soon as it is collected.

Thematic coding has several advantages. First, the researcher simply has to follow the method. Second, it gives a way to systematically analyze lots of data, such as when writing a longer thesis or a dissertation. Third, the researcher can use it in stages, giving an opportunity to adapt the method as needed, and perhaps hold more interviews. Fourth, it is easier if the researcher uses voice-to-text software to transcribe interviews.

It also has several disadvantages. Although it is quite flexible, it probably doesn’t allow much scope for innovation. If you do not use transcription software, it is very time-consuming to transcribe interviews by hand, or quite expensive if you have to pay someone else to do it for you.

A series of stages

Stage 1: Continue keeping a diary

You should already be keeping a written diary of your methodology, including what you did, why you did it, your methods, and your observations. Your description is essential to your accountability. In principle, it must be detailed enough to enable someone else to follow your method.

Add records of your reflections to your diary. You can start to informally analyze data as soon as you start collecting it. You should take note of anything relevant to your research question, for example:

  1. Questions on what seems to be important or curious
  2. Observations
  3. Emerging patterns
  4. Questions that aren’t answered yet
  5. Outliers and anomalies
  6. Specific “turns of phrase,” interesting quotations that best encapsulate a strand of meaning in the data.
  7. Contradictions and things that don’t seem to make sense.
  8. Suspicions that things are not quite what they appear to be.

In your diary, you should also write down the reasons why you interpreted the data the way you did. The description will probably be quite simple at first, but any later changes or elaborations will be significant because they indicate a better interpretation of the data.

💡 If you are writing a dissertation:

Stage 2: Decide whether or not you will use deductive themes

You can start formulating deductive themes very early in the whole process, even before you collect any data. It is quite permissible to derive a set of deductive themes from your literature review or your statement of the research question. However, it would be a mistake to use deductive themes exclusively because other unexpected but significant themes might emerge later on in the data.

The alternative is to use inductive themes, that is, themes that emerge from the data and that you identify as you assign codes. You can even modify your system of inductive themes during data-gathering and analysis in order to get themes that better represent your data.

Stage 3: Let your system evolve

Some of your ongoing analysis might affect your data gathering. Qualitative research is often iterative, and this method allows you to improve your data collection and analysis as you progress.

Stage 4: When to stop collecting data

One of these methods will help you decide when to stop collecting data:

  1. You have reached “data saturation”* when you have gathered data such that gathering more would not improve, strengthen, or add to your conclusions.
  2. In some kinds of research, such as ethnography, it is usually possible to keep collecting more and more data. In these cases, the criterion is your research question. You can stop collecting data when you have enough to confirm your conclusions.
  3. In other cases, you can stop when you have obtained data from everybody in your sample or from everybody in your population.
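One informal way to watch for data saturation is to count how many codes each new interview adds that you have not seen before. The sketch below is only an illustration. The interview codes are made up, and the function names `new_codes_per_interview` and `looks_saturated`, as well as the rule of "no new codes in the last two interviews," are assumptions rather than an established standard. A count like this can support your judgment but cannot replace it.

```python
def new_codes_per_interview(interviews):
    """Return, for each interview in order, how many codes it adds
    that did not appear in any earlier interview."""
    seen = set()
    counts = []
    for codes in interviews:
        fresh = set(codes) - seen
        counts.append(len(fresh))
        seen |= fresh
    return counts


def looks_saturated(counts, window=2):
    """Hypothetical rule of thumb: the last `window` interviews added no new codes."""
    return len(counts) >= window and all(c == 0 for c in counts[-window:])


# Made-up example: the codes assigned to five interviews, in the order collected.
interviews = [
    ["workload", "isolation"],
    ["workload", "mentoring"],
    ["isolation", "funding"],
    ["workload", "funding"],
    ["mentoring", "isolation"],
]

counts = new_codes_per_interview(interviews)
print(counts)                   # [2, 1, 1, 0, 0]
print(looks_saturated(counts))  # True
```

If the count of new codes keeps rising, that suggests you are still learning something from each interview and should probably keep collecting.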

Stage 5: Transcription

You can start transcription as soon as you have collected data. Transcribe it word-for-word into documents, although you might be able to exclude anything clearly irrelevant to your research purpose. Warning: Some things that look irrelevant at first might appear more relevant later on when you understand the data better.

Most researchers prefer to use voice-to-text software or external services to do transcriptions. A few, however, prefer to do it manually because it brings them very close to the data, even though it is horrendously time-consuming.

Stage 6: Familiarize yourself with the data

Start reading and re-reading all your data while it is still coming in, and make diary notes of any other questions arising. (If you transcribe manually, this will come very easily.)

When you become very familiar with your data, it might seem like very little even when you have enough. Don’t worry.

Stage 7: Select quotations

You might already have started collecting quotations, but you can now treat this as a separate stage. Using direct quotations from respondents in your final report has two particular benefits:

  1. Your readers will see the lives of real people whom you have interviewed. This personalizes your research report and makes it easier to read.
  2. It shortens the distance between respondents and reader. This helps to prevent an analysis that is largely an artificial construct that you have created.

Stage 8: Coding

If possible, start coding as soon as you have transcriptions and are familiar with the texts, while you are still collecting data.

Mark all parts of the text that are relevant to your research question with a color-code or symbol. These might be “recurring patterns, terms, or visual elements.” (Naeem et al. p. 2.) On each part of the text that you marked, put a brief label, either a single word or a short phrase, that says what is going on. These labels are your codes. Coding is itself part of analysis, because you are sorting raw data into structured meaning.

You now have a patchwork of the meanings of everything in your data that is relevant to theory development. It is also simpler and briefer than the full text of raw data.

It is good practice to have someone else check your coding. It will help prevent or minimize personal bias in interpreting data.
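If a second person codes the same segments independently, you can compare the two sets of codes. The sketch below computes Cohen's kappa, one commonly used agreement statistic. The text above does not prescribe any particular statistic, so treat this choice as an assumption. It also assumes that each coder assigned exactly one code to each segment, and the codes shown are made up.

```python
from collections import Counter


def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each gave one code to the same segments."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must label the same non-empty list of segments.")
    n = len(coder_a)
    # Proportion of segments on which the two coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from how often each coder used each code.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used one identical code throughout
    return (observed - expected) / (1 - expected)


# Made-up example: six segments coded independently by two people.
coder_a = ["workload", "workload", "isolation", "funding", "isolation", "workload"]
coder_b = ["workload", "isolation", "isolation", "funding", "isolation", "workload"]

kappa = cohen_kappa(coder_a, coder_b)
print(round(kappa, 2))  # 0.74
```

A low value is a prompt to discuss the segments where you disagreed, not a verdict on the coding. The disagreements themselves often show where a code needs a clearer definition.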

Stage 9: Assign themes

Group related codes together and represent them with a theme, that is, an overarching idea that represents what is happening. Themes are a higher level of abstraction.
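If you keep your codes in a file or spreadsheet, grouping them into themes can be sketched as below. This is a minimal illustration only. The segments, the codes, the theme names, and the variable names are all hypothetical. The grouping itself remains an interpretive decision that you make, and the code only keeps the bookkeeping tidy.

```python
from collections import defaultdict

# Made-up coded segments: (interview id, code, quotation).
coded_segments = [
    ("int01", "long hours", "I was marking until midnight."),
    ("int01", "no mentor", "Nobody showed me how things worked."),
    ("int02", "long hours", "Weekends just disappeared."),
    ("int02", "working alone", "I hardly saw my colleagues."),
    ("int03", "no mentor", "I had to figure it out myself."),
]

# Hypothetical theme scheme: each theme groups related codes.
themes = {
    "workload": ["long hours"],
    "lack of support": ["no mentor", "working alone"],
}

# Invert the scheme so that each code points to its theme.
code_to_theme = {code: theme for theme, codes in themes.items() for code in codes}

segments_by_theme = defaultdict(list)
unassigned = set()
for interview, code, quote in coded_segments:
    theme = code_to_theme.get(code)
    if theme is None:
        unassigned.add(code)  # a code that no theme covers yet
    else:
        segments_by_theme[theme].append((interview, code, quote))

for theme, segments in segments_by_theme.items():
    print(theme, len(segments))
print("unassigned codes:", sorted(unassigned))
```

Listing the unassigned codes is a simple check that every code belongs to some theme. Any code left over may point to a theme you have not yet named.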

Stage 10: Check your themes

Do your themes accurately represent the theoretical ideas in your data and codes?

Stage 11: Develop a Conceptual Model

When you have created a system of themes, compare different occurrences and look for patterns in the data. By this stage, you should be able to see patterns; the sooner you spot the patterns and confirm them, the faster you make progress. You will find that you read the transcripts again and again, and become very familiar with them.

What are the relationships between codes and themes? You can use diagrams or models to represent the relationships among these concepts. (Naeem et al. p. 4.) Can you accurately define those relationships and demonstrate them from your data?

You can try this approach as long as you don't treat it as a rigid set of steps that will meet all your conceptualization needs**:

  1. Think of the object of your research as a phenomenon that is not yet understood.
  2. Think of your data as a set of examples of the phenomenon.
  3. Ensure that you really have only one phenomenon, and not multiple different phenomena that should be kept separate. (This is the eidetic question.)
  4. Sift your data to answer the following questions:
    1. What events or conditions actually caused this phenomenon?
    2. What intervening factors determine the path of events and cause variations in outcomes?
    3. In what contexts does it occur? Describe it as a specific set of properties.
    4. What interactions occurred?
    5. Were there changes in the phenomenon over time?
    6. Were there changes in the whole process over time?
    7. What were the particular aims or purposes of the phenomenon?
    8. Were there occurrences of the phenomenon that failed to achieve their purpose?
    9. What are the results of the phenomenon?
  5. Compare examples to resolve apparent contradictions.
  6. If necessary, get more data to answer all the above questions accurately.

Avoid these mistakes

Some mistakes are easy to make if you make incorrect assumptions about your respondents.

Questions

How many themes?

There is no rule about specific numbers of themes. The principle is that you need enough to represent the data accurately and to help you reach sound conclusions. If the number of codes hinders and confuses your analysis, you should ask whether the number of them is the cause of the difficulty.

How can I know that the data will answer my research question?

It will if your questions gather data that addresses the research question. (This is why alignment is so valuable.)

However, the answer might not be the answer you anticipate. Some students think that their data is wrong if it leads to conclusions that they didn't expect.

How can I code qualitative data from my interviews so that I work smarter, not harder?***

Organizing large amounts of data is possible with a computer, but it might not be the best way for everybody. Besides, if you make a mistake with a computer you might not notice it or might not be able to reverse it. Advice so far:

  1. Many students use software to do coding and this works for them. If so, keep backups of your original data files. Make progressive backups every time you make major changes.
  2. Many students like to print their data out on paper, use color coding, and perhaps even put it on a wall as a huge chart.
    1. Some people simply prefer to work with paper.
    2. Printouts help them to visualize the entire dataset. It might also help them to feel more familiar with the data and perhaps to identify patterns in it. At the very least, seeing the whole thing at once will make you feel very satisfied.
    3. Paper printouts might even be necessary if you still cannot identify patterns in the data after coding with software.
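The advice above about progressive backups can be sketched as a small script that saves a timestamped copy of a data file before each major change. This is only one possible approach. The function name `backup_file`, the file name `interview_codes.csv`, and the `backups` folder are hypothetical, and your coding software may offer its own backup feature that you should prefer.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_file(source, backup_dir):
    """Copy `source` into `backup_dir` under a timestamped name and return the new path."""
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    target = backup_dir / f"{source.stem}_{stamp}{source.suffix}"
    # copy2 also tries to preserve file metadata such as the modification time.
    shutil.copy2(source, target)
    return target


# Demonstration in a temporary folder so that no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "interview_codes.csv"
    original.write_text("segment,code\n1,workload\n", encoding="utf-8")
    backup_path = backup_file(original, Path(tmp) / "backups")
    backup_ok = (
        backup_path.exists()
        and backup_path.read_text(encoding="utf-8") == original.read_text(encoding="utf-8")
    )
    print(backup_path.name, backup_ok)
```

Because each copy has its own timestamp in the name, earlier versions are never overwritten, so you can return to the state before any mistake.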

__________
* A mathematical proof of data saturation is unlikely because qualitative data is not appropriate for a mathematical proof.
** Ross Woods, 2020, '24, derived from Strauss and Corbin, 1990, pp. 99-107.
*** With thanks to Richard Scott Baskas, Rainee Bryant, Lynda Davis.

Muhammad Naeem, Wilson Ozuem, Kerry Howell, and Silvia Ranfagni. 2023. A Step-by-Step Process of Thematic Analysis to Develop a Conceptual Model in Qualitative Research. International Journal of Qualitative Methods 22: 1–18. DOI: 10.1177/16094069231205789

Ross Woods, 2020, '24. Toolkit of research methods.

Anselm Strauss and Juliet Corbin. 1990. Basics of Qualitative Research: Grounded Theory Procedures and Techniques (Newbury Park, Ca.: Sage Publications).