Using Big Data

Ross Woods, 2020

An educational institution can get a detailed picture of its students and its programs by analysing its database. For the approach to work, it needs to put all its data in one database, have enough students to make statistical results meaningful, and have suitable software. Computing has the advantage that it is not dependent on small samples, but can use the whole data set. Because the range of possible variables is extremely wide, managers can answer questions like these (see the sketch after this list):

  1. Do male students get higher grades than female students?
  2. Do students from Country X get better grades than students from Country Y?
  3. Do younger students get better grades than older students?
  4. Do younger students have higher dropout rates than older students?
  5. At what stages are students most likely to drop out?
  6. What factors indicate a high risk that a student will drop out in the future?
  7. Which students have grades that tend to improve over the time of their degree program?
  8. Which students have grades that tend to worsen over the time of their degree program?
  9. Do students from ethnic group X do better or worse than average?
  10. Do some particular units get below average assessment results?
  11. Do some particular assessment activities get below average assessment results?
  12. How long do students spend online?
  13. How long do students spend on each kind of interaction?
  14. How does student time online relate to the semester hour rating for each unit?
  15. What is the cost-effectiveness of each department of the institution?
  16. What is the cost-effectiveness of each tutor? Do tutors differ in their ability to attract and retain students?
  17. Do students make unrealistic demands of tutors? And how should we define unrealistic demands?
  18. Do particular course designs improve student retention?
  19. Do particular demographics tend toward plagiarism?
  20. Do particular demographics tend to have difficulty paying fees?
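
For example, the first question maps onto a single query over the unified database. The sketch below is illustrative only: it assumes an SQLite database file and hypothetical students and grades tables, and real schemas will differ.

  # Minimal sketch: answering question 1 with one SQL query over a
  # single institutional database. Table and column names are hypothetical.
  import sqlite3

  conn = sqlite3.connect("institution.db")  # assumed unified database file

  # Average grade by gender, using every record rather than a sample.
  rows = conn.execute("""
      SELECT s.gender, AVG(g.grade) AS avg_grade, COUNT(*) AS n
      FROM students s
      JOIN grades g ON g.student_id = s.student_id
      GROUP BY s.gender
  """).fetchall()

  for gender, avg_grade, n in rows:
      print(f"{gender}: mean grade {avg_grade:.2f} over {n} results")
  conn.close()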

Comparisons can all be expressed as hypotheses, e.g. “In degree X, students in age group Y achieve higher average grades than students in age group Z.” The data are then used to test the hypothesis.
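
As a minimal sketch of such a test, a two-sample comparison such as Welch’s t-test can check whether the difference between two groups is statistically significant. The grade lists below are invented for illustration.

  # Minimal sketch: testing whether age group Y outscores age group Z
  # in degree X. The grade lists are illustrative only.
  from scipy import stats

  grades_group_y = [72, 68, 75, 81, 77, 70, 74]  # hypothetical grades
  grades_group_z = [65, 70, 62, 74, 68, 66, 71]

  # Welch's t-test does not assume equal variances between the groups.
  t_stat, p_value = stats.ttest_ind(grades_group_y, grades_group_z,
                                    equal_var=False)
  print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
  if p_value < 0.05:
      print("The difference is statistically significant at the 5% level.")
  else:
      print("No significant difference; the hypothesis is not supported.")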

Many insignificant differences might not indicate any useful conclusion, and the hypothesis could be safely ignored. Some insignificant differences, however, can be helpful; for example, finding no significant difference in grades between two demographic groups can itself be reassuring evidence that neither group is disadvantaged.

When statistical comparisons identify a problem, they are normally only descriptive. That is, they generally identify neither the cause nor a solution. For example, the data might show that a particular unit gets below-average assessment results without showing why, or what to do about it.

Correlation between variables does not demonstrate a cause and effect relationship. If two variables rise and fall together (i.e. they correlate positively), which is the cause and which is the effect? Could they both be effects of some other unidentified cause?
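
The simulated sketch below illustrates the point: two variables that both depend on a hidden third factor correlate strongly even though neither causes the other. All names and figures here are invented.

  # Minimal sketch: two variables correlate because both depend on a
  # third, unmeasured common cause. All data are simulated.
  import numpy as np

  rng = np.random.default_rng(0)
  study_support = rng.normal(size=500)  # hidden common cause

  time_online = 2.0 * study_support + rng.normal(size=500)
  grades = 1.5 * study_support + rng.normal(size=500)

  # time_online and grades rise and fall together, yet neither causes
  # the other; both are effects of study_support.
  r = np.corrcoef(time_online, grades)[0, 1]
  print(f"Pearson r = {r:.2f}")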

Student and cohort progress

The question as to whether students are progressing satisfactorily has both formative and diagnostic value, especially as data can be used to track the progress of individual students over time.

  1. How well is each student progressing through individual units, and through the whole degree program?
  2. How well is each cohort progressing? (Different cohorts might perform quite differently.)
  3. Do the grades of a cohort follow a normal (bell-shaped) curve? What does the shape of the curve indicate? (For example, the peak might be at the lower or higher end of the scale, and two peaks indicate that the cohort probably comprises two different kinds of students; see the sketch after this list.)
  4. Do students in the same cohort get higher grades in some units than in other units?
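
A minimal sketch of inspecting the shape of a cohort’s grade curve, using an invented grade list chosen to show two peaks:

  # Minimal sketch: a text histogram of one cohort's grades.
  # The grade list is invented to produce a two-peaked distribution.
  import numpy as np

  grades = [52, 55, 54, 58, 56, 53, 57,  # weaker subgroup
            78, 82, 80, 79, 84, 81, 83]  # stronger subgroup

  counts, edges = np.histogram(grades, bins=range(50, 95, 5))
  for count, left in zip(counts, edges):
      print(f"{left:2d}-{left + 4:2d}: {'#' * count}")
  # Two separate peaks suggest two different kinds of students.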

Predictive value

If data represent a long enough period with a large enough population, they can have predictive value, which allows the institution to identify problems in time to instigate early interventions. For example, one institution found that students who did not achieve a grade of B in first-year units of their major were much more likely to drop out in later semesters, so it provided extra remedial support to improve student retention (cf. Dimeo, 2017).
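
As a minimal sketch of this kind of prediction, a simple classifier can be trained on historical records to estimate each current student’s dropout risk. The records and field meanings below are hypothetical, and a real model would need careful validation before driving interventions.

  # Minimal sketch: flagging at-risk students from historical records.
  # All data are hypothetical.
  from sklearn.linear_model import LogisticRegression

  # Each row: [first-year GPA in major units]; label: 1 = later dropout.
  X_history = [[2.1], [3.6], [2.4], [3.9], [2.0], [3.2], [2.6], [3.8]]
  y_history = [1, 0, 1, 0, 1, 0, 1, 0]

  model = LogisticRegression().fit(X_history, y_history)

  # Estimate dropout risk for current students.
  for student, gpa in [("A", 2.3), ("B", 3.7)]:
      risk = model.predict_proba([[gpa]])[0][1]
      print(f"Student {student}: estimated dropout risk {risk:.0%}")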

Teaching and assessment materials

Did the teaching materials create misunderstandings of the content? Did they match the assessments well? Were there gaps?

How effective is a particular assessment activity? What were students’ assessment results for that activity? Did they often hand it in late or ask for extensions? How many just didn’t do it? Did these things vary between cohorts? Did they vary from year to year? How did it correlate with student feedback?
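
A minimal sketch of profiling one assessment activity across cohorts, using hypothetical submission records:

  # Minimal sketch: submission metrics for one assessment activity.
  # Column names and records are hypothetical.
  import pandas as pd

  submissions = pd.DataFrame({
      "cohort":    ["2019", "2019", "2019", "2020", "2020", "2020"],
      "grade":     [74, 68, None, 81, 59, 70],  # None = never handed in
      "days_late": [0, 5, None, 0, 12, 2],
  })

  summary = submissions.groupby("cohort").agg(
      mean_grade=("grade", "mean"),
      late_rate=("days_late", lambda s: (s > 0).mean()),
      not_submitted=("grade", lambda s: s.isna().sum()),
  )
  print(summary)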

Effects of demographic factors

The data can be used to see possible effects of demographic factors, such as gender, location, ethnicity, disability, background education, and age group.

Feedback

The same general approach is used to analyse feedback, such as end-of-unit student satisfaction surveys, graduate exit surveys, and feedback from external stakeholders.

Graphs

Results are usually easier to see in graphs. It is also possible to create graphs that combine various results for purposes of comparison.
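
A minimal sketch of such a combined graph, with invented figures:

  # Minimal sketch: comparing two cohorts' mean grades across units
  # in one bar chart. The figures are invented.
  import matplotlib.pyplot as plt

  units = ["Unit 1", "Unit 2", "Unit 3"]
  cohort_2019 = [71, 64, 78]
  cohort_2020 = [69, 72, 75]

  x = range(len(units))
  plt.bar([i - 0.2 for i in x], cohort_2019, width=0.4, label="2019")
  plt.bar([i + 0.2 for i in x], cohort_2020, width=0.4, label="2020")
  plt.xticks(list(x), units)
  plt.ylabel("Mean grade")
  plt.legend()
  plt.title("Mean grades by unit and cohort")
  plt.show()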

Online programmed instruction

In online programmed instruction, the data can indicate the percentage of students choosing the correct answer for each learning activity, the percentage choosing each incorrect answer, how long students took to do each activity, whether they repeated any sections, and whether they took any remedial loops.
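
A minimal sketch of summarising such activity logs, with hypothetical event records:

  # Minimal sketch: summarising programmed-instruction activity logs.
  # Event records and field meanings are hypothetical.
  from collections import Counter

  events = [  # (activity, answer_chosen, correct, seconds_spent)
      ("act1", "A", True, 40), ("act1", "B", False, 65),
      ("act1", "A", True, 38), ("act1", "C", False, 90),
  ]

  answers = Counter(answer for _, answer, _, _ in events)
  total = len(events)
  correct = sum(1 for _, _, ok, _ in events if ok)
  avg_time = sum(t for *_, t in events) / total

  print(f"Correct answers: {correct / total:.0%}")
  for answer, n in answers.items():
      print(f"Answer {answer}: {n / total:.0%}")
  print(f"Average time per attempt: {avg_time:.0f}s")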

How good is the data?

In assessment, rubrics seem to produce more consistent results than general assessment judgements. In questionnaires that produce statistical data, specific questions are normally more useful than vague questions.

However, this should be qualified in several ways. First, any assessment that depends on assessor judgement is subject to various personal biases, on which much research has been done. Second, personal responses are still valuable; it’s just that qualitative data needs a different kind of analysis.

Data sensitivity

Educational institutions have access to extensive personal information of many kinds. In fact, if an online institution has its own avenue for online socializing, it has the potential to gather more private information about individuals than social media platforms such as Facebook and Instagram.

The security of private information is therefore of paramount importance and is mission-critical to the whole institution. One major, preventable leak could destroy the entire institution.

Institutions also face the risk of profiling particular student subgroups, such as ethnic minorities or disability groups. Profiling can also apply to staff if, for example, analysis shows that a particular tutor is unpopular and precipitates student dropouts.

The VCI example

The VCI study compared the following data:

  1. Grades for each unit for each student, based on assessment rubrics for each unit
  2. Feedback from church leaders on the whole program
  3. Feedback from church leaders on individual student development
  4. Student satisfaction surveys at the end of each unit
  5. Graduate exit survey
  6. Student feedback, some of which was anonymous but still cohort-specific
  7. Themes that run through units (called strands), showing each student’s development in that theme

By comparing independently derived data as a triangulation procedure, the results were self-validating and gave a finely grained picture of the strengths and weaknesses of the program and of individual students. It was also easy to compare different cohorts.

It also permitted reviewers to look not only at pass rates, but also at where students in a cohort received low, but passing, grades.

Most of the data was based on closed questions, which are more easily expressed as statistics. Some questionnaires had some open questions, but they were not easily expressed as statistics and were not generally used for statistical comparisons.

____________________
References
Dimeo, Jean. 2017. "Data Dive." Inside Higher Ed July 19, 2017. Viewed 10 Sep. 2020. https://www.insidehighered.com/digital-learning/article/2017/07/19/georgia-state-improves-student-outcomes-data

Scott, Daniel (Chair); Webster, Jefferson; Wolvaardt, Bennie; Woods, Ross; Mossa, Moheb; Cagle, Austin. 2020. "Master of Arts Degree – Learning Outcomes Report, Report Year: 2020." San Antonio, TX: Veritas College International.