An analytics system

Ross Woods, 2018

An analytics system is a way of recording what users of an internet site do. In e-learning, the purpose is to answer several main questions. How is the course performing? How are individual learning activities (called frames) performing? How are students doing as groups and as individuals? The answers to these questions are useful at the testing stage, when early identification and correction can prevent failures that would reflect badly on the public image. These answers are just as useful later, after units are released to enrolled, fee-paying students.

Analytics systems usually keep records based on login, logout, keystrokes, mouse-clicks, etc. A different kind of system is necessary when user behavior is defined by session variables and data rather than pages and keystrokes.

What data?

All necessary data can be recorded on one simple SQL table:

Student no. | Unit [1] | Frame no. | Entry timestamp [2] | Action timestamp [3] | Response [4] | Correct/Incorrect [5] | Exit timestamp [6]

1 Unit is the id of the automated unit.
2 The time the student entered the frame.
3 The time the student responded with an answer.
4 The response would normally be the number of the answer selected.
5 To be calculated by the software to simplify and speed up SQL searches. (Although it would technically "de-normalize" data, the alternative would be to calculate "correct / incorrect" on the fly during searches.)
6 The time the student clicked on a button to continue.

If wanted, login to and logout from a unit are easily deduced from the available data: a unit session is a series of frames done consecutively, so its first entry timestamp and last exit timestamp mark the login and logout.
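
As a minimal sketch of the table described above, the Python below uses the standard sqlite3 module. The table name, column names, integer Unix timestamps, and the helper functions are all assumptions made for illustration; the source lists only the fields. The correct/incorrect flag is calculated once at write time, as footnote 5 suggests.

```python
import sqlite3

# Hypothetical table and column names; the source gives only the field list.
SCHEMA = """
CREATE TABLE frame_log (
    student_no  INTEGER NOT NULL,
    unit        INTEGER NOT NULL,  -- id of the automated unit
    frame_no    INTEGER NOT NULL,
    entry_ts    INTEGER NOT NULL,  -- student entered the frame (assumed Unix seconds)
    action_ts   INTEGER,           -- student responded; NULL if no answer
    response    INTEGER,           -- number of the answer selected
    is_correct  INTEGER,           -- 1 or 0, stored so searches need not recalculate it
    exit_ts     INTEGER            -- student clicked the button to continue
)
"""


def open_log(path=":memory:"):
    """Open a database and create the single logging table."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_frame(conn, student_no, unit, frame_no, entry_ts,
                 action_ts, response, correct_answer, exit_ts):
    """Store one frame visit, calculating correct/incorrect at write time."""
    is_correct = None if response is None else int(response == correct_answer)
    conn.execute(
        "INSERT INTO frame_log VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (student_no, unit, frame_no, entry_ts,
         action_ts, response, is_correct, exit_ts),
    )
    conn.commit()
    return is_correct


def unit_session(conn, student_no, unit):
    """Deduce login and logout as the first entry and last exit timestamps.

    Simplification: treats all of a student's rows for the unit as one sitting.
    """
    return conn.execute(
        "SELECT MIN(entry_ts), MAX(exit_ts) FROM frame_log "
        "WHERE student_no = ? AND unit = ?",
        (student_no, unit),
    ).fetchone()
```

A real system would probably need to split a student's rows into separate sittings (for example, by gaps between an exit timestamp and the next entry timestamp) before deducing login and logout.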

Saving data

Saving data quickly becomes more problematic when the number of interactions increases. Writing to disk is slow, and even writing to RAM is sometimes inefficient. So far, the best option is Redis (redis.io), although other options might be available.
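
One possible way to use Redis here, offered only as a sketch, is to push each record onto an in-memory Redis list and have a separate job drain the list in batches for writing to SQL. This queue-and-drain pattern, the key name, and the function names are assumptions and are not described in the source. The functions accept any client object with `rpush` and `lpop` methods, which the redis-py client (`redis.Redis()`) provides; the `FakeClient` class is only a stand-in so the sketch runs without a Redis server.

```python
import json

QUEUE_KEY = "analytics:frame_log"  # hypothetical key name


def log_event(client, event):
    """Append one frame record to a list held in memory (Redis RPUSH)."""
    return client.rpush(QUEUE_KEY, json.dumps(event))


def drain_events(client, limit=1000):
    """Pop up to `limit` queued records (Redis LPOP), oldest first.

    The returned list could then be written to SQL in one bulk insert.
    """
    events = []
    for _ in range(limit):
        raw = client.lpop(QUEUE_KEY)
        if raw is None:  # the list is empty
            break
        events.append(json.loads(raw))
    return events


class FakeClient:
    """In-memory stand-in for a Redis client, for trying the sketch offline."""

    def __init__(self):
        self.lists = {}

    def rpush(self, key, value):
        self.lists.setdefault(key, []).append(value)
        return len(self.lists[key])

    def lpop(self, key):
        items = self.lists.get(key)
        return items.pop(0) if items else None
```

With a Redis server available, `FakeClient()` would be replaced by `redis.Redis()` from the redis-py package.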

Kinds of questions, kinds of answers

  1. How much time did a student spend logged in?
  2. Do login times of a student follow a pattern?
  3. Do login times of a group follow a pattern?
  4. How long do students take from entering a frame to answering?
  5. How long do students take from answering a frame to continuing?
  6. What percentage of activities did the individual student get right?
  7. Do the correct and incorrect answers for a student follow a pattern over time?
  8. Do the correct and incorrect answers for a group follow a pattern over time?
  9. Do the correct and incorrect answers for a lesson follow a pattern over time?
  10. Do the correct and incorrect answers for a whole unit follow a pattern?
  11. What percentage of students answered a particular frame correctly?
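
Several of the questions above reduce to short SQL queries on the single table. The sketch below covers questions 1, 4, 5, 6, and 11. The table and column names are hypothetical, the four sample rows are made up for illustration, and it assumes one row per student per frame with timestamps in seconds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE frame_log (student_no INTEGER, unit INTEGER, "
    "frame_no INTEGER, entry_ts INTEGER, action_ts INTEGER, "
    "response INTEGER, is_correct INTEGER, exit_ts INTEGER)"
)

# Made-up rows for illustration only.
rows = [
    (1, 10, 1, 0, 20, 2, 1, 30),
    (1, 10, 2, 30, 70, 1, 0, 80),
    (2, 10, 1, 0, 10, 2, 1, 15),
    (2, 10, 2, 15, 45, 3, 1, 55),
]
conn.executemany("INSERT INTO frame_log VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)


def one_value(sql, param):
    """Run a query with one parameter and return its single result."""
    return conn.execute(sql, (param,)).fetchone()[0]


def seconds_in_frames(student_no):
    """Question 1: total time a student spent inside frames."""
    return one_value(
        "SELECT SUM(exit_ts - entry_ts) FROM frame_log WHERE student_no = ?",
        student_no)


def avg_answer_delay(frame_no):
    """Question 4: average time from entering a frame to answering."""
    return one_value(
        "SELECT AVG(action_ts - entry_ts) FROM frame_log WHERE frame_no = ?",
        frame_no)


def avg_continue_delay(frame_no):
    """Question 5: average time from answering to continuing."""
    return one_value(
        "SELECT AVG(exit_ts - action_ts) FROM frame_log WHERE frame_no = ?",
        frame_no)


def pct_correct_for_student(student_no):
    """Question 6: percentage of frames the student got right."""
    return one_value(
        "SELECT 100.0 * SUM(is_correct) / COUNT(*) FROM frame_log "
        "WHERE student_no = ?", student_no)


def pct_correct_for_frame(frame_no):
    """Question 11: percentage of recorded answers to a frame that were correct."""
    return one_value(
        "SELECT 100.0 * SUM(is_correct) / COUNT(*) FROM frame_log "
        "WHERE frame_no = ?", frame_no)
```

Storing the correct/incorrect flag in the table is what keeps the percentage queries this short.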

The possibility of creating comparisons with more than two variables can generate interesting and useful research projects. Some possible correlations are listed below:

  1. Do amounts of login time correlate with percentage of correct answers?
  2. Do all groups perform the same? Why?
  3. Which frames take a long time? Does that result in correct answers?
  4. Which kinds of activity get the best results? Which don’t?
  5. Do certain frames result in students failing to answer or logging out?
  6. Do certain kinds of frames result in students failing to answer or logging out?
  7. Do students do better or worse on frames used in review (exam prep) than on frames in lessons and exams?
  8. What are the effects of right and wrong answers?
  9. Do students’ online patterns reflect good study habits and good results? (Do students tend to be lazy at the beginning of the week and then binge toward the end of the week? If so, could they get caught out by being unprepared?)
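
The first question in the list above (login time against percentage of correct answers) could be measured with a Pearson correlation coefficient. The source does not name a particular statistic, so this choice is an assumption, and the function name is hypothetical. The sketch uses only the standard library and takes one value per student in each list.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers.

    Returns a value from -1 (inverse relationship) to +1 (direct relationship).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return covariance / sqrt(spread_x * spread_y)
```

For example, `pearson_r(minutes_logged_in, pct_correct)` would take one list of login minutes and one list of percentages, in the same student order. Both list names are placeholders.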

The use of personal data, degree candidatures, and schedules enables researchers to generate statistics for age groups, gender, region, degree programs, and exams:

  1. Which exam questions were done well? Which not well?
  2. Do students in degree A do better than students in degree B?
  3. Looking at frames used in exams, how did students do? Compare how well they did in the lessons with the exams.
  4. Do MOOCs and pre-admission students behave differently from enrolled students?
  5. Which correlations are the best indicators of:
    1. students at high risk of dropping out?
    2. groups falling below critical mass in size?
    3. groups falling below break-even point?
  6. Consider student risk factors such as:
    1. Consistent performance at borderline pass standard
    2. Declining performance
    3. Erratic performance
  7. What is an acceptable dropout rate?
  8. Are students improving over time in percentages of correct frames? In grades? It would be easy to track students’ improvements over time both as individuals and as groups. What would this tell us?
  9. Could analytics be used to identify students who need streaming? For example, gifted students can be streamed into advanced units (e.g. advanced placement in the next higher degree), while weaker students could be streamed into less demanding or remedial courses. This does not always work; some gifted students underperform because they are bored, while students at maximum capacity might cope poorly with more demanding courses.
  10. How does students’ feedback compare with their actual behavior in frames? (Researchers commonly compare subjects’ actual behavior with their reports of behavior.) In this case, it depends on whether student feedback can be kept anonymous.
  11. Does amount of time logged in justify the number of semester hours?
  12. What error rate indicates that an aspect of course materials needs to be reviewed? The aspect could be individual frames, a series of frames, a kind of frame, or a testing system.
  13. Tutors
    1. Do tutors get different results with different groups?
    2. Do different tutors get comparable results with similar groups?
    3. Do student results correlate with tutors' careers?
  14. What is the pattern of enrollments for a particular unit? Over time, is the number increasing, decreasing, fluctuating, or steady?
  15. Do units have an identifiable life cycle?
  16. How does the pattern of enrollments compare with the financial model for tuition fees and costs?

Notes

  1. A chi-square test can be built into two-variable comparisons.
  2. The software uses randomized questions, variations in the choice of answers, randomized answers, and multiple versions of correct answers to make it difficult for students to cheat. However, in some cases, comparing timestamps, locations, and answers can easily indicate that students might at least be colluding. Examples are frames where students put items in the correct order and frames where students select A or B or Both or Neither (called baboon frames: B or A or Both Or Neither).
  3. Dropout can be the result of factors other than poor performance. High achievers might want to transfer to elite schools; some students might also be enrolled in other institutions; and some might have erratic habits, be easily bored, or be easily dissatisfied.
  4. Interface
    1. For some kinds of data, the HTML page contains a form with a row of selectors for variables.
    2. Instructors and tutors might need to view significant stats. For example, they might need to see graphs of whole groups to see who needs help.
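
Note 1 mentions a chi-square test for two-variable comparisons. A minimal sketch of the Pearson chi-square statistic follows, using only the standard library. The function name is hypothetical, and the input is a table of observed counts, for example groups as rows and correct/incorrect as columns.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom.

    `table` is a list of rows of observed counts. Every row total and
    column total must be greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom
```

With made-up counts of `[[30, 10], [20, 20]]` (group A: 30 correct and 10 incorrect; group B: 20 and 20), the statistic is about 5.33 with 1 degree of freedom. That exceeds 3.841, the critical value for 1 degree of freedom at the 0.05 level, so the difference between these two invented groups would count as significant.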

Graphs are frequently a useful way to display many kinds of data; they make it easier and faster to visualize large amounts of data. Below is another kind of visual representation, showing a series of simple binary behaviors using only period marks (.), the letter l, and a monospace font. It is easy to display quickly with very little code:

l......l....ll...ll...l.ll...l...lll.l.l...llll..ll...l...lll.l.llll.l.llllll
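
A display like the one above takes only a few lines. In this sketch the function name is hypothetical, and it assumes that the letter l marks a behavior that occurred (for example, a correct answer) and a period marks one that did not; the source does not say which is which.

```python
def binary_strip(values, on="l", off="."):
    """Render a series of yes/no behaviors as one line of l and . characters."""
    return "".join(on if value else off for value in values)
```

For example, `binary_strip([1, 0, 0, 1, 1])` gives `l..ll`. Shown in a monospace font, the strips for several students line up for comparison.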

What about forums?

Forums are discussions that don’t require all participants to be online at the same time (asynchronous). The analytics statistics work just the same except that forums do not produce right and wrong answers.

Each major topic has its own forum, and each forum is a frame by itself. Students click a link to transfer from their current frame to the forum. The navigation is a little different, to enable students to resume the frame that they came from:

  1. When students enter the forum, the software creates a session variable (e.g. $_SESSION['resume']) holding the number of the frame they came from. It exists only while they are in the forum.
  2. When they want to move back, they click on a button.
  3. On arrival at the resumed frame, the software erases the resume session variable.
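
The three steps above can be sketched as follows. A plain Python dict stands in for the PHP session, and the `frame` key and function names are assumptions made for illustration; only the `resume` variable comes from the text.

```python
def enter_forum(session, forum_frame):
    """Step 1: remember the current frame, then move to the forum frame."""
    session["resume"] = session["frame"]
    session["frame"] = forum_frame


def leave_forum(session):
    """Steps 2 and 3: return to the remembered frame and erase the variable.

    If no resume value exists, the student stays on the current frame.
    """
    session["frame"] = session.pop("resume", session["frame"])
    return session["frame"]
```

Erasing the variable on return means a leftover value cannot send the student to the wrong frame after a later forum visit.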