In June I successfully defended my dissertation at Northeastern University. My research focused on grading and assessment, which will likely not surprise anyone who has been reading this blog for a while, as I have written about grading and assessment frequently.
My dissertation was qualitative action research, a dissertation in practice grounded in the Carnegie Project on the Education Doctorate. Grading and assessment are ripe for qualitative action research because we have over a century of quantitative research in grading and assessment, and not as much positive change, at least with grading, as we might like to see. I might argue we are seeing more authentic assessment in schools, but grading remains, well, stuck. One of the reasons I think we’re stuck is that we believe persistent myths about grading.
Grades Communicate Students’ Proficiency
One of the most persistent myths about grading is that we agree on what grades mean. As long ago as 1888, researchers were raising questions about inter-rater reliability (Edgeworth, 1888). Study after study indicates that grades are highly inconsistent measures of students’ learning. Starch & Elliott (1912) conducted a study that examined consistency among graders and found that scores on student writing varied by 30-40 points out of 100, or a probable error of 4.5. You might be thinking, “yes, but isn’t writing a little subjective anyway? I’m sure that doesn’t happen in, say, math.” Well, the following year, Starch & Elliott (1913) found that scores on a geometry exam varied even more widely—as much as a probable error of 7.5. They ascribed the difference to several factors: the possibility that graders differently evaluate the students’ methods for reaching the solution, that they assess quality of the students’ drawings, and that they assign different values to problems.
Naturally, things have changed in a hundred years. What do more recent studies say? Brimi (2011) sought to answer that very question. The study engaged 73 participants, all working for the same school district and all trained to use the 6+1 Traits of Writing Rubric developed by Education Northwest, to score the same argumentative essay using the rubric. The participants’ grades ranged from an A to an F on the traditional grading scale; furthermore, the range of scores assigned to the essay spanned 46 points (Brimi, 2011).
Grading is inconsistent for many reasons, but one of the chief reasons is that teachers evaluate different things when they grade. Some teachers offer extra credit or give students points for bringing supplies (Townsley & Varga, 2018). Teachers can be highly individualistic in selecting criteria for students’ performance (Bloxham et al., 2016). Other factors also impact how teachers evaluate students’ performance. For example, Brackett et al. (2013) found that a teacher’s mood while grading can impact students’ scores: teachers in a bad mood tend to rate students’ performance lower. This holds true even when grading more objective criteria such as correct spelling (Brackett et al., 2013). Think what this means as we are teaching in the midst of a pandemic and during a time when it feels as though teachers are being attacked from all sides.
One reason traditional letter or number grades emerged is the perceived inconsistency, inefficiency, and complication involved in narrative grade reports (Feldman, 2019). It was thought that letter grades could communicate learning both efficiently and plainly (Schneider & Hutt, 2014). By the 1940s, the A-F letter grade system had become the most popular grading system (Schneider & Hutt, 2014).
Traditional grades tend to be derived by averaging the performance on all assessments during a grading period; this average may not capture students’ eventual proficiency in learning and can place undue emphasis on performance anomalies rather than tendencies (Feldman, 2019). In addition, traditional grading sometimes incorporates assessment of student behaviors, such as participation, engagement, and effort (Feldman, 2019).
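To make the averaging problem concrete, here is a minimal sketch in Python. The scores are hypothetical numbers I made up for illustration, not data from Feldman (2019) or any other study cited here, and using the most recent score is only one simple stand-in for "eventual proficiency."

```python
# Hypothetical scores for one student on the same skill over a grading period.
scores = [50, 60, 75, 90, 95]

def averaged_grade(scores):
    """Traditional approach: every assessment counts equally."""
    return sum(scores) / len(scores)

def most_recent_grade(scores):
    """One simple proxy for eventual proficiency: the latest evidence."""
    return scores[-1]

print(averaged_grade(scores))     # 74.0
print(most_recent_grade(scores))  # 95
```

In this made-up case, the student finishes the period performing at 95, but the averaged grade reports 74 because the early attempts continue to count against them.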
We might think that grades communicate students’ proficiency in learning, but there are simply too many variables to say this definitively.
Grades Motivate Students
One fear many educators express is that if students are not graded, they will not be motivated to do the work. At best, grades serve as extrinsic motivation for learning. When students care more about the grades than the learning, they are more likely to resort to academic dishonesty. In fact, pressure to earn high grades contributes to academic dishonesty and mental health problems (Rinn et al., 2014; Villeneuve et al., 2019). Grades affect students’ achievement, self-concept, and motivation (Casillas et al., 2012; Pulfrey et al., 2011). Students who earn low grades tend to achieve less and feel lower self-esteem over time (Klapp, 2018).
Fear of earning low grades or focus on earning high grades both serve as extrinsic motivators for learning rather than intrinsic motivators, which demonstrate more effectiveness in supporting learning (Froiland & Worrell, 2016; Hattie & Timperley, 2007). Intrinsic motivation is positively associated with both engagement and achievement (Froiland & Worrell, 2016; Hattie & Timperley, 2007). Helping students develop their intrinsic motivation to learn may increase students’ achievement (Froiland & Worrell, 2016). Extrinsic motivation to earn good grades or avoid the negative consequences of poor grades drives many students rather than the desire to learn, and over time, extrinsic motivation decreases students’ achievement (Hattie & Timperley, 2007). In addition, the reward of good grades tends to decrease motivation for otherwise engaging learning (Hattie & Timperley, 2007).
It’s worth noting that motivation appears to change depending on the grading system used. When students are graded using a 100-point system in which the sum of all student work is worth a total of 100 points, students tend to view each point deducted as a loss (Smith & Smith, 2009). Bies-Hernandez (2012) describes such grading systems as “loss-framed grading” (p. 179). However, when students are graded using a total points system tallying all points earned, they tend to view grades as opportunities to improve and build toward a desired grade (Smith & Smith, 2009). Students who are graded with a system weighting assignment categories by percentage fall in between students in the other grading groups (Smith & Smith, 2009). Even if controls ensure that the resulting grade is the same regardless of the calculation system, students’ responses on a Likert scale questionnaire indicate they still perceive greater risk in 100-point systems and are less motivated and self-assured (Smith & Smith, 2009). Bies-Hernandez (2012) replicated these findings and further found that students’ performances in courses with a loss-framed grading system also decreased. Thus, the framing of the grading system has an impact not only on students’ perceptions of their performance but also on their actual performance (Bies-Hernandez, 2012). The implication is that teachers’ approaches to grading may affect students’ academic achievement (Brookhart et al., 2016).
However, proficiency-based grading (sometimes known as competency-based grading, standards-based grading, or mastery-based grading) has the potential to make grades more meaningful and purposeful (Buckmiller et al., 2017; Guskey, 2007). Proficiency-based grading practices may also lead to greater academic achievement, particularly if the grades are paired with formative feedback (Hattie & Timperley, 2007). Proficiency-based grading practices may also foster more cooperation and less competition (Burleigh & Meegan, 2018). Taking academic risks, weighing differing conclusions, and considering varied points of view are all necessary for developing critical thinking skills, but if students must risk failing grades in order to do so, they are much more likely to take the safer route to earning a higher grade (Hayek et al., 2014; McMorran et al., 2017). Knowing that they could continue to learn, revise, and reflect on their work may increase students’ motivation to learn (Hattie & Timperley, 2007; McMorran et al., 2017).
100-point Grading Scales are More Precise than A-F or 4-Point Grading Scales
Do you know why we use the 100-point scale? It’s not because it’s more precise. It’s because it’s the scale in the gradebook software (Guskey, 2013; Guskey & Jung, 2016). The 100-point scale is terrible, and that’s a hill I’m willing to die on. The 100-point grading scale has become one of the most common scales for reporting students’ grades, but it is one of the most unreliable scales in use (Guskey, 2013).
The 100-point scale is inaccurate and inequitable because the scale is skewed toward failing grades (Feldman, 2019). Passing grades comprise only 40 points of the grading scale, spanning typically from 60 points to 100 points (or from 70-100 points in some systems!), while failing grades comprise the remaining points possible spanning from 0 to 59 (or even 0-69). Serious mathematical errors arise when teachers input zeros in the gradebook when students are missing work (Feldman, 2019). While this practice ostensibly holds students accountable for handing in work, it can make it impossible for students to recover academically (Feldman, 2019). The literature suggests that teachers may compensate for the 100-point scale’s mathematical errors by artificially raising grades in a number of ways (Schneider & Hutt, 2014), including grading formative assessments and executive function skills (Bowers, 2011; Brookhart et al., 2016; Townsley & Varga, 2018).
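The effect of a single zero is easier to see with numbers. The sketch below uses hypothetical scores that I invented for illustration; the function names and the target of 80 are my own assumptions and do not come from Feldman (2019).

```python
def average(scores):
    """Plain arithmetic mean, as a typical gradebook would compute it."""
    return sum(scores) / len(scores)

def perfect_scores_needed(scores, target):
    """Count how many further scores of 100 lift the average to target.

    Assumes target is below 100; otherwise the loop would never end.
    """
    scores = list(scores)
    extra = 0
    while average(scores) < target:
        scores.append(100)
        extra += 1
    return extra

# Hypothetical student: three solid scores and one missing assignment.
completed = [85, 90, 88]
with_zero = completed + [0]

print(average(with_zero))                    # 65.75
print(perfect_scores_needed(with_zero, 80))  # 3
```

Without the zero, this student's average is roughly 87.7. With it, the average drops to 65.75, and the student would need three consecutive perfect scores just to get back above 80.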
Unfortunately, a lot of educators perceive the 100-point grading scale to be more accurate (Brookhart & Guskey, 2019; Feldman, 2019). While using 100 points as opposed to four or five points may seem more accurate, it results in a probable error of five or six points; teachers find it difficult to distinguish levels of performance on a 100-point scale (Brookhart & Guskey, 2019). Some grading reformers advocate for the use of minimum grading, or inputting a minimum grade such as 50 percent, rather than inputting zeros for missing work; this practice reduces mathematical error (Carifio & Carey, 2013; Carifio & Carey, 2015; Feldman, 2019). Essentially what educators are doing when they use minimum grading, however, is compensating for the deficiencies of the 100-point scale by converting it to a rough approximation of the 4-point scale. In a four-point scale, failing grades span from 0-0.99 of a point, while passing grades span from 1-4 points (or 2-4 points in a system without a “D”).
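One rough way to see why minimum grading approximates the 4-point scale is to compare how much of each scale is devoted to failure. This is a sketch under my own assumptions: the helper function is hypothetical, and I am assuming a passing mark of 60 on the 100-point scale and 1 on the 4-point scale, as described above.

```python
def failing_share(lowest, passing, highest):
    """Fraction of the scale's range that falls below the passing mark."""
    return (passing - lowest) / (highest - lowest)

print(failing_share(0, 60, 100))   # 0.6  (100-point scale, zeros allowed)
print(failing_share(50, 60, 100))  # 0.2  (100-point scale with a floor of 50)
print(failing_share(0, 1, 4))      # 0.25 (4-point scale)
```

With zeros allowed, 60 percent of the 100-point scale is failing territory. A floor of 50 shrinks that share to 20 percent, which is close to the 25 percent of the 4-point scale.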
Grades Reduce Bias
Variable and unreliable grading practices also introduce equity problems. Black students have less access to AP courses all over the United States (Francis & Darity, 2021). Schools that use gatekeeping methods (Francis & Darity, 2021), such as teacher recommendations and prerequisite grades, may be basing their decisions about students’ fitness for advanced coursework on subjective measures common in traditional grading (Feldman, 2019). Students of color are most impacted by teachers’ implicit bias (Feldman, 2019), especially if subjective, non-academic factors are included in assessment (Cvencek et al., 2018). Implicit bias may especially play a role in lower grades assigned to students of color when the criteria for proficiency are unclear or undefined (Quinn, 2020). Traditional grading’s subjectivity can harm all students, but students of color may be most impacted due to implicit bias (Feldman, 2019; Quinn, 2020).
However, proficiency-based grading can make grades more equitable and more reflective of students’ actual learning (Buckmiller et al., 2017). Proficiency-based grading may include using practices such as rubrics for evaluating student work and student-generated portfolios; however, it may also include traditional assessments such as tests (Baete & Hochbein, 2014; Buckmiller et al., 2017; Iamarino, 2014; Miller, 2013). Students’ grades are tied to their mastery of content, such as standards, knowledge, and skills, as opposed to an average of all the grades earned during a grading period or course (Iamarino, 2014; Miller, 2013). Teachers using proficiency-based grading typically provide students with feedback on formative assessments (Buckmiller et al., 2017). Students may revise and resubmit work in order to demonstrate their proficiency in learning (Buckmiller et al., 2017). Through revision, students demonstrate their learning of the content and skills. As a result, proficiency-based grades may more accurately reflect what students have learned rather than a snapshot of their performance on a single assessment.
We Have to Use Grades
Grades have actually not existed, at least not in the form we’re familiar with, for a very long period of time (Schneider & Hutt, 2014). One of the worst reasons to perpetuate any system is the notion that we’ve always done it that way, especially when it’s not even true that we have always done it this way. The A-F grading system gained popularity as late as the 1940s—as I mentioned before—as educators saw a need to establish more uniform methods for determining students’ proficiency (Schneider & Hutt, 2014). For many years preceding the establishment of “traditional grading,” we used all sorts of other systems (good and bad) for measuring learning. This system is entrenched, but it’s not as old as people might think, and if we decided, collectively, that it no longer worked for us, we could find a better system. The problem is, well, that it’s a system, and systems are notoriously hard to change.
I have heard many educators express anxiety that students will either not be prepared for college or will not get into college unless they are graded. Many schools, however, have successfully eliminated traditional grades. Colleges understand the transcripts these students send them, and these students are able to go to college. For example, the Watershed School, a member of the Mastery Transcript Consortium, does not issue traditional letter grades or test students through final exams and has a 100% college acceptance rate (Plaskov, 2019). A college counselor I worked with told me anecdotally that “colleges are fine with grading that’s ‘non-traditional.’ Parents usually get very concerned about going off the A-F standard, but college admissions folks are experts on grading scales, and what I’ve consistently heard from them is that the most-accurate/least-translated reporting is what they like.”
My own experience is that some schools’ grading practices are more entrenched, and while another system of evaluation would work, it wouldn’t be politically feasible. Proficiency-based grading shows additional promise here. Attaching grades to standards or competencies can make grades more accurate reflections of students’ proficiency in learning. Proficiency-based report cards have the potential to be more useful in understanding students’ learning than traditional report cards including only a letter grade (Blauth & Hadjian, 2016; Swan et al., 2014). Swan et al. (2014) found that parents and teachers generally find proficiency-based reports more helpful and easier to understand, in addition to having more and better information about students’ progress.
It’s worth noting that one study I examined indicated parents reported feeling less confidence in the standards-based grade reports because the reports were unfamiliar and the parents felt the school had not taken their feelings as stakeholders into account before implementing them (Franklin et al., 2016). These parents also reported finding the grade reports unclear (Franklin et al., 2016). Importantly, Franklin et al. (2016) indicate the parents in their study were all dissatisfied with standards-based report cards; these parents also described themselves as strong students who enjoyed school. Their study did not include parents who expressed satisfaction with the reports (Franklin et al., 2016).
The Bottom Line?
I think it’s important for teachers to open dialogue with students and parents, read the research on grading and assessment, and work within the system they’re in to make grades more accurate and meaningful. I highly recommend the works referenced in this post, which is derived largely from my dissertation. For a good deep dive, Joe Feldman’s book Grading for Equity is excellent.
Baete, G. S. & Hochbein, C. (2014). Project proficiency: Assessing the independent effects of high school reform in an urban district. The Journal of Educational Research, 107(6), 493-511. https://doi.org/10.1080/00220671.2013.823371
Bies-Hernandez, N. J. (2012). The effects of framing grades on student learning and preferences. Teaching of Psychology, 39(3), 176-180. https://doi.org/10.1177/0098628312450429
Blauth, E. & Hadjian, S. (2016). How selective colleges and universities evaluate proficiency-based high school transcripts: Insights for students and schools. New England Board of Higher Education. https://www.nebhe.org/info/pdf/policy/Policy_Spotlight_How_Colleges_Evaluate_PB_HS_Transcripts_April_2016.pdf
Bloxham, S., den-Outer, B., Hudson, J., & Price, M. (2016). Let’s stop the pretence of consistent marking: Exploring the multiple limitations of assessment criteria. Assessment & Evaluation in Higher Education, 41(3), 466-481. https://doi.org/10.1080/02602938.2015.1024607
Bowers, A. J. (2011). What’s in a grade? The multidimensional nature of what teacher-assigned grades assess in high school. Educational Research and Evaluation, 17(3), 151-159. https://doi.org/10.1080/13803611.2011.597112
Brackett, M. A., Floman, J. L., Ashton-James, C., Cherkasskiy, L., & Salovey, P. (2013). The influence of teacher emotion on grading practices: A preliminary look at the evaluation of student writing. Teachers and Teaching, 19(6), 634-646. https://doi.org/10.1080/13540602.2013.827453
Brimi, H. M. (2011). Reliability of grading high school work in English. Practical Assessment, Research & Evaluation, 16(17). http://pareonline.net/getvn.asp?v=16&n=17
Brookhart, S. M., & Guskey, T. R. (2019). Reliability in grading and grading scales. In T. R. Guskey & S. M. Brookhart (Eds.), What we know about grading: What works, what doesn’t, and what’s next (pp. 13-31). ASCD.
Brookhart, S., Guskey, T. R., Bowers, A. J., McMillan, J. H., Smith, J. K., Smith, L. F., Stevens, M. T., & Welsh, M. E. (2016). A century of grading research: Meaning and value in the most common educational measure. Review of Educational Research, 86(4), 803-848. https://doi.org/10.3102/0034654316672069
Buckmiller, T., Peters, R., & Kruse, J. (2017). Questioning points and percentages: Standards-based grading (SBG) in higher education. College Teaching, 65(4), 151-157. https://doi.org/10.1080/87567555.2017.1302919
Burleigh, T. J. & Meegan, D. V. (2018). Risky prospects and risk aversion tendencies: Does competition in the classroom depend on grading practices and knowledge of peer-status? Social Psychology of Education, 21(2), 323-335. https://doi.org/10.1007/s11218-017-9414-x
Carifio, J. & Carey, T. (2013). The arguments and data in favor of minimum grading. Mid-Western Educational Researcher, 25(4), 19-30.
Carifio, J. & Carey, T. (2015). Further findings on the positive effects of minimum grading. Journal of Education and Social Policy, 2(4), 130-136.
Casillas, A., Robbins, S., Allen, J., Kuo, Y. L., Hanson, M. A., & Schmeiser, C. (2012). Predicting early academic failure in high school from prior academic achievement, psychosocial characteristics, and behavior. Journal of Educational Psychology, 104(2), 407-420. https://doi.org/10.1037/a0027180
Cvencek, D., Fryberg, S. A., Covarrubias, R., & Meltzoff, A. N. (2018). Self-concepts, self-esteem, and academic achievement of minority and majority North American elementary school children. Child Development, 89(4), 1099-1109. https://doi.org/10.1111/cdev.12802
Edgeworth, F. Y. (1888). The statistics of examinations. Journal of the Royal Statistical Society, 51(3), 599-635.
Feldman, J. (2019). Grading for equity: What it is, why it matters, and how it can transform schools and classrooms. Corwin.
Francis, D. V. & Darity, W. A., Jr. (2021). Separate and unequal under one roof: The legacy of racialized tracking perpetuates within-school segregation. RSF: The Russell Sage Foundation Journal of the Social Sciences, 7(1), 187-202. https://doi.org/10.7758/RSF.2021.7.1.11
Franklin, A., Buckmiller, T., & Kruse, J. (2016). Vocal and vehement: Understanding parents’ aversion to standards-based grading. International Journal of Social Science Studies, 4(11), 19-29.
Froiland, J. M. & Worrell, F. C. (2016). Intrinsic motivation, learning goals, engagement, and achievement in a diverse high school. Psychology in the Schools, 53(3), 321-336. https://doi.org/10.1002/pits.21901
Guskey, T. R. (2007). Multiple sources of evidence: An analysis of stakeholders’ perceptions of various indicators of student learning. Educational Measurement: Issues and Practice, 26(1), 19-27.
Guskey, T. R. (2013). The case against percentage grades. Educational Leadership, 71(1), 68-72.
Guskey, T. R. & Jung, L. A. (2016). Grading: Why you should trust your judgment. Educational Leadership,
Hattie, J. & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1), 81-112. https://doi.org/10.3102/003465430298487
Hayek, A., Toma, C., Oberlé, D., & Butera, F. (2014). The effect of grades on the preference effect: Grading reduces consideration of disconfirming evidence. Basic and Applied Social Psychology, 36(6), 544-552. https://doi.org/10.1080/01973533.2014.969840
Iamarino, D. L. (2014). The benefits of standards-based grading: A critical evaluation of modern grading practices. Current Issues in Education, 17(2), 1-11.
Klapp, A. (2018). Does academic and social self-concept and motivation explain the effect of grading on students’ achievement? European Journal of Psychology of Education, 33(2), 355-376. https://doi.org/10.1007/s10212-017-0331-3
McMorran, C., Ragupathi, K., & Luo, S. (2017). Assessment and learning without grades? Motivations and concerns with implementing gradeless learning in higher education. Assessment & Evaluation in Higher Education, 42(3), 361-377. https://doi.org/10.1080/02602938.2015.1114584
Miller, J. J. (2013). A better grading system: Standards-based, student-centered assessment. English Journal, 103(1), 111-118.
Plaskov, J. C. (2019, October 23). Reimagining college admissions season. The Mastery Transcript Consortium. https://mastery.org/reimagining-college-admissions-season/
Pulfrey, C., Buchs, C., & Butera, F. (2011). Why grades engender performance-avoidance goals: The mediating role of autonomous motivation. Journal of Educational Psychology, 103(3), 683-700. https://doi.org/10.1037/a0023911
Quinn, D. M. (2020). Experimental evidence on teachers’ racial bias in student evaluation: The role of grading scales. Educational Evaluation and Policy Analysis, 42(3), 375-392. https://doi.org/10.3102/0162373720932188
Rinn, A. N., Boazman, J., Jackson, A., & Barrio, B. (2014). Locus of control, academic self-concept, and academic dishonesty among high ability college students. Journal of the Scholarship of Teaching and Learning, 14(4), 88-114. https://doi.org/10.14434/josotl.v14i4.12770
Schneider, J. & Hutt, E. (2014). Making the grade: A history of the A-F marking scheme. Journal of Curriculum Studies, 46(2), 201-224. https://doi.org/10.1080/00220272.2013.790480
Smith, J. K. & Smith, L. F. (2009). The impact of framing effect on student preferences for university grading systems. Studies in Educational Evaluation, 35, 160-167.
Starch, D. & Elliott, E. C. (1912). Reliability of the grading of high-school work in English. The School Review, 20(7), 442-457.
Starch, D. & Elliott, E. C. (1913). Reliability of grading work in mathematics. The School Review, 21(4), 254-259.
Swan, G., Guskey, T., & Jung, L. (2014). Parents’ and teachers’ perceptions of standards-based and traditional report cards. Educational Assessment, Evaluation, and Accountability, 26(3), 289-299. https://doi.org/10.1007/s11092-014-9191-4
Townsley, M. & Varga, M. (2018). Getting high school students ready for college: A quantitative study of standards-based grading practices. Journal of Research in Education, 28(1), 92-112.
Villeneuve, J. C., Conner, J. O., Selby, S., & Pope, D. C. (2019). Easing the stress at pressure-cooker schools. Phi Delta Kappan, 101(3), 15-19. https://doi.org/10.1177/0031721719885910