import React from "react";
import { Breadcrumb } from "../../Components/ContentPage/ContentRendering/Breadcrumb";
import FooterNav from "../../Components/ContentPage/Menus/FooterNav";
import CompletionExample from "../../Components/ContentPage/ContentRendering/CompletionExample";

export default function GradingFRQs() {
  return (
    <div>
      <Breadcrumb page={"Use Cases"} subPage={"Grading FRQs"} />
      <h1 className="font-semibold text-3xl mt-2">Grading FRQs</h1>
      <p className="mt-3 font-light text-md text-gray-800">
        Coming soon. In the meantime, if you want to request a resource, reach
        {" out "}
        <a
          href="https://www.facebook.com/groups/k12promptengineeringguide"
          target="_blank"
          className="hover:cursor-pointer hover:underline text-blue-700"
          rel="noreferrer"
        >
          {"here"}
        </a>
        .
      </p>
      {/* <p className="mt-3 font-light text-md text-gray-800">
        Before reading this article, we recommend you check out the previous
        article{" "}
        <span className="text-blue-600 hover:cursor-pointer hover:underline">
          Grading MCQs
        </span>
        . Using an LLM to grade an assessment follows similar best practices for
        multiple-choice questions and free-response questions alike. Many ideas
        here build off the previous article.
      </p>
      <p className="mt-3 font-normal text-xl scroll-mt-24" id="problems">
        The current problem(s) with grading FRQs
      </p>
      <p className="mt-3 font-light text-md text-gray-800">
        Free-response questions (FRQs) are a powerful pedagogical tool that
        forces students to consider, understand, and synthesize learned material
        at a deeper level{" "}
        <span className="text-blue-600 hover:cursor-pointer hover:underline">
          (research)
        </span>
        . They are also a uniquely valuable form of feedback for educators,
        helping them pinpoint the class’s understanding of a lesson or unit. In
        liberal arts and STEM classes alike, FRQs are a staple of high-quality
        assessment.
      </p>
      <p className="mt-6 font-light text-md text-gray-800">
        Yet FRQs have practical drawbacks. First and foremost, the sheer effort
        it takes teachers to thoroughly review and annotate student submissions
        discourages educators from using them{" "}
        <span className="text-blue-600 hover:cursor-pointer hover:underline">
          (research)
        </span>
        . In addition, designing and applying grading criteria is often more art
        than exact science, which makes grading FRQ answers more cognitively
        demanding than procedurally marking MCQs correct or incorrect. And
        because more fluid grading criteria create less certainty for students
        trying to understand where they went wrong, many conversations about
        grades and written feedback usually take place after assignments are
        returned. Thus, it is possible that the sheer amount of work FRQs create
        for teachers outweighs the assessment benefit.
      </p>
      <p className="mt-9 font-normal text-xl scroll-mt-24" id="llmshelp">
        How LLMs can help you grade FRQs
      </p>
      <p className="mt-3 font-light text-md text-gray-800">
        In addition to preparing FRQs for assessment, LLMs can significantly
        reduce the time-intensive workload of grading and providing feedback on
        student responses. With context and guidance from
        an instructor, LLMs are remarkably capable of annotating text responses
        with comments, assigning scores to the responses, and giving written
        feedback for students and teachers alike to leverage. This holds true
        even for extended and essay-length responses.
      </p>
      <p className="mt-6 font-light text-md text-gray-800">
        As with grading MCQs, the only prerequisite is having digital text for a
        student’s response (or the patience to individually convert student
        assignments from handwriting to text). With that in place, LLMs can act
        as a TA or class resource capable of taking on most of the cognitive
        load of grading FRQs.
      </p>
      <p className="mt-9 font-normal text-xl scroll-mt-24" id="experiment">
        Experiment with prompts for grading FRQs
      </p>
      <p className="mt-3 font-light text-md text-gray-800">
        The immediate impulse may be to give an LLM a question and a student’s
        answer, then ask it to grade the response and write feedback. We will do
        this in a moment, but first take a step back and ask yourself: what
        heuristics do you use to grade a student’s answer to an FRQ? The
        response should be factually correct, logically consistent,
        grammatically correct, coherent, and so on. Instead of asking an LLM to
        grade an FRQ without context, let’s instruct it to evaluate the response
        along one of these heuristics.
      </p>
      <CompletionExample
        title={`1: asdf`}
        prompt={`FRQ + response + instructions (fact-check)`}
        completion={`feedback`}
        comment={``}
        show={false}
      />
      <p className="mt-6 font-light text-md text-gray-800">
        If fact-checking student FRQ responses is relevant to your course
        content, this type of pre-grading annotation will help speed up your
        grading process and ultimately hold students accountable for honest and
        accurate work. We could update this prompt to ask the model to evaluate
        the student’s response along multiple distinct dimensions and assign a
        score to each heuristic.
      </p>
      <CompletionExample
        title={`2: asdf`}
        prompt={`FRQ + response + instructions (Fact-check, grammar, logic) + give score 1-5 with explanation`}
        completion={`feedback`}
        comment={``}
        show={false}
      />
      <p className="mt-6 font-light text-md text-gray-800">
        From this prompt, we receive a ‘scorecard’ evaluating the student
        response. This is not necessarily in the context of your grading rubric,
        a lesson/unit, or course objectives, but a general evaluation of the
        student’s response.
      </p>
      <p className="mt-6 font-light text-md text-gray-800">
        If we follow the above prompting to its logical conclusion, we’ve
        essentially decomposed the high-level guidance of an FRQ’s grading
        rubric into a set of “checks” that evaluate factual accuracy, coherence,
        and other qualities. This could be useful in some situations. Let’s take
        another approach, where we provide the grading rubric and class context
        to produce a score and feedback for a student’s response.
      </p>
      <p className="mt-6 font-light text-md text-gray-800">
        Given an FRQ, a student’s response, and a grading rubric{" "}
        <span className="text-blue-600 hover:cursor-pointer hover:underline">
          (generated with guidance in the FRQ article)
        </span>
        , we can instruct the LLM to score the response from 1 to 3. Let’s also
        ask the model to give a short explanation for its score.
      </p>
      <CompletionExample
        title={`3: asdf`}
        prompt={`frq + student response + grading rubric + instruction`}
        completion={`1-3 + explanation`}
        comment={``}
        show={false}
      />
      <p className="italic text-center font-light text-md text-gray-800">
        The model can infer from the rubric how the specific question should be
        graded, but it can be valuable to re-emphasize this as an explicit
        instruction.
      </p>
      <p className="mt-6 font-light text-md text-gray-800">
        Would you agree with this grade and the accompanying explanation? What
        about it would you change? A quick tip: use your answers to these
        questions when experimenting with prompts, so that you communicate your
        expert knowledge as an educator to the model.
      </p>
      <p className="mt-6 font-light text-md text-gray-800">
        You may not have a practical use for the model’s explanation of the
        score, but it both improves the interpretability of the LLM as you
        iterate on different prompts (why did the model produce this score?) and
        implicitly forces the model to reason about why it gave this score. To
        quickly gauge the model’s confidence in this score, use the
        ‘re-generate’ feature (available in most mainstream LLMs) to produce a
        new evaluation, then compare the results.
      </p>
      <p className="mt-6 font-light text-md text-gray-800">
        If there is additional context to the student’s response, such as source
        materials supplementing the FRQ, including them in the prompt will
        further help the LLM evaluate the student’s response in the context of
        the rubric and the question’s related documents.
      </p>
      <CompletionExample
        title={`4: asdf`}
        prompt={`all above + supporting documents`}
        completion={`1-3 + explanation`}
        comment={``}
        show={false}
      />
      <p className="italic text-center font-light text-md text-gray-800">
        Click on the ChatGPT icon and try re-generating this response multiple
        times!
      </p>
      <p className="mt-6 font-light text-md text-gray-800">
        Running this over multiple FRQs and multiple student responses
        immediately gives you a baseline for further assigning or adjusting
        scores based on the AI’s feedback. AI often works best as an
        augmentation of your day-to-day tasks. This is a great example of how a
        first-run assignment of scores and feedback could ultimately speed up
        your grading and feedback administration process. If you have a large
        class (100+ students) and it is infeasible to give direct feedback on
        each FRQ, consider directly providing this AI-generated feedback to
        students.
      </p>
      <p className="mt-6 font-light text-md text-gray-800">
        Re-framing this use case, an educator could leverage this type of prompt
        to evaluate their own written FRQ feedback to see whether it is
        consistent with the question and its grading criteria.
      </p>
      <CompletionExample
        title={`5: asdf`}
        prompt={`frq, answer, teacher’s manual annotation`}
        completion={`this is a good annotation`}
        comment={``}
        show={false}
      />
      <p className="mt-6 font-light text-md text-gray-800">
        This evaluation holds you accountable to high-quality grading that is
        consistent with the rubric. If the LLM identifies an inconsistency
        between a score and its feedback, consider running this meta-feedback
        check on other FRQ annotations you’ve made for an assignment. If the LLM
        confidently critiques your feedback across FRQ responses, it’s possible
        something about your question, rubric, or feedback is unclear.
      </p>
      <p className="mt-6 font-light text-md text-gray-800">
        Calling back to our Grading MCQs section, another way to leverage LLMs
        is to anticipate student questions about their grade by producing
        pre-generated responses that explain their grade. Although individual
        feedback is always preferred, it may not always be possible. Therefore,
        if you’d like to help students “debug” their score before going to their
        instructor, you could generate responses for certain score combinations
        of the rubric.
      </p>
      <CompletionExample
        title={`6: asdf`}
        prompt={`asdf`}
        completion={`adsf`}
        comment={``}
        show={false}
      />
      <p className="mt-9 font-normal text-xl scroll-mt-24" id="conclusion">
        Concluding thoughts
      </p>
      <p className="mt-3 font-light text-md text-gray-800">
        Grading and providing feedback to students is an exciting use of LLMs.
        Each of these examples gives you a different way of augmenting your FRQ
        grading to improve efficiency.
      </p>
      <p className="mt-6 font-light text-md text-gray-800">
        It’s exciting to think of different scenarios for this type of
        AI-powered grading. Imagine if students who received a manually
        calculated score and feedback could follow up to understand why they
        received this result. The same capabilities of LLMs to assist you in
        grading could be equally valuable to a student looking to understand why
        they received a particular grade.
      </p>
      <p className="mt-6 font-light text-md text-gray-800">
        Perhaps students could access a portal specifically designed to let them
        “chat with their grades,” a tool for getting unstuck outside the
        classroom.
      </p> */}
      <FooterNav
        pageBefore={"Grading MCQs"}
        pageAfter={"Instruction Feedback"}
      />
    </div>
  );
}
