import React from "react";
import { Breadcrumb } from "../../Components/ContentPage/ContentRendering/Breadcrumb";
import FooterNav from "../../Components/ContentPage/Menus/FooterNav";
import CompletionExample from "../../Components/ContentPage/ContentRendering/CompletionExample";

export default function GradingMCQs() {
  return (
    <div>
      <Breadcrumb page={"Use Cases"} subPage={"Grading MCQS"} />
      <h1 className="font-semibold text-3xl mt-2">Grading MCQs</h1>
      <p className="mt-3 font-normal text-xl scroll-mt-24" id="problems">
        The current problem(s) with grading MCQs
      </p>
      <p className="mt-3 font-light text-md text-gray-800">
        In the{" "}
        <a
          href="https://www.k12promptguide.com/usecases/mcqs"
          target="_blank"
          className="hover:cursor-pointer hover:underline text-blue-700"
          rel="noreferrer"
        >
          {"MCQs"}
        </a>{" "}
        article, we discussed the difficulty of writing high-quality,
        strategy-proof multiple-choice questions (MCQs) to assess your students.
        From mapping lesson content to questions, to writing strategy-resistant
        answer options, to evaluating the clarity of questions, LLMs can give
        educators practical support for developing high-quality MCQs for
        assessments. And once a teacher has had enough practice, LLMs can offer
        strategic support for keeping MCQs reliably high-quality and consistent
        with assessment practices.
      </p>
      <p className="mt-6 font-light text-md text-gray-800">
        Compared to writing MCQs, grading them seems straightforward enough.
        With an answer key and some caffeine, grading the questions you wrote
        simply becomes a kind of busy work that is all too common for teachers.
        Even if you cleverly use an LLM to, say, help establish grading criteria
        or simply offer moral support (‘ChatGPT, encourage me to grade these
        quizzes I’ve been procrastinating on’), grading seems like an
        unalterable fact of life.
      </p>
      <p className="mt-9 font-normal text-xl scroll-mt-24" id="llmshelp">
        How LLMs can help you grade MCQs
      </p>
      <p className="mt-3 font-light text-md text-gray-800">
        As it turns out, LLMs can improve the MCQ grading process along
        multiple dimensions. Taking over the manual grading of answers
        (particularly for large introductory courses), generating
        content-specific explanations for correct and incorrect answer options,
        and summarizing a student’s overall performance on the quiz with
        feedback are just some of the ways to use an LLM. If you were to sit
        down with a student, grade their MCQs, and give them feedback, what
        kind of information would you share? Answering that question helps you
        imagine the supporting roles an LLM can play.
      </p>
      <p className="mt-6 font-light text-md text-gray-800">
        Before diving into examples, note the obvious limitation: whether your
        assessment was digitally administered and/or collected. An LLM cannot
        pick up a red pen, mark up a page, and flip to the next quiz (yet). Yes,
        you could scan your quizzes, upload them to Google Drive, and use its
        optical character recognition (OCR) feature to extract text to feed
        into an LLM. But these use cases save the most time for assessments
        that already exist in a digital format, whether you export quizzes from
        an LMS or simply highlight and copy a student’s text submission.
      </p>
      <p className="mt-9 font-normal text-xl scroll-mt-24" id="experiment">
        Experiment with prompts for grading MCQs
      </p>
      <p className="mt-3 font-light text-md text-gray-800">
        Many digital quiz programs include auto-grading features that calculate
        student scores automatically. If you happen to be accepting MCQ
        responses in an unconventional format (e.g., an exit ticket on Google
        Forms), or the auto-grader is otherwise unavailable, an LLM can quickly
        score the assessment. Simply provide the questions, the student’s
        answers, and the answer key to receive a scored assignment.
      </p>
      <CompletionExample
        title={`1: Grading a student's bellringer`}
        prompt={`### QUESTIONS REMOVED FOR SPACE ###\n\nStudent answers:
1) A
2) C 
3) D

Given the student's answers, correct their bellringer answers and give feedback on why they got each question wrong.`}
        completion={`Question 1
Student Answer: A) To increase the stock market speculation
Correct Answer: B) To separate commercial banking from investment banking
Feedback: The primary purpose of the Banking Act of 1933, known as the Glass-Steagall Act, was not to increase stock market speculation, but rather to separate commercial banking activities from investment banking activities. This separation was implemented to reduce the risk of financial speculation leading to bank failures, which had been a contributing factor to the Great Depression. By keeping investment and commercial banking separate, the law aimed to protect the average depositor and ensure a more stable banking system.

Question 2
Student Answer: C) To require more transparency in financial statements and reduce fraud
Correct Answer: C) To require more transparency in financial statements and reduce fraud
Feedback: You answered this question correctly. The main goal of the Securities Act of 1933 was indeed to require more transparency in financial statements and reduce fraud in the securities industry. This legislation required companies issuing securities to provide potential investors with financial and other significant information, aiming to eliminate deceit and misrepresentations in the sale of securities. It was a key measure to restore investor confidence following the stock market crash of 1929.

Question 3
Student Answer: D) The Social Security Act
Correct Answer: C) The Banking Act of 1933
Feedback: The Federal Deposit Insurance Corporation (FDIC) was created as a result of the Banking Act of 1933, not the Social Security Act. The purpose of the FDIC was to provide deposit insurance to depositors in US banks, thereby helping to restore trust in the American banking system following the widespread bank failures of the early 1930s. The Social Security Act, on the other hand, was designed to provide a safety net for the elderly, unemployed, and disadvantaged, through a different set of programs.`}
        comment={`In the instructions, you could also specify scoring weights and logic for producing a final score, in addition to the right/wrong annotations.`}
        show={false}
        exampleUrl={
          "https://chat.openai.com/share/9be92ec4-e0f5-4950-b78d-c11338a7759f"
        }
      />
      <p className="mt-6 font-light text-md text-gray-800">
        This prompt is worth iterating on before you use it in the classroom.
        Drawing on the skills from the{" "}
        <a
          href="https://www.k12promptguide.com/strategies"
          target="_blank"
          className="hover:cursor-pointer hover:underline text-blue-700"
          rel="noreferrer"
        >
          {"Strategies"}
        </a>{" "}
        section, you may be able to reduce factual hallucinations during
        grading by placing snippets of course material (related to the MCQs) at
        the top of the prompt. You could also emphasize in the instructions that
        the answer key is the ‘ground truth’ and that the LLM should not use
        outside information to grade the student’s quiz.
      </p>

      {/* <CompletionExample
        title={`2: asdf`}
        prompt={`same as above with context at top, additional instruction at bottom, and ‘…’ placed in middle to save space`}
        completion={`calculated score and graded questions`}
        comment={`If you experience hallucination with this additional prompting, think
        about other techniques to ‘guide’ the prompt towards grading accurately.
        If an LLM continues to make the same mistake, it means it is opinionated
        on the matter despite your instructions. Is it possible one of your
        question/answer pairs is incorrect?`}
        show={false}
        exampleUrl={
          "https://chat.openai.com/share/1e26836c-7580-4a6f-a38d-355991d5b935"
        }
      /> */}
      <p className="mt-6 font-light text-md text-gray-800">
        Now that we are getting accurate grading for a single student, let’s try
        putting all students’ assessments into the prompt. But first, a quick
        aside: as you add more information to a prompt, you may run up against
        the LLM’s context window. Essentially, every LLM has a limit on how much
        text (measured in tokens, which are words or pieces of words) it can
        “focus” on at one time. The largest mainstream LLMs have context windows
        that can fit hundreds of pages of text into a prompt (100k+ words).
        Practically speaking, it is unlikely you will hit this limit.
      </p>
      <p className="mt-6 font-light text-md text-gray-800">
        However, as you add more text to a prompt, the model’s ability to
        “understand” the entire prompt can gradually degrade, much like the
        drop in accuracy you might see if you asked someone basic questions
        about a 50-page document instead of a 3-page one. With this in mind, if
        you encounter grading errors when including all student assessments,
        you may need to split your prompt into multiple runs.
      </p>

      <CompletionExample
        title={`2: Grading 10 bellringers`}
        prompt={`### QUESTIONS REMOVED FOR SPACE ###\n\nStudent answers:
1. D, C, C
2. C, A, A
3. A, D, B
4. A, D, A
5. C, B, A
6. C, C, D
7. D, B, B
8. B, B, A
9. A, A, B
10. C, D, C

10 students submitted their answers for the bellringer. For each 'wrong' option for each question, generate a 1-2 sentence explanation for why it is incorrect. Then, grade each quiz and assign it a score.`}
        completion={`Now, let's grade each student's quiz based on their answers:

D, C, C: 2/3 (Incorrect: Question 1)
C, A, A: 1/3 (Correct: Question 1; Incorrect: Questions 2 & 3)
A, D, B: 0/3 (All incorrect)
A, D, A: 0/3 (All incorrect)
C, B, A: 1/3 (Correct: Question 1; Incorrect: Questions 2 & 3)
C, C, D: 2/3 (Correct: Questions 1 & 2; Incorrect: Question 3)
D, B, B: 0/3 (All incorrect)
B, B, A: 0/3 (All incorrect)
A, A, B: 0/3 (All incorrect)
C, D, C: 3/3 (All correct)`}
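        comment={`This is a long one! It is also a reminder to spot-check LLM grading: with the answer key (B, C, C), students 2 and 5 should score 0/3, and students 6, 8, and 10 should each score 1/3, so half of the scores above are wrong.`}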
        show={false}
        exampleUrl={
          "https://chat.openai.com/share/8ad7c04b-1b48-4ddd-b242-fcc65cf91793"
        }
      />
      <p className="mt-6 font-light text-md text-gray-800">
        If you are a teacher or professor with a large course (100+ students)
        and have been discouraged from giving MCQ assessments by the sheer
        volume of grading they create, prompts like these could free up aide
        or TA time for more important tasks. Better yet, share your prompt with
        your aides or TAs to speed up their own grading.
      </p>
      <p className="mt-6 font-light text-md text-gray-800">
        There is also a meta application for delegating MCQ grading to an LLM,
        even if you have an auto-grader or an army of TAs at your disposal. At
        their core, LLMs are built to model human language and knowledge. If an
        LLM tasked with grading 10 MCQs is consistently confused despite your
        best efforts to improve the prompt, one or more of your questions may
        have a wording or content problem, or an answer key entry may be wrong.
        Yes, there is an objective answer key for evaluating responses, but
        running student questions and answers through an LLM can give you
        additional context about the questions themselves.
      </p>
      <p className="mt-6 font-light text-md text-gray-800">
        Another application of LLMs in MCQ grading is generating explanations
        for correct and incorrect answer options. A common grievance students
        have about MCQs is how little information a letter answer or an X
        conveys. Without answer-specific explanations, this can spawn a wave of
        repetitive “why” questions from students. Fortunately, LLMs can help
        produce this answer-specific feedback to clarify student
        misunderstandings early and often.
      </p>
      <CompletionExample
        title={`3: Explanations for each wrong answer`}
        prompt={`### QUESTIONS REMOVED FOR SPACE ###

Student answers:
1. D, C, C
2. C, A, A
3. A, D, B
4. A, D, A
5. C, B, A
6. C, C, D
7. D, B, B
8. B, B, A
9. A, A, B
10. C, D, C

10 students submitted their answers for the bellringer. For each 'wrong' option for each question, generate a 1-2 sentence explanation for why it is incorrect. Then, grade each quiz and assign it a score.`}
        completion={`Question 1: The primary purpose of the Banking Act of 1933 (Glass-Steagall Act)
A) To increase the stock market speculation:
Incorrect because the act aimed to reduce stock market speculation by separating commercial and investment banking to promote financial stability, not to increase it.

C) To create the Federal Reserve System:
Incorrect because the Federal Reserve System was created in 1913, not by the Glass-Steagall Act of 1933. This act was focused on banking reform.

D) To deregulate the banking industry:
Incorrect as the act actually introduced more regulation into the banking industry to prevent the kinds of bank failures that occurred during the Great Depression.

Question 2: The main goal of the Securities Act of 1933
A) To privatize the stock exchanges:
Incorrect because the act did not focus on privatization but rather on regulation and transparency of securities markets.

B) To deregulate securities trading:
Incorrect as the purpose was to increase regulation and transparency in the securities industry to protect investors.

D) To eliminate the Securities and Exchange Commission (SEC):
Incorrect because the SEC was established in 1934 to enforce the Securities Act of 1933 and other securities laws, not eliminated.

Question 3: Creation of the Federal Deposit Insurance Corporation (FDIC)
A) The National Industrial Recovery Act:
Incorrect as this act was aimed at industrial recovery and had no direct relation to the creation of the FDIC.

B) The Agricultural Adjustment Act:
Incorrect because this act was focused on agricultural recovery and did not establish the FDIC.

D) The Social Security Act:
Incorrect as this act established Social Security and had no direct role in creating the FDIC.`}
        comment={``}
        show={false}
        exampleUrl={
          "https://chat.openai.com/share/8ad7c04b-1b48-4ddd-b242-fcc65cf91793"
        }
      />
      <p className="mt-6 font-light text-md text-gray-800">
        Above, the LLM uses the context of the material and each
        question-answer pair to write compelling explanations for why the
        incorrect options are incorrect. You could also direct the model to
        write explanations grounded in the topic material (e.g., “based on
        source 3, this is incorrect”), contrasted with the correct answer
        (e.g., “A says Y, but B says Z”), or built on key ideas from the lesson
        and how the responses fit into a larger context, among other approaches.
      </p>
      <p className="mt-6 font-light text-md text-gray-800">
        In the same way, you can generate a rich explanation for the correct
        answer.
      </p>
      <CompletionExample
        title={`4: Correct answer explanations`}
        prompt={`Question 1: What was the primary purpose of the Banking Act of 1933, commonly known as the Glass-Steagall Act?

A) To increase the stock market speculation
B) To separate commercial banking from investment banking
C) To create the Federal Reserve System
D) To deregulate the banking industry
Correct Answer: B) To separate commercial banking from investment banking

Question 2: The Securities Act of 1933 was a key piece of legislation during the FDR administration. What was its main goal?

A) To privatize the stock exchanges
B) To deregulate securities trading
C) To require more transparency in financial statements and reduce fraud
D) To eliminate the Securities and Exchange Commission
Correct Answer: C) To require more transparency in financial statements and reduce fraud

Question 3: The Federal Deposit Insurance Corporation (FDIC) was created as a result of which New Deal reform?

A) The National Industrial Recovery Act
B) The Agricultural Adjustment Act
C) The Banking Act of 1933
D) The Social Security Act
Correct Answer: C) The Banking Act of 1933

For each question, write rich explanations for each correct answer. Juxtapose against the incorrect answers for each one.`}
        completion={`Question 1: The Glass-Steagall Act (Banking Act of 1933)
Correct Answer: B) To separate commercial banking from investment banking

Explanation: The primary purpose of the Banking Act of 1933, commonly known as the Glass-Steagall Act, was to restore public confidence in the banking system in the wake of the Great Depression. By separating commercial banking from investment banking, the legislation aimed to prevent banks from taking on too much risk with depositors' money. Commercial banks, which handle deposits and offer loans, were barred from engaging in the investment business, which involves the buying and selling of securities. This separation was intended to reduce the potential for conflicts of interest and protect the savings of the American public.

Question 2: The Securities Act of 1933
Correct Answer: C) To require more transparency in financial statements and reduce fraud

Explanation: The Securities Act of 1933 was enacted as part of the New Deal legislation under President Franklin D. Roosevelt's administration, with the main goal of increasing transparency in the financial statements of corporations and reducing fraud in the securities market. It required companies offering securities to the public to make full and accurate disclosure of relevant financial information. This Act aimed to protect investors by ensuring they had access to reliable information before investing in stocks and bonds, thus helping to restore investor confidence after the stock market crash of 1929.


Question 3: Creation of the Federal Deposit Insurance Corporation (FDIC)
Correct Answer: C) The Banking Act of 1933

Explanation: The Federal Deposit Insurance Corporation (FDIC) was established by the Banking Act of 1933, another key component of the New Deal reforms aimed at stabilizing the banking system during the Great Depression. The FDIC provided deposit insurance to depositors in U.S. commercial banks and savings institutions, guaranteeing the safety of deposits up to a certain limit. This was intended to prevent bank runs, where fear of a bank's insolvency leads to mass withdrawals, potentially causing the bank to collapse. By insuring deposits, the FDIC helped restore trust in the American banking system.`}
        comment={`Another long one!`}
        show={false}
        exampleUrl={
          "https://chat.openai.com/share/ba101765-d689-47fe-b415-a5bd0fe389cd"
        }
      />
      <p className="mt-6 font-light text-md text-gray-800">
        With this information prepared, students receiving their graded
        assignments back will immediately have greater context, specific to your
        lesson and material, about why their correct answers are correct and why
        their incorrect answers are incorrect. These option-level explanations
        could also be generated during the MCQ creation process, as the feedback
        is student-agnostic.
      </p>
      <p className="mt-6 font-light text-md text-gray-800">
        We could take this one step further and make the feedback specific to a
        student’s quiz. That is, take a student’s completed and graded quiz
        (whether graded by an auto-grader, an LLM, or by hand), feed it into an
        LLM, and help the student understand at a high level where they are
        going wrong. Taking in the context of the entire assessment, an LLM can
        be instructed to identify the topic areas and concepts the student
        failed to understand (or, more positively, the student’s strengths),
        based on their assessment performance.
      </p>
      {/* <CompletionExample
        title={`6: asdf`}
        prompt={`entire quiz, answers, student responses + instructions`}
        completion={`2-3 sentence summary (sourced?) on what the student understood, didn’t understand, and action items`}
        comment={``}
        show={false}
        exampleUrl={
          "https://chat.openai.com/c/7f87416f-d4af-4b9b-9780-4796bd3759cd"
        }
      /> */}
      <p className="mt-6 font-light text-md text-gray-800">
        This has both a pedagogical benefit, helping students integrate ideas
        from the lesson, and an actionable benefit, helping teachers see which
        topic areas students are getting right and which they are
        misunderstanding. If you are feeling particularly AI-enabled, you can
        take each student’s personalized feedback and have an LLM summarize it
        into an aggregate view of how the class performed on the assessment.
      </p>
      <p className="mt-9 font-normal text-xl scroll-mt-24" id="conclusion">
        Concluding thoughts
      </p>
      <p className="mt-6 font-light text-md text-gray-800">
        An admission from us at the K-12 Prompt Engineering Guide: although
        several of these examples could enrich your workflows for grading and
        giving feedback on MCQs, some are simply impractical for daily use in
        the classroom. Returning to our initial example, imagine (1) manually
        entering student responses one at a time into ChatGPT, (2) waiting for
        the text to generate, (3) copying and pasting the feedback into a
        separate document or form to return it to the student, and (4)
        repeating for every student. Even if all the student responses could be
        copied and pasted in a single selection, you would still have to split
        the graded results back out to each student (and what if the model got
        something wrong?). At that point, you might as well have students write
        out their MCQ answers on a piece of paper!
      </p>
      <p className="mt-6 font-light text-md text-gray-800">
        This is worth pointing out because there are many ideas for using LLMs
        in the classroom, and some of them are impractical in certain
        scenarios. An AI auto-grading application or website that automatically
        imports quiz responses and exports results could harness this
        capability and eliminate the concerns above. As far as we can tell,
        this doesn’t exist in popular LMSs or as a standalone application. That
        doesn’t mean these use cases lack merit (and several of the examples
        are valuable on their own), but they may not be practical for your
        classroom.
      </p>
      <FooterNav pageBefore={"FRQs"} pageAfter={"Grading FRQs"} />
    </div>
  );
}
