Broken Arrows of English Exams | Toru Sasaki | TEDxRikkyoU
The speaker argues that current educational testing methods, like essay marking, are constrained by time and technology, necessitating AI-powered Automated Essay Scoring (AES) to bridge the gap between what students study and what they need for real-world communication. The talk details how current AES systems calculate scores from an essay's text similarity to sample data, and proposes future transparent AI models that reveal the scoring logic, turning the "black box" into "white."
## Speakers & Context
- **Speaker:** Toru Sasaki, an AI student whose research field is Natural Language Processing (NLP).
- **Context:** The speaker originally taught English at a high school in Tokyo and began the presentation by recounting the workload of manually marking 160 essays, which required 40 hours of work.
- **Audience observation:** Recognizes that in Asia, university entrance exams and job competency assessments (like the TOEIC) often fail to measure practical English ability.
## Theses & Positions
- *Exams change students' learning style*, compelling students to prioritize test formats over deep learning skills.
- The primary bottleneck in English education is the physical time required for human graders to mark large volumes of work (e.g., 40 hours for 160 essays).
- NLP technology, specifically AES systems, is the only viable technology capable of reducing marking time significantly (e.g., from 15 minutes to 15 seconds per essay).
- The goal is to *re-connect the broken arrows* between academic study and professional necessity by making AI marking systems transparent, turning the "black box into white."
## Concepts & Definitions
- **NLP (Natural Language Processing):** The speaker's research field, applied here to analyzing and scoring written English.
- **Automated Essay Scoring (AES) systems:** Technology used to score essays, initially implemented in famous exams like the TOEFL or GRE in the US.
- **Black Box:** The opaque mechanism of current AES systems, where the score is given without showing the underlying calculation or reasoning ("Sorry, machine only knows.").
- **White Box:** The desired state for AES systems, where the scorer can explain *why* an essay received a specific score (e.g., "You were scored at 6 instead of 8 because...").
- **Broken Arrows:** A metaphor for the current disconnect where academic study (what is tested) does not match the required skill set for the job or real life.
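
The black box / white box contrast can be made concrete with a small sketch. The talk does not specify how a transparent scorer would be built, so everything below is an assumption for illustration: the three features, their weights, and the function names `extract_features` and `explain_score` are hypothetical. The point is only that a white-box scorer returns the reasons along with the number.

```python
import re

# Hypothetical features and weights, chosen only for illustration.
WEIGHTS = {"word_count": 0.02, "distinct_word_ratio": 4.0, "linking_words": 0.5}
LINKING_WORDS = {"because", "however", "therefore", "although", "moreover"}

def extract_features(essay):
    """Turn an essay into a few human-readable numbers."""
    words = re.findall(r"[a-z']+", essay.lower())
    return {
        "word_count": len(words),
        "distinct_word_ratio": len(set(words)) / len(words) if words else 0.0,
        "linking_words": sum(word in LINKING_WORDS for word in words),
    }

def explain_score(essay, max_score=10):
    """Return the total score and each feature's contribution to it."""
    features = extract_features(essay)
    contributions = {name: WEIGHTS[name] * value for name, value in features.items()}
    return min(max_score, sum(contributions.values())), contributions

total, contributions = explain_score(
    "I like uniforms because they save time. However, some students disagree."
)
print(f"score: {total:.2f} / 10")
for name, points in contributions.items():
    print(f"  {name}: {points:+.2f}")
```

A black-box system would return only the total. Here the per-feature breakdown is what would let a student see why an essay scored 6 rather than 8.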
## Mechanisms & Processes
- **Current Essay Marking Workload:** Marking 160 essays required 15 minutes per essay, totaling 2400 minutes or 40 hours of work.
- **AES Calculation Process:**
1. Collect sample essays and pre-fixed scores.
2. Convert sentences into numerical data.
3. A machine learning model derives a formula that relates the numerical data to the scores.
4. A new exam essay is scored with this formula, based on its *text similarity* to the sample data.
5. A score (e.g., 6/10) means the essay's data is *closest* to the sample data scored at 6.
- **Impact of Sample Data:** If the model is trained on conventional language, an essay using rare/unique expressions might score lower, even if the writing is brilliant, because the model cannot find a strong association.
- **Proposed Future Mechanism:** Implementing a transparent AI framework that allows for continuous daily feedback, student work analysis, and entrance exam scoring, making the scoring logic visible.
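
The AES calculation process above can be sketched in a few lines. This is a minimal illustration, not the method of any real AES product: it assumes word counts as the numerical data and swaps the trained model for a plain nearest-sample lookup using cosine similarity. The sample essays, their scores, and the function names are all hypothetical.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Step 2: turn an essay into numerical data (word counts)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def score_essay(essay, samples):
    """Steps 4-5: return the score of the most similar sample and that similarity."""
    vec = vectorize(essay)
    best_text, best_score = max(samples, key=lambda s: cosine(vec, vectorize(s[0])))
    return best_score, cosine(vec, vectorize(best_text))

# Step 1: hypothetical sample essays with pre-fixed scores (out of 10).
SAMPLES = [
    ("I like school. School is good.", 3),
    ("I think school uniforms are good because they save time every morning.", 6),
    ("School uniforms reduce social pressure, and they also save families money over time.", 8),
]

score, similarity = score_essay(
    "I think uniforms are good because they save time.", SAMPLES
)
print(score, round(similarity, 2))
```

The test essay shares most of its words with the second sample, so it receives that sample's score of 6. An essay built from rare words, such as "Zephyrs whisper quixotic melodies.", has a similarity of zero to every sample, which mirrors the point about unique expressions scoring poorly regardless of quality.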
## Named Entities
- **TOEFL test:** Example of an existing standardized test that relies on multiple-choice formats, unsuitable for measuring complex expression.
- **TOEIC test:** Example of a common English competency test in Asia, which the speaker suggests cannot measure work-related ability.
## Tools, Tech & Products
- **Google Translate:** Example of common NLP technology used daily.
- **Smart assistants (e.g., "Hey Siri!"):** Example of common NLP technology used daily.
- **AES systems:** The core technology being discussed for educational scoring.
- **AI Framework:** The proposed future system for educational purposes that must be accessible to everyone in Japan.
## Numbers & Data
- Speaker's age when starting to teach: **27**.
- First class taught: **April 2001**.
- Number of classes taught: **4**.
- Students per class: **40**, totaling **160 essays**.
- Time required to mark 160 essays: **2400 minutes** or **40 hours**.
- Time estimate for modern AES: **15 seconds** per essay (compared to 15 minutes).
- Example score structure: **6 on a 10 point scale**.
- Cost of intensive private tutoring: **almost $5,000** for a three-month course.
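
The workload figures above follow from simple arithmetic, checked in the short sketch below. The helper name `marking_time_minutes` is made up for illustration, and the 40-minute AES total is derived here from the 15-second estimate rather than stated in the talk.

```python
ESSAYS = 4 * 40  # 4 classes x 40 students = 160 essays

def marking_time_minutes(num_essays, seconds_per_essay):
    """Total marking time in minutes for a batch of essays."""
    return num_essays * seconds_per_essay / 60

human_minutes = marking_time_minutes(ESSAYS, 15 * 60)  # 15 minutes per essay
aes_minutes = marking_time_minutes(ESSAYS, 15)         # 15 seconds per essay

print(human_minutes, human_minutes / 60)  # 2400.0 40.0 (minutes, hours)
print(aes_minutes)                        # 40.0 (minutes)
```

At 15 seconds per essay, the same 160 essays would take about 40 minutes instead of 40 hours.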
## Examples & Cases
- **Teacher's initial experiment:** Teaching English at a Tokyo high school, promising no translations in exams for three years.
- **The Marking Overload:** The physical act of marking 160 essays leading to exhaustion and nightmares.
- **The TOEIC Problem:** The TOEIC test, being multiple-choice based, cannot measure the ability to "explain your idea or persuade others" required in a job.
- **The High-Cost Solution:** A friend attending an after-work English school paying almost $5,000 for three months of detailed feedback, showing the current economic barrier to high-quality practice.
## Counterarguments & Caveats
- **AES Limitation:** Current AES systems do not "read or understand essays"; they only perform calculation based on numerical data.
- **Ethical/Validity Concern:** With the black box mechanism, it "cannot prove that all essays are marked with equal validity and reliability."
## Conclusions & Recommendations
- The ultimate goal is to achieve *re-connection*: ensuring what is studied aligns with what is practically needed, mediated by transparent technology.
- The critical step is reforming the *exams* first, as changing exams forces a shift in learning habits, which in turn reshapes the overall idea of English learning.
- The belief expressed: "we together, can make a change in classrooms with the help of AI."
## Implications & Consequences
- **Educational Paradigm Shift:** Moving the educational focus from summative, time-intensive testing to continuous, AI-supported feedback loops.
- **Equality in Assessment:** The proposed system aims to make sophisticated feedback (currently costing almost $5,000 for a three-month course) accessible to all students in Japan.
- **Shaping Mindset:** Changing the assessment mechanism is predicted to change the collective "mindset towards English learning."
## Verbatim Moments
- *"What if I say I won’t ask you any translation?"*
- *"This is the true language teaching."*
- *"Mark me tonight! Mark me tonight!"*
- *"Oh, you did it the modern way, I went more traditional."*
- *"No technology can make a day 48 hours."*
- *"The arrow is still broken in the middle."*
- *"We should change the black box into white."*
- *"What you study and what you need to be able to do are not properly connected."*
- *"How can we re-connect the broken arrows of Japan?"*
- *"This may be a small butterfly flipping in the beginning, I believe it will be a powerful tool to guide the education culture of Japan to the place where it's supposed to be."*