Friendica Social Network

Effective #teaching is a difficult and counter-intuitive task, and it's not something you can master from the Internet. So it's not surprising that AI is pretty bad at it & bad at evaluating it - even negatively correlated with student learning. Another way of saying this is that AI has poor pedagogical content knowledge:
* Knowledge without Wisdom: Measuring Misalignment between LLMs and Intended Impact arxiv.org/abs/2603.00883
Podcast summary: drive.google.com/file/d/1n09DU…
More examples:
#AIEd

Knowledge without Wisdom: Measuring Misalignment between LLMs and Intended Impact

LLMs increasingly excel on AI benchmarks, but doing so does not guarantee validity for downstream tasks. This study evaluates the performance of leading foundation models (FMs, i.e.

^arXiv.org

Doug Holton

in reply to Doug Holton • 3 weeks ago

* The Aftermath of DrawEduMath: Vision Language Models Underperform with Struggling Students and Misdiagnose Errors
arxiv.org/abs/2603.00925
* Benchmarking the Pedagogical Knowledge of Large Language Models
arxiv.org/abs/2506.18710v1
fab-ai.org/initiatives/ai-for-…
* AI‑generated lesson plans fall short on inspiring students and promoting critical thinking
theconversation.com/ai-generat…
#AIEd #mathed #teaching #education

Benchmarking the Pedagogical Knowledge of Large Language Models

Benchmarks like Massive Multitask Language Understanding (MMLU) have played a pivotal role in evaluating AI's knowledge and abilities across diverse domains.

^arXiv.org
