Items tagged with: aied
"AI petting zoo" resources for instructors: liascript.github.io/course/?ra…
#EdTech #AIEd #EdDev
LiaScript
LiaScript is a service for running free and interactive online courses, built with its own markup language. So check out the following course ;-)
liascript.github.io
* The Aftermath of DrawEduMath: Vision Language Models Underperform with Struggling Students and Misdiagnose Errors
arxiv.org/abs/2603.00925
* Benchmarking the Pedagogical Knowledge of Large Language Models
arxiv.org/abs/2506.18710v1
fab-ai.org/initiatives/ai-for-…
* AI‑generated lesson plans fall short on inspiring students and promoting critical thinking
theconversation.com/ai-generat…
#AIEd #mathed #teaching #education
Benchmarking the Pedagogical Knowledge of Large Language Models
Benchmarks like Massive Multitask Language Understanding (MMLU) have played a pivotal role in evaluating AI's knowledge and abilities across diverse domains.
arXiv.org
Effective #teaching is a difficult and counter-intuitive task, and it's not something you can master from the Internet. So it's not surprising that AI is pretty bad at it and bad at evaluating it; its assessments can even be negatively correlated with student learning. Another way of saying this is that AI has poor pedagogical content knowledge:
* Knowledge without Wisdom: Measuring Misalignment between LLMs and Intended Impact arxiv.org/abs/2603.00883
Podcast summary: drive.google.com/file/d/1n09DU…
More examples:
#AIEd
Knowledge without Wisdom: Measuring Misalignment between LLMs and Intended Impact
LLMs increasingly excel on AI benchmarks, but doing so does not guarantee validity for downstream tasks. This study evaluates the performance of leading foundation models (FMs, i.e. …
arXiv.org
