AI isn't very good at history, new paper finds

Researchers introduced the Hist-LLM benchmark to assess how well GPT-4, Llama, and Gemini answer historical questions drawn from the Seshat Global History Databank. The results showed limited accuracy, highlighting the shortcomings of large language models (LLMs) in nuanced historical inquiry, which stem from their reliance on prominent historical data and from potential biases in their training data.
