AI isn't very good at history, new paper finds
Researchers introduced the Hist-LLM benchmark, which uses the Seshat Global History Databank to test GPT-4, Llama, and Gemini on historical questions. The models showed limited accuracy, revealing shortcomings on nuanced historical questions. The researchers attribute these shortcomings to the models' reliance on prominent, well-represented data and to potential biases in their training data.