We propose AcademicEval, a live benchmark for evaluating LLMs on long-context generation tasks. AcademicEval uses papers from arXiv to introduce several academic writing tasks with long-context ...