Template Literal JavaScript

Does the model actually understand the question, or just the words in it?

A small benchmark of short natural-language prompts that look easy but expose weak reasoning — goal grounding, world-state tracking, social pragmatics, modified-riddle templates, literal precision ...


