Results on 11 reasoning models with 16k-token budgets. "Acc" denotes accuracy, "Tok" denotes token count, and "CR" denotes compression rate. The experimental results are presented as bar charts.