Individual results

View in-depth performance of a single language model on a single test suite.

Region-by-region surprisal
Sample item for Filler-Gap Dependencies (object extraction)
Item | Condition | prefix | comp | np1 | verb | np2 | prep | np3 | end
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
1 | what_nogap | I know | what | our uncle | grabbed | the food | in front of | the guests | at the holiday party
1 | that_nogap | I know | that | our uncle | grabbed | the food | in front of | the guests | at the holiday party
1 | what_gap | I know | what | our uncle | grabbed | | in front of | the guests | at the holiday party
1 | that_gap | I know | that | our uncle | grabbed | | in front of | the guests | at the holiday party
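The sample item crosses two factors: the complementizer ("what" or "that") and whether the object position is filled (no gap) or left empty (gap). A minimal sketch of that structure follows, assuming each condition is stored as a list of region strings. The region boundaries, the `ITEM_1` name, and the `sentence` helper are illustrative assumptions, not the site's own data format.

```python
# Region names as listed in the item table header.
REGIONS = ["prefix", "comp", "np1", "verb", "np2", "prep", "np3", "end"]

# Shared material; the split into regions is an assumption for illustration.
PREFIX, NP1, VERB = "I know", "our uncle", "grabbed"
NP2, PREP, NP3, END = "the food", "in front of", "the guests", "at the holiday party"

# Item 1: each condition maps to one string per region ("" marks the gap).
ITEM_1 = {
    "what_nogap": [PREFIX, "what", NP1, VERB, NP2, PREP, NP3, END],
    "that_nogap": [PREFIX, "that", NP1, VERB, NP2, PREP, NP3, END],
    "what_gap": [PREFIX, "what", NP1, VERB, "", PREP, NP3, END],
    "that_gap": [PREFIX, "that", NP1, VERB, "", PREP, NP3, END],
}


def sentence(region_texts):
    """Join the non-empty regions of one condition into a sentence."""
    return " ".join(text for text in region_texts if text)


for condition, region_texts in ITEM_1.items():
    assert len(region_texts) == len(REGIONS)
    print(condition, "->", sentence(region_texts))
```

Keeping an empty string for the missing np2 keeps every condition aligned on the same eight regions, so a region such as "prep" can be compared across conditions by position.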
Prediction performance for Ordered Neurons on Filler-Gap Dependencies (object extraction)
Accuracy | Prediction | Description
--- | --- | ---
95.83% | what_gap.prep < that_gap.prep | We expect surprisal in the “prep” region to be lower in the what_gap condition than in the that_gap condition, because gaps must be licensed by an upstream wh-word (such as “what”).
100.00% | what_nogap.np2 > that_nogap.np2 | We expect the “np2” region to be less surprising in the that_nogap condition than in the what_nogap condition, because an upstream wh-word should set up an expectation for a gap.
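One way to read the accuracy column is as the share of test items whose region surprisals satisfy the prediction formula. The sketch below works under that assumption. The surprisal values are invented for illustration and are not results for Ordered Neurons, and `prediction_accuracy` and `gap_is_licensed` are hypothetical helper names.

```python
def prediction_accuracy(items, predicate):
    """Return the fraction of items for which the predicate holds."""
    passed = sum(1 for item in items if predicate(item))
    return passed / len(items)


# Made-up surprisals (in bits), keyed by (condition, region).
# These are NOT real model outputs; they only exercise the computation.
toy_items = [
    {("what_gap", "prep"): 3.1, ("that_gap", "prep"): 6.4},
    {("what_gap", "prep"): 4.0, ("that_gap", "prep"): 5.2},
    {("what_gap", "prep"): 5.5, ("that_gap", "prep"): 5.0},  # prediction fails
    {("what_gap", "prep"): 2.7, ("that_gap", "prep"): 4.9},
]


def gap_is_licensed(item):
    """Encode the formula what_gap.prep < that_gap.prep for one item."""
    return item[("what_gap", "prep")] < item[("that_gap", "prep")]


accuracy = prediction_accuracy(toy_items, gap_is_licensed)
print(f"{accuracy:.2%}")  # 75.00% for the toy data above
```

Under the same reading, a figure such as 95.83% would correspond to, for instance, 23 of 24 items passing; the number of items in this suite is not stated on this page.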