Individual results
View in-depth performance of a single language model on a single test suite.
Region-by-region surprisal
Sample item for Filler-Gap Dependencies (object extraction)
The first item of the test suite is shown below for quick reference. Please visit the page for Filler-Gap Dependencies (object extraction) to see the full list of items.
Item | Condition | prefix | comp | np1 | verb | np2 | prep | np3 | end |
---|---|---|---|---|---|---|---|---|---|
1 | what_nogap | I know | what | our uncle | grabbed | the food | in front of | the guests | at the holiday party |
1 | that_nogap | I know | that | our uncle | grabbed | the food | in front of | the guests | at the holiday party |
1 | what_gap | I know | what | our uncle | grabbed | | in front of | the guests | at the holiday party |
1 | that_gap | I know | that | our uncle | grabbed | | in front of | the guests | at the holiday party |
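To make the region layout concrete, here is a minimal sketch of how the sample item could be held in memory. The names `REGIONS`, `ITEM_1`, and `sentence` are hypothetical and chosen for illustration only. The sketch assumes that the two gap conditions leave the np2 region empty, which is what distinguishes them from the no-gap conditions.

```python
# Region names, in the order they appear in the sample item table.
REGIONS = ["prefix", "comp", "np1", "verb", "np2", "prep", "np3", "end"]

# Shared material for item 1. An empty string marks an empty region.
_TAIL = ["in front of", "the guests", "at the holiday party"]

# Hypothetical in-memory layout: condition name -> one string per region.
# Assumption: the gap conditions have an empty np2 region.
ITEM_1 = {
    "what_nogap": ["I know", "what", "our uncle", "grabbed", "the food"] + _TAIL,
    "that_nogap": ["I know", "that", "our uncle", "grabbed", "the food"] + _TAIL,
    "what_gap": ["I know", "what", "our uncle", "grabbed", ""] + _TAIL,
    "that_gap": ["I know", "that", "our uncle", "grabbed", ""] + _TAIL,
}


def sentence(condition: str) -> str:
    """Join the non-empty regions of one condition into a single string."""
    return " ".join(region for region in ITEM_1[condition] if region)


if __name__ == "__main__":
    for name in ITEM_1:
        print(f"{name}: {sentence(name)}")
```

Keeping every condition aligned to the same list of region names is what lets surprisal be compared region by region across conditions, even when a region is empty in some of them.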
Prediction performance for Ordered Neurons on Filler-Gap Dependencies (object extraction)
Accuracy | Prediction | Description |
---|---|---|
95.83% | what_gap.prep < that_gap.prep | We expect surprisal at the “prep” region to be lower in the what_gap condition than in the that_gap condition, because gaps must be licensed by upstream wh-words (such as “what”). |
100.00% | what_nogap.np2 > that_nogap.np2 | We expect the “np2” region to be less surprising in the that_nogap condition than in the what_nogap condition, because an upstream wh-word should set up an expectation for a gap. |
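The accuracy figures can be read as the share of items for which a prediction's inequality holds. The sketch below illustrates that reading under stated assumptions: the function names are hypothetical, the surprisal values in `TOY_ITEMS` are invented for illustration and are not the model's actual outputs, and the code is not the site's own implementation.

```python
from typing import Callable, Dict, List

# Per-item surprisals: condition name -> region name -> surprisal.
Surprisals = Dict[str, Dict[str, float]]


def prediction_gap(s: Surprisals) -> bool:
    """what_gap.prep < that_gap.prep"""
    return s["what_gap"]["prep"] < s["that_gap"]["prep"]


def prediction_nogap(s: Surprisals) -> bool:
    """what_nogap.np2 > that_nogap.np2"""
    return s["what_nogap"]["np2"] > s["that_nogap"]["np2"]


def accuracy(items: List[Surprisals], prediction: Callable[[Surprisals], bool]) -> float:
    """Percentage of items on which the prediction holds."""
    return 100.0 * sum(prediction(s) for s in items) / len(items)


def toy_item(what_gap_prep: float, that_gap_prep: float,
             what_nogap_np2: float, that_nogap_np2: float) -> Surprisals:
    """Build one item holding only the regions the two predictions inspect."""
    return {
        "what_gap": {"prep": what_gap_prep},
        "that_gap": {"prep": that_gap_prep},
        "what_nogap": {"np2": what_nogap_np2},
        "that_nogap": {"np2": that_nogap_np2},
    }


# INVENTED surprisal values, for illustration only.
TOY_ITEMS = [
    toy_item(3.1, 6.4, 9.0, 5.2),
    toy_item(2.8, 5.9, 8.3, 4.7),
    toy_item(4.0, 7.2, 7.5, 5.0),
    toy_item(6.5, 6.1, 8.8, 6.0),  # the gap prediction fails on this item
]

if __name__ == "__main__":
    print(f"gap prediction:    {accuracy(TOY_ITEMS, prediction_gap):.2f}%")
    print(f"no-gap prediction: {accuracy(TOY_ITEMS, prediction_nogap):.2f}%")
```

Each prediction is scored per item as a strict inequality between two region-level surprisals, so an item either satisfies it or does not, and accuracy is the fraction of items that do.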