Individual results
View in-depth performance of a single language model on a single test suite.
Region-by-region surprisal
Sample item for Filler-Gap Dependencies (hierarchy)
The first item of the test suite is shown below for quick reference. Please visit the page for Filler-Gap Dependencies (hierarchy) to see the full list of items.
Item | Condition | prefix | subj_subj | subj_wh | subj_embed | subject_gap | filler | matrix_verb | matrix_gap | continuation |
---|---|---|---|---|---|---|---|---|---|---|
1 | what_nogap | The fact that | my brother said | who | his friend trusted | our uncle | at the party | surprised | my daughter | yesterday afternoon |
1 | that_nogap | The fact that | my brother said | that | his friend trusted | our uncle | at the party | surprised | my daughter | yesterday afternoon |
1 | what_subjgap | The fact that | my brother said | who | his friend trusted | | at the party | surprised | my daughter | yesterday afternoon |
1 | that_subjgap | The fact that | my brother said | that | his friend trusted | | at the party | surprised | my daughter | yesterday afternoon |
1 | what_matrixgap | The fact that | my brother said | who | his friend trusted | our uncle | at the party | surprised | | yesterday afternoon |
1 | that_matrixgap | The fact that | my brother said | that | his friend trusted | our uncle | at the party | surprised | | yesterday afternoon |
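As a rough illustration of how the six conditions of this item relate to one another, the sketch below stores each condition as an ordered list of (region name, content) pairs and joins the non-empty regions into a sentence. The names `REGIONS`, `build_condition`, and `sentence` are hypothetical helpers invented for this example, and the layout is an assumption for illustration, not the test suite's actual file format. An empty string stands for a gap.

```python
from typing import Dict, List, Tuple

# Region names in the order they appear in the table above.
REGIONS = [
    "prefix", "subj_subj", "subj_wh", "subj_embed", "subject_gap",
    "filler", "matrix_verb", "matrix_gap", "continuation",
]

def build_condition(wh_word: str, subject_gap: str, matrix_gap: str) -> List[Tuple[str, str]]:
    """Pair every region name with its content; an empty string marks a gap."""
    contents = [
        "The fact that", "my brother said", wh_word, "his friend trusted",
        subject_gap, "at the party", "surprised", matrix_gap, "yesterday afternoon",
    ]
    return list(zip(REGIONS, contents))

# The six conditions of item 1 (hypothetical in-memory layout).
item: Dict[str, List[Tuple[str, str]]] = {
    "what_nogap": build_condition("who", "our uncle", "my daughter"),
    "that_nogap": build_condition("that", "our uncle", "my daughter"),
    "what_subjgap": build_condition("who", "", "my daughter"),
    "that_subjgap": build_condition("that", "", "my daughter"),
    "what_matrixgap": build_condition("who", "our uncle", ""),
    "that_matrixgap": build_condition("that", "our uncle", ""),
}

def sentence(regions: List[Tuple[str, str]]) -> str:
    """Join the non-empty regions into the full sentence for one condition."""
    return " ".join(text for _, text in regions if text)

for name, regions in item.items():
    print(f"{name}: {sentence(regions)}")
```

Keeping the region names alongside the text makes it easy to look up a region by position: counting from 1, "filler" is region 6, which matches the `/6,filler` suffix used in the prediction formulas.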
Prediction performance for GPT-2 on Filler-Gap Dependencies (hierarchy)
Accuracy | Prediction | Description |
---|---|---|
70.83% | (573,what_nogap/6,filler) > (572,that_nogap/6,filler) | We expect the “filler” region to have lower surprisal in the that_nogap condition than in the what_nogap condition, because an upstream wh-word should set up an expectation for a gap, and the no-gap conditions do not contain one. |
91.67% | (575,what_subjgap/6,filler) < (576,that_subjgap/6,filler) | We expect the “filler” region to have lower surprisal in the what_subjgap condition than in the that_subjgap condition, because gaps must be licensed by an upstream wh-word (such as “what”). |
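One plausible reading of the accuracy column is the fraction of items for which the prediction's inequality holds. The sketch below computes that fraction under this assumption. The function `prediction_accuracy` and the surprisal values are hypothetical and invented for illustration; they are not GPT-2 outputs and not the site's own scoring code.

```python
from typing import Dict, List

# Invented surprisal values (in bits) at the "filler" region, one entry per item.
toy_surprisals: Dict[str, List[float]] = {
    "what_nogap": [7.1, 6.4, 5.0, 8.2],
    "that_nogap": [5.3, 6.9, 4.1, 6.0],
}

def prediction_accuracy(surprisals: Dict[str, List[float]], higher: str, lower: str) -> float:
    """Fraction of items whose surprisal in `higher` strictly exceeds that in `lower`."""
    pairs = list(zip(surprisals[higher], surprisals[lower]))
    if not pairs:
        raise ValueError("need at least one item to compute accuracy")
    hits = sum(1 for high, low in pairs if high > low)
    return hits / len(pairs)

# Prediction of the first row: what_nogap > that_nogap at the filler region.
accuracy = prediction_accuracy(toy_surprisals, "what_nogap", "that_nogap")
print(f"{accuracy:.2%}")
```

With the toy values, three of the four items satisfy the inequality. For scale, a reported 70.83% would be consistent with, for example, 17 of 24 items satisfying the prediction, though the page itself does not state the item count.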