Individual results


View in-depth performance of a single language model on a single test suite.

Region-by-region surprisal
Sample item for Center Embedding (with modifier)
Item  Condition  intro  np_1      that  det_2  np_2    modifier            verb1         verb2
1     plaus      The    painting  that  the    artist  who lived long ago  painted       deteriorated
1     implaus    The    painting  that  the    artist  who lived long ago  deteriorated  painted
Prediction performance for GPT-2 on Center Embedding (with modifier)
Accuracy: 85.71%

Formula: ( (554,plaus/7,verb1) + (554,plaus/8,verb2) ) < ( (553,implaus/7,verb1) + (553,implaus/8,verb2) )

Description: The summed surprisal at verb1 and verb2 should be lower in the match (plaus) condition than in the mismatch (implaus) condition. In the mismatch condition, the verbs' plausibility follows a first-in-first-out ordering. In this suite, a modifier is added to separate the NPs from their corresponding VPs.
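As a rough illustration of how a prediction of this form can be scored, the sketch below checks the inequality for each item and reports the fraction of items on which it holds. The function names, the data layout, and the surprisal values are all hypothetical and invented for illustration. They are not real GPT-2 output, and this is not necessarily how the site computes its figure.

```python
from typing import Dict, List

# Hypothetical layout: condition name -> region name -> surprisal (bits).
Surprisals = Dict[str, Dict[str, float]]


def prediction_holds(item: Surprisals) -> bool:
    """True if verb1 + verb2 surprisal is lower in plaus than in implaus."""
    plaus = item["plaus"]["verb1"] + item["plaus"]["verb2"]
    implaus = item["implaus"]["verb1"] + item["implaus"]["verb2"]
    return plaus < implaus


def accuracy(items: List[Surprisals]) -> float:
    """Fraction of items on which the prediction holds."""
    return sum(prediction_holds(it) for it in items) / len(items)


# Toy values, made up for this example only.
toy_items: List[Surprisals] = [
    # 6.0 + 9.0 = 15.0 < 12.0 + 8.0 = 20.0, so the prediction holds.
    {"plaus": {"verb1": 6.0, "verb2": 9.0},
     "implaus": {"verb1": 12.0, "verb2": 8.0}},
    # 7.5 + 10.0 = 17.5 is not below 9.0 + 8.0 = 17.0, so it fails.
    {"plaus": {"verb1": 7.5, "verb2": 10.0},
     "implaus": {"verb1": 9.0, "verb2": 8.0}},
]

print(f"{accuracy(toy_items):.2%}")  # prints 50.00%
```

Under this reading, the reported accuracy is the share of test-suite items for which the plaus sum is strictly lower than the implaus sum.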