Individual results


View in-depth performance of a single language model on a single test suite.

Region-by-region surprisal
Sample item for Center Embedding
Item  Condition  intro  np_1      that  det_2  np_2    verb1         verb2
1     plaus      The    painting  that  the    artist  painted       deteriorated
1     implaus    The    painting  that  the    artist  deteriorated  painted
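Here is a minimal sketch of how region-by-region surprisal can be read off a model. It assumes that a token's surprisal is -log2 p(token | preceding context) in bits and that a region's surprisal is the sum of its tokens' surprisals. The helper names and the probabilities are hypothetical and purely illustrative. They are not TinyLSTM outputs.

```python
import math

def surprisal_bits(prob: float) -> float:
    """Surprisal in bits: -log2 p(token | preceding context)."""
    return -math.log2(prob)

def region_surprisals(regions, token_probs):
    """Sum token surprisals within each region (assumed aggregation).

    regions: list of (region_name, tokens) pairs in sentence order
    token_probs: one probability per token, in the same order
    """
    result = {}
    i = 0
    for name, tokens in regions:
        total = 0.0
        for _ in tokens:
            total += surprisal_bits(token_probs[i])
            i += 1
        result[name] = total
    return result

# Regions of the "plaus" item above; here each region is a single word.
plaus_regions = [
    ("intro", ["The"]), ("np_1", ["painting"]), ("that", ["that"]),
    ("det_2", ["the"]), ("np_2", ["artist"]),
    ("verb1", ["painted"]), ("verb2", ["deteriorated"]),
]
# Hypothetical per-token probabilities, for illustration only.
probs = [0.5, 0.25, 0.5, 0.5, 0.125, 0.25, 0.0625]
s = region_surprisals(plaus_regions, probs)
print(s["verb1"], s["verb2"])  # 2.0 4.0
```

With a subword tokenizer, a region such as np_1 may span several tokens. The inner loop sums over however many tokens each region contains.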
Prediction performance for TinyLSTM on Center Embedding
Accuracy: 82.14%
Prediction: ( (634,plaus/6,verb1) + (634,plaus/7,verb2) ) < ( (633,implaus/6,verb1) + (633,implaus/7,verb2) )
Description: The summed surprisal of verb1 and verb2 should be lower in the match condition (plaus) than in the mismatch condition (implaus). In the mismatch condition, the verbs would be plausible only under a first-in-first-out pairing of nouns and verbs.
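The prediction above compares, per item, the summed verb1 and verb2 surprisal across the two conditions. Accuracy is then the fraction of items for which that comparison holds. The sketch below illustrates this under stated assumptions. It indexes regions by name and ignores the numeric identifiers that appear in the displayed formula. The function names and surprisal values are hypothetical and are not real TinyLSTM results.

```python
from typing import Dict, List

# Per-item surprisals: condition -> region name -> surprisal in bits.
Item = Dict[str, Dict[str, float]]

def prediction_holds(item: Item) -> bool:
    """True if (verb1 + verb2) surprisal is lower in 'plaus' than in 'implaus'."""
    plaus = item["plaus"]["verb1"] + item["plaus"]["verb2"]
    implaus = item["implaus"]["verb1"] + item["implaus"]["verb2"]
    return plaus < implaus

def accuracy(items: List[Item]) -> float:
    """Fraction of items for which the prediction holds."""
    return sum(prediction_holds(it) for it in items) / len(items)

# Two hypothetical items (made-up values).
items = [
    {"plaus": {"verb1": 6.0, "verb2": 9.5}, "implaus": {"verb1": 8.0, "verb2": 10.0}},
    {"plaus": {"verb1": 7.0, "verb2": 12.0}, "implaus": {"verb1": 7.5, "verb2": 11.0}},
]
print(f"{accuracy(items):.2%}")  # 50.00%
```

The first item passes (15.5 < 18.0) and the second fails (19.0 > 18.5), which gives 50.00%. A reported figure such as 82.14% is computed the same way over all items in the suite.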