This is a beta release of SyntaxGym. Please send questions and comments to contact@syntaxgym.org.

Individual results

View in-depth performance of a single language model on a single test suite.

Region-by-region surprisal

Surprisals are averaged over items. Error bars show 95% confidence intervals.
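As a rough illustration of the aggregation described above, the sketch below averages one region's surprisal over items and computes a 95% interval. The page does not say how the intervals are computed, so the normal approximation (1.96 standard errors) used here is an assumption, and the function name `mean_and_ci95` is hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_and_ci95(values: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, half-width of a 95% interval) for per-item surprisals.

    Assumption: normal approximation, i.e. half-width = 1.96 * standard error.
    The method actually used for the error bars is not stated in the source.
    """
    mean = statistics.mean(values)
    if len(values) < 2:
        # A single item gives no estimate of spread.
        return mean, 0.0
    std_err = statistics.stdev(values) / math.sqrt(len(values))
    return mean, 1.96 * std_err


# Made-up surprisal values for one region in one condition, one per item.
example_values = [1.0, 2.0, 3.0]
example_mean, example_half_width = mean_and_ci95(example_values)
```

The bar height would correspond to `example_mean` and the error bar to `example_mean ± example_half_width`.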

Sample item for Center Embedding

The first item of the test suite is shown below for quick reference. Please visit the page for Center Embedding to see the full list of items.

Item	Condition	intro	np_1	that	det_2	np_2	verb1	verb2
1	plaus	The	painting	that	the	artist	painted	deteriorated
1	implaus	The	painting	that	the	artist	deteriorated	painted


Prediction performance for RNNG on Center Embedding

Accuracy	Formula	Description
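78.57%	( (634,plaus/6,verb1) + (634,plaus/7,verb2) ) < ( (633,implaus/6,verb1) + (633,implaus/7,verb2) )	The summed surprisal of verb1 and verb2 should be lower in the plausible (match) condition than in the implausible (mismatch) condition. In the implausible condition the verbs appear in first-in-first-out order relative to their subjects, whereas center embedding requires last-in-first-out order.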
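The prediction in the table above can be sketched in code as follows. This is a minimal illustration, not SyntaxGym's implementation: the data layout (a dict keyed by condition and region name) and the function names are hypothetical, and the surprisal values are made up. Accuracy is taken here to be the fraction of items for which the inequality holds.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: one dict per item, mapping (condition, region) to
# the region-level surprisal for that item.
ItemSurprisals = Dict[Tuple[str, str], float]


def prediction_holds(item: ItemSurprisals) -> bool:
    """Check verb1 + verb2 surprisal: plaus should be lower than implaus."""
    plaus_sum = item[("plaus", "verb1")] + item[("plaus", "verb2")]
    implaus_sum = item[("implaus", "verb1")] + item[("implaus", "verb2")]
    return plaus_sum < implaus_sum


def accuracy(items: List[ItemSurprisals]) -> float:
    """Fraction of items for which the prediction holds."""
    return sum(prediction_holds(item) for item in items) / len(items)


# Made-up surprisal values for two items, for illustration only.
example_items: List[ItemSurprisals] = [
    {  # prediction holds: 4.0 + 2.0 < 7.0 + 5.0
        ("plaus", "verb1"): 4.0, ("plaus", "verb2"): 2.0,
        ("implaus", "verb1"): 7.0, ("implaus", "verb2"): 5.0,
    },
    {  # prediction fails: 6.0 + 6.0 is not below 5.0 + 5.0
        ("plaus", "verb1"): 6.0, ("plaus", "verb2"): 6.0,
        ("implaus", "verb1"): 5.0, ("implaus", "verb2"): 5.0,
    },
]
example_accuracy = accuracy(example_items)
```

With these two made-up items the prediction holds for one of them, so `example_accuracy` is 0.5.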

Predictions are evaluated on region-level surprisal values.