
Individual results

View in-depth performance of a single language model on a single test suite.

Region-by-region surprisal

Surprisals are averaged over items. Error bars show 95% confidence intervals.
More info: view full details for model TinyLSTM or test suite Center Embedding (with modifier).
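As a rough illustration of the averaging and error bars described above, the sketch below computes a per-region mean surprisal and a 95% interval. It assumes a normal-approximation interval (mean ± 1.96 × standard error); the page does not say how its intervals are computed, so that choice, the function name `mean_ci95`, and the toy numbers are all assumptions made for illustration.

```python
import math
import statistics


def mean_ci95(values):
    """Return (mean, half_width) for a normal-approximation 95% interval.

    Assumption: the interval is mean +/- 1.96 * standard error. The page
    does not state which method it actually uses.
    """
    n = len(values)
    mean = statistics.fmean(values)
    if n < 2:
        # A single observation has no spread to estimate.
        return mean, 0.0
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, 1.96 * sem


# Hypothetical per-item surprisals for one condition, keyed by region name.
# These numbers are made up and are not TinyLSTM results.
toy = {
    "verb1": [10.0, 12.0, 14.0],
    "verb2": [8.0, 8.0, 8.0],
}

# One (mean, half_width) pair per region, i.e. one bar with its error bar.
summary = {region: mean_ci95(vals) for region, vals in toy.items()}
```

Each entry of `summary` corresponds to one bar in a region-by-region plot: the first value is the bar height and the second is the half-length of its error bar.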

Sample item for Center Embedding (with modifier)

The first item of the test suite is shown below for quick reference. Please visit the page for Center Embedding (with modifier) to see the full list of items.

Item	Condition	intro	np_1	that	det_2	np_2	modifier	verb1	verb2
1	plaus	The	painting	that	the	artist	who lived long ago	painted	deteriorated
1	implaus	The	painting	that	the	artist	who lived long ago	deteriorated	painted


Prediction performance for TinyLSTM on Center Embedding (with modifier)

Accuracy	Formula	Description
57.14%	( (554,plaus/7,verb1) + (554,plaus/8,verb2) ) < ( (553,implaus/7,verb1) + (553,implaus/8,verb2) )	The summed surprisal of verb1 and verb2 should be lower in the match (plaus) condition than in the mismatch (implaus) condition, where the verb plausibility matches a first-in-first-out ordering. This suite adds a modifier to separate the NPs from their corresponding VPs.

Predictions are evaluated on region-level surprisal values.
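The prediction in the table can be read as a per-item check on region-level surprisals. The sketch below is one way to express it, assuming accuracy is the fraction of items for which the inequality holds. The data layout, the function names `prediction_holds` and `accuracy`, and the toy values are hypothetical and are not taken from SyntaxGym or from TinyLSTM's output.

```python
def prediction_holds(item):
    """Check verb1 + verb2 surprisal: plaus total must be below implaus total."""
    plaus = item["plaus"]["verb1"] + item["plaus"]["verb2"]
    implaus = item["implaus"]["verb1"] + item["implaus"]["verb2"]
    return plaus < implaus


def accuracy(items):
    """Fraction of items for which the prediction holds (assumed definition)."""
    return sum(prediction_holds(item) for item in items) / len(items)


# Hypothetical region-level surprisals for four items (made-up values).
toy_items = [
    {"plaus": {"verb1": 9.0, "verb2": 7.0}, "implaus": {"verb1": 11.0, "verb2": 8.0}},
    {"plaus": {"verb1": 10.0, "verb2": 9.0}, "implaus": {"verb1": 9.0, "verb2": 9.0}},
    {"plaus": {"verb1": 8.0, "verb2": 6.0}, "implaus": {"verb1": 12.0, "verb2": 7.0}},
    {"plaus": {"verb1": 7.5, "verb2": 7.5}, "implaus": {"verb1": 7.0, "verb2": 9.0}},
]

print(f"{accuracy(toy_items):.2%}")  # prints 75.00%
```

In this toy set the second item fails the check because its plaus total (19.0) exceeds its implaus total (18.0), so three of four items pass.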