This is a beta release of SyntaxGym. Please send questions and comments to contact@syntaxgym.org.

Individual results

View in-depth performance of a single language model on a single test suite.

Region-by-region surprisal

Surprisals are averaged over items. Error bars show 95% confidence intervals.
More info: view full details for model Ordered Neurons or test suite Filler-Gap Dependencies (hierarchy).
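As a rough illustration of the averaging described above, the following sketch computes a mean surprisal and a 95% confidence half-width for one region in one condition. The per-item values are made up, and the normal approximation (1.96 times the standard error) is an assumption; this page does not state how the plotted intervals are computed.

```python
import math
import statistics


def mean_ci95(values):
    """Return (mean, half_width) for a list of per-item surprisals.

    Assumption: a normal approximation, half_width = 1.96 * standard error.
    """
    n = len(values)
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # sample standard error
    return mean, 1.96 * sem


# Hypothetical per-item surprisals at one region in one condition.
surprisals = [7.2, 6.8, 8.1, 7.5, 6.9, 7.7]
mean, half_width = mean_ci95(surprisals)
print(f"mean = {mean:.2f}, 95% CI = +/- {half_width:.2f}")
```

With few items, a t-based interval would be wider than this approximation.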

Sample item for Filler-Gap Dependencies (hierarchy)

The first item of the test suite is shown below for quick reference. Please visit the page for Filler-Gap Dependencies (hierarchy) to see the full list of items.

Item	Condition	prefix	subj_subj	subj_wh	subj_embed	subject_gap	filler	matrix_verb	matrix_gap	continuation
1	what_nogap	The fact that	my brother said	who	his friend trusted	our uncle	at the party	surprised	my daughter	yesterday afternoon
1	that_nogap	The fact that	my brother said	that	his friend trusted	our uncle	at the party	surprised	my daughter	yesterday afternoon
1	what_subjgap	The fact that	my brother said	who	his friend trusted		at the party	surprised	my daughter	yesterday afternoon
1	that_subjgap	The fact that	my brother said	that	his friend trusted		at the party	surprised	my daughter	yesterday afternoon
1	what_matrixgap	The fact that	my brother said	who	his friend trusted	our uncle	at the party	surprised		yesterday afternoon
1	that_matrixgap	The fact that	my brother said	that	his friend trusted	our uncle	at the party	surprised		yesterday afternoon
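To read the table above as full sentences, the regions of each condition can be joined in column order, skipping empty (gapped) regions. The sketch below does this for item 1. The helper names `make_condition` and `build_sentence` are hypothetical and are not part of SyntaxGym.

```python
REGIONS = [
    "prefix", "subj_subj", "subj_wh", "subj_embed", "subject_gap",
    "filler", "matrix_verb", "matrix_gap", "continuation",
]

# Region contents of item 1 with no gap; subj_wh is filled in per condition.
BASE = dict(zip(REGIONS, [
    "The fact that", "my brother said", "", "his friend trusted", "our uncle",
    "at the party", "surprised", "my daughter", "yesterday afternoon",
]))


def make_condition(wh_word, empty_region=None):
    """Copy BASE, set the wh/complementizer word, optionally empty one region."""
    regions = dict(BASE, subj_wh=wh_word)
    if empty_region is not None:
        regions[empty_region] = ""
    return regions


def build_sentence(regions):
    """Join non-empty regions in column order with single spaces."""
    return " ".join(regions[name] for name in REGIONS if regions[name])


ITEM_1 = {
    "what_nogap": make_condition("who"),
    "that_nogap": make_condition("that"),
    "what_subjgap": make_condition("who", "subject_gap"),
    "that_subjgap": make_condition("that", "subject_gap"),
    "what_matrixgap": make_condition("who", "matrix_gap"),
    "that_matrixgap": make_condition("that", "matrix_gap"),
}

for condition, regions in ITEM_1.items():
    print(f"{condition}: {build_sentence(regions)}")
```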


Prediction performance for Ordered Neurons on Filler-Gap Dependencies (hierarchy)

Accuracy	Formula	Description
75.00%	(573,what_nogap/6,filler) > (572,that_nogap/6,filler)	We expect the “filler” region to be less surprising in the that_nogap condition than in the what_nogap condition, because an upstream wh-word should set up an expectation for a gap.
83.33%	(575,what_subjgap/6,filler) < (576,that_subjgap/6,filler)	We expect the “filler” region to have lower surprisal in the what_subjgap condition than in the that_subjgap condition, because gaps must be licensed by upstream wh-words (such as “what”).

Predictions are evaluated on region-level surprisal values.
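One plausible reading of the accuracy column is the fraction of items for which the stated inequality holds at the named region. The sketch below computes that fraction under this assumption. The surprisal values are invented for illustration and are not the model's actual outputs, and `prediction_accuracy` is a hypothetical helper, not a SyntaxGym function.

```python
import operator


def prediction_accuracy(items, left, right, region, compare):
    """Fraction of items where compare(surprisal[left], surprisal[right]) holds.

    `items` is a list of dicts mapping condition name -> {region name: surprisal}.
    """
    hits = sum(
        compare(item[left][region], item[right][region]) for item in items
    )
    return hits / len(items)


# Hypothetical region-level surprisals for four items (not real results).
items = [
    {"what_nogap": {"filler": 9.1}, "that_nogap": {"filler": 7.4}},
    {"what_nogap": {"filler": 8.3}, "that_nogap": {"filler": 8.0}},
    {"what_nogap": {"filler": 6.5}, "that_nogap": {"filler": 7.2}},
    {"what_nogap": {"filler": 10.0}, "that_nogap": {"filler": 8.8}},
]

# Prediction of the form: what_nogap filler > that_nogap filler.
accuracy = prediction_accuracy(
    items, "what_nogap", "that_nogap", "filler", operator.gt
)
print(f"{accuracy:.2%}")
```

Passing the comparison as a function (`operator.gt` or `operator.lt`) lets the same helper cover both directions of prediction listed in the table.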