This is a beta release of SyntaxGym. Please send questions and comments to contact@syntaxgym.org.

Individual results

View in-depth performance of a single language model on a single test suite.

Region-by-region surprisal

Figure: mean surprisal for each region and condition. Surprisals are averaged over items; error bars show 95% confidence intervals.
More info: view full details for model Transformer XL or test suite Filler-Gap Dependencies (extraction from prepositional phrase).
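The page does not say how the confidence intervals are computed. The sketch below computes the per-item mean and a 95% interval for one condition and region. It assumes a normal approximation (mean ± 1.96 standard errors over items). The function name and surprisal values are hypothetical.

```python
import math
import statistics

def mean_and_ci95(values: list[float]) -> tuple[float, float, float]:
    """Mean surprisal over items plus a 95% confidence interval.

    Assumption: normal-approximation interval (mean +/- 1.96 * SEM);
    SyntaxGym's exact method may differ.
    """
    n = len(values)
    mean = statistics.fmean(values)
    if n < 2:
        # A single item gives no estimate of spread.
        return mean, mean, mean
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(n)
    half_width = 1.96 * sem
    return mean, mean - half_width, mean + half_width

# Hypothetical per-item surprisals (bits) for one condition/region pair.
end_region_what_gap = [4.0, 6.0, 5.0, 5.0]
mean, lo, hi = mean_and_ci95(end_region_what_gap)
print(f"mean={mean:.2f}  95% CI=[{lo:.2f}, {hi:.2f}]")
```

With only a few items, a t-based interval would be somewhat wider than this normal approximation.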

Sample item for Filler-Gap Dependencies (extraction from prepositional phrase)

The first item of the test suite is shown below for quick reference. Please visit the page for Filler-Gap Dependencies (extraction from prepositional phrase) to see the full list of items.

Item	Condition	prefix	comp	np1	verb	np2	prep	np3	end
1	what_nogap	I know	who	our uncle	grabbed	the food	in front of	the guests	at the holiday party
1	that_nogap	I know	that	our uncle	grabbed	the food	in front of	the guests	at the holiday party
1	what_gap	I know	who	our uncle	grabbed	the food	in front of		at the holiday party
1	that_gap	I know	that	our uncle	grabbed	the food	in front of		at the holiday party


Prediction performance for Transformer XL on Filler-Gap Dependencies (extraction from prepositional phrase)

Accuracy	Prediction (formula)	Description
75.00%	(622,what_nogap/7,np3) > (623,that_nogap/7,np3)	We expect NP3 to be less surprising in the that_nogap condition than in the what_nogap condition. An upstream wh-word sets up an expectation for a gap, so a filled NP3 should be unexpected.
54.17%	(624,what_gap/8,end) < (621,that_gap/8,end)	We expect surprisal at the end region to be lower in the what_gap condition than in the that_gap condition, because gaps must be licensed by an upstream wh-word (such as “what”).
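Accuracy appears to be the percentage of items for which the corresponding inequality holds. The sketch below assumes that item-level region surprisals are already available. It expresses both predictions as Python predicates and scores them. The data layout, function names, and numbers are hypothetical and are not taken from the test suite.

```python
# Hypothetical layout: surprisal[item][condition][region] -> bits.
Surprisals = dict[int, dict[str, dict[str, float]]]

def prediction_accuracy(surprisal: Surprisals, predicate) -> float:
    """Fraction of items for which the prediction predicate holds."""
    hits = [predicate(conditions) for conditions in surprisal.values()]
    return sum(hits) / len(hits)

def np3_prediction(c) -> bool:
    # A filled NP3 should be more surprising after a wh-word than after "that".
    return c["what_nogap"]["np3"] > c["that_nogap"]["np3"]

def end_prediction(c) -> bool:
    # The region after the gap should be less surprising when a wh-word licenses the gap.
    return c["what_gap"]["end"] < c["that_gap"]["end"]

# Toy values for two hypothetical items (not real model output).
toy: Surprisals = {
    1: {"what_nogap": {"np3": 9.1}, "that_nogap": {"np3": 7.4},
        "what_gap": {"end": 6.0}, "that_gap": {"end": 8.2}},
    2: {"what_nogap": {"np3": 6.5}, "that_nogap": {"np3": 7.0},
        "what_gap": {"end": 5.1}, "that_gap": {"end": 5.9}},
}
print(prediction_accuracy(toy, np3_prediction))  # 0.5
print(prediction_accuracy(toy, end_prediction))  # 1.0
```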

Predictions are evaluated on region-level surprisal values.
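Surprisal is the negative log probability of a token given its preceding context. This sketch assumes a common convention that this page does not confirm: a region's surprisal is the sum of the surprisals of its tokens, measured in bits. The token probabilities and function names are hypothetical.

```python
import math

def token_surprisal(prob: float) -> float:
    """Surprisal in bits: -log2 p(token | preceding context)."""
    return -math.log2(prob)

def region_surprisals(tokens, regions):
    """Sum token surprisals within each region.

    Assumption: region surprisal = sum of its token surprisals (bits).
    tokens:  list of (region_name, probability) pairs in sentence order.
    regions: region names to report, in order.
    """
    totals = {name: 0.0 for name in regions}
    for region, prob in tokens:
        totals[region] += token_surprisal(prob)
    return totals

# Hypothetical probabilities for the np3 and end regions of a no-gap sentence.
toks = [("np3", 0.5), ("np3", 0.25), ("end", 0.125), ("end", 0.5)]
print(region_surprisals(toks, ["np3", "end"]))  # {'np3': 3.0, 'end': 4.0}
```

In this sketch, a region with no tokens, such as the empty np3 region in the gap conditions, keeps a surprisal of 0.0.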