This is a beta release of SyntaxGym. Please send questions and comments to contact@syntaxgym.org.

Individual results

View in-depth performance of a single language model on a single test suite.

Region-by-region surprisal

Surprisals are averaged over items. Error bars show 95% confidence intervals.
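Each bar in the chart summarizes one region in one condition by its mean surprisal across items, with a 95% confidence interval. This page does not say how the interval is computed, so the sketch below assumes a normal approximation (mean ± 1.96 standard errors). The helper `mean_with_ci` and the per-item values are hypothetical and are not GPT-2 XL results.

```python
import math
from statistics import mean, stdev


def mean_with_ci(values, z=1.96):
    """Return (mean, lower, upper) for a normal-approximation confidence interval.

    Assumption: SyntaxGym's exact CI method is not stated on this page;
    mean +/- z * standard error is used here purely as an illustration.
    """
    m = mean(values)
    if len(values) < 2:
        # A single item gives no spread estimate, so the interval collapses.
        return m, m, m
    se = stdev(values) / math.sqrt(len(values))
    return m, m - z * se, m + z * se


# Hypothetical per-item surprisals (bits) for np2 in the what_nogap condition.
np2_what_nogap = [9.1, 8.4, 10.2, 7.9]
m, lo, hi = mean_with_ci(np2_what_nogap)
print(f"mean={m:.2f}, 95% CI=[{lo:.2f}, {hi:.2f}]")
```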
More info: view full details for model GPT-2 XL or test suite Filler-Gap Dependencies (object extraction).

Sample item for Filler-Gap Dependencies (object extraction)

The first item of the test suite is shown below for quick reference. Please visit the page for Filler-Gap Dependencies (object extraction) to see the full list of items.

Item	Condition	prefix	comp	np1	verb	np2	prep	np3	end
1	what_nogap	I know	what	our uncle	grabbed	the food	in front of	the guests	at the holiday party
1	that_nogap	I know	that	our uncle	grabbed	the food	in front of	the guests	at the holiday party
1	what_gap	I know	what	our uncle	grabbed		in front of	the guests	at the holiday party
1	that_gap	I know	that	our uncle	grabbed		in front of	the guests	at the holiday party
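The item crosses two factors. The comp region is either "what" or "that", and the gap conditions leave np2 empty. The minimal sketch below rebuilds item 1 from its regions to make that layout concrete. The names `item_1`, `CONDITIONS`, and `sentence` are illustrative only and are not part of SyntaxGym.

```python
REGIONS = ("prefix", "comp", "np1", "verb", "np2", "prep", "np3", "end")


def item_1(comp: str, gap: bool) -> dict[str, str]:
    """Region contents of item 1 for one condition (np2 is empty when gapped)."""
    values = ["I know", comp, "our uncle", "grabbed",
              "" if gap else "the food",
              "in front of", "the guests", "at the holiday party"]
    return dict(zip(REGIONS, values))


# Condition names follow the table: what_nogap, what_gap, that_nogap, that_gap.
CONDITIONS = {
    f"{comp}_{'gap' if gap else 'nogap'}": item_1(comp, gap)
    for comp in ("what", "that")
    for gap in (False, True)
}


def sentence(regions: dict[str, str]) -> str:
    """Join non-empty regions so the gap does not leave a double space."""
    return " ".join(text for text in regions.values() if text)


for name, regions in CONDITIONS.items():
    print(f"{name}: {sentence(regions)}")
```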

More info: view full details for test suite Filler-Gap Dependencies (object extraction).

Prediction performance for GPT-2 XL on Filler-Gap Dependencies (object extraction)

Accuracy	Prediction formula	Description
100.00%	(640,what_nogap/5,np2) > (641,that_nogap/5,np2)	We expect the np2 region to be less surprising in the that_nogap condition than in the what_nogap condition, because an upstream wh-word should set up an expectation for a gap, which makes a filled object position unexpected.
95.83%	(642,what_gap/6,prep) < (639,that_gap/6,prep)	We expect the prep region to be less surprising in the what_gap condition than in the that_gap condition, because gaps must be licensed by an upstream wh-word (such as “what”).
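Each prediction is an inequality between region-level surprisals in two conditions of the same item. The sketch below assumes, as the percentages suggest, that accuracy is the share of items for which the inequality holds. The functions `prediction_1`, `prediction_2`, and `accuracy`, along with the two-item dataset, are hypothetical and are not GPT-2 XL results.

```python
from typing import Callable, Dict, List

# Region-level surprisals for one item: condition -> region -> surprisal (bits).
ItemSurprisals = Dict[str, Dict[str, float]]


def prediction_1(item: ItemSurprisals) -> bool:
    # (what_nogap, np2) > (that_nogap, np2)
    return item["what_nogap"]["np2"] > item["that_nogap"]["np2"]


def prediction_2(item: ItemSurprisals) -> bool:
    # (what_gap, prep) < (that_gap, prep)
    return item["what_gap"]["prep"] < item["that_gap"]["prep"]


def accuracy(pred: Callable[[ItemSurprisals], bool],
             items: List[ItemSurprisals]) -> float:
    """Percentage of items for which the prediction's inequality holds."""
    return 100.0 * sum(pred(item) for item in items) / len(items)


# Hypothetical surprisal values for two items (only the regions used above).
items = [
    {"what_nogap": {"np2": 12.0}, "that_nogap": {"np2": 9.5},
     "what_gap": {"prep": 6.0}, "that_gap": {"prep": 11.0}},
    {"what_nogap": {"np2": 8.0}, "that_nogap": {"np2": 10.0},
     "what_gap": {"prep": 5.5}, "that_gap": {"prep": 9.0}},
]
print(accuracy(prediction_1, items))  # 50.0: holds for item 1 only
print(accuracy(prediction_2, items))  # 100.0: holds for both items
```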

Predictions are evaluated on region-level surprisal values.
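Surprisal is the negative log probability of a token given its left context. A region such as "in front of" usually spans several model tokens. The sketch below assumes that a region's surprisal is the sum of its token surprisals and that surprisal is measured in bits (log base 2). The token probabilities are made up, and the helper names are hypothetical.

```python
import math


def token_surprisal(prob: float) -> float:
    """Surprisal in bits: -log2 p(token | preceding context)."""
    return -math.log2(prob)


def region_surprisal(token_probs: list[float]) -> float:
    """Assumed aggregation: sum token surprisals over the tokens in one region.

    An empty region (e.g. np2 in the gap conditions) contributes 0.
    """
    return sum(token_surprisal(p) for p in token_probs)


# Hypothetical conditional probabilities for three tokens of "in front of".
prep_probs = [0.25, 0.5, 0.5]
print(region_surprisal(prep_probs))  # 2 + 1 + 1 = 4.0 bits
```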