Individual results
View in-depth performance of a single language model on a single test suite.
Region-by-region surprisal (plot omitted)
Sample item for Filler-Gap Dependencies (4 sentential embeddings)
The first item of the test suite is shown below for quick reference. Please visit the page for Filler-Gap Dependencies (4 sentential embeddings) to see the full list of items.
Item | Condition | prefix | comp | embedding | subj | verb | obj | continuation
---|---|---|---|---|---|---|---|---
1 | what_gap | I know | what | our mother said her friend remarked that the park attendant reported the cop thinks | your friend | threw | | into the trash can
1 | that_gap | I know | that | our mother said her friend remarked that the park attendant reported the cop thinks | your friend | threw | | into the trash can
1 | what_no-gap | I know | what | our mother said her friend remarked that the park attendant reported the cop thinks | your friend | threw | the plastic | into the trash can
1 | that_no-gap | I know | that | our mother said her friend remarked that the park attendant reported the cop thinks | your friend | threw | the plastic | into the trash can
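To make the region-by-region surprisal numbers concrete, here is a minimal sketch of how per-region surprisal could be computed for one condition of this item, assuming the Hugging Face transformers library. The small `gpt2` checkpoint stands in for GPT-2 XL, and aligning tokens to regions via character offsets is one choice among several; this is an illustration, not the evaluation harness's actual code.

```python
import math

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Small checkpoint for brevity; the results above use GPT-2 XL ("gpt2-xl").
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Regions of the what_no-gap condition of item 1, in suite order.
region_names = ["prefix", "comp", "embedding", "subj", "verb", "obj", "continuation"]
regions = [
    "I know",
    "what",
    "our mother said her friend remarked that the park attendant reported the cop thinks",
    "your friend",
    "threw",
    "the plastic",
    "into the trash can",
]
sentence = " ".join(regions)

enc = tokenizer(sentence, return_tensors="pt", return_offsets_mapping=True)
ids = enc["input_ids"][0]
with torch.no_grad():
    log_probs = torch.log_softmax(model(enc["input_ids"]).logits[0], dim=-1)

# Surprisal of token i in bits: -log2 p(token_i | tokens_<i). The prediction
# for token i is read off position i-1; the first token gets 0 for simplicity.
surprisals = [0.0] + [
    -log_probs[i - 1, ids[i]].item() / math.log(2) for i in range(1, len(ids))
]

# Map each token to a region via character offsets, then sum within regions.
region_ends, pos = [], 0
for r in regions:
    pos = sentence.index(r, pos) + len(r)
    region_ends.append(pos)

region_surprisal = [0.0] * len(regions)
for (start, end), s in zip(enc["offset_mapping"][0].tolist(), surprisals):
    region_idx = next(i for i, rend in enumerate(region_ends) if end <= rend)
    region_surprisal[region_idx] += s

for name, s in zip(region_names, region_surprisal):
    print(f"{name:13s} {s:7.2f} bits")
```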
Prediction performance for GPT-2 XL on Filler-Gap Dependencies (4 sentential embeddings)
Accuracy | Formula | Description
---|---|---
61.90% | (627,what_no-gap/6,obj)>(625,that_no-gap/6,obj) | We expect the object to be less surprising in the that_no-gap condition than in the what_no-gap condition, because an upstream wh-word should set up an expectation for a gap. |
80.95% | (628,what_gap/7,continuation)<(626,that_gap/7,continuation) | We expect the surprisal at the continuation to be lower in the what_gap condition than in the that_gap condition, because gaps must be licensed by upstream wh-words (such as “what”). |
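Each formula is an inequality over the surprisal of a single region, evaluated item by item; the accuracy column is the fraction of items for which the inequality holds. Below is a hedged sketch of that scoring, assuming per-item region surprisals are already available in a hypothetical nested dictionary; it mirrors the formulas above but is not the evaluation harness's actual code.

```python
# Hypothetical data layout: surprisal[item][condition][region] -> bits, e.g.
# surprisal = {1: {"what_gap": {...}, "that_gap": {...},
#                  "what_no-gap": {...}, "that_no-gap": {...}}, ...}

def what_no_gap_penalty(item):
    # (what_no-gap, obj) > (that_no-gap, obj): an overt object is more
    # surprising when an upstream "what" has set up an expectation for a gap.
    return item["what_no-gap"]["obj"] > item["that_no-gap"]["obj"]

def gap_licensing(item):
    # (what_gap, continuation) < (that_gap, continuation): material after a
    # gap is less surprising when the gap is licensed by "what".
    return item["what_gap"]["continuation"] < item["that_gap"]["continuation"]

def accuracy(surprisal, prediction):
    # Fraction of items for which the predicted inequality holds.
    items = list(surprisal.values())
    return sum(prediction(item) for item in items) / len(items)

# accuracy(surprisal, what_no_gap_penalty)  # -> 0.6190 for GPT-2 XL here
# accuracy(surprisal, gap_licensing)        # -> 0.8095
```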