This is a beta release of SyntaxGym. Please send questions and comments to contact@syntaxgym.org.

Individual results

View in-depth performance of a single language model on a single test suite.

Region-by-region surprisal

Surprisals are averaged over items. Error bars show 95% confidence intervals.
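As a rough illustration of how a per-region bar and its error bar could be derived, the sketch below averages surprisal values over items and computes a 95% interval. The page does not say how the intervals are computed, so the normal approximation (1.96 standard errors) is an assumption, and the function name `region_summary` and the sample values are hypothetical.

```python
import math
from statistics import mean, stdev


def region_summary(surprisals):
    """Return (mean, half_width) for one region's per-item surprisals.

    half_width is the 95% confidence half-width under a normal
    approximation (assumption: 1.96 * standard error of the mean).
    """
    n = len(surprisals)
    m = mean(surprisals)
    if n < 2:
        # A single item gives no spread estimate; report a zero-width interval.
        return m, 0.0
    sem = stdev(surprisals) / math.sqrt(n)
    return m, 1.96 * sem


# Made-up surprisal values (in bits) for one region across three items.
m, half = region_summary([2.0, 4.0, 6.0])
print(f"mean = {m:.2f}, 95% CI = [{m - half:.2f}, {m + half:.2f}]")
```

With few items, a t-based interval would be wider than this approximation.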
More info: view full details for model GPT-2 or test suite Cleft Structure (with modifier).

Sample item for Cleft Structure (with modifier)

The first item of the test suite is shown below for quick reference. Please visit the page for Cleft Structure (with modifier) to see the full list of items.

Item	Condition	intro	subj	verb	modifier	passive	verb.1	matrix_v
1	np_mismatch	What	he	did	after the ingredients had been bought from the store	was		the meal
1	np_match	What	he	ate	after the ingredients had been bought from the store	was		the meal
1	vp_match	What	he	did	after the ingredients had been bought from the store	was	prepare	the meal
1	vp_mismatch	What	he	ate	after the ingredients had been bought from the store	was	prepare	the meal


Prediction performance for GPT-2 on Cleft Structure (with modifier)

Accuracy	Formula	Description
92.50%	((607,np_mismatch/7,matrix_v)-(605,np_match/7,matrix_v))+(((606,vp_mismatch/6,verb.1)+(606,vp_mismatch/7,matrix_v))-((608,vp_match/6,verb.1)+(608,vp_match/7,matrix_v)))>0	We expect that the Matrix Verb has lower surprisal in the NP Match condition, where we have a lexicalized verb (“ate” instead of “did”). In addition, we expect that the sum of the Verb 1 + Matrix Verb has lower surprisal in the VP Match condition, where it cannot be the object of a lexicalized verb such as “ate.” Together, the differences between these sums should be greater than zero. We add a VP modifier in this test suite.

Predictions are evaluated on region-level surprisal values.
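The prediction formula for this suite can be sketched in code as follows. This is a minimal illustration, not SyntaxGym's implementation. It assumes that each item supplies one surprisal value per (condition, region) pair and that accuracy is the fraction of items for which the inequality holds. The numeric prefixes in the formula (605 to 608, 6, 7) are left out, on the assumption that they are internal identifiers. The function names and sample values are hypothetical.

```python
def prediction_holds(s):
    """Check the suite's prediction for one item.

    s maps (condition, region) -> region-level surprisal.
    """
    # NP part: matrix_v should be more surprising after "did" than after "ate".
    np_diff = s[("np_mismatch", "matrix_v")] - s[("np_match", "matrix_v")]
    # VP part: verb.1 + matrix_v, compared between mismatch and match.
    vp_mismatch = s[("vp_mismatch", "verb.1")] + s[("vp_mismatch", "matrix_v")]
    vp_match = s[("vp_match", "verb.1")] + s[("vp_match", "matrix_v")]
    return np_diff + (vp_mismatch - vp_match) > 0


def accuracy(items):
    """Fraction of items for which the prediction holds (assumed definition)."""
    return sum(prediction_holds(s) for s in items) / len(items)


# Made-up surprisal values for two items, for illustration only.
items = [
    {
        ("np_mismatch", "matrix_v"): 9.0, ("np_match", "matrix_v"): 6.0,
        ("vp_mismatch", "verb.1"): 8.0, ("vp_mismatch", "matrix_v"): 5.0,
        ("vp_match", "verb.1"): 4.0, ("vp_match", "matrix_v"): 3.0,
    },
    {
        ("np_mismatch", "matrix_v"): 5.0, ("np_match", "matrix_v"): 7.0,
        ("vp_mismatch", "verb.1"): 3.0, ("vp_mismatch", "matrix_v"): 2.0,
        ("vp_match", "verb.1"): 4.0, ("vp_match", "matrix_v"): 3.0,
    },
]
print(f"accuracy = {accuracy(items):.2%}")
```

The first item satisfies the inequality (3 + 6 > 0) and the second does not (-2 + -2 < 0), so the sketch reports 50% on this toy data.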
