soma-evals

Schema-ablation evals for SOMA — measures how progressively richer LinkML schema context improves LLM-based structured extraction from scientific literature.

Prerequisites

  • uv (Python package manager)
  • just (task runner)
  • Python 3.12+

Setup

git clone https://github.com/EHS-Data-Standards/soma-evals.git
cd soma-evals
just setup

API keys

Set keys via the llm key store or environment variables. Use whichever method you prefer — you only need keys for the providers whose models you plan to run.

Option A — key store (recommended):

uv run llm keys set openai       # paste your OpenAI key
uv run llm keys set anthropic    # paste your Anthropic key
uv run llm keys set gemini       # paste your Gemini key

Option B — .env file:

cp .env.example .env

Then edit .env:

OPENAI_API_KEY=sk-your-key-here
ANTHROPIC_API_KEY=sk-ant-your-key-here
GEMINI_API_KEY=AIyour-key-here

CBORG users (LBNL staff): Models prefixed with cborg/ route through the CBORG proxy and are free for lab staff. Authentication is handled by CBORG — no extra API key is needed beyond your CBORG access.

Running evals

just list-models        # show available models & tiers
just run-all            # run all four ablation levels (standard tier)
just run-baseline       # run a single level

Run a specific tier or override the default paper:

just run-all cheap
EVAL_PDF=my-paper.pdf EVAL_SLUG=my-slug just run-all

Ablation levels

Level          Schema context provided
baseline       None — LLM relies on training knowledge only
class_names    Class names, descriptions, and mappings
full_classes   + slot definitions with ranges & cardinality
with_enums     + enumeration values and ontology meanings

(Each "+" row adds to the context of the level above it.)
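The levels are cumulative, each one injecting everything the previous levels did plus its own addition. A minimal Python sketch of that idea (the level names come from the table; the dict contents and function are illustrative, not this repo's actual code):

```python
# Sketch of cumulative ablation levels. Level names match the table above;
# CONTEXT_PIECES and context_for are illustrative, not soma-evals' real API.
LEVELS = ["baseline", "class_names", "full_classes", "with_enums"]

CONTEXT_PIECES = {
    "baseline": [],                                             # no schema context
    "class_names": ["class names, descriptions, mappings"],
    "full_classes": ["slot definitions (ranges, cardinality)"],
    "with_enums": ["enumeration values and ontology meanings"],
}

def context_for(level: str) -> list[str]:
    """All schema context injected at `level`, including earlier levels'."""
    cutoff = LEVELS.index(level) + 1
    return [piece for lvl in LEVELS[:cutoff] for piece in CONTEXT_PIECES[lvl]]
```

So `context_for("baseline")` is empty, while `context_for("with_enums")` carries all three kinds of schema context.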

See the Ablation Levels page for full details on each level, the prompt context it injects, and links to example result YAML.

Results are written to results/<level>/<model>/<paper>.yaml.
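For downstream analysis, that layout maps directly onto pathlib. A small helper sketch (hypothetical convenience functions, not utilities shipped by the repo):

```python
from pathlib import Path

def result_path(level: str, model: str, paper: str) -> Path:
    """Where a run writes its output: results/<level>/<model>/<paper>.yaml."""
    return Path("results") / level / model / f"{paper}.yaml"

def list_results(level: str, root: Path = Path("results")) -> list[Path]:
    """All result files recorded for one ablation level, sorted for stable output."""
    return sorted((root / level).glob("*/*.yaml"))
```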

Tests & QC

just test       # run tests (no API calls)
just coverage   # tests with coverage report
just fix        # auto-fix lint/format (ruff)

License

MIT