About
The SOMA
The SOMA is part of the EHS Data Standards initiative, focused on developing standardized data models for environmental health sciences research.
Project Goals
This project aims to:
- Standardize data representation for exposure-outcome relationships in EHS research
- Enable data interoperability across studies, cohorts, and institutions
- Support mechanistic understanding through integration with Adverse Outcome Pathways (AOPs)
- Bridge epidemiological and toxicological data from human studies and model systems
The Data Model
Design Principles
The SOMA follows these principles:
- Ontology-first - All entities are mapped to established biomedical ontologies
- FAIR-compliant - Supports Findable, Accessible, Interoperable, and Reusable data
- Extensible - New assay types can be added without breaking existing data
- Multi-scale - Captures data from molecular to population levels
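As a rough illustration of the ontology-first principle, the sketch below checks whether an entity identifier is a CURIE whose prefix belongs to a known ontology. The prefix list and the function names are assumptions made for this example; they are not taken from the SOMA schema, which may map to different ontologies.

```python
import re

# Assumed allow-list for illustration only; the real schema may use other ontologies.
KNOWN_PREFIXES = {"GO", "UBERON", "CHEBI", "NCBITaxon"}

# A CURIE has the form PREFIX:LOCAL_ID, for example GO:0003341.
CURIE_PATTERN = re.compile(r"^(?P<prefix>[A-Za-z][\w.-]*):(?P<local>\S+)$")


def ontology_prefix(curie):
    """Return the prefix of a CURIE, or None if the string is not a CURIE."""
    match = CURIE_PATTERN.match(curie)
    return match.group("prefix") if match else None


def is_mapped(curie, known=KNOWN_PREFIXES):
    """True if the identifier is a CURIE with a prefix from the allow-list."""
    return ontology_prefix(curie) in known
```

A free-text label such as "cilia" fails this check, while an identifier with a recognized prefix passes. In practice, this kind of constraint would more likely be expressed in the LinkML schema than in hand-written Python.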
Technology Stack
The model is built using:
- LinkML - Linked Data Modeling Language for schema definition
- MkDocs with Material theme for documentation
- Python for data validation and transformation
Core Domains
| Domain | Description |
|---|---|
| Assays | Domain-specific assay classes with named measurement slots (e.g., CiliaryFunctionAssay, LungFunctionAssay) |
| Study Subjects | Biological systems under study: cell cultures (CellularSystem), human/animal subjects (InVivoSubject), populations (PopulationSubject) |
| Protocols | Typed experimental procedures: ImagingProtocol, MolecularAssayProtocol, StainingProtocol, SpirometryProtocol |
| AOP Framework | Adverse Outcome Pathways: KeyEvent, AdverseOutcomePathway, with assay linkage via informs_on_key_event |
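The relationship between assays and the AOP framework can be sketched with plain Python dataclasses. This is a simplified illustration, not the generated datamodel: the class names and informs_on_key_event come from the table above, while the fields (id, name, key_events, beat_frequency_hz) and the helper assays_for_pathway are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class KeyEvent:
    """A single step in an Adverse Outcome Pathway."""
    id: str
    name: str


@dataclass
class AdverseOutcomePathway:
    """A chain of key events leading to an adverse outcome."""
    id: str
    name: str
    key_events: list[KeyEvent] = field(default_factory=list)  # hypothetical field


@dataclass
class CiliaryFunctionAssay:
    """Domain-specific assay with a named measurement slot."""
    id: str
    beat_frequency_hz: float  # hypothetical measurement slot
    informs_on_key_event: list[KeyEvent] = field(default_factory=list)


def assays_for_pathway(aop, assays):
    """Return the assays that inform on at least one key event of the pathway."""
    event_ids = {ke.id for ke in aop.key_events}
    return [
        assay
        for assay in assays
        if any(ke.id in event_ids for ke in assay.informs_on_key_event)
    ]
```

Linking assays to key events rather than directly to pathways means a single measurement can support several pathways that share a key event.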
Contributing
We welcome contributions from the community. To contribute:
- Visit the GitHub repository
- Review the existing schema in src/soma/schema/
- Open an issue to discuss proposed changes
- Submit a pull request with your contributions
Development
For local development, use uv and just as the canonical entry points.
The repository may contain underlying Python, npm, and LinkML commands, but
contributors should treat the just recipes as the supported interface for
routine setup, testing, and generation tasks.
Prerequisites
- Python 3.10+
- uv for Python environment and dependency management
- just for repository task automation
- node and npm for DataHarmonizer frontend builds
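Before running setup, it can help to confirm that the prerequisites are available. The following is a minimal sketch using only the Python standard library; the function name missing_prerequisites and its parameters are hypothetical and not part of the repository.

```python
import shutil
import sys

REQUIRED_TOOLS = ("uv", "just", "node", "npm")
MIN_PYTHON = (3, 10)


def missing_prerequisites(tools=REQUIRED_TOOLS, min_python=MIN_PYTHON,
                          which=shutil.which, version=None):
    """Return a list of human-readable problems; an empty list means all is well."""
    problems = []
    # Compare only (major, minor) of the running interpreter unless overridden.
    current = tuple(version or sys.version_info[:2])
    if current < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {current[0]}.{current[1]}"
        )
    # shutil.which returns None when an executable is not on PATH.
    for tool in tools:
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

The which and version parameters exist only so the check can be exercised without touching the real environment.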
Setup
Install the Python dependencies managed by the repo:
just install
Common Commands
- Run the full validation workflow:
  just test
- Regenerate project artifacts:
  just gen-project
- Regenerate schema documentation:
  just gen-doc
- Build the DataHarmonizer assets:
  just build-dh
- Run the local documentation server:
  just testdoc
- List all available recipes:
  just --list
If you need to run a Python tool directly, prefer uv run ... so it executes
inside the managed project environment.
Project Structure
soma/
├── src/
│   ├── docs/               # Documentation source files
│   └── soma/
│       ├── schema/         # LinkML schema definition
│       └── datamodel/      # Generated Python models
├── docs/
│   └── elements/           # Generated schema docs
├── project/                # Generated artifacts
├── tests/
│   └── data/               # Test data files
└── examples/               # Usage examples
License
This project is released under the MIT License.
Acknowledgments
This project uses the linkml-project-copier template for project structure and build tooling.
Contact
For questions or feedback, please open an issue on the GitHub repository.