CA Biositing Data Models
This package contains the SQLModel-based database models for the CA Biositing project. It is implemented as a PEP 420 namespace package that can be shared across multiple components of the application (ETL pipelines, API services, etc.).
Overview
The ca_biositing.datamodels package provides:
- Hand-Written SQLModel Classes: Models organized across domain subdirectories, combining SQLAlchemy ORM and Pydantic validation in a single class hierarchy.
- Materialized Views: Analytical views defined as SQLAlchemy Core
select()expressions (plus one SQL-based aggregate view), managed via Alembic migrations. - Database Configuration: SQLModel-based engine and session management with Docker-aware URL adjustment.
- Model Configuration: Shared configuration for model behavior using Pydantic Settings.
- Alembic Migrations: Version-controlled database schema changes.
Schema Management Workflow
All schema changes are managed through Alembic migrations generated from SQLModel class definitions.
Making Schema Changes
- Edit Models: Modify or add SQLModel classes in the appropriate
subdirectory under
ca_biositing/datamodels/models/. - Auto-Generate Migration: Run the autogenerate command to create a new
migration script:
pixi run migrate-autogenerate -m "Description of your changes" - Review Migration: Check the generated script in
alembic/versions/to ensure it accurately reflects your changes. - Apply Migration: Apply the changes to the database:
pixi run migrate
Managing Materialized Views
Materialized views are defined in ca_biositing/datamodels/views.py and
created/modified through manual Alembic migration scripts (not autogenerated).
After loading new data into the base tables, refresh the views:
pixi run refresh-views
To check the status of materialized views:
pixi run schema-analytics-list
Validation with pgschema (Optional)
The pgschema tool can still be used for validation (diffing the live DB
against reference SQL), but it is no longer used for applying schema changes:
# Diff public schema
pixi run schema-plan
# Diff analytics schema (materialized views)
pixi run schema-analytics-plan
Structure
src/ca_biositing/datamodels/
├── ca_biositing/
│ └── datamodels/
│ ├── __init__.py # Package initialization and version
│ ├── config.py # Model configuration (Pydantic Settings)
│ ├── database.py # SQLModel engine and session management
│ ├── views.py # Materialized view definitions
│ ├── models/ # Hand-written SQLModel classes
│ │ ├── __init__.py # Central re-export of models
│ │ ├── base.py # Base classes (BaseEntity, LookupBase, etc.)
│ │ ├── aim1_records/ # Aim 1 analytical records
│ │ ├── aim2_records/ # Aim 2 processing records
│ │ ├── auth/ # API users and authentication models
│ │ ├── core/ # ETL lineage and run tracking
│ │ ├── data_sources_metadata/ # Data source and dataset metadata
│ │ ├── experiment_equipment/ # Experiments and equipment
│ │ ├── external_data/ # LandIQ, USDA, Billion Ton records
│ │ ├── field_sampling/ # Field samples and collection methods
│ │ ├── general_analysis/ # Observations and analysis types
│ │ ├── infrastructure/ # Infrastructure facility records
│ │ ├── methods_parameters_units/ # Methods, parameters, units
│ │ ├── people/ # Contacts and providers
│ │ ├── places/ # Location and address models
│ │ ├── resource_information/ # Resources, availability, strains
│ │ └── sample_preparation/ # Prepared samples and methods
│ └── sql_schemas/ # Reference SQL files (for pgschema validation)
├── tests/
│ ├── __init__.py
│ ├── conftest.py # Pytest fixtures and configuration
│ ├── test_package.py # Tests for package metadata
│ └── README.md # Test documentation
├── LICENSE # BSD License
├── README.md # This file
└── pyproject.toml # Package metadata and dependencies
Installation
This package is part of the CA Biositing namespace package structure and is designed to be shared across multiple components of the project.
As part of the full project
The recommended way to install is using Pixi (which manages all dependencies):
pixi install
Standalone installation (development)
For development of just the datamodels package:
cd src/ca_biositing/datamodels
pip install -e .
Usage
Importing Models
All models are re-exported from the models package for convenient imports:
from ca_biositing.datamodels.models import (
Resource,
ResourceClass,
FieldSample,
Place,
LocationAddress,
)
# Create a model instance
sample = FieldSample(
name="Sample-001",
resource_id=1,
amount_collected=50.5
)
You can also import from specific domain submodules:
from ca_biositing.datamodels.models.resource_information import Resource
from ca_biositing.datamodels.models.field_sampling import FieldSample
from ca_biositing.datamodels.models.places import Place
Database Operations
from sqlmodel import Session, select
from ca_biositing.datamodels.database import get_engine
from ca_biositing.datamodels.models import Resource
engine = get_engine()
with Session(engine) as session:
# Query models
statement = select(Resource)
resources = session.exec(statement).all()
# Add new records
new_resource = Resource(name="Corn Stover")
session.add(new_resource)
session.commit()
Refreshing Materialized Views
from ca_biositing.datamodels.database import get_engine
from ca_biositing.datamodels.views import refresh_all_views
engine = get_engine()
refresh_all_views(engine)
Testing
The package includes a comprehensive test suite covering model instantiation, field validation, and database persistence.
Run all tests
pixi run pytest src/ca_biositing/datamodels -v
Run specific test files
pixi run pytest src/ca_biositing/datamodels/tests/test_package.py -v
Run with coverage
pixi run pytest src/ca_biositing/datamodels --cov=ca_biositing.datamodels --cov-report=html
See tests/README.md for detailed information about the test suite.
Model Categories
The models are organized into domain subdirectories under models/:
Core and Infrastructure
base.py: Base classes shared across all models (BaseEntity,LookupBase,Aim1RecordBase,Aim2RecordBase).auth/: API authentication models (ApiUser).core/: ETL run tracking and lineage (EtlRun,EntityLineage,LineageGroup).infrastructure/: Infrastructure facility records (biodiesel plants, landfills, ethanol biorefineries, etc.).places/: Location and address models (Place,LocationAddress,LocationResolution).people/: Contact and provider information (Contact,Provider).data_sources_metadata/: Data source tracking (DataSource,Dataset,FileObjectMetadata).
Resources and Sampling
resource_information/: Core resource entities (Resource,ResourceClass,ResourceSubclass,ResourceAvailability,Strain).field_sampling/: Field sampling data (FieldSample,HarvestMethod,CollectionMethod,SoilType).sample_preparation/: Sample processing (PreparedSample,PreparationMethod,ProcessingMethod).
Experiments and Analysis
experiment_equipment/: Experimental setup (Experiment,Equipment,ExperimentAnalysis).methods_parameters_units/: Methods, parameters, and units (Method,Parameter,Unit,MethodCategory).general_analysis/: Observations and analysis results (Observation,AnalysisType,PhysicalCharacteristic).aim1_records/: Aim 1 analytical records (proximate, ultimate, compositional, ICP, XRD, XRF, etc.).aim2_records/: Aim 2 processing records (autoclave, fermentation, gasification, pretreatment).
External Data
external_data/: Integration with external datasets (LandIQ, USDA Census, USDA Survey, Billion Ton 2023, USDA Market).
Dependencies
Core dependencies (defined in pyproject.toml):
- SQLModel: Combines SQLAlchemy + Pydantic for typed ORM models
- SQLAlchemy >= 2.0.0: SQL database interaction
- GeoAlchemy2: PostGIS geometry column support
- Alembic >= 1.13.2: Database migration tool
- psycopg2 >= 2.9.9: PostgreSQL adapter for Python
- Pydantic Settings >= 2.0.0: Settings management
Development
Code Quality
Before committing changes, run pre-commit checks:
pixi run pre-commit run --files src/ca_biositing/datamodels/**/*
Adding New Models
- Create Model: Add a new SQLModel class in the appropriate subdirectory
under
models/, or create a new subdirectory if needed. - Re-Export: Add the import to
models/__init__.pyso the model is available fromca_biositing.datamodels.models. - Generate Migration: Run
pixi run migrate-autogenerate -m "Add new model". - Review: Check the generated migration in
alembic/versions/. - Apply: Run
pixi run migrateto update the database.
Modifying Existing Models
- Edit Model: Modify the SQLModel class in its domain subdirectory.
- Generate Migration: Run
pixi run migrate-autogenerate -m "Describe change". - Review: Check the migration script for accuracy.
- Apply: Run
pixi run migrate.
Package Information
- Package Name:
ca-biositing-datamodels - Version: 0.1.0
- Python: >= 3.12
- License: BSD License
- Repository: https://github.com/sustainability-software-lab/ca-biositing
Contributing
See the main project's CONTRIBUTING.md for guidelines on contributing to this
package.