Deterministic Code Generation
Why Determinism Matters
Determinism means that the same inputs always produce byte-identical outputs.
This is critical for:
- Reproducible builds: CI/CD can verify generated code hasn't changed unexpectedly
- Git-friendly: Only meaningful changes appear in diffs, not random ordering
- Cacheable: Build systems can cache outputs based on input hashes
- Trustworthy: Developers can confidently regenerate without fear of breaking changes
- Auditable: Verify that generated code matches declared inputs
Without determinism, every regeneration risks noisy diffs that carry no semantic change:
# Non-deterministic generator
$ codegen --input schema.json
# Output: model.rs (1,234 bytes, fields in random order)
$ codegen --input schema.json
# Output: model.rs (1,234 bytes, DIFFERENT field order)
# Git diff shows 50 lines changed, but semantically identical!
With determinism:
# ggen deterministic generator
$ ggen gen model.tmpl --graph schema.ttl
# Output: model.rs (1,234 bytes, SHA256: abc123...)
$ ggen gen model.tmpl --graph schema.ttl
# Output: model.rs (1,234 bytes, SHA256: abc123...)
# Byte-identical. Git diff shows ZERO changes.
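A common cause of the field-order churn shown above is iterating a hash map while emitting code. The sketch below is not ggen's implementation; `render_struct` is a hypothetical helper that illustrates how a sorted map (`BTreeMap`) yields the same text no matter the order in which fields were inserted.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: renders a Rust struct from field name -> type pairs.
/// A BTreeMap iterates in key order, so the output does not depend on
/// insertion order or on a per-process hash seed (as HashMap iteration can).
fn render_struct(name: &str, fields: &BTreeMap<String, String>) -> String {
    let mut out = format!("pub struct {} {{\n", name);
    for (field, ty) in fields {
        out.push_str(&format!("    pub {}: {},\n", field, ty));
    }
    out.push_str("}\n");
    out
}
```

Inserting `price` before `name`, or `name` before `price`, produces the same rendered struct, which is the property a generator needs for clean diffs.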
The Determinism Guarantee
ggen provides a determinism guarantee that can be verified with cryptographic hashes:
Same RDF graph + Same template + Same variables
⇒ Byte-identical output
⇒ Same SHA-256 hash
This guarantee holds across:
- Machines: Mac, Linux, Windows produce identical output
- Environments: Dev, CI, production generate the same code
- Time: Generate today or next year, result is identical
- Users: Different developers get the same output
How ggen Achieves Determinism
1. Content Hashing
Every input to code generation is hashed using SHA-256:
use sha2::{Digest, Sha256};

// Returns the lowercase hex SHA-256 digest of `content`.
fn hash_content(content: &str) -> String {
    let mut hasher = Sha256::new();
    hasher.update(content.as_bytes());
    format!("{:x}", hasher.finalize())
}
This produces a deterministic fingerprint of inputs.
2. Sorted RDF Graphs
RDF triples are inherently unordered (they're a set, not a list). To hash a graph deterministically, ggen:
- Serializes the graph to N-Quads format (a line-based RDF syntax with one statement per line)
- Sorts triples lexicographically
- Hashes the sorted output
# Input RDF (order may vary)
pc:Product pc:name "Widget" .
pc:Product pc:price 99.99 .
# Sorted N-Quads (deterministic order)
<http://example.org/product_catalog#Product> <http://example.org/product_catalog#name> "Widget" .
<http://example.org/product_catalog#Product> <http://example.org/product_catalog#price> "99.99"^^<http://www.w3.org/2001/XMLSchema#decimal> .
Result: Same RDF graph → Same hash, regardless of input order.
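The sort-then-hash step can be sketched as follows. This is an illustration under assumptions, not ggen's code: `graph_fingerprint` is a hypothetical name, and a 64-bit FNV-1a hash stands in for SHA-256 so the example needs no external crates.

```rust
/// Stand-in 64-bit FNV-1a hash, used only to keep this example free of
/// external crates. The document describes ggen as using SHA-256.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for b in bytes {
        hash ^= *b as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Hypothetical helper: sorts N-Quads lines and hashes the joined result.
/// Blank lines are skipped and duplicates removed, since a graph is a set.
fn graph_fingerprint(nquads: &str) -> u64 {
    let mut lines: Vec<&str> = nquads
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty())
        .collect();
    lines.sort_unstable();
    lines.dedup();
    fnv1a(lines.join("\n").as_bytes())
}
```

Sorting lines is enough only when every term is an IRI or literal. Graphs with blank nodes would also need stable blank-node labels, which this sketch does not attempt.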
3. Ordered SPARQL Results
SPARQL queries must include ORDER BY to guarantee deterministic results:
# ❌ Non-deterministic (unordered)
SELECT ?property ?datatype WHERE {
?property rdfs:domain pc:Product .
?property rdfs:range ?datatype .
}
# ✅ Deterministic (ordered)
SELECT ?property ?datatype WHERE {
?property rdfs:domain pc:Product .
?property rdfs:range ?datatype .
}
ORDER BY ?property
ggen enforces ORDER BY in matrix queries: a template whose matrix query lacks ORDER BY is rejected.
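How such a rejection might be implemented is not specified here. As a rough illustration only, the hypothetical `has_order_by` below scans the query text for the two keywords; a real check would more likely inspect the parsed query.

```rust
/// Hypothetical check: returns true if the query text contains an
/// `ORDER BY` clause. This is a naive, case-insensitive token scan, not a
/// SPARQL parser, so string literals or comments can fool it.
fn has_order_by(query: &str) -> bool {
    let tokens: Vec<String> = query
        .split_whitespace()
        .map(|t| t.to_ascii_uppercase())
        .collect();
    tokens.windows(2).any(|w| w[0] == "ORDER" && w[1] == "BY")
}
```

A generator could call a check like this before executing a matrix query and fail early with a message that names the offending template.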
4. Version-Locked Templates
Marketplace gpacks use semantic versioning and lockfiles:
# ggen.lock
[gpacks]
"io.ggen.rust.models" = "0.2.1"
"io.ggen.typescript.types" = "1.3.0"
[dependencies]
"io.ggen.rust.models" = { version = "0.2.1", source = "registry", checksum = "sha256:abc123..." }
Result: Same gpack version → Same template → Same output.
Manifest Key Calculation
Every generation operation produces a manifest key (SHA-256 hash) that uniquely identifies the inputs.
For Local Templates
K = SHA256(seed || graph_hash || shapes_hash || frontmatter_hash || rows_hash)
Where:
- seed: Random seed for reproducibility (default: fixed value)
- graph_hash: Hash of sorted RDF graph (N-Quads)
- shapes_hash: Hash of SHACL validation shapes (N-Quads)
- frontmatter_hash: Hash of template frontmatter (YAML)
- rows_hash: Hash of SPARQL query results (ordered)
For Marketplace Gpacks
K = SHA256(seed || gpack_version || gpack_deps_hash || graph_hash || shapes_hash || frontmatter_hash || rows_hash)
Additional components:
- gpack_version: Exact version from ggen.toml (e.g., 0.2.1)
- gpack_deps_hash: Hash of all dependency versions
Key insight: Changing any input changes the manifest key, triggering regeneration.
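The local-template formula can be sketched as below. The struct, the function name, and the length-prefix encoding are assumptions made for illustration, and FNV-1a again stands in for SHA-256 so the example needs no external crates.

```rust
/// Stand-in 64-bit FNV-1a hash; the document describes SHA-256 here.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for b in bytes {
        hash ^= *b as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Inputs to the local-template key, named after the formula above.
struct KeyParts<'a> {
    seed: u64,
    graph_hash: &'a str,
    shapes_hash: &'a str,
    frontmatter_hash: &'a str,
    rows_hash: &'a str,
}

/// Hypothetical key derivation: concatenates the components in a fixed
/// order and hashes the result.
fn manifest_key(parts: &KeyParts<'_>) -> u64 {
    let mut buf: Vec<u8> = Vec::new();
    buf.extend_from_slice(&parts.seed.to_be_bytes());
    let fields = [
        parts.graph_hash,
        parts.shapes_hash,
        parts.frontmatter_hash,
        parts.rows_hash,
    ];
    for field in fields {
        // Length prefix (an assumption of this sketch) keeps "ab" + "c"
        // distinct from "a" + "bc" after concatenation.
        buf.extend_from_slice(&(field.len() as u64).to_be_bytes());
        buf.extend_from_slice(field.as_bytes());
    }
    fnv1a(&buf)
}
```

With fixed-length hex digests the length prefix is redundant, but it makes the concatenation unambiguous if a component is ever variable-length, such as a version string.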
Hash Components Explained
Graph Hash
Purpose: Ensure RDF ontology changes are detected.
Algorithm:
- Load RDF graph into Oxigraph
- Export to N-Quads format (a line-based RDF syntax with one statement per line)
- Sort triples lexicographically
- Compute SHA-256 of sorted output
Example:
# Input: product_catalog.ttl
pc:Product a rdfs:Class .
pc:name rdfs:domain pc:Product ; rdfs:range xsd:string .
pc:price rdfs:domain pc:Product ; rdfs:range xsd:decimal .
# Sorted N-Quads
<http://ex.org/product_catalog#Product> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/01/rdf-schema#Class> .
<http://ex.org/product_catalog#name> <http://www.w3.org/2000/01/rdf-schema#domain> <http://ex.org/product_catalog#Product> .
<http://ex.org/product_catalog#name> <http://www.w3.org/2000/01/rdf-schema#range> <http://www.w3.org/2001/XMLSchema#string> .
<http://ex.org/product_catalog#price> <http://www.w3.org/2000/01/rdf-schema#domain> <http://ex.org/product_catalog#Product> .
<http://ex.org/product_catalog#price> <http://www.w3.org/2000/01/rdf-schema#range> <http://www.w3.org/2001/XMLSchema#decimal> .
→ SHA256: a3f2c8b1...
Shapes Hash
Purpose: Detect SHACL validation changes.
Algorithm: Same as graph hash, but for SHACL shapes file.
# shapes.ttl
pc:ProductShape a sh:NodeShape ;
sh:targetClass pc:Product ;
sh:property [
sh:path pc:name ;
sh:datatype xsd:string ;
sh:minCount 1 ;
] .
→ Sorted N-Quads → SHA256
Frontmatter Hash
Purpose: Detect template metadata changes.
Algorithm:
- Extract YAML frontmatter from template
- Canonicalize YAML (sorted keys)
- Render Handlebars expressions in frontmatter
- Compute SHA-256
Example:
---
to: src/models/{{ class_name }}.rs
vars:
class_name: Product
matrix:
query: |
SELECT ?property WHERE { ... }
ORDER BY ?property
---
→ Rendered frontmatter → SHA256
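The "sorted keys" step is what makes the hash independent of how the author ordered the frontmatter. A minimal sketch for flat key/value pairs follows; `canonical_frontmatter` is a hypothetical name, and nested YAML such as the `matrix` block is out of scope here.

```rust
use std::collections::BTreeMap;

/// Hypothetical canonical form for flat frontmatter: one `key: value` line
/// per entry, sorted by key. If a key repeats, the last value wins.
fn canonical_frontmatter(pairs: &[(&str, &str)]) -> String {
    let sorted: BTreeMap<&str, &str> = pairs.iter().copied().collect();
    sorted
        .iter()
        .map(|(k, v)| format!("{}: {}\n", k, v))
        .collect()
}
```

Two templates that declare the same keys in a different order produce the same canonical text, and therefore the same hash.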
Rows Hash
Purpose: Detect SPARQL query result changes.
Algorithm:
- Execute SPARQL query from template
- Serialize results to ordered JSON
- Compute SHA-256
Example:
SELECT ?property ?datatype WHERE {
?property rdfs:domain pc:Product .
?property rdfs:range ?datatype .
}
ORDER BY ?property
{
"results": [
{"property": "pc:name", "datatype": "xsd:string"},
{"property": "pc:price", "datatype": "xsd:decimal"}
]
}
→ SHA256
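For the hash to be stable, the serialization itself must be fixed: same key order, same whitespace, same escaping. The sketch below shows one way to do that by hand; the `Row` type and `rows_to_json` are hypothetical, and the escaping covers only quotes and backslashes.

```rust
/// Hypothetical row type mirroring the two query variables above.
struct Row {
    property: String,
    datatype: String,
}

/// Minimal JSON string escaping for backslashes and quotes only.
fn escape(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

/// Serializes rows with a fixed key order and no optional whitespace, so
/// the same ordered rows always yield the same bytes.
fn rows_to_json(rows: &[Row]) -> String {
    let items: Vec<String> = rows
        .iter()
        .map(|r| {
            format!(
                "{{\"property\":\"{}\",\"datatype\":\"{}\"}}",
                escape(&r.property),
                escape(&r.datatype)
            )
        })
        .collect();
    format!("{{\"results\":[{}]}}", items.join(","))
}
```

The function preserves the row order it is given, so determinism still depends on the query's ORDER BY clause.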
Chicago TDD Validation
ggen's determinism is validated by a comprehensive end-to-end test using Chicago TDD principles.
The 782-Line End-to-End Test
File: tests/chicago_tdd/ontology_driven_e2e.rs
Test name: test_ontology_to_code_generation_workflow
What it tests:
- Create RDF ontology v1 (Product, Category, Supplier)
- Generate Rust code from ontology v1
- Verify generated code contains expected structs and fields
- Modify ontology to v2 (add SKU, rating, inventory properties)
- Regenerate Rust code from ontology v2
- Verify new properties appear in generated code
- Verify code delta matches ontology delta
Test principles:
- Real RDF graphs (no mocks) loaded into Oxigraph
- Real SPARQL queries executed against Oxigraph
- Real file I/O (templates, generated code)
- Real template rendering with Handlebars
- Real code validation (struct definitions, field types)
What the Test Validates
Determinism aspects:
- Reproducibility: Running generation twice produces identical code
- Graph ordering: RDF graph is processed in deterministic order
- Query ordering: SPARQL results are consistently ordered
- Type mapping: xsd:string → String, xsd:decimal → f64
- Evolution: Ontology changes propagate correctly to code
Example assertions:
// V1 ontology generates V1 code
assert_code_contains(&code_v1, "struct Product", "v1 should have Product struct");
assert_code_contains(&code_v1, "name: String", "v1 Product should have name field");
assert_code_contains(&code_v1, "price: f64", "v1 Product should have price field");
assert_code_not_contains(&code_v1, "sku", "v1 should NOT have SKU field yet");

// V2 ontology generates V2 code with NEW fields
assert_code_contains(&code_v2, "struct Product", "v2 should still have Product struct");
assert_code_contains(&code_v2, "sku: String", "v2 should have NEW SKU field from ontology");
assert_code_contains(&code_v2, "rating: f64", "v2 should have NEW rating field from ontology");
assert_code_contains(&code_v2, "inventory_count: i32", "v2 should have NEW inventory field from ontology");

// Verify code delta matches ontology delta
assert_eq!(code_diff.new_fields, 3, "Should have 3 new fields");
assert_eq!(code_diff.new_methods, 1, "Should have 1 new method");
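The type mapping that these assertions rely on can be written as a pure function, which keeps it deterministic by construction. The sketch below covers only the two mappings listed in this section; the function names are hypothetical.

```rust
/// Maps an XSD datatype to a Rust type name. Only the two mappings stated
/// in this section are included; anything else returns None.
fn rust_type_for(xsd: &str) -> Option<&'static str> {
    match xsd {
        "xsd:string" => Some("String"),
        "xsd:decimal" => Some("f64"),
        _ => None,
    }
}

/// Hypothetical helper: renders a `name: Type` field declaration.
fn render_field(name: &str, xsd: &str) -> Option<String> {
    rust_type_for(xsd).map(|ty| format!("{}: {}", name, ty))
}
```

Returning `None` for an unmapped datatype forces the caller to decide what to do, rather than silently falling back to a default type.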
Test Execution
# Run the Chicago TDD end-to-end test
cargo make test --test chicago_tdd ontology_driven_e2e -- --nocapture
# Output shows:
# [1/6] Parsing RDF...
# [2/6] Extracting project structure...
# [3/6] Validating project...
# [4/6] Live Preview...
# [5/6] Generating workspace structure...
# [6/6] Running post-generation hooks...
# ✅ Generation Complete!
Passing this test demonstrates:
- Deterministic RDF loading (Oxigraph)
- Deterministic SPARQL execution (ordered results)
- Deterministic code generation (same inputs → same outputs)
- Deterministic evolution (ontology changes → code changes)
Determinism in Practice
Example 1: Same Inputs → Identical Outputs
# First generation
$ ggen gen rust/models.tmpl --graph product_catalog.ttl
Generated: src/models/product.rs (1,234 bytes)
Manifest key: sha256:a3f2c8b1e4d5f6a7...
# Second generation (identical inputs)
$ ggen gen rust/models.tmpl --graph product_catalog.ttl
Generated: src/models/product.rs (1,234 bytes)
Manifest key: sha256:a3f2c8b1e4d5f6a7...
# Verify byte-identical
$ sha256sum src/models/product.rs
a3f2c8b1e4d5f6a7... src/models/product.rs
Git diff shows ZERO changes:
$ git diff
# No output - files are identical
Example 2: Cross-Environment Consistency
Developer 1 (Mac):
$ ggen gen rust/models.tmpl --graph product_catalog.ttl
Manifest key: sha256:a3f2c8b1...
Developer 2 (Linux):
$ ggen gen rust/models.tmpl --graph product_catalog.ttl
Manifest key: sha256:a3f2c8b1...
CI Pipeline (Ubuntu):
$ ggen gen rust/models.tmpl --graph product_catalog.ttl
Manifest key: sha256:a3f2c8b1...
All three environments produce byte-identical outputs.
Example 3: Git-Friendly Diffs
Scenario: Add rating field to Product ontology.
# product_catalog.ttl
pc:Product a rdfs:Class .
pc:name rdfs:domain pc:Product ; rdfs:range xsd:string .
pc:price rdfs:domain pc:Product ; rdfs:range xsd:decimal .
+pc:rating rdfs:domain pc:Product ; rdfs:range xsd:decimal .
$ ggen gen rust/models.tmpl --graph product_catalog.ttl
Git diff shows ONLY the new field:
# src/models/product.rs
pub struct Product {
pub name: String,
pub price: f64,
+ pub rating: f64,
}
No random reordering. No unrelated changes. Just the semantic diff.
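The "only the new field" property can be checked mechanically by comparing the field sets of two generated versions. The helper below is a hypothetical sketch of such a check, not part of ggen.

```rust
use std::collections::BTreeSet;

/// Returns the field names present in `new` but not in `old`, in sorted
/// order. The order of the input slices does not affect the result.
fn added_fields<'a>(old: &[&'a str], new: &[&'a str]) -> Vec<&'a str> {
    let old_set: BTreeSet<&str> = old.iter().copied().collect();
    let new_set: BTreeSet<&str> = new.iter().copied().collect();
    new_set.difference(&old_set).copied().collect()
}
```

For the scenario above, comparing `["name", "price"]` with `["name", "price", "rating"]` yields just `rating`, while a pure reordering yields an empty list.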
Version Locking with Gpacks
Marketplace gpacks use lockfiles to ensure version determinism.
Lockfile Structure
# ggen.lock
[lockfile]
version = "1.0"
[gpacks]
"io.ggen.rust.models" = "0.2.1"
"io.ggen.typescript.types" = "1.3.0"
[dependencies]
"io.ggen.rust.models" = { version = "0.2.1", source = "registry", checksum = "sha256:abc123def456..." }
"io.ggen.macros.std" = { version = "0.2.0", source = "registry", checksum = "sha256:789ghi012jkl..." }
Installing Specific Versions
# Install exact version
$ ggen add io.ggen.rust.models@0.2.1
# Lockfile records version
$ cat ggen.lock
[gpacks]
"io.ggen.rust.models" = "0.2.1"
# All future generations use locked version
$ ggen gen io.ggen.rust.models:models.tmpl --graph product_catalog.ttl
# Uses version 0.2.1 (locked)
Commit the lockfile:
$ git add ggen.lock
$ git commit -m "Lock gpack versions for deterministic builds"
Now CI and other developers use the EXACT same template versions.
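What "use the exact same version" means in code can be sketched as a lookup that refuses to proceed on a mismatch. The function below is hypothetical and assumes the `[gpacks]` table has already been loaded into a map.

```rust
use std::collections::BTreeMap;

/// Hypothetical check: the version being resolved must equal the version
/// recorded in the lockfile's [gpacks] table.
fn check_locked(
    locked: &BTreeMap<String, String>,
    gpack: &str,
    resolved: &str,
) -> Result<(), String> {
    match locked.get(gpack) {
        Some(v) if v.as_str() == resolved => Ok(()),
        Some(v) => Err(format!(
            "{} is locked to {} but resolved to {}",
            gpack, v, resolved
        )),
        None => Err(format!("{} is not in the lockfile", gpack)),
    }
}
```

Failing on a missing entry, rather than resolving the latest version, is the design choice that keeps CI from drifting away from what developers generated locally.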
Debugging Determinism Issues
Enable Trace Logging
# Show hash components during generation
$ GGEN_TRACE=1 ggen gen rust/models.tmpl --graph product_catalog.ttl
# Output:
# Manifest key calculation:
# seed: 0x00000000
# graph_hash: sha256:a3f2c8b1...
# shapes_hash: sha256:e4d5f6a7...
# frontmatter_hash: sha256:b8c9d0e1...
# rows_hash: sha256:f2a3b4c5...
# → manifest_key: sha256:1234abcd...
Compare Manifest Keys
# Generate on machine A
$ ggen gen rust/models.tmpl --graph product_catalog.ttl
Manifest key: sha256:1234abcd...
# Generate on machine B
$ ggen gen rust/models.tmpl --graph product_catalog.ttl
Manifest key: sha256:5678efgh... # ❌ Different!
# Enable tracing to find the difference
$ GGEN_TRACE=1 ggen gen rust/models.tmpl --graph product_catalog.ttl
# Check which hash component differs
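Comparing two traces by eye is error-prone with long digests. The sketch below assumes the trace consists of `name: value` lines like those shown earlier and reports which components differ; both function names are hypothetical.

```rust
use std::collections::BTreeMap;

/// Parses `name: value` lines into a map, ignoring a leading `#` and any
/// line without a ": " separator (such as a heading line).
fn parse_trace(trace: &str) -> BTreeMap<String, String> {
    trace
        .lines()
        .filter_map(|line| {
            let line = line.trim().trim_start_matches('#').trim();
            let (name, value) = line.split_once(": ")?;
            Some((name.to_string(), value.trim().to_string()))
        })
        .collect()
}

/// Returns the names of components in trace `a` whose value is different
/// or missing in trace `b`. Components present only in `b` are not listed.
fn differing_components(a: &str, b: &str) -> Vec<String> {
    let (a, b) = (parse_trace(a), parse_trace(b));
    a.iter()
        .filter(|(name, value)| b.get(*name) != Some(*value))
        .map(|(name, _)| name.clone())
        .collect()
}
```

If only `graph_hash` differs, the RDF input is the place to look; if only `rows_hash` differs, a missing or incomplete ORDER BY is the likelier cause.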
Check SPARQL Ordering
Problem: Query results in different order.
Solution: Add ORDER BY to SPARQL query.
# Before (non-deterministic)
SELECT ?property ?datatype WHERE {
?property rdfs:domain ?class .
?property rdfs:range ?datatype .
}
# After (deterministic)
SELECT ?property ?datatype WHERE {
?property rdfs:domain ?class .
?property rdfs:range ?datatype .
}
ORDER BY ?property ?datatype
Best Practices for Deterministic Generation
- Always use ORDER BY in SPARQL queries
  SELECT ?x ?y WHERE { ... } ORDER BY ?x ?y
- Pin gpack versions in production
  ggen add io.ggen.rust.models@0.2.1 # Not @latest
- Commit lockfiles to version control
  git add ggen.lock
  git commit -m "Lock template versions"
- Validate in CI
  # .github/workflows/codegen.yml
  - name: Verify determinism
    run: |
      ggen gen rust/models.tmpl --graph product_catalog.ttl
      git diff --exit-code src/models/product.rs
- Use canonical RDF formats
  - Prefer Turtle (.ttl) for readability
  - ggen canonicalizes to N-Quads internally
- Avoid timestamps in templates
  // ❌ Non-deterministic
  // Generated at: {{ current_timestamp }}
  // ✅ Deterministic
  // Generated from: product_catalog.ttl
- Test with Chicago TDD principles
  - Use real RDF graphs (no mocks)
  - Verify byte-identical regeneration
  - Test ontology evolution scenarios
The Bottom Line
ggen's determinism guarantee:
Same RDF graph + Same template + Same variables = Byte-identical outputs
This is not just a goal. It's a tested, validated, cryptographically verifiable property of the system.
The 782-line Chicago TDD test proves it. The SHA-256 manifest keys enforce it. The lockfiles preserve it.
You can trust ggen to generate the exact same code, every single time.