Deterministic Code Generation

Why Determinism Matters

Determinism means that the same inputs always produce byte-identical outputs.

This is critical for:

Reproducible builds: CI/CD can verify generated code hasn't changed unexpectedly
Git-friendly: Only meaningful changes appear in diffs, not random ordering
Cacheable: Build systems can cache outputs based on input hashes
Trustworthy: Developers can confidently regenerate without fear of breaking changes
Auditable: Verify that generated code matches declared inputs

Without determinism, code generation is unpredictable chaos:

# Non-deterministic generator
$ codegen --input schema.json
# Output: model.rs (1,234 bytes, fields in random order)

$ codegen --input schema.json
# Output: model.rs (1,234 bytes, DIFFERENT field order)
# Git diff shows 50 lines changed, but semantically identical!

With determinism:

# ggen deterministic generator
$ ggen gen model.tmpl --graph schema.ttl
# Output: model.rs (1,234 bytes, SHA256: abc123...)

$ ggen gen model.tmpl --graph schema.ttl
# Output: model.rs (1,234 bytes, SHA256: abc123...)
# Byte-identical. Git diff shows ZERO changes.

The Determinism Guarantee

ggen provides a cryptographic determinism guarantee:

Same RDF graph + Same template + Same variables
    ⇒ Byte-identical output
    ⇒ Same SHA-256 hash

This guarantee holds across:

Machines: Mac, Linux, Windows produce identical output
Environments: Dev, CI, production generate the same code
Time: Generate today or next year, result is identical
Users: Different developers get the same output

How ggen Achieves Determinism

1. Content Hashing

Every input to code generation is hashed using SHA-256:

use sha2::{Digest, Sha256};

/// Returns the lowercase hexadecimal SHA-256 digest of `content`.
fn hash_content(content: &str) -> String {
    let mut hasher = Sha256::new();
    hasher.update(content.as_bytes());
    format!("{:x}", hasher.finalize())
}

This produces a deterministic fingerprint of inputs.

2. Sorted RDF Graphs

RDF triples are inherently unordered (a graph is a set of triples, not a list). To process them in a deterministic order, ggen:

Serializes the graph to N-Quads format (canonical RDF syntax)
Sorts triples lexicographically
Hashes the sorted output

# Input RDF (order may vary)
pc:Product pc:name "Widget" .
pc:Product pc:price 99.99 .

# Sorted N-Quads (deterministic order)
<http://example.org/product_catalog#Product> <http://example.org/product_catalog#name> "Widget" .
<http://example.org/product_catalog#Product> <http://example.org/product_catalog#price> "99.99"^^<http://www.w3.org/2001/XMLSchema#decimal> .

Result: Same RDF graph → Same hash, regardless of input order.
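The sort step can be sketched in a few lines of Rust. This is a minimal illustration, not ggen's code. It assumes the graph has already been exported as N-Quads, with one statement per line and no blank nodes. Blank-node labels are not stable across exports, so a graph that contains them would need full RDF canonicalization instead. `canonical_nquads` is a hypothetical helper name.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: turn an N-Quads export into a canonical string.
/// Assumes one statement per line and no blank nodes.
fn canonical_nquads(nquads: &str) -> String {
    // Trim each line, drop empty lines, then sort lexicographically.
    // A BTreeSet both sorts and removes duplicates, matching RDF's set semantics.
    let statements: BTreeSet<&str> = nquads
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .collect();

    let mut out = String::new();
    for statement in statements {
        out.push_str(statement);
        out.push('\n');
    }
    out
}
```

Feeding the same two statements in either order yields the same string, and that string is what gets hashed.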

3. Ordered SPARQL Results

SPARQL queries must include ORDER BY to guarantee deterministic results:

# ❌ Non-deterministic (unordered)
SELECT ?property ?datatype WHERE {
    ?property rdfs:domain pc:Product .
    ?property rdfs:range ?datatype .
}

# ✅ Deterministic (ordered)
SELECT ?property ?datatype WHERE {
    ?property rdfs:domain pc:Product .
    ?property rdfs:range ?datatype .
}
ORDER BY ?property

ggen enforces ORDER BY in matrix queries. Templates without ORDER BY are rejected.
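The enforcement rule can be pictured as a pre-flight check on the matrix query. The sketch below is hypothetical (`require_order_by` is not a ggen API) and deliberately naive. It scans whitespace-separated tokens for `ORDER` followed by `BY`, ignoring case. It does not understand comments, string literals, or subqueries, so a real check would use a SPARQL parser.

```rust
/// Hypothetical check: reject a SPARQL query that has no ORDER BY clause.
/// Naive token scan; a real implementation would parse the query.
fn require_order_by(query: &str) -> Result<(), String> {
    let tokens: Vec<String> = query
        .split_whitespace()
        .map(|t| t.to_ascii_uppercase())
        .collect();
    // Look for the adjacent token pair "ORDER" "BY", in any case and with any spacing.
    let found = tokens.windows(2).any(|pair| pair[0] == "ORDER" && pair[1] == "BY");
    if found {
        Ok(())
    } else {
        Err("matrix query must include ORDER BY for deterministic results".to_string())
    }
}
```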

4. Version-Locked Templates

Marketplace gpacks use semantic versioning and lockfiles:

# ggen.lock
[gpacks]
"io.ggen.rust.models" = "0.2.1"
"io.ggen.typescript.types" = "1.3.0"

[dependencies."io.ggen.rust.models"]
version = "0.2.1"
source = "registry"
checksum = "sha256:abc123..."

Result: Same gpack version → Same template → Same output.
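One way to picture how a lockfile pins templates: generation resolves each gpack through the lock rather than taking the registry's newest version. The sketch is hypothetical. The `Lock` type and `resolve` method are illustrative names, not ggen's API, and the sketch models only the `[gpacks]` table as a sorted map from gpack id to exact version.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory view of the [gpacks] table in ggen.lock.
struct Lock {
    gpacks: BTreeMap<String, String>, // gpack id -> exact locked version
}

impl Lock {
    /// Resolve a gpack id to its locked version. Fails if the id is not
    /// locked, or if a caller asks for a version other than the locked one.
    fn resolve(&self, id: &str, requested: Option<&str>) -> Result<&str, String> {
        let locked = self
            .gpacks
            .get(id)
            .ok_or_else(|| format!("{id} is not recorded in ggen.lock"))?;
        match requested {
            Some(v) if v != locked.as_str() => {
                Err(format!("{id}: requested {v}, but ggen.lock pins {locked}"))
            }
            _ => Ok(locked.as_str()),
        }
    }
}
```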

Manifest Key Calculation

Every generation operation produces a manifest key (SHA-256 hash) that uniquely identifies the inputs.

For Local Templates

K = SHA256(seed || graph_hash || shapes_hash || frontmatter_hash || rows_hash)

Where || denotes concatenation, and the components are:

seed: Random seed for reproducibility (default: fixed value)
graph_hash: Hash of sorted RDF graph (N-Quads)
shapes_hash: Hash of SHACL validation shapes (N-Quads)
frontmatter_hash: Hash of template frontmatter (YAML)
rows_hash: Hash of SPARQL query results (ordered)
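The formula can be made concrete. In the sketch below, `||` is byte concatenation, and the components are fed in the fixed order the formula gives. Three assumptions should be flagged:

- The seed is encoded as 8 big-endian bytes.
- The component hashes are equal-length hex strings.
- FNV-1a (64-bit) stands in for SHA-256. It is used only so the example needs nothing beyond the Rust standard library. It is not collision-resistant and is not what ggen uses.

`manifest_key` and `fnv1a64` are hypothetical names.

```rust
/// FNV-1a, 64-bit. A non-cryptographic stand-in for SHA-256 in this sketch only.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Hypothetical local-template key: K = H(seed || graph || shapes || frontmatter || rows).
/// The seed has a fixed width and the component hashes are equal-length digests,
/// so plain concatenation cannot make two different inputs look the same.
fn manifest_key(seed: u64, graph: &str, shapes: &str, frontmatter: &str, rows: &str) -> String {
    let mut input = Vec::new();
    input.extend_from_slice(&seed.to_be_bytes());
    for component in [graph, shapes, frontmatter, rows] {
        input.extend_from_slice(component.as_bytes());
    }
    format!("{:016x}", fnv1a64(&input))
}
```

Calling it twice with the same arguments returns the same key. Changing any single character of any component, or the seed, returns a different key.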

For Marketplace Gpacks

K = SHA256(seed || gpack_version || gpack_deps_hash || graph_hash || shapes_hash || frontmatter_hash || rows_hash)

Additional components:

gpack_version: Exact version from ggen.toml (e.g., 0.2.1)
gpack_deps_hash: Hash of all dependency versions

Key insight: Changing any input changes the manifest key, triggering regeneration.
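For `gpack_deps_hash`, the dependency set needs a fixed serialization before hashing. Otherwise iteration order could leak into the key: Rust's `HashMap`, for instance, can iterate in a different order on each run. The sketch below works under stated assumptions:

- Each dependency is serialized as a `name=version` line, sorted by name through a `BTreeMap`.
- The line format and the `deps_hash_input` name are hypothetical.
- The hashing step itself is left out; it would be SHA-256 in practice.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical: build the byte string that gpack_deps_hash would hash.
/// Copying into a BTreeMap sorts by gpack id, so the input map's order is irrelevant.
fn deps_hash_input(deps: &HashMap<String, String>) -> String {
    let sorted: BTreeMap<&String, &String> = deps.iter().collect();
    let mut out = String::new();
    for (name, version) in sorted {
        out.push_str(name);
        out.push('=');
        out.push_str(version);
        out.push('\n');
    }
    out
}
```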

Hash Components Explained

Graph Hash

Purpose: Ensure RDF ontology changes are detected.

Algorithm:

Load RDF graph into Oxigraph
Export to N-Quads format (canonical RDF syntax)
Sort triples lexicographically
Compute SHA-256 of sorted output

Example:

# Input: product_catalog.ttl
pc:Product a rdfs:Class .
pc:name rdfs:domain pc:Product ; rdfs:range xsd:string .
pc:price rdfs:domain pc:Product ; rdfs:range xsd:decimal .

# Sorted N-Quads
<http://ex.org/product_catalog#Product> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/01/rdf-schema#Class> .
<http://ex.org/product_catalog#name> <http://www.w3.org/2000/01/rdf-schema#domain> <http://ex.org/product_catalog#Product> .
<http://ex.org/product_catalog#name> <http://www.w3.org/2000/01/rdf-schema#range> <http://www.w3.org/2001/XMLSchema#string> .
<http://ex.org/product_catalog#price> <http://www.w3.org/2000/01/rdf-schema#domain> <http://ex.org/product_catalog#Product> .
<http://ex.org/product_catalog#price> <http://www.w3.org/2000/01/rdf-schema#range> <http://www.w3.org/2001/XMLSchema#decimal> .

→ SHA256: a3f2c8b1...

Shapes Hash

Purpose: Detect SHACL validation changes.

Algorithm: Same as graph hash, but for SHACL shapes file.

# shapes.ttl
pc:ProductShape a sh:NodeShape ;
    sh:targetClass pc:Product ;
    sh:property [
        sh:path pc:name ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
    ] .

→ Sorted N-Quads → SHA256

Frontmatter Hash

Purpose: Detect template metadata changes.

Algorithm:

Extract YAML frontmatter from template
Canonicalize YAML (sorted keys)
Render Handlebars expressions in frontmatter
Compute SHA-256

Example:

---
to: src/models/{{ class_name }}.rs
vars:
  class_name: Product
matrix:
  query: |
    SELECT ?property WHERE { ... }
    ORDER BY ?property
---

→ Rendered frontmatter → SHA256
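The "sorted keys" canonicalization step can be sketched with a simplified value type. This works under stated assumptions:

- Parsing the YAML and rendering Handlebars are out of scope.
- Values are only strings or nested mappings; lists and numbers are not modelled.
- The output is a compact JSON-like string meant only as hash input, not YAML.

`Value`, `quote` and `canonicalize` are hypothetical names.

```rust
/// Hypothetical, simplified frontmatter value: string scalars and mappings only.
enum Value {
    Scalar(String),
    Map(Vec<(String, Value)>), // entries in the order they appear in the file
}

/// Quote a string, escaping backslashes and double quotes.
fn quote(s: &str) -> String {
    format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\""))
}

/// Canonical text for hashing: mapping keys sorted at every nesting level,
/// so frontmatters that differ only in key order produce the same string.
fn canonicalize(value: &Value) -> String {
    match value {
        Value::Scalar(s) => quote(s),
        Value::Map(entries) => {
            let mut sorted: Vec<&(String, Value)> = entries.iter().collect();
            sorted.sort_by(|a, b| a.0.cmp(&b.0));
            let parts: Vec<String> = sorted
                .iter()
                .map(|(key, val)| format!("{}:{}", quote(key), canonicalize(val)))
                .collect();
            format!("{{{}}}", parts.join(","))
        }
    }
}
```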

Rows Hash

Purpose: Detect SPARQL query result changes.

Algorithm:

Execute SPARQL query from template
Serialize results to ordered JSON
Compute SHA-256

Example:

SELECT ?property ?datatype WHERE {
    ?property rdfs:domain pc:Product .
    ?property rdfs:range ?datatype .
}
ORDER BY ?property

{
  "results": [
    {"property": "pc:name", "datatype": "xsd:string"},
    {"property": "pc:price", "datatype": "xsd:decimal"}
  ]
}

→ SHA256
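Serializing the rows in a fixed shape is what keeps the rows hash stable. A minimal sketch, assuming:

- The rows arrive already ordered by the query's ORDER BY.
- Within each row, variables are written in alphabetical order.
- Escaping covers only quotes, backslashes and newlines.
- The final SHA-256 step is omitted.

`rows_to_json` and `json_str` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Minimal JSON string escaping (backslashes, quotes, newlines only).
fn json_str(s: &str) -> String {
    let escaped = s.replace('\\', "\\\\").replace('"', "\\\"").replace('\n', "\\n");
    format!("\"{escaped}\"")
}

/// Hypothetical: serialize ordered SPARQL rows to compact JSON.
/// Row order is kept as given (it comes from ORDER BY);
/// the BTreeMap fixes the key order inside each row.
fn rows_to_json(rows: &[BTreeMap<String, String>]) -> String {
    let objects: Vec<String> = rows
        .iter()
        .map(|row| {
            let fields: Vec<String> = row
                .iter()
                .map(|(k, v)| format!("{}:{}", json_str(k), json_str(v)))
                .collect();
            format!("{{{}}}", fields.join(","))
        })
        .collect();
    format!("{{\"results\":[{}]}}", objects.join(","))
}
```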

Chicago TDD Validation

ggen's determinism is validated by a comprehensive end-to-end test using Chicago TDD principles.

The 782-Line End-to-End Test

File: tests/chicago_tdd/ontology_driven_e2e.rs

Test name: test_ontology_to_code_generation_workflow

What it tests:

Create RDF ontology v1 (Product, Category, Supplier)
Generate Rust code from ontology v1
Verify generated code contains expected structs and fields
Modify ontology to v2 (add SKU, rating, inventory properties)
Regenerate Rust code from ontology v2
Verify new properties appear in generated code
Verify code delta matches ontology delta

Test principles:

Real RDF graphs (no mocks) loaded into Oxigraph
Real SPARQL queries executed against Oxigraph
Real file I/O (templates, generated code)
Real template rendering with Handlebars
Real code validation (struct definitions, field types)

What the Test Validates

Determinism aspects:

Reproducibility: Running generation twice produces identical code
Graph ordering: RDF graph is processed in deterministic order
Query ordering: SPARQL results are consistently ordered
Type mapping: xsd:string → String, xsd:decimal → f64
Evolution: Ontology changes propagate correctly to code
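The type-mapping aspect amounts to a fixed lookup table, which is deterministic because it involves no iteration order at all. In the hypothetical sketch below, only the `xsd:string` and `xsd:decimal` rows come from this page. The `xsd:integer` and `xsd:boolean` rows are assumptions added for illustration, and `rust_type_for` is not a ggen function.

```rust
/// Hypothetical XSD-to-Rust type table.
fn rust_type_for(xsd: &str) -> Option<&'static str> {
    match xsd {
        "xsd:string" => Some("String"),
        "xsd:decimal" => Some("f64"),
        "xsd:integer" => Some("i32"),  // assumption, not stated on this page
        "xsd:boolean" => Some("bool"), // assumption, not stated on this page
        _ => None, // unknown datatypes are reported rather than guessed
    }
}
```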

Example assertions:

// V1 ontology generates V1 code
assert_code_contains(&code_v1, "struct Product", "v1 should have Product struct");
assert_code_contains(&code_v1, "name: String", "v1 Product should have name field");
assert_code_contains(&code_v1, "price: f64", "v1 Product should have price field");
assert_code_not_contains(&code_v1, "sku", "v1 should NOT have SKU field yet");

// V2 ontology generates V2 code with NEW fields
assert_code_contains(&code_v2, "struct Product", "v2 should still have Product struct");
assert_code_contains(&code_v2, "sku: String", "v2 should have NEW SKU field from ontology");
assert_code_contains(&code_v2, "rating: f64", "v2 should have NEW rating field from ontology");
assert_code_contains(&code_v2, "inventory_count: i32", "v2 should have NEW inventory field from ontology");

// Verify code delta matches ontology delta
assert_eq!(code_diff.new_fields, 3, "Should have 3 new fields");
assert_eq!(code_diff.new_methods, 1, "Should have 1 new method");

Test Execution

# Run the Chicago TDD end-to-end test
cargo make test --test chicago_tdd ontology_driven_e2e -- --nocapture

# Output shows:
# [1/6] Parsing RDF...
# [2/6] Extracting project structure...
# [3/6] Validating project...
# [4/6] Live Preview...
# [5/6] Generating workspace structure...
# [6/6] Running post-generation hooks...
# ✅ Generation Complete!

Passing this test proves:

Deterministic RDF loading (Oxigraph)
Deterministic SPARQL execution (ordered results)
Deterministic code generation (same inputs → same outputs)
Deterministic evolution (ontology changes → code changes)

Determinism in Practice

Example 1: Same Inputs → Identical Outputs

# First generation
$ ggen gen rust/models.tmpl --graph product_catalog.ttl
Generated: src/models/product.rs (1,234 bytes)
Manifest key: sha256:a3f2c8b1e4d5f6a7...

# Second generation (identical inputs)
$ ggen gen rust/models.tmpl --graph product_catalog.ttl
Generated: src/models/product.rs (1,234 bytes)
Manifest key: sha256:a3f2c8b1e4d5f6a7...

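# Verify byte-identical output: hash the generated file after each run and compare
# (this hashes the output file, so it differs from the manifest key, which hashes the inputs)
$ sha256sum src/models/product.rs
7e2b9f0c...  src/models/product.rs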

Git diff shows ZERO changes:

$ git diff
# No output - files are identical

Example 2: Cross-Environment Consistency

Developer 1 (Mac):

$ ggen gen rust/models.tmpl --graph product_catalog.ttl
Manifest key: sha256:a3f2c8b1...

Developer 2 (Linux):

$ ggen gen rust/models.tmpl --graph product_catalog.ttl
Manifest key: sha256:a3f2c8b1...

CI Pipeline (Ubuntu):

$ ggen gen rust/models.tmpl --graph product_catalog.ttl
Manifest key: sha256:a3f2c8b1...

All three environments produce byte-identical outputs.

Example 3: Git-Friendly Diffs

Scenario: Add rating field to Product ontology.

# product_catalog.ttl
pc:Product a rdfs:Class .
pc:name rdfs:domain pc:Product ; rdfs:range xsd:string .
pc:price rdfs:domain pc:Product ; rdfs:range xsd:decimal .
+pc:rating rdfs:domain pc:Product ; rdfs:range xsd:decimal .

$ ggen gen rust/models.tmpl --graph product_catalog.ttl

Git diff shows ONLY the new field:

# src/models/product.rs
pub struct Product {
    pub name: String,
    pub price: f64,
+   pub rating: f64,
}

No random reordering. No unrelated changes. Just the semantic diff.
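Why only one line changes: when fields are emitted in sorted order, adding a property inserts exactly one line and leaves every other line where it was. The sketch below uses two hypothetical helpers, neither of which is ggen's implementation:

- `render_struct` is a generator that sorts fields by name, mirroring `ORDER BY ?property`.
- `added_lines` is a naive line comparison standing in for `git diff`.

```rust
use std::collections::HashSet;

/// Hypothetical generator: render a Rust struct with its fields sorted by name.
fn render_struct(name: &str, fields: &[(&str, &str)]) -> String {
    let mut sorted = fields.to_vec();
    sorted.sort();
    let mut out = format!("pub struct {name} {{\n");
    for (field, ty) in sorted {
        out.push_str(&format!("    pub {field}: {ty},\n"));
    }
    out.push_str("}\n");
    out
}

/// Lines present in `new` but absent from `old` (a naive stand-in for `git diff`).
fn added_lines<'a>(old: &str, new: &'a str) -> Vec<&'a str> {
    let before: HashSet<&str> = old.lines().collect();
    new.lines().filter(|line| !before.contains(*line)).collect()
}
```

Rendering the v1 fields in any input order gives the same text. Comparing v1 with v2 reports only the `rating` line.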

Version Locking with Gpacks

Marketplace gpacks use lockfiles to ensure version determinism.

Lockfile Structure

# ggen.lock
[lockfile]
version = "1.0"

[gpacks]
"io.ggen.rust.models" = "0.2.1"
"io.ggen.typescript.types" = "1.3.0"

[dependencies."io.ggen.rust.models"]
version = "0.2.1"
source = "registry"
checksum = "sha256:abc123def456..."

[dependencies."io.ggen.macros.std"]
version = "0.2.0"
source = "registry"
checksum = "sha256:789abc012def..."

Installing Specific Versions

# Install exact version
$ ggen add io.ggen.rust.models@0.2.1

# Lockfile records version
$ cat ggen.lock
[gpacks]
"io.ggen.rust.models" = "0.2.1"

# All future generations use locked version
$ ggen gen io.ggen.rust.models:models.tmpl --graph product_catalog.ttl
# Uses version 0.2.1 (locked)

Commit the lockfile:

$ git add ggen.lock
$ git commit -m "Lock gpack versions for deterministic builds"

Now CI and other developers use the EXACT same template versions.

Debugging Determinism Issues

Enable Trace Logging

# Show hash components during generation
$ GGEN_TRACE=1 ggen gen rust/models.tmpl --graph product_catalog.ttl

# Output:
# Manifest key calculation:
#   seed: 0x00000000
#   graph_hash: sha256:a3f2c8b1...
#   shapes_hash: sha256:e4d5f6a7...
#   frontmatter_hash: sha256:b8c9d0e1...
#   rows_hash: sha256:f2a3b4c5...
# → manifest_key: sha256:1234abcd...

Compare Manifest Keys

# Generate on machine A
$ ggen gen rust/models.tmpl --graph product_catalog.ttl
Manifest key: sha256:1234abcd...

# Generate on machine B
$ ggen gen rust/models.tmpl --graph product_catalog.ttl
Manifest key: sha256:5678ef90...  # ❌ Different!

# Enable tracing to find the difference
$ GGEN_TRACE=1 ggen gen rust/models.tmpl --graph product_catalog.ttl
# Check which hash component differs
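Once both machines have printed their components, finding the culprit is mechanical. The hypothetical helper below takes the name/value pairs from each trace and returns the names whose values differ. It assumes both traces list the components in the same order, as in the trace output above.

```rust
/// Hypothetical helper: given the (name, value) pairs printed by two trace runs,
/// listed in the same order, return the names of the components that differ.
fn differing_components<'a>(a: &[(&'a str, &'a str)], b: &[(&'a str, &'a str)]) -> Vec<&'a str> {
    assert_eq!(a.len(), b.len(), "both traces should list the same components");
    a.iter()
        .zip(b)
        .filter(|(x, y)| x.1 != y.1)
        .map(|(x, _)| x.0)
        .collect()
}
```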

Check SPARQL Ordering

Problem: Query results in different order.

Solution: Add ORDER BY to SPARQL query.

# Before (non-deterministic)
SELECT ?property ?datatype WHERE {
    ?property rdfs:domain ?class .
    ?property rdfs:range ?datatype .
}

# After (deterministic)
SELECT ?property ?datatype WHERE {
    ?property rdfs:domain ?class .
    ?property rdfs:range ?datatype .
}
ORDER BY ?property ?datatype
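Why sort on both variables: with `ORDER BY ?property` alone, rows that share a property can tie, for example one property declared with two ranges. The query does not fix the relative order of tied rows.

The sketch below mimics this with Rust's stable `sort_by`. Sorting on the property alone preserves whatever order the rows arrived in. Sorting on `(property, datatype)` yields a single order regardless of input. The function names and row data are illustrative.

```rust
type Row = (&'static str, &'static str); // (property, datatype)

/// Like `ORDER BY ?property`: tied rows keep their arrival order (Rust's sort is stable).
fn order_by_property(mut rows: Vec<Row>) -> Vec<Row> {
    rows.sort_by(|a, b| a.0.cmp(b.0));
    rows
}

/// Like `ORDER BY ?property ?datatype`: a total order over the selected values.
fn order_by_property_datatype(mut rows: Vec<Row>) -> Vec<Row> {
    rows.sort_by(|a, b| (a.0, a.1).cmp(&(b.0, b.1)));
    rows
}
```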

Best Practices for Deterministic Generation

Always use ORDER BY in SPARQL queries

SELECT ?x ?y WHERE { ... } ORDER BY ?x ?y

Pin gpack versions in production

ggen add io.ggen.rust.models@0.2.1  # Not @latest

Commit lockfiles to version control

git add ggen.lock
git commit -m "Lock template versions"

Validate in CI

# .github/workflows/codegen.yml
- name: Verify determinism
  run: |
    ggen gen rust/models.tmpl --graph product_catalog.ttl
    git diff --exit-code src/models/product.rs

Use canonical RDF formats
- Prefer Turtle (.ttl) for readability
- ggen canonicalizes to N-Quads internally

Avoid timestamps in templates

// ❌ Non-deterministic
// Generated at: {{ current_timestamp }}

// ✅ Deterministic
// Generated from: product_catalog.ttl

Test with Chicago TDD principles
- Use real RDF graphs (no mocks)
- Verify byte-identical regeneration
- Test ontology evolution scenarios

The Bottom Line

ggen's determinism guarantee:

Same inputs + Same environment = Byte-identical outputs

This is not a goal. It's a tested, validated, cryptographically guaranteed property of the system.

The 782-line Chicago TDD test proves it. The SHA-256 manifest keys enforce it. The lockfiles preserve it.

You can trust ggen to generate the exact same code, every single time.
