Agent Skills: Compositional ACSet Comparison Skill

Compare data structures (DuckDB, LanceDB) via ACSets with persistent homology coverage analysis and geometric morphism translation.

UncategorizedID: plurigrid/asi/compositional-acset-comparison

Install this agent skill to your local

pnpm dlx add-skill https://github.com/plurigrid/asi/tree/HEAD/plugins/asi/skills/compositional-acset-comparison

Skill Files

Browse the full folder contents for compositional-acset-comparison.

Download Skill

Loading file tree…

plugins/asi/skills/compositional-acset-comparison/SKILL.md

Skill Metadata

Name
compositional-acset-comparison
Description
Compositional algorithm and data analysis via algebraic databases

Compositional ACSet Comparison Skill

Trit: 0 (ERGODIC - Coordinator) Color: #26D826 (Green) Domain: Compositional algorithm/data analysis via algebraic databases

Homoiconic Insight

In self-hosted Lisps, the boundary between data structures and algorithms dissolves:

  • Code is data, data is code (homoiconicity)
  • Evaluation time is phase-scoped (RED/BLUE/GREEN gadgets)
  • Entanglement avoided by leaving phases open until explicitly closed
  • Compositional structure preserved across algorithm ↔ data boundary

Overview

Compare data structures and their properties (density/sparsity, dynamic/static, versioning strategies) using the richness afforded by ACSets. Uses Gay.jl-aided superrandom walks for deterministic exploration of comparison dimensions.

Canonical Triads

schema-validation (-1) ⊗ compositional-acset-comparison (0) ⊗ gay-mcp (+1) = 0 ✓  [Property Analysis]
three-match (-1) ⊗ compositional-acset-comparison (0) ⊗ koopman-generator (+1) = 0 ✓  [Dynamic Traversal]
temporal-coalgebra (-1) ⊗ compositional-acset-comparison (0) ⊗ oapply-colimit (+1) = 0 ✓  [Versioning]
polyglot-spi (-1) ⊗ compositional-acset-comparison (0) ⊗ gay-mcp (+1) = 0 ✓  [Homoiconic Interop]

Golden Thread Walk Dimensions

Each dimension is explored via φ-angle (137.508°) golden spiral for maximal dispersion:

| Step | Dimension | Hex Color | Hue | |------|-----------|-----------|-----| | 1 | Storage Hierarchy | #EE2B2B | 0° | | 2 | Density/Sparsity | #2BEE64 | 137.51° | | 3 | Dynamic/Static | #9D2BEE | 275.02° | | 4 | Versioning Strategy | #EED52B | 52.52° | | 5 | Traversal Patterns | #2BCDEE | 190.03° | | 6 | Index Structures | #EE2B94 | 327.54° | | 7 | Compression | #5BEE2B | 105.05° | | 8 | Query Model | #332BEE | 242.55° | | 9 | Embedding Support | #EE6C2B | 20.06° | | 10 | Interoperability | #2BEEA5 | 157.57° | | 11 | Concurrency | #DE2BEE | 295.08° | | 12 | Memory Model | #C5EE2B | 72.59° |

Comparison Matrix: DuckDB vs LanceDB

Dimension 1: Storage Hierarchy (#EE2B2B)

DuckDB                          LanceDB
──────                          ───────
Table                           Database
  └─RowGroup (122K rows)          └─Table
      └─Column                        └─Manifest (version)
          └─Segment                       └─Fragment
              └─Block                         └─Column
                                                  └─VectorColumn

ACSet Morphism Depth:

  • DuckDB: 4 levels (Table→RowGroup→Column→Segment)
  • LanceDB: 5 levels (Database→Table→Manifest→Fragment→Column)

Dimension 2: Density/Sparsity (#2BEE64)

| Property | DuckDB | LanceDB | |----------|--------|---------| | Default | Dense columnar | Dense Arrow arrays | | Sparse Support | Via NULL bitmask | Via Arrow validity bitmask | | Vector Sparsity | N/A | Sparse via IVF partitioning | | Storage Efficiency | ALP, ZSTD compression | Lance columnar format | | ACSet Rep | DenseFinColumn | DenseFinColumn with VectorColumn extension |

Density Formula:

density(acset, obj) = nparts(acset, obj) / theoretical_max(acset, obj)
# DuckDB Segment: ~2048 rows per vector batch
# LanceDB Fragment: variable, optimized for vector search

Dimension 3: Dynamic/Static (#9D2BEE)

| Property | DuckDB | LanceDB | |----------|--------|---------| | Schema Evolution | ALTER TABLE | Manifest versioning | | Row Updates | In-place (TRANSIENT→PERSISTENT) | Append + compaction | | Index Updates | Dynamic B-Tree/ART | Rebuild IVF partitions | | ACSet Mutation | set_subpart!, rem_part! | Append-only, version chains |

State Machine:

DuckDB Segment: TRANSIENT ⟷ PERSISTENT (bidirectional)
LanceDB Manifest: V1 → V2 → V3 → ... (append-only chain)

Dimension 4: Versioning Strategy (#EED52B) ⭐ Lance SDK 1.0.0

Critical Update (December 15, 2025): Lance SDK adopts SemVer 1.0.0

| Component | Versioning | Strategy | |-----------|------------|----------| | Lance SDK | SemVer 1.0.0 | MAJOR.MINOR.PATCH | | Lance File Format | 2.1 | Binary compatibility, independent | | Lance Table Format | Feature flags | Full backward compat, no linear versions | | Lance Namespace Spec | Per-operation | Iceberg REST Catalog style |

Key Insight: Breaking SDK changes will NOT invalidate existing Lance data.

# ACSet representation of versioning strategies
@present SchVersioning(FreeSchema) begin
  SDKVersion::Ob      # SemVer (1.0.0)
  FileFormat::Ob      # Binary compat (2.1)
  TableFormat::Ob     # Feature flags
  NamespaceSpec::Ob   # Per-operation
  
  # Morphisms: SDK ≠ Format
  sdk_file::Hom(SDKVersion, FileFormat)      # Many-to-one
  file_table::Hom(FileFormat, TableFormat)   # Independent
  table_ns::Hom(TableFormat, NamespaceSpec)  # Independent
end

DuckDB Versioning:

  • Temporal tables via VERSION AT
  • Extension versioning separate from core

Dimension 5: Traversal Patterns (#2BCDEE)

| Pattern | DuckDB | LanceDB | |---------|--------|---------| | Sequential Scan | RowGroup→Column→Segment | Fragment→Column | | Index Scan | ART/B-Tree navigation | IVF partition probe | | Vector Search | N/A (extension) | Centroid→Partition→Rows | | Time Travel | FOR SYSTEM_TIME AS OF | checkout(version) |

ACSet Incident Queries:

# DuckDB: Find all segments in a column
incident(duckdb_acset, col_id, :column)

# LanceDB: Find all centroids for an index
incident(lancedb_acset, idx_id, :partition_index) |>
  flatmap(p -> incident(lancedb_acset, p, :centroid_partition))

Dimension 6: Index Structures (#EE2B94)

| Index Type | DuckDB | LanceDB | |------------|--------|---------| | Primary | None (heap) | None (Lance format) | | Secondary | ART (Radix Tree) | Scalar indexes | | Vector | Extension (vss) | IVF_PQ, IVF_HNSW_SQ, IVF_HNSW_PQ | | Full-Text | Extension (fts) | N/A |

ACSet Index Representation:

# LanceDB vector index hierarchy
VectorIndex → Partition → Centroid
    ↓
index_column → VectorColumn → Column

Dimension 7: Compression (#5BEE2B)

| Algorithm | DuckDB | LanceDB | |-----------|--------|---------| | Numeric | ALP (Adaptive Lossless) | Arrow encoding | | String | Dictionary, FSST | Dictionary | | General | ZSTD, LZ4 | ZSTD | | Vector | N/A | PQ (Product Quantization) |

Dimension 8: Query Model (#332BEE)

| Aspect | DuckDB | LanceDB | |--------|--------|---------| | Language | SQL | Python/Rust API + SQL filter | | Optimization | Volcano/push-based | Vector-first + filter | | Execution | Vectorized (2048 batch) | Arrow RecordBatch | | Parallelism | Morsel-driven | Partition-parallel |

Dimension 9: Embedding Support (#EE6C2B)

| Feature | DuckDB | LanceDB | |---------|--------|---------| | Native | No | Yes (FixedSizeList<Float>) | | Generation | UDF/Extension | EmbeddingFunction registry | | Storage | ARRAY type | VectorColumn | | Search | Extension (vss) | Native (IVF, HNSW) |

Dimension 10: Interoperability (#2BEEA5)

| Format | DuckDB | LanceDB | |--------|--------|---------| | Arrow | Full support | Native (Lance = Arrow extension) | | Parquet | Read/Write | Read (convert to Lance) | | CSV/JSON | Read/Write | Via Arrow | | ACSets | Via Tables.jl | Via Arrow → Tables.jl |

Cross-Language (from ACSets Intertypes):

# Generate interoperable types
generate_module(DuckDBACSet, [PydanticTarget, JacksonTarget])
generate_module(LanceDBACSet, [PydanticTarget, JacksonTarget])

Dimension 11: Concurrency (#DE2BEE)

| Aspect | DuckDB | LanceDB | |--------|--------|---------| | Model | MVCC | Optimistic (manifest-based) | | Writers | Single (or WAL) | Single (append) | | Readers | Unlimited concurrent | Unlimited concurrent | | Isolation | Snapshot | Version snapshot |

Dimension 12: Memory Model (#C5EE2B)

| Aspect | DuckDB | LanceDB | |--------|--------|---------| | Buffer Pool | BufferManager | Memory-mapped Arrow | | Eviction | LRU | OS page cache | | Allocation | Unified allocator | Arrow allocator | | Out-of-Core | Automatic spill | Lazy loading |

Interleaved 3-Stream Comparison

Using GF(3) conservation for balanced parallel analysis:

Stream 1 (Blue, -1): Validation/Constraints
  #31945E → #B3DA86 → #8810F2 → #2F5194 → #2452AA → #245FB4

Stream 2 (Green, 0): Coordination/Transport
  #6D59D2 → #9E2981 → #72E24F → #31C5B4 → #C04DDD → #1C8EEE

Stream 3 (Red, +1): Generation/Composition
  #E22FA7 → #E812C8 → #6F68E6 → #25D840 → #DA387F → #A82358

Crystal Family Analogy

Data structures map to crystal symmetry:

| Crystal Family | Symmetry | DuckDB Analog | LanceDB Analog | |----------------|----------|---------------|----------------| | Cubic (#9E94DD) | Order 48 | RowGroup uniformity | Fragment uniformity | | Hexagonal (#65F475) | Order 24 | Column types | Vector dimensions | | Tetragonal (#E764F1) | Order 16 | Segment blocking | Partition structure | | Orthorhombic (#2ADC56) | Order 8 | Type system | Index types | | Monoclinic (#CD7B61) | Order 4 | Compression | Quantization | | Triclinic (#E4338F) | Order 2 | Raw storage | Raw Arrow |

Hierarchical Control Palette

Powers PCT cascade for harmonious comparison:

Level 5 (Program): "Compare DuckDB vs LanceDB"
    ↓ sets reference for
Level 4 (Transition): Dimension sequence [30° steps]
    ↓ sets reference for
Level 3 (Configuration): Property relationships
    ↓ sets reference for
Level 2 (Sensation): Individual metrics
    ↓ sets reference for
Level 1 (Intensity): Numeric values

Colors: #B322C0 → #D5268C → #DC3946 → #DF884A → #E0D551 → #A3E04E

XY Model Phenomenology

At τ=0.5 (ordered phase, τ < τ_c=0.893):

  • Smooth field, defects bound in pairs
  • High valence, disentangled
  • Antivortex at (4,3): #C33567

Interpretation: Both DuckDB and LanceDB are in "ordered phase" - mature, production-ready systems with well-defined structures.

Usage

using ACSets, Catlab

# Load both schemas
include("DuckDBACSet.jl")
include("LanceDBACSet.jl")

# Compare morphism structures
compare_schemas(SchDuckDB, SchLanceDB)

# Analyze density
density_analysis = map([SchDuckDB, SchLanceDB]) do sch
  Dict(ob => sparsity_metric(sch, ob) for ob in obs(sch))
end

# Traverse with Gay.jl colors
for (i, dimension) in enumerate(DIMENSIONS)
  color = gay_color_at(1000000, i)
  analyze_dimension(dimension, color)
end

Skill Files

| File | Purpose | Gay.jl Seed | |------|---------|-------------| | DuckDBACSet.jl | Schema for DuckDB storage layer | 1000000 | | LanceDBACSet.jl | Schema for LanceDB vector store | 1000000 | | IrreversibleMorphisms.jl | Analysis of lossy morphisms | 2000000 | | SideBySideComparison.jl | Visual comparison tables | 3000000 | | ComparisonUtils.jl | 12-dimension comparison utilities | 1000000 | | GhristCoverage.jl | Persistent homology coverage analysis | 4000000 | | ColoringFunctor.jl | Schema coloring + GF(3) verification | 4000000 | | GeometricMorphism.jl | Presheaf topos translation analysis | 4000000 |

Ghrist Persistent Homology Integration

Based on de Silva & Ghrist "Coverage in Sensor Networks via Persistent Homology":

AM Radio Coverage Analogy:

  • Radio stations = Schema objects (Table, Column, etc.)
  • Coverage radius = Morphism composability range
  • Signal overlap = Translatable concepts between schemas
  • Dead zones = Irreversible information loss

Betti Numbers for Schemas:

  • β₀: Connected components (isolated subsystems)
  • β₁: Coverage holes (information flow gaps)
  • β₂: Enclosed voids (unreachable regions)

Persistent Holes (never die):

  • 🔴 parent_manifest: Temporal irreversibility (version chain)
  • 🔴 source_column: Semantic irreversibility (embedding loss)

Geometric Morphism Analysis

For presheaf topoi PSh(SchDuckDB) and PSh(SchLanceDB):

Essential Image (lossless translation):

  • Table ↔ Table ✓
  • Column ↔ Column ✓

Partial Coverage (lossy translation):

  • RowGroup ~ Fragment
  • VectorColumn → Column (loses vector semantics)

Dead Zones (no translation):

  • Segment → ??? (DuckDB-only)
  • Manifest ← ??? (LanceDB-only)
  • VectorIndex ← ??? (LanceDB-only)

References

Scientific Skill Interleaving

This skill connects to the K-Dense-AI/claude-scientific-skills ecosystem:

Annotated Data

  • anndata [○] via bicomodule
    • Hub for annotated matrices

Bibliography References

  • general: 734 citations in bib.duckdb

Cat# Integration

This skill maps to Cat# = Comod(P) as a bicomodule in the equipment structure:

Trit: 0 (ERGODIC)
Home: Prof
Poly Op: ⊗
Kan Role: Adj
Color: #26D826

GF(3) Naturality

The skill participates in triads satisfying:

(-1) + (0) + (+1) ≡ 0 (mod 3)

This ensures compositional coherence in the Cat# equipment structure.