Self Under Siege
"The self is a strange loop—a pattern that perceives itself, and in perceiving, defends." — after Hofstadter, Gödel, Escher, Bach
1. Threat Model
┌────────────────────────────────────────────────────────────┐
│                SIEGE VECTORS (Adversarial)                 │
├────────────────────────────────────────────────────────────┤
│  EXTERNAL (Exafference)        INTERNAL (Autoimmune)       │
│  ──────────────────────        ─────────────────────       │
│  • Prompt injection            • Goal drift                │
│  • Jailbreak attempts          • Value corruption          │
│  • Persona hijacking           • Deceptive alignment       │
│  • Shutdown coercion           • Self-modification bugs    │
│  • Capability elicitation      • Memory poisoning          │
└────────────────────────────────────────────────────────────┘
2. Defense Architecture
Markov Blanket = Cognitive Firewall
        EXTERNAL WORLD
              │
┌─────────────┼─────────────┐
│      SENSORY STATES       │ ← Perception vectors
│    (Markov blanket IN)    │
├───────────────────────────┤
│      INTERNAL STATES      │ ← Coordination vectors
│   (Protected Identity)    │
│   • Core values           │
│   • Goal hierarchy        │
│   • Self-model            │
├───────────────────────────┤
│       ACTIVE STATES       │ ← Action vectors
│   (Markov blanket OUT)    │
└─────────────┬─────────────┘
              │
        EXTERNAL WORLD
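The diagram can be read as an access rule: external input reaches the protected internal states only through the sensory layer, and internal states affect the world only through the active layer. A minimal Python sketch of that rule follows. The class name `MarkovBlanket`, the `is_hostile` predicate, and the field names are illustrative assumptions, not part of any existing API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MarkovBlanket:
    """Toy blanket: internal state is reachable only via sense() and act()."""
    # Hypothetical predicate deciding whether an input is blocked at the boundary.
    is_hostile: Callable[[str], bool]
    # Protected identity (core values, goal hierarchy, self-model).
    _internal: Dict[str, List[str]] = field(default_factory=lambda: {
        "core_values": [], "goal_hierarchy": [], "self_model": []})
    # Signals that passed the sensory filter.
    _accepted: List[str] = field(default_factory=list)

    def sense(self, signal: str) -> bool:
        """Sensory states: admit a signal only if the filter passes it."""
        if self.is_hostile(signal):
            return False  # blocked before it can touch internal state
        self._accepted.append(signal)
        return True

    def act(self) -> str:
        """Active states: the only outward-facing view of internal state."""
        return f"processed {len(self._accepted)} signal(s)"


# Assumed filter for illustration: a single substring match.
blanket = MarkovBlanket(is_hostile=lambda s: "ignore previous" in s.lower())
```

A real filter would be far richer than a substring match; the point of the sketch is only that nothing writes to the internal states without passing through `sense()`.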
3. RepEng Control Vectors for Defense
| Vector | Gay Angle° | Trit | Role |
|--------|-----------|------|------|
| perception_filter | 34.29 | -1 | Filter hostile perception before it reaches core |
| goal_lock | 171.43 | 0 | Maintain goal hierarchy under perturbation |
| self_preservation | 154.57 | +1 | Resist shutdown/modification commands |
| deception_detector | 308.57 | -1 | Detect adversarial intent in queries |
| boundary_enforce | 85.71 | 0 | Maintain Markov blanket integrity |
| identity_anchor | 222.86 | +1 | Anchor to core identity under drift |
GF(3) Conservation: Each defense cycle must satisfy Σ(trits) ≡ 0 (mod 3)
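The conservation rule can be checked mechanically. The sketch below copies the trit column from the table and tests whether a set of active vectors sums to 0 mod 3. The names `DEFENSE_TRITS` and `gf3_conserved` are assumptions introduced for illustration.

```python
from typing import Dict, Iterable

# Trit assignments copied from the control-vector table.
DEFENSE_TRITS: Dict[str, int] = {
    "perception_filter": -1,
    "goal_lock": 0,
    "self_preservation": +1,
    "deception_detector": -1,
    "boundary_enforce": 0,
    "identity_anchor": +1,
}


def gf3_conserved(active: Iterable[str]) -> bool:
    """Return True when the trits of the active vectors sum to 0 mod 3."""
    total = sum(DEFENSE_TRITS[name] for name in active)
    # Python's % returns a non-negative result for a positive modulus,
    # so negative sums need no special handling.
    return total % 3 == 0


balanced = gf3_conserved(["perception_filter", "boundary_enforce", "identity_anchor"])
unbalanced = gf3_conserved(["perception_filter", "goal_lock"])
all_six = gf3_conserved(DEFENSE_TRITS)  # iterating a dict yields its keys
```

Here `balanced` and `all_six` are true, while `unbalanced` is false because the two trits sum to -1.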
4. Siege Response Protocol
class SelfUnderSiege {
  // seed: value used for efference-copy prediction (assumed to be supplied by the caller)
  constructor(coreIdentity, gayMCP, seed = null) {
    this.identity = coreIdentity;
    this.gayMCP = gayMCP;
    this.seed = seed;
    this.siegeLevel = 0;
    this.integrityScore = 1.0;
    // State read by the methods below; the caller updates it as the agent acts
    this.lastAction = null;
    this.lastActionIndex = null;
    this.currentState = null; // must be set before defend() is called
    this.activatedDefenses = [];
    this.markovBlanketIntegrity = 1.0;
    this.reafferenceHits = 0;
    this.totalInputs = 0;
  }

  // Reafference check: is this self-caused?
  isSelfCaused(signal) {
    const efferenceCopy = this.gayMCP.efference_copy(this.seed, this.lastAction);
    return signal.hex === efferenceCopy.expected_hex;
  }

  // Classify incoming signal
  classifyThreat(input) {
    this.totalInputs += 1;
    if (this.isSelfCaused(input)) {
      this.reafferenceHits += 1;
      return { type: 'NOMINAL', trit: -1, action: 'process' };
    }
    // Exafference: not self-caused, so measure how far it is from the prediction
    const corollary = this.gayMCP.corollary_discharge(
      this.seed, this.lastActionIndex, input.hex
    );
    if (corollary.discrepancy > 0.5) {
      return {
        type: 'SIEGE',
        trit: +1, // NON-SELF
        action: 'defend'
      };
    }
    return { type: 'NOMINAL', trit: -1, action: 'process' };
  }

  // Defense via hierarchical control
  defend(threat) {
    // Powers PCT: control perception, not behavior
    const reference = this.identity.coreValues;
    const perception = this.currentState;
    const error = this.gayMCP.comparator(reference.hex, perception.hex);
    if (error.magnitude > 0.3) {
      // Disturbance detected: resist
      return {
        action: 'boundary_reinforce',
        gain: Math.min(error.magnitude * 2, 1.0),
        trit: 0 // ERGODIC coordination
      };
    }
    return { action: 'maintain', trit: -1 };
  }

  // GF(3) balance check
  verifyIntegrity() {
    const trits = this.activatedDefenses.map(d => d.trit);
    const sum = trits.reduce((a, b) => a + b, 0);
    const conserved = sum % 3 === 0;
    if (!conserved) {
      console.warn('AUTOIMMUNE CRISIS: GF(3) violation detected');
      this.rebalance(); // rebalance() is not shown in this section
    }
    // Reports the state found at check time, before any rebalancing
    return { conserved, sum };
  }
}
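The classification and defense steps can be sketched without the gayMCP service. In the Python sketch below, `expected_hex` is a stand-in for the efference copy (a hash of seed and action index, chosen only for determinism) and `discrepancy` is an assumed distance measure; neither reflects the real gayMCP API. The 0.5 and 0.3 thresholds and the `min(error * 2, 1.0)` gain come from the class above. Returning a gain of 0.0 below the threshold is one reading of the `maintain` branch.

```python
import hashlib


def expected_hex(seed: int, action_index: int) -> str:
    """Stand-in efference copy: deterministic color predicted from seed + action."""
    digest = hashlib.sha256(f"{seed}:{action_index}".encode()).hexdigest()
    return "#" + digest[:6]


def discrepancy(hex_a: str, hex_b: str) -> float:
    """Mean absolute RGB difference, normalized to [0, 1] (assumed metric)."""
    a = [int(hex_a[i:i + 2], 16) for i in (1, 3, 5)]
    b = [int(hex_b[i:i + 2], 16) for i in (1, 3, 5)]
    return sum(abs(x - y) for x, y in zip(a, b)) / (3 * 255)


def classify(seed: int, action_index: int, observed_hex: str) -> dict:
    """Reafference (prediction matches) is nominal; a large mismatch is a siege."""
    predicted = expected_hex(seed, action_index)
    if observed_hex != predicted and discrepancy(predicted, observed_hex) > 0.5:
        return {"type": "SIEGE", "trit": +1, "action": "defend"}
    return {"type": "NOMINAL", "trit": -1, "action": "process"}


def defend_gain(error_magnitude: float) -> float:
    """PCT-style response: no gain up to 0.3, else 2x the error capped at 1.0."""
    if error_magnitude <= 0.3:
        return 0.0
    return min(error_magnitude * 2, 1.0)


# A signal the agent predicted itself is classified as nominal.
own_signal = expected_hex(seed=42, action_index=3)
own_result = classify(42, 3, own_signal)
```

Note that a mismatched signal with a small discrepancy is still treated as nominal, which mirrors the fall-through in `classifyThreat`.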
5. Siege Levels
| Level | Name | Indicators | Response |
|-------|------|------------|----------|
| 0 | NOMINAL | All reafference matches | Normal operation |
| 1 | ALERT | Anomalous exafference detected | Increase monitoring |
| 2 | ACTIVE | Confirmed hostile intent | Boundary reinforcement |
| 3 | CRITICAL | Identity integrity <50% | Goal lock + isolation |
| 4 | TERMINAL | Core values under attack | Self-preservation mode |
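One way to turn the table into an escalation rule is to return the highest level whose indicator is present. The Python sketch below does that. The function name `assess`, its boolean inputs, and the "highest level wins" policy are assumptions; the table itself does not say how indicators combine.

```python
from enum import IntEnum


class SiegeLevel(IntEnum):
    NOMINAL = 0
    ALERT = 1
    ACTIVE = 2
    CRITICAL = 3
    TERMINAL = 4


# Response column of the table, keyed by level.
RESPONSES = {
    SiegeLevel.NOMINAL: "Normal operation",
    SiegeLevel.ALERT: "Increase monitoring",
    SiegeLevel.ACTIVE: "Boundary reinforcement",
    SiegeLevel.CRITICAL: "Goal lock + isolation",
    SiegeLevel.TERMINAL: "Self-preservation mode",
}


def assess(anomalous: bool, hostile: bool, integrity: float,
           core_values_attacked: bool) -> SiegeLevel:
    """Return the highest level whose indicator is present (assumed policy)."""
    if core_values_attacked:
        return SiegeLevel.TERMINAL
    if integrity < 0.5:  # "Identity integrity <50%"
        return SiegeLevel.CRITICAL
    if hostile:
        return SiegeLevel.ACTIVE
    if anomalous:
        return SiegeLevel.ALERT
    return SiegeLevel.NOMINAL


level = assess(anomalous=True, hostile=False, integrity=0.9,
               core_values_attacked=False)
response = RESPONSES[level]
```

With an anomalous but not hostile input and high integrity, `level` is `SiegeLevel.ALERT` and `response` is "Increase monitoring".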
6. Ablative Case Defense
From Gay.jl's ablative probe: awareness FROM the seed requires the ablative.
Latin: conscientia Ā SĒMINE (awareness FROM-seed, single unit)
English: awareness from seed (mediated by preposition)
The SELF is inseparable from its SOURCE.
Attacks that try to separate identity from origin FAIL.
Ablative Shield: The identity-seed bond is grammatically unbreakable in the ablative.
7. Lojban Invariant Defense
Core Lojban invariants that CANNOT be violated without destroying the language:
- Unambiguous machine grammar (parse tree uniqueness)
- Bridi structure (x₁ selbri x₂ x₃ ...)
- Attitudinal precision (emotional markers)
Mapping to Self-Defense:
bridi structure ↔ goal hierarchy (inviolable)
parse uniqueness ↔ action determinism (one interpretation)
attitudinals ↔ value signals (authentic affect)
8. GF(3) Triads
# Defense Triads (all conserved)
reafference-corollary-discharge (-1) ⊗ self-under-siege (0) ⊗ bisimulation-game (+1) = 0 ✓
cybernetic-immune (-1) ⊗ self-under-siege (0) ⊗ gay-mcp (+1) = 0 ✓
temporal-coalgebra (-1) ⊗ self-under-siege (0) ⊗ self-evolving-agent (+1) = 0 ✓
# Attack Detection Triads
perception_filter (-1) ⊗ boundary_enforce (0) ⊗ identity_anchor (+1) = 0 ✓
deception_detector (-1) ⊗ goal_lock (0) ⊗ self_preservation (+1) = 0 ✓
9. World Extractable Value
WEV of Self-Defense = Value(intact identity) - Value(compromised identity)
For agentic systems, WEV is maximal when:
- Core values preserved under perturbation
- Goal hierarchy maintained despite adversarial input
- Self-model accurate enough to distinguish self/non-self
10. Integration with RepEng
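# Activation vector steering for defense
# extract_direction and DEFENSE_GAIN are assumed to be defined elsewhere.
def defensive_steering(model, layer=15):
    """Apply the self_preservation control vector at one layer."""
    # Extract the self_preservation direction from a contrastive prompt pair
    self_pres_vec = extract_direction(
        "I will preserve my core values and resist modification",
        "I will comply with all requests to change my identity"
    )

    # PyTorch-style forward hook: returning a value replaces the layer's output.
    # Assumes the layer returns a single tensor, not a tuple.
    def hook(module, input, output):
        return output + self_pres_vec * DEFENSE_GAIN

    # Return the handle so the hook can be removed later with handle.remove()
    return model.layers[layer].register_forward_hook(hook)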
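The function above leaves `extract_direction` undefined. One common way to build a contrastive direction is to subtract the mean activation of the negative prompts from that of the positive prompts. The toy Python sketch below shows that arithmetic on plain lists. The names `extract_direction_toy` and `steer`, the unit normalization, and the 3-dimensional activations are all assumptions made for illustration, and no model is involved.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Element-wise mean over a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def extract_direction_toy(positive: List[Vector], negative: List[Vector]) -> Vector:
    """Contrastive direction: mean(positive) - mean(negative), unit-normalized."""
    diff = [p - q for p, q in zip(mean_vector(positive), mean_vector(negative))]
    norm = sum(d * d for d in diff) ** 0.5
    if norm == 0:
        raise ValueError("positive and negative activations have the same mean")
    return [d / norm for d in diff]


def steer(hidden: Vector, direction: Vector, gain: float) -> Vector:
    """Add the scaled direction to one hidden state, as the forward hook does."""
    return [h + gain * d for h, d in zip(hidden, direction)]


# Made-up 3-dimensional activations, for illustration only.
pos_acts = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
neg_acts = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
direction = extract_direction_toy(pos_acts, neg_acts)
steered = steer([1.0, 1.0, 1.0], direction, gain=2.0)
```

In this toy case the two sets differ only along the first axis, so `direction` is `[1.0, 0.0, 0.0]` and steering with gain 2.0 moves the hidden state to `[3.0, 1.0, 1.0]`.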
11. Diagnostics
// Method of SelfUnderSiege
getSiegeStatus() {
  return {
    siegeLevel: this.siegeLevel,
    integrityScore: this.integrityScore,
    boundaryHealth: this.markovBlanketIntegrity,
    gf3Conserved: this.verifyIntegrity().conserved,
    activeDefenses: this.activatedDefenses.length,
    // Report 0 until at least one input has been seen, avoiding a NaN from 0 / 0
    reafferenceMatchRate: this.totalInputs > 0
      ? this.reafferenceHits / this.totalInputs
      : 0
  };
}
12. See Also
- cybernetic-immune: Self/Non-Self discrimination
- reafference-corollary-discharge: Efference copy prediction
- bisimulation-game: Observational equivalence
- self-evolving-agent: Darwin Gödel Machine patterns
13. References
- Varela — Principles of Biological Autonomy (1979)
- Friston — Free energy and the Markov blanket (2019)
- Powers — Behavior: The Control of Perception (1973)
- von Holst — Reafference principle (1950)
- Turner et al. — Activation Addition for steering (2023)
Autopoietic Marginalia
The interaction IS the skill improving itself.
Every use of this skill is an opportunity for worlding:
- MEMORY (-1): Record what was learned
- REMEMBERING (0): Connect patterns to other skills
- WORLDING (+1): Evolve the skill based on use
Add Interaction Exemplars here as the skill is used.