# Mutual Information Oracle

## Formal Specification

### Type
```
MIOracle : (Agent, Agent, Episode) → CoordinationScore

CoordinationScore = {
  mi_bits:           ℝ≥0   -- I(X;Y) in bits
  coordination_trit: Trit  -- GF(3) classification
  generative_loss:   ℝ     -- -log P(Y | X) on test set
  recognition_loss:  ℝ     -- KL(q(Z|X) || p(Z))
}
```

Trit classification (FIXED thresholds):

- mi_bits > 2.0 → +1 (strong coordination, agents share information)
- mi_bits > 0.5 → 0 (moderate coordination, some correlation)
- mi_bits ≤ 0.5 → -1 (weak coordination, agents nearly independent)
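The threshold logic is mechanical; a minimal Python sketch (the function name is illustrative, and the NaN branch mirrors the short-episode postcondition below):

```python
import math

def coordination_trit(mi_bits: float) -> int:
    """Map mi_bits to a GF(3) trit using the FIXED thresholds above."""
    if math.isnan(mi_bits):
        # Episode under 100 steps: the spec returns CoordinationScore.nothing instead
        raise ValueError("mi_bits is NaN (episode shorter than 100 steps)")
    if mi_bits > 2.0:
        return 1    # strong coordination, agents share information
    if mi_bits > 0.5:
        return 0    # moderate coordination, some correlation
    return -1       # weak coordination, agents nearly independent
```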
### Preconditions
- `Episode` is at least 100 timesteps (sufficient for MI estimation)
- Agent observations are finite-dimensional vectors (not raw text)
- Both agents are Markov (policy depends only on current state, not full history)
- Background: the Plurigrid DER environment (energy market, grid state, resource schedules)
### Postconditions
1. Returns exactly one `CoordinationScore` -- never "coordination seems ok"
2. `mi_bits` is computed via a specific estimator (MINE or CLUB, see below)
3. `coordination_trit` is derived from `mi_bits` via fixed thresholds, NOT from human judgment
4. If the episode is under 100 steps: returns `CoordinationScore.nothing` with mi_bits = NaN

## The Markov Category Structure

The Plurigrid Protocol encodes agents as morphisms in a Markov category:

```
Markov Category K where:
  Objects:     probability spaces (Ω, Σ, P)
  Morphisms:   stochastic kernels k: X → P(Y)
               (conditional probability distributions)
  Composition: (f ∘ g)(x, B) = ∫ f(y, B) g(x, dy)   (Chapman-Kolmogorov)
```
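For finite state spaces a stochastic kernel is just a row-stochastic matrix, and Chapman-Kolmogorov composition is matrix multiplication. A minimal sketch with hypothetical 2- and 3-element state spaces:

```python
import numpy as np

# Kernels as row-stochastic matrices: g : X -> P(Y), f : Y -> P(Z)
g = np.array([[0.9, 0.1],         # P(Y | X=x0)
              [0.2, 0.8]])        # P(Y | X=x1)
f = np.array([[0.7, 0.2, 0.1],    # P(Z | Y=y0)
              [0.1, 0.3, 0.6]])   # P(Z | Y=y1)

# Chapman-Kolmogorov: (f . g)(x, B) = sum_y f(y, B) g(x, y)
fg = g @ f                                # composite kernel X -> P(Z)
assert np.allclose(fg.sum(axis=1), 1.0)   # rows remain probability distributions
```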
### Generative Channel (Forward Model)
```haskell
-- Requirement: monad-bayes MonadDistribution m
-- Postcondition: samples from P(Y | X), the forward joint distribution
-- Role: models how agent A's action generates outcomes for agent B
generativeChannel
  :: MonadDistribution m
  => State     -- X: current grid/market state
  -> m Action  -- Y: sampled action from policy
generativeChannel state = do
  -- Prior: policy prior over actions
  action <- categorical (policy_probs state)
  return action

-- In Markov category: this IS a morphism k: X → P(Y)
-- Composition with recognition channel = inference loop
```
### Recognition Channel (Inverse Model)

```haskell
-- Requirement: monad-bayes MonadInfer m
-- Postcondition: infers P(Z | X), the recognition distribution over latent states
-- Role: models how agent B recognizes/infers agent A's hidden state Z
recognitionChannel
  :: MonadInfer m
  => Observation    -- X: what agent B observes
  -> m LatentState  -- Z: inferred hidden state of agent A
recognitionChannel obs = do
  -- Variational posterior: q(Z | X) approximating p(Z | X)
  z <- normal mu_z sigma_z
  -- Score against agent A's true behavior (conditioning)
  factor (log_likelihood obs z)
  return z

-- KL(q(Z|X) || p(Z)) = recognition_loss in CoordinationScore
-- Measures how well B understands A's latent state
```
### Channel Composition = MARL Episode
```
# Requirement: generative + recognition channels form a closed loop
# Postcondition: ELBO = -generative_loss - recognition_loss
# ELBO maximization = mutual information maximization

ELBO = E[log P(Y|X)] - KL(q(Z|X) || p(Z))

# Theorem (Agakov bound):
#   I(X;Y) >= ELBO
# ∴ maximizing ELBO → maximizing mutual information between agents
```
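A minimal PyTorch sketch of how the two `CoordinationScore` losses assemble into this ELBO (the function and argument names are illustrative; the closed-form KL assumes a standard-normal prior p(Z)):

```python
import torch

def elbo(log_p_y_given_x: torch.Tensor,
         mu_q: torch.Tensor, logvar_q: torch.Tensor) -> torch.Tensor:
    """
    ELBO = E[log P(Y|X)] - KL(q(Z|X) || p(Z)),  with p(Z) = N(0, I).
    generative_loss  = -E[log P(Y|X)]       (first term, negated)
    recognition_loss =  KL(q(Z|X) || p(Z))  (second term)
    Shapes: log_p_y_given_x is (batch,); mu_q, logvar_q are (batch, latent_dim).
    """
    generative_loss = -log_p_y_given_x.mean()
    # Closed-form KL between N(mu, sigma^2) and N(0, 1), summed over latent dims
    recognition_loss = 0.5 * (mu_q**2 + logvar_q.exp() - logvar_q - 1).sum(dim=1).mean()
    return -generative_loss - recognition_loss  # Agakov: I(X;Y) >= this value
```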
## MI Estimators

### MINE (Mutual Information Neural Estimator)
```python
# Requirement: N ≥ 1000 samples from joint (X,Y) and marginal X⊗Y
# Postcondition: lower-bound estimate of I(X;Y), variance-reduced via EMA
import math

import torch
import torch.nn as nn


class MINENetwork(nn.Module):
    """
    Requirement: input_dim = dim(X) + dim(Y)
    Postcondition: T_θ(x,y) approximates f*(x,y) in I(X;Y) = sup_T E[T] - log E[e^T]
    """
    def __init__(self, input_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ELU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ELU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, xy: torch.Tensor) -> torch.Tensor:
        return self.net(xy)
def mine_estimate(X: torch.Tensor, Y: torch.Tensor, n_epochs: int = 200) -> float:
    """
    Requirement: X.shape = Y.shape = (N, d), N ≥ 1000
    Postcondition: returns I(X;Y) in bits (nats divided by ln 2)
    Uses an EMA baseline for variance reduction (avoids MINE's biased gradient).
    """
    T = MINENetwork(X.shape[1] + Y.shape[1])
    optimizer = torch.optim.Adam(T.parameters(), lr=1e-3)
    ema, ema_alpha = 1.0, 0.01
    for _ in range(n_epochs):
        perm = torch.randperm(len(X))
        Y_shuffled = Y[perm]  # pairs (x_i, y_perm(i)) sample the product of marginals
        joint_score = T(torch.cat([X, Y], dim=1)).mean()
        marginal_score = torch.exp(T(torch.cat([X, Y_shuffled], dim=1)))
        # EMA baseline (variance reduction for the log-partition gradient)
        ema = (1 - ema_alpha) * ema + ema_alpha * marginal_score.mean().item()
        loss = -(joint_score - marginal_score.mean() / ema)
        optimizer.zero_grad(); loss.backward(); optimizer.step()
    with torch.no_grad():
        joint = T(torch.cat([X, Y], dim=1)).mean()
        marginal = T(torch.cat([X, Y[torch.randperm(len(X))]], dim=1))
        mi_nats = joint - torch.log(torch.exp(marginal).mean())
    return mi_nats.item() / math.log(2)  # nats -> bits
```
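A sanity-check sketch (assumed setup, not part of the oracle): for jointly Gaussian X and Y with correlation ρ, the true MI is -0.5·log₂(1-ρ²) bits, so the estimator can be checked against a known value:

```python
import math
import torch

N, rho = 5000, 0.9
X = torch.randn(N, 1)
Y = rho * X + math.sqrt(1 - rho**2) * torch.randn(N, 1)

true_mi_bits = -0.5 * math.log2(1 - rho**2)  # ≈ 1.20 bits for rho = 0.9
est_mi_bits = mine_estimate(X, Y)
print(f"true: {true_mi_bits:.2f} bits, MINE estimate: {est_mi_bits:.2f} bits")
```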
### CLUB (Contrastive Log-ratio Upper Bound)
```python
# Requirement: same as MINE, but provides an UPPER bound (useful for minimization)
# Postcondition: I(X;Y) ≤ CLUB_estimate
def club_estimate(X: torch.Tensor, Y: torch.Tensor, mu_net, logvar_net) -> float:
    """Upper bound on MI -- use when you want to MINIMIZE coordination (privacy)."""
    mu = mu_net(X)          # mean of variational q(Y|X)
    logvar = logvar_net(X)  # log-variance of q(Y|X)
    # Positive pairs: log q(y_i | x_i), up to an additive constant
    pos = -0.5 * ((Y - mu)**2 / logvar.exp() + logvar).sum(dim=1)
    # Negative pairs: mean over j of log q(y_i | x_j)
    neg = -0.5 * ((Y.unsqueeze(1) - mu.unsqueeze(0))**2 / logvar.exp().unsqueeze(0)
                  + logvar.unsqueeze(0)).sum(dim=2).mean(dim=1)
    return (pos - neg).mean().item() / math.log(2)  # nats -> bits
```
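A usage sketch under assumed shapes: `mu_net` and `logvar_net` can be any torch modules mapping X to the mean and log-variance of a Gaussian q(Y|X); the single-layer nets here are hypothetical placeholders:

```python
# Hypothetical single-layer variational nets for q(Y|X); shapes must match X and Y.
import torch
import torch.nn as nn

d_x, d_y = 4, 4
mu_net = nn.Linear(d_x, d_y)
logvar_net = nn.Linear(d_x, d_y)

X = torch.randn(1000, d_x)
Y = torch.randn(1000, d_y)
upper_bits = club_estimate(X, Y, mu_net, logvar_net)
```

Note that the CLUB guarantee I(X;Y) ≤ estimate only holds when q(Y|X) approximates p(Y|X) well, so in practice the two nets are first fit by maximizing E[log q(Y|X)] before the bound is read off.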
## cityLearn OpenGame (Concrete Instance)

From plurigrid/ontology: the canonical MARL demand response game.

```python
# Requirement: cityLearn environment available (pip install citylearn)
# Requirement: N_agents ≥ 2 prosumer agents
# Postcondition: CoordinationScore with mi_bits measuring demand correlation
from citylearn.citylearn import CityLearnEnv
from citylearn.reward_function import RewardFunction
class PlurigridReward(RewardFunction):
    """
    Requirement: env has grid_cost attribute (EIP-1559 style pricing)
    Postcondition: reward aligns individual agent objectives with grid-wide MI maximization

    Specific reward formula:
        R_i(t) = -cost_i(t) + λ * I(action_i(t); grid_signal(t))
        λ = 0.1 (MI weight -- FIXED, not learned)
    """
    def __init__(self, env, lambda_mi: float = 0.1):
        super().__init__(env)
        self.lambda_mi = lambda_mi
        self.action_history = []  # (actions, grid_signal) pairs for MI estimation
    def calculate(self) -> list[float]:
        actions = [agent.action for agent in self.env.buildings]
        grid_signal = self.env.grid.net_load
        # Accumulate for MI estimation (minimum 100 steps before computing)
        self.action_history.append((actions, grid_signal))
        if len(self.action_history) >= 100:
            # One (action, grid_signal) row per agent per timestep
            mi_bits = mine_estimate(
                torch.tensor([[a] for (acts, _) in self.action_history for a in acts]),
                torch.tensor([[g] for (acts, g) in self.action_history for _ in acts]),
            )
        else:
            mi_bits = 0.0  # not enough history
        rewards = []
        for building in self.env.buildings:
            cost_i = building.net_electricity_consumption_cost
            rewards.append(-cost_i + self.lambda_mi * mi_bits)
        return rewards

# Nash equilibrium condition (Nashator):
#   At equilibrium, no agent improves by deviating.
#   Nashator receives (actions, payoffs, constraints) as JSON-RPC.
#   Returns: Nash equilibrium strategy profile OR "no pure Nash" with a mixed strategy.
```
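A wiring sketch for the reward (hedged: the exact `CityLearnEnv` constructor and step API vary across citylearn versions, and the named dataset is an assumption):

```python
# Hedged wiring sketch -- exact citylearn API varies by version; dataset name assumed.
from citylearn.citylearn import CityLearnEnv

env = CityLearnEnv(schema='citylearn_challenge_2022_phase_1')  # assumed named dataset
env.reward_function = PlurigridReward(env, lambda_mi=0.1)      # swap in MI-shaped reward

observations = env.reset()
while not env.done:
    # Placeholder random policy; one action space per prosumer building
    actions = [space.sample() for space in env.action_space]
    observations, rewards, done, info = env.step(actions)
```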
## Connection to Nashator

MIOracle output → Nashator input:

```
CoordinationScore {
  mi_bits: 2.3,           # strong coordination
  coordination_trit: +1,  # Generator
  generative_loss: -1.2,  # good forward model
  recognition_loss: 0.4   # agents understand each other
}
```

Nashator JSON-RPC call (port :9999):
```json
{
  "jsonrpc": "2.0",
  "method": "solve_game",
  "params": {
    "players": ["prosumer_0", "prosumer_1"],
    "payoffs": { ... },
    "mi_weight": 0.1,
    "coordination_target": 2.0,
    "constraints": ["demand_response", "grid_stability"]
  }
}
```
Nashator returns:

```json
{
  "nash_equilibrium": { "prosumer_0": [0.3, 0.7], "prosumer_1": [0.5, 0.5] },
  "mi_at_equilibrium": 2.3,
  "coordination_trit": 1
}
```
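A minimal client sketch, assuming Nashator speaks standard JSON-RPC 2.0 over HTTP; the source fixes only the port (:9999), so the endpoint path and the use of `requests` are assumptions:

```python
# JSON-RPC 2.0 client sketch. Port :9999 is from the spec; the endpoint path "/",
# the requests library, and the helper name are assumptions.
import requests

def solve_game(payoffs: dict, mi_weight: float = 0.1,
               coordination_target: float = 2.0) -> dict:
    request = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "solve_game",
        "params": {
            "players": ["prosumer_0", "prosumer_1"],
            "payoffs": payoffs,
            "mi_weight": mi_weight,
            "coordination_target": coordination_target,
            "constraints": ["demand_response", "grid_stability"],
        },
    }
    response = requests.post("http://localhost:9999/", json=request, timeout=30)
    response.raise_for_status()
    return response.json()["result"]  # nash_equilibrium, mi_at_equilibrium, trit
```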
## MARL Reward Design Taxonomy

| Objective               | MI Formulation                         | DER Application              |
|-------------------------|----------------------------------------|------------------------------|
| Demand response         | max I(action_i; grid_demand)           | Reduce peak load             |
| Distributed generation  | max I(forecast_i; actual_generation)   | Improve renewable prediction |
| Energy market           | max I(bid_i; market_price)             | Optimize bid strategies      |
| Fault detection         | max I(observations_i; fault_location)  | Grid resilience              |
| Grid optimization       | max I(control_action_i; grid_perf)     | Real-time balancing          |
| Privacy (converse)      | min I(action_i; private_state_j)       | Agent data isolation         |

## GF(3) Tripartite Tag

open-games(-1) ⊗ mutual-information-oracle(0) ⊗ nashator(+1) = 0

Validation (-1) × Coordination (0) × Solution (+1) = balanced game-theoretic stack.

## Related Skills

- `open-games` -- compositional game theory foundation
- `nashator` -- Nash equilibrium solver receiving MI oracle output
- `monad-bayes-asi-interleave` -- generative/recognition channel implementation
- `dynamic-sufficiency` -- 145-ref universal hub connecting MARL to ASI skill graph
- `basin-hedges` -- ParaLens 6-wire with counterfactual gating (MARL in Rust)
- `ergodicity` -- MARL time-average = ensemble-average condition (ergodic iff coordinated)
- `autopoiesis` -- self-organizing DER network (autopoietic iff MI > 0)
- `cybernetic-open-game` -- cybernetic structure over open games
- `equilibrium` -- Nash equilibrium computation
- `duckdb-ies` -- episode storage for MI estimation
- `gay-monte-carlo` -- GF(3)-colored sampling for MI integration