Agent Skills: OneNote — Rate Limit Handling & Request Throttling

ID: jeremylongshore/claude-code-plugins-plus-skills/onenote-rate-limits

Install this agent skill locally:

pnpm dlx add-skill https://github.com/jeremylongshore/claude-code-plugins-plus-skills/tree/HEAD/plugins/saas-packs/onenote-pack/skills/onenote-rate-limits

plugins/saas-packs/onenote-pack/skills/onenote-rate-limits/SKILL.md

Skill Metadata

Name
onenote-rate-limits
Description
OneNote — Rate Limit Handling & Request Throttling

Overview

Microsoft Graph rate limits OneNote at 600 requests per 60 seconds per user and 10,000 requests per 10 minutes per app/tenant. When you exceed either limit, the API returns 429 Too Many Requests with a Retry-After header specifying how many seconds to wait. Most implementations either ignore this header entirely (retrying immediately, making things worse) or use a fixed backoff that wastes capacity.

This skill implements a token bucket rate limiter, queue-based request throttling, and proper Retry-After header parsing. For multi-user apps, it tracks per-user and per-tenant budgets independently.

Key pain points addressed:

  • The Retry-After header value is in seconds (not milliseconds) — many implementations parse this wrong
  • The per-user limit (600/60s) is separate from the per-tenant limit (10,000/10min) — you can hit one without the other
  • Batch requests ($batch) count as one request toward the limit, regardless of how many operations are inside
  • After a 429, subsequent requests to ANY OneNote endpoint are throttled — not just the endpoint that triggered it

Prerequisites

  • Azure app registration with delegated permissions: Notes.ReadWrite
  • App-only auth deprecated March 31, 2025 — use delegated auth only
  • Python: pip install msgraph-sdk azure-identity
  • Node/TypeScript: npm install @microsoft/microsoft-graph-client @azure/identity @azure/msal-node
  • Optional: npm install p-queue for production queue management

Instructions

Step 1 — Understand the Rate Limit Structure

| Limit | Scope | Window | Threshold |
|-------|-------|--------|-----------|
| Per-user | Single user's delegated token | 60 seconds (rolling) | 600 requests |
| Per-tenant | All users + all apps in the tenant | 10 minutes (rolling) | 10,000 requests |

When either limit is hit:

  • Response status: 429 Too Many Requests
  • Response header: Retry-After: <seconds> (integer, not milliseconds)
  • All subsequent OneNote requests for that scope are blocked until the window resets
  • Non-OneNote Graph endpoints (Outlook, OneDrive) are not affected
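Because the Retry-After value is in seconds, a small parsing helper avoids the seconds-versus-milliseconds mistake called out above. This is an illustrative sketch; `retryAfterToMs` and its 30-second default are assumptions, not part of any SDK:

```typescript
// Convert a Retry-After header value (integer seconds) to milliseconds.
// Falls back to a default when the header is missing or malformed.
function retryAfterToMs(header: string | undefined, defaultSeconds: number = 30): number {
  const seconds = Number.parseInt(header ?? "", 10);
  return (Number.isNaN(seconds) || seconds < 0 ? defaultSeconds : seconds) * 1000;
}
```

Centralizing the conversion in one place means every retry path in the codebase waits the same, correct amount of time.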

Step 2 — Token Bucket Rate Limiter (TypeScript)

A token bucket preemptively throttles requests to stay below the limit, largely avoiding 429s before they are ever triggered:

class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly maxTokens: number;
  private readonly refillRate: number; // tokens per millisecond

  constructor(maxTokens: number, refillWindowMs: number) {
    this.maxTokens = maxTokens;
    this.tokens = maxTokens;
    this.lastRefill = Date.now();
    this.refillRate = maxTokens / refillWindowMs;
  }

  private refill(): void {
    const now = Date.now();
    const elapsed = now - this.lastRefill;
    this.tokens = Math.min(this.maxTokens, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;
  }

  async acquire(): Promise<void> {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return;
    }
    // Wait until a token is available, then refill before consuming it;
    // subtracting without refilling would drive the count negative
    const waitMs = Math.ceil((1 - this.tokens) / this.refillRate);
    await new Promise((resolve) => setTimeout(resolve, waitMs));
    this.refill();
    this.tokens -= 1;
  }

  get available(): number {
    this.refill();
    return Math.floor(this.tokens);
  }
}

// Per-user bucket: 600 requests per 60 seconds
const userBucket = new TokenBucket(600, 60_000);

// Use with a safety margin (80% of limit)
const safeUserBucket = new TokenBucket(480, 60_000);

Step 3 — Queue-Based Request Throttling

Wrap all OneNote API calls through a throttled queue that respects both the token bucket and Retry-After headers:

import { Client } from "@microsoft/microsoft-graph-client";

class ThrottledOneNoteClient {
  private bucket: TokenBucket;
  private queue: Array<{
    resolve: (value: any) => void;
    reject: (error: any) => void;
    fn: () => Promise<any>;
  }> = [];
  private processing = false;
  private retryAfterUntil: number = 0; // Timestamp when retry-after expires

  constructor(
    private client: Client,
    maxRequestsPerMinute: number = 480 // 80% safety margin
  ) {
    this.bucket = new TokenBucket(maxRequestsPerMinute, 60_000);
  }

  async request<T>(fn: (client: Client) => Promise<T>): Promise<T> {
    return new Promise((resolve, reject) => {
      this.queue.push({ resolve, reject, fn: () => fn(this.client) });
      this.processQueue();
    });
  }

  private async processQueue(): Promise<void> {
    if (this.processing) return;
    this.processing = true;

    while (this.queue.length > 0) {
      // Respect Retry-After if we've been throttled
      const now = Date.now();
      if (this.retryAfterUntil > now) {
        const waitMs = this.retryAfterUntil - now;
        console.warn(`Rate limited — waiting ${Math.ceil(waitMs / 1000)}s`);
        await new Promise((r) => setTimeout(r, waitMs));
      }

      await this.bucket.acquire();
      const item = this.queue.shift()!;

      try {
        const result = await item.fn();
        item.resolve(result);
      } catch (err: any) {
        if (err.statusCode === 429) {
          const retryAfter = parseInt(err.headers?.["retry-after"] ?? "30", 10);
          this.retryAfterUntil = Date.now() + retryAfter * 1000;
          // Re-queue the failed request
          this.queue.unshift(item);
          console.warn(`429 received — Retry-After: ${retryAfter}s`);
        } else {
          item.reject(err);
        }
      }
    }

    this.processing = false;
  }
}

// Usage
const throttled = new ThrottledOneNoteClient(client);
const notebooks = await throttled.request((c) =>
  c.api("/me/onenote/notebooks").get()
);

Step 4 — Per-User Tracking for Multi-User Apps

Multi-user apps must track rate limits per user, not globally:

class MultiUserRateLimiter {
  private userBuckets: Map<string, TokenBucket> = new Map();
  private tenantBucket: TokenBucket;

  constructor() {
    // Tenant-wide: 10,000 per 10 minutes
    this.tenantBucket = new TokenBucket(8_000, 600_000); // 80% safety margin
  }

  async acquire(userId: string): Promise<void> {
    // Get or create per-user bucket
    if (!this.userBuckets.has(userId)) {
      this.userBuckets.set(userId, new TokenBucket(480, 60_000));
    }
    const userBucket = this.userBuckets.get(userId)!;

    // Must acquire from BOTH buckets
    await userBucket.acquire();
    await this.tenantBucket.acquire();
  }

  getStatus(userId: string): { userRemaining: number; tenantRemaining: number } {
    const userBucket = this.userBuckets.get(userId);
    return {
      userRemaining: userBucket?.available ?? 480,
      tenantRemaining: this.tenantBucket.available,
    };
  }
}

Step 5 — Exponential Backoff with Jitter

For 429 responses without a Retry-After header (rare but possible), use exponential backoff with jitter:

async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries: number = 5
): Promise<T> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      if (err.statusCode !== 429 || attempt === maxRetries) throw err;

      const retryAfter = err.headers?.["retry-after"];
      let delayMs: number;

      if (retryAfter) {
        // Prefer server-specified delay (in seconds)
        delayMs = parseInt(retryAfter, 10) * 1000;
      } else {
        // Exponential backoff: 1s, 2s, 4s, 8s, 16s + jitter
        const base = Math.pow(2, attempt) * 1000;
        const jitter = Math.random() * 1000;
        delayMs = base + jitter;
      }

      console.warn(`Retry ${attempt + 1}/${maxRetries} in ${Math.ceil(delayMs / 1000)}s`);
      await new Promise((r) => setTimeout(r, delayMs));
    }
  }
  throw new Error("Unreachable");
}

// Usage
const pages = await withBackoff(() =>
  client.api("/me/onenote/pages").top(50).get()
);
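For reference, the base delays produced by the formula above can be made inspectable as a standalone sketch (the 0–1s random jitter is added at call time and is not modeled here):

```typescript
// Base backoff delay: doubles per attempt.
// attempt 0 -> 1000ms, 1 -> 2000ms, 2 -> 4000ms, 3 -> 8000ms, 4 -> 16000ms
function backoffBaseMs(attempt: number): number {
  return Math.pow(2, attempt) * 1000;
}
```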

Step 6 — Batch Requests to Reduce Call Count

The Graph $batch endpoint lets you send up to 20 operations in a single HTTP request. The entire batch counts as one request toward your rate limit:

async function batchGetPages(client: Client, pageIds: string[]): Promise<any[]> {
  const batchSize = 20; // Graph batch limit
  const allResults: any[] = [];

  for (let i = 0; i < pageIds.length; i += batchSize) {
    const chunk = pageIds.slice(i, i + batchSize);
    const batchBody = {
      requests: chunk.map((id, idx) => ({
        id: String(idx + 1),
        method: "GET",
        url: `/me/onenote/pages/${id}?$select=id,title,lastModifiedDateTime`,
      })),
    };

    const batchResponse = await client.api("/$batch").post(batchBody);
    for (const response of batchResponse.responses) {
      if (response.status === 200) {
        allResults.push(response.body);
      } else {
        console.warn(`Batch item ${response.id} failed: ${response.status}`);
      }
    }
  }
  return allResults;
}

// 100 pages = 5 HTTP requests instead of 100
const pages = await batchGetPages(client, hundredPageIds);
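One caveat worth handling: individual responses inside a `$batch` payload carry their own status codes, and a sub-response can itself be 429 with a per-item Retry-After. The shapes below are assumptions sketched from the batch response format (header casing may differ in real payloads), not an SDK type:

```typescript
// Assumed shape of one item in a $batch response payload.
interface BatchItemResponse {
  id: string;
  status: number;
  headers?: Record<string, string>;
  body?: unknown;
}

// Split a batch response into successes and throttled item ids, and compute
// the longest per-item Retry-After (seconds) before resending the failures.
function triageBatch(responses: BatchItemResponse[]): {
  ok: unknown[];
  retryIds: string[];
  waitSeconds: number;
} {
  const ok: unknown[] = [];
  const retryIds: string[] = [];
  let waitSeconds = 0;
  for (const r of responses) {
    if (r.status === 429) {
      retryIds.push(r.id);
      const s = Number.parseInt(r.headers?.["Retry-After"] ?? "30", 10);
      waitSeconds = Math.max(waitSeconds, Number.isNaN(s) ? 30 : s);
    } else if (r.status >= 200 && r.status < 300) {
      ok.push(r.body);
    }
  }
  return { ok, retryIds, waitSeconds };
}
```

After waiting `waitSeconds`, rebuild a batch containing only the `retryIds` subset rather than resending the whole chunk.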

Step 7 — Python Rate Limiter with asyncio

import asyncio
import time

class RateLimiter:
    """Token bucket rate limiter for OneNote Graph API."""

    def __init__(self, max_requests: int = 480, window_seconds: int = 60):
        self.max_tokens = max_requests
        self.tokens = float(max_requests)
        self.refill_rate = max_requests / window_seconds
        self.last_refill = time.monotonic()
        self._lock = asyncio.Lock()

    async def acquire(self):
        async with self._lock:
            now = time.monotonic()
            elapsed = now - self.last_refill
            self.tokens = min(self.max_tokens, self.tokens + elapsed * self.refill_rate)
            self.last_refill = now

            if self.tokens < 1:
                wait = (1 - self.tokens) / self.refill_rate
                await asyncio.sleep(wait)
                self.tokens = 0
            else:
                self.tokens -= 1

# Usage — combines token bucket with Retry-After handling
limiter = RateLimiter(max_requests=480, window_seconds=60)

async def safe_get_pages(client, section_id: str, max_retries: int = 3):
    for attempt in range(max_retries):
        await limiter.acquire()
        try:
            return await client.me.onenote.sections.by_onenote_section_id(
                section_id
            ).pages.get()
        except Exception as e:
            # Handle 429 with Retry-After header (guard attribute access:
            # SDK exceptions may not carry a response object)
            response = getattr(e, "response", None)
            if response is not None and getattr(response, "status_code", None) == 429 and attempt < max_retries - 1:
                retry_after = int(response.headers.get("Retry-After", "30"))
                await asyncio.sleep(retry_after)
            else:
                raise
    raise RuntimeError("Max retries exceeded for OneNote API call")

Step 8 — Monitor and Adjust Preemptively

Track your 429 rate over time and adjust thresholds:

class RateLimitMonitor {
  private requestCount = 0;
  private throttleCount = 0;
  private windowStart = Date.now();

  record(wasThrottled: boolean): void {
    this.requestCount++;
    if (wasThrottled) this.throttleCount++;
  }

  getMetrics(): { total: number; throttled: number; throttleRate: number; windowMinutes: number } {
    const windowMinutes = (Date.now() - this.windowStart) / 60_000;
    return {
      total: this.requestCount,
      throttled: this.throttleCount,
      throttleRate: this.throttleCount / Math.max(this.requestCount, 1),
      windowMinutes: Math.round(windowMinutes * 10) / 10,
    };
  }

  // Alert if throttle rate exceeds threshold
  shouldReduceRate(): boolean {
    return this.getMetrics().throttleRate > 0.05; // >5% throttled = slow down
  }
}

Output

Rate limit handling produces:

  • Preemptive throttling via token bucket — requests are delayed before sending, not after 429
  • Retry-After compliance — exact server-specified delays honored
  • Batch consolidation — 20 operations per HTTP request for bulk workloads
  • Monitoring metrics — request count, throttle count, throttle rate percentage

Error Handling

| Status | Cause | Fix |
|--------|-------|-----|
| 429 (with Retry-After) | Per-user or per-tenant limit exceeded | Wait exactly Retry-After seconds; do not retry sooner |
| 429 (no Retry-After) | Rare edge case, limit exceeded | Exponential backoff with jitter starting at 1 second |
| 503 | Service throttling under load | Treat like 429 — backoff and retry |
| 500 | Internal error during throttled state | Do not count as rate limit; retry with normal backoff |
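The decision table above can be collapsed into a small classifier. This is an illustrative sketch, not a library API:

```typescript
// Map a Graph status code to a retry decision, following the error table.
type RetryAction = "honor-retry-after" | "backoff" | "fail";

function classify(status: number, hasRetryAfter: boolean): RetryAction {
  if (status === 429 || status === 503) {
    // Throttling: honor the server delay when given, otherwise back off
    return hasRetryAfter ? "honor-retry-after" : "backoff";
  }
  if (status >= 500) return "backoff"; // 5xx: retry with normal backoff
  return "fail"; // 4xx client errors: retrying will not help
}
```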

Examples

Calculate request budget for polling + CRUD:

const BUDGET_PER_MINUTE = 600;
const SAFETY_MARGIN = 0.8; // Use 80% of limit
const safeBudget = BUDGET_PER_MINUTE * SAFETY_MARGIN; // 480

// Allocate budget
const pollingSections = 20;
const pollIntervalSec = 30;
const pollRequestsPerMin = pollingSections * (60 / pollIntervalSec); // 40/min

const remainingForCrud = safeBudget - pollRequestsPerMin; // 440/min for user operations
console.log(`Polling: ${pollRequestsPerMin}/min | CRUD: ${remainingForCrud}/min`);
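The same budget arithmetic can be inverted to find the slowest safe poll interval for a given number of sections (`minPollIntervalSec` is a hypothetical helper, not part of this skill's API):

```typescript
// Given N polled sections and a per-minute request budget reserved for
// polling, return the minimum poll interval (seconds) that stays in budget.
// Derivation: sections * (60 / intervalSec) <= budgetPerMin
function minPollIntervalSec(sections: number, budgetPerMin: number): number {
  return Math.ceil((sections * 60) / budgetPerMin);
}
```

With the numbers above, 20 sections and a 40-requests-per-minute polling budget give back the 30-second interval the example assumes.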

Production health check:

const monitor = new RateLimitMonitor();
// After each API call:
monitor.record(/* wasThrottled */ false);

// Periodic check
setInterval(() => {
  const metrics = monitor.getMetrics();
  if (monitor.shouldReduceRate()) {
    console.warn(`High throttle rate: ${(metrics.throttleRate * 100).toFixed(1)}%`);
    // Dynamically increase poll interval or reduce batch concurrency
  }
}, 60_000);

Next Steps

  • See onenote-webhooks-events for polling patterns that consume rate budget
  • See onenote-performance-tuning for batch operations and $select to reduce payload size
  • See onenote-core-workflow-a for CRUD operations that benefit from throttled clients