OneNote Cost Tuning
Overview
OneNote Graph API calls carry no per-request charge; they are included in every Microsoft 365 E3/E5/Business license. Rate limits nonetheless create an effective ceiling that behaves like a cost constraint: 600 requests per user per 60 seconds, and 10,000 requests per app per 10 minutes at the tenant level. Exceeding either limit returns 429 errors with Retry-After headers, which degrades the user experience in the same way a budget overrun degrades a service. This skill covers the practical optimization strategies that keep you well under those ceilings: metadata caching, JSON batch requests, delta sync, payload minimization with $select/$expand, and content deduplication. A naive integration that polls every user's notebooks once a minute reaches the tenant limit as soon as it serves 1,000 users (1,000 users × 10 polls = 10,000 requests per 10 minutes), and exhausts it sooner with more users or more calls per poll. An optimized integration handles thousands of users within the same budget.
Prerequisites
- Microsoft 365 license (E3/E5/Business) — OneNote API is included, no additional billing
- Azure AD app registration with delegated permissions
- Python: pip install msgraph-sdk azure-identity, or Node: npm install @microsoft/microsoft-graph-client @azure/identity
- Understanding of HTTP caching headers (ETag, If-None-Match)
Instructions
Licensing Model and True Cost
| Component | Cost |
|-----------|------|
| OneNote API calls | Included in M365 license (no per-call charge) |
| Rate limit: per user | 600 requests / 60 seconds |
| Rate limit: per tenant | 10,000 requests / 10 minutes |
| Retry-After penalty | Blocked for N seconds (header value) |
| Graph metered billing | Optional; extends limits for high-volume apps |
The real cost is operational: every 429 response adds latency, retry logic consumes compute, and throttled users see failures. Optimization is about reliability, not billing.
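The retry cost described above can be made concrete with a small helper. This is a minimal sketch, not the Graph SDK's own retry logic: `retry_delay_seconds` is a hypothetical name, and the exponential fallback (base 1 second, cap 60 seconds) is an assumption for cases where the Retry-After header is missing or is not a plain number of seconds.

```python
def retry_delay_seconds(headers: dict, attempt: int,
                        base: float = 1.0, cap: float = 60.0) -> float:
    """Return how long to wait before retrying a throttled (429) request.

    Prefers the server-provided Retry-After value in seconds; falls back to
    capped exponential backoff when the header is absent or not numeric.
    """
    # HTTP header names are case-insensitive, so normalize before lookup
    normalized = {k.lower(): v for k, v in headers.items()}
    raw = normalized.get("retry-after")
    if raw is not None:
        try:
            return max(float(raw), 0.0)
        except ValueError:
            pass  # e.g. an HTTP-date value; use the fallback below
    # Assumed fallback policy: 1s, 2s, 4s, ... capped at `cap`
    return min(base * (2 ** attempt), cap)
```

Honoring the server's value first matters because retrying earlier than Retry-After allows only produces further 429 responses and spends more of the same budget.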
Strategy 1: Cache Metadata Aggressively
Notebook and section metadata changes rarely (names, IDs, hierarchy). Cache it locally and refresh on a schedule, not per-request:
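interface CachedMetadata {
  notebooks: any[];
  sections: Map<string, any[]>; // notebookId -> sections
  sectionsFetchedAt: Map<string, number>; // notebookId -> fetch time (ms)
  fetchedAt: number; // fetch time of the notebook list (ms)
  ttlMs: number;
}

class MetadataCache {
  private cache: CachedMetadata = {
    notebooks: [],
    sections: new Map(),
    sectionsFetchedAt: new Map(),
    fetchedAt: 0,
    ttlMs: 15 * 60 * 1000, // 15 minutes: notebooks/sections rarely change
  };

  // True when a timestamp is older than the TTL (0 means "never fetched")
  private isExpired(fetchedAt: number): boolean {
    return Date.now() - fetchedAt > this.cache.ttlMs;
  }

  isStale(): boolean {
    return this.isExpired(this.cache.fetchedAt);
  }

  async getNotebooks(client: any): Promise<any[]> {
    if (!this.isStale() && this.cache.notebooks.length > 0) {
      return this.cache.notebooks; // 0 API calls
    }
    const response = await client.api("/me/onenote/notebooks")
      .select("id,displayName,lastModifiedDateTime")
      .get();
    this.cache.notebooks = response.value;
    this.cache.fetchedAt = Date.now();
    return this.cache.notebooks; // 1 API call
  }

  async getSections(client: any, notebookId: string): Promise<any[]> {
    // Each notebook's sections carry their own timestamp, so refreshing the
    // notebook list does not make old section data look fresh
    const fetchedAt = this.cache.sectionsFetchedAt.get(notebookId) ?? 0;
    if (!this.isExpired(fetchedAt) && this.cache.sections.has(notebookId)) {
      return this.cache.sections.get(notebookId)!;
    }
    const response = await client
      .api(`/me/onenote/notebooks/${notebookId}/sections`)
      .select("id,displayName")
      .get();
    this.cache.sections.set(notebookId, response.value);
    this.cache.sectionsFetchedAt.set(notebookId, Date.now());
    return response.value;
  }

  invalidate(): void {
    this.cache.fetchedAt = 0;
    this.cache.sectionsFetchedAt.clear();
  }
}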
Savings: A typical app listing notebooks 100 times/hour drops from 100 calls to 4 calls (one refresh per 15-minute TTL window).
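The savings figure follows from simple arithmetic, sketched below. `calls_per_hour_with_cache` is a hypothetical helper, and it assumes every read within a TTL window is served from the cache, so at most one refresh happens per window.

```python
import math

def calls_per_hour_with_cache(requests_per_hour: int, ttl_minutes: float) -> int:
    """Upper bound on API calls per hour when reads go through a TTL cache.

    At most one refresh happens per TTL window, and never more refreshes
    than there are requests.
    """
    refresh_windows = math.ceil(60 / ttl_minutes)
    return min(requests_per_hour, refresh_windows)

before = 100  # uncached notebook listings per hour
after = calls_per_hour_with_cache(before, ttl_minutes=15)
saved_fraction = 1 - after / before  # share of calls avoided by caching
```

Lengthening the TTL lowers the call count further at the price of staler metadata, so the TTL is the main tuning knob for this strategy.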
Strategy 2: JSON Batch Requests
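The $batch endpoint combines up to 20 requests into a single HTTP call, cutting HTTP round trips by up to 20x. Graph still evaluates each request inside a batch against the throttling limits individually, so treat batching mainly as a saving in latency and connection overhead: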
import httpx

async def batch_get_sections(access_token: str, notebook_ids: list[str]) -> dict:
    """Fetch sections for multiple notebooks in a single HTTP call."""
    requests = []
    # Max 20 per batch: IDs beyond the first 20 are ignored, so chunk larger
    # lists before calling this function
    for i, nb_id in enumerate(notebook_ids[:20]):
        requests.append({
            "id": str(i),
            "method": "GET",
            "url": f"/me/onenote/notebooks/{nb_id}/sections?$select=id,displayName",
        })

    async with httpx.AsyncClient() as http:
        response = await http.post(
            "https://graph.microsoft.com/v1.0/$batch",
            headers={
                "Authorization": f"Bearer {access_token}",
                "Content-Type": "application/json",
            },
            json={"requests": requests},
        )
        response.raise_for_status()  # Fail fast if the batch call itself is rejected
        batch_result = response.json()

    # Map responses back to notebook IDs
    result = {}
    for resp in batch_result["responses"]:
        idx = int(resp["id"])
        nb_id = notebook_ids[idx]
        if resp["status"] == 200:
            result[nb_id] = resp["body"]["value"]
        else:
            result[nb_id] = []  # Handle individual failures gracefully
    return result
Savings: Fetching sections for 20 notebooks: 20 calls becomes 1 call. For 100 notebooks, 100 calls becomes 5 calls.
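The 100-notebook case requires splitting the ID list into groups of at most 20 before building each batch body. A minimal sketch of that chunking step follows; `chunked` and `build_batch_payloads` are hypothetical helper names, and the request URL mirrors the one used in the example above.

```python
from typing import Iterator

MAX_BATCH_SIZE = 20  # per-batch request cap stated in the text

def chunked(ids: list, size: int = MAX_BATCH_SIZE) -> Iterator[list]:
    """Yield consecutive slices of `ids`, each with at most `size` items."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

def build_batch_payloads(notebook_ids: list) -> list:
    """Build one $batch request body per chunk of up to 20 notebook IDs."""
    payloads = []
    for chunk in chunked(notebook_ids):
        payloads.append({
            "requests": [
                {
                    "id": str(i),  # index within this chunk, used to map responses back
                    "method": "GET",
                    "url": f"/me/onenote/notebooks/{nb_id}/sections?$select=id,displayName",
                }
                for i, nb_id in enumerate(chunk)
            ]
        })
    return payloads

payloads = build_batch_payloads([f"nb-{n}" for n in range(100)])
```

Because request IDs restart at 0 in every chunk, keep each chunk alongside its payload when mapping responses back to notebook IDs.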
Strategy 3: Delta Sync Instead of Full Sync
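Delta queries return only the changes since your last sync, replacing full-list operations that grow linearly with data size. Delta support varies by Graph resource, so confirm that the endpoint used below is available for OneNote pages in the API version you target: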
class DeltaSyncer {
  private deltaLinks: Map<string, string> = new Map();

  async syncPages(client: any, sectionId: string): Promise<{
    added: any[];
    modified: any[];
    deleted: string[];
  }> {
    const deltaLink = this.deltaLinks.get(sectionId);
    let url: string | undefined =
      deltaLink || `/me/onenote/sections/${sectionId}/pages/delta`;
    const result = { added: [] as any[], modified: [] as any[], deleted: [] as string[] };

    // Follow @odata.nextLink until the last page, which carries the delta link
    while (url) {
      const response: any = await client.api(url).get();
      for (const page of response.value || []) {
        if (page["@removed"]) {
          result.deleted.push(page.id);
        } else if (deltaLink) {
          // On an incremental sync, new and changed pages both land here
          result.modified.push(page);
        } else {
          result.added.push(page);
        }
      }
      // Store the delta link for next sync
      if (response["@odata.deltaLink"]) {
        this.deltaLinks.set(sectionId, response["@odata.deltaLink"]);
      }
      url = response["@odata.nextLink"];
    }
    return result;
  }
}
Savings: For a section with 500 pages where 3 pages change per hour, a full sync returns 500+ items per response while a delta sync returns 3. The call count may be the same (1), but the payload shrinks by about 99%.
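Once a delta response arrives, its handful of changes has to be merged into whatever local copy the app keeps. The sketch below shows one way to do that, assuming the local store is a dictionary keyed by page ID and that removed pages carry an "@removed" marker as in the example above; `apply_delta` is a hypothetical name.

```python
def apply_delta(local_pages: dict, changes: list) -> dict:
    """Merge one delta response into a local page index keyed by page ID."""
    updated = dict(local_pages)  # copy so the caller's index is not mutated
    for page in changes:
        if page.get("@removed"):
            updated.pop(page["id"], None)  # tolerate deletes of unknown pages
        else:
            updated[page["id"]] = page  # covers both new and modified pages
    return updated

index = {"p1": {"id": "p1", "title": "Old"}, "p2": {"id": "p2", "title": "Keep"}}
delta = [
    {"id": "p1", "title": "New"},
    {"id": "p3", "title": "Added"},
    {"id": "p2", "@removed": {"reason": "deleted"}},
]
merged = apply_delta(index, delta)
```

Treating new and modified pages identically (an upsert) avoids having to tell them apart, which the delta response alone does not always make possible.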
Strategy 4: Payload Minimization with $select and $expand
Every field you do not request is bandwidth you save and parsing you skip:
# BAD: Returns all fields including content URLs, permissions, links (2-5 KB per page)
GET /me/onenote/sections/{id}/pages
# GOOD: Returns only what you need (200-300 bytes per page)
GET /me/onenote/sections/{id}/pages?$select=id,title,createdDateTime,lastModifiedDateTime&$top=50&$orderby=lastModifiedDateTime desc
Use $expand to eliminate follow-up calls:
# Without $expand: 1 call for notebooks + N calls for sections = N+1 calls
GET /me/onenote/notebooks
GET /me/onenote/notebooks/{id1}/sections
GET /me/onenote/notebooks/{id2}/sections
# With $expand: 1 call total
GET /me/onenote/notebooks?$expand=sections($select=id,displayName)&$select=id,displayName
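To keep minimized queries consistent across a codebase, the query string can be assembled in one place. This is a small sketch under stated assumptions: `build_query` is a hypothetical helper, it covers only $select, $top, and $orderby, and it encodes spaces in $orderby as %20 while leaving field names untouched.

```python
from typing import Optional

def build_query(select: Optional[list] = None, top: Optional[int] = None,
                orderby: Optional[str] = None) -> str:
    """Assemble an OData query string from the options that are set."""
    params = []
    if select:
        params.append("$select=" + ",".join(select))
    if top is not None:
        params.append(f"$top={top}")
    if orderby:
        # Spaces (as in "field desc") must be percent-encoded in a URL
        params.append("$orderby=" + orderby.replace(" ", "%20"))
    return "?" + "&".join(params) if params else ""

pages_url = "/me/onenote/sections/{id}/pages" + build_query(
    select=["id", "title", "createdDateTime", "lastModifiedDateTime"],
    top=50,
    orderby="lastModifiedDateTime desc",
)
```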
Strategy 5: Content Deduplication
Hash page content before writing to avoid duplicate POST calls:
import hashlib
import html

def content_hash(html_body: str) -> str:
    """Generate a stable hash of page content for dedup."""
    return hashlib.sha256(html_body.encode("utf-8")).hexdigest()

class DeduplicatedWriter:
    def __init__(self):
        self.written_hashes: dict[str, str] = {}  # hash -> page_id

    async def write_page(self, client, section_id: str, title: str, html_body: str):
        # Key on section and title too, so identical bodies in different
        # sections or under different titles are not treated as duplicates
        h = content_hash(f"{section_id}\n{title}\n{html_body}")
        if h in self.written_hashes:
            return self.written_hashes[h]  # Skip duplicate write
        full_html = (
            f"<html><head><title>{html.escape(title)}</title></head>"
            f"<body>{html_body}</body></html>"
        )
        # Assumes the SDK accepts raw HTML bytes via content=; verify this
        # against the msgraph-sdk version you use
        page = await client.me.onenote.sections.by_onenote_section_id(
            section_id
        ).pages.post(content=full_html.encode())
        self.written_hashes[h] = page.id
        return page.id
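The effect of hash-based deduplication can be checked without calling Graph at all. The sketch below replaces the client with a counting stub; `CountingPoster` and `write_once` are hypothetical names used only for illustration, and the stub's page IDs are made up.

```python
import asyncio
import hashlib

def content_hash(html_body: str) -> str:
    """Stable SHA-256 hash of the page body."""
    return hashlib.sha256(html_body.encode("utf-8")).hexdigest()

class CountingPoster:
    """Hypothetical stand-in for the Graph client; counts simulated POSTs."""
    def __init__(self) -> None:
        self.post_count = 0

    async def create_page(self, html_body: str) -> str:
        self.post_count += 1
        return f"page-{self.post_count}"

async def write_once(poster: CountingPoster, seen: dict, html_body: str) -> str:
    """Create a page only if this exact body has not been written before."""
    h = content_hash(html_body)
    if h not in seen:
        seen[h] = await poster.create_page(html_body)
    return seen[h]

async def demo():
    poster, seen = CountingPoster(), {}
    ids = []
    for body in ["<p>a</p>", "<p>a</p>", "<p>b</p>"]:
        ids.append(await write_once(poster, seen, body))
    return ids, poster.post_count

page_ids, posts = asyncio.run(demo())
```

Three write requests result in two simulated POSTs, because the repeated body resolves to the page ID recorded for its hash.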
Cost Modeling Template
Estimate your API call budget per user per day:
| Operation | Calls/occurrence | Frequency/user/day | Daily calls |
|-----------|:---:|:---:|:---:|
| List notebooks | 1 | 4 (cached 15min) | 4 |
| List sections | 1 (batched) | 4 | 4 |
| List pages (delta) | 1 | 24 (hourly) | 24 |
| Read page content | 1 | 10 | 10 |
| Create/update page | 1 | 5 | 5 |
| Total per user | | | 47 |
| 100 users / tenant | | | 4,700 |
| Tenant limit (10min) | | | 10,000 |
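At 100 users, the 4,700 daily calls amount to 47% of a single 10-minute tenant budget even in the worst case where they all land in the same window; spread evenly over a day, they average about 33 calls per 10 minutes. That leaves safe headroom. At 250+ users, the worst-case burst (11,750 calls) exceeds the 10-minute budget, so consider Graph metered billing or reduce polling frequency.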
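The template's arithmetic can be reproduced in a few lines, which makes it easy to plug in your own frequencies. This sketch uses the per-user figures from the table; the function names are hypothetical, and the "worst case" is an assumption that a full day's calls land in one 10-minute window.

```python
TENANT_LIMIT_PER_10_MIN = 10_000  # tenant ceiling quoted in this document

# Daily calls per user, taken from the cost modeling table
DAILY_CALLS_PER_USER = {
    "list_notebooks": 4,
    "list_sections": 4,
    "list_pages_delta": 24,
    "read_page_content": 10,
    "create_update_page": 5,
}

def daily_calls(users: int) -> int:
    """Total API calls per day for the given number of users."""
    return users * sum(DAILY_CALLS_PER_USER.values())

def worst_case_budget_fraction(users: int) -> float:
    """Share of one 10-minute budget if a whole day's calls hit one window."""
    return daily_calls(users) / TENANT_LIMIT_PER_10_MIN

def average_calls_per_window(users: int) -> float:
    """Calls per 10-minute window if the day's calls were spread evenly."""
    return daily_calls(users) / (24 * 6)  # 144 ten-minute windows per day
```

For example, `worst_case_budget_fraction(100)` evaluates the 100-user row of the table, and raising the user count shows where the worst-case burst crosses the tenant ceiling.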
Monitoring Dashboard Metrics
Track these metrics to detect optimization drift:
from collections import defaultdict

class ApiMetrics:
    def __init__(self):
        self.calls_per_user: dict[str, int] = defaultdict(int)
        self.throttle_count = 0
        self.total_latency_ms = 0.0
        self.call_count = 0

    def record_call(self, user_id: str, latency_ms: float, status: int):
        self.calls_per_user[user_id] += 1
        self.total_latency_ms += latency_ms
        self.call_count += 1
        if status == 429:
            self.throttle_count += 1

    def report(self) -> dict:
        return {
            "total_calls": self.call_count,
            # max(..., 1) avoids division by zero before any call is recorded
            "avg_latency_ms": self.total_latency_ms / max(self.call_count, 1),
            "throttle_rate": self.throttle_count / max(self.call_count, 1),
            "top_users": sorted(
                self.calls_per_user.items(), key=lambda x: x[1], reverse=True
            )[:10],
        }
Alert thresholds:
- Throttle rate > 1%: investigate hotspot user or batch consolidation
- Avg latency > 2000ms: Graph service degradation or oversized payloads
- Single user > 300 calls/hour: likely missing cache or polling too frequently
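The thresholds above can be checked mechanically against a metrics report. The following sketch assumes a report dictionary with the keys "throttle_rate", "avg_latency_ms", and "top_users" (a list of user ID and call count pairs); `evaluate_alerts` and the alert label strings are hypothetical.

```python
def evaluate_alerts(report: dict, hours_observed: float = 1.0) -> list:
    """Return a label for every alert threshold the report exceeds."""
    alerts = []
    if report["throttle_rate"] > 0.01:  # more than 1% of calls throttled
        alerts.append("throttle_rate")
    if report["avg_latency_ms"] > 2000:
        alerts.append("avg_latency")
    for user_id, calls in report["top_users"]:
        # Normalize to calls per hour over the observation period
        if calls / hours_observed > 300:
            alerts.append(f"user_calls:{user_id}")
    return alerts

sample_report = {
    "total_calls": 1000,
    "avg_latency_ms": 850.0,
    "throttle_rate": 0.02,
    "top_users": [("alice", 420), ("bob", 120)],
}
triggered = evaluate_alerts(sample_report)
```

The `hours_observed` parameter matters because the per-user threshold is a rate: a report covering four hours should divide each user's count by four before comparing it to 300.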
Output
After applying this skill, your OneNote integration will have: metadata caching that cuts repetitive calls by roughly 95%, batch requests combining up to 20 calls into 1 HTTP request, delta sync for incremental page updates, minimized payloads via $select/$expand, content deduplication preventing duplicate writes, and a monitoring framework to detect optimization regression.
Error Handling
| Error | Cause | Fix |
|-------|-------|-----|
| 429 Too Many Requests | Exceeded 600/min (user) or 10K/10min (tenant) | Read Retry-After header; back off for that many seconds; check for missing cache |
| 507 Insufficient Storage | Per-section page limit exceeded | Archive old pages or split across sections |
| Batch response with mixed status codes | Some requests in batch succeeded, others failed | Process each batch response individually; retry only failed items |
| Delta link expired | Too long between delta syncs | Fall back to full sync, then resume delta |
| 504 Gateway Timeout on large $expand | Too many nested resources | Reduce $top count or remove $expand and make separate calls |
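For the mixed-status batch case in the table, the fix amounts to separating successful responses from failed ones and retrying only the latter. A minimal sketch follows, assuming the batch response has a "responses" list whose items carry "id", "status", and an optional "body"; `split_batch_responses` is a hypothetical name.

```python
def split_batch_responses(batch_result: dict):
    """Separate a $batch result into successful bodies and failed request IDs."""
    succeeded = {}
    failed_ids = []
    for resp in batch_result.get("responses", []):
        if 200 <= resp["status"] < 300:
            succeeded[resp["id"]] = resp.get("body", {})
        else:
            failed_ids.append(resp["id"])  # candidates for a follow-up batch
    return succeeded, failed_ids

example = {
    "responses": [
        {"id": "0", "status": 200, "body": {"value": [{"id": "s1"}]}},
        {"id": "1", "status": 429, "body": {"error": {"code": "TooManyRequests"}}},
        {"id": "2", "status": 404},
    ]
}
ok, to_retry = split_batch_responses(example)
```

In practice, not every failure is worth retrying: a throttled request usually is, while a 404 usually is not, so filter `to_retry` by status before resubmitting.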
Examples
Quick audit of current API usage:
# Check rate limit headers in any Graph response
curl -sI -H "Authorization: Bearer $TOKEN" \
"https://graph.microsoft.com/v1.0/me/onenote/notebooks" | \
grep -i "ratelimit\|retry-after\|x-ms"
Batch request via curl:
curl -X POST "https://graph.microsoft.com/v1.0/\$batch" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "requests": [
      {"id": "1", "method": "GET", "url": "/me/onenote/notebooks?$select=id,displayName"},
      {"id": "2", "method": "GET", "url": "/me/onenote/sections?$select=id,displayName&$top=50"}
    ]
  }'
Next Steps
- Apply onenote-rate-limits for detailed Retry-After handling and queue-based throttling
- Use onenote-performance-tuning for response time optimization
- See onenote-prod-checklist for full production readiness review