PyTiDB (pytidb)

Use this skill to connect to TiDB from Python via pytidb, define tables, and build search / AI features on top.

You want a Python ORM-like experience on TiDB via pytidb (built on SQLAlchemy).
You want vector search / full-text search / hybrid search on TiDB with high-level APIs.
You want runnable starter templates (scripts + small examples) you can adapt.

Need to provision a TiDB Cloud cluster first? Use tidbx (TiDB X) for cluster lifecycle guidance.

Never hardcode credentials; use env vars (.env) and document required variables.
Prefer python -m venv .venv and pinned deps for reproducibility.
When editing requirements.txt, do not invent pytidb versions. Use an unpinned pytidb by default; pin a version only when the user explicitly requests one and that version has been verified to exist.
Keep examples minimal and runnable; avoid framework-specific assumptions unless the user asks.
Use parameterized SQL for any dynamic value (SQL injection safety).
For interactive environments (notebooks, REPLs), avoid “table already defined” errors on re-runs by using the extend_existing, open_table, or if rows()==0 patterns.
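
The parameterized-SQL rule above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a cluster; the items table and the find_by_name helper are hypothetical names chosen for illustration. The same principle applies against TiDB: pass dynamic values as bound parameters to whatever query API you use, rather than formatting them into the SQL string.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a small hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO items (name) VALUES (?)",
        [("apple",), ("banana",)],
    )
    return conn


def find_by_name(conn: sqlite3.Connection, name: str) -> list:
    """Look up rows by name; the value is bound, never interpolated."""
    # Named placeholder: the driver sends `name` as data, not as SQL text.
    cur = conn.execute(
        "SELECT id, name FROM items WHERE name = :name",
        {"name": name},
    )
    return cur.fetchall()


conn = make_demo_db()
normal = find_by_name(conn, "apple")
# A classic injection payload is treated as an ordinary string value,
# so it simply matches no rows.
injected = find_by_name(conn, "' OR '1'='1")
print(normal, injected)
```

If the same lookup were built with string formatting, the second call would return every row in the table; with a bound parameter it returns none.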

Each guide is a self-contained walkthrough with a checklist and phases:

guides/quickstart.md — one-file “connect → create table → insert → vector search”
guides/search.md — vector / full-text / hybrid: when to use which, plus gotchas
guides/demos.md — examples playbook (vector/hybrid/image)
guides/agent-apps.md — agent-ish examples (RAG / memory / text2sql)
guides/troubleshooting.md — connection, TLS, embedding, and index/search issues
guides/custom-embedding.md — implement a custom embedding function (example: BGE-M3)

I’ll infer your intent (CRUD vs search vs “agent app”), then point you to the smallest guide and template set that gets you running.

Each template is a complete file you can copy into your project. Choose the smallest one that matches your goal.

templates/quickstart.py — minimal end-to-end: connect → create table → insert → vector search
templates/crud.py — basic table modeling + CRUD lifecycle (create/truncate/insert/query/update/delete)
templates/auto_embedding.py — auto embedding with pluggable providers (env-driven)
templates/vector_search.py — vector search example (optional metadata filter + threshold)
templates/hybrid_search.py — hybrid search example (FullTextField + vector field) with fused scoring
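
To make the ideas behind the two search templates concrete, here is a pure-Python sketch of what "metadata filter + threshold" and "fused scoring" mean. It is not pytidb's implementation: in practice TiDB computes distances and rankings inside the database. The function names, the row layout, and the choice of reciprocal rank fusion (RRF) are assumptions made for illustration; RRF is one common fusion method, and the template may use a different one.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def vector_search(rows, query_vec, k=3, max_distance=None, where=None):
    """Brute-force nearest neighbours with an optional filter and threshold."""
    hits = []
    for row in rows:
        # Metadata filter: skip rows whose fields do not match exactly.
        if where and any(row.get(key) != val for key, val in where.items()):
            continue
        dist = cosine_distance(row["vec"], query_vec)
        # Distance threshold: drop matches that are too far away.
        if max_distance is not None and dist > max_distance:
            continue
        hits.append((row["id"], dist))
    hits.sort(key=lambda hit: hit[1])
    return hits[:k]


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over every ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


rows = [
    {"id": 1, "vec": [1.0, 0.0], "lang": "en"},
    {"id": 2, "vec": [0.6, 0.8], "lang": "en"},
    {"id": 3, "vec": [0.0, 1.0], "lang": "en"},
    {"id": 4, "vec": [1.0, 0.0], "lang": "de"},
]
hits = vector_search(rows, [1.0, 0.0], max_distance=0.5, where={"lang": "en"})
hit_ids = [doc_id for doc_id, _ in hits]

# Fuse a vector ranking with a (hypothetical) full-text ranking.
fused = rrf_fuse([[1, 2, 3], [2, 4, 1]])
print(hit_ids, fused)
```

Row 4 is excluded by the filter and row 3 by the threshold. In the fused list, document 2 ranks first because it places well in both input rankings, which is the practical benefit of hybrid search over either mode alone.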

templates/image_search.py — image-to-image or text-to-image search (requires multimodal embedding + Pillow)
templates/image_search_data_loader.py — loads Oxford Pets dataset into TiDB (used by image_search.py)

templates/custom_embedding_function.py — example BaseEmbeddingFunction implementation (BGE-M3 via FlagEmbedding)
templates/custom_embedding.py — uses the custom embedder with auto embedding + vector search

templates/rag.py — minimal RAG: retrieve via vector search, then generate via local LLM (Ollama via LiteLLM)
templates/memory_lib.py — reusable “memory” library (extract facts → store → retrieve)
templates/memory.py — CLI memory chat example using memory_lib.py
templates/text2sql.py — interactive Text2SQL (generates SQL via OpenAI; asks before executing)
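
The "asks before executing" behaviour of the Text2SQL template can be sketched as a small confirmation gate. This is a minimal illustration, not the template's actual code: confirm_and_run, is_read_only, and the run callback (standing in for whatever executes SQL against TiDB) are hypothetical names, and the read-only check is deliberately conservative.

```python
def is_read_only(sql: str) -> bool:
    """Rough check: a single statement that starts with SELECT.

    Conservative on purpose: any embedded semicolon is rejected, even
    inside a string literal.
    """
    stripped = sql.strip().rstrip(";").strip()
    return stripped.lower().startswith("select") and ";" not in stripped


def confirm_and_run(sql, run, ask=input):
    """Show the generated SQL and execute it only after explicit consent."""
    print(f"Generated SQL:\n{sql}")
    if not is_read_only(sql):
        print("Warning: this statement may modify data.")
    answer = ask("Execute? [y/N] ").strip().lower()
    if answer not in ("y", "yes"):
        return None  # Default is to do nothing.
    return run(sql)


def fake_run(sql):
    """Stand-in executor so the sketch runs without a database."""
    return ("ran", sql)


# Scripted answers replace interactive input for the demonstration.
declined = confirm_and_run("SELECT 1", fake_run, ask=lambda prompt: "n")
accepted = confirm_and_run("SELECT 1", fake_run, ask=lambda prompt: "yes")
print(declined, accepted)
```

Passing the prompt function as a parameter keeps the gate testable; in an interactive session the default input is used.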

scripts/validate_connection.py — quick connection + SELECT 1 smoke test (supports params or DATABASE_URL)
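
The "params or DATABASE_URL" choice can be sketched as a small resolver that reads settings from the environment. The variable names (TIDB_HOST, TIDB_PORT, TIDB_USERNAME, TIDB_PASSWORD, TIDB_DATABASE), the mysql+pymysql scheme, and both helper functions are assumptions for illustration; the script itself may use different names. The SELECT 1 smoke test needs a live cluster and is left out here.

```python
import os
from urllib.parse import quote_plus, urlsplit

# Assumed variable names; the password is optional because a local
# TiDB root user may have an empty password.
REQUIRED = ("TIDB_HOST", "TIDB_USERNAME", "TIDB_DATABASE")


def resolve_database_url(env=os.environ) -> str:
    """Prefer DATABASE_URL; otherwise build a URL from individual params."""
    url = env.get("DATABASE_URL")
    if url:
        return url
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    port = env.get("TIDB_PORT", "4000")  # 4000 is TiDB's default port.
    # Escape credentials so characters like '@' do not break the URL.
    user = quote_plus(env["TIDB_USERNAME"])
    password = quote_plus(env.get("TIDB_PASSWORD", ""))
    host = env["TIDB_HOST"]
    database = env["TIDB_DATABASE"]
    return f"mysql+pymysql://{user}:{password}@{host}:{port}/{database}"


def redact(url: str) -> str:
    """Hide the password so the URL is safe to print or log."""
    password = urlsplit(url).password
    if not password:
        return url
    return url.replace(f":{password}@", ":***@", 1)


demo_env = {
    "TIDB_HOST": "127.0.0.1",
    "TIDB_USERNAME": "root",
    "TIDB_PASSWORD": "p@ss",
    "TIDB_DATABASE": "test",
}
demo_url = resolve_database_url(demo_env)
print(redact(demo_url))
```

Failing fast with a list of missing variables doubles as documentation of what the user must set, and redacting before printing keeps credentials out of logs.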

I will:

Confirm your TiDB deployment (Cloud Starter vs self-managed) and how you want to connect (params vs DATABASE_URL).
Help you set env vars, validate the connection, and choose the right path:
- CRUD/table modeling
- vector/full-text/hybrid search (and embedding provider)
- example templates
Generate the minimal set of files and commands to get you running.