
Agent Skills with tag: document-processing

16 skills match this tag. Use tags to discover related Agent Skills and explore similar workflows.

ai-multimodal

Process and generate multimedia content using the Google Gemini API. Capabilities include analyzing audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis; up to 9.5 hours), understanding images (captioning, object detection, OCR, visual Q&A, segmentation), processing videos (scene detection, Q&A, temporal analysis, YouTube URLs; up to 6 hours), extracting content from documents (PDF tables, forms, charts, diagrams, multi-page files), and generating images (text-to-image, editing, composition, refinement). Use when working with audio/video files, analyzing images or screenshots, processing PDF documents, extracting structured data from media, creating images from text prompts, or implementing multimodal AI features. Supports multiple models (Gemini 2.5/2.0) with context windows of up to 2M tokens. Use when: AI, LLM, vision, embedding, image analysis, Gemini API.

google-gemini, multimodal, image-processing, audio-processing
wollfoo
0

pdf-to-markdown

Convert entire PDF documents to clean, structured Markdown for full context loading. Use this skill when the user wants to extract ALL text from a PDF into context (not grep/search), when discussing or analyzing PDF content in full, when the user mentions "load the whole PDF", "bring the PDF into context", "read the entire PDF", or when partial extraction/grepping would miss important context. This is the preferred method for PDF text extraction over page-by-page or grep approaches.

pdf, markdown, text-extraction, document-processing
aliceisjustplaying
11

rag-systems

Retrieval-augmented generation (RAG) systems with vector search, document processing, and hybrid retrieval.

retrieval-augmented-generation, vector-search, document-processing, hybrid-retrieval
pluginagentmarketplace
1

tapestry

Unified content extraction and action planning. Use when user says "tapestry <URL>", "weave <URL>", "help me plan <URL>", "extract and plan <URL>", "make this actionable <URL>", or similar phrases indicating they want to extract content and create an action plan. Automatically detects content type (YouTube video, article, PDF) and processes accordingly.

content-extraction, document-processing, summarization, task-planning
gupsammy
0

skill-creator

Use when the user has a document (PDF, markdown, book notes, research paper, methodology guide) containing theoretical knowledge or frameworks and wants to convert it into an actionable, reusable skill. Invoke when the user mentions "create a skill from this document", "turn this into a skill", "extract a skill from this file", or when analyzing documents with methodologies, frameworks, processes, or systematic approaches that could be made actionable for future use.

skill-creation, document-processing, pdf-processing, markdown
lyndonkl
0

pdfco

PDF processing API for conversion, extraction, merging, splitting and more

pdf, document-processing, file-conversion, api
vm0-ai
0

pdf4me

Comprehensive PDF processing API for conversion, merge, split, compress, OCR, and more

api, document-processing, pdf-processing, OCR
vm0-ai
0

markdown-video

Convert Deckset-format markdown slides with speaker notes into a presentation video with TTS narration. Use when the user asks to create a video from slides, generate a presentation video, or convert slides to MP4.

markdown, slide-deck, document-processing, tts
jykim
0

markdown-to-epub-converter

Convert markdown documents and chat summaries into formatted EPUB ebook files that can be read on any device or uploaded to Kindle.

markdown, epub, file-conversion, document-processing
smerchek
0

save-web-page

Guide for saving a web page for offline use using the monolith CLI. Use this when instructed to save a web page.

document-processing, shell-scripting, tool-use, offline-access
maragudk
0

rtl-document-translation

Translate structured documents (DOCX) to RTL languages (Arabic, Hebrew, Urdu) while preserving exact formatting, table structures, colors, and layouts. Handles quote normalization, multi-pass translation matching, and RTL-specific formatting patterns.

translation, document-processing, docx, rtl-formatting
belumume
0

document-quality-standards

Use when creating or editing documents (DOCX, PDF, XLSX, PPTX) that need professional output. Adds visual verification, typography hygiene, and formula patterns to the document-skills plugin.

document-processing, document-templates, formatting, best-practices
belumume
0

docx-advanced-patterns

Advanced python-docx patterns for handling nested tables, complex cell structures, and content extraction beyond the basic .text property. Complements the official docx skill with specialized techniques for forms, checklists, and complex layouts.

python, docx, document-processing, table-extraction
belumume
0

tapestry

Unified content extraction and action planning. Use when user says "tapestry <URL>", "weave <URL>", "help me plan <URL>", "extract and plan <URL>", "make this actionable <URL>", or similar phrases indicating they want to extract content and create an action plan. Automatically detects content type (YouTube video, article, PDF) and processes accordingly.

document-processing, text-extraction, web-scraping, task-planning
michalparkola
0

article-extractor

Extract clean article content from URLs (blog posts, articles, tutorials) and save it as readable text. Use when the user wants to download, extract, or save an article or blog post from a URL without ads, navigation, or clutter.

web-scraping, text-extraction, document-processing, html
michalparkola
0

nano-pdf

Edit PDFs with natural-language instructions using the nano-pdf CLI.

cli, pdf-processing, document-processing, natural-language-processing
steipete
0