Agent Skills: Glean Core Workflow B: Indexing & Connectors

Uncategorized

ID: jeremylongshore/claude-code-plugins-plus-skills/glean-core-workflow-b

Install this agent skill locally:

pnpm dlx add-skill https://github.com/jeremylongshore/claude-code-plugins-plus-skills/tree/HEAD/plugins/saas-packs/glean-pack/skills/glean-core-workflow-b

plugins/saas-packs/glean-pack/skills/glean-core-workflow-b/SKILL.md

Skill Metadata

Name
glean-core-workflow-b
Description
'Execute Glean secondary workflow: bulk document indexing, custom datasource

Glean Core Workflow B: Indexing & Connectors

Overview

Build custom Glean connectors: set up datasources, bulk index documents, manage content lifecycle, and configure permissions.

Instructions

Step 1: Create Custom Datasource

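// GLEAN (Indexing API base URL) and idxHeaders (auth + JSON headers) are assumed to be defined already.
const dsRes = await fetch(`${GLEAN}/index/v1/adddatasource`, {
  method: 'POST', headers: idxHeaders,
  body: JSON.stringify({
    name: 'internal_docs',
    displayName: 'Internal Documentation',
    datasourceCategory: 'PUBLISHED_CONTENT',
    urlRegex: 'https://docs.internal.company.com/.*',
    isOnPrem: false,
  }),
});
// fetch() resolves on HTTP errors too, so check the status explicitly
if (!dsRes.ok) throw new Error(`adddatasource failed: HTTP ${dsRes.status}`);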
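The snippets in this skill use `GLEAN` and `idxHeaders` without defining them. A minimal sketch of that setup could look like the following. The environment variable names (`GLEAN_INSTANCE`, `GLEAN_INDEXING_TOKEN`), the base URL pattern, and the helper names `makeGleanConfig` and `postIndex` are assumptions for illustration, not part of the skill.

```javascript
// Hypothetical setup: the env var names and base URL pattern are assumptions.
function makeGleanConfig(env) {
  const instance = env.GLEAN_INSTANCE;
  const token = env.GLEAN_INDEXING_TOKEN;
  if (!instance || !token) {
    throw new Error('Set GLEAN_INSTANCE and GLEAN_INDEXING_TOKEN');
  }
  return {
    GLEAN: `https://${instance}-be.glean.com/api`,
    idxHeaders: {
      Authorization: `Bearer ${token}`,
      'Content-Type': 'application/json',
    },
  };
}

// POST a JSON payload to an Indexing API path and fail loudly on HTTP errors.
// fetch() only rejects on network failures, so the status is checked by hand.
async function postIndex(config, path, payload) {
  const res = await fetch(`${config.GLEAN}/index/v1/${path}`, {
    method: 'POST',
    headers: config.idxHeaders,
    body: JSON.stringify(payload),
  });
  if (!res.ok) {
    throw new Error(`${path} failed: HTTP ${res.status} ${await res.text()}`);
  }
  return res;
}
```

With this in place, `const { GLEAN, idxHeaders } = makeGleanConfig(process.env);` would supply the two names the steps rely on.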

Step 2: Bulk Index Documents

// Bulk indexing replaces ALL documents in the datasource
const uploadId = `upload-${Date.now()}`; // must be unique per run
const BATCH_SIZE = 100;

// Send documents in batches of 100
for (let i = 0; i < allDocs.length; i += BATCH_SIZE) {
  const batch = allDocs.slice(i, i + BATCH_SIZE);
  const isFirst = i === 0;
  const isLast = i + BATCH_SIZE >= allDocs.length;
  const batchNumber = i / BATCH_SIZE + 1;

  const res = await fetch(`${GLEAN}/index/v1/bulkindexdocuments`, {
    method: 'POST', headers: idxHeaders,
    body: JSON.stringify({
      datasource: 'internal_docs',
      uploadId,
      isFirstPage: isFirst,
      isLastPage: isLast,
      documents: batch.map(doc => ({
        id: doc.id,
        title: doc.title,
        url: doc.url,
        body: { mimeType: 'text/html', textContent: doc.content },
        author: { email: doc.authorEmail },
        updatedAt: doc.updatedAt,
        permissions: { allowAnonymousAccess: true },
      })),
    }),
  });
  // fetch() resolves on HTTP errors too, so check the status before logging success
  if (!res.ok) {
    throw new Error(`Batch ${batchNumber} failed: HTTP ${res.status}`);
  }
  console.log(`Indexed batch ${batchNumber} (${batch.length} docs)`);
}
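The first/last page bookkeeping in the loop above is easy to get wrong at the boundaries, so it can help to isolate it in a pure function that is testable without network access. The sketch below is one way to do that; `toUploadPages` is a hypothetical helper name, and the default page size of 100 simply mirrors the batch size used in this step.

```javascript
// Split docs into pages and tag each page with the isFirstPage / isLastPage
// flags that the bulk upload request in Step 2 sends.
function toUploadPages(docs, pageSize = 100) {
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError('pageSize must be a positive integer');
  }
  const pages = [];
  for (let i = 0; i < docs.length; i += pageSize) {
    pages.push({
      documents: docs.slice(i, i + pageSize),
      isFirstPage: i === 0,
      isLastPage: i + pageSize >= docs.length,
    });
  }
  return pages;
}
```

An empty input yields no pages, so no request is sent; the caller can then decide explicitly whether an empty source should really replace the datasource contents.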

Step 3: Set Document Permissions

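// Control who can see documents in search results.
// A later bulk upload (Step 2) replaces the whole datasource, so include this document there as well.
const permRes = await fetch(`${GLEAN}/index/v1/indexdocuments`, {
  method: 'POST', headers: idxHeaders,
  body: JSON.stringify({
    datasource: 'internal_docs',
    documents: [{
      id: 'confidential-001',
      title: 'Board Meeting Notes',
      url: 'https://docs.internal.company.com/board/q1-2025',
      body: { mimeType: 'text/plain', textContent: '...' },
      permissions: {
        allowedUsers: [{ email: 'ceo@company.com' }, { email: 'cfo@company.com' }],
      },
    }],
  }),
});
// fetch() resolves on HTTP errors too, so check the status explicitly
if (!permRes.ok) throw new Error(`indexdocuments failed: HTTP ${permRes.status}`);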
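Because malformed user entries are listed below as a cause of `invalid permissions` errors, it may be worth validating addresses before building the `permissions` object. The following is a rough sketch: `allowedUsersFrom` is a hypothetical helper, and the regular expression is a deliberately loose sanity check, not a full email validator.

```javascript
// Loose shape check: something@something.tld with no whitespace.
const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Build a { allowedUsers: [...] } object from raw email strings,
// trimming whitespace, rejecting malformed entries, and dropping duplicates.
function allowedUsersFrom(emails) {
  const seen = new Set();
  const allowedUsers = [];
  for (const raw of emails) {
    const email = String(raw).trim();
    if (!EMAIL_RE.test(email)) {
      throw new Error(`Invalid email in permissions: "${raw}"`);
    }
    if (!seen.has(email)) {
      seen.add(email);
      allowedUsers.push({ email });
    }
  }
  return { allowedUsers };
}
```

The result can be used directly as the `permissions` value of a document, for example `permissions: allowedUsersFrom(['ceo@company.com', 'cfo@company.com'])`.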

Step 4: Delete Documents

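// Remove specific documents from the index
const delRes = await fetch(`${GLEAN}/index/v1/deletedocument`, {
  method: 'POST', headers: idxHeaders,
  body: JSON.stringify({
    datasource: 'internal_docs',
    objectType: 'Document',
    id: 'doc-to-delete',
  }),
});
// fetch() resolves on HTTP errors too, so check the status explicitly
if (!delRes.ok) throw new Error(`deletedocument failed: HTTP ${delRes.status}`);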
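For incremental flows that do not re-run a full bulk upload, the documents to delete are usually those that were indexed earlier but no longer exist in the source system. A minimal sketch of that diff follows. It assumes the connector keeps its own record of previously indexed IDs (the skill does not describe how), and both `findStaleIds` and `toDeleteRequests` are hypothetical helper names.

```javascript
// Return IDs that were indexed before but are absent from the current source.
function findStaleIds(indexedIds, currentIds) {
  const current = new Set(currentIds);
  return indexedIds.filter(id => !current.has(id));
}

// Build one deletedocument request body per stale ID, using the same
// fields as the Step 4 request.
function toDeleteRequests(datasource, ids, objectType = 'Document') {
  return ids.map(id => ({ datasource, objectType, id }));
}
```

Each resulting body can then be sent to `deletedocument` in the same way as the single request above.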

Error Handling

| Error | Cause | Solution |
|-------|-------|----------|
| uploadId already used | Reusing bulk upload ID | Generate unique uploadId per run |
| document too large | Content exceeds limit | Truncate body to ~100KB |
| invalid permissions | Malformed user/group | Use valid email addresses |
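Truncating by string length can overshoot a byte limit when the text contains multi-byte characters, so the "Truncate body to ~100KB" fix is safer when done on UTF-8 bytes. The sketch below shows one way to do this. The 100 * 1024 byte default is an assumption taken from the table's approximate figure, not a documented limit, and `truncateUtf8` is a hypothetical helper.

```javascript
// Truncate text so its UTF-8 encoding is at most maxBytes long,
// cutting only on character boundaries.
function truncateUtf8(text, maxBytes = 100 * 1024) {
  const bytes = new TextEncoder().encode(text);
  if (bytes.length <= maxBytes) return text;
  let end = maxBytes;
  // bytes[end] is the first byte to be dropped. If it is a continuation
  // byte (10xxxxxx), the cut falls inside a character, so back up to the
  // lead byte of that character and drop the character entirely.
  while (end > 0 && (bytes[end] & 0xc0) === 0x80) end--;
  return new TextDecoder().decode(bytes.subarray(0, end));
}
```

It could be applied when mapping documents, for example `textContent: truncateUtf8(doc.content)`.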

Next Steps

For common errors, see glean-common-errors.