Microsoft Purview Data Map Skill

Microsoft Purview Data Map

The Data Map is the foundational layer of Microsoft Purview data governance: it discovers, scans, and maps metadata and classifications from data sources across Azure, multicloud, SaaS, and on-premises - powering the Unified Catalog, lineage, and downstream protection decisions.

When to use

Building an enterprise-wide inventory of data assets and their classifications as the basis for governance, security, and data product publishing.

Do not use this skill for in-tenant Microsoft 365 classification (use purview-data-classification) or for AI prompt visibility (use purview-dspm-ai).

Pick the right integration runtime

| Source location | Use this runtime | |---|---| | Azure (Storage, SQL, Synapse, Fabric, Cosmos) | Managed (Azure-hosted) runtime | | AWS S3 / RDS, GCP BigQuery, public SaaS | Managed runtime, with credentials in Key Vault | | On-premises SQL, Oracle, file shares | Self-hosted integration runtime (SHIR) on a domain-joined Windows host | | Private-endpoint-only Azure sources | SHIR or VNet-integrated managed runtime | | Power BI tenant | Native connector, no runtime config |

Rule of thumb: managed runtime first; deploy SHIR only when network or private-endpoint reach requires it - and treat the SHIR host as Tier-0 infrastructure.

Approach

Plan collections - Design the collection hierarchy (by business domain or geography) before registering sources; collections drive RBAC and asset organisation. Verify: a draft collection tree exists and maps to data-owner accountability.
Register sources - Connect data sources (Azure Storage, SQL, Synapse, Fabric, AWS S3, databases, Power BI, etc.) into the right collection with appropriate credentials. Verify: each registered source shows correct subscription/account and target collection.
Choose an integration runtime - Use the managed runtime for cloud sources; deploy a self-hosted integration runtime to reach on-premises or private-network sources securely. Verify: SHIR status is Running and self-update is enabled.
Configure scans and rule sets - Schedule scans with scan rule sets; apply classifications (built-in SITs and custom) and lineage extraction where supported. Start incremental, not full. Verify: scan history shows successful runs with classified assets counted.
Curate - Review discovered assets, apply glossary terms, and assign data owners/stewards; tune custom classifications based on real matches. Verify: top assets have owners and at least one glossary term.
Govern access - Use collections to organise assets and scope permissions by domain; prefer collection-level role assignment over root. Verify: collection-admin roles assigned to domain owners, not platform team only.
Operate - Monitor scan failures, classification drift, and cost; right-size scan frequency per source criticality. Verify: a weekly scan-health report exists.

Guardrails

Scope and schedule scans to manage cost and source load; avoid scanning everything at full depth on day one - sampling first, full second.
Secure the self-hosted integration runtime host as sensitive infrastructure - it holds credentials and reaches into production data sources; patch, restrict logon, monitor.
Validate classification accuracy before relying on it for downstream protection - sample matches per SIT and tune confidence levels.
Use Key Vault for credentials; never embed secrets in scan configuration.
Plan capacity - Data Map is metered by capacity units; oversized scans inflate cost without governance value.

Common anti-patterns

Registering every source into the root collection and assigning everyone Data Reader.
Scheduling weekly full scans on petabyte data lakes - blows out capacity and budget.
Running SHIR on a workstation or shared jump host.
Skipping the glossary and ownership step - assets get classified but nobody acts on them.
Treating Data Map as a one-off load instead of a continuously curated catalogue.

Example prompts

Register and scan data sources in the Purview Data Map.
Configure an integration runtime for multicloud data scanning.
How do I map enterprise data and run classification scans?
Plan data discovery across cloud and on-prem sources.
Design a Purview collection hierarchy aligned to business domains.

Microsoft Learn

Data Map overview: https://learn.microsoft.com/purview/concept-elastic-data-map
Register & scan sources: https://learn.microsoft.com/purview/scan-data-sources
Manage integration runtimes: https://learn.microsoft.com/purview/manage-integration-runtimes
Classifications: https://learn.microsoft.com/purview/concept-classification
Collections & access control: https://learn.microsoft.com/purview/how-to-create-and-manage-collections
Pricing & capacity: https://learn.microsoft.com/purview/concept-elastic-data-map

Agent Skills: Microsoft Purview Data Map

Install this agent skill to your local

Skill Files