SPARQL & Semantic Web Skill

Generate, analyze, and optimize SPARQL queries with an ontologist's perspective. This skill embodies the practical wisdom of knowledge graph practitioners—SPARQL is fundamentally a language for the manipulation of sets of assertions called triples, and nearly all operations are set operations.

Core Philosophy (Cagle's Principles)

"SPARQL and SHACL are the twin pillars of modern knowledge graph work. OWL's complexity fades; shapes and queries remain."

Key Insights:

SPARQL queries a knowledge graph by stitching together assertions based on shared identifiers; this is almost its entire role
Thirty-Table Threshold: Knowledge graphs outperform relational databases once you exceed ~30 tables due to better handling of interconnected data
Data gathering is expensive—assess your organization's data access and acquisition capacity before planning major KG initiatives
Learn SPARQL. SHACL can be thought of as a dedicated wrapper around SPARQL queries and filters.
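
The "stitching" idea above can be sketched with plain Python sets. This is only an illustration of the join semantics, not how a triple store is implemented; the `triples` data and the `match` helper are hypothetical.

```python
# Toy triple store: a set of (subject, predicate, object) tuples.
# All identifiers below are made-up illustration data.
triples = {
    ("ex:book1", "ex:author", "ex:alice"),
    ("ex:book2", "ex:author", "ex:alice"),
    ("ex:book3", "ex:author", "ex:bob"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:name", "Bob"),
}


def match(predicate):
    """Return the (subject, object) pairs asserted with one predicate."""
    return {(s, o) for (s, p, o) in triples if p == predicate}


# Equivalent in spirit to:
#   SELECT ?book ?name WHERE { ?book ex:author ?a . ?a ex:name ?name }
# The two patterns are joined on the shared identifier bound to ?a.
book_names = {
    (book, name)
    for (book, author) in match("ex:author")
    for (person, name) in match("ex:name")
    if author == person
}

for book, name in sorted(book_names):
    print(book, name)
```

Each triple pattern yields a set of bindings, and the conjunction is a join on whatever variables the patterns share.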

Guide Router

Load only ONE guide per request. Match user intent to the most specific keywords:

| User Intent | Load Guide | Content |
|-------------|------------|---------|
| SPARQL query syntax, SELECT, CONSTRUCT, ASK | 02-QUERY-PATTERNS.md | Query forms, graph patterns, filters |
| Property paths, traversal, recursive queries | 03-PROPERTY-PATHS.md | Path operators, traversal patterns |
| SPARQL Update, INSERT, DELETE, LOAD | 04-UPDATE-OPERATIONS.md | Data manipulation |
| Aggregation, GROUP BY, subqueries | 05-AGGREGATION-SUBQUERIES.md | Advanced query patterns |
| SHACL shapes, validation, constraints | 06-SHACL-INTEGRATION.md | Shapes and validation |
| Turtle, RDF serialization, JSON-LD | 07-SERIALIZATION.md | Data formats |
| OWL ontologies, reasoning, inference | 08-OWL-REASONING.md | Ontology patterns |
| SPARQL for AI/LLM, parameterized queries | 09-AI-INTEGRATION.md | LLM patterns |
| Federated queries, SERVICE, SPARQL-Anything | 10-FEDERATION.md | Distributed queries |
| Performance, optimization, debugging | 11-OPTIMIZATION.md | Query efficiency |
| IRI design, namespaces, naming conventions | 12-IRI-DESIGN.md | Identifier patterns |

Default behavior: If intent is unclear, ask the user to clarify or provide query patterns from this entry point.
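
One way to read the routing rule is as a keyword-scoring function. The sketch below is an assumption about how such a router could work, not the skill's actual loader: the `GUIDES` keyword lists are abbreviated from the table, and `pick_guide` is a hypothetical name.

```python
# Abbreviated keyword lists taken from the router table above.
# The scoring scheme (count of keyword hits) is an assumption.
GUIDES = {
    "02-QUERY-PATTERNS.md": ["select", "construct", "ask", "query syntax"],
    "03-PROPERTY-PATHS.md": ["property path", "traversal", "recursive"],
    "04-UPDATE-OPERATIONS.md": ["insert", "delete", "load", "update"],
    "06-SHACL-INTEGRATION.md": ["shacl", "shape", "validation", "constraint"],
    "11-OPTIMIZATION.md": ["performance", "optimization", "debugging"],
}


def pick_guide(request):
    """Return the single best-matching guide, or None if nothing matches."""
    text = request.lower()
    scores = {
        guide: sum(1 for keyword in keywords if keyword in text)
        for guide, keywords in GUIDES.items()
    }
    best = max(scores, key=scores.get)
    # None signals the default behavior: ask the user to clarify.
    return best if scores[best] > 0 else None


print(pick_guide("How do I write a recursive property path traversal?"))
```

Plain substring matching is crude (for example, "ask" also matches "task"), so a real router would likely want tokenization or word boundaries.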

SPARQL 1.1 Quick Reference

Query Forms

# SELECT - Return variable bindings
SELECT ?subject ?predicate ?object
WHERE { ?subject ?predicate ?object }

# CONSTRUCT - Return an RDF graph
CONSTRUCT { ?s foaf:knows ?o }
WHERE { ?s foaf:knows ?o }

# ASK - Return boolean
ASK { ?person foaf:name "Kurt Cagle" }

# DESCRIBE - Return graph describing resources
DESCRIBE <http://example.org/person/kurt>

Essential Clauses

PREFIX ex: <http://example.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?name ?title
WHERE {
  ?person a ex:Author ;
          rdfs:label ?name .
  OPTIONAL { ?person ex:title ?title }
  FILTER (lang(?name) = "en")
  FILTER NOT EXISTS { ?person ex:deceased ?d }
}
ORDER BY ?name
LIMIT 100
OFFSET 0

Graph Pattern Operators

| Operator | Purpose | Example |
|----------|---------|---------|
| . | Conjunction | ?s ?p ?o . ?o ?p2 ?o2 |
| OPTIONAL | Left outer join | OPTIONAL { ?s ex:prop ?val } |
| UNION | Disjunction | { ?s ex:a ?o } UNION { ?s ex:b ?o } |
| MINUS | Set difference | { ?s ?p ?o } MINUS { ?s a ex:Draft } |
| FILTER | Constraint | FILTER (?age > 18) |
| BIND | Assignment | BIND (CONCAT(?first, " ", ?last) AS ?name) |
| VALUES | Inline data | VALUES ?type { ex:Book ex:Article } |

Property Paths (SPARQL 1.1)

# Sequence path: A then B
?s ex:knows/ex:knows ?friend_of_friend

# Alternative path: A or B
?s rdfs:label|skos:prefLabel ?label

# Inverse path: follow ex:parent from object to subject
?parent ^ex:parent ?child    # same as: ?child ex:parent ?parent

# Zero or more
?class rdfs:subClassOf* ?superclass

# One or more
?s ex:contains+ ?descendant

# Zero or one
?s ex:nickname? ?nick

# Negated property set
?s !(rdf:type|rdfs:label) ?other
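
To make the difference between `+` and `*` concrete, here is a small reachability sketch over a hypothetical `rdfs:subClassOf` hierarchy. It mirrors the semantics of the two operators; it is not how any particular engine evaluates paths.

```python
# Hypothetical rdfs:subClassOf edges: class -> set of direct superclasses.
SUBCLASS_OF = {
    "ex:Novel": {"ex:Book"},
    "ex:Book": {"ex:CreativeWork"},
    "ex:CreativeWork": {"ex:Thing"},
}


def one_or_more(start, edges):
    """Nodes reachable in one or more steps (the `+` operator)."""
    reached = set()
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for nxt in edges.get(node, ()):
            if nxt not in reached:  # guards against cycles
                reached.add(nxt)
                frontier.append(nxt)
    return reached


def zero_or_more(start, edges):
    """Like `+`, but the start node always matches itself (the `*` operator)."""
    return {start} | one_or_more(start, edges)


print(sorted(one_or_more("ex:Novel", SUBCLASS_OF)))
print(sorted(zero_or_more("ex:Novel", SUBCLASS_OF)))
```

The traversal visits every node reachable from the start, which is why unbounded paths can be costly on large, densely connected graphs.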

Aggregation

SELECT ?author (COUNT(?book) AS ?bookCount) (GROUP_CONCAT(?title; separator=", ") AS ?titles)
WHERE {
  ?book ex:author ?author ;
        ex:title ?title .
}
GROUP BY ?author
HAVING (COUNT(?book) > 5)
ORDER BY DESC(?bookCount)

| Function | Description |
|----------|-------------|
| COUNT(*) | Cardinality of solutions |
| SUM(?val) | Numeric sum |
| AVG(?val) | Average value |
| MIN(?val) | Minimum value |
| MAX(?val) | Maximum value |
| GROUP_CONCAT(?val; separator=", ") | Concatenate strings |
| SAMPLE(?val) | Arbitrary value |

Subqueries

# Find authors with above-average book counts
SELECT ?author ?bookCount
WHERE {
  {
    SELECT ?author (COUNT(?book) AS ?bookCount)
    WHERE { ?book ex:author ?author }
    GROUP BY ?author
  }
  {
    SELECT (AVG(?cnt) AS ?avgCount)
    WHERE {
      SELECT ?a (COUNT(?b) AS ?cnt)
      WHERE { ?b ex:author ?a }
      GROUP BY ?a
    }
  }
  FILTER (?bookCount > ?avgCount)
}
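
The same computation, written out step by step, may help when reading the nested subqueries. This is a sketch over hypothetical `(book, author)` pairs; each line is annotated with the part of the query it corresponds to.

```python
from collections import Counter
from statistics import mean

# Hypothetical (book, author) pairs standing in for: ?book ex:author ?author
authored = [
    ("b1", "alice"), ("b2", "alice"), ("b3", "alice"),
    ("b4", "bob"),
    ("b5", "carol"), ("b6", "carol"),
]

# First subquery: COUNT(?book) ... GROUP BY ?author
book_counts = Counter(author for _book, author in authored)

# Second subquery: AVG over the per-author counts
avg_count = mean(book_counts.values())

# Outer query: FILTER (?bookCount > ?avgCount)
above_average = {a: n for a, n in book_counts.items() if n > avg_count}

print(avg_count, above_average)
```

The average has to be computed over the grouped counts, which is why the query needs a subquery nested inside another subquery.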

Essential Functions

String Functions

| Function | Purpose |
|----------|---------|
| STR(?x) | Convert to string |
| STRLEN(?s) | String length |
| SUBSTR(?s, 1, 5) | Substring |
| UCASE(?s) / LCASE(?s) | Case conversion |
| STRSTARTS(?s, "pre") | Prefix test |
| STRENDS(?s, "suf") | Suffix test |
| CONTAINS(?s, "sub") | Substring test |
| CONCAT(?a, ?b) | Concatenation |
| REPLACE(?s, "old", "new") | Replacement |
| REGEX(?s, "pattern", "i") | Regex match |
| ENCODE_FOR_URI(?s) | URL encoding |

RDF Term Functions

| Function | Purpose |
|----------|---------|
| IRI(?s) / URI(?s) | Construct IRI |
| BNODE() / BNODE(?id) | Blank node |
| STRDT(?s, xsd:date) | Typed literal |
| STRLANG(?s, "en") | Language-tagged literal |
| LANG(?lit) | Get language tag |
| DATATYPE(?lit) | Get datatype |
| isIRI(?x) | IRI test |
| isBlank(?x) | Blank node test |
| isLiteral(?x) | Literal test |
| isNumeric(?x) | Numeric test |

Conditional & Existence

# IF conditional
BIND (IF(?age >= 18, "adult", "minor") AS ?category)

# COALESCE - first non-error value
BIND (COALESCE(?preferredName, ?name, "Unknown") AS ?displayName)

# EXISTS / NOT EXISTS
FILTER EXISTS { ?s ex:verified true }
FILTER NOT EXISTS { ?s ex:deleted true }

# BOUND - test if variable is bound
FILTER (BOUND(?optionalValue))

The Label Problem (Cagle's Solution)

Knowledge graphs use URIs, but users shouldn't need to know them. Multiple label predicates exist across ontologies.

Problem: Different ontologies use different label predicates:

rdfs:label
skos:prefLabel
dcterms:title
schema:name
foaf:name

Solution: Use property path alternatives or VALUES:

# Property path approach
SELECT ?resource ?label
WHERE {
  ?resource rdfs:label|skos:prefLabel|dcterms:title|schema:name ?label .
  FILTER (lang(?label) = "en" || lang(?label) = "")
}

# VALUES approach (more extensible)
SELECT ?resource ?label
WHERE {
  VALUES ?labelProp { rdfs:label skos:prefLabel dcterms:title schema:name }
  ?resource ?labelProp ?label .
}
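
Because the VALUES approach keeps the label predicates in one place, the query can be generated from a list. The helper below is a minimal sketch: `build_label_query` and `LABEL_PREDICATES` are hypothetical names, PREFIX declarations are left out for brevity, and the predicate list is assumed to be trusted configuration rather than user input.

```python
import re

# Label predicates listed in this section; extend as needed.
LABEL_PREDICATES = [
    "rdfs:label", "skos:prefLabel", "dcterms:title", "schema:name", "foaf:name",
]

# Loose check so that only tag-like values reach the query text.
_LANG_TAG = re.compile(r"[A-Za-z]+(-[A-Za-z0-9]+)*")


def build_label_query(predicates=LABEL_PREDICATES, lang="en", limit=100):
    """Build the VALUES-based label query for the given predicates."""
    if not _LANG_TAG.fullmatch(lang):
        raise ValueError(f"not a language tag: {lang!r}")
    values = " ".join(predicates)
    return (
        "SELECT ?resource ?label\n"
        "WHERE {\n"
        f"  VALUES ?labelProp {{ {values} }}\n"
        "  ?resource ?labelProp ?label .\n"
        f'  FILTER (lang(?label) = "{lang}" || lang(?label) = "")\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


print(build_label_query(["rdfs:label", "skos:prefLabel"]))
```

Adding a new label predicate then means changing one list rather than editing every query.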

Named Graphs

# Query specific named graph
SELECT ?s ?p ?o
FROM <http://example.org/graph1>
WHERE { ?s ?p ?o }

# Query across named graphs
SELECT ?g ?s ?p ?o
FROM NAMED <http://example.org/graph1>
FROM NAMED <http://example.org/graph2>
WHERE {
  GRAPH ?g { ?s ?p ?o }
}

# Default graph + named graphs
SELECT ?s ?label ?g ?status
WHERE {
  ?s rdfs:label ?label .  # From default graph
  GRAPH ?g {
    ?s ex:status ?status .  # From named graphs
  }
}

SPARQL Update (1.1)

# INSERT DATA - add specific triples
INSERT DATA {
  ex:person1 a ex:Person ;
             ex:name "Kurt Cagle" .
}

# DELETE DATA - remove specific triples
DELETE DATA {
  ex:person1 ex:status "draft" .
}

# DELETE/INSERT with WHERE
DELETE { ?s ex:status "draft" }
INSERT { ?s ex:status "published" }
WHERE { ?s ex:status "draft" ; ex:reviewed true }

# LOAD external data
LOAD <http://example.org/data.ttl> INTO GRAPH <http://example.org/imported>

# CLEAR graph
CLEAR GRAPH <http://example.org/temp>

# DROP graph
DROP GRAPH <http://example.org/obsolete>

Turtle Quick Reference

@prefix ex: <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Subject with multiple predicates (semicolon)
ex:KurtCagle a ex:Person ;
    rdfs:label "Kurt Cagle"@en ;
    ex:role "Ontologist" ;
    ex:founded ex:Semantical ;
    ex:writes ex:TheOntologist, ex:TheCagleReport .

# Blank nodes
ex:Book1 ex:author [
    a ex:Person ;
    ex:name "Anonymous"
] .

# Collections (RDF lists)
ex:Course ex:topics ( ex:SPARQL ex:RDF ex:SHACL ) .

# Typed literals
ex:event ex:date "2025-12-18"^^xsd:date ;
         ex:attendees "150"^^xsd:integer .

IRI Design Patterns (Cagle's Recommendations)

Standard Structure: http://{authority}/{path/to/term}[#|/]{localName}

Naming Conventions

| Element | Convention | Example |
|---------|------------|---------|
| Namespaces | TitleCase | http://example.org/Ontology/ |
| Classes | TitleCase | ex:Person, ex:KnowledgeGraph |
| Instances | TitleCase | ex:KurtCagle, ex:Book123 |
| Properties | camelCase | ex:hasAuthor, ex:datePublished |

Best Practices

Avoid embedding semantics in identifiers—use annotative properties
Don't include versioning in IRIs; versioning is metadata
Meaningful local names aid debugging, but don't parse them for data
Use UUIDs for auto-generated instances when readability isn't critical
Non-HTTP schemes are valid IRIs too: urn:isbn:0451450523, mailto:user@example.org
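
For auto-generated instances, the UUID advice could look like the sketch below. The base `http://example.org/` and both function names are assumptions; the second helper shows one possible way to get a stable IRI from a source-system key without exposing that key in the identifier.

```python
import uuid

BASE = "http://example.org/"  # hypothetical authority


def mint_instance_iri(class_name, base=BASE):
    """Mint a fresh, opaque instance IRI under the class path."""
    return f"{base}{class_name}/{uuid.uuid4()}"


def stable_instance_iri(class_name, source_key, base=BASE):
    """Derive the same IRI every time from a source-system key (UUIDv5)."""
    name = f"{base}{class_name}/{source_key}"
    return f"{base}{class_name}/{uuid.uuid5(uuid.NAMESPACE_URL, name)}"


print(mint_instance_iri("Person"))
print(stable_instance_iri("Person", "crm:42"))
```

Neither form carries data that a consumer might be tempted to parse; anything meaningful about the instance belongs in its properties.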

SHACL Integration Patterns

SHACL (Shapes Constraint Language) validates RDF data and can generate SPARQL queries.

@prefix sh: <http://www.w3.org/ns/shacl#> .

ex:PersonShape a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:property [
        sh:path ex:name ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
        sh:datatype xsd:string ;
    ] ;
    sh:property [
        sh:path ex:email ;
        sh:pattern "^[^@]+@[^@]+$" ;
        sh:severity sh:Warning ;
    ] .

SHACL-SPARQL extends validation with custom queries:

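# Attach the constraint to a shape with sh:sparql; prefixes used inside
# the query must also be declared for the validator via sh:prefixes.
ex:PersonShape sh:sparql ex:UniqueEmailConstraint .

ex:UniqueEmailConstraint a sh:SPARQLConstraint ;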
    sh:message "Email must be unique" ;
    sh:select """
        SELECT $this ?email
        WHERE {
            $this ex:email ?email .
            ?other ex:email ?email .
            FILTER ($this != ?other)
        }
    """ .

AI/LLM Integration Architecture (Cagle's Pattern)

The Problem: LLMs can generate SPARQL, but doing so reliably requires deep knowledge of the target ontology.

Cagle's Solution: Expose pre-written SPARQL through an API layer with SHACL describing parameters.

User Query → LLM → API Endpoint Selection → Parameterized SPARQL → Results → LLM → Natural Language

Context-Free SPARQL Pattern

# Parameterized query with VALUES injection
SELECT ?entity ?label ?description
WHERE {
  VALUES ?searchTerm { $SEARCH_TERM }

  ?entity a ?type ;
          rdfs:label|skos:prefLabel ?label .

  OPTIONAL { ?entity rdfs:comment|dcterms:description ?description }

  # Use text index if available (Lucene/Elasticsearch)
  # ?entity text:query ?searchTerm .

  FILTER (CONTAINS(LCASE(?label), LCASE(?searchTerm)))
}
LIMIT 20
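
The `$SEARCH_TERM` placeholder should never receive raw user text. A minimal sketch of the injection step, assuming the API layer is written in Python and that the term is always bound as a plain string literal; `sparql_string_literal` and `render` are hypothetical helper names, and the template omits PREFIX declarations and the OPTIONAL clause for brevity.

```python
from string import Template

# Shortened version of the query above; $SEARCH_TERM is the only placeholder.
QUERY_TEMPLATE = Template("""\
SELECT ?entity ?label
WHERE {
  VALUES ?searchTerm { $SEARCH_TERM }
  ?entity rdfs:label|skos:prefLabel ?label .
  FILTER (CONTAINS(LCASE(?label), LCASE(?searchTerm)))
}
LIMIT 20""")

# Characters that must be escaped inside a double-quoted SPARQL string.
_ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}


def sparql_string_literal(text):
    """Quote and escape text so it is always parsed as one string literal."""
    return '"' + "".join(_ESCAPES.get(ch, ch) for ch in text) + '"'


def render(search_term):
    """Fill the template with a safely quoted search term."""
    return QUERY_TEMPLATE.substitute(SEARCH_TERM=sparql_string_literal(search_term))


print(render('onto"logist'))
```

Because the LLM only supplies the value and never the query text, a hostile or malformed term ends up as an odd search string rather than as extra SPARQL.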

Service Response Format

{
  "question": "Who wrote The Ontologist?",
  "answer": "Kurt Cagle writes The Ontologist newsletter.",
  "source": "http://example.org/graph/ontologist-metadata",
  "sparql": "SELECT ?author WHERE { ex:TheOntologist ex:author ?author }"
}
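
A sketch of how the API layer might assemble that response from an endpoint's reply, assuming the endpoint returns the standard SPARQL 1.1 JSON results format. The function names and the sample payload are hypothetical; the answer text would come from the LLM step, so it is passed in rather than generated here.

```python
import json


def bindings_to_rows(results_json):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    data = json.loads(results_json)
    variables = data["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in data["results"]["bindings"]
    ]


def build_response(question, answer, source, sparql):
    """Assemble the service response envelope shown above."""
    return {"question": question, "answer": answer, "source": source, "sparql": sparql}


# Hypothetical endpoint reply for the example query.
raw = json.dumps({
    "head": {"vars": ["author"]},
    "results": {"bindings": [
        {"author": {"type": "uri", "value": "http://example.org/KurtCagle"}}
    ]},
})

rows = bindings_to_rows(raw)
response = build_response(
    question="Who wrote The Ontologist?",
    answer="Kurt Cagle writes The Ontologist newsletter.",
    source="http://example.org/graph/ontologist-metadata",
    sparql="SELECT ?author WHERE { ex:TheOntologist ex:author ?author }",
)
print(rows)
print(json.dumps(response, indent=2))
```

Returning the executed query and source graph alongside the answer keeps the result traceable.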

Performance Guidelines (DuCharme's Wisdom)

"When you keep in mind the amount of work that each part of your query asks a SPARQL processor to perform, it helps you create queries that run faster."

Query Optimization

Place restrictive patterns first in WHERE clause
Move OPTIONAL after restrictive patterns
Avoid FILTER on large result sets—use triple patterns instead
Use text indexes instead of REGEX for string searches
Be cautious with property paths (*, +) in large datasets
Use LIMIT early when exploring data
Prefer BIND over complex SELECT expressions

Anti-Patterns

# BAD: Filter on unrestricted pattern
SELECT ?s ?label
WHERE {
  ?s ?p ?o .
  FILTER (?p = rdfs:label)
  BIND (STR(?o) AS ?label)
}

# GOOD: Direct triple pattern
SELECT ?s ?label
WHERE {
  ?s rdfs:label ?label .
}
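
A rough way to see why the second form does less work is to count how many triples each strategy has to examine. The sketch assumes a store that keeps a per-predicate index, which is common but not guaranteed, and a real optimizer may rewrite the FILTER form on its own; all data here is synthetic.

```python
from collections import defaultdict

# Synthetic data: 1000 unrelated triples plus two labels.
triples = [(f"ex:s{i}", "ex:other", i) for i in range(1000)]
triples += [("ex:s1", "rdfs:label", "One"), ("ex:s2", "rdfs:label", "Two")]

# Assumed predicate index: predicate -> list of (subject, object).
by_predicate = defaultdict(list)
for s, p, o in triples:
    by_predicate[p].append((s, o))


def filter_after_scan():
    """?s ?p ?o . FILTER (?p = rdfs:label): every triple is examined."""
    examined, rows = 0, []
    for s, p, o in triples:
        examined += 1
        if p == "rdfs:label":
            rows.append((s, o))
    return rows, examined


def direct_pattern():
    """?s rdfs:label ?label: only the indexed matches are touched."""
    rows = list(by_predicate["rdfs:label"])
    return rows, len(rows)


print(filter_after_scan()[1], direct_pattern()[1])
```

Both strategies return the same rows; only the amount of work differs.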

RDF-star / SPARQL-star (Emerging Standard)

RDF-star enables statements about statements using quoted triples:

# RDF-star syntax
<< ex:Kurt ex:wrote ex:TheOntologist >> ex:since "2020" .

# Annotation shorthand
ex:Kurt ex:wrote ex:TheOntologist {| ex:since "2020" |} .

# SPARQL-star query
SELECT ?author ?work ?since
WHERE {
  << ?author ex:wrote ?work >> ex:since ?since .
}

# Constructing quoted triples
SELECT (TRIPLE(?s, ?p, ?o) AS ?statement)
WHERE { ?s ?p ?o }

Common Patterns

Find All Classes

SELECT DISTINCT ?class ?label
WHERE {
  { ?class a rdfs:Class } UNION { ?class a owl:Class }
  OPTIONAL { ?class rdfs:label ?label }
}

Instance Count by Class

SELECT ?class (COUNT(?instance) AS ?count)
WHERE {
  ?instance a ?class .
}
GROUP BY ?class
ORDER BY DESC(?count)

Property Discovery

SELECT DISTINCT ?property ?domain ?range
WHERE {
  ?property a rdf:Property .
  OPTIONAL { ?property rdfs:domain ?domain }
  OPTIONAL { ?property rdfs:range ?range }
}

Hierarchical Traversal

# All superclasses of a class
SELECT ?superclass
WHERE {
  ex:SpecificClass rdfs:subClassOf+ ?superclass .
}

# All subclasses (inverse)
SELECT ?subclass
WHERE {
  ?subclass rdfs:subClassOf+ ex:GeneralClass .
}

Data Quality Check

# Find resources missing required properties
SELECT ?resource
WHERE {
  ?resource a ex:Person .
  FILTER NOT EXISTS { ?resource ex:name ?name }
}

Output Considerations

When generating SPARQL:

Always include PREFIX declarations for readability
Use meaningful variable names (?author not ?x)
Add comments for complex patterns
Format with consistent indentation
Consider result size—include LIMIT for exploration queries
Handle language tags explicitly when dealing with labels
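
The first item on this checklist is easy to verify mechanically. The function below is a heuristic sketch, not a SPARQL parser: it strips IRIs, double-quoted strings, and comments with regular expressions and then compares the prefixes used against those declared. `undeclared_prefixes` is a hypothetical name, and unusual input (for example `<` used as an operator without surrounding spaces) can confuse it.

```python
import re


def undeclared_prefixes(query):
    """Return prefixes used in prefixed names but missing a PREFIX declaration."""
    declared = set(re.findall(r"PREFIX\s+([A-Za-z][\w-]*):", query, flags=re.I))
    text = re.sub(r"<[^<>\s]*>", " ", query)          # drop IRIs
    text = re.sub(r'"(?:[^"\\]|\\.)*"', " ", text)    # drop string literals
    text = re.sub(r"#.*", " ", text)                  # drop comments
    used = set(re.findall(r"(?<![\w?$])([A-Za-z][\w-]*):[A-Za-z_0-9]", text))
    return sorted(used - declared)


query = """PREFIX ex: <http://example.org/>
SELECT ?name
WHERE {
  ?person a ex:Author ;
          rdfs:label ?name .   # rdfs is used but never declared
}"""

print(undeclared_prefixes(query))
```

A check like this can run before a generated query is sent to the endpoint, so a missing prefix is caught early rather than surfacing as a parse error.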
