Agent Skills: Polars Fast DataFrame Library

Use when "Polars", "fast dataframe", "lazy evaluation", "Arrow backend", or asking about "pandas alternative", "parallel dataframe", "large CSV processing", "ETL pipeline", "expression API"

Uncategorized

ID: eyadsibai/ltk/polars

Install this agent skill to your local environment:

pnpm dlx add-skill https://github.com/eyadsibai/ltk/tree/HEAD/plugins/ltk-data/skills/polars

Skill Files

Browse the full folder contents for polars.


plugins/ltk-data/skills/polars/SKILL.md

Skill Metadata

Name
polars
Description
Use when "Polars", "fast dataframe", "lazy evaluation", "Arrow backend", or asking about "pandas alternative", "parallel dataframe", "large CSV processing", "ETL pipeline", "expression API"

Polars Fast DataFrame Library

Lightning-fast DataFrame library with lazy evaluation and parallel execution.

When to Use

  • Pandas is too slow for your dataset
  • Working with 1-100GB datasets that fit in RAM
  • Need lazy evaluation for query optimization
  • Building ETL pipelines
  • Want parallel execution without extra config

Lazy vs Eager Evaluation

| Mode | Function | Executes | Use Case |
|------|----------|----------|----------|
| Eager | read_csv() | Immediately | Small data, exploration |
| Lazy | scan_csv() | On .collect() | Large data, pipelines |

Key concept: Lazy mode builds a query plan that gets optimized before execution. The optimizer applies predicate pushdown (filter early) and projection pushdown (select columns early).


Core Operations

Data Selection

| Operation | Purpose |
|-----------|---------|
| select() | Choose columns |
| filter() | Choose rows by condition |
| with_columns() | Add/modify columns |
| drop() | Remove columns |
| head(n) / tail(n) | First/last n rows |

Aggregation

| Operation | Purpose |
|-----------|---------|
| group_by().agg() | Group and aggregate |
| pivot() | Reshape wide |
| melt() | Reshape long (renamed unpivot() in Polars 1.0) |
| unique() | Distinct values |

Joins

| Join Type | Description |
|-----------|-------------|
| inner | Matching rows only |
| left | All left + matching right |
| outer | All rows from both (renamed full in Polars 1.0) |
| cross | Cartesian product |
| semi | Left rows with match |
| anti | Left rows without match |


Expression API

Key concept: Polars uses expressions (pl.col()) instead of indexing. Expressions are lazily evaluated and optimized.

Common Expressions

| Expression | Purpose |
|------------|---------|
| pl.col("name") | Reference column |
| pl.lit(value) | Literal value |
| pl.all() | All columns |
| pl.exclude(...) | All except |

Expression Methods

| Category | Methods |
|----------|---------|
| Aggregation | .sum(), .mean(), .min(), .max(), .count() |
| String | .str.contains(), .str.replace(), .str.to_lowercase() |
| DateTime | .dt.year(), .dt.month(), .dt.day() |
| Conditional | .when().then().otherwise() |
| Window | .over(), .rolling_mean(), .shift() |


Pandas Migration

| Pandas | Polars |
|--------|--------|
| df['col'] | df.select('col') |
| df[df['col'] > 5] | df.filter(pl.col('col') > 5) |
| df['new'] = df['col'] * 2 | df.with_columns((pl.col('col') * 2).alias('new')) |
| df.groupby('col').mean() | df.group_by('col').agg(pl.all().mean()) |
| df.apply(func) | df.map_rows(func) (avoid if possible) |

Key concept: Polars prefers explicit operations over implicit indexing. Use .alias() to name computed columns.


File I/O

| Format | Read | Write | Notes |
|--------|------|-------|-------|
| CSV | read_csv() / scan_csv() | write_csv() | Human readable |
| Parquet | read_parquet() / scan_parquet() | write_parquet() | Fast, compressed |
| JSON | read_json() / scan_ndjson() | write_json() / write_ndjson() | read_json/write_json handle standard JSON; the ndjson variants handle newline-delimited JSON |
| IPC/Arrow | read_ipc() / scan_ipc() | write_ipc() | Zero-copy |

Key concept: Use Parquet for performance. Use scan_* for large files to enable lazy optimization.


Performance Tips

| Tip | Why |
|-----|-----|
| Use lazy mode | Query optimization |
| Use Parquet | Column-oriented, compressed |
| Select columns early | Projection pushdown |
| Filter early | Predicate pushdown |
| Avoid Python UDFs | Breaks parallelism |
| Use expressions | Vectorized operations |
| Set dtypes on read | Avoid inference overhead |


vs Alternatives

| Tool | Best For | Limitations |
|------|----------|-------------|
| Polars | 1-100GB, speed critical | Must fit in RAM |
| Pandas | Small data, ecosystem | Slow, memory hungry |
| Dask | Larger than RAM | More complex API |
| Spark | Cluster computing | Infrastructure overhead |
| DuckDB | SQL interface | Different API style |

Resources