fenic 0.6.0 expands in three directions: broader model support (Claude 4.5, GPT-5.1, Gemini 3 Pro), significant new DataFrame operations (array functions, regex, explode variants, distinct aggregations), and infrastructure improvements (LLM response caching, configurable timeouts, parallel with_columns).
What's in it for you
- Persistent LLM response caching with SQLite-backed storage — cut costs and latency during iterative development and repeated batch runs.
- Claude 4.5, GPT-5.1, and Gemini 3 Pro support with reasoning controls and extended context windows.
- 15 array functions and 5 regex functions following PySpark semantics — manipulate arrays and text without leaving the DataFrame.
- Explode variants (explode_outer, posexplode, posexplode_outer) that preserve rows with null or empty arrays.
- Distinct aggregations (count_distinct, approx_count_distinct, sum_distinct) for precise analytics.
- Configurable timeouts on all semantic operators — set per-call limits instead of relying on a global default.
- PDF parsing on OpenAI and OpenRouter with three configurable parsing engines.
- Fenic Agents for Claude Code — a feature developer agent and a PR review agent ship with the repo.
LLM response caching
All semantic LLM calls can now be cached with a persistent SQLite-backed store. Cache keys are built from the full request signature (model, messages, temperature, schema fingerprint) to ensure correctness.
```python
from fenic.api.session.config import SessionConfig, SemanticConfig, LLMResponseCacheConfig

config = SessionConfig(
    app_name="my_app",
    semantic=SemanticConfig(
        language_models={ ... },
        llm_response_cache=LLMResponseCacheConfig(
            enabled=True,
            ttl="1h",
            max_size_mb=1000,
        ),
    ),
)
```
Key properties:
- Configurable TTL with human-readable durations ("30m", "1h", "7d")
- Automatic LRU eviction when cache exceeds size limits
- Thread-safe with SQLite WAL mode and connection pooling
- Graceful degradation — cache errors never break pipelines
- Automatic corruption recovery
When to use it
- Iterative prompt development where you re-run the same extraction across the same data.
- Repeated batch processing where inputs don't change between runs.
- Cost-sensitive workloads where redundant LLM calls add up fast.
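For illustration, a minimal sketch of that iterative loop (the column name and TicketSchema are hypothetical):
```python
# First run: each row triggers a real LLM call; responses are cached.
extracted = df.semantic.extract("support_ticket", schema=TicketSchema)

# Re-running the identical extraction reuses the cached responses:
# requests with the same signature (model, messages, temperature, schema
# fingerprint) are served from SQLite until the configured TTL expires.
extracted_again = df.semantic.extract("support_ticket", schema=TicketSchema)
```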
New model support
Claude 4.5
claude-opus-4-5 and claude-haiku-4-5 with reasoning support and 200K context.
GPT-5.1
gpt-5.1 with 400K context. Reasoning can be fully disabled by setting reasoning_effort="none", allowing custom temperature control.
Gemini 3 Pro Preview
Uses a thinking_level parameter (HIGH/LOW) instead of token budgets, adds a media-resolution parameter for parse_pdf() (low/medium/high), and auto-creates "low" and "high" profiles.
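As a sketch of selecting one of those auto-created profiles, reusing the ModelAlias selector shown in the PDF parsing section below (the "gemini" alias is an assumption — use whatever alias the model is registered under in language_models):
```python
# "gemini" is a hypothetical alias; the "high" profile maps to thinking_level=HIGH.
df.semantic.parse_pdf(
    "pdf_column",
    model=ModelAlias(name="gemini", profile="high"),
)
```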
Array functions
15 new array manipulation functions matching PySpark semantics:
```python
import fenic.api.functions as fc

# Set operations
fc.arr.union("a", "b")
fc.arr.intersect("a", "b")
fc.arr.array_except("a", "b")

# Manipulation
fc.arr.distinct("nums")
fc.arr.sort("nums")
fc.arr.compact("nums")   # remove nulls
fc.arr.flatten("nested")
fc.arr.slice("nums", 0, 3)
fc.arr.remove("nums", 1)
fc.arr.reverse("nums")
fc.arr.repeat(fc.lit(0), 5)

# Access and query
fc.arr.element_at("nums", 2)
fc.arr.min("nums")
fc.arr.max("nums")
fc.arr.arrays_overlap("a", "b")
```
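These compose like any other column expressions. A sketch that cleans a tags array in one pass (column name hypothetical, assuming fenic's PySpark-style select and nested column arguments):
```python
# Drop nulls, deduplicate, then sort the remaining tags.
df.select(
    fc.arr.sort(fc.arr.distinct(fc.arr.compact("tags"))).alias("clean_tags")
)
```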
Regular expression functions
Five regex text functions following PySpark conventions. Patterns are validated at plan construction time using Rust regex syntax.
```python
import fenic.api.functions as fc

fc.text.regexp_count("text", r"\d+")            # count matches
fc.text.regexp_extract("text", r"(\d+)", 1)     # extract capture group
fc.text.regexp_extract_all("text", r"(\d+)", 1) # extract all matches
fc.text.regexp_instr("text", r"\d+", 0)         # 1-based position of match
fc.text.regexp_substr("text", r"\d+")           # first matching substring
```
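A short usage sketch (the column name and pattern are hypothetical):
```python
# Pull the first order ID out of each message and count all mentions.
df.select(
    fc.text.regexp_extract("body", r"ORD-(\d+)", 1).alias("first_order_id"),
    fc.text.regexp_count("body", r"ORD-\d+").alias("order_mentions"),
)
```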
Explode variants
PySpark-compatible outer explode semantics that preserve rows with null or empty arrays, plus position-aware variants:
```python
df = session.create_dataframe({
    "id": [1, 2, 3],
    "tags": [["red", "blue"], [], None],
})

# Regular explode — drops rows with empty/null arrays
df.explode("tags")           # id=1: red, blue

# Outer explode — preserves all rows, yields null for empty/null
df.explode_outer("tags")     # id=1: red, blue; id=2: null; id=3: null

# Position-aware explode
df.posexplode("tags")        # id=1: (0, red), (1, blue)

# Position-aware outer explode
df.posexplode_outer("tags")  # all rows, null position/value for empty/null
```
Distinct aggregation functions
Three new aggregation functions, plus a distinct() method on DataFrame as an alias for drop_duplicates().
```python
import fenic.api.functions as fc

fc.count_distinct("category")        # exact distinct count
fc.approx_count_distinct("category") # HyperLogLog++ approximation
fc.sum_distinct("value")             # sum of unique values
```
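In an aggregation these slot in as usual; a sketch assuming fenic's PySpark-style group_by/agg (column names hypothetical):
```python
df.group_by("region").agg(
    fc.count_distinct("category").alias("exact_categories"),
    fc.approx_count_distinct("category").alias("approx_categories"),
    fc.sum_distinct("value").alias("sum_of_unique_values"),
)
```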
Configurable timeouts for semantic operations
All semantic operators now accept an optional request_timeout parameter (in seconds). The default remains 120 seconds.
```python
# Long-running extraction with a generous timeout
df.semantic.extract(
    "long_document",
    schema=MySchema,
    request_timeout=300.0,  # 5 minutes
)

# Quick sentiment check with a tight timeout
df.semantic.analyze_sentiment("tweet", request_timeout=30.0)
```
Applies to: map, extract, predicate, reduce, classify, analyze_sentiment, summarize, parse_pdf, join, and sim_join.
Series support and parallel with_columns
Polars and pandas Series can now be passed directly to with_columns without manual conversion. Series length must match DataFrame height.
```python
import polars as pl
import pandas as pd
from fenic.api.functions import col

df.with_columns({
    "bonus": pl.Series([100, 200]),
    "score": pd.Series([85.5, 92.0]),
    "double_age": col("age") * 2,
})
```
Non-Column values are auto-wrapped with lit(). Also available as the PySpark-compatible withColumns().
PDF parsing expanded to OpenAI and OpenRouter
OpenAI models can now be used with semantic.parse_pdf(), with proper token counting and estimation for PDF files.
OpenRouter gains PDF parsing with three configurable parsing engines:
- native — use the model's built-in file processing
- mistralai-ocr — MistralAI OCR ($2/1000 pages)
- pdf-text — free text extraction for well-structured PDFs
```python
df.semantic.parse_pdf(
    "pdf_column",
    model=ModelAlias(name="openrouter", profile="ocr"),
)
```
A new evaluation harness (tools/eval_parse_pdf/) benchmarks PDF-to-Markdown conversion across multiple models, scoring on text fidelity (Levenshtein-based fuzzy matching) and document structure fidelity (F1 across headings, lists, tables, code blocks).
Fenic Agents for Claude Code
Two specialized Claude Code agents ship with the repo:
- Feature Developer Agent — guides implementation of new operations, expressions, and features following fenic's architecture patterns.
- PR Review Agent — reviews pull requests against team conventions and development guidelines.
A comprehensive development guide (.claude/AGENT_DEVELOPMENT_GUIDE.md) provides architecture walkthroughs, example implementations, and troubleshooting guidance.
Bug fixes
- OpenRouter API compatibility — fixed assumptions about model attributes returned by the API.
- Gemini Vertex 2.0 Flash profiles — these models don't support profiles; fixed accordingly.
- LanceDB pinned to avoid a breaking change in v0.25.0.
- Semantic parse temperature — the higher temperature is now applied only for Google models, not all providers.
- OpenRouter batch completions — fixed client breakage after LMRequestMessages changes.
- parse_pdf return type — now correctly returns a MarkdownType column.
- Embedded image links in 120-second demo notebooks fixed.
Try it out and tell us what you build
```
pip install --upgrade fenic
```
Read the latest docs at docs.fenic.ai. Questions or ideas — file an issue with a small reproduction case.
