Commit eb90eaa

Merge pull request #81 from microsoft/v1.9.5
2 parents 7089f2a + 067d2f5 commit eb90eaa

12 files changed: +1950 −277 lines changed

CHANGELOG.md

Lines changed: 130 additions & 0 deletions
# Changelog

## v1.9.5

### Materialized Lake View Support

#### New materialization: `materialized_lake_view`

dbt-fabricspark now supports [Materialized Lake Views](https://learn.microsoft.com/en-us/fabric/data-engineering/materialized-lake-views/materialized-lake-views) as a first-class materialization. MLVs are precomputed, incrementally maintained views in Fabric lakehouses that accelerate queries over Delta tables without manual refresh pipelines.

**Requirements:**

- Fabric Runtime 1.3+ (Apache Spark ≥ 3.5)
- Schema-enabled lakehouse

**Model configuration:**

```sql
{{ config(
    materialized='materialized_lake_view',
    database='my_lakehouse',
    schema='dbo',
    mlv_on_demand=true,
    mlv_schedule={
        "enabled": true,
        "configuration": {
            "startDateTime": "2026-04-10T00:00:00",
            "endDateTime": "2027-04-10T00:00:00",
            "localTimeZoneId": "Central Standard Time",
            "type": "Daily",
            "times": ["06:00"]
        }
    },
    mlv_comment='Customer summary refreshed daily',
    partitioned_by=['region'],
    mlv_constraints=[
        {"name": "amount_positive", "expression": "amount > 0", "on_mismatch": "DROP"}
    ],
    tblproperties={"delta.autoOptimize.optimizeWrite": "true"}
) }}

select * from {{ ref('orders') }}
```
**Config options:**

| Option | Type | Required | Description |
|---|---|---|---|
| `mlv_on_demand` | bool | At least one of `mlv_on_demand` or `mlv_schedule` | Trigger an immediate refresh after creation |
| `mlv_schedule` | dict | At least one of `mlv_on_demand` or `mlv_schedule` | Schedule config for periodic refresh. Must include `endDateTime` |
| `mlv_comment` | string | No | Description added to the view |
| `partitioned_by` | list | No | Partition columns |
| `mlv_constraints` | list | No | CHECK constraints with optional `on_mismatch` (DROP or FAIL) |
| `tblproperties` | dict | No | Delta table properties |

---
#### Automatic Change Data Feed (CDF) enablement

MLVs require Change Data Feed on all upstream Delta tables. The adapter automatically enables CDF on every source table before creating the view:

```sql
ALTER TABLE <source> SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
```

This is always-on and not user-configurable.

---

#### On-demand refresh with job polling
When `mlv_on_demand: true`, the adapter triggers an immediate refresh via the Fabric Job Scheduler API and polls until the job reaches a terminal status:

1. `POST .../jobs/RefreshMaterializedLakeViews/instances` → 202 Accepted
2. Extract the job instance ID from the `Location` header
3. Poll `GET .../jobs/instances/{jobInstanceId}` at the `poll_statement_wait` interval (default: 5s)
4. Wait up to `statement_timeout` (default: 3600s)
5. Return on `Completed`; raise `MLVApiError` on `Failed`, `Cancelled`, or `Deduped`

Terminal statuses follow the Fabric `ItemJobStatus` enum: `NotStarted`, `InProgress`, `Completed`, `Failed`, `Cancelled`, `Deduped`.
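The polling loop above can be sketched as follows. This is an illustrative sketch only, not the adapter's actual code: `get_job_status` stands in for the real HTTP helper that issues the `GET .../jobs/instances/{jobInstanceId}` call.

```python
import time

# Terminal values of the Fabric ItemJobStatus enum, per the list above.
TERMINAL = {"Completed", "Failed", "Cancelled", "Deduped"}

class MLVApiError(RuntimeError):
    """Stand-in for the adapter's MLVApiError (a DbtRuntimeError subclass)."""

def wait_for_refresh(get_job_status, poll_statement_wait=5, statement_timeout=3600,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until the refresh job reaches a terminal status or the timeout expires."""
    deadline = clock() + statement_timeout
    while clock() < deadline:
        status = get_job_status()  # GET .../jobs/instances/{jobInstanceId}
        if status in TERMINAL:
            if status == "Completed":
                return status
            raise MLVApiError(f"MLV refresh ended with status {status!r}")
        sleep(poll_statement_wait)  # default: 5s between polls
    raise MLVApiError(f"MLV refresh did not finish within {statement_timeout}s")
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting.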
80+
81+
---
82+
83+
#### Schedule management (create / update / delete)
84+
85+
When `mlv_schedule` is provided, the adapter creates or updates a refresh schedule via the Fabric REST API. The operation is idempotent — if a schedule already exists, it is updated in place.
86+
87+
Supported schedule types:
88+
- **Cron**`interval` in minutes
89+
- **Daily** — list of `times` (e.g., `["06:00", "18:00"]`)
90+
- **Weekly**`weekdays` and `times`
91+
92+
The `endDateTime` field is mandatory in the schedule configuration. The adapter validates its presence before calling the API and raises a clear error if missing.
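The `endDateTime` check can be pictured as a small guard run before any API call (a sketch; the adapter's real function and error type differ):

```python
def validate_schedule(mlv_schedule: dict) -> dict:
    """Ensure the schedule config carries the mandatory endDateTime field."""
    config = mlv_schedule.get("configuration") or {}
    if "endDateTime" not in config:
        raise ValueError(
            "mlv_schedule.configuration.endDateTime is required; "
            "the adapter refuses to call the Fabric API without it"
        )
    return mlv_schedule
```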
---

#### Automatic lakehouse ID resolution

The adapter resolves the lakehouse name (from `database` config or `target.lakehouse`) to a lakehouse ID automatically via `GET /v1/workspaces/{workspaceId}/lakehouses`. Results are cached per workspace for the duration of the run. No manual `mlv_lakehouse_id` configuration is required.
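The name-to-ID resolution with per-workspace caching can be sketched like this. It is illustrative only: `list_lakehouses` stands in for the `GET /v1/workspaces/{workspaceId}/lakehouses` call, and the `displayName`/`id` keys reflect the Fabric list-items response shape.

```python
# One cache entry per workspace, kept for the duration of the run.
_lakehouse_cache: dict = {}

def resolve_lakehouse_id(workspace_id, lakehouse_name, list_lakehouses):
    """Resolve a lakehouse display name to its ID, hitting the API once per workspace."""
    if workspace_id not in _lakehouse_cache:
        items = list_lakehouses(workspace_id)  # GET .../lakehouses
        _lakehouse_cache[workspace_id] = {i["displayName"]: i["id"] for i in items}
    try:
        return _lakehouse_cache[workspace_id][lakehouse_name]
    except KeyError:
        raise ValueError(
            f"Lakehouse {lakehouse_name!r} not found in workspace {workspace_id}"
        )
```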
---

#### Preflight validation (connection open)

MLV prerequisites are validated eagerly at connection open time (after Spark version detection). The adapter checks:

1. **Not running in local/Docker mode** — MLV requires Fabric Runtime
2. **Spark version ≥ 3.5** — checked via `SELECT split(version(), ' ')[0]`
3. **Schema-enabled lakehouse** — detected automatically on connection open

If any check fails, a warning is logged immediately and the error is cached. When an MLV model executes, it reads the cached error and fails instantly with a clear message — no time is wasted running models that cannot succeed. Non-MLV projects are completely unaffected.
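The "validate once, fail fast" pattern described above amounts to caching the first failed check and replaying it when an MLV model runs. A hypothetical sketch (class and message strings are not the adapter's real ones):

```python
class MLVPreflight:
    """Run MLV prerequisite checks once at connection open; cache any failure."""

    def __init__(self):
        self.error = None

    def run_checks(self, livy_mode, spark_version, schema_enabled):
        if livy_mode == "local":
            self.error = "MLV requires Fabric Runtime (not local/Docker mode)"
        elif tuple(int(p) for p in spark_version.split(".")[:2]) < (3, 5):
            self.error = f"MLV requires Spark >= 3.5 (found {spark_version})"
        elif not schema_enabled:
            self.error = "MLV requires a schema-enabled lakehouse"

    def assert_ready(self):
        # Called by the MLV materialization: fail instantly on the cached error.
        if self.error:
            raise RuntimeError(self.error)
```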
---

#### Delta source validation

At model execution time (before `CREATE OR REPLACE`), the adapter checks that all upstream tables referenced by the MLV are Delta format. Non-Delta sources (e.g., views, CSV tables) cause an immediate model failure with a descriptive error.
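One way such a check could work — an assumption on my part, not necessarily the adapter's implementation — is via Delta Lake's `DESCRIBE DETAIL`, whose `format` field is `"delta"` for Delta tables and which errors on views:

```python
def assert_delta_sources(run_sql, sources):
    """Fail fast if any upstream table is not Delta format (sketch only)."""
    non_delta = []
    for table in sources:
        try:
            # DESCRIBE DETAIL returns one row; its `format` field names the provider.
            fmt = run_sql(f"DESCRIBE DETAIL {table}")[0]["format"]
        except Exception:
            fmt = None  # views and missing tables make DESCRIBE DETAIL fail
        if fmt != "delta":
            non_delta.append(table)
    if non_delta:
        raise ValueError(f"MLV sources must be Delta tables; offending: {non_delta}")
```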
---

#### REST API error handling with retries

All Fabric REST API calls use automatic retries with exponential backoff:

- **3 attempts** per operation
- **Exponential backoff:** 2s, 4s, 8s between retries
- **Retryable:** HTTP 429, 500, 502, 503, 504, connection errors, timeouts
- **Non-retryable:** HTTP 4xx client errors (except 429)

Errors surface as `MLVApiError` (extends `DbtRuntimeError`) with the operation name, HTTP status, and parsed Fabric error details. Failed API calls always fail the model.
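The retry policy above can be sketched as a small wrapper. Illustrative only — the helper and exception names are stand-ins, not the adapter's real API:

```python
import time

RETRYABLE_STATUS = {429, 500, 502, 503, 504}

class TransientError(Exception):
    """Stand-in HTTP failure; status is None for connection errors/timeouts."""
    def __init__(self, status=None):
        super().__init__(f"HTTP {status}" if status else "connection error")
        self.status = status

def call_with_retries(request, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Retry transient failures with doubling backoff; 4xx (except 429) fail at once."""
    for attempt in range(1, attempts + 1):
        try:
            return request()
        except TransientError as exc:
            retryable = exc.status is None or exc.status in RETRYABLE_STATUS
            if not retryable or attempt == attempts:
                raise  # non-retryable client error, or retries exhausted
            sleep(base_delay * 2 ** (attempt - 1))  # 2s, 4s, 8s, ... doubling
```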
---

## v1.9.3

### Session Lifecycle & Stability

README.md

Lines changed: 208 additions & 2 deletions

The `dbt-fabricspark` package contains all of the code enabling dbt to work with Apache Spark in Microsoft Fabric. This adapter connects to Fabric Lakehouses via Livy endpoints and supports both **schema-enabled** and **non-schema** Lakehouse configurations.

### Key Features

- **Livy session management** with session reuse across dbt runs
Connect to Apache Spark in Microsoft Fabric through a Livy endpoint, configured in your `profiles.yml`.

### Connection Modes

The adapter supports two connection modes via the `livy_mode` setting:

- **Local mode** (`livy_mode: local`) — Connects to a self-hosted Spark instance running in a Docker container (contributed by @mdrakiburrahman). This mode supports the `reuse_session` flag and does not require Fabric compute, making it ideal for offline development and testing.

- **Fabric mode** (`livy_mode: fabric`, default) — Connects to Apache Spark in Microsoft Fabric via the Fabric Livy API. For development workflows, enable `reuse_session: true` to persist the Livy session ID to a local file (configured via `session_id_file`, default `./livy-session-id.txt`). On subsequent `dbt` runs, the adapter reuses the existing session from the persisted file instead of creating a new one. If the file does not exist or the session has expired, a new session is created automatically.
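The session-reuse flow described above can be sketched as follows. This is a hypothetical outline: `create_session` and `session_is_alive` stand in for the adapter's real Livy API calls.

```python
from pathlib import Path

def get_or_create_session(create_session, session_is_alive,
                          session_id_file="./livy-session-id.txt"):
    """Reuse the persisted Livy session if it is still alive; else create a new one."""
    path = Path(session_id_file)
    if path.exists():
        session_id = path.read_text().strip()
        if session_id and session_is_alive(session_id):
            return session_id          # reuse the session from the previous run
    session_id = create_session()      # no file, or session expired: start fresh
    path.write_text(session_id)        # persist for the next dbt run
    return session_id
```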
### Lakehouse without Schema

For standard Lakehouses (schema not enabled), use two-part naming. The `schema` field is set to the lakehouse name:

| Authentication method | Value | Use case | Required fields |
|---|---|---|---|
| **Service Principal** | `SPN` | CI/CD and automation. Uses Azure AD app registration. | `client_id`, `tenant_id`, `client_secret` |
| **Fabric Notebook** | `fabric_notebook` | Running dbt inside a Fabric notebook. Uses `notebookutils.credentials`. | None (runs in Fabric runtime) |

### Materialized Lake Views

[Materialized lake views](https://learn.microsoft.com/en-us/fabric/data-engineering/materialized-lake-views/overview-materialized-lake-view) are a Fabric-native construct that materializes a SQL query as a Delta table in your lakehouse, with automatic lineage-based refresh managed by Fabric.

#### Prerequisites

- Schema-enabled lakehouse
- Fabric Runtime 1.3+
- Source tables must be Delta tables
#### Basic Usage

```sql
-- models/silver/silver_cleaned_orders.sql
{{ config(
    materialized='materialized_lake_view',
    database='silver',
    schema='dbo'
) }}

SELECT
    o.order_id,
    o.product_id,
    p.product_name,
    o.quantity,
    p.price,
    o.quantity * p.price AS revenue
FROM {{ ref('bronze_orders') }} o
JOIN {{ ref('bronze_products') }} p
    ON o.product_id = p.product_id
```
#### Configuration Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `materialized` | string | (required) | Must be `'materialized_lake_view'` |
| `database` | string | target lakehouse | Target lakehouse for cross-lakehouse writes |
| `schema` | string | target schema | Target schema within the lakehouse |
| `partitioned_by` | list | none | Columns to partition the MLV by |
| `mlv_comment` | string | none | Description stored with the MLV definition |
| `mlv_constraints` | list | `[]` | Data quality constraints (see below) |
| `tblproperties` | dict | none | Key-value metadata properties |
| `enable_cdf` | bool | `true` | Auto-enable Change Data Feed on source tables |
| `mlv_on_demand` | bool | `false` | Trigger immediate refresh after creation |
| `mlv_schedule` | dict | none | Schedule config for periodic refresh (see below) |
#### Data Quality Constraints

```sql
{{ config(
    materialized='materialized_lake_view',
    mlv_constraints=[
        {"name": "valid_quantity", "expression": "quantity > 0", "on_mismatch": "DROP"},
        {"name": "valid_price", "expression": "price >= 0", "on_mismatch": "FAIL"}
    ]
) }}
```

Each constraint has:
- `name` — Constraint identifier
- `expression` — Boolean expression each row must satisfy
- `on_mismatch` — `DROP` (silently remove violating rows) or `FAIL` (stop the refresh with an error; default)
#### Change Data Feed

The adapter automatically enables [Change Data Feed](https://learn.microsoft.com/en-us/azure/databricks/delta/delta-change-data-feed) (CDF) on all upstream source tables referenced via `ref()` before creating the MLV. This enables [optimal incremental refresh](https://learn.microsoft.com/en-us/fabric/data-engineering/materialized-lake-views/refresh-materialized-lake-view). To disable:

```sql
{{ config(
    materialized='materialized_lake_view',
    enable_cdf=false
) }}
```
#### On-Demand Refresh

Trigger an immediate MLV lineage refresh after creation:

```sql
{{ config(
    materialized='materialized_lake_view',
    mlv_on_demand=true
) }}
```

This calls the Fabric Job Scheduler API:

```
POST /v1/workspaces/{workspaceId}/lakehouses/{lakehouseId}/jobs/RefreshMaterializedLakeViews/instances
```
#### Scheduled Refresh

Create or update a periodic refresh schedule. The adapter uses the [Fabric Job Scheduler API](https://learn.microsoft.com/en-us/fabric/data-engineering/materialized-lake-views/materialized-lake-views-public-api) to manage schedules. Only one active schedule per lakehouse lineage is supported — the adapter automatically updates an existing schedule if one is found.

**Cron schedule** (interval in minutes):

```sql
{{ config(
    materialized='materialized_lake_view',
    mlv_schedule={
        "enabled": true,
        "configuration": {
            "startDateTime": "2026-04-10T00:00:00",
            "endDateTime": "2026-12-31T23:59:59",
            "localTimeZoneId": "Central Standard Time",
            "type": "Cron",
            "interval": 10
        }
    }
) }}
```

**Daily schedule** (specific times):

```sql
{{ config(
    materialized='materialized_lake_view',
    mlv_schedule={
        "enabled": true,
        "configuration": {
            "startDateTime": "2026-04-10T00:00:00",
            "endDateTime": "2026-12-31T23:59:59",
            "localTimeZoneId": "Central Standard Time",
            "type": "Daily",
            "times": ["06:00", "18:00"]
        }
    }
) }}
```

**Weekly schedule** (specific days and times):

```sql
{{ config(
    materialized='materialized_lake_view',
    mlv_schedule={
        "enabled": true,
        "configuration": {
            "startDateTime": "2026-04-10T00:00:00",
            "endDateTime": "2026-12-31T23:59:59",
            "localTimeZoneId": "Central Standard Time",
            "type": "Weekly",
            "weekdays": ["Monday", "Wednesday", "Friday"],
            "times": ["08:00"]
        }
    }
) }}
```
#### Full Example with All Options

```sql
{{ config(
    materialized='materialized_lake_view',
    database='gold',
    schema='dbo',
    partitioned_by=['product_type'],
    mlv_comment='Product sales summary with quality checks',
    mlv_constraints=[
        {"name": "positive_revenue", "expression": "total_revenue >= 0", "on_mismatch": "DROP"}
    ],
    tblproperties={"quality_tier": "gold"},
    enable_cdf=true,
    mlv_on_demand=true
) }}

SELECT
    product_id,
    product_name,
    product_type,
    SUM(quantity) AS total_quantity_sold,
    SUM(revenue) AS total_revenue
FROM {{ ref('silver_order_items') }}
GROUP BY product_id, product_name, product_type
```

Generated SQL:

```sql
CREATE OR REPLACE MATERIALIZED LAKE VIEW gold.dbo.product_sales_summary
(
    CONSTRAINT positive_revenue CHECK (total_revenue >= 0) ON MISMATCH DROP
)
PARTITIONED BY (product_type)
COMMENT 'Product sales summary with quality checks'
TBLPROPERTIES ("quality_tier"="gold")
AS
SELECT ...
```
#### Limitations

- **No ALTER on definition** — Changing the SELECT query, constraints, or partitioning requires drop + recreate. The adapter uses `CREATE OR REPLACE`, which handles this automatically.
- **Only RENAME via ALTER** — `ALTER MATERIALIZED LAKE VIEW ... RENAME TO ...` is the only supported ALTER operation.
- **No DML** — `INSERT`, `UPDATE`, `DELETE` are not supported on MLVs.
- **No UDFs** — User-defined functions are not supported in the SELECT query.
- **No time travel** — `VERSION AS OF` / `TIMESTAMP AS OF` syntax is not supported.
- **No temp views as sources** — The SELECT query can reference tables and other MLVs, but not temporary views.
- **Schedule is per-lakehouse** — One active schedule per lakehouse lineage, not per MLV.
## Reporting bugs and contributing code

- Want to report a bug or request a feature? Let us know on [Slack](http://community.getdbt.com/), or open [an issue](https://github.com/microsoft/dbt-fabricspark/issues/new)

Lines changed: 1 addition & 1 deletion

```diff
- version = "1.9.4"
+ version = "1.9.5"
```