databricks-apps
Public repository: databricks/databricks-agent-skills
Version: 0.1.0
Requires: databricks CLI (>= v0.292.0)
Security scan: Low Risk; no security issues found. Note: the skill manifest lacks a 'license' field; adding one to the SKILL.md frontmatter (e.g. MIT, Apache-2.0) would clarify usage terms.

Description: Build apps on the Databricks Apps platform. Use when asked to create dashboards, data apps, analytics tools, or visualizations. Invoke BEFORE starting implementation.

Skill Files
# Databricks Apps Development
**FIRST**: Use the parent `databricks` skill for CLI basics, authentication, and profile selection.
Build apps that deploy to the Databricks Apps platform.
## Required Reading by Phase
| Phase | READ BEFORE proceeding |
|-------|------------------------|
| Scaffolding | Parent `databricks` skill (auth, warehouse discovery); run `databricks apps manifest` and use its plugins/resources to build `databricks apps init` with `--features` and `--set` (see AppKit section below) |
| Writing SQL queries | [SQL Queries Guide](references/appkit/sql-queries.md) |
| Writing UI components | [Frontend Guide](references/appkit/frontend.md) |
| Using `useAnalyticsQuery` | [AppKit SDK](references/appkit/appkit-sdk.md) |
| Adding API endpoints | [tRPC Guide](references/appkit/trpc.md) |
| Using Lakebase (OLTP database) | [Lakebase Guide](references/appkit/lakebase.md) |
## Generic Guidelines
These apply regardless of framework:
- **Deployment**: `databricks apps deploy --profile <PROFILE>` (⚠️ USER CONSENT REQUIRED)
- **Validation**: `databricks apps validate --profile <PROFILE>` before deploying
- **App name**: Must be ≤26 characters, lowercase letters/numbers/hyphens only (no underscores). The `dev-` prefix adds 4 characters, for a maximum of 30 total.
- **Smoke tests**: ALWAYS update `tests/smoke.spec.ts` selectors BEFORE running validation. Default template checks for "Minimal Databricks App" heading and "hello world" text — these WILL fail in your custom app. See [testing guide](references/testing.md).
- **Authentication**: covered by parent `databricks` skill
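The app-name rules above can be checked with a quick helper (a hypothetical sketch for illustration, not part of the Databricks CLI):

```typescript
// Hypothetical validator for the app-name rules above:
// lowercase letters/numbers/hyphens only, ≤26 chars, so that the
// 4-char `dev-` prefix keeps the total within the 30-char limit.
function isValidAppName(name: string): boolean {
  return /^[a-z0-9-]+$/.test(name) && name.length <= 26;
}
```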
## Project Structure (after `databricks apps init --features analytics`)
- `client/src/App.tsx` — main React component (start here)
- `config/queries/*.sql` — SQL query files (queryKey = filename without .sql)
- `server/server.ts` — backend entry (tRPC routers)
- `tests/smoke.spec.ts` — smoke test (⚠️ MUST UPDATE selectors for your app)
- `client/src/appKitTypes.d.ts` — auto-generated types (`npm run typegen`)
## Project Structure (after `databricks apps init --features lakebase`)
- `server/server.ts` — backend with Lakebase pool + tRPC routes
- `client/src/App.tsx` — React frontend
- `app.yaml` — manifest with `database` resource declaration
- `package.json` — includes `@databricks/lakebase` dependency
- Note: **No `config/queries/`** — Lakebase apps use `pool.query()` in tRPC, not SQL files
## Data Discovery
Before writing any SQL, use the parent `databricks` skill for data exploration — search `information_schema` by keyword, then batch `discover-schema` for the tables you need. Do NOT skip this step.
## Development Workflow (FOLLOW THIS ORDER)
**Analytics apps** (`--features analytics`):
1. Create SQL files in `config/queries/`
2. Run `npm run typegen` — verify all queries show ✓
3. Read `client/src/appKitTypes.d.ts` to see generated types
4. **THEN** write `App.tsx` using the generated types
5. Update `tests/smoke.spec.ts` selectors
6. Run `databricks apps validate --profile <PROFILE>`
**DO NOT** write UI code before running typegen — types won't exist and you'll waste time on compilation errors.
**Lakebase apps** (`--features lakebase`): No SQL files or typegen. See [Lakebase Guide](references/appkit/lakebase.md) for the tRPC pattern: initialize schema at startup, write procedures in `server/server.ts`, then build the React frontend.
## When to Use What
- **Read analytics data → display in chart/table**: Use visualization components with `queryKey` prop
- **Read analytics data → custom display (KPIs, cards)**: Use `useAnalyticsQuery` hook
- **Read analytics data → need computation before display**: Still use `useAnalyticsQuery`, transform client-side
- **Read/write persistent data (users, orders, CRUD state)**: Use Lakebase pool via tRPC — see [Lakebase Guide](references/appkit/lakebase.md)
- **Call ML model endpoint**: Use tRPC
- **⚠️ NEVER use tRPC to run SELECT queries against the warehouse** — always use SQL files in `config/queries/`
- **⚠️ NEVER use `useAnalyticsQuery` for Lakebase data** — it queries the SQL warehouse only
## Frameworks
### AppKit (Recommended)
TypeScript/React framework with type-safe SQL queries and built-in components.
**Official Documentation** — the source of truth for all API details:
```bash
npx @databricks/appkit docs # ← ALWAYS start here to see available pages
npx @databricks/appkit docs <query> # view a section by name or doc path
npx @databricks/appkit docs --full # full index with all API entries
npx @databricks/appkit docs "appkit-ui API reference" # example: section by name
npx @databricks/appkit docs ./docs/plugins/analytics.md # example: specific doc file
```
**DO NOT guess doc paths.** Run without args first, pick from the index. The `<query>` argument accepts both section names (from the index) and file paths. Docs are the authority on component props, hook signatures, and server APIs — skill files only cover anti-patterns and gotchas.
**App Manifest and Scaffolding**
**Agent workflow for scaffolding: get the manifest first, then build the init command.**
1. **Get the manifest** (JSON schema describing plugins and their resources):
```bash
databricks apps manifest --profile <PROFILE>
# Custom template:
databricks apps manifest --template <GIT_URL> --profile <PROFILE>
```
The output defines:
- **Plugins**: each has a key (plugin ID for `--features`), plus `requiredByTemplate`, and `resources`.
- **requiredByTemplate**: If **true**, that plugin is **mandatory** for this template — do **not** add it to `--features` (it is included automatically); you must still supply all of its required resources via `--set`. If **false** or absent, the plugin is **optional** — add it to `--features` only when the user's prompt indicates they want that capability (e.g. analytics/SQL), and then supply its required resources via `--set`.
- **Resources**: Each plugin has `resources.required` and `resources.optional` (arrays). Each item has `resourceKey` and `fields` (object: field name → description/env). Use `--set <plugin>.<resourceKey>.<field>=<value>` for each required resource field of every plugin you include.
2. **Scaffold** (DO NOT use `npx`; use the CLI only):
```bash
databricks apps init --name <NAME> --features <plugin1>,<plugin2> \
--set <plugin1>.<resourceKey>.<field>=<value> \
--set <plugin2>.<resourceKey>.<field>=<value> \
--description "<DESC>" --run none --profile <PROFILE>
# --run none: skip auto-run after scaffolding (review code first)
# With custom template:
databricks apps init --template <GIT_URL> --name <NAME> --features ... --set ... --profile <PROFILE>
```
- **Required**: `--name`, `--profile`. Name: ≤26 chars, lowercase letters/numbers/hyphens only. Use `--features` only for **optional** plugins the user wants (plugins with `requiredByTemplate: false` or absent); mandatory plugins must not be listed in `--features`.
- **Resources**: Pass `--set` for every required resource (each field in `resources.required`) for (1) all plugins with `requiredByTemplate: true`, and (2) any optional plugins you added to `--features`. Add `--set` for `resources.optional` only when the user requests them.
- **Discovery**: Use the parent `databricks` skill to resolve IDs (e.g. warehouse: `databricks warehouses list --profile <PROFILE>` or `databricks experimental aitools tools get-default-warehouse --profile <PROFILE>`).
**DO NOT guess** plugin names, resource keys, or property names — always derive them from `databricks apps manifest` output. Example: if the manifest shows plugin `analytics` with a required resource `resourceKey: "sql-warehouse"` and `fields: { "id": ... }`, include `--set analytics.sql-warehouse.id=<ID>`.
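The manifest-to-flags mapping described above can be sketched as a pure function. This is an illustrative sketch only: the manifest shape here is an assumption based on the description above, so verify it against real `databricks apps manifest` output before relying on it.

```typescript
// Assumed (simplified) manifest shape: plugins → resources.required → fields.
interface ManifestPlugin {
  requiredByTemplate?: boolean;
  resources: { required: { resourceKey: string; fields: Record<string, string> }[] };
}

// Builds --set flags for every required resource field of every included
// plugin: mandatory plugins (requiredByTemplate: true) plus any optional
// plugins the user asked for. `values` maps "<plugin>.<resourceKey>.<field>"
// to IDs resolved during discovery (e.g. a warehouse ID).
function buildSetFlags(
  plugins: Record<string, ManifestPlugin>,
  selectedOptional: string[],
  values: Record<string, string>,
): string[] {
  const flags: string[] = [];
  for (const [id, plugin] of Object.entries(plugins)) {
    const included = plugin.requiredByTemplate === true || selectedOptional.includes(id);
    if (!included) continue;
    for (const res of plugin.resources.required) {
      for (const field of Object.keys(res.fields)) {
        const key = `${id}.${res.resourceKey}.${field}`;
        flags.push(`--set ${key}=${values[key] ?? `<${key}>`}`);
      }
    }
  }
  return flags;
}
```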
**READ [AppKit Overview](references/appkit/overview.md)** for project structure, workflow, and pre-implementation checklist.
### Common Scaffolding Mistakes
```bash
# ❌ WRONG: name is NOT a positional argument
databricks apps init --features analytics my-app-name
# → "unknown command" error
# ✅ CORRECT: use --name flag
databricks apps init --name my-app-name --features analytics --set "..." --profile <PROFILE>
```
### Directory Naming
`databricks apps init` creates directories in kebab-case matching the app name.
App names must be lowercase with hyphens only (≤26 chars).
### Other Frameworks
Databricks Apps supports any framework that can run as a web server (Flask, FastAPI, Streamlit, Gradio, etc.). Use the framework's standard documentation; this skill focuses on AppKit.
# Databricks AppKit SDK
## TypeScript Import Rules
This template uses strict TypeScript settings with `verbatimModuleSyntax: true`. **Always use `import type` for type-only imports**.
The template also enforces `noUnusedLocals`: remove unused imports immediately or the build fails.
```typescript
// ✅ CORRECT - use import type for types
import type { MyInterface, MyType } from './types';
// ❌ WRONG - will fail compilation
import { MyInterface, MyType } from './types';
```
## Server Setup
For server configuration, see: `npx @databricks/appkit docs ./docs/plugins.md`
## useAnalyticsQuery Hook
**ONLY use when displaying data in a custom way that isn't a chart or table.** For charts/tables, pass `queryKey` directly to the component — don't double-fetch. Charts also accept a `format` option (`"json"` | `"arrow"` | `"auto"`, default `"auto"`) to control the data transfer format.
Use cases:
- Custom HTML layouts (cards, lists, grids)
- Summary statistics and KPIs
- Conditional rendering based on data values
- Data that needs transformation before display
### ⚠️ Memoize Parameters to Prevent Infinite Loops
```typescript
// ❌ WRONG - creates new object every render → infinite refetch loop
const { data } = useAnalyticsQuery('query', { id: sql.string(selectedId) });
// ✅ CORRECT - memoize parameters
const params = useMemo(() => ({ id: sql.string(selectedId) }), [selectedId]);
const { data } = useAnalyticsQuery('query', params);
```
### Conditional Queries
```typescript
// ❌ WRONG - `enabled` is NOT a valid option (this is a React Query pattern)
const { data } = useAnalyticsQuery('query', params, { enabled: !!selectedId });
// ✅ CORRECT - use autoStart: false
const { data } = useAnalyticsQuery('query', params, { autoStart: false });
// ✅ ALSO CORRECT - conditional rendering (component only mounts when data exists)
{selectedId && <DetailsComponent id={selectedId} />}
```
### Type Inference
When `appKitTypes.d.ts` has been generated (via `npm run typegen`), types are inferred automatically:
```typescript
// ✅ After typegen - types are automatic, no generic needed
const { data } = useAnalyticsQuery('my_query', params);
// ⚠️ Before typegen - data is `unknown`, you must provide type manually
const { data } = useAnalyticsQuery<MyRow[]>('my_query', params);
```
**Common mistake** — don't define interfaces that duplicate generated types:
```typescript
// ❌ WRONG - manual interface may conflict with generated QueryRegistry
interface MyData { id: string; value: number; }
const { data } = useAnalyticsQuery<MyData[]>('my_query', params);
// ✅ CORRECT - run `npm run typegen` and let it provide types
const { data } = useAnalyticsQuery('my_query', params);
```
### Basic Usage
```typescript
import { useAnalyticsQuery, Skeleton } from '@databricks/appkit-ui/react';
import { sql } from '@databricks/appkit-ui/js';
import { useMemo } from 'react';
function CustomDisplay() {
const params = useMemo(() => ({
start_date: sql.date('2024-01-01'),
category: sql.string("tools")
}), []);
const { data, loading, error } = useAnalyticsQuery('query_name', params);
if (loading) return <Skeleton className="h-4 w-3/4" />;
if (error) return <div className="text-destructive">Error: {error}</div>;
if (!data) return null;
return (
<div className="grid gap-4">
{data.map(row => (
<div key={row.column_name} className="p-4 border rounded">
<h3>{row.column_name}</h3>
<p>{Number(row.value).toFixed(2)}</p>
</div>
))}
</div>
);
}
```
# Frontend Guidelines
**For full component API**: run `npx @databricks/appkit docs` and navigate to the component you need.
## Common Anti-Patterns
These mistakes appear frequently — check the official docs for actual prop names:
| Mistake | Why it's wrong | What to do |
|---------|---------------|------------|
| `xAxisKey`, `dataKey` on charts | Recharts naming, not AppKit | Use `xKey`, `yKey` (auto-detected from schema if omitted) |
| `yAxisKeys`, `yKeys` on charts | Recharts naming | Use `yKey` (string or string[]) |
| `config` on charts | Not a valid prop name | Use `options` for ECharts overrides |
| `<XAxis>`, `<YAxis>` children | AppKit charts are ECharts-based, NOT Recharts wrappers | Configure via props only (`xKey`, `yKey`, `options`) |
| `columns` on DataTable | DataTable auto-generates columns from data | Use `queryKey` + `parameters`; use `transform` for formatting |
| Double-fetching with `useAnalyticsQuery` + chart component | Components handle their own fetching | Just pass `queryKey` to the component |
**Always verify props against docs before using a component.**
## Chart Data Modes
All chart/data components support two modes:
- **Query mode**: pass `queryKey` + `parameters` — component fetches data automatically. `parameters` is REQUIRED even if empty (`parameters={{}}`).
- **Data mode**: pass static data via `data` prop (JSON array or Arrow Table) — no `queryKey`/`parameters` needed.
```tsx
// Query mode (recommended for Databricks SQL)
<BarChart queryKey="sales_by_region" parameters={{}} />
// Data mode (static/pre-fetched data)
<BarChart data={myData} xKey="category" yKey="count" />
```
## Chart Props Quick Reference
All charts accept these core props (verify full list via `npx @databricks/appkit docs`):
```tsx
<BarChart
queryKey="sales_by_region" // SQL query filename without .sql
parameters={{}} // query params — REQUIRED in query mode, even if empty
xKey="region" // X axis field (auto-detected from schema if omitted)
yKey="revenue" // Y axis field(s) — string or string[] (auto-detected if omitted)
format="auto" // "json" | "arrow" | "auto" (default: "auto")
transformer={(d) => d} // transform raw data before rendering
colors={['#40d1f5']} // custom colors (overrides colorPalette)
colorPalette="categorical" // "categorical" | "sequential" | "diverging"
title="Sales by Region" // chart title
showLegend // show legend
options={{}} // additional ECharts options to merge
height={400} // default: 300
orientation="vertical" // "vertical" | "horizontal" (BarChart/LineChart/AreaChart)
stacked // stack bars/areas (BarChart/AreaChart)
/>
<LineChart queryKey="monthly_trend" parameters={{}} xKey="month" yKey={["revenue", "expenses"]}
smooth showSymbol={false} />
```
Charts are **ECharts-based** — configure via props, not Recharts-style children. Components handle data fetching, loading, and error states internally.
> ⚠️ **`parameters` is REQUIRED on all data components**, even when the query has no params. Always include `parameters={{}}`.
```typescript
// ❌ Don't double-fetch
const { data } = useAnalyticsQuery('sales_data', {});
return <BarChart queryKey="sales_data" parameters={{}} />; // fetches again!
```
## DataTable
DataTable auto-generates columns from data and handles fetching, loading, error, and empty states.
**For full props**: `npx @databricks/appkit docs "DataTable"`.
```tsx
// ❌ WRONG - missing required `parameters` prop
<DataTable queryKey="my_query" />
// ✅ CORRECT - minimal
<DataTable queryKey="my_query" parameters={{}} />
// ✅ CORRECT - with filtering and pagination
<DataTable
queryKey="my_query"
parameters={{}}
filterColumn="name"
filterPlaceholder="Filter by name..."
pageSize={10}
pageSizeOptions={[10, 25, 50]}
/>
// ✅ CORRECT - with row selection
<DataTable
queryKey="my_query"
parameters={{}}
enableRowSelection
onRowSelectionChange={(selection) => console.log(selection)}
/>
```
**Custom column formatting** — use the `transform` prop or format in SQL:
```typescript
<DataTable
queryKey="products"
parameters={{}}
transform={(data) => data.map(row => ({
...row,
price: `$${Number(row.price).toFixed(2)}`,
}))}
/>
```
## Available Components (Quick Reference)
**For full prop details**: `npx @databricks/appkit docs "appkit-ui API reference"`.
All data components support both query mode (`queryKey` + `parameters`) and data mode (static `data` prop). Common props across all charts: `format`, `transformer`, `colors`, `colorPalette`, `title`, `showLegend`, `height`, `options`, `ariaLabel`, `testId`.
### Data Components (`@databricks/appkit-ui/react`)
| Component | Extra Props | Use For |
|-----------|-------------|---------|
| `BarChart` | `xKey`, `yKey`, `orientation`, `stacked` | Categorical comparisons |
| `LineChart` | `xKey`, `yKey`, `smooth`, `showSymbol`, `orientation` | Time series, trends |
| `AreaChart` | `xKey`, `yKey`, `smooth`, `showSymbol`, `stacked`, `orientation` | Cumulative/stacked trends |
| `PieChart` | `xKey`, `yKey`, `innerRadius`, `showLabels`, `labelPosition` | Part-of-whole |
| `DonutChart` | `xKey`, `yKey`, `innerRadius`, `showLabels`, `labelPosition` | Donut (pie with inner radius) |
| `ScatterChart` | `xKey`, `yKey`, `symbolSize` | Correlation, distribution |
| `HeatmapChart` | `xKey`, `yKey`, `yAxisKey`, `min`, `max`, `showLabels` | Matrix-style data |
| `RadarChart` | `xKey`, `yKey`, `showArea` | Multi-dimensional comparison |
| `DataTable` | `filterColumn`, `filterPlaceholder`, `transform`, `pageSize`, `enableRowSelection`, `children` | Tabular data display |
### UI Components (`@databricks/appkit-ui/react`)
| Component | Common Props |
|-----------|-------------|
| `Card`, `CardHeader`, `CardTitle`, `CardContent` | Standard container |
| `Badge` | `variant`: "default" \| "secondary" \| "destructive" \| "outline" |
| `Button` | `variant`, `size`, `onClick` |
| `Input` | `placeholder`, `value`, `onChange` |
| `Select`, `SelectTrigger`, `SelectContent`, `SelectItem` | Dropdown; `SelectItem` value cannot be "" |
| `Skeleton` | `className` — use for loading states |
| `Separator` | Visual divider |
| `Tabs`, `TabsList`, `TabsTrigger`, `TabsContent` | Tabbed interface |
All data components **require `parameters={{}}`** even when the query has no params.
## Layout Structure
```tsx
<div className="container mx-auto p-4">
<h1 className="text-2xl font-bold mb-4">Page Title</h1>
<form className="space-y-4 mb-8">{/* form inputs */}</form>
<div className="grid gap-4">{/* list items */}</div>
</div>
```
## Component Organization
- Shared UI components: `@databricks/appkit-ui/react`
- Feature components: `client/src/components/FeatureName.tsx`
- Split components when logic exceeds ~100 lines or component is reused
## Gotchas
- `SelectItem` cannot have `value=""`. Use sentinel value like `"all"` for "show all" options.
- Use `<Skeleton>` components instead of plain "Loading..." text
- Handle nullable fields: `value={field || ''}` for inputs
- For maps with React 19, use react-leaflet v5: `npm install react-leaflet@^5.0.0 leaflet @types/leaflet`
- Databricks brand colors: `['#40d1f5', '#4462c9', '#EB1600', '#0B2026', '#4A4A4A', '#353a4a']`
# Lakebase: OLTP Database for Apps
Use Lakebase when your app needs **persistent read/write storage** — forms, CRUD operations, user-generated data. For analytics dashboards reading from a SQL warehouse, use `config/queries/` instead.
## When to Use Lakebase vs Analytics
| Pattern | Use Case | Data Source |
|---------|----------|-------------|
| Analytics | Read-only dashboards, charts, KPIs | Databricks SQL Warehouse |
| Lakebase | CRUD operations, persistent state, forms | PostgreSQL (Lakebase Autoscaling) |
| Both | Dashboard with user preferences/saved state | Warehouse + Lakebase |
## Scaffolding
**ALWAYS scaffold with the correct feature flags** — do not add Lakebase manually to an analytics-only scaffold.
**Lakebase only** (no analytics SQL warehouse):
```bash
databricks apps init --name <NAME> --features lakebase \
--set "lakebase.postgres.branch=<BRANCH_NAME>" \
--set "lakebase.postgres.database=<DATABASE_NAME>" \
--run none --profile <PROFILE>
```
**Both Lakebase and analytics**:
```bash
databricks apps init --name <NAME> --features analytics,lakebase \
--set "analytics.sql-warehouse.id=<WAREHOUSE_ID>" \
--set "lakebase.postgres.branch=<BRANCH_NAME>" \
--set "lakebase.postgres.database=<DATABASE_NAME>" \
--run none --profile <PROFILE>
```
Here `<BRANCH_NAME>` and `<DATABASE_NAME>` are full resource names (e.g. `projects/<PROJECT_ID>/branches/<BRANCH_ID>` and `projects/<PROJECT_ID>/branches/<BRANCH_ID>/databases/<DB_ID>`).
Use the `databricks-lakebase` skill to create a Lakebase project and discover branch/database resource names before running this command.
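The resource-name formats above can be assembled from individual IDs (hypothetical helpers for illustration, not CLI or SDK APIs):

```typescript
// Hypothetical helpers: build the full Lakebase resource names shown above
// from the project/branch/database IDs discovered via the lakebase skill.
const branchName = (projectId: string, branchId: string): string =>
  `projects/${projectId}/branches/${branchId}`;

const databaseName = (projectId: string, branchId: string, dbId: string): string =>
  `${branchName(projectId, branchId)}/databases/${dbId}`;
```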
**Get resource names** (if you have an existing project):
```bash
# List branches → use the name field of a READY branch
databricks postgres list-branches projects/<PROJECT_ID> --profile <PROFILE>
# List databases → use the name field
databricks postgres list-databases projects/<PROJECT_ID>/branches/<BRANCH_ID> --profile <PROFILE>
```
## Project Structure (after `databricks apps init --features lakebase`)
```
my-app/
├── server/
│ └── server.ts # Backend with Lakebase pool + tRPC routes
├── client/
│ └── src/
│ └── App.tsx # React frontend
├── app.yaml # Manifest with database resource declaration
└── package.json # Includes @databricks/lakebase dependency
```
Note: **No `config/queries/` directory** — Lakebase apps use server-side `pool.query()` calls, not SQL files.
## `createLakebasePool` API
```typescript
import { createLakebasePool } from "@databricks/lakebase";
// or: import { createLakebasePool } from "@databricks/appkit";
const pool = createLakebasePool({
// All fields optional — auto-populated from env vars when deployed
host: process.env.PGHOST, // Lakebase hostname
database: process.env.PGDATABASE, // Database name
endpoint: process.env.LAKEBASE_ENDPOINT, // Endpoint resource path
user: process.env.PGUSER, // Service principal client ID
max: 10, // Connection pool size
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 10000,
});
```
Call `createLakebasePool()` **once at module level** (server startup), not inside request handlers.
## Environment Variables (auto-set when deployed with database resource)
| Variable | Description |
|----------|-------------|
| `PGHOST` | Lakebase hostname |
| `PGPORT` | Port (default 5432) |
| `PGDATABASE` | Database name |
| `PGUSER` | Service principal client ID |
| `PGSSLMODE` | SSL mode (`require`) |
| `LAKEBASE_ENDPOINT` | Endpoint resource path |
## tRPC CRUD Pattern
Always use tRPC for Lakebase operations — do NOT call `pool.query()` from the client.
```typescript
// server/server.ts
import { initTRPC } from '@trpc/server';
import { createLakebasePool } from "@databricks/lakebase";
import { z } from 'zod';
import superjson from 'superjson'; // requires: npm install superjson
const pool = createLakebasePool(); // reads env vars automatically
const t = initTRPC.create({ transformer: superjson });
const publicProcedure = t.procedure;
export const appRouter = t.router({
listItems: publicProcedure.query(async () => {
const { rows } = await pool.query(
"SELECT * FROM app_data.items ORDER BY created_at DESC LIMIT 100"
);
return rows;
}),
createItem: publicProcedure
.input(z.object({ name: z.string().min(1) }))
.mutation(async ({ input }) => {
const { rows } = await pool.query(
"INSERT INTO app_data.items (name) VALUES ($1) RETURNING *",
[input.name]
);
return rows[0];
}),
deleteItem: publicProcedure
.input(z.object({ id: z.number() }))
.mutation(async ({ input }) => {
await pool.query("DELETE FROM app_data.items WHERE id = $1", [input.id]);
return { success: true };
}),
});
```
## Schema Initialization
**Always create a custom schema** — the Service Principal has `CONNECT_AND_CREATE` permission but **cannot access the `public` schema**. Initialize tables on server startup:
```typescript
// server/server.ts — run once at startup before handling requests
await pool.query(`
CREATE SCHEMA IF NOT EXISTS app_data;
CREATE TABLE IF NOT EXISTS app_data.items (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW()
);
`);
```
## ORM Integration (Optional)
The pool returned by `createLakebasePool()` is a standard `pg.Pool` — works with any PostgreSQL library:
```typescript
// Drizzle ORM
import { drizzle } from "drizzle-orm/node-postgres";
const db = drizzle(pool);
// Prisma (with @prisma/adapter-pg)
import { PrismaPg } from "@prisma/adapter-pg";
const adapter = new PrismaPg(pool);
const prisma = new PrismaClient({ adapter });
```
## Key Differences from Analytics Pattern
| | Analytics | Lakebase |
|--|-----------|---------|
| SQL dialect | Databricks SQL (Spark SQL) | Standard PostgreSQL |
| Query location | `config/queries/*.sql` files | `pool.query()` in tRPC routes |
| Data retrieval | `useAnalyticsQuery` hook | tRPC query procedure |
| Date functions | `CURRENT_TIMESTAMP()`, `DATEDIFF(DAY, ...)` | `NOW()`, `AGE(...)` |
| Auto-increment | N/A | `SERIAL` or `GENERATED ALWAYS AS IDENTITY` |
| Insert pattern | N/A | `INSERT ... VALUES ($1) RETURNING *` |
| Params | Named (`:param`) | Positional (`$1, $2, ...`) |
**NEVER use `useAnalyticsQuery` for Lakebase data** — it queries the SQL warehouse, not Lakebase.
**NEVER put Lakebase SQL in `config/queries/`** — those files are only for warehouse queries.
## Local Development
The Lakebase env vars (`PGHOST`, `PGDATABASE`, etc.) are auto-set only when deployed. For local development, get the connection details from your endpoint and set them manually:
```bash
# Get endpoint connection details
databricks postgres get-endpoint \
projects/<PROJECT_ID>/branches/<BRANCH_ID>/endpoints/<ENDPOINT_ID> \
--profile <PROFILE>
```
Then create `server/.env` with the values from the endpoint response:
```
PGHOST=<host from endpoint>
PGPORT=5432
PGDATABASE=<your database name>
PGUSER=<your service principal client ID>
PGSSLMODE=require
LAKEBASE_ENDPOINT=projects/<PROJECT_ID>/branches/<BRANCH_ID>/endpoints/<ENDPOINT_ID>
```
Load `server/.env` in your dev server (e.g. via `dotenv` or `node --env-file=server/.env`). Never commit `.env` files — add `server/.env` to `.gitignore`.
## Troubleshooting
| Error | Cause | Solution |
|-------|-------|---------|
| `permission denied for schema public` | Service Principal lacks access to `public` | Create custom schema: `CREATE SCHEMA IF NOT EXISTS app_data` |
| `connection refused` | Pool not connected or wrong env vars | Check `PGHOST`, `PGPORT`, `LAKEBASE_ENDPOINT` are set |
| `relation "X" does not exist` | Tables not initialized | Run `CREATE TABLE IF NOT EXISTS` at startup |
| App builds but pool fails at runtime | Env vars not set locally | Set vars in `server/.env` — see Local Development above |
# AppKit Overview
AppKit is the recommended way to build Databricks Apps: it provides type-safe SQL queries, React components, and seamless deployment.
## Choose Your Data Pattern FIRST
Before scaffolding, decide which data pattern the app needs:
| Pattern | When to use | Init command |
|---------|-------------|-------------|
| **Analytics** (read-only) | Dashboards, charts, KPIs from warehouse | `--features analytics --set analytics.sql-warehouse.id=<ID>` |
| **Lakebase (OLTP)** (read/write) | CRUD forms, persistent state, user data | `--features lakebase --set lakebase.postgres.branch=<BRANCH> --set lakebase.postgres.database=<DB>` |
| **Both** | Dashboard + user data or preferences | `--features analytics,lakebase` with all required `--set` flags |
See [Lakebase Guide](lakebase.md) for full Lakebase scaffolding and app-code patterns.
## Workflow
1. **Scaffold**: Run `databricks apps manifest`, then `databricks apps init` with `--features` and `--set` as in parent SKILL.md (App Manifest and Scaffolding)
2. **Develop**: `cd <NAME> && npm install && npm run dev`
3. **Validate**: `databricks apps validate`
4. **Deploy**: `databricks apps deploy --profile <PROFILE>`
## Data Discovery (Before Writing SQL)
**Use the parent `databricks` skill for data discovery** (table search, schema exploration, query execution).
## Pre-Implementation Checklist
Before writing App.tsx, complete these steps:
1. ✅ Create SQL files in `config/queries/`
2. ✅ Run `npm run typegen` to generate query types
3. ✅ Read `client/src/appKitTypes.d.ts` to see available query result types
4. ✅ Verify component props via `npx @databricks/appkit docs` (check the relevant component page)
5. ✅ Plan smoke test updates (default expects "Minimal Databricks App")
**DO NOT** write UI code until types are generated and verified.
## Post-Implementation Checklist
Before running `databricks apps validate`:
1. ✅ Update `tests/smoke.spec.ts` heading selector to match your app title
2. ✅ Update or remove the 'hello world' text assertion
3. ✅ Verify `npm run typegen` has been run after all SQL files are finalized
4. ✅ Ensure all numeric SQL values use `Number()` conversion in display code
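Checklist item 4 in practice (a minimal sketch; `formatCurrency` is an illustrative name, not an AppKit API):

```typescript
// SQL numeric columns (DECIMAL, large BIGINT) may arrive as strings at
// runtime, so convert with Number() before formatting or arithmetic.
const formatCurrency = (value: string | number): string =>
  `$${Number(value).toFixed(2)}`;
```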
## Project Structure
```
my-app/
├── server/
│ ├── server.ts # Backend entry point (AppKit)
│ └── .env # Optional local dev env vars (do not commit)
├── client/
│ ├── index.html
│ ├── vite.config.ts
│ └── src/
│ ├── main.tsx
│ └── App.tsx # <- Main app component (start here)
├── config/
│ └── queries/
│ └── my_query.sql # -> queryKey: "my_query"
├── app.yaml # Deployment config
├── package.json
└── tsconfig.json
```
**Key files to modify:**
| Task | File |
|------|------|
| Build UI | `client/src/App.tsx` |
| Add SQL query | `config/queries/<NAME>.sql` |
| Add API endpoint | `server/server.ts` (tRPC) |
| Add shared helpers (optional) | create `shared/types.ts` or `client/src/lib/formatters.ts` |
| Fix smoke test | `tests/smoke.spec.ts` |
## Type Safety
For type generation details, see: `npx @databricks/appkit docs ./docs/development/type-generation.md`
**Quick workflow:**
1. Add/modify SQL in `config/queries/`
2. Types auto-generate during dev via the Vite plugin (or run `npm run typegen` manually)
3. Types appear in `client/src/appKitTypes.d.ts`
## Adding Visualizations
**Step 1**: Create SQL file `config/queries/my_data.sql`
```sql
SELECT category, COUNT(*) as count FROM my_table GROUP BY category
```
**Step 2**: Use component (types auto-generated!)
```typescript
import { BarChart } from '@databricks/appkit-ui/react';
// Query mode: fetches data automatically
<BarChart queryKey="my_data" parameters={{}} />
// Data mode: pass static data directly (no queryKey/parameters needed)
<BarChart data={myData} xKey="category" yKey="count" />
```
## AppKit Official Documentation
**Always use AppKit docs as the source of truth for API details.**
```bash
npx @databricks/appkit docs # show the docs index (start here)
npx @databricks/appkit docs <query> # look up a section by name or doc path
```
Do not guess paths — run without args first, then pick from the index.
## References
| When you're about to... | Read |
|-------------------------|------|
| Write SQL files | [SQL Queries](sql-queries.md) — parameterization, dialect, sql.* helpers |
| Use `useAnalyticsQuery` | [AppKit SDK](appkit-sdk.md) — memoization, conditional queries |
| Add chart/table components | [Frontend](frontend.md) — component quick reference, anti-patterns |
| Add API mutation endpoints | [tRPC](trpc.md) — only if you need server-side logic |
| Use Lakebase for CRUD / persistent state | [Lakebase](lakebase.md) — createLakebasePool, tRPC patterns, schema init |
## Critical Rules
1. **SQL for data retrieval**: Use `config/queries/` + visualization components. Never tRPC for SELECT.
2. **Numeric types**: SQL numbers may return as strings. Always convert: `Number(row.amount)`
3. **Type imports**: Use `import type { ... }` (verbatimModuleSyntax enabled).
4. **Charts are ECharts**: No Recharts children — use props (`xKey`, `yKey`, `colors`). `xKey`/`yKey` auto-detect from schema if omitted.
5. **Two data modes**: Charts/tables support query mode (`queryKey` + `parameters`) and data mode (static `data` prop).
6. **Conditional queries**: Use `autoStart: false` option or conditional rendering to control query execution.
## Decision Tree
- **Display data from SQL?**
- Chart/Table → `BarChart`, `LineChart`, `DataTable` components
- Custom layout (KPIs, cards) → `useAnalyticsQuery` hook
- **Call Databricks API?** → tRPC (serving endpoints, MLflow, Jobs)
- **Modify data?** → tRPC mutations
# SQL Query Files
**IMPORTANT**: ALWAYS use SQL files in `config/queries/` for data retrieval. NEVER use tRPC for SQL queries.
- Store ALL SQL queries in `config/queries/` directory
- Name files descriptively: `trip_statistics.sql`, `user_metrics.sql`, `sales_by_region.sql`
- Reference a query by its filename (without extension) in `useAnalyticsQuery`, or pass it as `queryKey` to a visualization component
- AppKit automatically executes queries against the configured Databricks warehouse
- Benefits: Built-in caching, proper connection pooling, better performance
## Type Generation
For full type generation details, see: `npx @databricks/appkit docs ./docs/development/type-generation.md`
Types are auto-regenerated during dev whenever SQL files change.
**Quick workflow:** Add SQL files → Types auto-generate during dev → Types appear in `client/src/appKitTypes.d.ts`
## Query Schemas (Optional)
Create `config/queries/schema.ts` only if you need **runtime validation** with Zod.
```typescript
import { z } from 'zod';
export const querySchemas = {
  my_query: z.array(
    z.object({
      category: z.string(),
      // Use z.coerce.number() - handles both string and number from SQL
      amount: z.coerce.number(),
    })
  ),
};
```
**Why `z.coerce.number()`?**
- Auto-generated types use `number` based on SQL column types
- But some SQL types (DECIMAL, large BIGINT) return as strings at runtime
- `z.coerce.number()` handles both cases safely
## SQL Type Handling (Critical)
**Understanding Type Generation vs Runtime:**
1. **Auto-generated types** (`appKitTypes.d.ts`): Based on SQL column types
- `BIGINT`, `INT`, `DECIMAL` → TypeScript `number`
- These are the types you'll see in IntelliSense
2. **Runtime JSON values**: Some numeric types arrive as strings
- `DECIMAL` often returns as string (e.g., `"123.45"`)
- Large `BIGINT` values return as string
- `ROUND()`, `AVG()`, `SUM()` results may be strings
**Best Practice - Always convert before numeric operations:**
```typescript
// ❌ WRONG - may fail if value is string at runtime
<span>{row.total_amount.toFixed(2)}</span>
// ✅ CORRECT - convert to number first
<span>{Number(row.total_amount).toFixed(2)}</span>
```
**Helper Functions:**
Create app-specific helpers for consistent numeric formatting (for example in `client/src/lib/formatters.ts`):
```typescript
// client/src/lib/formatters.ts
export const toNumber = (value: number | string): number => Number(value);
export const formatCurrency = (value: number | string): string =>
  `$${Number(value).toFixed(2)}`;
export const formatPercent = (value: number | string): string =>
  `${Number(value).toFixed(1)}%`;
```
Use them wherever you render query results:
```typescript
import { toNumber, formatCurrency, formatPercent } from './formatters'; // adjust import path to your file layout
// Convert to number
const amount = toNumber(row.amount); // "123.45" → 123.45
// Format as currency
const formatted = formatCurrency(row.amount); // "123.45" → "$123.45"
// Format as percentage
const percent = formatPercent(row.rate); // "85.5" → "85.5%"
```
## Available sql.* Helpers
**Full API reference**: `npx @databricks/appkit docs ./docs/api/appkit/Variable.sql.md` — always check this for the latest available helpers.
```typescript
import { sql } from "@databricks/appkit-ui/js";
// ✅ These exist:
sql.string(value) // For STRING parameters
sql.number(value) // For NUMERIC parameters (INT, BIGINT, DOUBLE, DECIMAL)
sql.boolean(value) // For BOOLEAN parameters
sql.date(value) // For DATE parameters (YYYY-MM-DD format)
sql.timestamp(value) // For TIMESTAMP parameters
sql.binary(value) // For BINARY (returns hex string, use UNHEX() in SQL)
// ❌ These DO NOT exist:
// sql.null() - use sentinel values instead
// sql.array() - use comma-separated sql.string() and split in SQL
// sql.int() - use sql.number()
// sql.float() - use sql.number()
```
**For nullable string parameters**, use sentinel values or empty strings. **For nullable date parameters**, use sentinel dates only (empty strings cause validation errors) — see "Optional Date Parameters" section below.
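These defaults are easy to centralize in small helpers. The names below are invented for illustration (they are not AppKit APIs), and they assume the empty-string and sentinel-date conventions used throughout this guide:

```typescript
// Illustrative helpers (not AppKit APIs) for mapping "no filter" to safe values.
const noFilterString = (value?: string | null): string => value ?? '';
// Empty strings fail DATE validation, so optional dates fall back to a sentinel.
const noFilterDate = (value?: string | null, sentinel = '1900-01-01'): string =>
  value ?? sentinel;

// Wrap the results with sql.string() / sql.date() when building parameters.
const category = noFilterString(undefined);            // ''
const startDate = noFilterDate(null);                  // '1900-01-01'
const endDate = noFilterDate(undefined, '9999-12-31'); // '9999-12-31'
```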
## Databricks SQL Dialect
Databricks uses Databricks SQL (based on Spark SQL), NOT PostgreSQL/MySQL. Common mistakes:
| PostgreSQL | Databricks SQL |
|------------|---------------|
| `GENERATE_SERIES(1, 10)` | `explode(sequence(1, 10))` |
| `DATEDIFF(date1, date2)` | `DATEDIFF(DAY, date2, date1)` (3 args!) |
| `NOW()` | `CURRENT_TIMESTAMP()` |
| `INTERVAL '7 days'` | `INTERVAL 7 DAY` |
| `STRING_AGG(col, ',')` | `CONCAT_WS(',', COLLECT_LIST(col))` |
| `ILIKE` | `LOWER(col) LIKE LOWER(pattern)` |
**Sample data date ranges** — do NOT use `CURRENT_DATE()` on historical datasets:
- `samples.tpch.*` — historical dates, check with `SELECT MIN(o_orderdate), MAX(o_orderdate) FROM samples.tpch.orders`
- `samples.nyctaxi.trips` — NYC taxi data with specific date ranges
- `samples.tpcds.*` — data from 1998-2003
Always check date ranges before writing date-filtered queries.
## Before Running `npm run typegen`
Verify each SQL file before running typegen:
- [ ] Uses Databricks SQL syntax (NOT PostgreSQL) — check dialect table above
- [ ] `DATEDIFF` has 3 arguments: `DATEDIFF(DAY, start, end)`
- [ ] Uses `LOWER(col) LIKE LOWER(pattern)` instead of `ILIKE`
- [ ] Column aliases in `ORDER BY` match `SELECT` aliases exactly
- [ ] Date columns are not passed to numeric functions like `ROUND()`
- [ ] Date range filters use actual data dates (NOT `CURRENT_DATE()` on historical data — check date ranges first)
## Query Parameterization
SQL queries can accept parameters to make them dynamic and reusable.
**Key Points:**
- Parameters use colon prefix: `:parameter_name`
- Databricks infers types from values automatically
- For optional string parameters, use pattern: `(:param = '' OR column = :param)`
- **For optional date parameters, use sentinel dates** (`'1900-01-01'` and `'9999-12-31'`) instead of empty strings
### SQL Parameter Syntax
```sql
-- config/queries/filtered_data.sql
SELECT *
FROM my_table
WHERE column_value >= :min_value
  AND column_value <= :max_value
  AND category = :category
  AND (:optional_filter = '' OR status = :optional_filter)
```
### Frontend Parameter Passing
```typescript
import { sql } from "@databricks/appkit-ui/js";
const { data } = useAnalyticsQuery('filtered_data', {
  min_value: sql.number(minValue),
  max_value: sql.number(maxValue),
  category: sql.string(category),
  optional_filter: sql.string(optionalFilter || ''), // empty string for optional params
});
```
### Date Parameters
Use `sql.date()` for date parameters with `YYYY-MM-DD` format strings.
**Frontend - Using Date Parameters:**
```typescript
import { sql } from '@databricks/appkit-ui/js';
import { useState } from 'react';
function MyComponent() {
  const [startDate, setStartDate] = useState<string>('2016-02-01');
  const [endDate, setEndDate] = useState<string>('2016-02-29');
  const queryParams = {
    start_date: sql.date(startDate), // Pass YYYY-MM-DD string to sql.date()
    end_date: sql.date(endDate),
  };
  const { data } = useAnalyticsQuery('my_query', queryParams);
  // ...
}
```
**SQL - Date Filtering:**
```sql
-- Filter by date range using DATE() function
SELECT COUNT(*) as trip_count
FROM samples.nyctaxi.trips
WHERE DATE(tpep_pickup_datetime) >= :start_date
  AND DATE(tpep_pickup_datetime) <= :end_date
```
**Date Helper Functions:**
```typescript
// Helper to get YYYY-MM-DD string for dates relative to today
const daysAgo = (n: number): string => {
  const date = new Date(Date.now() - n * 86400000);
  return date.toISOString().split('T')[0]; // "2024-01-15"
};
const params = {
  start_date: sql.date(daysAgo(7)), // 7 days ago
  end_date: sql.date(daysAgo(0)), // Today
};
```
### Optional Date Parameters - Use Sentinel Dates
Databricks App Kit validates parameter types before query execution. **DO NOT use empty strings (`''`) for optional date parameters** as this causes validation errors.
**✅ CORRECT - Use Sentinel Dates:**
```typescript
// Frontend: Use sentinel dates for "no filter" instead of empty strings
const revenueParams = {
  group_by: 'month',
  start_date: sql.date('1900-01-01'), // Sentinel: effectively no lower bound
  end_date: sql.date('9999-12-31'), // Sentinel: effectively no upper bound
  country: sql.string(country || ''),
  property_type: sql.string(propertyType || ''),
};
```
```sql
-- SQL: Simple comparison since sentinel dates are always valid
WHERE b.check_in >= CAST(:start_date AS DATE)
  AND b.check_in <= CAST(:end_date AS DATE)
```
**Why Sentinel Dates Work:**
- `1900-01-01` is before any real data (effectively no lower bound filter)
- `9999-12-31` is after any real data (effectively no upper bound filter)
- Always valid DATE types, so no parameter validation errors
- All real dates fall within this range, so no filtering occurs
**Parameter Types Summary:**
- ALWAYS use sql.* helper functions from the `@databricks/appkit-ui/js` package to define SQL parameters
- **Strings/Numbers**: Use directly in SQL with `:param_name`
- **Dates**: Use with `CAST(:param AS DATE)` in SQL
- **Optional Strings**: Use empty string default, check with `(:param = '' OR column = :param)`
- **Optional Dates**: Use sentinel dates (`sql.date('1900-01-01')` and `sql.date('9999-12-31')`) instead of empty strings
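The whole summary can be sketched as a single parameter builder. The `sql` object below is a stand-in so the snippet is self-contained; in a real app, import the actual helpers from `@databricks/appkit-ui/js` instead:

```typescript
// Stand-in for the real sql.* helpers so this sketch is self-contained.
const sql = {
  string: (v: string) => v,
  number: (v: number) => v,
  date: (v: string) => v,
};

// Hypothetical builder combining required and optional parameters.
function buildParams(opts: {
  minValue: number;
  category?: string;
  startDate?: string;
  endDate?: string;
}) {
  return {
    min_value: sql.number(opts.minValue),
    category: sql.string(opts.category ?? ''),            // optional string: '' sentinel
    start_date: sql.date(opts.startDate ?? '1900-01-01'), // optional date: sentinel lower bound
    end_date: sql.date(opts.endDate ?? '9999-12-31'),     // optional date: sentinel upper bound
  };
}
```

With the real helpers, pass the returned object straight to `useAnalyticsQuery` as its second argument.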
# tRPC for Custom Endpoints
**CRITICAL**: Do NOT use tRPC for SQL queries or data retrieval. Use `config/queries/` + `useAnalyticsQuery` instead.
Use tRPC ONLY for:
- **Mutations**: Creating, updating, or deleting data (INSERT, UPDATE, DELETE)
- **External APIs**: Calling Databricks APIs (serving endpoints, jobs, MLflow, etc.)
- **Complex business logic**: Multi-step operations that cannot be expressed in SQL
- **File operations**: File uploads, processing, transformations
- **Custom computations**: Operations requiring TypeScript/Node.js logic
## Server-side Pattern
```typescript
// server/trpc.ts
import { initTRPC } from '@trpc/server';
import { getExecutionContext } from '@databricks/appkit';
import { z } from 'zod';
import superjson from 'superjson';
const t = initTRPC.create({ transformer: superjson });
const publicProcedure = t.procedure;
export const appRouter = t.router({
  // Example: Query a serving endpoint
  queryModel: publicProcedure.input(z.object({ prompt: z.string() })).query(async ({ input: { prompt } }) => {
    const { serviceDatabricksClient: client } = getExecutionContext();
    const response = await client.servingEndpoints.query({
      name: 'your-endpoint-name',
      messages: [{ role: 'user', content: prompt }],
    });
    return response;
  }),
  // Example: Mutation
  createRecord: publicProcedure.input(z.object({ name: z.string() })).mutation(async ({ input }) => {
    // Custom logic here
    return { success: true, id: 123 };
  }),
});
```
## Client-side Pattern
```typescript
// client/src/components/MyComponent.tsx
import { trpc } from '@/lib/trpc';
import { useState, useEffect } from 'react';
function MyComponent() {
  const [result, setResult] = useState<unknown>(null);
  useEffect(() => {
    trpc.queryModel
      .query({ prompt: "Hello" })
      .then(setResult)
      .catch(console.error);
  }, []);
  const handleCreate = async () => {
    await trpc.createRecord.mutate({ name: "test" });
  };
  return <div>{/* component JSX */}</div>;
}
```
## Decision Tree for Data Operations
1. **Need to display data from SQL?**
- **Chart or Table?** → Use visualization components (`BarChart`, `LineChart`, `DataTable`, etc.)
- **Custom display (KPIs, cards, lists)?** → Use `useAnalyticsQuery` hook
- **Never** use tRPC for SQL SELECT statements
2. **Need to call a Databricks API?** → Use tRPC
- Serving endpoints (model inference)
- MLflow operations
- Jobs API
- Workspace API
3. **Need to modify data?** → Use tRPC mutations
- INSERT, UPDATE, DELETE operations
- Multi-step transactions
- Business logic with side effects
4. **Need non-SQL custom logic?** → Use tRPC
- File processing
- External API calls
- Complex computations in TypeScript
**Summary:**
- ✅ SQL queries → Visualization components or `useAnalyticsQuery`
- ✅ Databricks APIs → tRPC
- ✅ Data mutations → tRPC
- ❌ SQL queries → tRPC (NEVER do this)
# Testing Guidelines
## Unit Tests (Vitest)
**CRITICAL**: Use Vitest for all tests. Put tests next to the code they cover (e.g. `src/*.test.ts`).
```typescript
import { describe, it, expect } from 'vitest';
describe('Feature Name', () => {
  it('should do something', () => {
    expect(true).toBe(true);
  });
  it('should handle async operations', async () => {
    const result = await someAsyncFunction();
    expect(result).toBeDefined();
  });
});
```
**Best Practices:**
- Use `describe` blocks to group related tests
- Use `it` for individual test cases
- Use `expect` for assertions
- Tests run with `npm test` (runs `vitest run`)
❌ **Do not write unit tests for:**
- SQL files under `config/queries/` - little value in testing static SQL
- Types associated with queries - these are just schema definitions
## Smoke Test (Playwright)
The template includes a smoke test at `tests/smoke.spec.ts` that verifies the app loads correctly.
**⚠️ MUST UPDATE after customizing the app:**
- The heading selector checks for `'Minimal Databricks App'` — change it to match your app's actual title
- The text assertion checks for `'hello world'` — update or remove it to match your app's content
- Failing to update these will cause the smoke test to fail on `databricks apps validate`
```typescript
// tests/smoke.spec.ts - update these selectors:
// ⚠️ PLAYWRIGHT STRICT MODE: each selector must match exactly ONE element.
// Use { exact: true }, .first(), or role-based selectors. See "Playwright Strict Mode" below.
// ❌ Template default - will fail after customization
await expect(page.getByRole('heading', { name: 'Minimal Databricks App' })).toBeVisible();
await expect(page.getByText('hello world')).toBeVisible();
// ✅ Update to match YOUR app
await expect(page.getByRole('heading', { name: 'Your App Title' })).toBeVisible();
await expect(page.locator('h1').first()).toBeVisible({ timeout: 30000 }); // Or just check any h1
```
**What the smoke test does:**
- Opens the app
- Waits for data to load (SQL query results)
- Verifies key UI elements are visible
- Captures screenshots and console logs to `.smoke-test/` directory
- Always captures artifacts, even on test failure
## Playwright Strict Mode
Playwright uses strict mode by default — selectors matching multiple elements WILL FAIL.
### Selector Priority (use in this order)
1. ✅ `getByRole('heading', { name: 'Your App Title' })` — headings (most reliable)
2. ✅ `getByRole('button', { name: 'Submit' })` — interactive elements
3. ✅ `getByText('Unique text', { exact: true })` — exact match for unique strings
4. ⚠️ `getByText('Common text').first()` — last resort for repeated text
5. ❌ `getByText('Revenue')` — NEVER without `exact` or `.first()` (strict mode will fail)
**Common mistake**: text like "Revenue" may appear in a heading, a card, AND a description. Always verify your selector targets exactly ONE element.
```typescript
// ❌ FAILS if "Revenue" appears in multiple places (heading + card + description)
await expect(page.getByText('Revenue')).toBeVisible();
// ✅ Use role-based selectors for headings
await expect(page.getByRole('heading', { name: 'Revenue Dashboard' })).toBeVisible();
// ✅ Use exact matching
await expect(page.getByText('Revenue', { exact: true })).toBeVisible();
// ✅ Use .first() as last resort
await expect(page.getByText('Revenue').first()).toBeVisible();
```
**Keep smoke tests simple:**
- Only verify that the app loads and displays initial data
- Wait for key elements to appear (page title, main content)
- Capture artifacts for debugging
- Run quickly (< 5 seconds)
**For extended E2E tests:**
- Create separate test files in `tests/` directory (e.g., `tests/user-flow.spec.ts`)
- Use `npm run test:e2e` to run all Playwright tests
- Keep complex user flows, interactions, and edge cases out of the smoke test