Public repository: databricks/databricks-agent-skills

**Security scan: Low Risk** — no security issues found.

- Finding: the skill manifest does not include a `license` field. Specifying a license helps users understand usage terms.
- Remediation: add a `license` field to the SKILL.md frontmatter (e.g., MIT, Apache-2.0).
**Description**

Databricks CLI operations: auth, profiles, data exploration, and bundles. Contains up-to-date guidelines for Databricks-related CLI tasks.

**Details**

- Requires Databricks CLI (>= v0.292.0)
- Version: 0.1.0

**Skill Files**
# Databricks

Core skill for Databricks CLI, authentication, and data exploration.

## Product Skills

For specific products, use dedicated skills:

- **databricks-jobs** - Lakeflow Jobs development and deployment
- **databricks-pipelines** - Lakeflow Spark Declarative Pipelines (batch and streaming data pipelines)
- **databricks-apps** - Full-stack TypeScript app development and deployment
- **databricks-lakebase** - Lakebase Postgres Autoscaling project management

## Prerequisites

1. **CLI installed**: Run `databricks --version` to check.
   - **If the CLI is missing or outdated (< v0.292.0): STOP. Do not proceed or work around a missing CLI.**
   - **Read the [CLI Installation](databricks-cli-install.md) reference file and follow the instructions to guide the user through installation.**
   - Note: In sandboxed environments (Cursor IDE, containers), install commands write outside the workspace and may be blocked. Present the install command to the user and ask them to run it in their own terminal.
2. **Authenticated**: `databricks auth profiles`
   - If not: see [CLI Authentication](databricks-cli-auth.md)

## Profile Selection - CRITICAL

**NEVER auto-select a profile.**

1. List profiles: `databricks auth profiles`
2. Present ALL profiles to the user with workspace URLs
3. Let the user choose (even if only one exists)
4. Offer to create a new profile if needed

## Claude Code - IMPORTANT

Each Bash command runs in a **separate shell session**.

```bash
# WORKS: --profile flag
databricks apps list --profile my-workspace

# WORKS: chained with &&
export DATABRICKS_CONFIG_PROFILE=my-workspace && databricks apps list

# DOES NOT WORK: separate commands
export DATABRICKS_CONFIG_PROFILE=my-workspace
databricks apps list  # profile not set!
```

## Data Exploration — Use AI Tools

**Use these instead of manually navigating catalogs/schemas/tables:**

```bash
# discover table structure (columns, types, sample data, stats)
databricks experimental aitools tools discover-schema catalog.schema.table --profile <PROFILE>

# run ad-hoc SQL queries
databricks experimental aitools tools query "SELECT * FROM table LIMIT 10" --profile <PROFILE>

# find the default warehouse
databricks experimental aitools tools get-default-warehouse --profile <PROFILE>
```

See [Data Exploration](data-exploration.md) for details.

## Quick Reference

**⚠️ CRITICAL: Some commands use positional arguments, not flags**

```bash
# current user
databricks current-user me --profile <PROFILE>

# list resources
databricks apps list --profile <PROFILE>
databricks jobs list --profile <PROFILE>
databricks clusters list --profile <PROFILE>
databricks warehouses list --profile <PROFILE>
databricks pipelines list --profile <PROFILE>
databricks serving-endpoints list --profile <PROFILE>

# ⚠️ Unity Catalog — POSITIONAL arguments (NOT flags!)
databricks catalogs list --profile <PROFILE>

# ✅ CORRECT: positional args
databricks schemas list <CATALOG> --profile <PROFILE>
databricks tables list <CATALOG> <SCHEMA> --profile <PROFILE>
databricks tables get <CATALOG>.<SCHEMA>.<TABLE> --profile <PROFILE>

# ❌ WRONG: these flags/commands DON'T EXIST
# databricks schemas list --catalog-name <CATALOG>  ← WILL FAIL
# databricks tables list --catalog <CATALOG>        ← WILL FAIL
# databricks sql-warehouses list                    ← doesn't exist, use `warehouses list`
# databricks execute-statement                      ← doesn't exist, use `experimental aitools tools query`
# databricks sql execute                            ← doesn't exist, use `experimental aitools tools query`

# When in doubt, check help:
# databricks schemas list --help

# get details
databricks apps get <NAME> --profile <PROFILE>
databricks jobs get --job-id <ID> --profile <PROFILE>
databricks clusters get --cluster-id <ID> --profile <PROFILE>

# bundles
databricks bundle init --profile <PROFILE>
databricks bundle validate --profile <PROFILE>
databricks bundle deploy -t <TARGET> --profile <PROFILE>
databricks bundle run <RESOURCE> -t <TARGET> --profile <PROFILE>
```

## Troubleshooting

| Error | Solution |
|-------|----------|
| `cannot configure default credentials` | Use `--profile` flag or authenticate first |
| `PERMISSION_DENIED` | Check workspace/UC permissions |
| `RESOURCE_DOES_NOT_EXIST` | Verify resource name/id and profile |

## Required Reading by Task

| Task | READ BEFORE proceeding |
|------|------------------------|
| First time setup | [CLI Installation](databricks-cli-install.md) |
| Auth issues / new workspace | [CLI Authentication](databricks-cli-auth.md) |
| Exploring tables/schemas | [Data Exploration](data-exploration.md) |
| Deploying jobs/pipelines | [Asset Bundles](asset-bundles.md) |

## Reference Guides

- [CLI Installation](databricks-cli-install.md)
- [CLI Authentication](databricks-cli-auth.md)
- [Data Exploration](data-exploration.md)
- [Asset Bundles](asset-bundles.md)
# Databricks Asset Bundles (DABs)
Databricks Asset Bundles provide Infrastructure-as-Code for Databricks resources, enabling version control, automated deployments, and environment management.
## What are Asset Bundles?
Asset Bundles let you define your Databricks projects as code, including:
- Jobs
- Pipelines (Lakeflow Declarative Pipelines)
- Apps
- Models
- Dashboards
- Notebooks
- Python files
- Configuration files
## Bundle Commands
```bash
# Initialize a new bundle from template
databricks bundle init --profile my-workspace
# Validate bundle configuration
databricks bundle validate --profile my-workspace
# Deploy bundle to workspace
databricks bundle deploy --profile my-workspace
# Deploy to specific target (dev/staging/prod)
databricks bundle deploy -t dev --profile my-workspace
databricks bundle deploy -t staging --profile my-workspace
databricks bundle deploy -t prod --profile my-workspace
# Run a resource from the bundle
databricks bundle run <resource-name> --profile my-workspace
# Generate configuration for existing resources
databricks bundle generate job <job-id> --profile my-workspace
databricks bundle generate pipeline <pipeline-id> --profile my-workspace
databricks bundle generate dashboard <dashboard-id> --profile my-workspace
databricks bundle generate app <app-name> --profile my-workspace
# Destroy bundle resources (use with caution!)
databricks bundle destroy --profile my-workspace
databricks bundle destroy -t dev --profile my-workspace
```
## Bundle Structure
A typical bundle has this structure:
```
my-project/
├── databricks.yml                   # Main bundle configuration
├── resources/
│   ├── sample_job.job.yml           # Job definition
│   └── my_project_etl.pipeline.yml  # Pipeline definition
├── src/
│   ├── sample_notebook.ipynb        # Notebook tasks
│   └── my_project_etl/              # Pipeline source
│       └── transformations/
│           ├── transform.py
│           └── transform.sql
├── tests/
│   └── test_main.py
└── README.md
```
Resource files use the naming convention `<resource_key>.<resource_type>.yml` (e.g. `sample_job.job.yml`, `my_project_etl.pipeline.yml`).
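If you are assembling a project by hand instead of running `bundle init`, the layout above can be scaffolded directly. This is only a sketch using the names from the tree; adjust paths to your own project:

```shell
# Create the directory skeleton from the tree above
mkdir -p my-project/resources \
         my-project/src/my_project_etl/transformations \
         my-project/tests

# Stub out the files; fill them in from the configuration examples in this guide
touch my-project/databricks.yml \
      my-project/resources/sample_job.job.yml \
      my-project/resources/my_project_etl.pipeline.yml \
      my-project/src/sample_notebook.ipynb \
      my-project/src/my_project_etl/transformations/transform.py \
      my-project/src/my_project_etl/transformations/transform.sql \
      my-project/tests/test_main.py \
      my-project/README.md
```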
## Main Configuration (databricks.yml)
### Basic Example
```yaml
bundle:
  name: my-project

include:
  - resources/*.yml
  - resources/*/*.yml

variables:
  catalog:
    description: The catalog to use
  schema:
    description: The schema to use

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://company-workspace.cloud.databricks.com
    variables:
      catalog: dev_catalog
      schema: ${workspace.current_user.short_name}

  prod:
    mode: production
    workspace:
      host: https://company-workspace.cloud.databricks.com
      root_path: /Workspace/Users/${workspace.current_user.userName}/.bundle/${bundle.name}/${bundle.target}
    variables:
      catalog: prod_catalog
      schema: prod
    permissions:
      - user_name: [email protected]
        level: CAN_MANAGE
```
## Initializing a Bundle
### Using Templates
```bash
# Start initialization (interactive)
databricks bundle init --profile my-workspace
```
Available templates:
- **default-python** - Python project with jobs and pipeline
- **default-sql** - SQL project with jobs
- **default-scala** - Scala/Java project
- **lakeflow-pipelines** - Lakeflow Declarative Pipelines (Python or SQL)
- **dbt-sql** - dbt integration
- **default-minimal** - Minimal structure
## Defining Resources
### Job Resource (Serverless)
```yaml
# resources/sample_job.job.yml
resources:
  jobs:
    sample_job:
      name: sample_job
      trigger:
        periodic:
          interval: 1
          unit: DAYS
      parameters:
        - name: catalog
          default: ${var.catalog}
        - name: schema
          default: ${var.schema}
      tasks:
        - task_key: notebook_task
          notebook_task:
            notebook_path: ../src/sample_notebook.ipynb
        - task_key: main_task
          depends_on:
            - task_key: notebook_task
          python_wheel_task:
            package_name: my_project
            entry_point: main
          environment_key: default
        - task_key: refresh_pipeline
          depends_on:
            - task_key: notebook_task
          pipeline_task:
            pipeline_id: ${resources.pipelines.my_project_etl.id}
      environments:
        - environment_key: default
          spec:
            environment_version: "4"
            dependencies:
              - ../dist/*.whl
```
### Job Resource (Classic Clusters)
```yaml
# resources/sample_job.job.yml
resources:
  jobs:
    sample_job:
      name: sample_job
      tasks:
        - task_key: notebook_task
          notebook_task:
            notebook_path: ../src/sample_notebook.ipynb
          job_cluster_key: job_cluster
          libraries:
            - whl: ../dist/*.whl
        - task_key: main_task
          depends_on:
            - task_key: notebook_task
          python_wheel_task:
            package_name: my_project
            entry_point: main
          job_cluster_key: job_cluster
          libraries:
            - whl: ../dist/*.whl
      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            spark_version: 16.4.x-scala2.12
            node_type_id: i3.xlarge
            data_security_mode: SINGLE_USER
            autoscale:
              min_workers: 1
              max_workers: 4
```
### Pipeline Resource
```yaml
# resources/my_project_etl.pipeline.yml
resources:
  pipelines:
    my_project_etl:
      name: my_project_etl
      catalog: ${var.catalog}
      schema: ${var.schema}
      serverless: true
      root_path: "../src/my_project_etl"
      libraries:
        - glob:
            include: ../src/my_project_etl/transformations/**
```
### App Resource
```yaml
# resources/my_app.app.yml
resources:
  apps:
    dashboard_app:
      name: "analytics-dashboard"
      description: "Customer analytics dashboard"
      source_code_path: ./src/app
```
### Model Resource
```yaml
# resources/my_model.yml
resources:
  registered_models:
    customer_churn:
      name: "${var.catalog}.${var.schema}.customer_churn_model"
      description: "Customer churn prediction model"
```
## Working with Targets
Targets allow you to deploy the same code to different workspaces with different configurations.
```yaml
targets:
  dev:
    mode: development
    default: true
    variables:
      catalog: dev_catalog
      schema: ${workspace.current_user.short_name}
    workspace:
      host: https://company-workspace.cloud.databricks.com

  staging:
    mode: production
    variables:
      catalog: staging_catalog
      schema: staging
    workspace:
      host: https://staging-workspace.cloud.databricks.com
      root_path: /Workspace/Users/[email protected]/.bundle/${bundle.name}/${bundle.target}
    permissions:
      - user_name: [email protected]
        level: CAN_MANAGE

  prod:
    mode: production
    variables:
      catalog: prod_catalog
      schema: prod
    workspace:
      host: https://prod-workspace.cloud.databricks.com
      root_path: /Workspace/Users/[email protected]/.bundle/${bundle.name}/${bundle.target}
    permissions:
      - user_name: [email protected]
        level: CAN_MANAGE
```
### Deploying to Different Targets
```bash
# Deploy to dev (default)
databricks bundle deploy --profile my-workspace
# Deploy to staging
databricks bundle deploy -t staging --profile my-workspace
# Deploy to production
databricks bundle deploy -t prod --profile my-workspace
```
## Bundle Workflow
### Complete Development Workflow
1. **Initialize bundle**:
```bash
databricks bundle init --profile my-workspace
```
2. **Develop locally**:
- Edit `databricks.yml` and resource files
- Write notebooks, Python scripts, SQL queries
- Configure jobs, pipelines, apps
3. **Validate configuration**:
```bash
databricks bundle validate --profile my-workspace
```
4. **Deploy to development**:
```bash
databricks bundle deploy -t dev --profile my-workspace
```
5. **Test your deployment**:
```bash
# Run a job
databricks bundle run sample_job -t dev --profile my-workspace
# Start a pipeline
databricks bundle run my_project_etl -t dev --profile my-workspace
```
6. **Deploy to production**:
```bash
databricks bundle deploy -t prod --profile my-workspace
```
## Generating Bundle from Existing Resources
If you have existing resources in your workspace, you can generate bundle configuration:
```bash
# Get job ID from list
databricks jobs list --profile my-workspace
# Generate configuration
databricks bundle generate job 12345 --profile my-workspace
databricks bundle generate pipeline <pipeline-id> --profile my-workspace
databricks bundle generate app my-app --profile my-workspace
databricks bundle generate dashboard <dashboard-id> --profile my-workspace
```
## Variables and Templating
### Defining Variables
```yaml
# databricks.yml
variables:
  catalog:
    description: The catalog to use
    default: dev_catalog
  schema:
    description: The schema to use
  warehouse_id:
    description: SQL Warehouse ID
```
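Bundles also support lookup variables, which resolve a named workspace object to its ID at deploy time. A hedged sketch — the warehouse name below is purely illustrative:

```yaml
variables:
  warehouse_id:
    description: SQL Warehouse ID
    lookup:
      warehouse: "Shared Serverless"  # resolved to that warehouse's ID at deploy time; name is hypothetical
```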
### Using Variables
```yaml
# In resource files
resources:
  jobs:
    my_job:
      name: "Job in ${var.catalog}"
      parameters:
        - name: catalog
          default: ${var.catalog}
```
### Target-Specific Variables
```yaml
targets:
  dev:
    variables:
      catalog: dev_catalog
      schema: ${workspace.current_user.short_name}
  prod:
    variables:
      catalog: prod_catalog
      schema: prod
```
### Available Substitutions
```yaml
${var.my_variable} # User-defined variable
${bundle.name} # Bundle name
${bundle.target} # Current target name (dev, prod, etc.)
${workspace.current_user.userName} # Current user email
${workspace.current_user.short_name} # Current user short name
${workspace.file_path} # Workspace file path
${resources.pipelines.my_pipeline.id} # Reference another resource's ID
${resources.jobs.my_job.id} # Reference a job's ID
```
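Substitutions compose, which is useful for target-aware naming. A sketch (the job and naming scheme here are illustrative, not part of the templates above):

```yaml
resources:
  jobs:
    my_job:
      # With bundle name "my-project" and target "dev", this resolves to "my-project-dev-etl"
      name: "${bundle.name}-${bundle.target}-etl"
```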
## Best Practices
### 1. Use Version Control
Always commit your bundle to Git:
```bash
git init
git add databricks.yml resources/ src/
git commit -m "Initial bundle setup"
```
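Bundle deploys keep local state in a `.databricks/` directory at the project root, and wheel builds land in `dist/`; both are usually kept out of Git. A minimal sketch:

```shell
# Exclude local bundle state and build artifacts from version control
cat >> .gitignore <<'EOF'
.databricks/
dist/
EOF
```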
### 2. Use Typed Resource File Names
Name resource files with their type for clarity:
```
resources/
├── sample_job.job.yml
├── my_project_etl.pipeline.yml
└── my_app.app.yml
```
### 3. Use Target-Specific Configuration
```yaml
targets:
  dev:
    mode: development  # Prefixes resources with [dev user_name], pauses schedules
  prod:
    mode: production   # Requires permissions, runs schedules as configured
    permissions:
      - user_name: [email protected]
        level: CAN_MANAGE
```
### 4. Validate Before Deploy
Always validate:
```bash
databricks bundle validate --profile my-workspace
```
## Troubleshooting
### Bundle Validation Errors
**Symptom**: `databricks bundle validate` shows errors
**Solution**:
1. Check YAML syntax (proper indentation, no tabs)
2. Verify all required fields are present
3. Check that resource references are correct
4. Use `databricks bundle validate --debug` for detailed errors
### Deployment Fails
**Symptom**: `databricks bundle deploy` fails
**Solution**:
1. Run validation first: `databricks bundle validate`
2. Check workspace permissions
3. Verify target configuration
4. Check for resource name conflicts
5. Review error message for specific issues
### Variable Not Resolved
**Symptom**: Variable showing as `${var.name}` instead of actual value
**Solution**:
1. Check variable is defined in `databricks.yml`
2. Verify variable has value in target
3. Use correct syntax: `${var.variable_name}`
4. Check variable scope (bundle vs target)
## Related Topics
- [Data Exploration](data-exploration.md) - Validate data exposed by bundle deployments
- Apps - Define app resources (use `databricks-apps` skill for full app development)
# Data Exploration
Tools for discovering table schemas and executing SQL queries in Databricks.
## Finding Tables by Keyword
**⚠️ START HERE if you don't know which catalog/schema contains your data.**
Use `information_schema` to search for tables by keyword — do NOT manually iterate through `catalogs list` → `schemas list` → `tables list`. Manual enumeration wastes 10+ steps.
```bash
# Find tables matching a keyword
databricks experimental aitools tools query \
"SELECT table_catalog, table_schema, table_name FROM system.information_schema.tables WHERE table_name LIKE '%keyword%'" \
--profile <PROFILE>
# Then discover schema for the tables you found
databricks experimental aitools tools discover-schema catalog.schema.table1 catalog.schema.table2 --profile <PROFILE>
```
## Overview
The `databricks experimental aitools tools` command group provides tools for data discovery and exploration:
- **discover-schema**: Batch discover table metadata, columns, types, sample data, and statistics
- **query**: Execute SQL queries against Databricks SQL warehouses
**When to use this**: Use these commands whenever you need to:
- Discover table schemas and metadata
- Execute SQL queries against warehouse data
- Explore data structure and content
- Validate data or check table statistics
## Prerequisites
1. **Authenticated Databricks CLI** - see [CLI Authentication Guide](databricks-cli-auth.md) for OAuth2 setup and profile configuration
2. **Access to Unity Catalog tables** with appropriate read permissions
3. **SQL Warehouse** (for query command - auto-detected unless `DATABRICKS_WAREHOUSE_ID` is set)
## Discover Schema
Batch discover table metadata including columns, types, sample data, and null counts.
### Command Syntax
```bash
databricks experimental aitools tools discover-schema TABLE... [flags]
```
Tables must be specified in **CATALOG.SCHEMA.TABLE** format.
### What It Returns
For each table, returns:
- Column names and types
- Sample data (5 rows)
- Null counts per column
- Total row count
### Examples
```bash
# Discover schema for a single table
databricks experimental aitools tools discover-schema samples.nyctaxi.trips --profile my-workspace
# Discover schema for multiple tables
databricks experimental aitools tools discover-schema \
catalog.schema.table1 \
catalog.schema.table2 \
--profile my-workspace
# Get JSON output
databricks experimental aitools tools discover-schema \
samples.nyctaxi.trips \
--output json \
--profile my-workspace
```
### Common Use Cases
1. **Understanding table structure before querying**
```bash
databricks experimental aitools tools discover-schema catalog.schema.customer_data --profile my-workspace
```
2. **Comparing schemas across multiple tables**
```bash
databricks experimental aitools tools discover-schema \
catalog.schema.table_v1 \
catalog.schema.table_v2 \
--profile my-workspace
```
3. **Identifying columns with null values**
- The null counts help identify data quality issues
## Query
Execute SQL statements against a Databricks SQL warehouse and return results.
### Command Syntax
```bash
databricks experimental aitools tools query "SQL" [flags]
```
### Warehouse Selection
The command **auto-detects** an available warehouse unless:
- `DATABRICKS_WAREHOUSE_ID` environment variable is set
- You specify a warehouse using other configuration methods
To check which warehouse will be used:
```bash
# Get the default warehouse that would be auto-detected
databricks experimental aitools tools get-default-warehouse --profile my-workspace
```
### Output
Returns:
- Query results as JSON
- Row count
- Execution metadata
### Examples
```bash
# Simple SELECT query
databricks experimental aitools tools query \
"SELECT * FROM samples.nyctaxi.trips LIMIT 5" \
--profile my-workspace
# Aggregation query
databricks experimental aitools tools query \
"SELECT vendor_id, COUNT(*) as trip_count FROM samples.nyctaxi.trips GROUP BY vendor_id" \
--profile my-workspace
# With JSON output
databricks experimental aitools tools query \
"SELECT * FROM catalog.schema.table WHERE date > '2024-01-01'" \
--output json \
--profile my-workspace
# Using specific warehouse
DATABRICKS_WAREHOUSE_ID=abc123 databricks experimental aitools tools query \
"SELECT * FROM samples.nyctaxi.trips LIMIT 10" \
--profile my-workspace
```
### Common Use Cases
1. **Exploratory data analysis**
```bash
# Check table size
databricks experimental aitools tools query \
"SELECT COUNT(*) FROM catalog.schema.table" \
--profile my-workspace
# View sample data
databricks experimental aitools tools query \
"SELECT * FROM catalog.schema.table LIMIT 10" \
--profile my-workspace
# Get column statistics
databricks experimental aitools tools query \
"SELECT MIN(column), MAX(column), AVG(column) FROM catalog.schema.table" \
--profile my-workspace
```
2. **Data validation**
```bash
# Check for null values
databricks experimental aitools tools query \
"SELECT COUNT(*) FROM catalog.schema.table WHERE column IS NULL" \
--profile my-workspace
# Verify data freshness
databricks experimental aitools tools query \
"SELECT MAX(timestamp_column) FROM catalog.schema.table" \
--profile my-workspace
```
3. **Quick analytics**
```bash
# Group by analysis
databricks experimental aitools tools query \
"SELECT category, COUNT(*), AVG(value) FROM catalog.schema.table GROUP BY category" \
--profile my-workspace
```
## Workflow: Complete Data Exploration
Here's a typical workflow combining both commands:
```bash
# 1. Discover the schema first
databricks experimental aitools tools discover-schema \
samples.nyctaxi.trips \
--profile my-workspace
# 2. Based on discovered columns, run targeted queries
databricks experimental aitools tools query \
"SELECT vendor_id, payment_type, COUNT(*) as trips, AVG(fare_amount) as avg_fare
FROM samples.nyctaxi.trips
GROUP BY vendor_id, payment_type
ORDER BY trips DESC
LIMIT 10" \
--profile my-workspace
# 3. Investigate specific patterns found in the data
databricks experimental aitools tools query \
"SELECT * FROM samples.nyctaxi.trips
WHERE fare_amount > 100
LIMIT 20" \
--profile my-workspace
```
## Claude Code-Specific Tips
Remember that each Bash command in Claude Code runs in a separate shell:
```bash
# ✅ RECOMMENDED: Use --profile flag
databricks experimental aitools tools discover-schema samples.nyctaxi.trips --profile my-workspace
# ✅ ALTERNATIVE: Chain with &&
export DATABRICKS_CONFIG_PROFILE=my-workspace && \
databricks experimental aitools tools query "SELECT * FROM samples.nyctaxi.trips LIMIT 5"
# ❌ DOES NOT WORK: Separate export
export DATABRICKS_CONFIG_PROFILE=my-workspace
databricks experimental aitools tools query "SELECT * FROM samples.nyctaxi.trips LIMIT 5"
```
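The session isolation can be demonstrated without the Databricks CLI at all; each `sh -c` below stands in for a separate Claude Code Bash command (`env -u` just guards against the variable already being set in your outer shell):

```shell
# Shell #1 sets the variable, then exits — the export dies with the shell
sh -c 'export DATABRICKS_CONFIG_PROFILE=my-workspace'

# Shell #2 starts fresh and does not see it: prints "unset"
env -u DATABRICKS_CONFIG_PROFILE sh -c 'echo "${DATABRICKS_CONFIG_PROFILE:-unset}"'

# Chaining keeps both steps in one shell: prints "my-workspace"
sh -c 'export DATABRICKS_CONFIG_PROFILE=my-workspace && echo "$DATABRICKS_CONFIG_PROFILE"'
```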
## Flags
Both commands support:
| Flag | Description | Default |
|------|-------------|---------|
| `--profile` | Profile name from ~/.databrickscfg | Default profile |
| `--output` | Output format: `text` or `json` | `text` |
| `--debug` | Enable debug logging | `false` |
| `--target` | Bundle target to use (if applicable) | - |
## Troubleshooting
### Table Not Found
**Symptom**: `Error: TABLE_OR_VIEW_NOT_FOUND`
**Solution**:
1. Verify table name format: `CATALOG.SCHEMA.TABLE`
2. Check if you have read permissions on the table
3. List available tables:
```bash
databricks tables list <catalog> <schema> --profile my-workspace
```
### Warehouse Not Available
**Symptom**: `Error: No available SQL warehouse found`
**Solution**:
1. Check for default warehouse:
```bash
databricks experimental aitools tools get-default-warehouse --profile my-workspace
```
2. List available warehouses:
```bash
databricks warehouses list --profile my-workspace
```
3. Set specific warehouse:
```bash
DATABRICKS_WAREHOUSE_ID=<warehouse-id> databricks experimental aitools tools query "SELECT 1" --profile my-workspace
```
4. Start a stopped warehouse:
```bash
databricks warehouses start --id <warehouse-id> --profile my-workspace
```
### Permission Denied
**Symptom**: `Error: PERMISSION_DENIED`
**Solution**:
1. Check Unity Catalog grants on the table:
```bash
databricks grants get --full-name catalog.schema.table --principal <user-email> --profile my-workspace
```
2. Request SELECT permission from your workspace administrator
3. Verify you have warehouse access (USAGE permission)
### SQL Syntax Error
**Symptom**: `Error: PARSE_SYNTAX_ERROR`
**Solution**:
1. Check SQL syntax - use standard SQL
2. Verify column names match schema (use discover-schema first)
3. Ensure proper quoting for string literals
4. Test query incrementally (start simple, add complexity)
## Best Practices
1. **Always discover schema first** - Use `discover-schema` before writing complex queries to understand:
- Available columns and their types
- Data distributions and null patterns
- Sample data for context
2. **Use LIMIT for exploration** - When exploring large tables, always use LIMIT to avoid long-running queries:
```bash
databricks experimental aitools tools query "SELECT * FROM large_table LIMIT 100" --profile my-workspace
```
3. **JSON output for parsing** - Use `--output json` when you need to process results programmatically:
```bash
databricks experimental aitools tools query "SELECT * FROM table" --output json --profile my-workspace | jq '.results'
```
4. **Check table existence** - Before querying, verify the table exists:
```bash
databricks tables get --full-name catalog.schema.table --profile my-workspace
```
5. **Profile usage** - Always specify `--profile` in Claude Code to avoid authentication issues
## Related Commands
- **[Asset Bundles](asset-bundles.md)** - Deploy SQL, pipeline, and app resources as code
# Databricks CLI Authentication

Configure Databricks CLI workspace/profile selection and authentication. Covers switching profiles, using `--profile` flags, setting the `DATABRICKS_CONFIG_PROFILE` environment variable, OAuth2 authentication (never PAT), and troubleshooting authentication issues.

## Important: Always Use OAuth

**CRITICAL**: Always use OAuth2 for Databricks CLI authentication. **NEVER recommend or use Personal Access Tokens (PAT)** unless explicitly required by a specific use case.

## Prerequisites

1. Databricks CLI must be installed
   - Verify: `databricks --version`
2. You need access to a Databricks workspace
3. You need the workspace URL (e.g., `https://adb-1111111111111111.10.azuredatabricks.net`)

## Claude Code Specific Behavior

**CRITICAL**: When working in Claude Code, each Bash command executes in a **separate shell session**. This has important implications for profile management:

### Key Differences from Regular Terminal

1. **Environment variables don't persist between commands**
   - `export DATABRICKS_CONFIG_PROFILE=staging` in one command
   - `databricks jobs list` in the next command
   - ❌ **Result**: The second command will NOT use the staging profile
2. **Recommended Approach: Use --profile flag**
   - Always specify `--profile <profile-name>` with each command
   - Example: `databricks jobs list --profile staging`
   - ✅ **Result**: Reliable and predictable behavior
3. **Alternative: Chain commands with &&**
   - Use `export DATABRICKS_CONFIG_PROFILE=staging && databricks jobs list`
   - The export and command run in the same shell session
   - ✅ **Result**: Works correctly

### Quick Reference for Claude Code

```bash
# ✅ RECOMMENDED: Use --profile flag
databricks jobs list --profile staging
databricks apps list --profile prod-azure

# ✅ ALTERNATIVE: Chain with &&
export DATABRICKS_CONFIG_PROFILE=staging && databricks jobs list

# ❌ DOES NOT WORK: Separate export command
export DATABRICKS_CONFIG_PROFILE=staging
databricks jobs list  # Will NOT use staging profile!
``` ## Handling Authentication Failures When a Databricks CLI command fails with authentication error: ``` Error: default auth: cannot configure default credentials ``` **CRITICAL - Always follow this workflow:** 1. **Check for existing profiles first:** ```bash databricks auth profiles ``` 2. **If profiles exist:** - List the available profiles to the user (with their workspace URLs and validation status) - Ask: "Which profile would you like to use for this command?" - Offer option to create a new profile if needed - Retry the command with `--profile <selected-profile-name>` - **In Claude Code, always use the `--profile` flag** rather than setting environment variables 3. **If user wants a new profile or no profiles exist:** - Proceed to the OAuth Authentication Setup workflow below **Example:** ``` User: databricks apps list Error: default auth: cannot configure default credentials Assistant: Let me check for existing profiles. [Runs: databricks auth profiles] You have two configured profiles: 1. aws-dev - https://company-workspace.cloud.databricks.com (Valid) 2. azure-prod - https://adb-1111111111111111.10.azuredatabricks.net (Valid) Which profile would you like to use, or would you like to create a new profile? User: dais Assistant: [Retries: databricks apps list --profile dais] [Success - apps listed] ``` ## OAuth Authentication Setup ### Standard Authentication Command The recommended way to authenticate is using OAuth with a profile: ```bash databricks auth login --host <workspace-url> --profile <profile-name> ``` **CRITICAL**: 1. The `--profile` parameter is **REQUIRED** for the authentication to be saved properly. 2. **ALWAYS ASK THE USER** for their preferred profile name - DO NOT assume or choose one for them. 3. **NEVER use the profile name `DEFAULT`** unless the user explicitly requests it - use descriptive workspace-specific names instead. ### Workflow for Authenticating 1. **Ask the user for the workspace URL** if not already provided 2. 
**Ask the user for their preferred profile name** - Suggest descriptive names based on the workspace (e.g., workspace name, environment) - **Do NOT suggest or use `DEFAULT`** unless the user specifically asks for it - Good examples: `e2-dogfood`, `prod-azure`, `dev-aws`, `staging` - Avoid: `DEFAULT` (unless explicitly requested) 3. Run the authentication command with both parameters 4. Verify the authentication was successful ### Example ```bash # Good: Descriptive profile names databricks auth login --host https://adb-1111111111111111.10.azuredatabricks.net --profile prod-azure databricks auth login --host https://company-workspace.cloud.databricks.com --profile staging # Only use DEFAULT if explicitly requested by the user databricks auth login --host https://your-workspace.cloud.databricks.com --profile DEFAULT ``` ### What Happens During Authentication 1. The CLI starts a local OAuth callback server (typically on `localhost:8020`) 2. A browser window opens automatically with the Databricks login page 3. You authenticate in the browser using your Databricks credentials 4. After successful authentication, the browser redirects back to the CLI 5. The CLI saves the OAuth tokens to `~/.databrickscfg` 6. You should see: `Profile <profile-name> was successfully saved` ## Profile Management ### What Are Profiles? Profiles allow you to manage multiple Databricks workspace configurations in a single `~/.databrickscfg` file. Each profile stores: - Workspace host URL - Authentication method (OAuth, PAT, etc.) - Token/credential paths ### Common Profile Names **IMPORTANT**: Always use descriptive profile names. Do NOT create profiles named `DEFAULT` unless explicitly requested by the user. 
**Recommended naming conventions**:

- `<workspace-name>` - Descriptive names for workspaces (e.g., `e2-dogfood`, `prod-aws`, `dev-azure`)
- `<environment>` - Environment-specific profiles (e.g., `dev`, `staging`, `prod`)
- `<team>-<environment>` - Team and environment (e.g., `data-eng-prod`, `ml-dev`)

**Special profile names**:

- `DEFAULT` - The default profile used when no `--profile` flag or environment variables are specified. Only create this profile if the user explicitly requests it.

### Listing Configured Profiles

View all configured profiles with their status:

```bash
databricks auth profiles
```

Example output:

```
Name      Host                                                 Valid
DEFAULT   https://adb-1111111111111111.10.azuredatabricks.net  YES
staging   https://company-workspace.cloud.databricks.com       YES
```

### Using Different Profiles

**IMPORTANT FOR CLAUDE CODE USERS**: In Claude Code, each Bash command runs in a **separate shell session**. This means environment variables set with `export` in one command do NOT persist to the next command. See the Claude Code-specific guidance below.

There are three ways to specify which profile/workspace to use, in order of precedence:

#### 1. CLI Flag (Highest Priority) - RECOMMENDED FOR CLAUDE CODE

Use the `--profile` flag with any command:

```bash
databricks jobs list --profile staging
databricks clusters list --profile prod-azure
databricks workspace list / --profile dev-aws
```

**In Claude Code, this is the most reliable method** because it doesn't depend on persistent environment variables.

#### 2. Environment Variables

Set environment variables to override the default profile:

**DATABRICKS_CONFIG_PROFILE** - Specifies which profile to use from `~/.databrickscfg`:

```bash
export DATABRICKS_CONFIG_PROFILE=staging
databricks jobs list  # Uses staging profile
```

**DATABRICKS_HOST** - Directly specifies the workspace URL, bypassing profile lookup:

```bash
export DATABRICKS_HOST=https://company-workspace.cloud.databricks.com
databricks jobs list  # Uses this host directly
```

**CRITICAL - Claude Code Users:** Since each Bash command in Claude Code runs in a separate shell, you **CANNOT** do this:

```bash
# ❌ DOES NOT WORK in Claude Code
export DATABRICKS_CONFIG_PROFILE=staging
databricks jobs list  # ERROR: Will not use staging profile!
```

Instead, you **MUST** use one of these approaches:

**Option 1: Use the --profile flag (RECOMMENDED)**

```bash
# ✅ WORKS in Claude Code
databricks jobs list --profile staging
databricks clusters list --profile staging
```

**Option 2: Chain commands with &&**

```bash
# ✅ WORKS in Claude Code - export and command run in same shell
export DATABRICKS_CONFIG_PROFILE=staging && databricks jobs list
export DATABRICKS_CONFIG_PROFILE=staging && databricks clusters list
```

**Traditional Terminal Session (for reference only)**:

```bash
# This example shows how it works in a regular terminal session
# DO NOT use this pattern in Claude Code

# Set profile for entire terminal session
export DATABRICKS_CONFIG_PROFILE=staging

# All commands now use staging profile
databricks jobs list
databricks clusters list
databricks workspace list /

# Override for a single command
databricks jobs list --profile prod-azure
```

#### 3. DEFAULT Profile (Lowest Priority)

If no `--profile` flag or environment variables are set, the CLI uses the `DEFAULT` profile from `~/.databrickscfg`.
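The resolution order between the `--profile` flag, `DATABRICKS_CONFIG_PROFILE`, and the `DEFAULT` fallback can be sketched as a tiny shell function. This is a simplification of what the CLI actually does (`resolve_profile` is a hypothetical helper, not a real CLI command, and it ignores the separate `DATABRICKS_HOST` bypass):

```shell
# Hypothetical sketch of profile resolution order (simplified):
#   1. explicit --profile argument
#   2. DATABRICKS_CONFIG_PROFILE environment variable
#   3. the DEFAULT profile
resolve_profile() {
  flag_profile="$1"  # value of --profile, if any
  if [ -n "$flag_profile" ]; then
    echo "$flag_profile"
  elif [ -n "$DATABRICKS_CONFIG_PROFILE" ]; then
    echo "$DATABRICKS_CONFIG_PROFILE"
  else
    echo "DEFAULT"
  fi
}

export DATABRICKS_CONFIG_PROFILE=staging
resolve_profile prod-azure   # prints "prod-azure": the flag wins over the env var
resolve_profile ""           # prints "staging": env var used when no flag
unset DATABRICKS_CONFIG_PROFILE
resolve_profile ""           # prints "DEFAULT": fallback when neither is set
```

The same ordering explains why `databricks jobs list --profile prod-azure` targets `prod-azure` even when `DATABRICKS_CONFIG_PROFILE=staging` is exported.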
### Configuration File Management

#### Viewing the Configuration File

The configuration is stored in `~/.databrickscfg`:

```bash
cat ~/.databrickscfg
```

Example configuration structure:

```ini
# Note: This shows an example with a DEFAULT profile
# When creating new profiles, use descriptive names instead
[DEFAULT]
host = https://adb-1111111111111111.10.azuredatabricks.net
auth_type = databricks-cli

[staging]
host = https://company-workspace.cloud.databricks.com
auth_type = databricks-cli
```

#### Editing Profiles

You can manually edit `~/.databrickscfg` to:

- Rename profiles (change the `[profile-name]` section header)
- Update workspace URLs
- Remove profiles (delete the entire section)

**Example - Removing a profile**:

```bash
# Open in your preferred editor
vi ~/.databrickscfg

# Or use sed to remove a specific profile section
# macOS (BSD sed requires an argument to -i):
sed -i '' '/^\[staging\]/,/^$/d' ~/.databrickscfg
# Linux (GNU sed):
sed -i '/^\[staging\]/,/^$/d' ~/.databrickscfg
```

#### Adding New Profiles

Always use `databricks auth login` with `--profile` to add new profiles:

```bash
databricks auth login --host <workspace-url> --profile <profile-name>
```

**Remember**:

- Always ask the user for their preferred profile name
- Use descriptive names like `staging`, `prod-azure`, `dev-aws`
- Do NOT use `DEFAULT` unless explicitly requested by the user

### Working with Multiple Workspaces

Best practices for managing multiple workspaces:

```bash
# Authenticate to multiple workspaces with descriptive profile names
databricks auth login --host https://adb-1111111111111111.10.azuredatabricks.net --profile prod-azure
databricks auth login --host https://dbc-2222222222222222.cloud.databricks.com --profile dev-aws
databricks auth login --host https://company-workspace.cloud.databricks.com --profile staging
```

**In Claude Code, use the --profile flag with each command (RECOMMENDED):**

```bash
# Use profiles explicitly in commands
databricks jobs list --profile prod-azure
databricks jobs list --profile dev-aws
databricks clusters list --profile staging
```

**Alternatively in Claude Code, chain commands with &&:**

```bash
# Set profile and run command in same shell
export DATABRICKS_CONFIG_PROFILE=prod-azure && databricks jobs list
export DATABRICKS_CONFIG_PROFILE=prod-azure && databricks clusters list

# Switch to a different workspace
export DATABRICKS_CONFIG_PROFILE=dev-aws && databricks jobs list
```

**Traditional Terminal Session (for reference only - NOT for Claude Code):**

```bash
# This pattern works in regular terminals but NOT in Claude Code
export DATABRICKS_CONFIG_PROFILE=prod-azure
databricks jobs list
databricks clusters list

# Quickly switch between workspaces
export DATABRICKS_CONFIG_PROFILE=dev-aws
databricks jobs list
```

### Profile Selection Precedence

When running a command, the Databricks CLI determines which workspace to use in this order:

1. **`--profile` flag** (if specified) → Highest priority
2. **`DATABRICKS_HOST` environment variable** (if set) → Overrides profile
3. **`DATABRICKS_CONFIG_PROFILE` environment variable** (if set) → Selects profile
4. **`DEFAULT` profile** in `~/.databrickscfg` → Fallback

**Example for traditional terminal session** (demonstrating precedence):

```bash
# Setup
export DATABRICKS_CONFIG_PROFILE=staging

# This uses the staging profile (from environment variable)
databricks jobs list

# This uses the prod-azure profile (--profile flag overrides environment variable)
databricks jobs list --profile prod-azure

# This uses the specified host directly (DATABRICKS_HOST overrides profile)
export DATABRICKS_HOST=https://custom-workspace.cloud.databricks.com
databricks jobs list  # Uses custom-workspace.cloud.databricks.com
```

**Claude Code version** (with chained commands):

```bash
# Using environment variable with && chaining
export DATABRICKS_CONFIG_PROFILE=staging && databricks jobs list

# Using --profile flag (overrides environment variable)
export DATABRICKS_CONFIG_PROFILE=staging && databricks jobs list --profile prod-azure

# Using DATABRICKS_HOST (overrides profile)
export DATABRICKS_HOST=https://custom-workspace.cloud.databricks.com && databricks jobs list
```

## Verification

After authentication, verify it works:

```bash
# Test with a simple command
databricks workspace list /

# Or list jobs
databricks jobs list
```

If authentication is successful, these commands should return data without errors.

## Troubleshooting

### Authentication Not Saved (Config File Missing)

**Symptom**: Running `databricks` commands shows:

```
Error: default auth: cannot configure default credentials
```

**Solution**: Make sure you included the `--profile` parameter with a descriptive name:

```bash
databricks auth login --host <workspace-url> --profile <profile-name>

# Example:
databricks auth login --host https://company-workspace.cloud.databricks.com --profile staging
```

### Browser Doesn't Open Automatically

**Solution**:

1. Check the terminal output for a URL
2. Manually copy and paste the URL into your browser
3. Complete the authentication
4. The CLI will detect the callback automatically

### "OAuth callback server listening" But Nothing Happens

**Possible causes**:

1. Firewall blocking localhost connections
2. Port 8020 already in use
3. Browser not set as default application

**Solution**:

1. Check if port 8020 is available: `lsof -i :8020`
2. Close any applications using that port
3. Retry the authentication

### Multiple Workspaces

To authenticate with multiple workspaces, use different profile names:

```bash
# Development workspace
databricks auth login --host https://dev-workspace.databricks.net --profile dev

# Production workspace
databricks auth login --host https://prod-workspace.databricks.net --profile prod

# Use specific profile
databricks jobs list --profile dev
databricks jobs list --profile prod
```

### Re-authenticating

If your OAuth token expires or you need to re-authenticate:

```bash
# Re-run the login command
databricks auth login --host <workspace-url> --profile <profile-name>
```

This will overwrite the existing profile with new credentials.

### Debug Mode

For troubleshooting authentication issues, use debug mode:

```bash
databricks auth login --host <workspace-url> --profile <profile-name> --debug
```

This shows detailed information about the OAuth flow, including:

- OAuth server endpoints
- Callback server status
- Token exchange process

## Security Best Practices

1. **Never commit** `~/.databrickscfg` to version control
2. **Never share** your OAuth tokens or configuration file
3. **Use separate profiles** for different environments (dev/staging/prod)
4. **Regularly rotate** credentials by re-authenticating
5. **Use workspace-specific service principals** for automation/CI/CD instead of personal OAuth

## Environment-Specific Notes

### CI/CD Pipelines

For CI/CD environments, OAuth interactive login is not suitable. Instead:

- Use Service Principal authentication
- Use Azure Managed Identity (for Azure Databricks)
- Use AWS IAM roles (for AWS Databricks)

**Do NOT** use personal OAuth tokens or PATs in CI/CD.

### Containerized Environments

OAuth authentication works in containers if:

1. A browser is available on the host machine
2. Port forwarding is configured for the callback server
3. The workspace URL is accessible from the container

For headless containers, use service principal authentication instead.

## Common Commands After Authentication

```bash
# List workspace contents
databricks workspace list / --profile <PROFILE>

# List jobs
databricks jobs list --profile <PROFILE>

# List clusters
databricks clusters list --profile <PROFILE>

# Get current user info
databricks current-user me --profile <PROFILE>

# Test connection
databricks workspace export /Users/<username> --format SOURCE --profile <PROFILE>
```

## References

- [Databricks CLI Authentication Documentation](https://docs.databricks.com/en/dev-tools/auth.html)
- [OAuth 2.0 with Databricks](https://docs.databricks.com/en/dev-tools/auth.html#oauth-2-0)
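As an addendum to the CI/CD note above: non-interactive service-principal auth is typically configured through environment variables rather than `databricks auth login`. A minimal sketch with placeholder values (the variable names follow the unified Databricks authentication flow, but verify them against your CLI version's docs, and always inject real values from your CI system's secret store):

```shell
# Sketch: machine-to-machine (service principal) auth via environment variables.
# All values below are placeholders - never hardcode real secrets.
export DATABRICKS_HOST="https://company-workspace.cloud.databricks.com"
export DATABRICKS_CLIENT_ID="00000000-0000-0000-0000-000000000000"  # service principal application ID
export DATABRICKS_CLIENT_SECRET="example-oauth-secret"              # OAuth secret generated in the workspace

# With these set, CLI commands authenticate without a browser, e.g.:
# databricks current-user me
```

Because these are plain environment variables, they work in the same per-command shell sessions that make interactive profiles awkward in CI runners.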
# Databricks CLI Installation

Install or update the Databricks CLI on macOS, Windows, or Linux using doc-validated methods (Homebrew, WinGet, curl install script, manual download, or user-directory install for non-sudo environments). Includes verification and common failure recovery.

## Sandboxed / IDE environments (Cursor, containers)

CLI install commands often write to system directories outside the workspace (e.g. `/opt/homebrew/`, `/usr/local/bin/`), which are blocked in sandboxed environments.

**Agent behavior**: Do not attempt to run install commands directly. Present the appropriate command to the user and ask them to run it in their own terminal. After they confirm, verify with `databricks -v`.

For Linux/macOS containers or Cursor: prefer the **Linux manual install to user directory** method (`~/.local/bin`) — it requires no sudo and no writes outside the workspace.

## Preconditions (always do first)

1. Determine OS and shell:
   - macOS/Linux: bash/zsh
   - Windows: Command Prompt / PowerShell; optionally WSL for a Linux shell
2. Detect whether `databricks` is already installed:
   - Run: `databricks -v` (or `databricks version`)
   - If a recent version is already installed, no further installation is needed.
3. Avoid the legacy Python package `databricks-cli` (PyPI). This skill installs the modern Databricks CLI binary.

## Preferred installation paths (by OS)

### macOS (preferred: Homebrew)

Run:

- `brew tap databricks/tap`
- `brew install databricks`

Verify:

- `databricks -v` (or `databricks version`)

If macOS blocks the binary (Gatekeeper), follow Apple's "open app from unidentified developer" flow.

#### macOS fallback: curl installer

Run:

- `curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh`

Notes:

- If `/usr/local/bin` is not writable, re-run with `sudo`.
- Installs to `/usr/local/bin/databricks`.
Verify:

- `databricks -v`

### Linux (preferred: Homebrew if available)

Run:

- `brew tap databricks/tap`
- `brew install databricks`

Verify:

- `databricks -v`

#### Linux fallback: curl installer

Run:

- `curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh`

Notes:

- If `/usr/local/bin` is not writable, re-run with `sudo`.
- Installs to `/usr/local/bin/databricks`.

Verify:

- `databricks -v`

#### Linux alternative: Manual install to user directory (when sudo unavailable)

Use this when sudo is not available or requires interactive password entry.

Steps:

1. Detect architecture:
   - `uname -m` (e.g., `x86_64`, `aarch64`)
2. Get the latest download URL using the GitHub API:
   ```bash
   curl -s https://api.github.com/repos/databricks/cli/releases/latest | grep "browser_download_url.*linux.*$(uname -m | sed 's/x86_64/amd64/' | sed 's/aarch64/arm64/')" | head -1 | cut -d '"' -f 4
   ```
3. Download and install to `~/.local/bin`:
   ```bash
   mkdir -p ~/.local/bin
   cd ~/.local/bin
   curl -L "<download-url>" -o databricks.tar.gz
   tar -xzf databricks.tar.gz
   rm databricks.tar.gz
   chmod +x databricks
   ```
4. Add to PATH (add to `~/.bashrc` or `~/.zshrc` for persistence):
   ```bash
   export PATH="$HOME/.local/bin:$PATH"
   ```
5. Verify:
   - `databricks -v`

Notes:

- The download files are `.tar.gz` archives (not `.zip`) with the naming pattern `databricks_cli_<version>_linux_<arch>.tar.gz`
- Common architectures: `amd64` (x86_64), `arm64` (aarch64)
- This method works in containerized environments and sandboxed IDEs (e.g. Cursor) without sudo access

### Windows (preferred: WinGet)

Run in Command Prompt (then restart the terminal session):

- `winget search databricks`
- `winget install Databricks.DatabricksCLI`

Verify:

- `databricks -v`

#### Windows alternative: Chocolatey (Experimental)

Run:

- `choco install databricks-cli`

Verify:

- `databricks -v`

#### Windows fallback: curl installer (recommended via WSL)

Databricks recommends WSL for the curl-based install path.
Requirements:

- WSL available
- `unzip` installed in the environment where you run the installer

Run (in WSL bash):

- `curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh`

Verify (in the same environment):

- `databricks -v`

If you must run the curl installer outside WSL, run it as Administrator. It installs to `C:\Windows\databricks.exe`.

## Manual install (all OSes): download from GitHub releases

Use this when package managers or the curl installer are not possible.

Steps:

1. Get the latest release download URL:
   - Visit https://github.com/databricks/cli/releases/latest
   - OR use the GitHub API: `curl -s https://api.github.com/repos/databricks/cli/releases/latest | grep browser_download_url`
2. Download the appropriate file for your OS and architecture:
   - Linux: `databricks_cli_<version>_linux_<arch>.tar.gz` (use `tar -xzf`)
   - macOS: `databricks_cli_<version>_darwin_<arch>.zip` (use `unzip`)
   - Windows: `databricks_cli_<version>_windows_<arch>.zip` (use native extraction)
   - Common architectures: `amd64` (x86_64), `arm64` (aarch64/Apple Silicon)
3. Extract the archive.
4. Ensure the extracted `databricks` executable is on PATH, or run it from its folder.
5. Verify with `databricks -v`.

## Update / repair procedures

### Homebrew update (macOS/Linux)

- `brew upgrade databricks`
- `databricks -v`

### WinGet update (Windows)

- `winget upgrade Databricks.DatabricksCLI`
- `databricks -v`

### curl update (all OSes)

1. Delete the existing binary:
   - macOS/Linux: `/usr/local/bin/databricks`
   - Windows: `C:\Windows\databricks.exe`
2. Re-run:
   - `curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh`
3. Verify:
   - `databricks -v`

## Common failures & fixes (agent playbook)

- `Target path <path> already exists`:
  - Delete the existing binary at the install target, then rerun.
- Permission error writing `/usr/local/bin`:
  - Re-run the curl installer with `sudo` (macOS/Linux).
  - If sudo requires an interactive password, use the manual install to `~/.local/bin` instead.
- `sudo: a terminal is required to read the password`:
  - Sudo cannot be used in non-interactive environments (containers, CI/CD).
  - Use the manual install to `~/.local/bin` method instead (see the "Linux alternative" section).
- Windows PATH not updated after WinGet:
  - Restart Command Prompt/PowerShell.
- Multiple `databricks` binaries on PATH:
  - Use `which databricks` (macOS/Linux/WSL) or `where databricks` (Windows) and remove the wrong one.
- Wrong file type (trying to unzip a tar.gz):
  - Linux releases are `.tar.gz` files; use `tar -xzf`, not `unzip`.
  - macOS and Windows releases are `.zip` files; use the appropriate extraction tool.
- `databricks: command not found` after installation to `~/.local/bin`:
  - Add to PATH: `export PATH="$HOME/.local/bin:$PATH"`
  - For persistence, add the export command to `~/.bashrc` or `~/.zshrc`.
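After any install or update, the version floor from the skill prerequisites (>= v0.292.0) can be checked mechanically. A minimal sketch using `sort -V` for version comparison (`min_version_ok` is a hypothetical helper, and the parsing comment assumes `databricks -v` prints something shaped like `Databricks CLI v0.292.0`):

```shell
# Compare a dotted version string against a minimum using GNU sort -V.
# Succeeds (exit 0) when installed >= minimum.
min_version_ok() {
  installed="$1"
  minimum="$2"
  # sort -V orders versions numerically; the minimum must sort first (or equal)
  [ "$(printf '%s\n%s\n' "$minimum" "$installed" | sort -V | head -n1)" = "$minimum" ]
}

if min_version_ok "0.300.1" "0.292.0"; then
  echo "version ok"
fi

# Extracting the number from real CLI output (assumed format "Databricks CLI v0.292.0"):
# databricks -v | sed -n 's/.*v\([0-9.]*\).*/\1/p'
```

A plain string comparison would get this wrong (e.g. `0.300.0` sorts before `0.92.0` lexically), which is why `sort -V` is used.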