Public repository: databricks/databricks-agent-skills

**Security scan: Low Risk** — no security issues found.

- Finding: the skill manifest does not include a `license` field. Specifying a license helps users understand usage terms.
- Remediation: add a `license` field to the SKILL.md frontmatter (e.g., MIT, Apache-2.0).
**Description**

Databricks CLI operations: auth, profiles, data exploration, and bundles. Contains up-to-date guidelines for Databricks-related CLI tasks.

**Details**

- Requires Databricks CLI (>= v0.292.0)
- Version: 0.1.0

**Skill Files**
# Databricks

Core skill for Databricks CLI, authentication, and data exploration.

## Product Skills

For specific products, use dedicated skills:

- **databricks-jobs** - Lakeflow Jobs development and deployment
- **databricks-pipelines** - Lakeflow Spark Declarative Pipelines (batch and streaming data pipelines)
- **databricks-apps** - Full-stack TypeScript app development and deployment
- **databricks-lakebase** - Lakebase Postgres Autoscaling project management

## Prerequisites

1. **CLI installed**: Run `databricks --version` to check.
   - **If the CLI is missing or outdated (< v0.292.0): STOP. Do not proceed or work around a missing CLI.**
   - **Read the [CLI Installation](databricks-cli-install.md) reference file and follow the instructions to guide the user through installation.**
   - Note: In sandboxed environments (Cursor IDE, containers), install commands write outside the workspace and may be blocked. Present the install command to the user and ask them to run it in their own terminal.
2. **Authenticated**: `databricks auth profiles`
   - If not: see [CLI Authentication](databricks-cli-auth.md)

## Profile Selection - CRITICAL

**NEVER auto-select a profile.**

1. List profiles: `databricks auth profiles`
2. Present ALL profiles to the user with workspace URLs
3. Let the user choose (even if only one exists)
4. Offer to create a new profile if needed

## Claude Code - IMPORTANT

Each Bash command runs in a **separate shell session**.

```bash
# WORKS: --profile flag
databricks apps list --profile my-workspace

# WORKS: chained with &&
export DATABRICKS_CONFIG_PROFILE=my-workspace && databricks apps list

# DOES NOT WORK: separate commands
export DATABRICKS_CONFIG_PROFILE=my-workspace
databricks apps list  # profile not set!
```

## Data Exploration — Use AI Tools

**Use these instead of manually navigating catalogs/schemas/tables:**

```bash
# discover table structure (columns, types, sample data, stats)
databricks experimental aitools tools discover-schema catalog.schema.table --profile <PROFILE>

# run ad-hoc SQL queries
databricks experimental aitools tools query "SELECT * FROM table LIMIT 10" --profile <PROFILE>

# find the default warehouse
databricks experimental aitools tools get-default-warehouse --profile <PROFILE>
```

See [Data Exploration](data-exploration.md) for details.

## Quick Reference

**⚠️ CRITICAL: Some commands use positional arguments, not flags**

```bash
# current user
databricks current-user me --profile <PROFILE>

# list resources
databricks apps list --profile <PROFILE>
databricks jobs list --profile <PROFILE>
databricks clusters list --profile <PROFILE>
databricks warehouses list --profile <PROFILE>
databricks pipelines list --profile <PROFILE>
databricks serving-endpoints list --profile <PROFILE>

# ⚠️ Unity Catalog — POSITIONAL arguments (NOT flags!)
databricks catalogs list --profile <PROFILE>

# ✅ CORRECT: positional args
databricks schemas list <CATALOG> --profile <PROFILE>
databricks tables list <CATALOG> <SCHEMA> --profile <PROFILE>
databricks tables get <CATALOG>.<SCHEMA>.<TABLE> --profile <PROFILE>

# ❌ WRONG: these flags/commands DON'T EXIST
# databricks schemas list --catalog-name <CATALOG>  ← WILL FAIL
# databricks tables list --catalog <CATALOG>        ← WILL FAIL
# databricks sql-warehouses list                    ← doesn't exist, use `warehouses list`
# databricks execute-statement                      ← doesn't exist, use `experimental aitools tools query`
# databricks sql execute                            ← doesn't exist, use `experimental aitools tools query`

# When in doubt, check help:
# databricks schemas list --help

# get details
databricks apps get <NAME> --profile <PROFILE>
databricks jobs get --job-id <ID> --profile <PROFILE>
databricks clusters get --cluster-id <ID> --profile <PROFILE>

# bundles
databricks bundle init --profile <PROFILE>
databricks bundle validate --profile <PROFILE>
databricks bundle deploy -t <TARGET> --profile <PROFILE>
databricks bundle run <RESOURCE> -t <TARGET> --profile <PROFILE>
```

## Troubleshooting

| Error | Solution |
|-------|----------|
| `cannot configure default credentials` | Use `--profile` flag or authenticate first |
| `PERMISSION_DENIED` | Check workspace/UC permissions |
| `RESOURCE_DOES_NOT_EXIST` | Verify resource name/id and profile |

## Required Reading by Task

| Task | READ BEFORE proceeding |
|------|------------------------|
| First time setup | [CLI Installation](databricks-cli-install.md) |
| Auth issues / new workspace | [CLI Authentication](databricks-cli-auth.md) |
| Exploring tables/schemas | [Data Exploration](data-exploration.md) |
| Deploying jobs/pipelines | [Asset Bundles](asset-bundles.md) |

## Reference Guides

- [CLI Installation](databricks-cli-install.md)
- [CLI Authentication](databricks-cli-auth.md)
- [Data Exploration](data-exploration.md)
- [Asset Bundles](asset-bundles.md)
# Databricks Asset Bundles (DABs)
Databricks Asset Bundles provide Infrastructure-as-Code for Databricks resources, enabling version control, automated deployments, and environment management.
## What are Asset Bundles?
Asset Bundles let you define your Databricks projects as code, including:
- Jobs
- Pipelines (Lakeflow Declarative Pipelines)
- Apps
- Models
- Dashboards
- Notebooks
- Python files
- Configuration files
## Bundle Commands
```bash
# Initialize a new bundle from template
databricks bundle init --profile my-workspace
# Validate bundle configuration
databricks bundle validate --profile my-workspace
# Deploy bundle to workspace
databricks bundle deploy --profile my-workspace
# Deploy to specific target (dev/staging/prod)
databricks bundle deploy -t dev --profile my-workspace
databricks bundle deploy -t staging --profile my-workspace
databricks bundle deploy -t prod --profile my-workspace
# Run a resource from the bundle
databricks bundle run <resource-name> --profile my-workspace
# Generate configuration for existing resources
databricks bundle generate job <job-id> --profile my-workspace
databricks bundle generate pipeline <pipeline-id> --profile my-workspace
databricks bundle generate dashboard <dashboard-id> --profile my-workspace
databricks bundle generate app <app-name> --profile my-workspace
# Destroy bundle resources (use with caution!)
databricks bundle destroy --profile my-workspace
databricks bundle destroy -t dev --profile my-workspace
```
## Bundle Structure
A typical bundle has this structure:
```
my-project/
├── databricks.yml                   # Main bundle configuration
├── resources/
│   ├── sample_job.job.yml           # Job definition
│   └── my_project_etl.pipeline.yml  # Pipeline definition
├── src/
│   ├── sample_notebook.ipynb        # Notebook tasks
│   └── my_project_etl/              # Pipeline source
│       └── transformations/
│           ├── transform.py
│           └── transform.sql
├── tests/
│   └── test_main.py
└── README.md
```
Resource files use the naming convention `<resource_key>.<resource_type>.yml` (e.g. `sample_job.job.yml`, `my_project_etl.pipeline.yml`).
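If you are assembling a project by hand instead of running `bundle init`, the layout above can be scaffolded directly. This is only a sketch using the names from the tree; adjust paths to your own project:

```shell
# Create the directory skeleton from the tree above
mkdir -p my-project/resources \
         my-project/src/my_project_etl/transformations \
         my-project/tests

# Stub out the files; fill them in from the configuration examples in this guide
touch my-project/databricks.yml \
      my-project/resources/sample_job.job.yml \
      my-project/resources/my_project_etl.pipeline.yml \
      my-project/src/sample_notebook.ipynb \
      my-project/src/my_project_etl/transformations/transform.py \
      my-project/src/my_project_etl/transformations/transform.sql \
      my-project/tests/test_main.py \
      my-project/README.md
```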
## Main Configuration (databricks.yml)
### Basic Example
```yaml
bundle:
  name: my-project

include:
  - resources/*.yml
  - resources/*/*.yml

variables:
  catalog:
    description: The catalog to use
  schema:
    description: The schema to use

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://company-workspace.cloud.databricks.com
    variables:
      catalog: dev_catalog
      schema: ${workspace.current_user.short_name}

  prod:
    mode: production
    workspace:
      host: https://company-workspace.cloud.databricks.com
      root_path: /Workspace/Users/${workspace.current_user.userName}/.bundle/${bundle.name}/${bundle.target}
    variables:
      catalog: prod_catalog
      schema: prod
    permissions:
      - user_name: [email protected]
        level: CAN_MANAGE
```
## Initializing a Bundle
### Using Templates
```bash
# Start initialization (interactive)
databricks bundle init --profile my-workspace
```
Available templates:
- **default-python** - Python project with jobs and pipeline
- **default-sql** - SQL project with jobs
- **default-scala** - Scala/Java project
- **lakeflow-pipelines** - Lakeflow Declarative Pipelines (Python or SQL)
- **dbt-sql** - dbt integration
- **default-minimal** - Minimal structure
## Defining Resources
### Job Resource (Serverless)
```yaml
# resources/sample_job.job.yml
resources:
  jobs:
    sample_job:
      name: sample_job
      trigger:
        periodic:
          interval: 1
          unit: DAYS
      parameters:
        - name: catalog
          default: ${var.catalog}
        - name: schema
          default: ${var.schema}
      tasks:
        - task_key: notebook_task
          notebook_task:
            notebook_path: ../src/sample_notebook.ipynb
        - task_key: main_task
          depends_on:
            - task_key: notebook_task
          python_wheel_task:
            package_name: my_project
            entry_point: main
          environment_key: default
        - task_key: refresh_pipeline
          depends_on:
            - task_key: notebook_task
          pipeline_task:
            pipeline_id: ${resources.pipelines.my_project_etl.id}
      environments:
        - environment_key: default
          spec:
            environment_version: "4"
            dependencies:
              - ../dist/*.whl
```
### Job Resource (Classic Clusters)
```yaml
# resources/sample_job.job.yml
resources:
  jobs:
    sample_job:
      name: sample_job
      tasks:
        - task_key: notebook_task
          notebook_task:
            notebook_path: ../src/sample_notebook.ipynb
          job_cluster_key: job_cluster
          libraries:
            - whl: ../dist/*.whl
        - task_key: main_task
          depends_on:
            - task_key: notebook_task
          python_wheel_task:
            package_name: my_project
            entry_point: main
          job_cluster_key: job_cluster
          libraries:
            - whl: ../dist/*.whl
      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            spark_version: 16.4.x-scala2.12
            node_type_id: i3.xlarge
            data_security_mode: SINGLE_USER
            autoscale:
              min_workers: 1
              max_workers: 4
```
### Pipeline Resource
```yaml
# resources/my_project_etl.pipeline.yml
resources:
  pipelines:
    my_project_etl:
      name: my_project_etl
      catalog: ${var.catalog}
      schema: ${var.schema}
      serverless: true
      root_path: "../src/my_project_etl"
      libraries:
        - glob:
            include: ../src/my_project_etl/transformations/**
```
### App Resource
```yaml
# resources/my_app.app.yml
resources:
  apps:
    dashboard_app:
      name: "analytics-dashboard"
      description: "Customer analytics dashboard"
      source_code_path: ./src/app
```
### Model Resource
```yaml
# resources/my_model.yml
resources:
  registered_models:
    customer_churn:
      name: "${var.catalog}.${var.schema}.customer_churn_model"
      description: "Customer churn prediction model"
```
## Working with Targets
Targets allow you to deploy the same code to different workspaces with different configurations.
```yaml
targets:
  dev:
    mode: development
    default: true
    variables:
      catalog: dev_catalog
      schema: ${workspace.current_user.short_name}
    workspace:
      host: https://company-workspace.cloud.databricks.com

  staging:
    mode: production
    variables:
      catalog: staging_catalog
      schema: staging
    workspace:
      host: https://staging-workspace.cloud.databricks.com
      root_path: /Workspace/Users/[email protected]/.bundle/${bundle.name}/${bundle.target}
    permissions:
      - user_name: [email protected]
        level: CAN_MANAGE

  prod:
    mode: production
    variables:
      catalog: prod_catalog
      schema: prod
    workspace:
      host: https://prod-workspace.cloud.databricks.com
      root_path: /Workspace/Users/[email protected]/.bundle/${bundle.name}/${bundle.target}
    permissions:
      - user_name: [email protected]
        level: CAN_MANAGE
```
### Deploying to Different Targets
```bash
# Deploy to dev (default)
databricks bundle deploy --profile my-workspace
# Deploy to staging
databricks bundle deploy -t staging --profile my-workspace
# Deploy to production
databricks bundle deploy -t prod --profile my-workspace
```
## Bundle Workflow
### Complete Development Workflow
1. **Initialize bundle**:
```bash
databricks bundle init --profile my-workspace
```
2. **Develop locally**:
- Edit `databricks.yml` and resource files
- Write notebooks, Python scripts, SQL queries
- Configure jobs, pipelines, apps
3. **Validate configuration**:
```bash
databricks bundle validate --profile my-workspace
```
4. **Deploy to development**:
```bash
databricks bundle deploy -t dev --profile my-workspace
```
5. **Test your deployment**:
```bash
# Run a job
databricks bundle run sample_job -t dev --profile my-workspace
# Start a pipeline
databricks bundle run my_project_etl -t dev --profile my-workspace
```
6. **Deploy to production**:
```bash
databricks bundle deploy -t prod --profile my-workspace
```
## Generating Bundle from Existing Resources
If you have existing resources in your workspace, you can generate bundle configuration:
```bash
# Get job ID from list
databricks jobs list --profile my-workspace
# Generate configuration
databricks bundle generate job 12345 --profile my-workspace
databricks bundle generate pipeline <pipeline-id> --profile my-workspace
databricks bundle generate app my-app --profile my-workspace
databricks bundle generate dashboard <dashboard-id> --profile my-workspace
```
## Variables and Templating
### Defining Variables
```yaml
# databricks.yml
variables:
  catalog:
    description: The catalog to use
    default: dev_catalog
  schema:
    description: The schema to use
  warehouse_id:
    description: SQL Warehouse ID
```
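Bundles also support lookup variables, which resolve a named workspace object to its ID at deploy time. A hedged sketch — the warehouse name below is purely illustrative:

```yaml
variables:
  warehouse_id:
    description: SQL Warehouse ID
    lookup:
      warehouse: "Shared Serverless"  # resolved to that warehouse's ID at deploy time; name is hypothetical
```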
### Using Variables
```yaml
# In resource files
resources:
  jobs:
    my_job:
      name: "Job in ${var.catalog}"
      parameters:
        - name: catalog
          default: ${var.catalog}
```
### Target-Specific Variables
```yaml
targets:
  dev:
    variables:
      catalog: dev_catalog
      schema: ${workspace.current_user.short_name}
  prod:
    variables:
      catalog: prod_catalog
      schema: prod
```
### Available Substitutions
```yaml
${var.my_variable} # User-defined variable
${bundle.name} # Bundle name
${bundle.target} # Current target name (dev, prod, etc.)
${workspace.current_user.userName} # Current user email
${workspace.current_user.short_name} # Current user short name
${workspace.file_path} # Workspace file path
${resources.pipelines.my_pipeline.id} # Reference another resource's ID
${resources.jobs.my_job.id} # Reference a job's ID
```
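Substitutions compose, which is useful for target-aware naming. A sketch (the job and naming scheme here are illustrative, not part of the templates above):

```yaml
resources:
  jobs:
    my_job:
      # With bundle name "my-project" and target "dev", this resolves to "my-project-dev-etl"
      name: "${bundle.name}-${bundle.target}-etl"
```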
## Best Practices
### 1. Use Version Control
Always commit your bundle to Git:
```bash
git init
git add databricks.yml resources/ src/
git commit -m "Initial bundle setup"
```
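Bundle deploys keep local state in a `.databricks/` directory at the project root, and wheel builds land in `dist/`; both are usually kept out of Git. A minimal sketch:

```shell
# Exclude local bundle state and build artifacts from version control
cat >> .gitignore <<'EOF'
.databricks/
dist/
EOF
```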
### 2. Use Typed Resource File Names
Name resource files with their type for clarity:
```
resources/
├── sample_job.job.yml
├── my_project_etl.pipeline.yml
└── my_app.app.yml
```
### 3. Use Target-Specific Configuration
```yaml
targets:
  dev:
    mode: development  # Prefixes resources with [dev user_name], pauses schedules
  prod:
    mode: production   # Requires permissions, runs schedules as configured
    permissions:
      - user_name: [email protected]
        level: CAN_MANAGE
```
### 4. Validate Before Deploy
Always validate:
```bash
databricks bundle validate --profile my-workspace
```
## Troubleshooting
### Bundle Validation Errors
**Symptom**: `databricks bundle validate` shows errors
**Solution**:
1. Check YAML syntax (proper indentation, no tabs)
2. Verify all required fields are present
3. Check that resource references are correct
4. Use `databricks bundle validate --debug` for detailed errors
### Deployment Fails
**Symptom**: `databricks bundle deploy` fails
**Solution**:
1. Run validation first: `databricks bundle validate`
2. Check workspace permissions
3. Verify target configuration
4. Check for resource name conflicts
5. Review error message for specific issues
### Variable Not Resolved
**Symptom**: Variable showing as `${var.name}` instead of actual value
**Solution**:
1. Check variable is defined in `databricks.yml`
2. Verify variable has value in target
3. Use correct syntax: `${var.variable_name}`
4. Check variable scope (bundle vs target)
## Related Topics
- [Data Exploration](data-exploration.md) - Validate data exposed by bundle deployments
- Apps - Define app resources (use `databricks-apps` skill for full app development)
# Data Exploration
Tools for discovering table schemas and executing SQL queries in Databricks.
## Finding Tables by Keyword
**⚠️ START HERE if you don't know which catalog/schema contains your data.**
Use `information_schema` to search for tables by keyword — do NOT manually iterate through `catalogs list` → `schemas list` → `tables list`. Manual enumeration wastes 10+ steps.
```bash
# Find tables matching a keyword
databricks experimental aitools tools query \
"SELECT table_catalog, table_schema, table_name FROM system.information_schema.tables WHERE table_name LIKE '%keyword%'" \
--profile <PROFILE>
# Then discover schema for the tables you found
databricks experimental aitools tools discover-schema catalog.schema.table1 catalog.schema.table2 --profile <PROFILE>
```
## Overview
The `databricks experimental aitools tools` command group provides tools for data discovery and exploration:
- **discover-schema**: Batch discover table metadata, columns, types, sample data, and statistics
- **query**: Execute SQL queries against Databricks SQL warehouses
**When to use this**: Use these commands whenever you need to:
- Discover table schemas and metadata
- Execute SQL queries against warehouse data
- Explore data structure and content
- Validate data or check table statistics
## Prerequisites
1. **Authenticated Databricks CLI** - see [CLI Authentication Guide](databricks-cli-auth.md) for OAuth2 setup and profile configuration
2. **Access to Unity Catalog tables** with appropriate read permissions
3. **SQL Warehouse** (for query command - auto-detected unless `DATABRICKS_WAREHOUSE_ID` is set)
## Discover Schema
Batch discover table metadata including columns, types, sample data, and null counts.
### Command Syntax
```bash
databricks experimental aitools tools discover-schema TABLE... [flags]
```
Tables must be specified in **CATALOG.SCHEMA.TABLE** format.
### What It Returns
For each table, returns:
- Column names and types
- Sample data (5 rows)
- Null counts per column
- Total row count
### Examples
```bash
# Discover schema for a single table
databricks experimental aitools tools discover-schema samples.nyctaxi.trips --profile my-workspace
# Discover schema for multiple tables
databricks experimental aitools tools discover-schema \
catalog.schema.table1 \
catalog.schema.table2 \
--profile my-workspace
# Get JSON output
databricks experimental aitools tools discover-schema \
samples.nyctaxi.trips \
--output json \
--profile my-workspace
```
### Common Use Cases
1. **Understanding table structure before querying**
```bash
databricks experimental aitools tools discover-schema catalog.schema.customer_data --profile my-workspace
```
2. **Comparing schemas across multiple tables**
```bash
databricks experimental aitools tools discover-schema \
catalog.schema.table_v1 \
catalog.schema.table_v2 \
--profile my-workspace
```
3. **Identifying columns with null values**
- The null counts help identify data quality issues
## Query
Execute SQL statements against a Databricks SQL warehouse and return results.
### Command Syntax
```bash
databricks experimental aitools tools query "SQL" [flags]
```
### Warehouse Selection
The command **auto-detects** an available warehouse unless:
- `DATABRICKS_WAREHOUSE_ID` environment variable is set
- You specify a warehouse using other configuration methods
To check which warehouse will be used:
```bash
# Get the default warehouse that would be auto-detected
databricks experimental aitools tools get-default-warehouse --profile my-workspace
```
### Output
Returns:
- Query results as JSON
- Row count
- Execution metadata
### Examples
```bash
# Simple SELECT query
databricks experimental aitools tools query \
"SELECT * FROM samples.nyctaxi.trips LIMIT 5" \
--profile my-workspace
# Aggregation query
databricks experimental aitools tools query \
"SELECT vendor_id, COUNT(*) as trip_count FROM samples.nyctaxi.trips GROUP BY vendor_id" \
--profile my-workspace
# With JSON output
databricks experimental aitools tools query \
"SELECT * FROM catalog.schema.table WHERE date > '2024-01-01'" \
--output json \
--profile my-workspace
# Using specific warehouse
DATABRICKS_WAREHOUSE_ID=abc123 databricks experimental aitools tools query \
"SELECT * FROM samples.nyctaxi.trips LIMIT 10" \
--profile my-workspace
```
### Common Use Cases
1. **Exploratory data analysis**
```bash
# Check table size
databricks experimental aitools tools query \
"SELECT COUNT(*) FROM catalog.schema.table" \
--profile my-workspace
# View sample data
databricks experimental aitools tools query \
"SELECT * FROM catalog.schema.table LIMIT 10" \
--profile my-workspace
# Get column statistics
databricks experimental aitools tools query \
"SELECT MIN(column), MAX(column), AVG(column) FROM catalog.schema.table" \
--profile my-workspace
```
2. **Data validation**
```bash
# Check for null values
databricks experimental aitools tools query \
"SELECT COUNT(*) FROM catalog.schema.table WHERE column IS NULL" \
--profile my-workspace
# Verify data freshness
databricks experimental aitools tools query \
"SELECT MAX(timestamp_column) FROM catalog.schema.table" \
--profile my-workspace
```
3. **Quick analytics**
```bash
# Group by analysis
databricks experimental aitools tools query \
"SELECT category, COUNT(*), AVG(value) FROM catalog.schema.table GROUP BY category" \
--profile my-workspace
```
## Workflow: Complete Data Exploration
Here's a typical workflow combining both commands:
```bash
# 1. Discover the schema first
databricks experimental aitools tools discover-schema \
samples.nyctaxi.trips \
--profile my-workspace
# 2. Based on discovered columns, run targeted queries
databricks experimental aitools tools query \
"SELECT vendor_id, payment_type, COUNT(*) as trips, AVG(fare_amount) as avg_fare
FROM samples.nyctaxi.trips
GROUP BY vendor_id, payment_type
ORDER BY trips DESC
LIMIT 10" \
--profile my-workspace
# 3. Investigate specific patterns found in the data
databricks experimental aitools tools query \
"SELECT * FROM samples.nyctaxi.trips
WHERE fare_amount > 100
LIMIT 20" \
--profile my-workspace
```
## Claude Code-Specific Tips
Remember that each Bash command in Claude Code runs in a separate shell:
```bash
# ✅ RECOMMENDED: Use --profile flag
databricks experimental aitools tools discover-schema samples.nyctaxi.trips --profile my-workspace
# ✅ ALTERNATIVE: Chain with &&
export DATABRICKS_CONFIG_PROFILE=my-workspace && \
databricks experimental aitools tools query "SELECT * FROM samples.nyctaxi.trips LIMIT 5"
# ❌ DOES NOT WORK: Separate export
export DATABRICKS_CONFIG_PROFILE=my-workspace
databricks experimental aitools tools query "SELECT * FROM samples.nyctaxi.trips LIMIT 5"
```
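The session isolation can be demonstrated without the Databricks CLI at all; each `sh -c` below stands in for a separate Claude Code Bash command (`env -u` just guards against the variable already being set in your outer shell):

```shell
# Shell #1 sets the variable, then exits — the export dies with the shell
sh -c 'export DATABRICKS_CONFIG_PROFILE=my-workspace'

# Shell #2 starts fresh and does not see it: prints "unset"
env -u DATABRICKS_CONFIG_PROFILE sh -c 'echo "${DATABRICKS_CONFIG_PROFILE:-unset}"'

# Chaining keeps both steps in one shell: prints "my-workspace"
sh -c 'export DATABRICKS_CONFIG_PROFILE=my-workspace && echo "$DATABRICKS_CONFIG_PROFILE"'
```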
## Flags
Both commands support:
| Flag | Description | Default |
|------|-------------|---------|
| `--profile` | Profile name from ~/.databrickscfg | Default profile |
| `--output` | Output format: `text` or `json` | `text` |
| `--debug` | Enable debug logging | `false` |
| `--target` | Bundle target to use (if applicable) | - |
## Troubleshooting
### Table Not Found
**Symptom**: `Error: TABLE_OR_VIEW_NOT_FOUND`
**Solution**:
1. Verify table name format: `CATALOG.SCHEMA.TABLE`
2. Check if you have read permissions on the table
3. List available tables:
```bash
databricks tables list <catalog> <schema> --profile my-workspace
```
### Warehouse Not Available
**Symptom**: `Error: No available SQL warehouse found`
**Solution**:
1. Check for default warehouse:
```bash
databricks experimental aitools tools get-default-warehouse --profile my-workspace
```
2. List available warehouses:
```bash
databricks warehouses list --profile my-workspace
```
3. Set specific warehouse:
```bash
DATABRICKS_WAREHOUSE_ID=<warehouse-id> databricks experimental aitools tools query "SELECT 1" --profile my-workspace
```
4. Start a stopped warehouse:
```bash
databricks warehouses start --id <warehouse-id> --profile my-workspace
```
### Permission Denied
**Symptom**: `Error: PERMISSION_DENIED`
**Solution**:
1. Check Unity Catalog grants on the table:
```bash
databricks grants get --full-name catalog.schema.table --principal <user-email> --profile my-workspace
```
2. Request SELECT permission from your workspace administrator
3. Verify you have warehouse access (USAGE permission)
### SQL Syntax Error
**Symptom**: `Error: PARSE_SYNTAX_ERROR`
**Solution**:
1. Check SQL syntax - use standard SQL
2. Verify column names match schema (use discover-schema first)
3. Ensure proper quoting for string literals
4. Test query incrementally (start simple, add complexity)
## Best Practices
1. **Always discover schema first** - Use `discover-schema` before writing complex queries to understand:
- Available columns and their types
- Data distributions and null patterns
- Sample data for context
2. **Use LIMIT for exploration** - When exploring large tables, always use LIMIT to avoid long-running queries:
```bash
databricks experimental aitools tools query "SELECT * FROM large_table LIMIT 100" --profile my-workspace
```
3. **JSON output for parsing** - Use `--output json` when you need to process results programmatically:
```bash
databricks experimental aitools tools query "SELECT * FROM table" --output json --profile my-workspace | jq '.results'
```
4. **Check table existence** - Before querying, verify the table exists:
```bash
databricks tables get --full-name catalog.schema.table --profile my-workspace
```
5. **Profile usage** - Always specify `--profile` in Claude Code to avoid authentication issues
## Related Commands
- **[Asset Bundles](asset-bundles.md)** - Deploy SQL, pipeline, and app resources as code
# Databricks CLI Authentication

Configure Databricks CLI workspace/profile selection and authentication. Covers switching profiles, using `--profile` flags, setting the `DATABRICKS_CONFIG_PROFILE` environment variable, OAuth2 authentication (never PAT), and troubleshooting authentication issues.

## Important: Always Use OAuth

**CRITICAL**: Always use OAuth2 for Databricks CLI authentication. **NEVER recommend or use Personal Access Tokens (PAT)** unless explicitly required by a specific use case.

## Prerequisites

1. Databricks CLI must be installed
   - Verify: `databricks --version`
2. You need access to a Databricks workspace
3. You need the workspace URL (e.g., `https://adb-1111111111111111.10.azuredatabricks.net`)

## Claude Code Specific Behavior

**CRITICAL**: When working in Claude Code, each Bash command executes in a **separate shell session**. This has important implications for profile management:

### Key Differences from Regular Terminal

1. **Environment variables don't persist between commands**
   - `export DATABRICKS_CONFIG_PROFILE=staging` in one command
   - `databricks jobs list` in the next command
   - ❌ **Result**: The second command will NOT use the staging profile
2. **Recommended Approach: Use --profile flag**
   - Always specify `--profile <profile-name>` with each command
   - Example: `databricks jobs list --profile staging`
   - ✅ **Result**: Reliable and predictable behavior
3. **Alternative: Chain commands with &&**
   - Use `export DATABRICKS_CONFIG_PROFILE=staging && databricks jobs list`
   - The export and command run in the same shell session
   - ✅ **Result**: Works correctly

### Quick Reference for Claude Code

```bash
# ✅ RECOMMENDED: Use --profile flag
databricks jobs list --profile staging
databricks apps list --profile prod-azure

# ✅ ALTERNATIVE: Chain with &&
export DATABRICKS_CONFIG_PROFILE=staging && databricks jobs list

# ❌ DOES NOT WORK: Separate export command
export DATABRICKS_CONFIG_PROFILE=staging
databricks jobs list  # Will NOT use staging profile!
``` ## Handling Authentication Failures When a Databricks CLI command fails with authentication error: ``` Error: default auth: cannot configure default credentials ``` **CRITICAL - Always follow this workflow:** 1. **Check for existing profiles first:** ```bash databricks auth profiles ``` 2. **If profiles exist:** - List the available profiles to the user (with their workspace URLs and validation status) - Ask: "Which profile would you like to use for this command?" - Offer option to create a new profile if needed - Retry the command with `--profile <selected-profile-name>` - **In Claude Code, always use the `--profile` flag** rather than setting environment variables 3. **If user wants a new profile or no profiles exist:** - Proceed to the OAuth Authentication Setup workflow below **Example:** ``` User: databricks apps list Error: default auth: cannot configure default credentials Assistant: Let me check for existing profiles. [Runs: databricks auth profiles] You have two configured profiles: 1. aws-dev - https://company-workspace.cloud.databricks.com (Valid) 2. azure-prod - https://adb-1111111111111111.10.azuredatabricks.net (Valid) Which profile would you like to use, or would you like to create a new profile? User: dais Assistant: [Retries: databricks apps list --profile dais] [Success - apps listed] ``` ## OAuth Authentication Setup ### Standard Authentication Command The recommended way to authenticate is using OAuth with a profile: ```bash databricks auth login --host <workspace-url> --profile <profile-name> ``` **CRITICAL**: 1. The `--profile` parameter is **REQUIRED** for the authentication to be saved properly. 2. **ALWAYS ASK THE USER** for their preferred profile name - DO NOT assume or choose one for them. 3. **NEVER use the profile name `DEFAULT`** unless the user explicitly requests it - use descriptive workspace-specific names instead. ### Workflow for Authenticating 1. **Ask the user for the workspace URL** if not already provided 2. 
**Ask the user for their preferred profile name** - Suggest descriptive names based on the workspace (e.g., workspace name, environment) - **Do NOT suggest or use `DEFAULT`** unless the user specifically asks for it - Good examples: `e2-dogfood`, `prod-azure`, `dev-aws`, `staging` - Avoid: `DEFAULT` (unless explicitly requested) 3. Run the authentication command with both parameters 4. Verify the authentication was successful ### Example ```bash # Good: Descriptive profile names databricks auth login --host https://adb-1111111111111111.10.azuredatabricks.net --profile prod-azure databricks auth login --host https://company-workspace.cloud.databricks.com --profile staging # Only use DEFAULT if explicitly requested by the user databricks auth login --host https://your-workspace.cloud.databricks.com --profile DEFAULT ``` ### What Happens During Authentication 1. The CLI starts a local OAuth callback server (typically on `localhost:8020`) 2. A browser window opens automatically with the Databricks login page 3. You authenticate in the browser using your Databricks credentials 4. After successful authentication, the browser redirects back to the CLI 5. The CLI saves the OAuth tokens to `~/.databrickscfg` 6. You should see: `Profile <profile-name> was successfully saved` ## Profile Management ### What Are Profiles? Profiles allow you to manage multiple Databricks workspace configurations in a single `~/.databrickscfg` file. Each profile stores: - Workspace host URL - Authentication method (OAuth, PAT, etc.) - Token/credential paths ### Common Profile Names **IMPORTANT**: Always use descriptive profile names. Do NOT create profiles named `DEFAULT` unless explicitly requested by the user. 
**Recommended naming conventions**:

- `<workspace-name>` - Descriptive names for workspaces (e.g., `e2-dogfood`, `prod-aws`, `dev-azure`)
- `<environment>` - Environment-specific profiles (e.g., `dev`, `staging`, `prod`)
- `<team>-<environment>` - Team and environment (e.g., `data-eng-prod`, `ml-dev`)

**Special profile names**:

- `DEFAULT` - The default profile used when no `--profile` flag or environment variables are specified. Only create this profile if the user explicitly requests it.

### Listing Configured Profiles

View all configured profiles with their status:

```bash
databricks auth profiles
```

Example output:

```
Name      Host                                                 Valid
DEFAULT   https://adb-1111111111111111.10.azuredatabricks.net  YES
staging   https://company-workspace.cloud.databricks.com       YES
```

### Using Different Profiles

**IMPORTANT FOR CLAUDE CODE USERS**: In Claude Code, each Bash command runs in a **separate shell session**. This means environment variables set with `export` in one command do NOT persist to the next command. See the Claude Code-specific guidance below.

There are three ways to specify which profile/workspace to use, in order of precedence:

#### 1. CLI Flag (Highest Priority) - RECOMMENDED FOR CLAUDE CODE

Use the `--profile` flag with any command:

```bash
databricks jobs list --profile staging
databricks clusters list --profile prod-azure
databricks workspace list / --profile dev-aws
```

**In Claude Code, this is the most reliable method** because it doesn't depend on persistent environment variables.

#### 2. Environment Variables

Set environment variables to override the default profile:

**DATABRICKS_CONFIG_PROFILE** - Specifies which profile to use from `~/.databrickscfg`:

```bash
export DATABRICKS_CONFIG_PROFILE=staging
databricks jobs list  # Uses staging profile
```

**DATABRICKS_HOST** - Directly specifies the workspace URL, bypassing profile lookup:

```bash
export DATABRICKS_HOST=https://company-workspace.cloud.databricks.com
databricks jobs list  # Uses this host directly
```

**CRITICAL - Claude Code Users:** Since each Bash command in Claude Code runs in a separate shell, you **CANNOT** do this:

```bash
# ❌ DOES NOT WORK in Claude Code
export DATABRICKS_CONFIG_PROFILE=staging
databricks jobs list  # ERROR: Will not use staging profile!
```

Instead, you **MUST** use one of these approaches:

**Option 1: Use the --profile flag (RECOMMENDED)**

```bash
# ✅ WORKS in Claude Code
databricks jobs list --profile staging
databricks clusters list --profile staging
```

**Option 2: Chain commands with &&**

```bash
# ✅ WORKS in Claude Code - export and command run in same shell
export DATABRICKS_CONFIG_PROFILE=staging && databricks jobs list
export DATABRICKS_CONFIG_PROFILE=staging && databricks clusters list
```

**Traditional Terminal Session (for reference only)**:

```bash
# This example shows how it works in a regular terminal session
# DO NOT use this pattern in Claude Code

# Set profile for entire terminal session
export DATABRICKS_CONFIG_PROFILE=staging

# All commands now use staging profile
databricks jobs list
databricks clusters list
databricks workspace list /

# Override for a single command
databricks jobs list --profile prod-azure
```

#### 3. DEFAULT Profile (Lowest Priority)

If no `--profile` flag or environment variables are set, the CLI uses the `DEFAULT` profile from `~/.databrickscfg`.
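The resolution order between the `--profile` flag, `DATABRICKS_CONFIG_PROFILE`, and the `DEFAULT` fallback can be sketched as a tiny shell function. This is a simplification of what the CLI actually does (`resolve_profile` is a hypothetical helper, not a real CLI command, and it ignores the separate `DATABRICKS_HOST` bypass):

```shell
# Hypothetical sketch of profile resolution order (simplified):
#   1. explicit --profile argument
#   2. DATABRICKS_CONFIG_PROFILE environment variable
#   3. the DEFAULT profile
resolve_profile() {
  flag_profile="$1"  # value of --profile, if any
  if [ -n "$flag_profile" ]; then
    echo "$flag_profile"
  elif [ -n "$DATABRICKS_CONFIG_PROFILE" ]; then
    echo "$DATABRICKS_CONFIG_PROFILE"
  else
    echo "DEFAULT"
  fi
}

export DATABRICKS_CONFIG_PROFILE=staging
resolve_profile prod-azure   # prints "prod-azure": the flag wins over the env var
resolve_profile ""           # prints "staging": env var used when no flag
unset DATABRICKS_CONFIG_PROFILE
resolve_profile ""           # prints "DEFAULT": fallback when neither is set
```

The same ordering explains why `databricks jobs list --profile prod-azure` targets `prod-azure` even when `DATABRICKS_CONFIG_PROFILE=staging` is exported.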
### Configuration File Management

#### Viewing the Configuration File

The configuration is stored in `~/.databrickscfg`:

```bash
cat ~/.databrickscfg
```

Example configuration structure:

```ini
# Note: This shows an example with a DEFAULT profile
# When creating new profiles, use descriptive names instead
[DEFAULT]
host = https://adb-1111111111111111.10.azuredatabricks.net
auth_type = databricks-cli

[staging]
host = https://company-workspace.cloud.databricks.com
auth_type = databricks-cli
```

#### Editing Profiles

You can manually edit `~/.databrickscfg` to:

- Rename profiles (change the `[profile-name]` section header)
- Update workspace URLs
- Remove profiles (delete the entire section)

**Example - Removing a profile**:

```bash
# Open in your preferred editor
vi ~/.databrickscfg

# Or use sed to remove a specific profile section
# macOS (BSD sed requires an argument to -i):
sed -i '' '/^\[staging\]/,/^$/d' ~/.databrickscfg
# Linux (GNU sed):
sed -i '/^\[staging\]/,/^$/d' ~/.databrickscfg
```

#### Adding New Profiles

Always use `databricks auth login` with `--profile` to add new profiles:

```bash
databricks auth login --host <workspace-url> --profile <profile-name>
```

**Remember**:

- Always ask the user for their preferred profile name
- Use descriptive names like `staging`, `prod-azure`, `dev-aws`
- Do NOT use `DEFAULT` unless explicitly requested by the user

### Working with Multiple Workspaces

Best practices for managing multiple workspaces:

```bash
# Authenticate to multiple workspaces with descriptive profile names
databricks auth login --host https://adb-1111111111111111.10.azuredatabricks.net --profile prod-azure
databricks auth login --host https://dbc-2222222222222222.cloud.databricks.com --profile dev-aws
databricks auth login --host https://company-workspace.cloud.databricks.com --profile staging
```

**In Claude Code, use the --profile flag with each command (RECOMMENDED):**

```bash
# Use profiles explicitly in commands
databricks jobs list --profile prod-azure
databricks jobs list --profile dev-aws
databricks clusters list --profile staging
```

**Alternatively in Claude Code, chain commands with &&:**

```bash
# Set profile and run command in same shell
export DATABRICKS_CONFIG_PROFILE=prod-azure && databricks jobs list
export DATABRICKS_CONFIG_PROFILE=prod-azure && databricks clusters list

# Switch to a different workspace
export DATABRICKS_CONFIG_PROFILE=dev-aws && databricks jobs list
```

**Traditional Terminal Session (for reference only - NOT for Claude Code):**

```bash
# This pattern works in regular terminals but NOT in Claude Code
export DATABRICKS_CONFIG_PROFILE=prod-azure
databricks jobs list
databricks clusters list

# Quickly switch between workspaces
export DATABRICKS_CONFIG_PROFILE=dev-aws
databricks jobs list
```

### Profile Selection Precedence

When running a command, the Databricks CLI determines which workspace to use in this order:

1. **`--profile` flag** (if specified) → Highest priority
2. **`DATABRICKS_HOST` environment variable** (if set) → Overrides profile
3. **`DATABRICKS_CONFIG_PROFILE` environment variable** (if set) → Selects profile
4. **`DEFAULT` profile** in `~/.databrickscfg` → Fallback

**Example for traditional terminal session** (demonstrating precedence):

```bash
# Setup
export DATABRICKS_CONFIG_PROFILE=staging

# This uses the staging profile (from environment variable)
databricks jobs list

# This uses the prod-azure profile (--profile flag overrides environment variable)
databricks jobs list --profile prod-azure

# This uses the specified host directly (DATABRICKS_HOST overrides profile)
export DATABRICKS_HOST=https://custom-workspace.cloud.databricks.com
databricks jobs list  # Uses custom-workspace.cloud.databricks.com
```

**Claude Code version** (with chained commands):

```bash
# Using environment variable with && chaining
export DATABRICKS_CONFIG_PROFILE=staging && databricks jobs list

# Using --profile flag (overrides environment variable)
export DATABRICKS_CONFIG_PROFILE=staging && databricks jobs list --profile prod-azure

# Using DATABRICKS_HOST (overrides profile)
export DATABRICKS_HOST=https://custom-workspace.cloud.databricks.com && databricks jobs list
```

## Verification

After authentication, verify it works:

```bash
# Test with a simple command
databricks workspace list /

# Or list jobs
databricks jobs list
```

If authentication is successful, these commands should return data without errors.

## Troubleshooting

### Authentication Not Saved (Config File Missing)

**Symptom**: Running `databricks` commands shows:

```
Error: default auth: cannot configure default credentials
```

**Solution**: Make sure you included the `--profile` parameter with a descriptive name:

```bash
databricks auth login --host <workspace-url> --profile <profile-name>

# Example:
databricks auth login --host https://company-workspace.cloud.databricks.com --profile staging
```

### Browser Doesn't Open Automatically

**Solution**:

1. Check the terminal output for a URL
2. Manually copy and paste the URL into your browser
3. Complete the authentication
4. The CLI will detect the callback automatically

### "OAuth callback server listening" But Nothing Happens

**Possible causes**:

1. Firewall blocking localhost connections
2. Port 8020 already in use
3. Browser not set as default application

**Solution**:

1. Check if port 8020 is available: `lsof -i :8020`
2. Close any applications using that port
3. Retry the authentication

### Multiple Workspaces

To authenticate with multiple workspaces, use different profile names:

```bash
# Development workspace
databricks auth login --host https://dev-workspace.databricks.net --profile dev

# Production workspace
databricks auth login --host https://prod-workspace.databricks.net --profile prod

# Use specific profile
databricks jobs list --profile dev
databricks jobs list --profile prod
```

### Re-authenticating

If your OAuth token expires or you need to re-authenticate:

```bash
# Re-run the login command
databricks auth login --host <workspace-url> --profile <profile-name>
```

This will overwrite the existing profile with new credentials.

### Debug Mode

For troubleshooting authentication issues, use debug mode:

```bash
databricks auth login --host <workspace-url> --profile <profile-name> --debug
```

This shows detailed information about the OAuth flow, including:

- OAuth server endpoints
- Callback server status
- Token exchange process

## Security Best Practices

1. **Never commit** `~/.databrickscfg` to version control
2. **Never share** your OAuth tokens or configuration file
3. **Use separate profiles** for different environments (dev/staging/prod)
4. **Regularly rotate** credentials by re-authenticating
5. **Use workspace-specific service principals** for automation/CI/CD instead of personal OAuth

## Environment-Specific Notes

### CI/CD Pipelines

For CI/CD environments, OAuth interactive login is not suitable. Instead:

- Use Service Principal authentication
- Use Azure Managed Identity (for Azure Databricks)
- Use AWS IAM roles (for AWS Databricks)

**Do NOT** use personal OAuth tokens or PATs in CI/CD.

### Containerized Environments

OAuth authentication works in containers if:

1. A browser is available on the host machine
2. Port forwarding is configured for the callback server
3. The workspace URL is accessible from the container

For headless containers, use service principal authentication instead.

## Common Commands After Authentication

```bash
# List workspace contents
databricks workspace list / --profile <PROFILE>

# List jobs
databricks jobs list --profile <PROFILE>

# List clusters
databricks clusters list --profile <PROFILE>

# Get current user info
databricks current-user me --profile <PROFILE>

# Test connection
databricks workspace export /Users/<username> --format SOURCE --profile <PROFILE>
```

## References

- [Databricks CLI Authentication Documentation](https://docs.databricks.com/en/dev-tools/auth.html)
- [OAuth 2.0 with Databricks](https://docs.databricks.com/en/dev-tools/auth.html#oauth-2-0)
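As an addendum to the CI/CD note above: non-interactive service-principal auth is typically configured through environment variables rather than `databricks auth login`. A minimal sketch with placeholder values (the variable names follow the unified Databricks authentication flow, but verify them against your CLI version's docs, and always inject real values from your CI system's secret store):

```shell
# Sketch: machine-to-machine (service principal) auth via environment variables.
# All values below are placeholders - never hardcode real secrets.
export DATABRICKS_HOST="https://company-workspace.cloud.databricks.com"
export DATABRICKS_CLIENT_ID="00000000-0000-0000-0000-000000000000"  # service principal application ID
export DATABRICKS_CLIENT_SECRET="example-oauth-secret"              # OAuth secret generated in the workspace

# With these set, CLI commands authenticate without a browser, e.g.:
# databricks current-user me
```

Because these are plain environment variables, they work in the same per-command shell sessions that make interactive profiles awkward in CI runners.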
# Databricks CLI Installation

Install or update the Databricks CLI on macOS, Windows, or Linux using doc-validated methods (Homebrew, WinGet, curl install script, manual download, or user-directory install for non-sudo environments). Includes verification and common failure recovery.

## Sandboxed / IDE environments (Cursor, containers)

CLI install commands often write to system directories outside the workspace (e.g. `/opt/homebrew/`, `/usr/local/bin/`), which are blocked in sandboxed environments.

**Agent behavior**: Do not attempt to run install commands directly. Present the appropriate command to the user and ask them to run it in their own terminal. After they confirm, verify with `databricks -v`.

For Linux/macOS containers or Cursor: prefer the **Linux manual install to user directory** method (`~/.local/bin`) — it requires no sudo and no writes outside the workspace.

## Preconditions (always do first)

1. Determine OS and shell:
   - macOS/Linux: bash/zsh
   - Windows: Command Prompt / PowerShell; optionally WSL for a Linux shell
2. Detect whether `databricks` is already installed:
   - Run: `databricks -v` (or `databricks version`)
   - If a recent version is already installed, no further installation is needed.
3. Avoid the legacy Python package `databricks-cli` (PyPI). This skill installs the modern Databricks CLI binary.

## Preferred installation paths (by OS)

### macOS (preferred: Homebrew)

Run:

- `brew tap databricks/tap`
- `brew install databricks`

Verify:

- `databricks -v` (or `databricks version`)

If macOS blocks the binary (Gatekeeper), follow Apple's "open app from unidentified developer" flow.

#### macOS fallback: curl installer

Run:

- `curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh`

Notes:

- If `/usr/local/bin` is not writable, re-run with `sudo`.
- Installs to `/usr/local/bin/databricks`.
Verify:

- `databricks -v`

### Linux (preferred: Homebrew if available)

Run:

- `brew tap databricks/tap`
- `brew install databricks`

Verify:

- `databricks -v`

#### Linux fallback: curl installer

Run:

- `curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh`

Notes:

- If `/usr/local/bin` is not writable, re-run with `sudo`.
- Installs to `/usr/local/bin/databricks`.

Verify:

- `databricks -v`

#### Linux alternative: Manual install to user directory (when sudo unavailable)

Use this when sudo is not available or requires interactive password entry.

Steps:

1. Detect architecture:
   - `uname -m` (e.g., `x86_64`, `aarch64`)
2. Get the latest download URL using the GitHub API:
   ```bash
   curl -s https://api.github.com/repos/databricks/cli/releases/latest | grep "browser_download_url.*linux.*$(uname -m | sed 's/x86_64/amd64/' | sed 's/aarch64/arm64/')" | head -1 | cut -d '"' -f 4
   ```
3. Download and install to `~/.local/bin`:
   ```bash
   mkdir -p ~/.local/bin
   cd ~/.local/bin
   curl -L "<download-url>" -o databricks.tar.gz
   tar -xzf databricks.tar.gz
   rm databricks.tar.gz
   chmod +x databricks
   ```
4. Add to PATH (add to `~/.bashrc` or `~/.zshrc` for persistence):
   ```bash
   export PATH="$HOME/.local/bin:$PATH"
   ```
5. Verify:
   - `databricks -v`

Notes:

- The download files are `.tar.gz` archives (not `.zip`) with the naming pattern `databricks_cli_<version>_linux_<arch>.tar.gz`
- Common architectures: `amd64` (x86_64), `arm64` (aarch64)
- This method works in containerized environments and sandboxed IDEs (e.g. Cursor) without sudo access

### Windows (preferred: WinGet)

Run in Command Prompt (then restart the terminal session):

- `winget search databricks`
- `winget install Databricks.DatabricksCLI`

Verify:

- `databricks -v`

#### Windows alternative: Chocolatey (Experimental)

Run:

- `choco install databricks-cli`

Verify:

- `databricks -v`

#### Windows fallback: curl installer (recommended via WSL)

Databricks recommends WSL for the curl-based install path.
Requirements:

- WSL available
- `unzip` installed in the environment where you run the installer

Run (in WSL bash):

- `curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh`

Verify (in the same environment):

- `databricks -v`

If you must run the curl installer outside WSL, run it as Administrator. It installs to `C:\Windows\databricks.exe`.

## Manual install (all OSes): download from GitHub releases

Use this when package managers or the curl installer are not possible.

Steps:

1. Get the latest release download URL:
   - Visit https://github.com/databricks/cli/releases/latest
   - OR use the GitHub API: `curl -s https://api.github.com/repos/databricks/cli/releases/latest | grep browser_download_url`
2. Download the appropriate file for your OS and architecture:
   - Linux: `databricks_cli_<version>_linux_<arch>.tar.gz` (use `tar -xzf`)
   - macOS: `databricks_cli_<version>_darwin_<arch>.zip` (use `unzip`)
   - Windows: `databricks_cli_<version>_windows_<arch>.zip` (use native extraction)
   - Common architectures: `amd64` (x86_64), `arm64` (aarch64/Apple Silicon)
3. Extract the archive.
4. Ensure the extracted `databricks` executable is on PATH, or run it from its folder.
5. Verify with `databricks -v`.

## Update / repair procedures

### Homebrew update (macOS/Linux)

- `brew upgrade databricks`
- `databricks -v`

### WinGet update (Windows)

- `winget upgrade Databricks.DatabricksCLI`
- `databricks -v`

### curl update (all OSes)

1. Delete the existing binary:
   - macOS/Linux: `/usr/local/bin/databricks`
   - Windows: `C:\Windows\databricks.exe`
2. Re-run:
   - `curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh`
3. Verify:
   - `databricks -v`

## Common failures & fixes (agent playbook)

- `Target path <path> already exists`:
  - Delete the existing binary at the install target, then rerun.
- Permission error writing `/usr/local/bin`:
  - Re-run the curl installer with `sudo` (macOS/Linux).
  - If sudo requires an interactive password, use the manual install to `~/.local/bin` instead.
- `sudo: a terminal is required to read the password`:
  - Sudo cannot be used in non-interactive environments (containers, CI/CD).
  - Use the manual install to `~/.local/bin` method instead (see the "Linux alternative" section).
- Windows PATH not updated after WinGet:
  - Restart Command Prompt/PowerShell.
- Multiple `databricks` binaries on PATH:
  - Use `which databricks` (macOS/Linux/WSL) or `where databricks` (Windows) and remove the wrong one.
- Wrong file type (trying to unzip a tar.gz):
  - Linux releases are `.tar.gz` files; use `tar -xzf`, not `unzip`.
  - macOS and Windows releases are `.zip` files; use the appropriate extraction tool.
- `databricks: command not found` after installation to `~/.local/bin`:
  - Add to PATH: `export PATH="$HOME/.local/bin:$PATH"`
  - For persistence, add the export command to `~/.bashrc` or `~/.zshrc`.
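After any install or update, the version floor from the skill prerequisites (>= v0.292.0) can be checked mechanically. A minimal sketch using `sort -V` for version comparison (`min_version_ok` is a hypothetical helper, and the parsing comment assumes `databricks -v` prints something shaped like `Databricks CLI v0.292.0`):

```shell
# Compare a dotted version string against a minimum using GNU sort -V.
# Succeeds (exit 0) when installed >= minimum.
min_version_ok() {
  installed="$1"
  minimum="$2"
  # sort -V orders versions numerically; the minimum must sort first (or equal)
  [ "$(printf '%s\n%s\n' "$minimum" "$installed" | sort -V | head -n1)" = "$minimum" ]
}

if min_version_ok "0.300.1" "0.292.0"; then
  echo "version ok"
fi

# Extracting the number from real CLI output (assumed format "Databricks CLI v0.292.0"):
# databricks -v | sed -n 's/.*v\([0-9.]*\).*/\1/p'
```

A plain string comparison would get this wrong (e.g. `0.300.0` sorts before `0.92.0` lexically), which is why `sort -V` is used.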