databricks-jobs

Public

Repository: databricks/databricks-agent-skills

databricks

Imported Mar 3, 2026

Low Risk

No security issues found

INFO

Skill manifest does not include a 'license' field. Specifying a license helps users understand usage terms.

Remediation Add 'license' field to SKILL.md frontmatter (e.g., MIT, Apache-2.0)

Scanned in 0.006s

Description

Develop and deploy Lakeflow Jobs on Databricks. Use when creating data engineering jobs with notebooks, Python wheels, or SQL tasks. Invoke BEFORE starting implementation.

Details

Compatibility

Requires databricks CLI (>= v0.292.0)

Metadata

version: 0.1.0

Skill Files

Download SKILL.md

SKILL.md

# Lakeflow Jobs Development

**FIRST**: Use the parent `databricks` skill for CLI basics, authentication, profile selection, and data exploration commands.

Lakeflow Jobs are scheduled workflows that run notebooks, Python scripts, SQL queries, and other tasks on Databricks.

## Scaffolding a New Job Project

Use `databricks bundle init` with a config file to scaffold non-interactively. This creates a project in the `<project_name>/` directory:

```bash
databricks bundle init default-python --config-file <(echo '{"project_name": "my_job", "include_job": "yes", "include_pipeline": "no", "include_python": "yes", "serverless": "yes"}') --profile <PROFILE> < /dev/null
```

- `project_name`: letters, numbers, underscores only

After scaffolding, create `CLAUDE.md` and `AGENTS.md` in the project directory. These files are essential to provide agents with guidance on how to work with the project. Use this content:

```
# Databricks Asset Bundles Project

This project uses Databricks Asset Bundles for deployment.

## Prerequisites

Install the Databricks CLI (>= v0.288.0) if not already installed:
- macOS: `brew tap databricks/tap && brew install databricks`
- Linux: `curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh`
- Windows: `winget install Databricks.DatabricksCLI`

Verify: `databricks -v`

## For AI Agents

Read the `databricks` skill for CLI basics, authentication, and deployment workflow.
Read the `databricks-jobs` skill for job-specific guidance.

If skills are not available, install them: `databricks experimental aitools skills install`
```

## Project Structure

```
my-job-project/
├── databricks.yml              # Bundle configuration
├── resources/
│   └── my_job.job.yml          # Job definition
├── src/
│   ├── my_notebook.ipynb       # Notebook tasks
│   └── my_module/              # Python wheel package
│       ├── __init__.py
│       └── main.py
├── tests/
│   └── test_main.py
└── pyproject.toml               # Python project config (if using wheels)
```

## Configuring Tasks

Edit `resources/<job_name>.job.yml` to configure tasks:

```yaml
resources:
  jobs:
    my_job:
      name: my_job

      tasks:
        - task_key: my_notebook
          notebook_task:
            notebook_path: ../src/my_notebook.ipynb

        - task_key: my_python
          depends_on:
            - task_key: my_notebook
          python_wheel_task:
            package_name: my_package
            entry_point: main
```

Task types: `notebook_task`, `python_wheel_task`, `spark_python_task`, `pipeline_task`, `sql_task`

## Job Parameters

Parameters defined at job level are passed to ALL tasks (no need to repeat per task):

```yaml
resources:
  jobs:
    my_job:
      parameters:
        - name: catalog
          default: ${var.catalog}
        - name: schema
          default: ${var.schema}
```

Access parameters in notebooks with `dbutils.widgets.get("catalog")`.

## Writing Notebook Code

```python
# Read parameters
catalog = dbutils.widgets.get("catalog")
schema = dbutils.widgets.get("schema")

# Read tables
df = spark.read.table(f"{catalog}.{schema}.my_table")

# SQL queries
result = spark.sql(f"SELECT * FROM {catalog}.{schema}.my_table LIMIT 10")

# Write output
df.write.mode("overwrite").saveAsTable(f"{catalog}.{schema}.output_table")
```

## Scheduling

```yaml
resources:
  jobs:
    my_job:
      trigger:
        periodic:
          interval: 1
          unit: DAYS
```

Or with cron:

```yaml
      schedule:
        quartz_cron_expression: "0 0 2 * * ?"
        timezone_id: "UTC"
```

## Multi-Task Jobs with Dependencies

```yaml
resources:
  jobs:
    my_pipeline_job:
      tasks:
        - task_key: extract
          notebook_task:
            notebook_path: ../src/extract.ipynb

        - task_key: transform
          depends_on:
            - task_key: extract
          notebook_task:
            notebook_path: ../src/transform.ipynb

        - task_key: load
          depends_on:
            - task_key: transform
          notebook_task:
            notebook_path: ../src/load.ipynb
```

## Unit Testing

Run unit tests locally:

```bash
uv run pytest
```

## Development Workflow

1. **Validate**: `databricks bundle validate --profile <profile>`
2. **Deploy**: `databricks bundle deploy -t dev --profile <profile>`
3. **Run**: `databricks bundle run <job_name> -t dev --profile <profile>`
4. **Check run status**: `databricks jobs get-run --run-id <id> --profile <profile>`

## Documentation

- Lakeflow Jobs: https://docs.databricks.com/jobs
- Task types: https://docs.databricks.com/jobs/configure-task
- Databricks Asset Bundles: https://docs.databricks.com/dev-tools/bundles/examples

Version History

v1.2.0 Synced from GitHub

1 day ago

v1.1.0 Synced from GitHub

5 days ago

v1.0.0 Imported from GitHub

1 week ago