Federal Science DataHub

Working with Conda

Databricks supports Conda-based environments. The FSDH provides two options for working with Conda.

  1. A project-specific Docker image with Conda support and a predefined Conda environment. The image must be co-developed with the FSDH support team and pushed to the GitHub Container Registry (GHCR).
  2. A generic Docker image with Conda support. Users install the packages they need in their notebooks.
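
With option 2, packages are installed at the top of each notebook instead of being baked into the image. A sketch of such a cell, assuming the generic image exposes conda on the driver's PATH (the package and pin are illustrative):

```shell
%sh
# Install a pinned package into the active Conda environment at notebook start.
conda install -y -c bioconda hifiasm=0.16.1
```

Note that these installs are per-cluster and disappear when the cluster restarts.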

For illustration purposes, the following steps are based on option 1.

Step 1: Create environment YAML

A sample env.yml file is shown below. Skip to Step 3 if using an existing Docker image.

name: fsdh-sample
channels:
  - bioconda
  - defaults
dependencies:
  - python=3.8.16
  - pip=23.0.1
  - six=1.16.0
  - ipython=8.12.0
  - nomkl=3.0
  - numpy=1.24.3
  - pandas=1.1.5
  - traitlets=5.7.1
  - wheel=0.38.4
  - hifiasm=0.16.1
  - pip:
    - pyarrow==1.0.1
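
Since the FSDH team builds the image from this file, it can save a round trip to catch unpinned dependencies before handing it off. A minimal stdlib-only sketch; the naive line parsing assumes the flat layout shown above, and a real YAML parser would be more robust:

```python
# Naive check that every dependency in a simple env.yml is version-pinned.
# Assumes the flat "name=version" / "name==version" layout shown above.
ENV_YML = """\
name: fsdh-sample
channels:
  - bioconda
  - defaults
dependencies:
  - python=3.8.16
  - pip=23.0.1
  - pip:
    - pyarrow==1.0.1
"""

def unpinned(yml_text):
    """Return dependency entries that carry no '=' version pin."""
    bad = []
    in_deps = False
    for line in yml_text.splitlines():
        stripped = line.strip()
        if stripped == "dependencies:":
            in_deps = True
            continue
        if in_deps and stripped.startswith("- "):
            entry = stripped[2:]
            if entry != "pip:" and "=" not in entry:
                bad.append(entry)
    return bad

print(unpinned(ENV_YML))                        # [] -> everything is pinned
print(unpinned("dependencies:\n  - numpy\n"))   # ['numpy'] -> needs a pin
```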

Step 2: Build and Push the Image

The FSDH team builds the image and pushes it to GHCR. Skip to Step 3 if using an existing Docker image.

docker build -t fsdh-sample .
export GHCR_PAT="XXX"
echo "$GHCR_PAT" | docker login ghcr.io -u <username> --password-stdin
docker tag fsdh-sample ghcr.io/ssc-sp/fsdh-sample:latest
docker push ghcr.io/ssc-sp/fsdh-sample:latest
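
The build above assumes a Dockerfile in the working directory. One possible shape for it, as a sketch only: the base image, Miniconda install path, and environment name are assumptions, and the real Dockerfile is co-developed with the FSDH support team.

```dockerfile
# Sketch only: base image and paths are assumptions; confirm with the FSDH team.
FROM databricksruntime/minimal:latest

# Copy the environment spec from Step 1 and build it with Miniconda.
COPY env.yml /tmp/env.yml
RUN curl -fsSL https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
      -o /tmp/miniconda.sh \
 && bash /tmp/miniconda.sh -b -p /opt/conda \
 && /opt/conda/bin/conda env create -f /tmp/env.yml \
 && rm /tmp/miniconda.sh

# Put the fsdh-sample environment first on the PATH.
ENV PATH=/opt/conda/envs/fsdh-sample/bin:/opt/conda/bin:$PATH
```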

Step 3: Create a Cluster

  1. Ask your admin to enable Databricks Container Services for your Databricks workspace.
  2. Create a cluster with the access mode "No Isolation Shared".
  3. Choose a runtime of 10.4 LTS, 9.1 LTS, or 7.3 LTS.
  4. Under Advanced options -> Docker, use the image ghcr.io/ssc-sp/fsdh-sample:latest.
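
If you prefer the Databricks Clusters API to the UI, the same settings can be expressed as a JSON payload. This is a sketch under assumptions: the cluster name, node type, and worker count are placeholders, and "data_security_mode": "NONE" corresponds to the "No Isolation Shared" access mode.

```json
{
  "cluster_name": "fsdh-conda-sample",
  "spark_version": "10.4.x-scala2.12",
  "node_type_id": "Standard_DS3_v2",
  "num_workers": 1,
  "data_security_mode": "NONE",
  "docker_image": {
    "url": "ghcr.io/ssc-sp/fsdh-sample:latest",
    "basic_auth": {
      "username": "<username>",
      "password": "<ghcr-token>"
    }
  }
}
```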

Step 4: Validate the Cluster

Run the following in a notebook cell to confirm the Conda environment and its packages are available:

%sh
conda list
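
Beyond listing packages, you can spot-check that specific tools and pinned packages from env.yml actually resolve; for example (assuming the fsdh-sample environment from Step 1 is on the PATH):

```shell
%sh
# Confirm a bioconda tool and a pinned Python package resolve as expected.
which hifiasm
python -c "import numpy; print(numpy.__version__)"
```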
Last Updated: 2026-04-13, 11:39 a.m.