# JupyterHub HPC Meeting - April 2022
- Date: 2022-04-06
- Time: 8:30 AM PDT
- GitHub issue:
- Calendar for future meetings: https://jupyterhub-team-compass.readthedocs.io/en/latest/meetings.html
## Welcome to the Meeting
Hello! If you are joining the team video meeting, sign in below so we know who was here. Roll call:
- name / institution / GitHub handle
- Rollin Thomas / NERSC / @rcthomas
- Michael Milligan / MSI @ UMN / @mbmilligan
- Erik Sundell / Fernando Perez research group / @consideRatio
- Félix-Antoine Fortin / Université Laval - Compute Canada / @cmd-ntrf
- Simon Li / University of Dundee / @manics
- Jeff Edwards / Environmental Protection Agency / @JeffEdwardsTech
- Monica Ihli / ORNL - Arm Data Science & Integrations Group / ihlimi@ornl.gov
## Quick updates
60-second updates on things you have been up to, questions you have, or developments you think people should know about. This is also a chance to suggest a future presentation if you’ve got work in progress you might want to share. Please add yourself; if you do not have an update to share, you can pass.
- Name: Your update
- Erik: Worked on dask/dask-gateway and is looking to learn about its use in HPC
## Reports and celebrations
This is a place to make announcements (without a need for discussion). This is also a great place to give shout-outs to contributors! We’ll read through these at the beginning of the meeting.
- Name: Your report or celebration
## Agenda items
Let’s collect all potential agenda items here before the start of the meeting. We will then attempt to create a coherent agenda that fits in the 60-minute meeting slot. If there are similar items, try to group them together.
- EPA team - discuss work and contribution to wrapspawner
  - Jeff works at the EPA on high-end computing
  - Working to add a new feature to JupyterHub
    - Similar to ProfilesSpawner
    - Profiles for launching jobs come from the file system
    - They use Slurm, with central storage for the scripts
    - Allow users to store their own scripts
    - A form allows users to alter the script
    - Involves some JavaScript
  - Want to contribute the feature to wrapspawner (see the configuration sketch after this item)
  - Erik suggested related reading on how to adjust spawn options based on user input, which could be relevant for the most detailed customizations: https://discourse.jupyter.org/t/tailoring-spawn-options-and-server-configuration-to-certain-users/8449
  - Question: are these just Jupyter notebook jobs, or other kinds of jobs?
  - Submit a PR!
  - When bringing in content from the filesystem, keep in mind that in many deployments the filesystem where users do their work is not accessible from where JupyterHub is running
    - Read-only mount?
    - The filesystem part is configurable
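For reference, a minimal sketch of how wrapspawner’s ProfilesSpawner is configured today; the profile names and option values below are invented, and the EPA work described above would instead source such profiles from the filesystem and let users adjust the script through a form:

```python
# jupyterhub_config.py -- illustrative ProfilesSpawner setup; the
# profile names and options are made up for this sketch.
c.JupyterHub.spawner_class = 'wrapspawner.ProfilesSpawner'

# Each profile is (display name, key, Spawner class, config overrides).
c.ProfilesSpawner.profiles = [
    ('Local server', 'local', 'jupyterhub.spawner.LocalProcessSpawner',
     {'ip': '0.0.0.0'}),
    ('Slurm, 1 node, 8 hours', 'slurm-1n', 'batchspawner.SlurmSpawner',
     dict(req_nprocs='4', req_runtime='8:00:00', req_memory='16gb')),
]
```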
- All: Standing project items:
  - Batchspawner check-in: issues and PRs
    - Need to check CI status; it seems to be failing for PRs, from March:
      - The pytest run for Python 3.9 against JupyterHub 0.9.6 fails
      - It pulls in a deprecated SQLAlchemy, maybe pinned by the Hub
      - Plan: drop that combination from the test matrix and rerun
      - That will probably unblock other failed PR tests
    - Worthwhile to do a sweep of batchspawner PRs
      - Which are worth keeping, and which should be closed?
      - Plan: Mike will take a look at the PRs
    - Some new pull requests
    - Users have been breaking batchspawner by overwriting the script template
      - Created an issue to discourage users from doing this
      - A documentation issue, not a code issue
      - The README needs better config examples (the current ones are old); see the sketch after this item
  - Wrapspawner check-in: issues and PRs
    - None
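On the template-overwriting point above, one possible shape for updated README examples, with invented option values: customize through batchspawner’s `req_*` traits rather than replacing `batch_script` wholesale:

```python
# jupyterhub_config.py -- illustrative values only.
c.JupyterHub.spawner_class = 'batchspawner.SlurmSpawner'

# Prefer targeted options over overriding the whole batch_script
# template; a hand-rolled template easily drops pieces that
# batchspawner relies on.
c.SlurmSpawner.req_partition = 'interactive'
c.SlurmSpawner.req_runtime = '4:00:00'
c.SlurmSpawner.req_memory = '8G'

# Shell commands run before the single-user server starts, without
# touching the template itself:
c.SlurmSpawner.req_prologue = 'module load python'
```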
- Workshop/conference season:
  - ISC 22 Interactive HPC
    - Accepting late submissions
    - Author notification: April 14, 2022
    - 6-12 page papers, also a “paper drafting exercise”
  - SC 22 Urgent + Interactive HPC workshop accepted
  - PEARC in Boston
    - Short deadline for posters + papers: April 8, NO EXTENSION
- DaskGateway maintenance push
  - Looking to connect with users and get an overview of Slurm and PBS usage in general, and of whether DaskGateway is used
  - Most people on the call are running Slurm or running on it
  - Erik has been working on CI testing against e.g. Slurm
    - Uses Docker containers
    - These had become outdated, so Erik is modernizing them
    - See the links below about CI and Slurm
  - Notes from Rollin/NERSC:
    - Background:
      - We use Slurm to schedule jobs
      - The main back-end for DaskGateway on Slurm is dask-jobqueue
      - dask-jobqueue submits (IIRC) single-node jobs to start workers
      - Our policies prevent more than 2 jobs per user from accruing priority at any given time
      - Access to fast startup (which users expect) needs to be managed carefully
      - Requires CLI access to talk to Slurm
    - A better fit for us:
      - Submit a single job with all the nodes you’ll need
      - Start the scheduler and srun the workers inside that job
      - On one machine, dask-mpi is the most stable way to launch (see the sketch after these notes)
      - On another, a vanilla scheduler + dask-cuda-workers is fine
      - Always want the scheduler and workers on the compute nodes, with only the client running in Jupyter
    - Another issue is that DaskGateway is specific to Dask
      - We have Spark, IPyParallel, and Ray users too
      - But we do want the general gateway functionality of:
        - Taking a user request for a cluster runtime
        - Templating that into a batch job with all the right settings and optimizations (see the templating sketch at the end of this section)
        - Having a way to connect to that from a notebook
        - AND getting the environments to match up (between the notebook and the cluster)
      - Needs to support both bare metal and containers
      - Our center has its own API, and we’re adding templating as a service
      - The plan lets us support Jupyter + a generic cluster runtime
      - It lets us optimize and manage how the jobs run
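As a concrete sketch of the single-job pattern above (the file name and task count are illustrative): dask-mpi, launched under srun inside one allocation, puts the scheduler on MPI rank 0, runs the client code on rank 1, and turns the remaining ranks into workers:

```python
# launch_dask.py -- run inside a single multi-node Slurm job, e.g.
#   srun -n 34 python launch_dask.py
from dask_mpi import initialize
from dask.distributed import Client

initialize()       # rank 0 becomes the scheduler, ranks 2+ the workers
client = Client()  # this client code runs on rank 1

print(client)
```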
  - About CI to run Slurm:
    - A large PR updating Dockerfiles that initialize Slurm in a container: dask/dask-gateway#519
    - The old Dockerfile that builds an image that starts Slurm: dask/dask-gateway
    - The new Dockerfile after updating it in my PR: dask/dask-gateway
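Returning to the “templating as a service” idea in the notes above, a purely hypothetical sketch of the gateway-style step that turns a user’s cluster request into a site-tuned batch script; every field name and template line here is invented, not a real NERSC or DaskGateway API:

```python
# Hypothetical request-to-batch-script templating; all names invented.
from string import Template

BATCH_TEMPLATE = Template("""\
#!/bin/bash
#SBATCH --job-name=$name
#SBATCH --nodes=$nodes
#SBATCH --time=$runtime

srun $launcher
""")

def render_job(request):
    """Merge center-managed defaults with the user's request."""
    defaults = {"runtime": "01:00:00"}
    return BATCH_TEMPLATE.substitute({**defaults, **request})

print(render_job({"name": "dask", "nodes": 4,
                  "launcher": "python launch_dask.py"}))
```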