JupyterHub HPC Meeting - April 2022#

Welcome to the Meeting#

Hello! If you are joining the team video meeting, sign in below so we know who was here. Roll call:

  • name / institution / GitHub handle

  • Rollin Thomas / NERSC / @rcthomas

  • Michael Milligan / MSI @ UMN / @mbmilligan

  • Erik Sundell / Fernando Perez research group / @consideRatio

  • Félix-Antoine Fortin / Université Laval - Compute Canada / @cmd-ntrf

  • Simon Li / University of Dundee / @manics

  • Jeff Edwards/Environmental Protection Agency / @JeffEdwardsTech

  • Monica Ihli / ORNL - Arm Data Science & Integrations Group / ihlimi@ornl.gov

Quick updates#

60 second updates on things you have been up to, questions you have, or developments you think people should know about. This is also a chance to suggest a future presentation if you’ve got work currently in progress you might want to share. Please add yourself, and if you do not have an update to share, you can pass.

  • Name: Your update

  • Erik: Worked on dask/dask-gateway and look to learn about its use in HPS

Reports and celebrations#

This is a place to make announcements (without a need for discussion). This is also a great place to give shout-outs to contributors! We’ll read through these at the beginning of the meeting.

  • Name: Your report or celebration

Agenda items#

Let’s collect all potential agenda items here before the start of the meeting. We will then attempt to create a coherent agenda that fits in the 60m meeting slot. If there are similar items try and group them together.

  • EPA team - discuss work and contribution to Wrapspawner

    • Jeff works at EPA on high-end computing

    • Working to add new feature to JupyterHub

      • Similar to profilespawner

      • Profiles for launching jobs come from file system

      • They use slurm, central scripts storage

      • Allow users to store their own

      • Form allows users to alter script

      • Some javascript

      • Want to contribute to wrapspawner

      • Erik’s suggested related reading on how to adjust spawn options based on user input which could be relevant for the most detailed customizations: https://discourse.jupyter.org/t/tailoring-spawn-options-and-server-configuration-to-certain-users/8449

      • Question is whether these are just Jupyter notebook jobs, or other kinds of jobs

    • Submit PR!

      • Bringing in stuff from the filesystem, keep in mind that in many deployments, the filesystem where users do their work is not accessible from where JupyterHub is running

      • Read-only mount?

      • Filesystem part is configurable

  • All: Standing project items:

    • Batchspawner check-in: Issues and PRs

      • need to check CI status, seems to be failing for PRs, from March:

        • pytest run for 3.9 against hub 0.9.6 fails

        • using a deprecated sqlalchemy maybe pinned by hub

        • Plan: drop that combo from the matrix, rerun

        • jupyterhub/batchspawner#229

        • Will probably unblock other failed PR tests

      • Worthwhile to do a sweep on batchspawner PRs

        • Which are worth keeping which to close?

        • Plan: Mike will take a look on PRs

      • Some new pull requests

      • Users have been breaking batchspawner by overwriting the script template

        • Created issue to not encourage users to do this

        • Documentation issue, not code issue

        • Some better config examples for README (they’re old)

    • Wrapspawner check-in: Issues and PRs

      • None

  • Workshop/conference season:

    • ISC 22 Interactive HPC

    • SC 22 Urgent+Interactive HPC workshop accepted

    • PEARC in Boston

      • Short deadline for posters + papers April 8 NO EXTENSION

  • DaskGateway maintenance push

    • Looking to connect and overview usage Slurm and PBS in general and if DaskGateway is used

    • Most people on the call are running or running on Slurm

    • Erik’s been working on CI testing against e.g. Slurm

      • Docker containers

      • Had been outdated so Erik’s modernizing

      • See links below about CI and Slurm

    • Notes from Rollin/NERSC:

      • Background:

        • We use Slurm to schedule jobs

        • Main back-end for DaskGateway on Slurm is dask-jobqueue

        • dask-jobqueue submits (IIRC) single-node jobs to start workers

        • Our policies prevent more than 2 jobs per user from accruing priority at any given time

        • Access to fast startup (which is expected by users) needs to be managed carefully

        • CLI access to talk to Slurm

      • A better fit for us:

        • Submit a single job with all the nodes you’ll need

        • Start scheduler and srun workers inside that job

        • On one machine dask-mpi is the most stable way to launch

        • On another, vanilla scheduler + dask-cuda-workers is fine

        • Always want scheduler and workers in the compute nodes, only client running in Jupyter

      • Another issue is DaskGateway is specific to Dask

        • We have Spark, IPyParallel, Ray users too

        • But we do want the general gateway functionality of:

          • Taking a user request for a cluster runtime

          • Templating that into a batch job with all the right settings and optimizations

          • Having a way to connect to that from a notebook

          • AND get the environments to match up (between the notebook and the cluster)

            • Needs to support both bare metal and containers

        • Our center has its own API and we’re adding templating as a service

          • Plan lets us support Jupyter + generic cluster runtime

          • Lets us optimize and manage how the jobs run

    • About CI to run Slurm: