2023 Collaboration Cafe Notes Archive#

2023-11-21#

mybinder.org AWS deployment#

Description: AWS Curvenote deployment is (hopefully) nearly ready
- jupyterhub/mybinder.org-deploy#2834
- jupyterhub/mybinder.org-deploy#2629

Notes#

Min will commit a terraform.lock for the GCP deployment so that it’s reproducible
Clean up config
Add to federation, send a small amount of traffic e.g. 1%, then ramp up if not problems
Blog posts: Steve from Curvenote will write one, Simon will write a separate technical post

Docs Working Group#

A new Jupyter Docs Working Group is in the works, we need your comments and feedback!

Proposal
- jupyterlab/team-compass#212
Governance/Charter/More (Team compass)
- jupyter/docs-team-compass

Notes#

JupyterHub python template repo can establish a “JupyterHub like project”: jupyterhub/jupyterhub-python-repo-template
https://diataxis.fr

Misc releases#

oauthenticator release will be made soon
kubespawner release will be made soon
multiauthenticator
- idiap/multiauthenticator
batchspawner release needed for jupyterhub compatibility

2023-10-17#

User-initiated sharing#

Description: Implementing user-intiated sharing, there are some things to work out in the design. Min has done some partial implementations, but switched to documenting how it should work before finishing any implementation. Draft PR here.

Notes#

don’t get too complex for first draft
invite coes may solve discoverability
limit user:user grant to read access for read:user:name (separate share-with scope later, if needed)
make it opt-in

Why should HPC folk care about Binder/be interested in it?#

Description: Sarah has been asked to present on Binder to a HPC workshop. I can give the “reproducibility with Binder” bit, but I’m looking for angles to make this interesting for the HPC audience. I think there’s a “Cloud vs HPC” culture and a “sysadmin won’t let me run docker on the cluster” pushback to navigate. So I’m thinking about how to frame this as “so this exact tech stack doesn’t fit your needs, but here’s some lessons learned”. Also, I think the workshop is for a tool that helps with reproducible workflows on HPC, so I don’t think they’ll actually use Binder/repo2docker at all (though I don’t know much about the tool) https://hpc.guix.info/events/2023/workshop/ Drafting abstract and talk outline: https://docs.google.com/document/d/10LnhlWXi_ZSNHUphT5-CiSlSb7tssT5_uMh9bRX4Mo0/edit?usp=sharing

Notes#

Questions: What’s the pitch? You should use this tool vs what to learn from this peer project?
ncar-xdev/repo2apptainer
- Is this tech (singularity/apptainer) still being pitched to HPC centres?
Use Guix inside repo2docker? jupyterhub/repo2docker#1048
r2d automates good practices where they exist, getting Guix practices into r2d is within scope! jupyterhub/repo2docker#1048
Meet them where they are: use podman/singularity instead, are they happy to run containers at all? Do they see a need for containers?
What dependencies do you need to install, and what command to execute to run them? https://repo2docker.readthedocs.io/en/latest/specification.html
- How do they envision reproducibility working outside of their ecosystem
[Simon] AFAICT Apptainer can’t bulid from a Dockerfile, so either you’d need to convert a Docker image, or convert the Dockerfile to an apptainer definition file https://apptainer.org/docs/user/main/build_a_container.html#building-containers-from-apptainer-definition-files
Host outside HPC the ability to build images, only permit launching images inside HPC, e.g., GESIS work on preventing Binder from launching images

JupyterHub-Contrib proposal draft#

Description: Outcome of meeting with Sumana a few weeks ago, https://hackmd.io/TrI2lZSJS5e3cS7y9wcMfw?both is a very draft proposal about a jupyterhub-contrib organization, and how that may work.

Notes#

S: How do we possibly use template repo to share cultural project expectations?
- Min: This is in the team-compass stuff, need more things written there than we have now

Possibly a template for the jupyterhub org but a cookie-cutter for the contrib org?

`@jupyterhub/binderhub-client` npm package#

Description: Yuvi is trying to clean up the frontend JS of binderhub, and part of that is maintaining a JS package that can be used by various clients that need to talk to the binderhub API. It’s at jupyterhub/binderhub and has seen a lot of work recently. No specific ask, just a report that this is happening

2023-09-19#

AWS/Curvenote MyBinder federation member#

Description: New AWS/curvenote mybinder federation member: Steve Purves has donated AWS credits and is keen to become an official member. Feedback welcome on this PR
- jupyterhub/mybinder.org-deploy#2652
- Related discussion/background: jupyterhub/mybinder.org-deploy#2556

Notes#

Will it work against AWS ECR? Next PR
- Deployment of cloud infra is in one PR: jupyterhub/mybinder.org-deploy#2652
- The BinderHub chart is deployed with extraConfig that defines a new DockerRegistry sub class that takes care of the complexity of working against a AWS ECR container registry (by interacting with a Golang based service Simon has developed).
  - manics/binderhub-container-registry-helper
- Curvenote BinderHub chart deployment: jupyterhub/mybinder.org-deploy#2698

Lowest level of maintenance for JupyterHub org project’s#

Description: In github.com/jupyterhub has a lot of projects, if possible I’d (@consideRatio) love us to manage to ensure a some basic level of maintenance in all of the projects - or, deprecate/archive them etc.

Notes#

We have lots of repos, concerns about level of support each repo gets.
Would like some clear expectations about what level of maintenance we are promising for being an “official jupyterhub repo”
Sarah is in touch with an experienced leader, Sumana, might be available to for discussion on open source leadership. Sarah is meeting next week.
Clarify expectations of what level of support comes with being under “jupyterhub” name
There is a proposal for jupyterhub-contrib which doesn’t indicate support in the same way
- Could have a low bar top accepting repositories
Being included in a distribution (z2jh, tljh) also raises the bar for expectations
Clear expectations is a good thing, and can be contributing to ecosystem in general
A while ago we had a discussion about how git ownership/maintainership confers status, and we wanted to move away from that to include people who do community work and other tasks not directly related to code

Drafting a new Roadmap#

Description: Roadmaps are useful tools to signpost to many audiences about the direction of the project (to newcomers where work is happening and they could be involved, to users about what to expect in future releases, to contributors where to focus their efforts). Ours is a little old now so the proposal is to begin drafting a new one.
Proposed background reading: https://docs.oscollective.org/guides/marketing-publicity-roadmaps-and-comms

Below is a suggested template for organising a roadmap - it’s not the sole correct way to think about this! It’s just here as a starting point :)

Let’s think about the release of the z2jh helm chart, and any packages that could be included, as our “unit” here.

#### The next release...
- **Theme:**
- **Theme:**

#### The release after that...
- **Theme:** Security? Have we had any feedback from the security bounty scheme that we could act on?
- **Theme:** Metrics? https://blog.jupyter.org/accurately-counting-daily-weekly-monthly-active-users-on-jupyterhub-6fbec6c6ce2f was a good start. Can we build within the JupyterHub ecosystem that doesn't rely on third-party software such as grafana/kube-metric/prometheus? Can we build something into the Admin panel for better discoverability?
- **Theme:**

#### Later on...
- **Theme:**
- **Theme:**

Notes#

Question: what level to roadmap? Org-level? Per-repo? How many repos get one?
Org-wide roadmap item could be e.g. triage, organizational maintenance, not tied to releases
Roadmap might include things we want to do, but might not get to
Stating long-term things we aren’t working on is still useful, and can motivate contributors or funding
Separate roadmaps for aspirational (funding), practical (what we have capacity to do short-term)
Roadmap - long term items:
- supporting multiple replicas of jupyterhub
- moving away from CHP
- repo2docker reproducibility
Roadmap - short term item nominations:
- establish maintenance expectations on projects in jupyterhub gh org
- reporting/metrics (jupyterhub-grafana)
- e.g. “focus on releasing z2jh 4”
Org-level, at least, maybe tying to releases is not a good fit. Instead more time-based “short-term/long-term”
Roadmaps don’t let us tell volunteer contributors what to spend their time on (but they also kind of do!)

2023-07-18#

Project Pythia#

Project Pythia
- Kevin Tyle
- Funded to work with public cloud
- All notebooks developed should work with BinderHub
- https://cookbooks.projectpythia.org/
- Pangeo encourages use of cloud stored datasets
- Educational outreach of Pangeo
- Soliciting community contributions for “cookbooks”, particular use-cases for workflows
  - Jupyter notebooks published by GitHub actions using JupyterBook
  - Want people to have ability to interact with notebooks- leveraging BinderHub
- Hosted on JetStream2
  - How can they make instance more performant?
  - Kubernetes recipe from SPSC, based on zero-to-binderhub
  - 4 workers, 60GB RAM
  - Anticipate making larger, adding GPUs, b ut fairly lightweight for now
  - Each user: maybe 2 GB RAM?
- Binder open to everyone, but will be locked down soon, require authentication
- Tried GitHub auth, but at Hackathon found users limited to just one session, so switched to Dummy authenticator. Will re-enable GitHub authentication
- Maybe could use repo slug as part of named server parameter so authenticated binderhub users can launch as many repositories as they want, like with unauthenticated BinderHub?
  - jupyterhub/binderhub
BinderHub tech: GPUs, persistence
- How to spread burden more

2023-06-20#

NFDI feedback#

Notes#

Related issue: jupyterhub/team-compass#644
- Includes presentation with some technical implementation ideas
Looking for experiences with other big consortia
Experiences with setting up governances
And emotional support :D
10 consortia all got funded for something-related to JupyterHub
- numerical solutions
- biology
National infrastructure, there should be synergy between the domains
- Base services needed by everyone, e.g., authentication, low-level hosting
Jupyter-NFDI
- 10 different institutions, need a synergetic agenda for that
Federation between different JupyterHubs
- Some on k8s, some on bare metal (BatchSpawner)
- Develop a backend spawner to link these together with web API (?)
Difficult problem (knowledge)
- Participants have first-time contact with GitHub (eg)
- Judging what kind of code is reliable
- Time limit
Arnim would like to see:
- governance structure that can grow and get more stable
- multiple hubs come together on a tech stack ‘approved’ by the jupyter community
- Not go for spawner with multiple targets, but rather developers access to shared resources (?), e.g. one central Hub
- Issues:
  - Researchers need to run code near data to avoid egress cost / data access issues, so a central hub must be able to launch notebook servers near data (multicluster spawner, RemoteSpawner, etc.)

GitLab/GitHub container registry pushing#

Notes#

Improve current logic to allow people to push built image to the registry associated with the repo
At the moment, it is a static string referring to a specific container registry
Dynamic URL related to the repo
Need user authentication in order to extract information on what repos/registries users can push back to
Why is BinderHub involved at all?
- Use repo2docker in the native CI provider
- Can trigger on push events
- It’s faster, and will be properly authenticated
Creating a hook to get the image slug, versus a fixed string, is a reasonable upstream extension
Small number of usecases

Authentication#

Notes#

Two separate topics

Authenticators typically allow all users by default but some shouldn’t, can we provide a common config name suggestion and behavior to toggle this? - github issue
Existing JupyterHub users allowed or not depending on if Authenticator.allowed_users is configured or not - a surprise to avoid? - github issue
Conclusion:
- Erik will describe config that is believed to resolve issues from a user perspective, and Min and Erik will discuss from there.

2023-05-16#

repo2jupyterlite#

Description: New tool for building jupyterlite from a repo, like repo2docker / binder
Participating: Wayne, Andreas, Raniere, Simon,
Assigned to room number: Breakout room 1 (Participants pane)

Notes#

Please keep notes of the discussion here
could have
ran into compatibility issues with micromamba
where do builds run? Could binder run repo2jupyterlite?

Maintenance#

Description: A room dedicated to maintenance tasks, such as issue triage and PR review.
Participating: Min, Erik, Mridul, Sign up here!
Assigned to room number: Main room (welcoming duties)

Notes#

Continue developing guidelines for how this room operates as per: jupyterhub/team-compass#617
Working on triage in the littlest jupyterhub
Releasing jupyterhub-traefik-proxy 1.0 jupyterhub/traefik-proxy#205
Make sure to publish

2023-04-18#

Notes#

binderhub service
- 2i2c-org/binderhub-service
- allow build & push without launch
- deploys BinderHub itself, not JupyterHub
- looking at auth, storage, etc.
- Question: can you do both authenticated build-only and not authenticated (existing default binderhub)? Exploring. Probably separate BinderHub instances at the moment, at least.
- Use case: research library with authenticated internal use, but public URL for anonymous consumption. Probably two ‘totally separate instances’. Could be addressed with a landing page.
- Questions about UI -
  - One option is that we could look at the CodeOcean environment builder interface. It’s built on top of a JupyterLab layer: https://help.codeocean.com/en/articles/1374271-configuring-your-computational-environment-an-overview
- Question about storage - r2d usually builds with repo contents in home directory; persisting this dir would omit repo contents. repo2docker has REPO_DIR build arg to control this.
- There is conflict between the experience of persistent user dat and updating the execution environment. mybinder.org tends to conflate both. Theen there’s the nbgitpuller approach, where compute environment and content are separated.
- Question about what to launch. just JupyterLab, or also RStudio, etc.
- Erik tried writing notes separately from this section by mistake, this were the notes:
  - Intro to binderhub-service
  - mridul: when to deploy binderhub-service vs binderhub, and does it make sense to use binderhub-service when it comes to having a authenticated binderhub side by side to a non-authenticated binderhub?
  - erik: I’m not sure, but I believe that binderhub-service efforts should be to help handling the situations when you have an existing jupyterhub where user’s storage is relevant as well
  - UI development?
    - erik: no UI developed yet, ideas are about making it modular to use a REST API from different webpages, such as jupyterhub’s spawner
    - raniere:
  - UX conversation:
    - mridul: git repo folder, user home folder - mridul sees the binderhub repo mostly as a description of the environment, and not having contents in it
    - arnim: is it only jupyterlab UI or RStudio etc? Are we locking into using only one UI?
      - erik: both options
    - chris: arnim what was the mode of usage for persistent binderhub?
      - tendency towards importing repositories
      - an idea about “projects”
      - it was dependent on user stories
      - Mridul: this is how it was mounted: gesiscss/persistent_binderhub
NFDI-Jupyter proposal (#2)
- A proposal towards nationally available JupyterHub(s) in Germany
- Bernd: NFDI (German Data Infrastructure initiative) receives significant funding, 90M EUR/y, spread out over 28 diciplines
  - A topic of relevance, with sections, involving a research software engineering group
  - Many independently deployed JupyterHub across german research institutes. A goal to consolidate technical and human resources.
  - A supercomputing centre in germany has significant experience on deploying JupyterHubs.
  - CH: There’s a similar project to allow spawning user sessions in multiple clusters. It’s still a prototype, hasn’t been developed in a bit but might be of interest: yuvipanda/jupyterhub-multicluster-kubespawner
    - Two other use-cases where this is often needed
      - one university with many hubs representing different classes.
      - a research community with data split across multiple cloud providers, that wish to move users into one or the other
- Presentation:
  - https://jupyter-jsc.fz-juelich.de
  - https://fz-juelich.sciebo.de/s/HfAdM6umijlyZ0u
- Current BackendSpawner implementation: kreuzert/jupyterhub-backendspawner
- Currently used backend “manager” implementations (using Django Rest Framework):
  - UNICORE Manager: https://gitlab.jsc.fz-juelich.de/jupyterjsc/k8s/images/drf-unicoremgr
  - Kubernetes Manager: https://gitlab.jsc.fz-juelich.de/jupyterjsc/k8s/images/drf-k8smgr
- Jonathan: BackendSpawner makes sense to focus on to become a open-source component
- Follow-ups:
  - @choldgraf is happy to join this effort
- Min:
  - putting the spawners outside the jupyterhub process is something of relevance to do in order to support running jupyterhub with multiple replicas
  - code in: kreuzert/jupyterhub-backendspawner
Binder credits (#1)
- we now have some funds at NumFOCUS (how much?)
- need to do work to disentangle the user capacity from the routing service etc.
- Do we want to use AWS for this?
  - Generally agree yes
  - SG started this but we hit a blocker on AWS container registry / building.
  - Simon has a prototype to make it easier to push to Amazon ECR.
  - So it should now be easier to deploy on AWS

2023-03-21#

Notes#

1/Maintenance
- Eskild:
  - delegate tasks is hard
  - create the issues for others to work own is a maintainence work on its own
  - mark PRs issues that are relevant so that others can help out with these ones
- project maintenance proposals
- having the proposal beforehand for people to braninstorm before
- where to put new proposals?
- Post a maintainence proposal effort first inside a GitHub issue or Discourse for early feedback and then discuss it/propose it in the monthly maintaince meeting.
  - or in the meeting agenda?
  - We already have places to post about meetings, so perhaps the GitHub issue announcing the meeting is the place to put maintenance proposals
2/Persistent binderhub
- James
  - introduces JupyterHub vs BinderHub, clarifies that the binderhub chart deploys binderhub the software, that starts building of images using repo2docker
  - binderhub was opinionated in removing auth and storage for jupyterhub, so if building on top of this to add it back we have quite a bit of additional complexity
  - the idea of
- How to engage more contributors? Does the project need to be in a different space? What needs to happen?

2023 Collaboration Cafe Notes Archive#

2023-11-21#

User-initiated sharing#

mybinder.org AWS deployment#

Notes#

Docs Working Group#

Notes#

Misc releases#

2023-10-17#

User-initiated sharing#

Notes#

Why should HPC folk care about Binder/be interested in it?#

Notes#

JupyterHub-Contrib proposal draft#

Notes#

@jupyterhub/binderhub-client npm package#

2023-09-19#

AWS/Curvenote MyBinder federation member#

Notes#

Lowest level of maintenance for JupyterHub org project’s#

Notes#

Drafting a new Roadmap#

Notes#

2023-07-18#

Project Pythia#

Jupyter Social Media Consultation#

2023-06-20#

NFDI feedback#

Notes#

GitLab/GitHub container registry pushing#

Notes#

Authentication#

Notes#

2023-05-16#

repo2jupyterlite#

Notes#

Maintenance#

Notes#

2023-04-18#

Notes#

2023-03-21#

Notes#

`@jupyterhub/binderhub-client` npm package#