2023 Collaboration Cafe Notes Archive#
2023-11-21#
User-initiated sharing#
Description: Min’s still working on user-initiated sharing in jupyterhub/jupyterhub#4594. Lots of useful feedback last time, but ran into some new snags and would love to hash them out if folks are interested.
mybinder.org AWS deployment#
Description: AWS Curvenote deployment is (hopefully) nearly ready
Notes#
Min will commit a terraform.lock for the GCP deployment so that it’s reproducible
Clean up config
Add to federation, send a small amount of traffic e.g. 1%, then ramp up if not problems
Blog posts: Steve from Curvenote will write one, Simon will write a separate technical post
Docs Working Group#
A new Jupyter Docs Working Group is in the works, we need your comments and feedback!
Proposal
Governance/Charter/More (Team compass)
Notes#
JupyterHub python template repo can establish a “JupyterHub like project”: jupyterhub/jupyterhub-python-repo-template
Misc releases#
oauthenticator release will be made soon
kubespawner release will be made soon
multiauthenticator
batchspawner release needed for jupyterhub compatibility
2023-10-17#
User-initiated sharing#
Description: Implementing user-intiated sharing, there are some things to work out in the design. Min has done some partial implementations, but switched to documenting how it should work before finishing any implementation. Draft PR here.
Notes#
don’t get too complex for first draft
invite coes may solve discoverability
limit user:user grant to read access for read:user:name (separate share-with scope later, if needed)
make it opt-in
Why should HPC folk care about Binder/be interested in it?#
Description: Sarah has been asked to present on Binder to a HPC workshop. I can give the “reproducibility with Binder” bit, but I’m looking for angles to make this interesting for the HPC audience. I think there’s a “Cloud vs HPC” culture and a “sysadmin won’t let me run docker on the cluster” pushback to navigate. So I’m thinking about how to frame this as “so this exact tech stack doesn’t fit your needs, but here’s some lessons learned”. Also, I think the workshop is for a tool that helps with reproducible workflows on HPC, so I don’t think they’ll actually use Binder/repo2docker at all (though I don’t know much about the tool) https://hpc.guix.info/events/2023/workshop/ Drafting abstract and talk outline: https://docs.google.com/document/d/10LnhlWXi_ZSNHUphT5-CiSlSb7tssT5_uMh9bRX4Mo0/edit?usp=sharing
Notes#
Questions: What’s the pitch? You should use this tool vs what to learn from this peer project?
-
Is this tech (singularity/apptainer) still being pitched to HPC centres?
Use Guix inside repo2docker? jupyterhub/repo2docker#1048
r2d automates good practices where they exist, getting Guix practices into r2d is within scope! jupyterhub/repo2docker#1048
Meet them where they are: use podman/singularity instead, are they happy to run containers at all? Do they see a need for containers?
What dependencies do you need to install, and what command to execute to run them? https://repo2docker.readthedocs.io/en/latest/specification.html
How do they envision reproducibility working outside of their ecosystem
[Simon] AFAICT Apptainer can’t bulid from a Dockerfile, so either you’d need to convert a Docker image, or convert the Dockerfile to an apptainer definition file https://apptainer.org/docs/user/main/build_a_container.html#building-containers-from-apptainer-definition-files
Host outside HPC the ability to build images, only permit launching images inside HPC, e.g., GESIS work on preventing Binder from launching images
JupyterHub-Contrib proposal draft#
Description: Outcome of meeting with Sumana a few weeks ago, https://hackmd.io/TrI2lZSJS5e3cS7y9wcMfw?both is a very draft proposal about a
jupyterhub-contrib
organization, and how that may work.
Notes#
S: How do we possibly use template repo to share cultural project expectations?
Min: This is in the team-compass stuff, need more things written there than we have now
Possibly a template for the jupyterhub
org but a cookie-cutter
for the contrib org?
@jupyterhub/binderhub-client
npm package#
Description: Yuvi is trying to clean up the frontend JS of binderhub, and part of that is maintaining a JS package that can be used by various clients that need to talk to the binderhub API. It’s at jupyterhub/binderhub and has seen a lot of work recently. No specific ask, just a report that this is happening
2023-09-19#
AWS/Curvenote MyBinder federation member#
Description: New AWS/curvenote mybinder federation member: Steve Purves has donated AWS credits and is keen to become an official member. Feedback welcome on this PR
Related discussion/background: jupyterhub/mybinder.org-deploy#2556
Notes#
Will it work against AWS ECR? Next PR
Deployment of cloud infra is in one PR: jupyterhub/mybinder.org-deploy#2652
The BinderHub chart is deployed with extraConfig that defines a new DockerRegistry sub class that takes care of the complexity of working against a AWS ECR container registry (by interacting with a Golang based service Simon has developed).
Curvenote BinderHub chart deployment: jupyterhub/mybinder.org-deploy#2698
Lowest level of maintenance for JupyterHub org project’s#
Description: In github.com/jupyterhub has a lot of projects, if possible I’d (@consideRatio) love us to manage to ensure a some basic level of maintenance in all of the projects - or, deprecate/archive them etc.
Notes#
We have lots of repos, concerns about level of support each repo gets.
Would like some clear expectations about what level of maintenance we are promising for being an “official jupyterhub repo”
Sarah is in touch with an experienced leader, Sumana, might be available to for discussion on open source leadership. Sarah is meeting next week.
Clarify expectations of what level of support comes with being under “jupyterhub” name
There is a proposal for
jupyterhub-contrib
which doesn’t indicate support in the same wayCould have a low bar top accepting repositories
Being included in a distribution (z2jh, tljh) also raises the bar for expectations
Clear expectations is a good thing, and can be contributing to ecosystem in general
A while ago we had a discussion about how git ownership/maintainership confers status, and we wanted to move away from that to include people who do community work and other tasks not directly related to code
Drafting a new Roadmap#
Description: Roadmaps are useful tools to signpost to many audiences about the direction of the project (to newcomers where work is happening and they could be involved, to users about what to expect in future releases, to contributors where to focus their efforts). Ours is a little old now so the proposal is to begin drafting a new one.
Proposed background reading: https://docs.oscollective.org/guides/marketing-publicity-roadmaps-and-comms
Below is a suggested template for organising a roadmap - it’s not the sole correct way to think about this! It’s just here as a starting point :)
Let’s think about the release of the z2jh helm chart, and any packages that could be included, as our “unit” here.
#### The next release...
- **Theme:**
- **Theme:**
#### The release after that...
- **Theme:** Security? Have we had any feedback from the security bounty scheme that we could act on?
- **Theme:** Metrics? https://blog.jupyter.org/accurately-counting-daily-weekly-monthly-active-users-on-jupyterhub-6fbec6c6ce2f was a good start. Can we build within the JupyterHub ecosystem that doesn't rely on third-party software such as grafana/kube-metric/prometheus? Can we build something into the Admin panel for better discoverability?
- **Theme:**
#### Later on...
- **Theme:**
- **Theme:**
Notes#
Question: what level to roadmap? Org-level? Per-repo? How many repos get one?
Org-wide roadmap item could be e.g. triage, organizational maintenance, not tied to releases
Roadmap might include things we want to do, but might not get to
Stating long-term things we aren’t working on is still useful, and can motivate contributors or funding
Separate roadmaps for aspirational (funding), practical (what we have capacity to do short-term)
Roadmap - long term items:
supporting multiple replicas of jupyterhub
moving away from CHP
repo2docker reproducibility
Roadmap - short term item nominations:
establish maintenance expectations on projects in jupyterhub gh org
reporting/metrics (jupyterhub-grafana)
e.g. “focus on releasing z2jh 4”
Org-level, at least, maybe tying to releases is not a good fit. Instead more time-based “short-term/long-term”
Roadmaps don’t let us tell volunteer contributors what to spend their time on (but they also kind of do!)
2023-07-18#
Project Pythia#
-
Kevin Tyle
Funded to work with public cloud
All notebooks developed should work with BinderHub
Pangeo encourages use of cloud stored datasets
Educational outreach of Pangeo
Soliciting community contributions for “cookbooks”, particular use-cases for workflows
Jupyter notebooks published by GitHub actions using JupyterBook
Want people to have ability to interact with notebooks- leveraging BinderHub
Hosted on JetStream2
How can they make instance more performant?
Kubernetes recipe from SPSC, based on zero-to-binderhub
4 workers, 60GB RAM
Anticipate making larger, adding GPUs, b ut fairly lightweight for now
Each user: maybe 2 GB RAM?
Binder open to everyone, but will be locked down soon, require authentication
Tried GitHub auth, but at Hackathon found users limited to just one session, so switched to Dummy authenticator. Will re-enable GitHub authentication
Maybe could use repo slug as part of named server parameter so authenticated binderhub users can launch as many repositories as they want, like with unauthenticated BinderHub?
BinderHub tech: GPUs, persistence
How to spread burden more
2023-06-20#
NFDI feedback#
Notes#
Related issue: jupyterhub/team-compass#644
Includes presentation with some technical implementation ideas
Looking for experiences with other big consortia
Experiences with setting up governances
And emotional support :D
10 consortia all got funded for something-related to JupyterHub
numerical solutions
biology
National infrastructure, there should be synergy between the domains
Base services needed by everyone, e.g., authentication, low-level hosting
Jupyter-NFDI
10 different institutions, need a synergetic agenda for that
Federation between different JupyterHubs
Some on k8s, some on bare metal (BatchSpawner)
Develop a backend spawner to link these together with web API (?)
Difficult problem (knowledge)
Participants have first-time contact with GitHub (eg)
Judging what kind of code is reliable
Time limit
Arnim would like to see:
governance structure that can grow and get more stable
multiple hubs come together on a tech stack ‘approved’ by the jupyter community
Not go for spawner with multiple targets, but rather developers access to shared resources (?), e.g. one central Hub
Issues:
Researchers need to run code near data to avoid egress cost / data access issues, so a central hub must be able to launch notebook servers near data (multicluster spawner, RemoteSpawner, etc.)
GitLab/GitHub container registry pushing#
Notes#
Improve current logic to allow people to push built image to the registry associated with the repo
At the moment, it is a static string referring to a specific container registry
Dynamic URL related to the repo
Need user authentication in order to extract information on what repos/registries users can push back to
Why is BinderHub involved at all?
Use repo2docker in the native CI provider
Can trigger on push events
It’s faster, and will be properly authenticated
Creating a hook to get the image slug, versus a fixed string, is a reasonable upstream extension
Small number of usecases
Authentication#
Notes#
Two separate topics
Authenticators typically allow all users by default but some shouldn’t, can we provide a common config name suggestion and behavior to toggle this? - github issue
Existing JupyterHub users allowed or not depending on if
Authenticator.allowed_users
is configured or not - a surprise to avoid? - github issueConclusion:
Erik will describe config that is believed to resolve issues from a user perspective, and Min and Erik will discuss from there.
2023-05-16#
repo2jupyterlite#
Description: New tool for building jupyterlite from a repo, like repo2docker / binder
Participating: Wayne, Andreas, Raniere, Simon,
Assigned to room number: Breakout room 1 (Participants pane)
Notes#
Please keep notes of the discussion here
could have
ran into compatibility issues with micromamba
where do builds run? Could binder run repo2jupyterlite?
Maintenance#
Description: A room dedicated to maintenance tasks, such as issue triage and PR review.
Participating: Min, Erik, Mridul, Sign up here!
Assigned to room number: Main room (welcoming duties)
Notes#
Continue developing guidelines for how this room operates as per: jupyterhub/team-compass#617
Working on triage in the littlest jupyterhub
Releasing jupyterhub-traefik-proxy 1.0 jupyterhub/traefik-proxy#205
Make sure to publish
2023-04-18#
Notes#
binderhub service
allow build & push without launch
deploys BinderHub itself, not JupyterHub
looking at auth, storage, etc.
Question: can you do both authenticated build-only and not authenticated (existing default binderhub)? Exploring. Probably separate BinderHub instances at the moment, at least.
Use case: research library with authenticated internal use, but public URL for anonymous consumption. Probably two ‘totally separate instances’. Could be addressed with a landing page.
Questions about UI -
One option is that we could look at the CodeOcean environment builder interface. It’s built on top of a JupyterLab layer: https://help.codeocean.com/en/articles/1374271-configuring-your-computational-environment-an-overview
Question about storage - r2d usually builds with repo contents in home directory; persisting this dir would omit repo contents. repo2docker has REPO_DIR build arg to control this.
There is conflict between the experience of persistent user dat and updating the execution environment. mybinder.org tends to conflate both. Theen there’s the nbgitpuller approach, where compute environment and content are separated.
Question about what to launch. just JupyterLab, or also RStudio, etc.
Erik tried writing notes separately from this section by mistake, this were the notes:
Intro to binderhub-service
mridul: when to deploy binderhub-service vs binderhub, and does it make sense to use binderhub-service when it comes to having a authenticated binderhub side by side to a non-authenticated binderhub?
erik: I’m not sure, but I believe that binderhub-service efforts should be to help handling the situations when you have an existing jupyterhub where user’s storage is relevant as well
UI development?
erik: no UI developed yet, ideas are about making it modular to use a REST API from different webpages, such as jupyterhub’s spawner
raniere:
UX conversation:
mridul: git repo folder, user home folder - mridul sees the binderhub repo mostly as a description of the environment, and not having contents in it
arnim: is it only jupyterlab UI or RStudio etc? Are we locking into using only one UI?
erik: both options
chris: arnim what was the mode of usage for persistent binderhub?
tendency towards importing repositories
an idea about “projects”
it was dependent on user stories
Mridul: this is how it was mounted: gesiscss/persistent_binderhub
NFDI-Jupyter proposal (#2)
A proposal towards nationally available JupyterHub(s) in Germany
Bernd: NFDI (German Data Infrastructure initiative) receives significant funding, 90M EUR/y, spread out over 28 diciplines
A topic of relevance, with sections, involving a research software engineering group
Many independently deployed JupyterHub across german research institutes. A goal to consolidate technical and human resources.
A supercomputing centre in germany has significant experience on deploying JupyterHubs.
CH: There’s a similar project to allow spawning user sessions in multiple clusters. It’s still a prototype, hasn’t been developed in a bit but might be of interest: yuvipanda/jupyterhub-multicluster-kubespawner
Two other use-cases where this is often needed
one university with many hubs representing different classes.
a research community with data split across multiple cloud providers, that wish to move users into one or the other
Presentation:
Current BackendSpawner implementation: kreuzert/jupyterhub-backendspawner
Currently used backend “manager” implementations (using Django Rest Framework):
UNICORE Manager: https://gitlab.jsc.fz-juelich.de/jupyterjsc/k8s/images/drf-unicoremgr
Kubernetes Manager: https://gitlab.jsc.fz-juelich.de/jupyterjsc/k8s/images/drf-k8smgr
Jonathan: BackendSpawner makes sense to focus on to become a open-source component
Follow-ups:
@choldgraf is happy to join this effort
Min:
putting the spawners outside the jupyterhub process is something of relevance to do in order to support running jupyterhub with multiple replicas
code in: kreuzert/jupyterhub-backendspawner
Binder credits (#1)
we now have some funds at NumFOCUS (how much?)
need to do work to disentangle the user capacity from the routing service etc.
Do we want to use AWS for this?
Generally agree yes
SG started this but we hit a blocker on AWS container registry / building.
Simon has a prototype to make it easier to push to Amazon ECR.
So it should now be easier to deploy on AWS
2023-03-21#
Notes#
1/Maintenance
Eskild:
delegate tasks is hard
create the issues for others to work own is a maintainence work on its own
mark PRs issues that are relevant so that others can help out with these ones
project maintenance proposals
having the proposal beforehand for people to braninstorm before
where to put new proposals?
Post a maintainence proposal effort first inside a GitHub issue or Discourse for early feedback and then discuss it/propose it in the monthly maintaince meeting.
or in the meeting agenda?
We already have places to post about meetings, so perhaps the GitHub issue announcing the meeting is the place to put maintenance proposals
2/Persistent binderhub
James
introduces JupyterHub vs BinderHub, clarifies that the binderhub chart deploys binderhub the software, that starts building of images using repo2docker
binderhub was opinionated in removing auth and storage for jupyterhub, so if building on top of this to add it back we have quite a bit of additional complexity
the idea of
How to engage more contributors? Does the project need to be in a different space? What needs to happen?