2023 Collaboration Cafe Notes Archive#

2023-11-21#

User-initiated sharing#

  • Description: Min’s still working on user-initiated sharing in jupyterhub/jupyterhub#4594. Lots of useful feedback last time, but ran into some new snags and would love to hash them out if folks are interested.

mybinder.org AWS deployment#

Notes#

  • Min will commit a terraform.lock for the GCP deployment so that it’s reproducible

  • Clean up config

  • Add to federation, send a small amount of traffic e.g. 1%, then ramp up if not problems

  • Blog posts: Steve from Curvenote will write one, Simon will write a separate technical post

Docs Working Group#

A new Jupyter Docs Working Group is in the works, we need your comments and feedback!

Notes#

Misc releases#

  • oauthenticator release will be made soon

  • kubespawner release will be made soon

  • multiauthenticator

  • batchspawner release needed for jupyterhub compatibility

2023-10-17#

User-initiated sharing#

  • Description: Implementing user-intiated sharing, there are some things to work out in the design. Min has done some partial implementations, but switched to documenting how it should work before finishing any implementation. Draft PR here.

Notes#

  • don’t get too complex for first draft

  • invite coes may solve discoverability

  • limit user:user grant to read access for read:user:name (separate share-with scope later, if needed)

  • make it opt-in

Why should HPC folk care about Binder/be interested in it?#

  • Description: Sarah has been asked to present on Binder to a HPC workshop. I can give the “reproducibility with Binder” bit, but I’m looking for angles to make this interesting for the HPC audience. I think there’s a “Cloud vs HPC” culture and a “sysadmin won’t let me run docker on the cluster” pushback to navigate. So I’m thinking about how to frame this as “so this exact tech stack doesn’t fit your needs, but here’s some lessons learned”. Also, I think the workshop is for a tool that helps with reproducible workflows on HPC, so I don’t think they’ll actually use Binder/repo2docker at all (though I don’t know much about the tool) https://hpc.guix.info/events/2023/workshop/ Drafting abstract and talk outline: https://docs.google.com/document/d/10LnhlWXi_ZSNHUphT5-CiSlSb7tssT5_uMh9bRX4Mo0/edit?usp=sharing

Notes#

JupyterHub-Contrib proposal draft#

Notes#

  • S: How do we possibly use template repo to share cultural project expectations?

    • Min: This is in the team-compass stuff, need more things written there than we have now

Possibly a template for the jupyterhub org but a cookie-cutter for the contrib org?

@jupyterhub/binderhub-client npm package#

  • Description: Yuvi is trying to clean up the frontend JS of binderhub, and part of that is maintaining a JS package that can be used by various clients that need to talk to the binderhub API. It’s at jupyterhub/binderhub and has seen a lot of work recently. No specific ask, just a report that this is happening

2023-09-19#

AWS/Curvenote MyBinder federation member#

Notes#

Lowest level of maintenance for JupyterHub org project’s#

  • Description: In github.com/jupyterhub has a lot of projects, if possible I’d (@consideRatio) love us to manage to ensure a some basic level of maintenance in all of the projects - or, deprecate/archive them etc.

Notes#

  • We have lots of repos, concerns about level of support each repo gets.

  • Would like some clear expectations about what level of maintenance we are promising for being an “official jupyterhub repo”

  • Sarah is in touch with an experienced leader, Sumana, might be available to for discussion on open source leadership. Sarah is meeting next week.

  • Clarify expectations of what level of support comes with being under “jupyterhub” name

  • There is a proposal for jupyterhub-contrib which doesn’t indicate support in the same way

    • Could have a low bar top accepting repositories

  • Being included in a distribution (z2jh, tljh) also raises the bar for expectations

  • Clear expectations is a good thing, and can be contributing to ecosystem in general

  • A while ago we had a discussion about how git ownership/maintainership confers status, and we wanted to move away from that to include people who do community work and other tasks not directly related to code

Drafting a new Roadmap#

  • Description: Roadmaps are useful tools to signpost to many audiences about the direction of the project (to newcomers where work is happening and they could be involved, to users about what to expect in future releases, to contributors where to focus their efforts). Ours is a little old now so the proposal is to begin drafting a new one.

  • Proposed background reading: https://docs.oscollective.org/guides/marketing-publicity-roadmaps-and-comms

Below is a suggested template for organising a roadmap - it’s not the sole correct way to think about this! It’s just here as a starting point :)

Let’s think about the release of the z2jh helm chart, and any packages that could be included, as our “unit” here.

#### The next release...
- **Theme:**
- **Theme:**

#### The release after that...
- **Theme:** Security? Have we had any feedback from the security bounty scheme that we could act on?
- **Theme:** Metrics? https://blog.jupyter.org/accurately-counting-daily-weekly-monthly-active-users-on-jupyterhub-6fbec6c6ce2f was a good start. Can we build within the JupyterHub ecosystem that doesn't rely on third-party software such as grafana/kube-metric/prometheus? Can we build something into the Admin panel for better discoverability?
- **Theme:**

#### Later on...
- **Theme:**
- **Theme:**

Notes#

  • Question: what level to roadmap? Org-level? Per-repo? How many repos get one?

  • Org-wide roadmap item could be e.g. triage, organizational maintenance, not tied to releases

  • Roadmap might include things we want to do, but might not get to

  • Stating long-term things we aren’t working on is still useful, and can motivate contributors or funding

  • Separate roadmaps for aspirational (funding), practical (what we have capacity to do short-term)

  • Roadmap - long term items:

    • supporting multiple replicas of jupyterhub

    • moving away from CHP

    • repo2docker reproducibility

  • Roadmap - short term item nominations:

    • establish maintenance expectations on projects in jupyterhub gh org

    • reporting/metrics (jupyterhub-grafana)

    • e.g. “focus on releasing z2jh 4”

  • Org-level, at least, maybe tying to releases is not a good fit. Instead more time-based “short-term/long-term”

  • Roadmaps don’t let us tell volunteer contributors what to spend their time on (but they also kind of do!)

2023-07-18#

Project Pythia#

  • Project Pythia

    • Kevin Tyle

    • Funded to work with public cloud

    • All notebooks developed should work with BinderHub

    • https://cookbooks.projectpythia.org/

    • Pangeo encourages use of cloud stored datasets

    • Educational outreach of Pangeo

    • Soliciting community contributions for “cookbooks”, particular use-cases for workflows

      • Jupyter notebooks published by GitHub actions using JupyterBook

      • Want people to have ability to interact with notebooks- leveraging BinderHub

    • Hosted on JetStream2

      • How can they make instance more performant?

      • Kubernetes recipe from SPSC, based on zero-to-binderhub

      • 4 workers, 60GB RAM

      • Anticipate making larger, adding GPUs, b ut fairly lightweight for now

      • Each user: maybe 2 GB RAM?

    • Binder open to everyone, but will be locked down soon, require authentication

    • Tried GitHub auth, but at Hackathon found users limited to just one session, so switched to Dummy authenticator. Will re-enable GitHub authentication

    • Maybe could use repo slug as part of named server parameter so authenticated binderhub users can launch as many repositories as they want, like with unauthenticated BinderHub?

  • BinderHub tech: GPUs, persistence

    • How to spread burden more

Jupyter Social Media Consultation#

2023-06-20#

NFDI feedback#

Notes#

  • Related issue: jupyterhub/team-compass#644

    • Includes presentation with some technical implementation ideas

  • Looking for experiences with other big consortia

  • Experiences with setting up governances

  • And emotional support :D

  • 10 consortia all got funded for something-related to JupyterHub

    • numerical solutions

    • biology

  • National infrastructure, there should be synergy between the domains

    • Base services needed by everyone, e.g., authentication, low-level hosting

  • Jupyter-NFDI

    • 10 different institutions, need a synergetic agenda for that

  • Federation between different JupyterHubs

    • Some on k8s, some on bare metal (BatchSpawner)

    • Develop a backend spawner to link these together with web API (?)

  • Difficult problem (knowledge)

    • Participants have first-time contact with GitHub (eg)

    • Judging what kind of code is reliable

    • Time limit

  • Arnim would like to see:

    • governance structure that can grow and get more stable

    • multiple hubs come together on a tech stack ‘approved’ by the jupyter community

    • Not go for spawner with multiple targets, but rather developers access to shared resources (?), e.g. one central Hub

    • Issues:

      • Researchers need to run code near data to avoid egress cost / data access issues, so a central hub must be able to launch notebook servers near data (multicluster spawner, RemoteSpawner, etc.)

GitLab/GitHub container registry pushing#

Notes#

  • Improve current logic to allow people to push built image to the registry associated with the repo

  • At the moment, it is a static string referring to a specific container registry

  • Dynamic URL related to the repo

  • Need user authentication in order to extract information on what repos/registries users can push back to

  • Why is BinderHub involved at all?

    • Use repo2docker in the native CI provider

    • Can trigger on push events

    • It’s faster, and will be properly authenticated

  • Creating a hook to get the image slug, versus a fixed string, is a reasonable upstream extension

  • Small number of usecases

Authentication#

Notes#

Two separate topics

  • Authenticators typically allow all users by default but some shouldn’t, can we provide a common config name suggestion and behavior to toggle this? - github issue

  • Existing JupyterHub users allowed or not depending on if Authenticator.allowed_users is configured or not - a surprise to avoid? - github issue

  • Conclusion:

    • Erik will describe config that is believed to resolve issues from a user perspective, and Min and Erik will discuss from there.

2023-05-16#

repo2jupyterlite#

  • Description: New tool for building jupyterlite from a repo, like repo2docker / binder

  • Participating: Wayne, Andreas, Raniere, Simon,

  • Assigned to room number: Breakout room 1 (Participants pane)

Notes#

  • Please keep notes of the discussion here

  • could have

  • ran into compatibility issues with micromamba

  • where do builds run? Could binder run repo2jupyterlite?

Maintenance#

  • Description: A room dedicated to maintenance tasks, such as issue triage and PR review.

  • Participating: Min, Erik, Mridul, Sign up here!

  • Assigned to room number: Main room (welcoming duties)

Notes#

2023-04-18#

Notes#

  • binderhub service

    • 2i2c-org/binderhub-service

    • allow build & push without launch

    • deploys BinderHub itself, not JupyterHub

    • looking at auth, storage, etc.

    • Question: can you do both authenticated build-only and not authenticated (existing default binderhub)? Exploring. Probably separate BinderHub instances at the moment, at least.

    • Use case: research library with authenticated internal use, but public URL for anonymous consumption. Probably two ‘totally separate instances’. Could be addressed with a landing page.

    • Questions about UI -

    • Question about storage - r2d usually builds with repo contents in home directory; persisting this dir would omit repo contents. repo2docker has REPO_DIR build arg to control this.

    • There is conflict between the experience of persistent user dat and updating the execution environment. mybinder.org tends to conflate both. Theen there’s the nbgitpuller approach, where compute environment and content are separated.

    • Question about what to launch. just JupyterLab, or also RStudio, etc.

    • Erik tried writing notes separately from this section by mistake, this were the notes:

      • Intro to binderhub-service

      • mridul: when to deploy binderhub-service vs binderhub, and does it make sense to use binderhub-service when it comes to having a authenticated binderhub side by side to a non-authenticated binderhub?

      • erik: I’m not sure, but I believe that binderhub-service efforts should be to help handling the situations when you have an existing jupyterhub where user’s storage is relevant as well

      • UI development?

        • erik: no UI developed yet, ideas are about making it modular to use a REST API from different webpages, such as jupyterhub’s spawner

        • raniere:

      • UX conversation:

        • mridul: git repo folder, user home folder - mridul sees the binderhub repo mostly as a description of the environment, and not having contents in it

        • arnim: is it only jupyterlab UI or RStudio etc? Are we locking into using only one UI?

          • erik: both options

        • chris: arnim what was the mode of usage for persistent binderhub?

          • tendency towards importing repositories

          • an idea about “projects”

          • it was dependent on user stories

          • Mridul: this is how it was mounted: gesiscss/persistent_binderhub

  • NFDI-Jupyter proposal (#2)

    • A proposal towards nationally available JupyterHub(s) in Germany

    • Bernd: NFDI (German Data Infrastructure initiative) receives significant funding, 90M EUR/y, spread out over 28 diciplines

      • A topic of relevance, with sections, involving a research software engineering group

      • Many independently deployed JupyterHub across german research institutes. A goal to consolidate technical and human resources.

      • A supercomputing centre in germany has significant experience on deploying JupyterHubs.

      • CH: There’s a similar project to allow spawning user sessions in multiple clusters. It’s still a prototype, hasn’t been developed in a bit but might be of interest: yuvipanda/jupyterhub-multicluster-kubespawner

        • Two other use-cases where this is often needed

          • one university with many hubs representing different classes.

          • a research community with data split across multiple cloud providers, that wish to move users into one or the other

    • Presentation:

    • Current BackendSpawner implementation: kreuzert/jupyterhub-backendspawner

    • Currently used backend “manager” implementations (using Django Rest Framework):

    • Jonathan: BackendSpawner makes sense to focus on to become a open-source component

    • Follow-ups:

      • @choldgraf is happy to join this effort

    • Min:

      • putting the spawners outside the jupyterhub process is something of relevance to do in order to support running jupyterhub with multiple replicas

      • code in: kreuzert/jupyterhub-backendspawner

  • Binder credits (#1)

    • we now have some funds at NumFOCUS (how much?)

    • need to do work to disentangle the user capacity from the routing service etc.

    • Do we want to use AWS for this?

      • Generally agree yes

      • SG started this but we hit a blocker on AWS container registry / building.

      • Simon has a prototype to make it easier to push to Amazon ECR.

      • So it should now be easier to deploy on AWS

2023-03-21#

Notes#

  • 1/Maintenance

    • Eskild:

      • delegate tasks is hard

      • create the issues for others to work own is a maintainence work on its own

      • mark PRs issues that are relevant so that others can help out with these ones

    • project maintenance proposals

    • having the proposal beforehand for people to braninstorm before

    • where to put new proposals?

    • Post a maintainence proposal effort first inside a GitHub issue or Discourse for early feedback and then discuss it/propose it in the monthly maintaince meeting.

      • or in the meeting agenda?

      • We already have places to post about meetings, so perhaps the GitHub issue announcing the meeting is the place to put maintenance proposals

  • 2/Persistent binderhub

    • James

      • introduces JupyterHub vs BinderHub, clarifies that the binderhub chart deploys binderhub the software, that starts building of images using repo2docker

      • binderhub was opinionated in removing auth and storage for jupyterhub, so if building on top of this to add it back we have quite a bit of additional complexity

      • the idea of

    • How to engage more contributors? Does the project need to be in a different space? What needs to happen?