JupyterHub and BinderHub Team Meeting - June#
Date: Thursday, 20th June 2019 at 7am UTC Videoconference link: https://calpoly.zoom.us/my/jupyter
The call will start with a quick round of updates from everyone present (you can pass if you want to), followed by bigger agenda items. There will be no feedback/questions on the 60s updates. Questions/discussion on the larger agenda items will be moderated by a speaker list (see end of this hackmd).
Welcome to the Team Meeting#
If you are joining the team video meeting, sign in below so we know who was here. Roll call:
name | institution | GitHub handle
Tim | Binder | @betatim
Sarah | Turing Institute | @sgibson91
John | Institut de Physique du Globe de Paris | @johnjarmitage
Arnim | GESIS | @arnim
David | KAUST | @davidrpugh
Erik | Sandvik | @consideratio
Kirstie | Turing Institute | @KirstieJane
Min | Simula | @minrk
Yuvi | UC Berkeley | @yuvipanda
60 second updates on things you have been up to, questions you have, or developments you think people should know about. Please add yourself, and if you do not have an update to share, you can pass.
Tim: jupyterhub/team-compass#172 Zenodo integration is live
Erik & Sunita: z2jh autoscaling cost simulator
David: just an introduction to what is being done at KAUST with binder and repo2docker jupyterhub, etc
Yuvi: Voila Gallery - https://voila-gallery.org/. Binder-like experience for pre-selected repositories ‘showing off’ some projects.
Reports and celebrations#
This is a place to make announcements (without a need for discussion). This is also a great place to give shout-outs to contributors! We’ll read through these at the beginning of the meeting.
add item here
Erik: @joshbode (Josh Bode, Australia) - Reports, identifies, and resolves an issue with z2jh + google authenticator!
Erik: @clkao (Chia-liang Kao, Taiwan) - Long lasting positive presence
Erik: @gsemet (Gaetan Semet, France) - Long lasting positive presence!
Chris: @betatim - For keeping an eye on the BinderHub N==2 setup and catching when bugs have popped up
Kirstie: :heartpulse: (I feel like I now need to add hearts to all the things, but I’ll stop at just this one because it feels like such a 24/7 task :scream_cat:)
Chris: @consideRatio - For consistently putting positive things in this agenda item :-)
Chris: we had a workshop last week with a number of people in the HPC world, here’s a place where some conversation happened, and a recap blog post / white-paper will likely come soon: https://discourse.jupyter.org/c/jupyterhub/hpc-meeting-2019
Kirstie: Binder in Elizabeth DuPre’s talk at the Organisation for Human Brain Mapping conference in our Open Science symposium :sparkles:
Features NeuroLibre (https://conp-pcno.github.io) which builds off Binder (should provide a Canada wide BinderHub…but I think it will be restricted to a curated subset of notebooks)
Neurolibre is kinda trying to do something similar to the Voila gallery above I think :smile:
This is what we’ll cover in the meeting (we have about 60m in total). We’ll copy the proposed agenda topics above here just before the meeting.
Add agenda item here [name, estimated time for conversation]
OVH cluster progress [Tim, 5min]
where are we: cluster has been running for a few weeks, some teething trouble with docker registry, otherwise works well.
where are we going: get to a stable running with about 25-35% of of traffic being sent there
what info do people need to get more informed about the current setup? There is a lot that needs documenting where to start?
[Min] here are the docs for what a deploy does. We should update that to include the changes for multiple clusters. It’s also possibly appropriate to have a new higher-lever deployment architecture doc in that repo. Also document how we route traffic to clusters, especially since we are discussing changing this.
[Sarah] Can we document the process of joining the Federation? I think there was some discussion on what you have to do in terms of the BinderHub (i.e. must be the same as mybinder.org, not customised), but do we mention how to connect the cluster to the mybinder.org-deploy repo so everything is deployed from the same place?
expose more health metrics about a hub so the load balancer can make smarter decision about where to send traffic
[Sarah] extend isthehubup-bot?!
Speaking on behalf of Project Binder [Tim]
should we have a system that allows people to speak at a conference/workshop/X on behalf of project binder?
currently there is no system,
this means it is tricky for any individual to make public statements about anything that isn’t a fact because a) the rest of the team might disagree or b) they might not be aware of that part of the project. (“Yes, mybinder.org will still be operating in 12months time!”, “No we aren’t seeking VC funding”, “Binder will add support for GPUs in Q3 of 2019”)
the audience doesn’t know if the person speaking is offering their personal opinion or the position of the project, they might hear conflicting things when they hear different speakers talk about the same thing
if someone is unhappy with what you say it is on you or a one on one disagreement, where as when you represent the team you can always say “this isn’t just my personal opinion, this is what the group thinks, sorry we disagree.”
Tim thinks we should create a system that is lightweight that allows the group to say “yes, X is going to event Y, they are representing us there, we support what they say, and for individuals to then ‘on location’ tell people that they are the official voice of the project”.
reasons why Tim likes it:
no one is caught off guard by things that were said by someone seen as representing the project
we can share slides, sentences, knowledge about how to introduce/explain/promote Binder
we can have a shared list of FAQs to which we have a standard answer
we create recognition/awareness for those going out promoting the project (right now you oculd be doing an amazing job promoting the project but we’d never know unless you tell us)
[name=ch] if anything else, we can also just have a guide for people speaking at conferences that says “these are the kinds of things you can/can’t say, here are some resources, etc”. In general I like the idea.
[name=Kirstie] I like this plan. I think you could go even lighterweight and “just” encourage people to talk about Binder at events. I feel like there are a lot of facts that could be stuck to quite easily without tripping into “check w the team” territory.
So effectively this is a “Yes, and…” response. Make it clear that one way to contribute to Binder is to give talks about it. And if you think you might be straying into “representing the team” territory then here are some guides/processes to capture that talk.
Talking “About Binder” should be encouraged, even/especially by none team members
[name=Kirstie] Do we have a place to share slides? Maybe in advance of speaking? Just to double check if folks want to? I suspect more people would give the talks if they a) knew that would be helpful, b) had some to crib from/reuse, c) had a little safety net for someone to double check before they go on stage :smile:
Several people support having a collection of facts, figures, images, drawings, etc that help evangelise for Binder
A wiki post on discourse with frequently asked questions about mybinder.org direction and plans
[name=Sarah] - Now deploy a BinderHub to Azure with one click! :tada: :radio_button:
[name=Kirstie] :rocket::star2: SO COOL!
How do I get Azure credits?
Can we re-use some of this to “deploy on GKE”?
unsure, step one would be to figure out how big the differences are - but please feel free to try!
Should we talk to Tania about getting a lot of Azure credits to run a part of the federation there? - Kenji Takeda (Tania’s boss) maybe?
maybe a long term thing, sow some seeds that can grow
Is there something similar for JupyterHub (or is this just tljh already)?
Might be TLJH, not sure?
A community workshop (hang out and code!)
Who would like to come?
Yuvi (I won’t even need another visa?!)
[name=Yuvi] Invite Georgiana!
What dates/times (roughly) work for people?
Tim likes end of August and partly at the weekend to help with not having to take off too much time from work
Sarah has booked a month off between Oct/Nov for travelling, going to StanCon around 20th August, RSE Con in Sept. Otherwise free!
Yuvi loves europe enough to go through another Visa process
Venue? Simula/Oslo has been suggested, Turing also has space for such events.
[name=Kirstie] The Turing Way project would love to host Binder/Jupyter. We don’t have dedicated funding for it, and the funding application at the Turing is quite long (the next call will probably not be until September). But if there are other funding pots available then we (I’m lightly volunteering Sarah here :wink:) can book the space and host. - Sarah is happy to organise this if we choose the Turing.
Learn to build bots!
Erik: Chat bots? What kind of bots?
Chris Hench has built a bot to automatically create PRs to the mybinder.org-deploy repo so it can be kept up to date semi-automatically
Next version of BinderHub API - jupyterhub/binderhub#358
One click deploys for things
Template repo for one-click builds on-premise
ACTION ITEM: open a discourse thread, @min - can you do this asks Erik
[name=John] - Using mybinder for a new reproducible journal
First I should introduce myself: I am a Paris based research scientist who works on geomorphology (rivers, erosion) and geodynamics (volcanism, plate tectonics). I like to solve PDEs on a computer and want to revolutionise publishing.
Lindsey Heagy suggested I join in to discuss a new journal idea: We wish to launch a new reproducable journal, based on Jupyter Notebooks (code agnostic), where the ideas, code, data are all open.
This idea comes out of a collection of researchers at Software Underground (https://softwareunderground.org/). We will focus on the subsurface and take submissions in the form of notebooks, where the articles follow a journal structure. The difference being that these articles are exicutable, reproducable, and with mybinder people could even interact with them and edit the code to see the effect of assumptions or differing anaylsis.
I hope to discuss how realistic this is and what support there might be for doing this.
An embrionic repository for this idea is here: softwareunderground/subsurface-journal
[name=CH] You should check out the neurolibre platform - might be of interest to you all for inspiration etc!
MIN: (summary by erik) it would be possible to build a docker image once and never rebuild it, that would make it stay functional for A LONG time in comparison to needing to rebuild it, which mybinder.org does from time to time.
ERIK: regarding keeping things operational, we should probably have a mechanism to consider if it remain functional.
Having a way to define if a repo is functional / runnable
See a discussion on jupyter/repo2docker#93
Documentation on how to prepare your repository https://repo2docker.readthedocs.io/
if this doesn’t answer your question or ideas for improving are super welcome!
eLife reproducible paper:
[name=Tim] What do people think of vanity URLs? For example
tim.vanity.mybinder.orgredirects to mybinder.org/v2/gh/betatim/something?urlPath=blahfoo.ipynb Someone asked about this for Bokeh * [name=Yuvi] -1 for me to put these under mybinder.org. Organizations can have those under their own domains if need be & set up web forwarding. * https://bndr.it - a URL shortener for mybinder.org, binder.pangeo.org and https://notebooks.gesis.org/binder/
Sunita/Erik: Simulation of a autoscaling z2jh setup in order to estimate costs by Sunita.
Ideas on input to the simulation?
Preliminary idea: number of humans wanting to use the hub hourly for a day/week
[name=Tim] Focus on speed, speed, speed, can we make it faster?
started with jupyter/repo2docker#707
for BinderHub: what UI tweaks can we make to create the impression of faster loads
min: profiling of launch times (how much is check-for-build vs pull vs launch vs jupyterhub api calls vs idle waiting) will be useful for directing optimizations. Might need new metrics to make this easier than digging through logs.
min: a pre-pull of popular images onto nodes would improve launch times of popular repos at the expense of new-node launch times. A more complex background-puller could do this continuously instead of at node start.
erik: it would be great to have a overview of the build/load time, how much time is spent on what etc, and what kind of views do the user have during this time?
tim summarized by erik: if we can make it look faster, or let the user experience time to pass faster, that is a way to improve as well
min: forwarding the progress events from jupyterhub to the user log in binder (see jupyterhub progress bar in 1.0) might better communicate that progress is being made.
min: A more incremental progress bar might help. Inspired by Apple’s Messages, kubespawner progress asymptotically approaches 90% with each event because the number of events is not known a priori. We could do something similar.
Erik: Pipfile on repo2docker
Request for help reviewing a PR: jupyter/repo2docker#649