JupyterHub and BinderHub Team Meeting - July

Meeting details

Date: Thursday, 18th July 2019 at 5pm UTC

Videoconference link: https://calpoly.zoom.us/my/jupyter

The call will start with a quick round of updates from everyone present (you can pass if you want to), followed by bigger agenda items. There will be no feedback/questions on the 60s updates. Questions/discussion on the larger agenda items will be moderated by a speaker list (see end of this hackmd).

Welcome to the Team Meeting

Hello!

If you are joining the team video meeting, sign in below so we know who was here. Roll call:

  • name | institution | GitHub handle

  • Min RK | Simula | @minrk

  • Sarah Gibson | The Alan Turing Institute | @sgibson91

  • Tim Head | Binder | @betatim

  • Georgiana Dolocan | Simula | @GeorgianaElena

  • Kirstie Whitaker | The Alan Turing Institute | @KirstieJane

  • David Pugh | King Abdullah University of Science and Technology (KAUST) | @davidrpugh

  • Chris Holdgraf | UC Berkeley | @choldgraf

  • CL Kao | infuseai | @clkao (late time & mute)

  • David Anthoff | UC Berkeley | @davidanthoff

  • Arnim Bleier | GESIS | @arnim

  • Yuvi Panda | UC Berkeley | @yuvipanda

  • Erik Sundell | Sandvik | @consideRatio

Quick updates

60 second updates on things you have been up to, questions you have, or developments you think people should know about. Please add yourself, and if you do not have an update to share, you can pass.

  • [name=David] I am working on a small side project that combines a Dockerfile, an environment.yml file, and docker-compose to launch JupyterLab for development, but I cannot figure out how to “properly” activate a conda environment inside a Docker container. I think that repo2docker must have a solution to this already, but I am not sure where to look in the code base; IIRC there was some discussion in the issue tracker about this, but I cannot find it ATM. Hoping someone can point me in the right direction.

    • repo2docker uses an ENTRYPOINT that points at a login shell so that all environment variables are set and all initialisation happens.

    • Link to PR XXX (coming soon)

  • [name=Sarah] I’ve banded together a small group of volunteers at the Turing to work on private repo access for the Turing’s BinderHub by saving a token from a GitHub login and passing it to BinderHub to perform the cloning. This seems like the “correct” way to provide repo access, since the security is handled by GitHub and it knows which repos users do and do not have access to. Hopefully, this will result in a PR with more docs and maybe even an update to the code base :crossed_fingers: Our progress can be tracked in this hackmd. Thanks to everyone who has given comments so far! :sparkles:

    • Tim says: It will be great to have this!

    • @athornton at the Space Telescope(?) runs a JupyterHub which passes on tokens, worth asking what their setup is

Reports and celebrations

This is a place to make announcements (without a need for discussion). This is also a great place to give shout-outs to contributors! We’ll read through these at the beginning of the meeting.

  • [name=Kirstie] Gave the opening keynote at PyData London on Saturday and it included a bunch of slides on Binder and how great it is: https://doi.org/10.5281/zenodo.3333760 Thank you for the amazing resource and hard work :sparkling_heart:

    • [name=Kirstie] Also just spent the afternoon working with Bastian Greshake Tzovaras from Open Humans and we’re loving the JupyterHub that they use. So some more love to the team from the citizen science folks around the world :earth_africa:

  • [name=Tim] The Henchbot! https://blog.jupyter.org/automating-mybinder-org-dependency-upgrades-in-10-steps-bb5e38542059

  • [name=David] Had my first PR merged into repo2docker! I worked on upgrading conda from 4.6 to 4.7.

  • [name=Chris] There’s a Binder federation now! https://blog.jupyter.org/the-international-binder-federation-4f6235c1537e

  • [name=Chris] BinderHub works with Zenodo now! https://blog.jupyter.org/binder-with-zenodo-af68ed6648a6

Agenda items

This is what we’ll cover in the meeting (we have about 60m in total). We’ll copy the proposed agenda topics above here just before the meeting.

  • Add agenda item here [name, estimated time for conversation]

  • In-person workshop: any inputs needed? Please note at the link, by the end of the month, whether you are coming.

  • Julia packaging [Tim]

    • asked David Anthoff if he could join our call for a “crash course on Julia”

    • Q&A style

    • repo2docker has some long conversations in its issues about packaging things for Julia. As we don’t have a huge amount of expertise on the team in that area, we invited someone who has done work on both r2d and Julia

    • please post your questions:

      • is Julia a compiled language like C, an interpreted language like Python, or neither?

        • Julia is just-in-time (JIT) compiled: a function is compiled the first time you run it

        • this means running a function the second time is super fast, but the first time it takes a while because it has to be compiled

        • making the “first run” faster is something the core Julia team is working on but it is a hard problem

        • repo2docker pre-compiles (generates an intermediate representation of) all Julia packages that are installed. This gives a small speed-up, but the code still needs to be JIT-compiled

        • Note: an issue discussing some of this is here: https://github.com/jupyter/repo2docker/issues/686

        • when you start up Julia you can specify a SysImage, which is a shared library (.dll or .so). If a function you want to call is in the SysImage, it does not need JIT’ing, so it is super fast.

        • all of core Julia is in the SysImage

        • you can create your own SysImage and include code from packages of your choice

        • this is great (fast first runs!) but can be brittle :-/ and sometimes doesn’t work

      • why does first using (as an example) the plotting libraries take so long? What is happening while we wait? Is this process different in the Jupyter kernel than in a normal Julia session?

      • pre-compiling: yes, no, maybe, what are the options? Seems like there was a lively debate on GitHub about this

      • what makes building a custom SysImage so brittle?

        • creation of the SysImage relies on running the tests of a package and then grabbing the compiled versions of the functions that were executed

      • What does “brittle” mean in this case? :-)

        • for some packages it doesn’t work

      • Is the “SysImage” pattern the future of how this will work, and does it just need some dev attention and time? Or is there a chance something totally different will come about?

      • Is there a place that we can link to from the repo2docker docs where discussions about this are happening?

      • How is precompilation handled on www.juliabox.com/?

        • JB is closed source, so we aren’t super sure

        • Someone from JB commented that they don’t make a custom SysImage

      • Not a Q, but: if you ever wanna write a blog post about how nice reproducibility is in Julia w/ repo2docker etc, please do reach out and we should put it on the jupyter blog :-)

      • Ideas for the way forward:

        • encourage people to use postBuild to make a custom SysImage if they are brave

          • find out from David where to find an example of this

          • create an example in github.com/binder-examples

        • in the future, custom SysImage building tools will get more stable

  • OVH is up and running but needs debugging effort [Tim]

    • sometimes launch success rate is very low

      • Erik - idea seed: store build success for repo commits and compare between OVH / GKE or similar. It is a bad sign of course if things can build on one but not on the other.

    • the federation proxy doesn’t detect this, how to improve?
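
Erik’s idea seed above could look roughly like the following. This is only a minimal sketch, assuming build outcomes are recorded somewhere as (cluster, repo, commit, success) tuples; the record shape, the function name, and the sample data are all hypothetical and not part of BinderHub.

```python
from collections import defaultdict


def find_build_discrepancies(records):
    """Return {(repo, commit): {cluster: success}} for every commit whose
    build outcome differs between clusters."""
    outcomes = defaultdict(dict)
    for cluster, repo, commit, success in records:
        outcomes[(repo, commit)][cluster] = success
    # Keep only commits where the clusters disagree with each other
    return {
        key: by_cluster
        for key, by_cluster in outcomes.items()
        if len(set(by_cluster.values())) > 1
    }


# Hypothetical sample data, for illustration only
records = [
    ("gke", "org/repo-a", "abc123", True),
    ("ovh", "org/repo-a", "abc123", False),
    ("gke", "org/repo-b", "def456", True),
    ("ovh", "org/repo-b", "def456", True),
]
discrepancies = find_build_discrepancies(records)
```

A commit that builds on one cluster but not on the other shows up in `discrepancies`, which is the bad sign the idea is meant to surface.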

  • Emit events about user activity in a structured fashion from JupyterHub [name=Yuvi]

    • Discrete, JSON-formatted, schema-verified events with arbitrary data

    • Jupyter-wide design document: https://github.com/jupyter/telemetry/blob/master/proposal/design.md

    • JupyterHub events: https://github.com/jupyterhub/jupyterhub/pull/2542

      • Currently for server starts & stops

      • More to be added
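
To make “discrete, JSON-formatted, schema-verified events” concrete, here is a minimal sketch of emitting one server start event. The schema name, the field names, and the `make_event` helper are hypothetical and are not taken from the linked PR; a real implementation would more likely validate against a JSON Schema, while this sketch uses a hand-written check to stay dependency-free.

```python
import json
from datetime import datetime, timezone

# Hypothetical schema: required field name -> expected Python type
SERVER_ACTION_SCHEMA = {
    "action": str,      # e.g. "start" or "stop"
    "username": str,
    "servername": str,
}


def make_event(schema_name, schema, data):
    """Check `data` against `schema`, then serialise it as one JSON line."""
    for field, expected_type in schema.items():
        if field not in data:
            raise ValueError(f"missing required field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"field {field!r} must be {expected_type.__name__}")
    record = {
        "schema": schema_name,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        **data,  # extra keys are allowed, so events can carry arbitrary data
    }
    return json.dumps(record, sort_keys=True)


line = make_event(
    "hub.example/server-action",  # hypothetical schema identifier
    SERVER_ACTION_SCHEMA,
    {"action": "start", "username": "alice", "servername": ""},
)
```

Each event is a single self-describing JSON line, so a downstream consumer can route on the `schema` field without knowing about JupyterHub internals.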

  • Discuss CZI funding proposal (if time)

    • The deadline for this is in two weeks

    • Will need somebody to push on a proposal. I can write it if we think it’s a good idea.

    • Can ask for 50K to 250K in funding.

    • Most well-thought-out idea is the “binder fellow”: https://github.com/jupyterhub/team-compass/issues/165

    • Is this a good idea? Worth pushing on now? Or should we wait until the next round?

      • Kirstie: I think it’s really important to apply - even if you’re not successful you’ll have got a plan together that someone else might want to pay for.