GitHub is (probably) used to house the vast majority of the world’s FOSS (free and open-source software) code. But GitHub itself isn’t open source, is run by a for-profit company (Microsoft), and uses open-source software, including code under copyleft licenses, to train its paid AI assistant, Copilot. Is this an issue?
GitHub being run by Microsoft is definitely a risk. However, for us (NL eScience Center engineers) the benefits currently outweigh the drawbacks.
On GitHub we:
have GitHub Actions available for free, which allows a large amount of automation (tests, CI, CD: PyPI, containers).
can host container images
have GitHub Pages to host (static) webpages
If you’re at an institute with a private GitLab instance (common at Dutch universities) that has runners available, that is a good alternative. At the eScience Center we’re too small to run our own GitLab instance or provide runners.
We do generally publish the full source code of releases to Zenodo, so the repository is archived and can be migrated elsewhere.
As for the AI training: it seems that AI companies don’t care about copyright or licenses in general. This should probably just be resolved in court.
We also use GitHub for all of these things. Until there is an alternative that provides these services, I agree with @BSchilperoort that the benefits outweigh the negatives, unfortunately.
I also think it’s important to note that these features all help a lot in writing high-quality software and let us do so more efficiently. Our time budget on projects (or in general) is limited, so being able to write and support more (FOS) software with the same amount of engineer time is very valuable.
Also, our default license at the NL eScience Center is Apache 2.0, which allows commercial use. So it’s fully within Microsoft’s (or anyone else’s) legal rights to use our software to train an LLM.
Great points both. I think subconsciously I had also done the risk-benefit analysis and concluded that GitHub was too useful to give up, hence most of my code is over there. I think it’s worth keeping an eye on places like Codeberg though, as they seem to be rapidly maturing. E.g. I notice you can host static sites, but CI/Actions looks quite faffy.
You can also reduce your reliance on GitHub in some ways while still taking advantage of what it has to offer. If you host the docs on Read the Docs, you’re less reliant on GitHub Pages, and if you publish releases to Zenodo you can always migrate the repo as needed (and you get a DOI for your code!).
@wschwanghart and I had some conversations about code hosting when I came on board TopoToolbox, and we came to more or less the same conclusions. The network effects are also really strong: a lot of potential contributors are already on GitHub, and you don’t have to convince them to sign up for another service or learn a new workflow.
The University of Potsdam does have a GitLab instance, but I believe you can only access it if you are affiliated with a German university, which rules that out for any kind of international collaboration.
A good thing about git is that it is naturally distributed and doesn’t depend on a single hosting provider. If Microsoft suddenly didn’t want us to use GitHub anymore it would take like 10 minutes to move our code to another hosting provider.
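Moving the git history itself really is quick. A minimal sketch, assuming the destination repository already exists and is empty (both URLs are placeholders, not real projects), wraps the two standard git commands for a full mirror in Python; it only prints the commands unless `dry_run=False` is passed:

```python
"""Sketch: mirror a repository from one host to another with plain git.

The URLs in the __main__ block are placeholders, not real repositories.
"""
import subprocess


def mirror_commands(src_url: str, dst_url: str, workdir: str) -> list[list[str]]:
    """Return the git commands that copy every branch and tag from src to dst."""
    return [
        # --mirror clones all refs (branches, tags, notes) into a bare repository.
        ["git", "clone", "--mirror", src_url, workdir],
        # --mirror pushes all local refs and deletes remote refs missing locally.
        ["git", "-C", workdir, "push", "--mirror", dst_url],
    ]


def migrate(src_url: str, dst_url: str, workdir: str, dry_run: bool = True) -> None:
    """Print each command, and execute it only when dry_run is False."""
    for cmd in mirror_commands(src_url, dst_url, workdir):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing git command.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    migrate(
        "https://github.com/example-org/example-repo.git",
        "https://codeberg.org/example-org/example-repo.git",
        "example-repo.git",
    )
```

Note that GitHub also exposes pull-request refs (`refs/pull/*`) to mirror clones, and the destination host may refuse them; pushing with `git push --all` and `git push --tags` instead is a common workaround. Issues, PRs and CI settings are not part of the git repository, so they do not move this way.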
Of course, we would lose all the conversation in our issues/pull requests and our CI configuration. I would like to make our CI setup depend less on GitHub Actions-specific tooling, for example by moving more of the setup into our build system.
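One way to move the setup out of the CI provider, sketched here as an assumption rather than anything the projects discussed here already use, is a small provider-neutral entry script that holds the actual steps, so each CI system only needs to call it. The file name `ci.py` and the tools `ruff`, `pytest` and `build` are hypothetical choices:

```python
"""Hypothetical ci.py: keep CI steps in the repository, not in provider YAML.

The tools listed (ruff, pytest, build) are assumptions; substitute your
project's own commands.
"""
import subprocess
import sys

# Each task maps to a list of commands, run in order.
TASKS: dict[str, list[list[str]]] = {
    "lint": [[sys.executable, "-m", "ruff", "check", "."]],
    "test": [[sys.executable, "-m", "pytest", "-q"]],
    "build": [[sys.executable, "-m", "build"]],
}


def run_task(name: str, dry_run: bool = False) -> list[list[str]]:
    """Run (or, with dry_run, only print) the commands for one task."""
    if name not in TASKS:
        raise SystemExit(f"unknown task {name!r}; choose from {sorted(TASKS)}")
    commands = TASKS[name]
    for cmd in commands:
        print("+", " ".join(cmd), flush=True)
        if not dry_run:
            # check=True makes a failing step fail the whole CI job.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python ci.py {" + "|".join(TASKS) + "}")
    else:
        run_task(sys.argv[1])
```

A GitHub Actions workflow would then reduce to a step like `run: python ci.py test`, and a Forgejo Actions or Woodpecker pipeline on Codeberg could run the same command, so switching providers mostly means rewriting a few lines of YAML.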
Do you all try to capture and store issue/PR discussions outside of GitHub in any way?
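If you do want a copy of the discussions, a minimal sketch using only the Python standard library and the public GitHub REST API could dump every issue and pull request, with its conversation comments, to Markdown files. The owner and repository names are passed on the command line; the output directory name and the optional `GITHUB_TOKEN` environment variable are assumptions of this sketch, and pull-request review comments (which live under a separate endpoint) are not fetched:

```python
"""Sketch: archive GitHub issue and PR conversations as Markdown files.

Set GITHUB_TOKEN to avoid the low unauthenticated rate limit. Only the main
conversation thread is saved; PR review comments are not included.
"""
import json
import os
import sys
import urllib.request
from pathlib import Path

API = "https://api.github.com"


def get_json(url: str):
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    token = os.environ.get("GITHUB_TOKEN")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def get_all(url: str) -> list:
    """Follow page-number pagination until an empty page is returned."""
    items, page = [], 1
    sep = "&" if "?" in url else "?"
    while True:
        batch = get_json(f"{url}{sep}per_page=100&page={page}")
        if not batch:
            return items
        items.extend(batch)
        page += 1


def to_markdown(issue: dict, comments: list[dict]) -> str:
    """Render an issue (or PR) and its comments as one Markdown document."""
    kind = "PR" if "pull_request" in issue else "Issue"
    lines = [f"# {kind} #{issue['number']}: {issue['title']}", ""]
    for post in [issue, *comments]:
        lines.append(f"**{post['user']['login']}** ({post['created_at']}):")
        lines.append("")
        lines.append(post.get("body") or "")  # body may be null
        lines.append("")
    return "\n".join(lines)


def archive(owner: str, repo: str, outdir: str = "issue-archive") -> None:
    out = Path(outdir)
    out.mkdir(exist_ok=True)
    # The issues endpoint returns pull requests too (they carry a "pull_request" key).
    for issue in get_all(f"{API}/repos/{owner}/{repo}/issues?state=all"):
        comments = get_all(issue["comments_url"])
        path = out / f"{issue['number']}.md"
        path.write_text(to_markdown(issue, comments), encoding="utf-8")


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: python archive_issues.py OWNER REPO")
    else:
        archive(sys.argv[1], sys.argv[2])
```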
At TU Delft it’s at least possible to create accounts for external contributors. However, it does add an extra barrier, as you need a contact person at the university, and you even need a login to be able to open an issue. Also, GitLab Enterprise apparently has the same vendor lock-in problem as GitHub (issues etc. are still proprietary code).
In what way do you see it becoming non-viable? Non-viable because MS will end up having to charge for it (and poor FOSS people won’t be able to afford it)? Or non-viable because MS will decide it’s not worth (profitable enough) maintaining? Or another reason? I don’t disagree, just curious what you see the main drivers as being.
Seeing the current state of the US, I would like to revisit this topic.
I’ve had another good look at Codeberg, and I think we could definitely move some of our projects over to it.
Codeberg has some free CI runners, although you will need to request access. And there are stricter time/resource limits than on GitHub. But for simple CI it should be OK.
Codeberg Pages sadly isn’t in a good state at the moment; hopefully this will improve soon. However, if you want to host docs, Read the Docs is still a good free option.
Codeberg does support uploading packages! By default, though, you’re limited to 1.5 GiB of resources.
So far I have mostly focused on hosting container images, as we would like them to be accessible long-term, hosted by an entity that is preferably based in the EU and not a big tech company.
Next time I need to create a new repository, I will put it on Codeberg. Once I have some experience, I can see whether my colleagues at the eScience Center can be convinced too.
Thanks for looking into this more. I think CI runners are one of the biggest barriers, so it will be interesting to hear about your experience with them. Most (all) of my CI workflows are “simple”, so maybe Codeberg is already suitable for me.
I do use GitHub Pages, though. I’ve been looking at Read the Docs for documentation (mainly for easier version-controlled docs), but I need a solution for other static websites (e.g. this one you might be familiar with: https://eurocsdmswork.shop/).
I suppose the more people make the jump, the more likely it is that things like Codeberg Pages become production-ready. I might give myself a policy of “all new projects should be on Codeberg unless there’s a really good reason not to”.