GitHub is (probably) used to house the vast majority of the world’s FOSS* code. But GitHub itself isn’t open-source, is run by a for-profit company (Microsoft) and uses open-source software, including those with copyleft licenses, to train its paid-for AI Copilot. Is this an issue?
GitHub being run by Microsoft definitely is a risk. However, for us (NL eScience Center engineers) the benefits of using GitHub currently outweigh the drawbacks.
On GitHub we:
have GitHub Actions available for free, which allow large amounts of automation (tests, CI, CD: PyPI, containers).
can host container images
have GitHub Pages to host (static) webpages
If you’re at an institute where there is a private GitLab instance (common at Dutch universities) with available runners, that is a good alternative. At the eScience Center we’re too small to support our own GitLab instance or to have runners available.
We do generally publish the full source code of releases to Zenodo, so the repository is archived and can be migrated elsewhere.
As for the AI training: it seems that AI companies don’t care about copyright or licenses in general. This should probably just be resolved in court.
have GitHub Actions available for free, which allow large amounts of automation (tests, CI, CD: PyPI, containers).
can host container images
have GitHub Pages to host (static) webpages
We also use GitHub for all of these things. Until there is an alternative that provides these services, I agree with @BSchilperoort that the benefits outweigh the negatives, unfortunately.
I also think it’s important to note that these features all help a lot in writing high-quality software and help us do that more efficiently. Our time budget on projects (or in general) is limited, so being able to write and support more (FOS) software with the same amount of engineer time is very valuable.
Also, our default license at the NL eScience Center is Apache 2.0, which allows for commercial use. So it’s fully within the legal rights of MS, or anyone else, to use our software to train an LLM.
Great points both. I think subconsciously I had also done the risk-benefit analysis and concluded that GitHub was too useful to give up, hence most of my code is over there. I think it’s worth keeping an eye on places like Codeberg though, as they seem to be rapidly maturing. E.g. I notice you can host static sites, but CI/Actions looks quite faffy.
You can also reduce your reliance on GitHub in some ways, while still taking advantage of what they have to offer. If you host the docs on readthedocs, you’re not as reliant on gh pages, and if you publish releases to Zenodo you can always migrate the repo as needed (and you get a DOI for your code!).
@wschwanghart had some conversations when I came on board TopoToolbox about code hosting, and we came to more or less the same conclusions. The network effects are also really strong: a lot of potential contributors are already on GitHub, and you don’t have to convince them to sign up for another service or learn a new workflow.
The University of Potsdam does have a GitLab instance, but I believe you can only access it if you are affiliated with a German university, which rules that out for any kind of international collaboration.
A good thing about git is that it is naturally distributed and doesn’t depend on a single hosting provider. If Microsoft suddenly didn’t want us to use GitHub anymore it would take like 10 minutes to move our code to another hosting provider.
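The move described above can be sketched as three git commands: a mirror clone, a change of push URL, and a mirror push. This is a minimal sketch rather than a tested migration script; the function names, the default `workdir`, and the URLs in the usage note are hypothetical, and the new (empty) repository is assumed to exist already on the other host.

```python
import subprocess


def mirror_commands(old_url, new_url, workdir="repo-mirror.git"):
    """Return the git commands that copy all refs (branches, tags) to a new host.

    `workdir` is a hypothetical local directory for the bare mirror clone.
    """
    return [
        # Bare clone that fetches every ref, not just the default branch.
        ["git", "clone", "--mirror", old_url, workdir],
        # Keep fetching from the old host, but push to the new one.
        ["git", "-C", workdir, "remote", "set-url", "--push", "origin", new_url],
        # Push every ref in the mirror to the new host.
        ["git", "-C", workdir, "push", "--mirror", "origin"],
    ]


def migrate(old_url, new_url, workdir="repo-mirror.git", dry_run=True):
    """Print the commands (dry run) or execute them one by one."""
    commands = mirror_commands(old_url, new_url, workdir)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failing git command.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `migrate("https://github.com/example-org/example-repo.git", "https://codeberg.org/example-org/example-repo.git")` with the default `dry_run=True` only prints the commands, which makes it easy to review them before running anything. Only the git history moves this way; issues, pull request discussions and CI configuration are not part of the repository.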
Of course, we would lose all the conversation in our issues/pull requests and our CI configuration. I would like to make our CI setup depend less on GitHub Actions specific tooling, by moving more of the setup into our build system, for example.
Do you all try to capture and store issue/PR discussions outside of GitHub in any way?
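One possible way to keep a copy of issue and PR discussions is to export them through the GitHub REST API, whose `/repos/{owner}/{repo}/issues` endpoint returns both issues and pull requests (pull requests carry a `pull_request` key). The sketch below is an illustration, not something anyone in this thread has said they use: the function names and the selection of stored fields are assumptions, it makes unauthenticated requests (which GitHub rate-limits heavily), and it stores only the link to each comment thread, not the comments themselves.

```python
import json
import urllib.request

API = "https://api.github.com"


def fetch_json(url):
    """Fetch one page from the GitHub REST API (unauthenticated)."""
    req = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def export_issues(owner, repo, fetch=fetch_json, per_page=100):
    """Collect open and closed issues and PRs, page by page.

    `fetch` is injectable so the paging logic can be tried without network access.
    """
    issues = []
    page = 1
    while True:
        url = (
            f"{API}/repos/{owner}/{repo}/issues"
            f"?state=all&per_page={per_page}&page={page}"
        )
        batch = fetch(url)
        if not batch:
            break  # an empty page means everything has been read
        for item in batch:
            issues.append({
                "number": item["number"],
                "title": item["title"],
                "state": item["state"],
                "body": item.get("body"),
                # The issues endpoint also lists PRs; they have this extra key.
                "is_pull_request": "pull_request" in item,
                # Comments live behind a separate URL and are not fetched here.
                "comments_url": item.get("comments_url"),
            })
        page += 1
    return issues


def save(issues, path):
    """Write the exported issues to a JSON file, e.g. to archive with a release."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(issues, f, indent=2)
```

The resulting JSON file could be stored alongside a release archive, so that at least the text of the discussions survives a move to another host.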
At TU Delft it’s at least possible to create accounts for external contributors. However, it does add an extra barrier, as you need a contact person at the university. You even need to have a login to be able to open an issue. Also, GitLab Enterprise apparently has the same vendor lock-in problem that GitHub has (issues etc. are still proprietary code).
In what way do you see it becoming non-viable? Non-viable because MS will end up having to charge for it (and poor FOSS people won’t be able to afford it)? Or non-viable because MS will decide it’s not worth (profitable enough) maintaining? Or another reason? I don’t disagree, just curious what you see the main drivers as being.