I thought it would be only a matter of time before someone posted something about Claude Code / <insert name of your agentic AI of choice here>, so here we go..!
We’ve just started a trial of Claude Code in UKCEH and wow, it’s scarily capable. I’ve given it model re-engineering tasks that would have taken days/weeks of my time and it’s done them - accurately, by asking all the right questions and running all the right tests - in a matter of minutes.
I’m not sure how to feel about this. It’s great for productivity, but I can’t help but feel somewhat redundant now, and if this is the new way of work (describing what you want to achieve to Claude rather than coding it yourself), I’m really going to miss actually writing code!
Unfortunately, I feel like I worry about LLM usage a lot these days. I have come to the conclusion that they are not for me (to use), for various more or less personal reasons, but the big question for me is how to address LLM-generated contributions to TopoToolbox.
We got one PR a couple of weeks ago that was largely created with Claude Code. It was a bit of a mess and took a lot of effort for the contributor and me to sort out. As far as I can tell, it implemented everything correctly, but it didn’t really capture our style so well. It had a lot of duplication and some patterns that I really want to minimize in libtopotoolbox (a ton of dynamic memory allocation, for example). It would have been a pain to maintain.
We eventually landed the PR, but the whole thing soured me (and, I think, our contributor) a little on the LLMs. Fortunately, Boris is super committed to TopoToolbox, already contributes a lot, and was pretty understanding about getting his code into a more reasonable shape. Still, I think it is probably not as good as it could have been, and now refactoring it is another entry on my long list of TODOs.
I don’t want to ban LLM-generated contributions outright. I really want us to be welcoming to all kinds of contributors, especially those who aren’t so comfortable with coding and who may reach for the LLM to help them do something scientifically interesting. Probably insisting on code standards is the right thing to do, regardless of the origin of the code, but it is already hard to get people to write tests.
That sounds like a bad experience, and I’m wondering if I’ve just been lucky so far in what I’ve experimented with (the key word being experimented - I haven’t rolled out Claude on anything too consequential).
Specifically for your example, I guess if you had coding standards/conventions that you could point Claude towards, that might have helped, but I’ve got very limited experience of how well it sticks to instructions like “don’t dynamically allocate memory”.
I think “the community” (not sure what I mean by that - anyone involved in scientific coding, I guess) needs to figure out how best to embrace, or not, agentic AI. I see a few groups already developing internal guidance on the use of AI for software engineering, so it will be interesting to see this emerge.
I’ve used it to generate the RemoteBMI JavaScript front end (which worked very well), and have seen the output of others using it. It can have its uses, but what I am most worried about is the sheer amount of code people produce with it, and how verbose and messy the code often is.
To be able to produce high quality code with the LLMs you already need to know how to write high quality code in that language. A novice user will not be able to identify anti-patterns, bad code structure in general, or other problems (incorrect logic/assumptions) in the code. This can lead to the code being fragile, producing unexpected results, and becoming unmaintainable in general.
And of course the LLMs are only able to reproduce code that’s already covered by their training data (e.g., Python REST APIs and JS front ends are ideal use cases). Trying to use them for Julia GPU stuff is useless, as they only hallucinate.
At least maintaining or fixing all this LLM-generated science code means good job security for me as an RSE.