ChatGPT Code Refactoring is Super-Hype, Far from Reality

I have spent a lot of time trying to get ChatGPT to refactor some simple, working Ruby on Rails lib files (classes).

On all but the simplest code snippets and shortest methods, ChatGPT fails miserably:

See, for example, what happened today when I asked ChatGPT to refactor some prototype code that works fine but has not yet been refactored:

Shared Refactoring Attempt with ChatGPT

ChatGPT Attempts to Refactor a Rails Library File (usda.rb) and Fails Repeatedly.

This is not abnormal. This is par for the course for ChatGPT: as a generative AI chatbot, it generates nonsense because it cannot grasp "the big picture" when rewriting code.

I'm growing weary of all the ChatGPT and generative AI hype about how AI will replace highly skilled computer scientists and programmers. This is simply marketing hype from businesses and people who do not actually develop production code.

As I have tried to explain to many people, generative AI is nothing but a text-completion engine based on a large language model, and as far as refactoring code goes, it is very primitive and seriously error-prone.

I would call it "form over substance", meaning ChatGPT uses a trained LLM to cobble together code snippets and syntax, but it does not "understand" what it is doing.

The AI world, businesses, and the media have "super-hyped" ChatGPT and generative AI, and the results will not be good for our industry as a whole.

Maybe we should post more experiences about how well ChatGPT can (or cannot) refactor code? Businesses need to be made aware of the serious pitfalls of relying on generative AI for code refactoring, in my view.

4 Likes

So we're not quite all redundant yet.

2 Likes

Just finished this YT video on OpenAI, AGI and Q* (one of many I have reviewed this week). Honestly, it seems like OpenAI is desperately grasping at straws for AGI.

This video helps explain / speculate about what OpenAI is doing....

In a nutshell....

Generative AI projects based on LLMs like ChatGPT are expensive autocompletion engines, which produce a lot of text based on the probabilities in the underlying models after extensive training.
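
To make the "expensive autocompletion engine" point concrete, here is a tiny, purely illustrative Ruby sketch (a toy of my own, not anything from OpenAI): a hand-written probability table stands in for the trained model, and "generating text" is just repeatedly picking a likely next token.

# Toy stand-in for a trained language model: next-token probabilities.
NEXT_TOKEN_PROBS = {
  "refactor" => { "the" => 0.6, "this" => 0.4 },
  "the"      => { "code" => 0.5, "method" => 0.3, "class" => 0.2 },
  "this"     => { "code" => 0.7, "method" => 0.3 },
  "code"     => { "." => 1.0 },
  "method"   => { "." => 1.0 },
  "class"    => { "." => 1.0 }
}

# Pick the next token at random, weighted by the table above.
def sample_next(token)
  r = rand
  cumulative = 0.0
  NEXT_TOKEN_PROBS.fetch(token).each do |candidate, probability|
    cumulative += probability
    return candidate if r <= cumulative
  end
  "."
end

# "Generate text": start from a prompt token and keep autocompleting.
def autocomplete(start, max_tokens: 10)
  tokens = [start]
  max_tokens.times do
    next_token = sample_next(tokens.last)
    tokens << next_token
    break if next_token == "."
  end
  tokens.join(" ")
end

puts autocomplete("refactor")   # e.g. "refactor the code ."

The real thing has billions of parameters instead of a six-entry hash, but the mechanism is still completion by probability, which is exactly why there is no "understanding" anywhere in the loop.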

So, a well-known problem is that LLMs like ChatGPT (3.5 in this case) cannot reliably perform simple math, because LLMs do not really "do math"; they autocomplete text based on very large and expensive language models.

To address this basic problem with LLMs and generative AI, OpenAI (according to the Q* leak mentioned in the video above) is reportedly testing / implementing a supervisory layer on top of the LLM: the LLM generates a lot of text, somewhat like a randomish, hallucination-prone text generator (the noise), and the OpenAI Q* supervisory layer processes all the generated gobbledygook and selects the best fit.
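
As a rough sketch of what such a "generate lots, then pick the best fit" layer might look like (my own illustration of the rumoured idea, not anything confirmed by OpenAI), in Ruby:

# Hypothetical best-of-N selection: `generate` stands in for the noisy LLM,
# `score` for the supervisory layer that grades each candidate.
def select_best_answer(prompt, n: 8, generate:, score:)
  candidates = Array.new(n) { generate.call(prompt) }        # the gobbledygook
  candidates.max_by { |answer| score.call(prompt, answer) }  # the chosen gobble
end

# Toy stand-ins, just to show the shape of the idea:
generate = ->(prompt) { "#{prompt} => draft answer #{rand(1000)}" }
score    = ->(_prompt, answer) { answer.length }   # pretend longer means better
puts select_best_answer("2 + 2 = ?", generate: generate, score: score)

The selection layer can only ever be as good as its scoring function; if every candidate is gobbledygook, it simply returns the best-scoring gobbledygook.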

So, it seems that OpenAI engineers have accepted that LLMs hallucinate (make up) gobbledygook, and they believe it is a step toward AGI to have LLMs generate gobbledygook and then select the best gobble.

Then, on top of this "select the best gobbledygook" model, OpenAI engineers have tried to convince us that they are the world's leader in AI and AGI because they are tuning a "select the best gobbledygook" model, and we are all supposed to believe that this model is the future of AGI and be afraid of an AGI doomsday, somehow.

However, in my mind, what we should be concerned about is the fact that there are ML engineers at OpenAI and elsewhere who believe that a step toward AGI is simply a "select the best gobbledygook" computing model. They refer to this as "thinking"; but in my view, it is not "thinking" at all (not even close), it is just selecting the most probable signal (the gobble) from LLM noise output (the gobbledygook).

This is not really "intelligent" in my view (does anyone actually think it is?), it's "desperation".

However, if Microsoft and OpenAI can convince the world that their selection algorithm, which picks a single gobble from myriad generated LLM noise replies (the gobbledygook), is AGI, then they can dominate the market.

Of course, all of this is speculation on my part at this point, but the recent Sam Altman soap-opera drama and the leak of the Q* "best gobbledygook selector" seem suspect to me.

More on this in another post later.

Gobble Gobble, You Turkeys :slight_smile:

Happy Thanksgiving!

3 Likes

I've read or watched several good articles and talks about AI-generated code, and it is dreadful. 75% of the code that LLMs are trained on contains security vulnerabilities (20% contains serious ones). So, by default, the code they produce also contains security vulnerabilities. The absurd thing is that you can ask for the code again, but without SQL injection this time please. And it will, but it'll still contain race conditions. And you can ask for it again but without race conditions, and it may well exclude one race condition but leave another. What is the point of having random code produced that you then have to spend ages analysing and fixing?
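
To show concretely the kind of thing being described, here is a minimal hypothetical Rails-style example (my own, assuming a simple User model, not output from any particular model run) of the SQL injection that generated code tends to contain, next to the parameterised form it should have used:

# The kind of lookup an LLM often emits: untrusted input interpolated
# straight into SQL, so params[:name] = "' OR '1'='1" matches every row.
def find_user_insecure(params)
  User.where("name = '#{params[:name]}'").first
end

# What it should look like: let ActiveRecord bind the value safely.
def find_user_safe(params)
  User.where(name: params[:name]).first
end

And, as described above, fixing that one issue says nothing about the race conditions or other flaws still lurking in the rest of the generated code.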

One of the first examples I saw included a line like:

username = "user"
password = "secret password"

Thus implying that it's OK to put passwords in the code. I wouldn't call that setting a good example. In another example that had a line like that (except that it was a key, not a password), it didn't take into account the fact that the API it was generating code for required the key to be a string of hexadecimal digits, not arbitrary text. The generated code should have included the use of a key derivation function and hex formatting, but it didn't. The result is that the API would see the key as a string of length zero (counting only the non-existent leading hex digits), so it was completely insecure. That wasn't the only flaw, but it was the most outrageous. Admittedly, the API is flawed as well for accepting a useless key without error.
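
For illustration, here is a rough Ruby sketch of the step the generated code was missing (my own sketch, with the API call left as a hypothetical some_api.set_key since the real API isn't named here): derive the key from the passphrase with a key derivation function and hex-encode it, rather than passing arbitrary text straight through.

require "openssl"

passphrase = "secret password"   # hard-coding the secret is a problem in itself

# What the generated code effectively did: hand arbitrary text to an API
# that expects hexadecimal digits, so it reads a key of length zero.
# some_api.set_key(passphrase)

# What it should have done: derive a key with a KDF, then hex-format it.
salt    = OpenSSL::Random.random_bytes(16)
key     = OpenSSL::PKCS5.pbkdf2_hmac(passphrase, salt, 100_000, 32,
                                     OpenSSL::Digest.new("SHA256"))
hex_key = key.unpack1("H*")      # 64 hexadecimal characters
# some_api.set_key(hex_key)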

Most depressingly perhaps, since ChatGPT was launched, page hits on Stack Overflow and the like have started plummeting, which seems to indicate that instead of people asking other people questions and getting actual answers that are also analysed and possibly improved with the help of other people, they're getting something that just looks like an answer but isn't. And scarily, people often have a tendency to believe what computers tell them. And since ChatGPT is trained on the internet, there might be fewer and fewer new questions and answers to train on, so it can only provide code that looks like answers for things that have already been written (and hence shouldn't need to be written again anyway).

I've tried to think of situations where it might be useful, like when someone is new to an API and doesn't know where to start, but even then, I don't think it's helpful. You'd still need to check carefully that the generated code isn't a disaster, which requires reading all the API's documentation anyway, so there's no real speed advantage to the generated code.

Anyway, programming is far too much fun to outsource to a computer. :slight_smile:

4 Likes

The caveat is the load of rubbish on the internet: learning from the internet means also learning rubbish, and therefore producing more than the occasional erroneous answer or solution. Progress will come with the availability of engines capable of correctly filtering what can be considered good information, giving a good approximation of correctness, so that AI chat tools can learn from it and give better answers; for that, knowledgeable humans are required. This reminds me of tests we did with plant identification years ago: even at 99.2% confidence we had specimens that were not the predicted species when checked by the human reference, who, knowing the plant, could point out what made it not that species; we had to go to our microscope to see that the specialist was correct. But it will surely be of great help, e.g. in medical imaging, in finding on scanner/MRI images the little thing that the radiologist may not have noticed, often because they were focused on something else...

4 Likes