Pictured left to right are Malinda Dilhara, Danny Dig and Abhiram Bellur, who are working on groundbreaking research in code development and automation using generative AI. Their reported results include a 1400% improvement, tools that are up to 39x more effective, and developers who are 50% more productive.
In the field of software engineering, progress is usually incremental and ongoing. But once in a while, there is a huge leap in technological advancement. That time is now with generative artificial intelligence (AI).
As Danny Dig, founder and executive director of the Pervasive Personalized Intelligence (PPI) Center, associate professor at the University of Colorado Boulder and consultant to JetBrains, explains, the academic sector had been paying attention to generative AI (Gen AI) for some time. Then industry started demonstrating how effective it was at improving and automating the generation of images and text, and how sophisticated it was at code analysis. When this happened, software engineers started to embrace it, with promising results.
“Immediately, people in my academic field of software engineering research realized that this was no longer incremental progress. It was one of those unique ‘once in a career’ opportunities where we are jumping the curve to a higher level of productivity.”
Gen AI and large language models (LLMs) are rapidly transforming the field of software development. Among other things, developers use generative AI to search for code fragments using natural language; generate code, tests, documentation, comments, and commit messages; explain code and bug fixes; summarize recent changes; and more.
Using Generative AI to Change Code with 1400% Improvement
Danny and his research team immediately recognized the potential to make code development better aligned with how a human thinks. Changing code used to be cumbersome and repetitive; integrating AI into the software development process significantly reduces the time developers spend on repetitive tasks and debugging.
For this reason, Danny and his research team started to explore what kinds of tools and support they could provide so that Gen AI assists software engineers, rather than replacing them, in changing existing code while automating the process. They wanted to help industry automate the common changes that software engineers make, and to figure out how to better align these tools with how professional programmers write and think about code. More specifically, they wanted to explore:
· How can I improve my code?
· How can tools suggest improvements aligned with how experts think about code?
· How can I make this code better: more correct, faster, more reliable?
· How can I use best practices?
· How can I make it more modern?
Because Gen AI was trained on such a large body of code, it can come up with recommendations that are very well aligned with how the most expert programmers think about changing code.
Danny’s research group started to apply Gen AI to changing code, making them the first in the field to do so. Their project is called “Generative AI Programming Assistant”. Their research is partially funded by PPI members - NEC Laboratories America, Inc. and Intel Labs - and by the National Science Foundation. PPI is an Industry-University Collaborative Research Center.
Generative AI requires huge data sets, and the PPI Center with its connection to industry and government is able to leverage its resources to go beyond the type of work that thought leaders and academics would be able to do on their own.
The results are groundbreaking. “Whereas in the past, the field was accustomed to small, incremental percentage improvements, like 20%,” said Danny, “we are now looking at improvements of 1400%. This is one of those rare moments in one's career where you are no longer doing incremental improvements, but involved in breakthroughs. That’s why I'm so excited.”
Automating Code Changes That Are 39X More Effective
A top concern remains the trustworthiness of the solutions provided by Gen AI. While many solutions resemble the ones produced by expert developers, LLMs are known to produce hallucinations - solutions that seem plausible at first, but are flawed. To help developers trust generative AI solutions, the research being done by Danny and his team is discovering novel approaches that synergistically combine the creative potential of LLMs with the safety of static and dynamic analysis from program transformation systems.
Current results show that their approach is effective. It safely automates code changes and is up to 39x more effective than previous state-of-the-art tools. Moreover, this approach produces results that expert developers trust. The team submitted patches generated by its LLM-powered tools to well-known open-source projects, whose developers accepted most of the contributions.
In addition, surveys completed by dozens of professional developers reveal that they agree with the recommendations provided by the team’s tools.
All of this demonstrates the usefulness of the team’s novel approach, ushering in a new era where LLMs become effective AI assistants for developers.
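The idea described in this section, pairing LLM-proposed changes with static and dynamic checks, can be illustrated with a minimal sketch. This is not the team's tool: the candidate rewrites are hard-coded strings standing in for LLM output, and every name here (`passes_static_check`, `passes_dynamic_check`, `filter_candidates`, `total`) is a hypothetical chosen for illustration. The static check only confirms that a candidate parses and keeps the same function name and number of parameters; the dynamic check only compares outputs on a few sample inputs.

```python
import ast

def passes_static_check(source, func_name, n_params):
    """Static check: the candidate must parse and define func_name with n_params parameters."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            return len(node.args.args) == n_params
    return False

def load_function(source, func_name):
    """Execute the source and return the named function.
    Assumption: trusted toy input. A real tool would sandbox this step."""
    namespace = {}
    exec(source, namespace)
    return namespace[func_name]

def passes_dynamic_check(original, candidate_src, func_name, inputs):
    """Dynamic check: the candidate must return what the original returns on every sample input."""
    try:
        candidate = load_function(candidate_src, func_name)
        return all(candidate(*args) == original(*args) for args in inputs)
    except Exception:
        return False

def filter_candidates(original_src, candidates, func_name, n_params, inputs):
    """Keep only the candidates that survive both checks."""
    original = load_function(original_src, func_name)
    return [
        src for src in candidates
        if passes_static_check(src, func_name, n_params)
        and passes_dynamic_check(original, src, func_name, inputs)
    ]

# The code a developer wants to modernize.
ORIGINAL = """
def total(values):
    result = 0
    for v in values:
        result = result + v
    return result
"""

# Hypothetical stand-ins for LLM suggestions.
CANDIDATES = [
    "def total(values):\n    return sum(values)\n",                 # sound rewrite
    "def total(values):\n    return sum(values) + 1\n",             # plausible but wrong
    "def total(values)\n    return sum(values)\n",                  # does not parse
    "def total(values, start):\n    return sum(values, start)\n",   # changed signature
]

SAMPLE_INPUTS = [([],), ([1, 2, 3],), ([-5, 5],)]

accepted = filter_candidates(ORIGINAL, CANDIDATES, "total", 1, SAMPLE_INPUTS)
print(accepted)
```

In this toy run only the first candidate survives. Agreement on a handful of inputs does not prove equivalence, so a production system would need far stronger analysis than this sketch.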
Assisting Not Displacing Talent
However, some in the industry are concerned that because generative AI is so well aligned with how expert programmers think about code, it will displace people. Danny disagrees. He believes his research and the ongoing developments will help entry-level programmers learn computing and programming faster, and will make existing professional software developers even more productive. Already, he and his team are seeing a 50% improvement in productivity. He likes to think of the technology as a virtual assistant that offers hints and suggestions, and he expects these innovations to broaden the field of computing by making it easier for new people to enter.
One of the areas that Danny is most excited about is the potential for generative AI to help young talent without programming experience become more competent. The technology will suggest changes to their code that are better aligned with what professional programmers would recommend. Danny stressed, “It’s not only going to tremendously assist existing professional software developers, but it will open up the field of computing, providing young talent and entrepreneurs the tools they need to come up with really cool ideas.” He referenced a smart bra that uses biosensors to detect differences in blood flow to help predict the risk of cancer as one example of how this technology will help future generations.
Societal Responsibility
There are also societal implications to be considered. As with all fast-moving technology, there is a need to be responsible. One of the biggest challenges being tackled right now on the technical side is establishing the trustworthiness of the solutions. Although LLMs and Gen AI can create this ‘wow effect’ and produce results just like what a professional expert would recommend, the results can also be flawed. Users could get a result that seems very plausible and looks very realistic, but isn’t sound.
Those with technical expertise and years of experience with best practices can make informed decisions, but those without this experience in software engineering won’t be able to. This is the biggest danger and greatest challenge: that the new generation of entrepreneurs might fully rely on output created by generative AI that is a hallucination, with something subtle about it that just isn’t quite right.
“This is the biggest challenge,” explained Danny. “How can we trust the results produced by large language models and by generative AI? This is one of the biggest technical challenges my research group is working on.”
They are designing their research to automatically detect hallucinations: filtering out what’s not right, fixing it and making it better, or throwing out what isn’t working. “If you want all people to rely on the solutions provided by LLMs, we have to make them more transparent. We have to filter out automatically the hallucinations and only show to the user the part that’s right,” said Danny.
He explained that experienced developers can figure out what's wrong, but an entry-level person, or computing enthusiast, lacks the expertise and technical maturity to figure out the slight thing that is wrong. Inexperienced users might proceed down the wrong path, because the computer directed them that way.
Leading the Way
The folks at PPI Center are leading the way in leveraging Gen AI tools to generate and change code automatically and more efficiently, while balancing this with the need to filter out and reduce hallucinations so that the talent of the future has the resources it needs to improve future technology.
The research is described in three papers that will be presented this summer and fall at the ACM flagship conference in Software Engineering (FSE’24) and the top conference in the field of software maintenance and evolution (ICSME’24).
If you’d like to learn more about the PPI Center’s “Generative AI Programming Assistant” research, please view this video or visit its website at https://www.ppicenter.org/.