Code Refactoring Research Background
Dorin Pomian, a computer science expert, wanted to further his knowledge of code quality improvements through automatic software refactoring tools. He explored his options and reached out to a leader in the field, Danny Dig, who is the director of the PPI Center and an associate professor of computer science at the University of Colorado (CU) in Boulder.
Dorin, a graduate of West University of Timisoara in Romania, had spent more than 10 years working as a software engineer in France across a range of industries, including automotive, telecom, and travel, but he wanted to further his education in what he saw as a crucial area of his field: code refactoring.
In computer programming and software design, code refactoring is the process of restructuring existing computer code - changing the factoring - without changing its external behavior. Refactoring is intended to improve the design, structure, and/or implementation of the software, while preserving its functionality.
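As a minimal illustration of this definition, the sketch below shows one common refactoring, Extract Method, in Python. The invoice example and all function names are hypothetical and are not taken from Dorin's research; the point is only that the "before" and "after" versions return the same result while the "after" version separates responsibilities.

```python
# Hypothetical example: names and data are illustrative only.

def invoice_total_before(prices, tax_rate):
    """Long version: subtotal and tax logic are mixed in one function."""
    subtotal = 0.0
    for price in prices:
        subtotal += price
    tax = subtotal * tax_rate
    return round(subtotal + tax, 2)


def compute_subtotal(prices):
    """Extracted method: a single responsibility that can be reused."""
    subtotal = 0.0
    for price in prices:
        subtotal += price
    return subtotal


def invoice_total_after(prices, tax_rate):
    """Refactored version: same external behavior, clearer structure."""
    subtotal = compute_subtotal(prices)
    tax = subtotal * tax_rate
    return round(subtotal + tax, 2)
```

Calling either version with the same arguments should give the same total, which is what "without changing its external behavior" means in practice.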
“This process of making source code more maintainable is close to my heart,” said Dorin, who graduated with his Master of Science in Computer Science from CU Boulder in December 2023. “It has lots of ramifications for industry and it’s why I focused my research on automating the process so that software developers can be more productive in other critical areas.”
Dorin’s research project, which was sponsored by the PPI Center with funding from the National Science Foundation, is titled “LLMs and IDE Static Analysis for Extract Method Refactoring”.
LLM stands for Large Language Model, a machine-learning model that can comprehend and generate human language text. LLMs are trained on massive data sets of text.
IDE is an acronym for Integrated Development Environment – a software application that helps programmers develop software code efficiently. It increases developer productivity by integrating capabilities such as software editing, building, testing and packaging in an easy-to-use application.
LLMs Prove Effective AI Assistants for Software Developers
According to Dorin, excessively long methods that bundle multiple responsibilities are challenging to comprehend, debug, reuse, and maintain. In his graduate research, he therefore set out to advance the science and practice of refactoring by augmenting the refactoring capabilities of IDEs with the power of LLMs to perform Extract Method (EM) refactoring, which moves a fragment of a long method into a new, separately named method.
His formative study on 2,849 EM scenarios revealed that LLMs are very effective in giving expert suggestions, yet they are unreliable. Up to 62.3% of the suggestions are hallucinations!
To overcome this, Dorin, under the supervision of his advisor, Danny Dig, used a novel approach that combines the creative potential of LLMs with static analysis to improve the refactoring suggestions that LLMs generate. His process also relies on the safety checks of static analysis within IDEs to execute each refactoring safely.
Using candidates suggested by LLMs, Dorin’s team filtered, further enhanced, and then ranked suggestions based on static analysis techniques from program slicing. They designed, implemented, and evaluated this approach in an IntelliJ IDEA plugin called EM-Assist. The team empirically evaluated EM-Assist on a diverse, publicly available corpus that had been used successfully in the past by other researchers.
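The filter-then-rank idea described above can be sketched roughly as follows. This is not EM-Assist's implementation: the validity check and the ranking heuristic are simplified assumptions, no program slicing is performed, and every name here (`Candidate`, `is_valid`, `rank_candidates`) is hypothetical. A candidate is modeled as a line range inside a method body.

```python
from collections import Counter
from typing import List, Tuple

# (first_line, last_line) within the method body, 1-indexed and inclusive.
Candidate = Tuple[int, int]


def is_valid(candidate: Candidate, method_length: int) -> bool:
    """Stand-in validity check (assumption): the range must lie inside the
    method body and must not cover the whole method."""
    first, last = candidate
    in_bounds = 1 <= first <= last <= method_length
    whole_method = first == 1 and last == method_length
    return in_bounds and not whole_method


def rank_candidates(suggestions: List[Candidate],
                    method_length: int,
                    top_k: int = 5) -> List[Candidate]:
    """Drop invalid candidates, then rank the rest by how often they were
    suggested (assumed heuristic), preferring smaller ranges on ties."""
    valid = [c for c in suggestions if is_valid(c, method_length)]
    counts = Counter(valid)
    ranked = sorted(counts, key=lambda c: (-counts[c], c[1] - c[0], c))
    return ranked[:top_k]
```

For a 10-line method, a suggestion such as `(8, 12)` points past the end of the method and would be discarded, which is one simple way an unusable suggestion can be caught before it reaches the developer.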
The results showed that EM-Assist outperforms previous state-of-the-art tools. EM-Assist suggests the correct refactoring among its top 5 suggestions 60.6% of the time, compared to 54.2% reported by existing ML models and 52.2% reported by existing static analysis tools.
When the PPI research team replicated 2,849 actual EM instances from open-source projects, EM-Assist’s recall rate was 42.1% compared to 6.5% for its peers.
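Figures like these are typically computed as a top-k recall: the share of cases in which the refactoring the developer actually performed appears among a tool's first k suggestions. The sketch below assumes an exact match between a suggestion and the ground truth, which may differ from the matching rule used in the actual evaluation; the function name and the sample data are made up for illustration.

```python
def top_k_recall(ranked_suggestions, ground_truth, k=5):
    """Fraction of cases whose ground-truth refactoring appears in the
    first k suggestions (exact match is an assumption of this sketch)."""
    if not ground_truth:
        return 0.0
    hits = sum(
        1
        for suggestions, truth in zip(ranked_suggestions, ground_truth)
        if truth in suggestions[:k]
    )
    return hits / len(ground_truth)


# Made-up data: three methods, each with ranked (first_line, last_line)
# suggestions and the range the developer actually extracted.
sample_suggestions = [
    [(3, 6), (4, 5)],
    [(2, 9), (10, 12)],
    [(1, 2)],
]
sample_truth = [(4, 5), (10, 12), (7, 8)]
sample_recall = top_k_recall(sample_suggestions, sample_truth, k=5)
```

With this sample data, two of the three ground-truth ranges appear among the suggestions, so the recall is two thirds.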
In addition, the team conducted firehouse surveys with 20 industrial developers, presenting them with refactoring recommendations on recent commits. Of the respondents, 81.3% agreed with the recommendations provided by EM-Assist. This shows the usefulness of Dorin’s approach and ushers in a new era of refactoring in which LLMs become effective AI assistants for developers.
Industrial Applications
Since his graduation from CU in December 2023, Dorin Pomian has been looking for ways to apply his automation process and knowledge of enhanced refactoring techniques to industry.
One application is a software tool he designed, which can be downloaded and installed for free from the JetBrains Marketplace. JetBrains is an international software development company and a leader in developing tools for software engineers, data scientists, and project managers. Danny Dig is currently on sabbatical at the company.
According to Danny, “Many are wondering whether Generative AI and LLMs will displace software developers. I think Dorin’s research shows exactly the opposite: the synergy of LLMs and state of the art IDEs and human developer in the loop is greater than any of these in isolation. In this new era, developers who use Generative AI assistants would be even more productive than before. We are fortunate at the PPI Center to have wonderful students like Dorin who are leading groundbreaking research in this new era of software development. The company that hires Dorin is lucky to have one of the pioneers in this field, helping them transition to the new era of GenAI-powered developer productivity.”
The link to install Dorin’s software is available here: https://plugins.jetbrains.com/plugin/23403-llm-powered-extract-method
The tool is released under an open-source license on GitHub. The source code, evaluation data, sample LLM prompts and results, README, and more are available here:
If you’d like to reach out to Dorin Pomian for further information about his research and its applications, or to bring Dorin on board at your company, he can be reached via LinkedIn or by email: dorin.pomian@gmail.com.