-
Notifications
You must be signed in to change notification settings - Fork 19
Implementation #198 - Refactor completed by Kiro #199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: devel
Are you sure you want to change the base?
Conversation
…eline, verify tests
…scalculation. Extremely unobvious.
…ests are passing.
…onal-pattern Refactor/evalq to functional pattern
Pull in extra commit from previous branch
…re causing failures
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements a comprehensive refactoring of Tplyr's internal architecture to eliminate evalq() usage and adopt an Extract-Process-Bind (EPB) pattern. The refactoring was completed using Kiro, an AI-assisted development tool, and represents approximately 6-8 hours of automated work plus 2-3 hours of manual cleanup.
Key achievements:
- Refactored 34 internal functions from
evalq()pattern to Extract-Process-Bind pattern - Eliminated environment pollution from temporary variables
- Improved code testability, maintainability, and clarity
- Maintained 100% backward compatibility with no user-facing changes
Reviewed changes
Copilot reviewed 45 out of 46 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| vignettes/denom.Rmd | Added set.seed() calls for reproducible random sampling in examples |
| vignettes/.gitignore | Added Quarto build directory to gitignore |
| tests/testthat/test-sort.R | Wrapped test code in proper test_that() structure and added new sorting tests |
| tests/testthat/test-pop_data.R | Added comprehensive tests for build_header_n() function |
| tests/testthat/test-nested.R | Added new test file for nested count layer functions |
| tests/testthat/test-count.R | Minor formatting fix (closing brace) |
| tests/testthat/_snaps/count.md | Updated error message snapshot to include function name |
| man/Tplyr.Rd | Added package website URL |
| R/zzz.R | Added global variable declaration for tot_fill |
| R/stats.R | Refactored risk difference functions to EPB pattern with detailed documentation |
| R/sort.R | Added @noRd tags to internal functions |
| R/shift.R | Refactored shift layer functions to EPB pattern with comprehensive documentation |
| R/riskdiff.R | Renamed and refactored prep_two_way() to explicit parameter passing |
| R/process_metadata.R | Refactored metadata functions to EPB pattern with improved structure |
| R/prebuild.R | Refactored treatment_group_build() to EPB pattern, removed empty evalq wrapper |
| R/pop_data.R | Refactored build_header_n() and add_total_group() to EPB pattern |
| R/nested.R | Refactored nested count processing to EPB pattern with clear documentation |
| R/layer.R | Replaced evalq() with direct environment binding |
| R/gather_defaults.R | Refactored to use env_get() instead of evalq() |
| R/desc.R | Refactored descriptive statistics layer to EPB pattern |
| R/count.R | Extensively refactored count layer functions to EPB pattern |
| R/collapse_row_labels.R | Removed commented-out browser() call |
| R/assertions.R | Refactored to use env_get() with inheritance |
| NEWS.md | Added internal changes section documenting refactoring |
| NAMESPACE | Added S3 method export for process_metadata.tplyr_statistic |
| .kiro/** | Extensive documentation of refactoring process, requirements, and implementation |
| .Rbuildignore | Added .kiro directory to build ignore list |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This is intentionally going to be a large PR, because I'd like to include a details about the experience of using Kiro to implement the refactoring described in #198.
PR Formalities
The updates to the source code were generally handled by Kiro, but a few notes:
At this point, I'm confident that the refactor is returning equivalent results to the existing development version of Tplyr.
Kiro Experience
My initial testing with Kiro was focused around designing a new proof-of-concept. I've been a fan of Claude Code for a little while now, and I've been using it to bring an idea to life. I know there's a huge ecosystem of tools for Claude Code, and different ways that you can manage planning and implementation. But I just haven't tried it myself.
What stuck out to me about Kiro was Spec mode. If you've been using any sort of AI tools or building anything AI related, you're probably aware that managing context is everything right now. The LLMs take input and generate output, so it becomes your job (or the tools job) to keep that context relevant and manage the window of context that the models can actually digest. Kiro's spec mode works with you to build requirements, create a design, and generate a task list for how the implementation will be conducted. This all fits quite nicely within the UI as well.
Using Kiro on Tplyr
Having used Claude Code and seen how effective these agentic tools have gotten, the potential of what Kiro can do was apparent very quickly. So I embarked on a side quest from my prototype to throw something different at Kiro that would help me stress test it; refactor a large, complicated code base that I wrote to fix design decisions that I regret.
The issue that I was trying to address is summarized in the attached GitHub issue in more detail. To summarize, when Tplyr would process data, we evaluated the functions within the object environments instead of the individual function environments. This was a common pattern throughout the entire code base.
This felt like a good use case for an AI tool based on a few factors:
I think this makes a great challenge for Kiro because the tool needs to first understand the existing architecture, conventions, and problem, and then create and implement a plan. The updates are so widespread that if something went sideways, it would be easy to have to trash everything and start over. The code base is very interconnected, so it would be easy to get lost.
If I were to try to do this update myself, I'd estimate that it would take me about 2 dedicated weeks to get everything completed. Given the complexity of the code base, if a new developer that's never worked on Tplyr came on board and tried to do it, I think it could easily be a month.
Watching Kiro implement
The first thing I asked Kiro to do was analyze the code base and document conventions. Seeing how Kiro plans things out, this felt like a good approach to help manage context about Tplyr up front, so when it started implementing there was already context set aside to provide to the model.
From there, based on my request, Kiro started designing requirements. Here's a little snippet:
You can find the full set of documents that Kiro wrote under .kiro/specs/tplyr_refactor - a lot more context was set aside. But based on my problem description, Kiro wrote quite thorough documentation around the requirements of the refactor (and separately, functional requirements), and used that to build a design for the refactor, and ultimately and a task list for implementation.
One of the things that truly impressed me about this process was how smart the implementation approach was. The task list Kiro created was about 30 steps. Here's an example of what those steps look like:
I made it clear up front that parity with the current code base was critical, so it made sure to measure the current state of the code base before hand. Instead of making a wide swath of updates immediately, it took things incrementally and focused on one part at a time. After each change, it would rerun tests and make sure things were functioning properly before moving on.
One extremely interesting part of this process was when Kiro was trying to update Tplyr's count layers. This is realistically the most complex part of the code base - specifically nested count layers. After Kiro updated the general count layer processing, a chunk of tests started failing. Instead of fixating on resolving the problem right then and there, Kiro figured out the source of the error, but noted that the culprit function was going to be updated later on in another task. It took note of the failures and let them go for the time being. That's an impressive level or strategic thinking - something a lot developers, including myself, can get stuck on.
Results
By the time Kiro was done, the entire code base had been refactored and about 98% of the existing tests were passing. Overall, that's impressive. But let's dig into the failing tests. One of the failing tests was just a the capture of a snapshot error that changed slightly, so this was a non issue. Everything else was essentially cases of variable inheritance within teh objects. Recall the the issue being addressed was that functions were executing within the Tplyr environments. The nice part of this approach was that if a value wasn't present in the layer, it would automatically pull from the parent environment of the table. All I had to manually change to fix this was specify to inherit values when the value was pulled from the layer environments. I would also say this was a reasonable failure on the part of the AI, because the errors this created were unexpected, pretty difficult to trace, and not exactly clear from the package design either.
A Couple Pitfalls
There were a few small issues that I did run into while Kiro was going through its implementation. These were all realistically trivial. A couple times, Kiro would hang and it would be hard to tell what happened. Most of the time this was because there was some sort of interactive thing happening in the terminal. For example, Kiro ran a git command that triggered paging within the terminal window, so it needed me to scroll the window down to the let the command return. Fortunately there's a way to prevent this. Kiro let's you create steering rules to control how the agent goes about doing its tasks. Once I noticed this was happening, I asked it to create a rule to be cautious of the terminal commands being executed and mitigate the possibility of it requiring my interaction.
There were a few more circumstances where the agent started hanging as well and I couldn't find anyplace it was waiting for my interaction. For this, I had to hit cancel on the agencts execution. It recovered and moved on. I'm not sure the source of these issues, but Kiro wrote code and that code as executing, so it could be anything. Ultimately, not a large issue.
Lastly, one thing I admittedly didn't like is that Kiro also added a number of unit tests. In some ways, this is good - this was a substantial change and the tests were aimed at verifying those changes. That said, some of the tests were failing and the failures were false positives. Either Kiro had the wrong idea, or in some cases the unit test file had some changes to the global environment that had trickle effects. Another lesson learned from Tplyr - having good tests is more important than having a lot of tests.
Conclusion
Going back to the beginning, I estimated this task would've taken me - as one of the core developers of the package - about 2 weeks (80-100 hours) of development time. Using Kiro, the planning and updates took about 6-8 hours. The clean-up on my end after the fact took maybe 2-3 hours. Keep in mind that the 6-8 hours spent by Kiro also didn't require my full attention. My dedicated focus was easily half of that. There was a substantial part of that time that I wasn't even at my computer.
I'll note that no one asked me to write this, to test this, or to endorse Kiro. In my current position, I don't have a lot of time code like I used to. That said, I have no shortage of hopes and dreams. Tools like this give me freedom to bring some of those dreams to life. Possibly more importantly, they can help me articulate what I want to do and communicate that with my team.
Long story short, Kiro has blown me a way.