Large PRs are Bad

  • Comments posted to this topic are about the item Large PRs are Bad

  • I've a bit more of a narrower focus than Microsoft's case. But I make every effort to set up a test area (even if it's a mock area), and instrument the code to display all that it does in a debug mode or even "live".  Not everyone can or will do that.  The problem is that it takes time that not everyone will want to take. I look at it as investment if you have to revisit it multiple times and look at it from the standpoint that should the worst case occur, it will take much longer.

    One of the funniest episodes is a time I wrote something from scratch that had absolutely no bugs. It took me so much longer to be happy. I felt sure I had to have bugs in it but couldn't find any.

     

  • This is a scenario that causes me to lose sleep at night. I'm on a two-person team that's migrating all our source code out of an old, on-prem TFS to GitHub (GH). Part of working in GH is using PRs and reviewing code. Code review is something which has never been done where I work. (In fairness to my current employer and coworkers, I have never been fortunate enough to work anywhere that practiced reviewing code.) I want code review to be a practice we adopt! The way we've done it so far is that people working on the same project will check out the code (TFS terms), then work on it in isolation for months. What developers do is work on that project until they get whatever it is they're working on, done. Then they check all their changes in. Let me tell you there are horror stories I could relate when someone has been working in isolation for 6 months, then check in their code and how it ruins other people's code. It has been my practice, first thing each morning, to do a TFS Get Latest (a Git pull) to get all my colleagues changes. Most of my colleagues will check in their code each day, but those who hold onto it for months have simply wiped-out whole routines, windows, or pages, that I and others have been working on. For those who don't work with Visual Studio (VS) the cause of these disasters was due to how VS tracks what files are in a project and those developers who work in isolation for very long periods of time, before checking in their code changes. VS tracks all the files in a project, so when I added new pages or code files into a project. Then when this one colleague checked in his code changes after months of work in isolation, his changes would wipe out other people's code changes. VS would prompt him with a notice that there were merge conflicts. But he had so many because he would hold onto all his changes for months, that he would be overwhelmed with the huge number of merge conflicts, that he would just think that he wants to keep all his code, therefore he'd wipe out other people's code by keeping the changes he made in the file that VS uses to track all the files in the project. Under those circumstances VS deletes any files that aren't in the file it uses to track all files in the project.

    (I've really got to write this up in a blog.)

    Anyway, my point is that I've witnessed and suffered the pain of people holding onto code changes they've been working on for way too long. The worst thing is we've never done code reviews. I've never done code reviews. I think that when we get to the point of getting all out code into GH, then try to practice proper pull requests patterns, we're going to be in for some huge surprises as well as a lot of resentment. It will mean having to change the way people work, which won't fly well because people have been working the way they have for years. Also, to compound this problem, many software projects have only one person working on it. So, doing PRs is going to be difficult, because you'll have to get someone who doesn't know the project, to look at it. And I don't have any experience at code reviews, so I really can't be of much help or guidance. So, you see why I'm losing sleep.

    Rod

  • Where I work we have GitHub workflows.  These contain a number of things

    • Linters/formatters
    • Test suites with minimum code coverage requirements
    • Docker container builders
    • Docker vulnerability scanners

    We are investigating https://www.codium.ai/pr-agent/

    When I'm reviewing stuff the first thing I do is check the GitHub workflows have passed.  If they haven't, then why submit the PR?

    Then I start by cloning the repo, switching to the PR branch and start running the automated tests.  If there aren't any then the PR has failed before it starts.

    There are a number of tools such as SonarLint and Sourcery that provide useful hints to alter your Python code to be more readable.  These are compatible with VS Code and other IDEs so there is no excuse not to use them.

    If someone submits a 1,000 line peer review then, unless those are 1,000 lines of test data to populate a DB table or collection, this is a PR failure.  The PR needs to be broken down into separate deliverable chunks.  You can't expect a fast turn around on a huge PR.

    The PR process is as much about flow as anything else. A 1,000 line PR is a huge constipated block in a narrow pipeline.  If you think in terms of small deliverables then what you deliver is naturally smaller less entangled work.

    One of the reason that people submit huge PRs is that there isn't the culture of jumping on PRs.  It can be very frustrating to submit a PR at 08:00 for a simple fix and have to report in the next day's daily stand up that no-one has looked at the PR.  It's all about delivering stuff, not having stuff in flight.  Working your socks off on 100 things is of less value than delivering 1.

    The other culture change when it comes to PRs is to recognise when a PR comment is

    • A suggestion
    • A question
    • An imperative

    As a lead I am responsible and accountable for the work done by my team.  I am always careful to make it clear which of the three my PR comment comes under.  If there is a major disagreement on an imperative then that is a sure sign that a discussion should have taken place earlier.  Whether the disagreement is with me or between a submitter and reviewer I will listen to your arguments and decide on merit.  You are not responsible or accountable for the results of that decision therefore, whether right or wrong, my decision is last word.

     

  • Large PR's are the result of a bigger problem, code bloat caused by not properly compartmentalizing components.

  • Doctor Who 2 wrote:

    This is a scenario that causes me to lose sleep at night. I'm on a two-person team that's migrating all our source code out of an old, on-prem TFS to GitHub (GH). Part of working in GH is using PRs and reviewing code. Code review is something which has never been done where I work. (In fairness to my current employer and coworkers, I have never been fortunate enough to work anywhere that practiced reviewing code.) I want code review to be a practice we adopt! The way we've done it so far is that people working on the same project will check out the code (TFS terms), then work on it in isolation for months.

    ...

    (I've really got to write this up in a blog.)

    This is written up in the Mythical Man Month

    https://amzn.to/3DRMcBf

    We learned decades ago that you can't hold onto code for months. If I were managing, I'd require the dev to check in every week, in a branch, and rebase more often to see what others are doing. If they can't do that, they can't write software.

    I know that's not your issue, but that's how the industry can build better software today that doesn't keep crashing on mobile phones.

    You ought to blog about it.

    Re: don't know the code. That's where a PR also has dialog with the submitter. What were they trying to do? Does this adhere to any standards? Are there bad practices here? Have they covered the cases to solve the problem? Those are the things you look for in review. It can be hard, which is why big PRs cause problems.

    Like big merges.

     

  • I used to take the time to go into detail and look at everything when I've been asked to do a code review.   I used to point out minor issues with SQL that didn't meet standards.  Even small things like formatting/capitalization/camel case.  But over the years I've let those things go and focus on bigger issues like using Row_Number or other obvious issues.  I found one recently that a developer was using it to return only one row, I pointed out with adding the right columns in the join it would eliminate the need for the Row_Number.   I've also changed what I do in a code review to just focus on reviewing the code being changed.  If a change is being made to a look up function in a data flow in an SSIS package, then that's all I focus on.  I used to go over everything but I would occasionally point out things the developer isn't changing.  Now you could argue that they should fix those as well.  I started thinking that I don't got through everything within the packages to make sure it meets standards, so why would I expect others to do it.  Also with being on these two week sprints it's hard to justify the time for them to change it then get it reviewed again when it's up against the deadline, and it's working as is.

    As I said I just focus on what is mentioned in the comments as to what is changing and I review the SQL for any hard coded dates or values that I think may cause issues, or bad joins of using Row_Number.  I don't take the time to try and run any code or package in a code review, to me that's not the point of the code review.  If its a 'new' developer I would take a little extra time.  The only time I can say I test anything is if it's a change to an SSRS report.  The report should be on a Dev or Test server and I will make sure it works before signing off on it.

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • Hi David, I really like the approach you described using with GitHub workflows! As I mentioned in my post we're migrating to GitHub. We've got two organizations in our Enterprise environment, and we had a Microsoft tech support guy give us training on GitHub Actions, which of course uses workflows. But I don't recall anything being mentioned about linters/formatters or test suites. (The concept of containers, Docker or otherwise, hasn't even hit my fellow employee's radar yet.) Perhaps we didn't cover those because it was an intro course, but I'd love to learn more about GH workflow linters/formatters and the test suite code coverage!

    Kindest Regards, Rod Connect with me on LinkedIn.

  • Thank you very much, Steve, for your feedback! WOW, that Mythical Man Month - how very far ahead of its time, it was.

    Kindest Regards, Rod Connect with me on LinkedIn.

  • I think that is a good point.  Review what the developer has actually submitted, not other stuff.  Obviously there are some "It depends" scenarios.  Restricting the blast radius is a sensible strategy.

    Sometimes I will raise JIRA tickets as a result of stuff I have seen while looking at code for a PR.  Some things I spot in passing and if I don't write them down I will forget about it and why I thought it was important.  This helps keep the current scope small while not losing sight of future risks/threats

     

  • Rod, it may take me a while but I will put together some stuff for you and pass it on.

  • Thank you, David. I appreciate the effort!

    Kindest Regards, Rod Connect with me on LinkedIn.

  • David.Poole wrote:

    I think that is a good point.  Review what the developer has actually submitted, not other stuff.  Obviously there are some "It depends" scenarios.  Restricting the blast radius is a sensible strategy.

    Sometimes I will raise JIRA tickets as a result of stuff I have seen while looking at code for a PR.  Some things I spot in passing and if I don't write them down I will forget about it and why I thought it was important.  This helps keep the current scope small while not losing sight of future risks/threats

     

    When you're starting fresh reviewing everything works, and if you do it right hopefully you can maintain a decent code base in such a way that reviewing just the changes works going forward.  Stepping into an existing code base and trying to review everything if it's not a clean code base is probably going to be a mess and you'll probably be trying to just make sure it works and only fixing truly egregious existing errors.

  • I disagree with the statement that large PRs are bad. Quite the opposite. How are you supposed to see and understand the whole picture, if you can't see all of the changes for some work that a developer is doing? A logical mistake can just as easily be introduced when doing a bunch of small PRs, because the reviewer is only concentrating on the little bit of code that is being shown as changed, as opposed to also seeing the code that was submitted in a prior PR.

    What is the difference between spending 5 minutes per PR and having to review 10 small PRs, versus spending 50 minutes on one PR? Steve mentioned being interrupted, and I agree, as interruptions often lead to bugs (for the stuff I am currently working on). I see the 10 small PRs as being an interruption to what I'm working on.

    I would much rather plan on taking a contiguous hour out of my work day to review a single PR, than to be interrupted multiple times for smaller PRs, and have to try to remember the other person's previous PRs to know how all their code fits together.

    Countless times, I have reviewed co-worker's PRs and have to spend time scratching my head trying to figure something out and asking a question, only to have them respond with, "Well, it's not complete yet. I just added that code for some testing purposes." Talk about a waste of my time. Not to mention, since we unfortunately don't do squash merges, the commit history of our files is huge and unwieldy, to the point of being worthless and unusable in the times when you want to use it to try and understand the history of a problem. So we are using source control, but essentially missing out on one of the biggest features of source control...to be able to effectively review history.

    As for other people's complaints about work taking months, again, Steve pretty much nailed with having those people be responsible for doing periodic merges in their branches. This is the way my team does work. We utilize the "feature branch" branching strategy. It is the norm for us, that feature branches will be worked on for over a month. We also have weekly "bug fix" branches, where generally small, not new feature related changes, are committed. Then, every week, full automation tests are run against the bug fix branch. Once all testing has been completed, that branch is merged into the main branch, and everyone with an open feature branch has to rebase.

    Ultimately, as is with many other things, I think the answer is, "it depends". To me, a 1,000 line PR is shouldn't be anything to complain about. Whatever best constitutes a person's "unit of work", is what the PR should be.

  • We might disagree with what a large PR is. To me, a PR should include everything that is needed for a piece of work, or for a deployment. Could that be a 5 line feature toggle to set something up later? Sure?

    It could be 100 lines across 5 files (or more). Certainly there are PRs that take tens of minutes, or maybe an hour, to review. However, the goal is get smaller pieces of *complete* functionality.

    If someone submits a PR because they committed and aren't done yet, that's silly. A) not time for a PR yet. B) put in the description this isn't complete, but is needed to be deployed because it sets something up.

     

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply