Got more questions concerning using RStudio and how to put "whatever" into Git

  • I'm sorry for being so vague, but until I learn what the lingo is, I'll have to use vague terms.

    Besides being the TFS Administrator at work, I am also trying to get us moved to Git. Currently, we're using Azure DevOps Services (ADS). I'm trying to get a team of data analysts set up with Git in ADS. They all use RStudio. I have no experience with R, and I certainly don't have RStudio installed on my PC/laptop, nor am I ever likely to. What I need is a 10,000-foot-level understanding of RStudio, so that I have a common language with which to talk with them.

    Working with the first person, Bill, I've set up their first Git repo in ADS.

    I know that RStudio can work with Git, which is the reason that group wants to move to Git.

    So far, I've helped two people get started using Git with ADS. Yesterday I helped the second person, Chelsea, get started. She spent some time showing me how they currently store their R code, which I'll describe here. That whole team has been storing their R code on a network share called L:. On it they have folders in which they put "whatevers". This is one of the first places where my ignorance comes into play: I don't know what those things are called. If this were Visual Studio, each of those folders would be either a solution with one or more projects, or a project that's part of a Visual Studio solution. Since I don't know the RStudio term, I'm calling them "whatevers". Chelsea said that each folder is a collection of R code for generating reports, doing data analysis, etc., on a single topic. (Man, it is so frustrating not knowing the vernacular!) And since they've been doing this for years, they have a large collection of folders with long names ending in something like "Vn", as in V1, V5, or V12, meaning version 1, version 5, or version 12 of that "thing".

    So, are these whatevers roughly equivalent to Visual Studio solutions/projects? I'm trying to figure out how all of this could be stored in one or more Git repositories in a way that would make sense to them.

    You can see the problem they're facing. For as long as they've been doing this, they've been in the habit of storing all the code they collaboratively work on in this L: network share. (I can just imagine the many times they've had conflicts when two or more people were working on the same .R file at the same time!) In fact, Chelsea had an extremely hard time getting her mind around the idea that she should clone the repo to her own machine. She really wanted to clone the repo into one of those whatevers on the L: drive. I told her that would cause them all sorts of serious problems, because everyone would end up cloning the same repo into the same folder on the L: network drive. And now I'm wondering whether Bill has already done that. I don't have any permissions on the L: drive, so I can't check for the hidden .git folder. Since RStudio works with Git, it must know how to handle local Git repos.

    So, would someone please introduce me to the terms RStudio users use for working with these whatevers? And how does RStudio handle working with Git? I'm assuming that RStudio hides some of Git's complexity; for example, the user might never need to know that a modified file starts out unstaged and must be staged, then committed, before it can be pushed to the remote. These are all concepts I'm comfortable with, but Chelsea was ready to run for the hills when I walked her through them at the command prompt.

    Rod
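    For reference, here is a small Python sketch of the staged/unstaged distinction described above, using Git's own machine-readable status output. In "git status --porcelain" (format v1), the first status column shows what is already staged in the index and the second shows what has changed in the working tree but is not yet staged, so one file can appear in both lists. The function name classify_porcelain is a hypothetical helper, it parses sample text rather than calling Git, and the file names are made up.

```python
def classify_porcelain(status_text):
    """Split `git status --porcelain` (format v1) output into staged/unstaged paths.

    Hypothetical helper for illustration. In each line, column 1 is the file's
    status in the index (staged), column 2 is its status in the working tree
    (not yet staged), and the path starts at column 4. "??" marks an untracked
    file. Rename lines ("old -> new") and quoted file names are kept as-is.
    """
    staged, unstaged, untracked = [], [], []
    for line in status_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        index_status, worktree_status, path = line[0], line[1], line[3:]
        if index_status == "?" and worktree_status == "?":
            untracked.append(path)  # Git is not tracking this file yet
            continue
        if index_status != " ":
            staged.append(path)  # change already staged ("git add" was run)
        if worktree_status != " ":
            unstaged.append(path)  # change made but not yet staged
    return {"staged": staged, "unstaged": unstaged, "untracked": untracked}


if __name__ == "__main__":
    # Made-up sample: one unstaged edit, one staged edit, one file with both,
    # and one new file Git does not track yet.
    sample = "\n".join([" M analysis.R", "M  report.Rmd", "MM model.R", "?? scratch.R"])
    print(classify_porcelain(sample))
```

    Only the staged list goes into the next commit, which is why a file that was edited again after "git add" (model.R above) shows up in both lists.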

  • Thanks for posting your issue and hopefully someone will answer soon.

    This is an automated bump to increase visibility of your question.

    I'm not that familiar with RStudio, but I know git, and their idea of versioning by folder is unnecessary in git.

    In git, the way I would handle it is to create a branch for each version and then have a "main" branch that contains whatever the latest version is. Mind you, this assumes that you rarely need to use previous versions. If you frequently swap between versions, it may make more sense to keep the existing folder structure.

    But with everything I do in git, I don't worry about versioning by folder. All of my versions are done with branches, and in the event I need to view an older version, it is easy to do. My approach is to have a development branch for myself that doesn't get pushed to the central repository, or if it does, I push it to a fork so I am not cluttering up the live repo. When I have something ready for live AND I want to mark it as a specific version for others to use, I make a new branch on the main fork and merge my changes into that branch. If that version is ready for live, I then make a merge/pull request on the main branch to get my changes into it. Finally, I rebase my fork off of the main fork's main branch so that my fork is up to date, and pull those changes to my local machine.
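    To make the branch-per-version idea concrete for folders named like "...V1", "...V5", "...V12", here is a minimal Python sketch that only plans a migration; it does not touch Git. It assumes, hypothetically, that each folder name ends in "V" plus a number (optionally after a space, underscore, or hyphen) and that folders sharing the same base name belong in one repository. For each base name it proposes one branch per version and picks the highest-numbered folder as the content for "main". The folder names, the function name plan_branches, and the "v<number>" branch names are illustrative assumptions, not anything the team already uses.

```python
import re
from collections import defaultdict

# Hypothetical naming convention: "<base name>[separator]V<number>",
# e.g. "QuarterlySalesReport_V12" or "ClaimsAnalysis V2".
VERSION_SUFFIX = re.compile(r"^(?P<base>.*?)[\s_-]*[Vv](?P<num>\d+)$")


def plan_branches(folder_names):
    """Group versioned folders by base name and propose one branch per version.

    Returns {base_name: {"branches": [...], "main_from": folder_name}}, where
    "main_from" is the folder with the highest version number, i.e. the
    content the "main" branch would hold.
    """
    groups = defaultdict(list)
    for name in folder_names:
        match = VERSION_SUFFIX.match(name)
        if match is None:
            continue  # no "Vn" suffix; such folders need a decision by hand
        groups[match.group("base")].append((int(match.group("num")), name))

    plan = {}
    for base, versions in groups.items():
        # Sort numerically so V12 comes after V5 (a plain string sort would not).
        versions.sort()
        plan[base] = {
            "branches": [f"v{num}" for num, _ in versions],
            "main_from": versions[-1][1],
        }
    return plan


if __name__ == "__main__":
    folders = [
        "QuarterlySalesReport_V1",
        "QuarterlySalesReport_V12",
        "QuarterlySalesReport_V5",
        "ClaimsAnalysis V2",
        "scratch",
    ]
    for base, entry in plan_branches(folders).items():
        print(base, entry)
```

    The numeric sort is the main design point: compared as plain strings, "V12" would sort before "V5". Creating the branches themselves would still be done in Git.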

    The approach you go with really depends on the developer and the project. Forking may be overkill, for example, if you don't have a lot of collaboration on the tool. And keeping your changes local may not be the approach you wish to use; it may make more sense to push your changes to the central repo frequently. For me, though, I don't like having unfinished code pushed to the central repo, but I do make local commits any time I finish a major change.

    The above is all just my opinion on what you should do.
    As with all advice you find on a random internet forum, you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!

  • I like your approach, Brian. I hope I can get them to not try to clone their instance of the Git repo into the same folder on their L: drive. If they do that, then they'll really have a mess.

    Kindest regards, Rod

    Oh, I completely agree. You should clone to your local disk, not a network share. A good way to prove to them that this is the way to go: have them try cloning to a network share and have two people modify files at the same time. It will get messy and break things quickly.

    Or worse: person A clones the repo there, person B makes some changes, decides those changes broke things, and does a hard reset (git reset --hard) on the repo. All of the changes person A had made are now lost.

    The idea of git is that you have a central repository where the information is stored, and it is the "source of truth". Then if person A and person B want to work on it, they can both make changes without breaking each other's work. The only problems come into play if person A and person B both modify the same file in their local repos, both make merge requests (GitLab terminology; GitHub and Azure DevOps call them pull requests), and the merges conflict. Then whoever merges first wins, and the second person needs to rebase and fix the merge conflicts, which can be a pain in the butt.
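    To illustrate when a merge actually conflicts, here is a toy three-way merge in Python. It is a simplified sketch, not how Git is implemented: it assumes all three versions of the file have the same number of lines, whereas Git works from diffs against the common ancestor and so also handles inserted and deleted lines (and may flag changes to adjacent lines). It does show the rule that matters in practice: if only one person changed a given line, that change is taken automatically, and a conflict arises only when both people changed the same line in different ways. So two people editing different parts of the same .R file will often merge cleanly. The function name three_way_merge and the R snippets are made-up examples.

```python
def three_way_merge(base, ours, theirs):
    """Toy line-by-line three-way merge (hypothetical helper, not Git's algorithm).

    Simplifying assumption: all three versions have the same number of lines,
    i.e. edits only change lines and never insert or delete them.
    Returns (merged_lines, conflict_line_numbers) with 1-based line numbers.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("toy merge requires equal-length inputs")

    merged, conflicts = [], []
    for number, (b, o, t) in enumerate(zip(base, ours, theirs), start=1):
        if o == t:  # both sides agree: untouched, or changed identically
            merged.append(o)
        elif o == b:  # only "theirs" changed this line
            merged.append(t)
        elif t == b:  # only "ours" changed this line
            merged.append(o)
        else:  # both changed the same line differently: a real conflict
            # Marker text loosely imitates the markers Git writes into files.
            merged.append(f"<<<<<<< ours\n{o}\n=======\n{t}\n>>>>>>> theirs")
            conflicts.append(number)
    return merged, conflicts


if __name__ == "__main__":
    base = ["x <- 1", "y <- 2", "plot(x, y)"]
    person_a = ["x <- 10", "y <- 2", "plot(x, y)"]  # edits line 1
    person_b = ["x <- 1", "y <- 2", "plot(x, y, main = 'Sales')"]  # edits line 3
    print(three_way_merge(base, person_a, person_b))  # merges cleanly

    person_c = ["x <- 1", "y <- 3", "plot(x, y)"]  # both edit line 2...
    person_d = ["x <- 1", "y <- 4", "plot(x, y)"]  # ...in different ways
    print(three_way_merge(base, person_c, person_d)[1])  # conflict on line 2
```

    In the real workflow, the second person resolves such a conflict by editing the marked lines and committing the result.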

    But the first time someone clobbers someone else's work, I expect you will get one of two responses:

    1 - guess we should have listened to you and not put the "local" git data on the L drive

    2 - git is horrible, look how much time and data we lost. Let's go back to using the L drive and multiple folders to keep track of versions.

    I think you need to do it right the first time to prevent those types of losses and to prevent failure.  I would try to do everything I could to prevent them from syncing their git repo to the L drive.  You really want your local repository to be local to your disk on your machine.
