How to check if a file exists in an S3 bucket

  • Hi,

    I am trying to build a script to copy SQL backups to an S3 bucket.  The structure of the folders is like this:

    ServerName
    - FullBackups
    -- Daily
    -- Weekly
    -- Monthly

    Inside the lower-level folders are the backups, but the structure may change.  I have a script below which recursively looks through the folders and copies the backups.

    One problem, though: if the file already exists in S3, it gets copied again.  How can I add a check to see if the file is already there and skip copying if that's the case?  I need something like this:

    $fFile = "daily_mydb_20200831.bak"
    $S3file = Get-S3Object -BucketName $S3Bucket -Key "/ServerName/FullBackups/Daily/daily_master_20200831.bak"
    $s3obj = ($S3file.key -split "/")[-1]
    if ($fFile -eq $s3obj -and $S3file.size -ge $fFile.Length) {
    "File exists: $s3obj"
    }
    else{
    Write-S3Object -BucketName $S3Bucket -Key $s3keyname -File $backupName
    }

    This is the main script:


    $BackupLocation = 'F:\FullBackups'
    $S3Bucket = 'MyBucket'
    $s3Folder = 'MyServer'

    Import-Module -Name AWSPowerShell
    Initialize-AWSDefaults -ProfileName MyProfile -Region ap-southeast-2

    # FUNCTION - Iterate through subfolders and upload files to S3
    function RecurseFolders([string]$path) {
        $fc = New-Object -ComObject Scripting.FileSystemObject
        $folder = $fc.GetFolder($path)
        foreach ($i in $folder.SubFolders) {
            $thisFolder = $i.Path

            # Transform the local directory path into notation compatible with S3 buckets and folders
            # 1. Trim the drive letter and colon off the start of the path
            $s3Path = $thisFolder.ToString()
            $s3Path = $s3Path.SubString(2)
            # 2. Replace back-slashes with forward-slashes
            #    (escape the back-slash in the regex pattern so it is read literally: "\\")
            $s3Path = $s3Path -replace "\\", "/"
            $s3Path = "/" + $s3Folder + $s3Path

            # Upload this folder's files to S3
            Write-S3Object -BucketName $s3Bucket -Folder $thisFolder -KeyPrefix $s3Path
        }

        # If subfolders exist in the current folder, iterate through them too
        foreach ($i in $folder.SubFolders) {
            RecurseFolders($i.Path)
        }
    }

    # Upload root directory files to S3
    $s3Path = "/" + $s3Folder
    Write-S3Object -BucketName $s3Bucket -Folder $BackupLocation -KeyPrefix $s3Path
    $s3Path
    # Upload subdirectories to S3
    RecurseFolders($BackupLocation)
  • Just to add some clarification to this problem and "bump" it because it is an interesting issue...

    Where are you copying the backups FROM?  For example, are you copying from on-prem to an S3 "Gateway" drive?

    --Jeff Moden

  • I assume you have the same structure locally and are just duplicating it in S3? I haven't tested this, but it looks reasonable. I see the same code posted as an answer on Stack Overflow.

    I also see this approach, using Get-S3ObjectMetadata (which issues a HeadObject request) as an existence test - https://docs.aws.amazon.com/AmazonS3/latest/API/API_control_S3ObjectMetadata.html

    https://docs.aws.amazon.com/AmazonS3/latest/API/API_HeadObject.html
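
    Something along these lines might work (an untested sketch; Test-S3KeyExists is just a name I made up). Get-S3ObjectMetadata sends a HeadObject request and throws if the key is not there:

    function Test-S3KeyExists([string]$Bucket, [string]$Key) {
        try {
            # HeadObject succeeds only if the key exists and is readable
            $null = Get-S3ObjectMetadata -BucketName $Bucket -Key $Key
            return $true
        }
        catch {
            # "Not found" (or no access) surfaces as an exception; treat as missing
            return $false
        }
    }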

    I also see references saying the default behaviour for Write-S3Object is to overwrite. Are you sure the same file is being copied over? How is it named? Or do you mean it copies again and wastes bandwidth?

  • It is an AWS EC2 instance.  But it does not matter, as long as there is connectivity.  I am trying to copy my folder structure to AWS without overwriting the same file if it exists.

  • Yes, it copies the same files.  If I run it twice without adding any new files, it copies all of them over again.

  • I'm not a PowerShell user but a Google for "check if file already exists using powershell" turns up things that may easily help do what you want.  Try it.

    Even better might be a similar search: "check if file already exists using powershell in s3".

    --Jeff Moden

  • Found a solution, in case someone needs it:

    param($BackupFolder, $S3Bucket, $s3Folder, $Filter)
    $ErrorActionPreference = 'Stop'
    <#
    $BackupFolder = 'F:\Backup'
    $S3Bucket = 'mybucket'
    $s3Folder = 's3myfolder'
    $Filter = '*.zip'
    #>

    # Get the AWS stuff
    Import-Module -Name AWSPowerShell
    # Get credentials from the persisted store
    Initialize-AWSDefaults -ProfileName MyProfile -Region MyRegion

    Get-ChildItem -Path $BackupFolder -Recurse -Filter $Filter |
    ForEach-Object {
        $filename = $_.FullName
        # Build the S3 key: strip the drive letter (the -replace regex operator is
        # needed here; String.Replace would treat the pattern literally), prepend
        # the target folder, then flip back-slashes to forward-slashes
        $S3filename = $filename -replace '^[a-z]:\\', ''
        $S3filename = $s3Folder + '/' + $S3filename
        $S3filename = $S3filename.Replace('\', '/')
        # Skip the upload if the key is already in the bucket
        if (Get-S3Object -BucketName $S3Bucket -KeyPrefix $S3filename | Where-Object { $_.Key -eq $S3filename }) {
            "File $S3filename found"
        }
        else {
            Write-S3Object -BucketName $S3Bucket -Key $S3filename -File $filename
        }
    }
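
    Run it like this (assuming the script is saved as, say, Copy-BackupsToS3.ps1; the file name is arbitrary):

    .\Copy-BackupsToS3.ps1 -BackupFolder 'F:\FullBackups' -S3Bucket 'MyBucket' -s3Folder 'MyServer' -Filter '*.bak'
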
  • Glad you got it to work and thanks for the update.
