Blog Post

PowerShell: Using Exponents and Logs to Format Byte Sizes

,

In a recent Scripting Games event, I had to format some byte values into their most appropriate size according to how many bytes were present. Anyone who hangs around with me long enough will realize I love numbers and the preciseness that they provide. This is one of many reasons why I enjoy computer programming and especially working with SQL Server. Even more than my love of numbers is my hatred toward duplicating work.

This is one of my longer posts, but I promise some neat stuff if you stick with me on this…

When I started to code for the event, I headed down the path of using the PowerShell switch statement. I had never known that you could use an expression to evaluate a switch – so that was something really cool. This is the function I initially came up with (all the commenting and help removed for brevity):

switch ($a)
{
#each step in the switch increases by a multiplier of 1024
{$_ -lt 1024} {"{0:N2} Bytes" -f ($a)}
{
$_ -ge 1024 -and $_ -lt (1048576) } {"{0:N2} KiloBytes" -f ($a/1024)}
{
$_ -ge 1048576 -and $_ -lt (1073741824) } {"{0:N2} MegaBytes" -f ($a/1048576)}
{
$_ -ge 1073741824 -and $_ -lt (1099511627776) } {
"{0:N2} GigaBytes" -f ($a/1073741824)}

#Stop @ TeraBytes, could be more than 1024 TeraBytes, but that is acceptable
default {{"{0:N2} TeraBytes" -f ($a/1099511627776)}}

}

This was a workable function, but it didn’t feel right. I had to do a lot of typing and we all know we only have so many keystrokes in our lives, so I stepped back and re-evaluated.

At each expression, I was looking at a multiplier of 1024. Like the veil was removed, I realized that these were exponents. Time for the .NET Math Class (pun intended). Among some of the other cool methods of this class is the Pow method. No, we’re not in the original Batman series fighting the super villain, it’s the equivalent of the T-SQL POWER function. I soon had a simple function and all those large nasty numbers replaced with their exponential equivalents. Here is version two:

function Get-Power
{

param([double]$RaiseMe)

[
Math]::Pow(1024, $RaiseMe)

}

switch ($a)
{
#each step in the switch increases by a multiplier of 1024
{$_ -lt (Get-Power 1)} {"{0:N2} Bytes" -f ($a)}
{
$_ -ge (Get-Power 1) -and $_ -lt (Get-Power 2) } {
"{0:N2} KiloBytes" -f ($a/(Get-Power 1))}

{
$_ -ge (Get-Power 2) -and $_ -lt (Get-Power 3) } {
"{0:N2} MegaBytes" -f ($a/(Get-Power 2))}

{
$_ -ge (Get-Power 3) -and $_ -lt (Get-Power 4) } {
"{0:N2} GigaBytes" -f ($a/(Get-Power 3))}

#Stop @ TeraBytes, could be more than 1024 TeraBytes, but that is acceptable
default {"{0:N2} TeraBytes" -f ($a/(Get-Power 4))}

}

This worked the same and it was a little cleaner, but it still bothered me. There was a lot of copied and pasted code in there and it was all to determine what text to slap on the end of a quotient. I once again stepped back and realized that it was the switch’s fault. I then made it my goal to eliminate the switch.

In the process, I remembered that 1024 is 2^10… this meant that 1024^2 was 2^20, 1024^3 was 2^30 and so on. Believe it or not, this made my path clear.

A quick Algebra refresher – exponents and logs are functionally the same things, just expressed in a way that makes each valuable for different circumstances. For instance, 2^10=1024 is the functional equivalent of LOG_2(1024) = 10. For our purposes, we will want to know the power of 2 each number being provided is. To find the unknown “power” value in an equation, (the 10 above) the log base is technically irrelevant. With that in mind, we can use the natural logarithm (ln) to determine what power of 2 any number represents.

2^x = 1024 bytes
ln(2^x) = ln(1024 bytes)
ln(2)x = ln(1024 bytes)
x = ln(1024) / ln(2)
x = 6.93 / 0.693
x = 10

Before you think, that’s great, thanks for the math lesson and stop reading - this information really does have some decent uses in PowerShell. Let’s see how this math can be applied to make my function above re-usable and easily maintainable code.

The Math class has a method named Log . When only provided with a single value, this function will use “e” as the base – the natural algorithm which is the functional equivalent of “ln” used above. Armed with this function, we can determine which power of 2 any particular number is. More importantly, this information can eventually be used to format any size of number appropriately.

In the code below, I use PowerShell to extract the closest whole power of 2 that makes up the $byte value by implementing the Floor method.

$PowerOfTwo = [Math]::Floor([Math]::Log($byte)/[Math]::Log(2))

If I now create an array of my “descriptors”, I can use this text to be added to the end of each of the numbers to make it pretty.

$ByteDescriptors = ("B", "KB", "MB", "GB", "TB", "PB", 
"EB", "ZB", "YB", "WYGTMSB")

With the array prepared, the $PowerOfTwo variable can be divided by 10 (and Floored) to provide the index into the descriptor array.

$DescriptorID = [Math]::Floor($PowerOfTwo/10)

Finally, we use the format method to combine all this information into an output. The $Scale variable is set in the script to be 2. Not only is the $DescriptorID used to determine the description, it is also used as a power of 2 in the divisor with the total byte value as the dividend.

Write-Output ("{0:N$Scale} $($ByteDescriptors[$DescriptorID])" -f (
$byte / [Math]::Pow(2, ($DescriptorID*10))))

I’ve included the full function below which includes all of these pieces as well as full comments and a few extra parameters. While I may have been able to use the first function, the flexibility that the most recent iteration of this script provides seems worth the effort. Not only do I think this is a neat function, I now have an answer for my kids when they ask “When am I ever going to use this stuff in real life?” 🙂

function Format-Byte
{
<#
.SYNOPSIS
Formats a number into the appropriate byte display format.

.DESCRIPTION
Uses the powers of 2 to determine what the appropriate byte descriptor
should be and reduces the number to that appropriate descriptor.

The LongDescriptor switch will switch from the generic "KB, MB, etc."
to "KiloBytes, MegaBytes, etc."

Returns valid values from byte (2^0) through YottaByte (2^80).

.PARAMETER ByteValue
Required double value that represents the number of bytes to convert.

This value must be greater than or equal to zero or the function will error.

This value can be passed as a positional, named, or pipeline parameter.

.PARAMETER LongDescriptor
Optional switch parameter that can be used to specify long names for
byte descriptors (KiloBytes, MegaBytes, etc.) as compared to the default
(KB, MB, etc.) Changes no other functionality.

.PARAMETER Scale
Optional parameter that specifies how many numbers to display after
the decimal place.

The default value for this parameter is 2.

.EXAMPLE
Format-Byte 123456789.123

Uses the positional parameter and returns returns "117.74 MB"

.EXAMPLE
Format-Byte -ByteValue 123456789123 -Scale 0

Uses the named parameter and specifies a Scale of 0 (whole numbers).

Returns "115 GB"

.EXAMPLE
Format-Byte -ByteValue 123456789123 -LongDescriptor -Scale 4

Uses the named parameter and specifies a scale of 4 (4 numbers after the
decimal)

Returns "114.9781 GigaBytes"

.EXAMPLE
(Get-ChildItem "E:\KyleScripts")|ForEach-Object{$_.Length}|Format-Byte

Passes an array of the sizes of all the files in the E:\KyleScripts folder
through the pipeline.

.NOTES
Author: Kyle Neier
Blog: http://sqldbamusings.blogspot.com
Twitter: Kyle_Neier

Because of the 14 significant digit issue, anything nearing 2^90
will be marked as WYGTMSB aka WheredYouGetThatMuchStorageBytes. If you
have that much storage, feel free to find a different function and or
travel back in time a hundred years years or so and slap me...

#>

[CmdletBinding()]

param(
[parameter(
Mandatory
=$true,
Position
=0,
ValueFromPipeline
= $true

)]
#make certain value won't break script
[ValidateScript({$_ -ge 0 -and
$_ -le ([Math]::Pow(2, 90))})]
[
double[]]$ByteValue,

[parameter(
Mandatory
=$false,
Position
=1,
ValueFromPipeline
= $false

)]
[
switch]$LongDescriptor,

[parameter(
Mandatory
=$false,
Position
=2,
ValueFromPipeline
= $false

)]
[ValidateRange(
0,10)]
[
int]$Scale = 2

)

#2^10 = KB, 2^20 = MB, 2^30=GB...
begin
{
if($LongDescriptor)
{
Write-Verbose "LongDescriptor specified, using longer names."
$ByteDescriptors = ("Bytes", "KiloBytes", "MegaBytes", "GigaBytes",
"TeraBytes", "PetaBytes", "ExaBytes", "ZettaBytes",
"YottaBytes", "WheredYouGetThatMuchStorageBytes")
}
else
{
Write-Verbose "LongDescriptor not specified, using short names."
$ByteDescriptors = ("B", "KB", "MB", "GB", "TB", "PB",
"EB", "ZB", "YB", "WYGTMSB")
}
}

process
{
foreach($byte in $ByteValue)
{
#Determine which power of 2 this value is based from
Write-Verbose "Determine which power of 2 the byte is based from."
$PowerOfTwo = [Math]::Floor([Math]::Log($byte)/[Math]::Log(2))

#Determine position in descriptor array for the text value
Write-Verbose "Determine position in descriptor array."
$DescriptorID = [Math]::Floor($PowerOfTwo/10)

#Determine appropriate number by rolling back up through powers of 2
#format number with appropriate descriptor
Write-Verbose ("Return the appropriate number with appropriate "+
"scale and appropriate desciptor back to caller.")

Write-Output ("{0:N$Scale} $($ByteDescriptors[$DescriptorID])" -f (
$byte / [Math]::Pow(2, ($DescriptorID*10))))

}
}
}

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating