The search for files – Part 3

This is the third part of the series and it is dedicated to the two functions that handle file signatures. They are almost identical, but I decided not to modularize them further as they are called for different purposes. You could of course argue that the duplicated code should be broken down even further.

I want to start by saying that these functions are by no means “The Definitive File Signature” functions. I’ve also decided to limit the file signature check to one signature,
just to show the method rather than trying to solve every scenario I could think of.

It’s also worth mentioning that the “world” of file signatures is not that consistent.
Or maybe I should say that there are sometimes many variants of the same type of file.
A prime example is plain text files.

If you save a .txt file from Notepad in the default ANSI format you will get one signature (really just the first characters of the text, as an ANSI file has no header).
If you save the file in the UTF-8 format you will get another signature.
So be aware that unless a file type always gets the same signature, the results may not be what you expect.
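
To see the difference for yourself, here is a minimal sketch that writes the same text twice, once as plain ASCII and once as UTF-8 with a byte order mark, and then prints the first bytes of each file as hex. The helper name Get-LeadingHex and the temp file names are made up for this illustration and are not part of the module. It also uses System.IO.File rather than QuickIO.Net, so it only works on paths within the normal length limit and reads the whole file, which is fine for a small test file but not for a real scan.

```powershell
# Hypothetical helper: return the first $Length bytes of a file as a hex string
function Get-LeadingHex {
    param([string]$Path, [int]$Length = 4)

    $bytes = [System.IO.File]::ReadAllBytes($Path)
    $count = [Math]::Min($Length, $bytes.Length)
    if ($count -eq 0) { return '' }

    $hex = -join ($bytes[0..($count - 1)] | ForEach-Object { $_.ToString('X2') })
    return $hex
}

$text      = 'Hello world'
$asciiFile = Join-Path ([System.IO.Path]::GetTempPath()) 'signature_ascii.txt'
$utf8File  = Join-Path ([System.IO.Path]::GetTempPath()) 'signature_utf8.txt'

# Same content, two encodings (the UTF8Encoding argument $true asks for a byte order mark)
[System.IO.File]::WriteAllText($asciiFile, $text, [System.Text.Encoding]::ASCII)
[System.IO.File]::WriteAllText($utf8File,  $text, (New-Object System.Text.UTF8Encoding $true))

$asciiSignature = Get-LeadingHex -Path $asciiFile -Length 4   # the first characters of the text
$utf8Signature  = Get-LeadingHex -Path $utf8File  -Length 3   # the byte order mark

"ASCII : $asciiSignature"   # ASCII : 48656C6C
"UTF-8 : $utf8Signature"    # UTF-8 : EFBBBF
```

The ASCII “signature” changes with the content of the file, while the UTF-8 one stays the same as long as the file was saved with a byte order mark.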

I would recommend looking at the following sites if you want to review the signatures of the file types you’re looking for.

http://filesignatures.net/
http://filext.com/

Test-FileSignature

Test-FileSignature is the helper function that returns true or false depending on whether the file at the supplied path matches the signature. Again we are using QuickIO.Net to read the file signature because we can’t rely on Get-Content, due to the 260 character path limitation.
The function needs the path to the file and the signature to check against.
The signature needs to be in hexadecimal if you are using this function by itself.
The reason is that all the sites that list file signatures present them in hex.
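
The idea behind the check can be sketched as follows. This is not the code from the module: the function name Test-HexSignature is hypothetical, and it reads the bytes with System.IO.File instead of QuickIO.Net, so it is still bound by the normal path length limit. It only shows how a hex string can be compared against the first bytes of a file.

```powershell
# Hypothetical stand-in for the idea behind Test-FileSignature (not the module's code)
function Test-HexSignature {
    param(
        [Parameter(Mandatory=$true)][string]$Path,
        [Parameter(Mandatory=$true)][string]$Signature   # hex, e.g. "25504446" or "25 50 44 46"
    )

    # Normalise the input: drop any spaces and use upper case
    $expected = ($Signature -replace '\s', '').ToUpperInvariant()
    if (($expected.Length -eq 0) -or (($expected.Length % 2) -ne 0)) {
        throw "The signature must contain an even number of hex digits."
    }
    $length = [int]($expected.Length / 2)

    # Read only as many bytes as the signature is long
    $buffer = New-Object byte[] $length
    $stream = [System.IO.File]::OpenRead($Path)
    try     { $read = $stream.Read($buffer, 0, $length) }
    finally { $stream.Dispose() }

    # A file shorter than the signature can never match
    if ($read -lt $length) { return $false }

    $actual = -join ($buffer | ForEach-Object { $_.ToString('X2') })
    return ($actual -eq $expected)
}

# Create a small test file that starts with the bytes of "%PDF-1"
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'signature_sample.bin'
[System.IO.File]::WriteAllBytes($sample, [byte[]](0x25, 0x50, 0x44, 0x46, 0x2D, 0x31))

$isPdf = Test-HexSignature -Path $sample -Signature '25 50 44 46'   # True
$isZip = Test-HexSignature -Path $sample -Signature '504B0304'      # False
```

Note that .NET resolves relative paths against the process working directory rather than the current Powershell location, so a full path is the safer choice here.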

Get-FileSignature

This function was really added so that it would be possible to use an existing file as a file signature example, rather than having to look the extension up on one of the sites above.

The main difference from an option perspective is that you can choose a signature length.
By default it will read the first 4 bytes of a file, but if the files you’re looking for have a longer or shorter signature you can change the number of bytes to collect from the file.

Get-FileSignature -FilePath C:\tmp\textfile.txt -SignatureLength 3

In the above example we collect the signature using the first 3 bytes rather than the default 4.
This is actually a little tip and an example of the signature variants of text files.
If you’re searching for UTF-8 encoded text files, they all seem to share the first 3 bytes (most likely the UTF-8 byte order mark, EF BB BF).

I’ll stop here with the explanations and leave it up to you to try out.

QuickIO.Net

Due to a bit of confusion in the first post about downloading QuickIO.Net, here are the methods I’ve used to download the library:

  • From within Visual Studio using the NuGet package manager.
  • By downloading the NuGet command line tool and using that to download the library.

Installation

“Installation instructions” is probably the wrong term as it’s just a Powershell module and manifest.
While using the module I’ve kept it in the standard Powershell module path, with the QuickIO library in a subfolder:

Module/Manifest: C:\Program Files\WindowsPowershell\Modules\GetFilteredFileList
QuickIO DLL:     C:\Program Files\WindowsPowershell\Modules\GetFilteredFileList\QuickIO

If you put the DLL somewhere else, use the QuickIOPath parameter to point to it.

Disclaimer:
The code/functions in this post and on this site are supplied AS IS, without any warranties or support. I assume no responsibility or liability for the use of the code/functions.

GetFilteredFileList Module and Manifest
QuickIO.Net Home Page

The search for files – Part 2

Going through the code itself line by line would be quite a lot to cover.
So I’ve decided to provide some explanations and thoughts around the usage rather than the code. However, if there are questions or suggestions on the code, please feel free to leave a comment. Again, the link to the code can be found in part 3.

Get-FilteredFileList

As I mentioned at the end of Part 1, Get-FilteredFileList is the function that you will start with. From a modularization point of view this is the main function that then calls the other functions as needed.

Note:
If you check with Get-Help on this function you’ll notice that there is a default path for the QuickIO.Net library. You can use the QuickIOPath parameter to use a different path, otherwise the functions will assume the default path.

Let’s start with a simple example.

Get-FilteredFileList -FilePath 'c:\temp' -RandomExtension 6 -Recursive

This will search recursively through the path ‘c:\temp’ and look for any file with an extension of 6 characters. The result is essentially what I started with in the original QuickIO.Net post.

Get-FilteredFileList -FilePath 'c:\temp' -SpecificExtension "aaa","ccc" -Recursive

This is the same type of search, except that we know which extensions to search for.
The SpecificExtension parameter accepts a string array of extensions.

Let’s look at a more complex example.

Get-FilteredFileList -FilePath 'c:\temp' -RandomExtension 3 -ExcludeExtension "txt","csv" -FileSignature "25504446"

In this example:

  • We are looking for files with an extension of 3 characters.
  • We exclude any files with an extension of “txt” or “csv”.
  • We check whether each matching file has the signature “25504446”.

If you want to search for any file with a specific file signature, just skip the RandomExtension/SpecificExtension parameters.

Get-FilteredFileList -FilePath 'c:\temp' -ExcludeExtension "txt","csv" -FileSignature "25504446"

I will go into more detail in regard to the file signatures in the next part.
But basically it’s the first 4 bytes of the file, which most (but not all) file types use for storing the signature.
25 50 44 46 is the signature in hex of a PDF file (the characters “%PDF”).

As you may have noticed, the ExcludeExtension parameter also accepts a string array of extensions that you want to exclude. It’s worth mentioning that if you use the exclude option, those files will not be checked against the signature. E.g. if someone renames a PDF file to TXT, it will be skipped and not found in the list (using the example above, of course).
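
The order of the checks can be illustrated with a small stand-alone sketch. It doesn’t call the module at all: the file records below are invented, and the Signature property simply stands in for the first 4 bytes that would normally be read from disk. It mirrors the example above, i.e. an extension of 3 characters, “txt” and “csv” excluded, and the PDF signature.

```powershell
# Hypothetical file records; Signature stands in for the first 4 bytes read from disk
$files = @(
    [pscustomobject]@{ Name = 'report.pdf';  Signature = '25504446' }
    [pscustomobject]@{ Name = 'renamed.txt'; Signature = '25504446' }   # a PDF renamed to .txt
    [pscustomobject]@{ Name = 'image.png';   Signature = '89504E47' }
)

$excludeExtension = 'txt', 'csv'
$fileSignature    = '25504446'

$hits = foreach ($f in $files) {
    $extension = [System.IO.Path]::GetExtension($f.Name).TrimStart('.')

    # 1. The extension has to be 3 characters long
    if ($extension.Length -ne 3) { continue }

    # 2. Excluded extensions are dropped before the signature is ever looked at
    if ($excludeExtension -contains $extension) { continue }

    # 3. Only the remaining files are compared against the signature
    if ($f.Signature -eq $fileSignature) { $f.Name }
}

$hits   # report.pdf
```

The renamed PDF has the right signature but never reaches step 3, which is why it is missing from the result.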

Search-FileExtension

This is a helper function that accepts the initial file list generated by Get-FilteredFileList.
Get-FilteredFileList calls Get-FilesQuickIO to get the files under the supplied path.
Search-FileExtension then processes that list and filters it further.
As this is a helper function you wouldn’t normally call it manually; you would use Get-FilteredFileList instead.
You can of course review the code and examples, but I won’t go into them in this post.

Test-FileMatch
The purpose of this function is simply to see if the extension matches the selected criteria.
Originally this was part of the Search-FileExtension function,
but I decided to pull that code out into a separate function to modularize the code further.

This concludes the second part of this topic.
In the third part we will look at the last two functions.

See you there.

The search for files – Part 1

This time I will go back to the topic of using the QuickIO.Net library.
You may have read my earlier post but for those who haven’t you can find it here:

Using QuickIO.Net with Powershell

I’ve decided to make this a multipart post as it would be quite large for one single post.
But let’s begin with a little background.

The original post was really about using QuickIO.Net for generic file searches,
but without some of the limitations of the built-in cmdlets in Windows/Powershell.
The main one is the limit of 260 characters in the path.
A lot of Windows/Powershell programs and cmdlets can’t deal with paths of more than 260 characters.

In the original post I mentioned my background for using QuickIO.Net,
which was that a client of mine had a crypto locker type event but the paths had more than 260 characters. So I had to come up with another solution to check the file shares for a random 6 character extension.

After that post I mentioned it to people in forums and in Facebook groups who had experienced similar events. But the original post didn’t really include any handling of the results beyond what I had to use myself, which was the check for files with a 6 character extension.

As a result there were discussions and comments around using it for other/extended scenarios.
In the comments section of the original post you can find an example of how Svein Erik solved his scenario, including automation of restoring the files that were found.

To expand on the original post I’ve created some new functions with some ideas around the processing part of the file search result. In the beginning the idea was to base this around the crypto locker type searches.
But I soon came to the conclusion, why limit this to just that particular use case?
The same functions could be used for any type of file search that you want to filter on the extension and/or signature. You can find the link to the powershell module and manifest in the third part of this series.

Anyway, the new functions that I’ve added are:

Get-FilteredFileList
Will create the list of the files that you want, this is the “orchestrator” of generating the result.
Search-FileExtension
Is a helper function to filter the contents during the list generation, based on the used settings.
Test-FileMatch
Is a helper function just to determine if the extension is a match or not.
Test-FileSignature
Will see if the signature of the file is correct depending on the signature you’ve entered in the call to Get-FilteredFileList.
Get-FileSignature
Will give you the signature from an example file that you provide the path to.

I’ll explain in more detail what these functions do in the next couple of posts.
With that I’ll end this post here, as it’s time to start looking at the functions.

See you in the next part.

Anonymous functions in Powershell

While going through a number of JavaScript tutorials there was one thing that I wondered if it existed in Powershell:

the notion of the “anonymous function”.
In Powershell this is more likely to be referred to as a “script block”.

The use case (in my opinion) is short repetitive code blocks that you want to call with different values throughout your script/function.
For anything more than that it’s probably better to just create a normal function, as it will be easier to follow and maintain.

Anyway, how do we create an “anonymous function”?
For example, let’s say you want to calculate the volume of a box in different parts of your code.
You could then create a variable holding a script block to do it.

# The variable containing the script block
$calculateVolume={param($width,$height,$depth) $width*$height*$depth}

# Calling the anonymous function using & before the variable.
&$calculateVolume 2 5 10

You can extend the script block between the curly braces just as any other function.
But then you might consider creating a normal function to “modularize” your code, rather than creating a big monolithic script/function.
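
As a small addition to the example above, a script block with a param() section can also be called with named parameters, which tends to be easier to read once there are more than a couple of values. The variable names $positional and $named are just for illustration.

```powershell
# The same script block as above
$calculateVolume = { param($width, $height, $depth) $width * $height * $depth }

# Positional call: the values are bound in the order of the param() list
$positional = & $calculateVolume 2 5 10

# Named call: the same parameters, but spelled out
$named = & $calculateVolume -width 2 -height 5 -depth 10

"Positional: $positional"   # Positional: 100
"Named     : $named"        # Named     : 100
```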

Finally I want to leave a short example of something that I’ve found useful recently.
It examines a value and determines if the value is odd or even,
and then executes some code depending on the outcome.

# The IF statement will return true if it's odd and false if it's even.
# Comparing with -ne 0 also covers negative odd values, where % returns -1.
$oddValue={param([int]$value) if($value % 2 -ne 0){$true}else{$false}}

# Some value to evaluate
$value = 5

# Calling the anonymous function in a switch statement.
switch(&$oddValue $value)
{
    $true {Write-Output "Some code if the value is odd";break}
    $false {Write-Output "Some code if the value is even";break}
}

Later on in the code when using it in the switch statement it looks much cleaner in my opinion.
Again, if you’re only going to use it once in your script/function, you may as well just use that one ‘IF’ statement.

Using QuickIO.Net with Powershell

This time around we’ll look at another .Net library, QuickIO.Net by Benjamin Abt.

The reason for this “project” was that a client of mine suffered from a Crypto Locker type event. It was caught quickly, but not before some of the millions of files on the file shares were encrypted.

The reasons for using QuickIO.Net were threefold.

  1. Get-ChildItem doesn’t support paths of more than 260 characters.
  2. I needed to do additional scripting/reporting after the list was created.
  3. The other workaround I looked at was using Robocopy and then RegEx to pull out the desired information. This produced issues with file names containing any of the Nordic special characters, e.g. åÅ, äÄ, öÖ.

The third point may not have been a big issue if it were one or two files, but this was a scan of millions of files.

QuickIO.Net was very useful as a workaround for the 260 character limit.
I also thought this was a much cleaner solution compared to the Robocopy/RegEx solution.

The client also needed to check which files were changed at the time of the incident, and only files with an extension six characters long. The list could then be exported to a .csv file or used for further scripting, e.g. to move files to a quarantine location and so forth.

To use QuickIO.Net you need to download it from NuGet; a link is provided at the bottom of the page. It’s worth noting that the QuickIO library can do much more than what is detailed in this post.

The method that searches through the directories takes three parameters:
the path to start from, a pattern filter using standard wildcards (e.g. “*.doc”) and whether it should look through all the subdirectories or not.

The whole script is available via the links at the bottom of the page.
Here is the initial code, ending with the line that loads the .Net library.
You may want to change the path or even create a parameter for it if you want.

function Get-FilesQuickIO
{
    [CmdletBinding()]
    Param
    (
        # Path to start from
        [Parameter(Mandatory=$true,ValueFromPipelineByPropertyName=$true)]
        [string]$FilePath,

        # To search recursively
        [Parameter(Mandatory=$false,ValueFromPipelineByPropertyName=$true)]
        [switch]$Recursive,

        # To filter
        [Parameter(Mandatory=$true,ValueFromPipelineByPropertyName=$true)]
        [string]$Filter
    )

    # Load the QuickIO assembly (the relative path is resolved from the current directory)
    Add-Type -Path .\QuickIO\SchwabenCode.QuickIO.dll

I created some logic to handle the “Recursive” switch as the search option is named differently when calling the library. It also seems that if you use a filter and recursive at the same time the “EnumerateFiles” method will not do it recursively.
As a workaround I also created some logic to check if the pattern contains “*” or “*.*”.
If not then we first grab all the directories and then loop through those directories one by one.
The reason for not just using the directory loop and skipping the extra logic is that it takes about twice the time.
Actually it’s faster to get all the files and then use a pipe to “where” and filter it that way instead.
But for the sake of completeness I include the logic to handle the filter option.

    # Initiate the file list
    $fileList = @()

    # Set search option based on recursive or not
    if(($Recursive -eq $true) -and (($Filter -eq "*") -or ($Filter -eq "*.*")))
        {
        # Set the search option value
        $searchOption = "AllDirectories"

        # Get the list of files
        $fileList = [SchwabenCode.QuickIO.QuickIODirectory]::EnumerateFiles([system.string]$FilePath,[system.string]$Filter,[System.IO.SearchOption]$searchOption)
        }
    elseif ($Recursive -eq $true)
        {
        # Set the search option value
        $searchOption = "AllDirectories"

        # Set the directory pattern
        $directoryPattern = "*"

        # Get all the directories as the recursive option doesn't work when using a filter.
        $directoryList = [SchwabenCode.QuickIO.QuickIODirectory]::EnumerateDirectories([system.string]$FilePath,[system.string]$directoryPattern,[System.IO.SearchOption]$searchOption) | select -ExpandProperty FullName

        # Loop through the directories with the file pattern
        foreach($d in $directoryList)
            {
            $fileList += try{[SchwabenCode.QuickIO.QuickIODirectory]::EnumerateFiles([system.string]$d,[system.string]$Filter,[System.IO.SearchOption]$searchOption)}catch{$null}
            }
        }
    else{
        # Set the search option value
        $searchOption = "TopDirectoryOnly"

        # Get the list of files
        $fileList = try{[SchwabenCode.QuickIO.QuickIODirectory]::EnumerateFiles([system.string]$FilePath,[system.string]$Filter,[System.IO.SearchOption]$searchOption)}catch{$null}
        }
    return $fileList
}

As you may have noticed, I’m not really dealing with any exceptions in this example; the try/catch blocks simply swallow the error and return $null.
You may/should handle them properly of course, but as this is example code I’ll leave that up to you.

Finally, to solve the six character extension check I created some code outside the function.
I could’ve included it in the code above or created another function for it, but this was kind of a one-off, hopefully.

    $result = @()    

    foreach($f in $fileList)
        {
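        # GetExtension returns "" for names without a dot, so those never match
        $extension = [System.IO.Path]::GetExtension($f.Name).TrimStart(".")

        # Anchored pattern: the extension must be exactly six characters long
        if($extension -match '^.{6}$')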
            {
            $result += $f
            }
        }

The result from QuickIO.Net will include the full path, file name, size, dates and so forth.
You can then easily use standard powershell filters like “Where” or “Select” to pull the information you want.
E.g. in the scenario described above we used a Where statement on the returned result to just select the files changed on that particular date.
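
As a rough sketch of that last step, the example below filters a list on a particular date. The objects are hypothetical stand-ins for the entries in $result, the property names FullName and LastWriteTime are assumptions about what the returned objects expose, and the paths and dates are invented.

```powershell
# Hypothetical stand-ins for the entries in $result; property names are assumptions
$result = @(
    [pscustomobject]@{ FullName = 'D:\share\docs\budget.abcdef';  LastWriteTime = [datetime]'2016-03-14 09:15' }
    [pscustomobject]@{ FullName = 'D:\share\docs\minutes.ghijkl'; LastWriteTime = [datetime]'2016-03-14 09:17' }
    [pscustomobject]@{ FullName = 'D:\share\old\archive.backup';  LastWriteTime = [datetime]'2016-03-10 16:40' }
)

# The date of the incident (made up for this example)
$incidentDate = [datetime]'2016-03-14'

# Keep only the files that were changed on that date, regardless of the time of day
$changed = $result | Where-Object { $_.LastWriteTime.Date -eq $incidentDate }

$changed | Select-Object FullName, LastWriteTime

# The filtered list could then be exported, e.g.:
# $changed | Export-Csv -Path 'C:\tmp\changed_files.csv' -NoTypeInformation
```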

I hope you found this useful. Again, the library is capable of a lot more if you need it for other purposes.

Script source code
QuickIO.Net

Creating Excel files with Powershell and EPPlus

One of the great things with Powershell is that it’s a layer on top of the .Net framework. It may not be obvious at first, but it’s kind of a “hybrid” of a command line shell and, for lack of a better analogy, a cousin to C#.

What it means for those of us who use Powershell is that we can extend the capabilities of our scripts and modules beyond the already large collection of cmdlets in Powershell, by using built-in or third party .Net libraries.

In this blog post we are going to look at creating Excel ‘.xlsx’ files without installing Excel. This is quite handy if you need to create e.g. automated reports as Excel files on a system where you don’t want to install Office.

To achieve this we are going to use the downloadable and free .NET library “EPPlus” for the creation of the Excel file and content.

A couple of things worth noting before we begin.

  • This post was made using Windows Server 2012 R2, Powershell version 4 and the DotNet4 version of the EPPlus library.
  • The post is going to show you how to create the Excel file and add content. The EPPlus library is capable of much more, but covering that would require a much longer post, so this post will concentrate on the basics to get you going.
  • If you don’t want to know how to do it from scratch there is an (as far as I can tell) excellent module in the PS Gallery by Douglas Finke called “ImportExcel” that lets you use standard Powershell syntax. ImportExcel also utilizes EPPlus to create the files.

With that said you may ask yourself, why go through all this if there is an available module already?

Good question, from the last project I was involved in these were some of my reasons.

  • Some features were not implemented yet unless you used the Export-Excel function, e.g. the autofit of the column width.
  • The spreadsheets I was going to create were quite specific, so I didn’t need to deal with “generic” data input.
  • From a code perspective, I wanted the code to be as condensed as possible and not mix and match parts from a separate module with direct calls to EPPlus.

Your mileage may vary of course, but those were some of the reasons that made me look into EPPlus in more detail.

If you want to follow the code in a text-version then I’ve created a link to the sample code on GitHub/Gist (EPPlus_Sample.ps1), you can find the links at the bottom of this post.

Before we can start we need to download the EPPlus library, the link to the library is available at the bottom of this post.

Note: You may need to unblock the files, otherwise you’ll get an error when you try to add the .dll to the powershell session.

But let’s start looking at the interesting stuff, the code. I’ll be going through this from top to bottom. Again, if you want to follow this in a “text” format then you have the link to the sample code.

epplus_sample

  1. First we need to add the EPPlus library to our session. This is where you’ll get an error if the .dll is blocked.
  2. Then we create a path to the .xlsx that we want to create and initiate the “Excel Package”. The Excel Package has “overloaded” methods for the initiation, but for this example we reference a file path.
  3. Next we create a reference to the workbook of the package; this way it will be cleaner to reference the underlying properties.
  4. In the next section we add an Author and Title to the Excel file. This is optional.
  5. Next we add the first worksheet. If you want more than one sheet in your Excel file you can create more by using the same syntax. The pipe to Out-Null isn’t really needed but will suppress the output to the console if you don’t want to see it.
  6. Next we add a reference to the specific worksheet that we just created. You reference the worksheet by its index number, and the index starts from 1.
  7. Next we add a value to the first cell in the first column.
  8. The next section adds a solid red background color. As you can see we reference the cells A1 to E1.
  9. Once you’ve filled the sheet(s) with information it might be a good time to autofit the columns. It’s good to do this at the end of the file creation, since anything you add later on will not have been part of a previous autofit.
  10. Next you find an example of setting the column width to a specific size. The columns are referenced by number: 1=A, 2=B and so forth.
  11. Next we save the file, and at the very end we clean up any references to the package.
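
For those reading this without the screenshot, the steps above can be sketched roughly as follows. This is a reconstruction from the list and not the original sample code: the function name New-SampleWorkbook, its parameters, the sheet name and the cell values are made up for this illustration. It assumes Windows Powershell with the DotNet4 version of EPPlus, where the worksheet index starts from 1.

```powershell
# Hypothetical wrapper around the 11 steps; names and values are illustrative only
function New-SampleWorkbook {
    param(
        [Parameter(Mandatory=$true)][string]$EpplusPath,   # full path to EPPlus.dll
        [Parameter(Mandatory=$true)][string]$OutputPath    # full path of the .xlsx to create
    )

    # 1. Load the library (this fails if the .dll is still blocked)
    Add-Type -Path $EpplusPath
    Add-Type -AssemblyName System.Drawing   # needed for the color in step 8

    # 2. Initiate the Excel Package from a file path
    $fileInfo = New-Object System.IO.FileInfo $OutputPath
    $package  = New-Object OfficeOpenXml.ExcelPackage -ArgumentList $fileInfo

    try {
        # 3. Reference to the workbook
        $workbook = $package.Workbook

        # 4. Optional document properties
        $workbook.Properties.Author = 'Author name'
        $workbook.Properties.Title  = 'EPPlus sample'

        # 5. Add the first worksheet and suppress the console output
        $workbook.Worksheets.Add('Sheet1') | Out-Null

        # 6. Reference the worksheet by index (starts from 1 in this EPPlus version)
        $sheet = $workbook.Worksheets[1]

        # 7. Value in the first cell of the first column
        $sheet.Cells['A1'].Value = 'Hello from EPPlus'

        # 8. Solid red background for A1 to E1
        $sheet.Cells['A1:E1'].Style.Fill.PatternType = 'Solid'
        $sheet.Cells['A1:E1'].Style.Fill.BackgroundColor.SetColor([System.Drawing.Color]::Red)

        # 9. Autofit the columns that contain data
        $sheet.Cells[$sheet.Dimension.Address].AutoFitColumns()

        # 10. Set a specific width on column B (1=A, 2=B and so forth)
        $sheet.Column(2).Width = 30

        # 11. Save the file
        $package.Save()
    }
    finally {
        # 11 (continued). Clean up the reference to the package
        $package.Dispose()
    }
}
```

Calling it would look something like New-SampleWorkbook -EpplusPath 'C:\tmp\EPPlus.dll' -OutputPath 'C:\tmp\sample.xlsx', where both paths are placeholders for wherever you keep the library and want the file.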

The result should look something like this:

result

I never promised it would look pretty 🙂

But what if you need to edit an Excel file that already exists?
It’s very similar to what we did before, you can find the sample code at the bottom of the post.

epplus_sample2

The only difference from what we did before is:

  • We don’t add a new sheet
  • We get the value from cell A1 and add more information to it.
  • Then change the value of cell A1 to the new value.
  • I also replaced the ugly red color of the first row to something more pleasing to the eye.
  • And finally added some information in the A2 cell.

It should now look like this:

result2

To conclude this post, the basic structure to create an Excel file is not that complicated once you’ve seen the structure and initiation. The references to cells are the same as if you were using Excel, e.g. you can reference multiple cells using the standard range syntax like A1:A5.

You can add dropdown lists and add validation (may become a future post) and many more features that are available from the EPPlus library.

Finally, I hope you’ve found this information useful and I wish you good luck with your next Excel automation project.

Reference Links:

EPPlus Library
ImportExcel – Douglas Finke
EPPlus_Sample.ps1
EPPlus_Sample2.ps1