Monday, July 10, 2017

Using PowerShell to find duplicate IDs in serialized items

On my current project I was troubleshooting a failing integration test that makes use of the Sitecore FakeDB Serialization module, which can declaratively load a branch of content directly from the file system. The setup method for this test was reporting a duplicate key exception, so I used PowerShell to identify the culprit.
The first step was to extract the IDs from the serialized *.item files. In the PowerShell prompt, I navigated to the root of the TDS directory and used this command to get at the item IDs:

>Get-ChildItem -recurse | Select-String '^id:' | More

This looked like it was giving me the desired raw output:

But I wasn't 100% sure what was what. For example, what was the "3" doing before the ":id:"?  To get a closer look at the data returned, I piped this into Format-Table:

>Get-ChildItem -recurse | Select-String '^id:' | Format-Table | More
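Another way to see what Select-String returns is Get-Member, which lists every property on the objects in the pipeline. The snippet below is only a sketch: the sample file name and its contents are made up for illustration and just mimic the first lines of a serialized item.

```powershell
# Create a throwaway file that mimics the first lines of a serialized item
# (hypothetical content, for illustration only).
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'sample.item'
Set-Content -Path $sample -Value '----item----', 'version: 1', 'id: {11111111-1111-1111-1111-111111111111}'

# Select-String emits MatchInfo objects; Get-Member lists their properties.
$match = Select-String -Path $sample -Pattern '^id:'
$match | Get-Member -MemberType Property

# The number printed before ":id:" in the default output is the LineNumber property.
$match | Select-Object Path, LineNumber, Line
```

In this sample the "id:" line is the third line of the file, which is where the "3" in the default output comes from.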

The Format-Table output shows which fields I have to work with. To identify my duplicates, I needed to group by the "Line" field, which contains the Sitecore ID, and find the groups with a count greater than 1. A quick Google search turned up an article on how to do Group operations in PowerShell. So now I had this:

>Get-ChildItem -recurse | Select-String '^id:' | Group Line | More

Following the example of the article I cited above, I used this to identify the duplicates:

Get-ChildItem -recurse | Select-String '^id:' | Group Line | Sort Count -Descending | Select -First 5

In my case I saw several IDs with a count of two, so this command gave me the information I needed.  A more universal approach would be to filter the results to counts of two or above, which you can do with this:

Get-ChildItem -recurse | Select-String '^id:' | Group Line | Where {$_.Count -gt 1}

Or with PowerShell 3.0 and up, you can get rid of the curly braces and the $_ variable:

Get-ChildItem -recurse | Select-String '^id:' | Group Line | Where Count -gt 1
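Knowing that an ID is duplicated is only half the job; you also need to know which files contain it. Here is a minimal sketch of one way to get that, assuming a scratch folder with three hypothetical .item files (the file names and IDs are invented). It also adds -Filter so that only *.item files are searched, which the commands above do not do.

```powershell
# Build a scratch folder with three hypothetical .item files, two sharing an ID.
$root = Join-Path ([System.IO.Path]::GetTempPath()) ('dupe-demo-' + [guid]::NewGuid())
New-Item -ItemType Directory -Path $root | Out-Null
$idA = 'id: {11111111-1111-1111-1111-111111111111}'
$idB = 'id: {22222222-2222-2222-2222-222222222222}'
Set-Content -Path (Join-Path $root 'Home.item')     -Value '----item----', 'version: 1', $idA
Set-Content -Path (Join-Path $root 'About.item')    -Value '----item----', 'version: 1', $idB
Set-Content -Path (Join-Path $root 'HomeCopy.item') -Value '----item----', 'version: 1', $idA

# Same pipeline as above, limited to *.item files, keeping only repeated IDs.
$duplicates = Get-ChildItem -Path $root -Recurse -Filter '*.item' |
    Select-String -Pattern '^id:' |
    Group-Object Line |
    Where-Object { $_.Count -gt 1 }

# Each group keeps its original MatchInfo objects in the Group property,
# so the path of every file that holds the duplicated ID is still available.
$duplicates | ForEach-Object { $_.Group } | Select-Object Line, Path
```

Against a real TDS directory you would drop the setup lines and run the pipeline from the project root.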

I should also mention that most of the commands above have shorter aliases, which speed typing at the cost of legibility: gci for Get-ChildItem, sls for Select-String, group for Group-Object, and ? for Where-Object.

So the search could have been written as below:

gci -recurse | sls '^id:' | group line | ? count -gt 1
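If you are not sure which cmdlet a short name stands for, the built-in Get-Alias cmdlet can tell you. As a quick sketch, this lists every alias defined for the four cmdlets used in this post (the exact list varies by PowerShell version and platform):

```powershell
# List every alias that resolves to the cmdlets used in this post.
Get-Alias -Definition Get-ChildItem, Select-String, Group-Object, Where-Object |
    Sort-Object Definition, Name |
    Format-Table Name, Definition -AutoSize
```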

In addition to Format-Table, Format-List (which writes out each property of each object returned) and Format-Wide (which writes out a single property of each object, in a multi-column format) are useful as you explore how your query is working. Finally, Out-GridView sends results to a window that allows sorting, filtering, and selecting columns.

These articles were helpful as I figured out how to query with Powershell:

And to learn about FakeDB's Serialization feature:
