Removing EXO litigation holds and making your mailbox functional again

Recently I had a situation with a user who had a shared mailbox containing a ludicrous number of items. He had recently started in the role and wanted to “start fresh” – but items could not be deleted.

A quick investigation found that litigation hold had been enabled for the mailbox (in addition to our org-wide retention policies). It was unclear why the litigation hold had been enabled – and the reasons were lost to staff turnover.

Finding and fixing the issue included:

  • get-mailbox <identity> | fl *hold*
    • This command shows the status of all holds on that specific mailbox. In my case, I could see that “LitigationHoldDate” and “DelayHoldApplied” were populated
    • In order to remove these I ran
      • get-mailbox <identity> | Set-Mailbox -LitigationHoldEnabled $false
      • get-mailbox <identity> | Set-Mailbox -RemoveDelayHoldApplied
  • After running these steps – I was still not able to delete items from the mailbox, so I ran
    • get-mailboxFolderStatistics <identity> -FolderScope RecoverableItems | FL Name,FolderAndSubfolderSize,ItemsInFolderAndSubfolders
    • and could see that Recoverable Items and Purges were both at 100GB – meaning that the quota, which also applies to deleted items, was full – so I could not yet delete anything more
    • To speed up the Managed Folder Assistant doing its job, run
      • Start-ManagedFolderAssistant -Identity <identity> -FullCrawl -HoldCleanup
    • After some time, if you re-run the get-mailboxFolderStatistics command, you should see Recoverable Items and Purges start to come down
  • Since this mailbox was full and receives a very high volume of new mail – the 100GB limit was going to be hit again very quickly – so to mitigate that for this initial cleanup, I then set
    • Set-Mailbox -Identity <identity> -RetainDeletedItemsFor 1
    • This will only retain deleted items in the dumpster for 1 day before purging
    • I will set this back to 30 days once the initial cleanup is complete and the mailbox is back to “normal” operation. (A consolidated sketch of the whole sequence is below.)
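Pulling those steps together – a minimal sketch of the whole sequence (the mailbox identity is a placeholder, and the 30-day reset at the end is something to run only once the cleanup is finished):

$mbx = "<identity>"   # placeholder – the shared mailbox in question

# Show all hold-related properties
Get-Mailbox $mbx | Format-List *hold*

# Remove the litigation hold and the delay hold
Set-Mailbox $mbx -LitigationHoldEnabled $false
Set-Mailbox $mbx -RemoveDelayHoldApplied

# Kick the Managed Folder Assistant so the dumpster cleanup starts sooner
Start-ManagedFolderAssistant -Identity $mbx -FullCrawl -HoldCleanup

# Watch Recoverable Items / Purges come down
Get-MailboxFolderStatistics $mbx -FolderScope RecoverableItems | Format-List Name,FolderAndSubfolderSize,ItemsInFolderAndSubfolders

# Shorten deleted item retention for the initial cleanup only
Set-Mailbox $mbx -RetainDeletedItemsFor 1

# ...and set it back to 30 days once the cleanup is done
# Set-Mailbox $mbx -RetainDeletedItemsFor 30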

 

CrowdStrike BSOD and GPO no longer updating

After the CrowdStrike BSODs on 19/07/2024 – we have seen a significant uptick in clients not refreshing group policy.

The machines in question can be identified via:

  • The last update file date on C:\Windows\System32\GroupPolicy\Machine\registry.pol being on or around 19/07/2024 (some were on the 20th or 21st for us)
  • Event ID 1096 in the system event log with a line similar to “The processing of Group Policy failed. Windows could not apply the registry-based policy settings for the Group Policy object LocalGPO. Group Policy settings will not be resolved until this event is resolved. View the event details for more information on the file name and path that caused the failure”

The fix itself is very simple: delete the file C:\Windows\System32\GroupPolicy\Machine\registry.pol… but in an environment which does not have SCCM on all endpoints (which is incredibly frustrating), the following can be used to identify the machines suffering from the issue. The script also checks for setup log event ID 1015 – indicating Windows component store corruption… far less common – but we’ve also had some of that (although I’m less inclined to think this is CrowdStrike related and more just poor maintenance of machines).

Obviously you could also add code to delete the file when found – but at this point, I just needed to identify the affected machines. A sketch of how that deletion could look follows the script below.


# Define the path to the input file containing the list of machines
$inputFilePath = "<path to txt file with computer list – could also run against AD if you wanted>"

# Define the output file to store the results
$outputFilePath = "<outputpath>\results.csv"

# Import the list of machines from the text file
$machines = Get-Content -Path $inputFilePath

# Initialize an array to hold the results
$results = @()

foreach ($machine in $machines) {
    # Trim any leading/trailing whitespace
    $machine = $machine.Trim()

    # Ping the machine to check if it's online
    if (Test-Connection -ComputerName $machine -Count 1 -Quiet) {
        Write-Host "$machine is online."

        # Define the path of the file to check
        $filePath = "\\$machine\C$\Windows\System32\GroupPolicy\Machine\registry.pol"

        # Check if the file exists and get the last write time
        if (Test-Path -Path $filePath) {
            $fileDate = (Get-Item -Path $filePath).LastWriteTime
            Write-Host "File found on $machine. Last modified on: $fileDate."
        } else {
            Write-Host "File not found on $machine."
            $fileDate = $null
        }

        # Check for Event ID 1096 in the System log within the last 7 days
        $event1096 = Get-WinEvent -ComputerName $machine -FilterHashtable @{LogName='System'; Id=1096; StartTime=(Get-Date).AddDays(-7)} -ErrorAction SilentlyContinue

        # Check for Event ID 1015 in the Setup log within the last 7 days
        $event1015 = Get-WinEvent -ComputerName $machine -FilterHashtable @{LogName='Setup'; Id=1015; StartTime=(Get-Date).AddDays(-7)} -ErrorAction SilentlyContinue

        # Determine the status of the events
        $event1096Status = if ($event1096) { "Event 1096 Found" } else { "Event 1096 Not Found" }
        $event1015Status = if ($event1015) { "Event 1015 Found" } else { "Event 1015 Not Found" }

        # Add the result to the array
        $results += [PSCustomObject]@{
            Machine   = $machine
            Online    = $true
            FileDate  = $fileDate
            Event1096 = $event1096Status
            Event1015 = $event1015Status
        }
    } else {
        Write-Host "$machine is offline."

        # Add the result to the array (same columns as above so the CSV stays consistent)
        $results += [PSCustomObject]@{
            Machine   = $machine
            Online    = $false
            FileDate  = $null
            Event1096 = "N/A"
            Event1015 = "N/A"
        }
    }
}

# Export the results to a CSV file
$results | Export-Csv -Path $outputFilePath -NoTypeInformation

Write-Host "Results have been saved to $outputFilePath."
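If you did want the script to remediate as well as report, a minimal (and untested – treat it as a sketch) addition inside the branch where the file is found could look like this; the cut-off date is an assumption based on when the BSODs occurred, and Invoke-Command obviously needs WinRM available on the target:

# Hypothetical remediation: only touch machines whose registry.pol hasn't changed since the BSOD window
if ($fileDate -and $fileDate -lt (Get-Date '2024-07-22')) {
    Write-Host "Deleting stale registry.pol on $machine."
    Remove-Item -Path $filePath -Force

    # Force a group policy refresh so the file gets rebuilt
    Invoke-Command -ComputerName $machine -ScriptBlock { gpupdate /force } -ErrorAction SilentlyContinue
}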

 

RDS Session hosts don’t accept new connections after reboot – despite not being in drain mode

Recently – I have had the following scenario:

RDS farm with 2 x DNS-RR brokers and approx 30 session hosts, all Server 2022.

Some session hosts, seemingly randomly, will look OK after a reboot but won’t accept any connections.

Cycling the session host between “don’t accept new connections” and “accept new connections” would bring the server “back”.

After looking through the logs and posting on a few forums (and getting some exceedingly poor responses) – I reached a point where I knew I could implement a “hack” – but would prefer to find the root cause.

To that end, I engaged Andy from https://purerds.org/ – who I’d previously worked with and who seems to have that “next level” RDS knowledge – partially for a sanity check that I hadn’t missed something, and partially in the hope that he had seen this before.

The guts of it is:

  • In SQL – the RDSHProperty table shows a “drainmode” of “0” for all servers – so the servers not accepting connections is not something the brokers recognise (as we expected)
  • In SQL – the TargetProperty table shows a “serverMaxActiveSessions” of “10000” for all servers – in line with the above
  • In the log “Microsoft-Windows-TerminalServices-SessionBroker-Client/Operational”
    • We can see the server leave the farm @ 1am (reboot time) with
      • Event Id 1283
      • Remote Desktop Services successfully left a farm on the Connection Broker server <broker1>;<broker2>
    • But no corresponding entry to re-join the farm (unlike the healthy servers)
  • If I restart the service “TermService” on the local server, I get the following events (as expected – but just for the sake of documenting things)
    • EventId 1280 – Remote Desktop Services failed to join the Connection Broker on server <broker1>;<broker2>. Error: Current async message was dropped by async dispatcher, because there is a new message which will override the current one.
    • EventId 1281 – Remote Desktop Services successfully joined a farm on the Connection Broker server <broker1>;<broker2>

In the end, we were unable to find a root cause, so I ended up using the following PowerShell script as a scheduled task on each session host:

$LogPath = "C:\Windows\Temp"
$LogName = "RDSRestartOnJoinFail.log"
$startTime = (Get-Date).AddHours(-12)

#Logging
Function Write-Log {
    [CmdletBinding()]
    param(
        [Parameter()] [ValidateNotNullOrEmpty()] [string]$Message,

        [Parameter()] [ValidateNotNullOrEmpty()] [ValidateSet('Information','Warning','Error','Success','Break')] [string]$Severity = 'Information',

        [Parameter()] [ValidateNotNullOrEmpty()] [ValidateSet('Console','LogFile','Both')] [string]$LogType = 'Console'
    )

    $Date = (Get-Date).ToString("yyyy/MM/dd HH:mm:ss")
    $LogString = $Date + ", " + $Severity + ", " + $Message
    If ($LogType -eq "Console" -or $LogType -eq "Both") {
        If ($Severity -eq "Information") { Write-Host $LogString -ForegroundColor Blue }
        If ($Severity -eq "Warning")     { Write-Host $LogString -ForegroundColor Yellow }
        If ($Severity -eq "Error")       { Write-Host $LogString -ForegroundColor Red }
        If ($Severity -eq "Success")     { Write-Host $LogString -ForegroundColor Green }
        If ($Severity -eq "Break")       { Write-Host $LogString -ForegroundColor White }
    }

    If ($LogType -eq "LogFile" -or $LogType -eq "Both") {
        Add-Content $LogPath\$LogName -Value $LogString
    }
}

#Main
$Events = Get-WinEvent -FilterHashtable @{ProviderName = "Microsoft-Windows-TerminalServices-SessionBroker-Client"; LogName = "Microsoft-Windows-TerminalServices-SessionBroker-Client/Operational"; Id = '1281'; StartTime = $startTime} -ErrorAction SilentlyContinue

if ($Events.Count -eq 0) {
    Write-Log -Message "No events with ID 1281 found in the past 12 hours – can assume that the machine has not re-joined the farm. Restarting the TermService service" -Severity Warning -LogType LogFile
    Restart-Service -Name TermService -Force
} else {
    Write-Log -Message "$($Events.Count) events with ID 1281 found in the past 12 hours – can assume that the machine HAS re-joined the farm" -Severity Information -LogType LogFile
}
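For completeness – the script can be registered as a scheduled task running as SYSTEM shortly after the reboot window. Something along these lines should work (an untested sketch – the 1:30am trigger time and script path are assumptions, adjust to suit your reboot schedule):

# Register the check as a scheduled task running as SYSTEM
$action  = New-ScheduledTaskAction -Execute "powershell.exe" -Argument "-NoProfile -ExecutionPolicy Bypass -File C:\Scripts\RDSRestartOnJoinFail.ps1"
$trigger = New-ScheduledTaskTrigger -Daily -At 1:30am
Register-ScheduledTask -TaskName "RDS - Restart TermService on join failure" -Action $action -Trigger $trigger -User "SYSTEM" -RunLevel Highest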

Pet insurance australia – just shit…

Dogs…. just fluffy balls of awesomeness, right?

Just like we have health insurance, I got pet insurance for our first Golden Retriever – who turned 11 last month – through Pet Insurance Australia… as they seemed to be ok-ish based on the online reviews… acknowledging that it’s incredibly difficult to discern a real review from a bot-farm review anymore.

He’s had a full life of playing with other dogs (his favourite), his little human, his therapy dog work and the rest of our family… like most goldens, he’s pretty much universally loved… because he’s fucking awesome and might well be the nicest creature on the planet – ever.

All the way back in 2016, I got pet insurance for him because – risk and risk mitigation. At the time it was around the $500 a year mark.

Fast forward to yesterday (July 2024) – the premiums are now approx $2200 for the upcoming renewal. On one hand, I understand inflation and that his risk profile has changed now he’s older… on the other – isn’t that what I’ve paid premiums for over the last 8 years to help cover?

When I rang to cancel the policy, I got the same old bullshit, including an offer to give us 3 months free… which really sealed the deal for me. If you can offer 3 months for free, then you’re just price gouging (like most corporates at the moment – I’m not saying this is isolated) rather than increasing prices in line with inflation.

Fuck you Pet Insurance Australia…. there aren’t many sacred things left in the world – but the health of doggies everywhere is one of them – you don’t fuck with that…. may you all get bowel cancer and die a long, incredibly painful death.

Moving from Synology to QNAP

My Synology 2413+ 12 bay NAS recently died after 12 years of service.

This NAS was primarily used as:

  • an iSCSI backup target for Veeam
  • Video recording for home security cameras
  • Media storage

Overall, I was pretty happy with the unit itself – but as per most companies these days, support was non-existent…. so when I did run into an issue, I was on my own.

Due to that, and Synology not being able to answer what would happen with my Surveillance Station licenses, I made the decision to go for a QNAP as:

  • It was a little cheaper for better hardware specs (this is for the 8-bay desktop model I was looking at – may be different for other models)
  • QVR Pro – the equivalent of Synology Surveillance Station – is free for up to 8 cameras, and I only use 4. There is apparently a 14 day retention limit on video at the “free” license level…. and while I would prefer it to be 31 days…. it’s going to be fine most of the time.

In the ways I’m interested in, the QNAP has so far proven to be quite good: setup and joining to an AD domain was simple and painless, adding disks, storage pools and volumes was easy and clear, and QVR Pro setup had only very minor hiccups (more due to my understanding than the software)… but it hasn’t been all great. The issues I have noticed so far:

  • The lack of a Synology Hybrid RAID equivalent isn’t a disaster, but it is disappointing…
  • Due to the above, I have purchased some more 8TB disks (previously a mix of 6TB and 8TB) – the time taken to expand/repair is significant (as expected) – but the disappointing part has been the performance of the device while this is occurring. Trying to stream anything during this process has been pointless – with constant dropouts. Having performance degrade during a repair or expand is not unexpected – but not to the point of drop-outs.

Will be interested to see the performance difference once the rebuild has finished.

WinRM fails on DC with event ID 142

For a while I have had a niggling issue where, on a DC that is used by a number of in-house coded applications, WinRM would fail intermittently with the following:

Log: Microsoft-Windows-WinRM/Operational
EventID: 142
Event Message: WSMan operation Enumeration failed, error code 2150859046

There isn’t much to go on for this error when googling – and MS support – well… no point in trying that.

After verifying permissions and configuration, checking server resources etc… I was at a point where I didn’t know how to “fix” it or even have any leads.

I initially put in a simple script to restart the service nightly… but every now and again, the stop of the service would hang…. so I’d have to kill the process.

I’ve ended up going down the path of:

  • Attaching a scheduled task to eventID 142
  • To get around PowerShell execution policy restrictions – have it launch a batch file containing

reg add HKLM\SOFTWARE\Policies\Microsoft\Windows\PowerShell /v ExecutionPolicy /t REG_SZ /d unrestricted /f
powershell.exe -NoProfile -NoLogo -NonInteractive -ExecutionPolicy Unrestricted -File C:\data\TerminateAndRestartWinRM.ps1
reg add HKLM\SOFTWARE\Policies\Microsoft\Windows\PowerShell /v ExecutionPolicy /t REG_SZ /d AllSigned /f

TerminateAndRestartWinRM.ps1 contains

Start-Transcript C:\Data\WinRMTerminate.log

write-host “Getting the WinRM ProcessID”
$winRMService = Get-WmiObject -Class Win32_Service -Filter "Name='WinRM'"
$processId = $winRMService.ProcessId

write-host “Terminating processID: $ProcessId”
Stop-Process -Id $processId -Force

write-host “Sleeping for 10 seconds to wait for process to terminate”
Start-Sleep -seconds 10

write-host “Starting WinRM”
# Start the WinRM service
Start-Service -Name WinRM

Stop-Transcript
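For reference, the scheduled task attached to event ID 142 can be created from an elevated prompt with something along these lines – a sketch only; the task name and the batch file path are placeholders rather than exactly what I used:

schtasks /Create /TN "Restart WinRM on Event 142" /RU SYSTEM /SC ONEVENT ^
  /EC Microsoft-Windows-WinRM/Operational ^
  /MO "*[System[Provider[@Name='Microsoft-Windows-WinRM'] and EventID=142]]" ^
  /TR "C:\data\RestartWinRM.bat"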

 

Not the best thing ever – and I generally don’t like these types of “hacky” solutions…. but given that MS has moved from “mostly unsupported” to “completely unsupported” for everything that isn’t in Azure…. (which even then is mostly unsupported)… we don’t have much choice anymore.

AlwaysON VPN breaks after root certificate update

Scenario

  • After updating the internal CA root certificate, AlwaysOn VPN stops working with an error (at the user end) of “A certificate could not be found that can be used with this Extensible Authentication Protocol”
  • In this case, we were using an Enterprise integrated CA and renewed the root using the same signing keys – which should ease the process – at least for all windows clients
  • AOVPN is configured to use PEAP for authentication

 

Troubleshooting

  • Initially, 4 out of the 6 AOVPN servers had not yet received the new root cert via a gpupdate – so I forced that and restarted the service, but it made no difference
  • We discovered that the issue only occurred on devices which had the updated root cert in their Trusted Root store. Additionally, for those that had updated, if we deleted the updated root cert, AOVPN would connect again
  • We quickly found this article by the doyen of DirectAccess and AOVPN – https://directaccess.richardhicks.com/2020/10/19/always-on-vpn-ipsec-root-certificate-configuration-issue/
    • While it’s a good article – it ended up not being our issue and actually led us down the wrong path a little
    • At the same time, for someone who wasn’t overly familiar with AOVPN (this was implemented by someone else and I’ve not had much to do with it), it was great, because I could look at the scripts and suss out some of the relevant PowerShell cmdlets
  • After checking and re-checking every setting under the sun, a colleague found that she could connect again after updating the client end
  • Once she worked that out, we replicated the change on a different machine to be sure – and confirmed it was all good

 

Resolution

  • On a client machine, we updated the AOVPN configuration to include the updated root cert (i.e. tick the new as well as the old root cert) in 3 places under
    • <AOVPN connection name> / Properties / Security / Properties
    • <AOVPN connection name> / Properties / Security / Properties /Configure
    • <AOVPN connection name> / Properties / Security / Properties /Configure / Advanced
  • Confirm that the AOVPN connection is working
  • Export the profile using the script from https://directaccess.richardhicks.com/tag/profilexml/
  • Look at the xml – you should now see the thumbprints of both the “old” and “new” root certificates listed in multiple sections (a quick way to dump this straight from a client is sketched below)
  • Copy the <EAPHostConfig> section from its opening xml tag to its closing xml tag and insert it into the “EAP xml” part of the Intune AOVPN configuration
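For what it’s worth – a quick way to see which root certificate thumbprints an existing client connection will accept is to pull the EAP configuration straight out of the VPN connection with PowerShell. Treat this as a rough, unverified sketch – the connection name is a placeholder, and you may need -AllUserConnection depending on how the profile was deployed:

# Dump the EAP XML from an existing AOVPN connection and list the trusted root thumbprints
$conn = Get-VpnConnection -Name "<AOVPN connection name>"   # add -AllUserConnection for all-user/device profiles
[xml]$eap = $conn.EapConfigXmlStream.InnerXml
$eap.GetElementsByTagName("TrustedRootCA") | ForEach-Object { $_.InnerText }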

Documenting AD ACL’s

A while ago I joined an organisation whose MS estate was in need of a significant amount of love, time and effort. Getting them off 2012 R2 DCs and onto 2022 DCs, upgrading the forest/domain functional levels and getting replication times down were the obvious first jobs… but once they were done – there were so many other things to do that it was hard to know what to start with first. So… I made a start on all of it at once – knowing that it would probably take all year to get the AD into a semblance of decent condition.

The more I looked, the more I found… one thing that was/is particularly disturbing is that the DS ACLs have been fucked with at the top level of the domain – and flowed down to all descendant objects for some admin accounts, service accounts etc…. stuff that clearly doesn’t need, or has never needed, that level of access….

Before changing anything, the goal is to document the permissions – as there is a spaghetti of inherited, non-inherited and multi-nested groups applied at many different levels…. resulting in one severe head-fuck for anyone trying to do anything effective with permissions delegation.

First of all I tried

https://powershellisfun.com/2022/08/22/report-on-active-directory-container-permissions-using-powershell/

A decent solution – which works perfectly in my test environment, but in the prod environment, with thousands of OUs and a stupid level of excessive custom permissions, it consistently uses approx 4GB of memory before dying. So while this is definitely a good script – it just doesn’t work in this prod environment…. and that’s because of how fucked the environment is, not because the script is bad.
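(As an aside – if you want to roll your own, the underlying approach is straightforward, and streaming each OU’s ACEs out to CSV as you go keeps memory flat. A rough sketch, assuming the ActiveDirectory module and its AD: drive are available – the search base and output path are placeholders:)

Import-Module ActiveDirectory

# Stream each OU's ACEs straight to CSV rather than holding everything in memory
Get-ADOrganizationalUnit -Filter * -SearchBase "DC=contoso,DC=com" | ForEach-Object {
    $ou = $_.DistinguishedName
    (Get-Acl -Path ("AD:\" + $ou)).Access | ForEach-Object {
        [PSCustomObject]@{
            OU                = $ou
            IdentityReference = $_.IdentityReference
            ADRights          = $_.ActiveDirectoryRights
            AccessControlType = $_.AccessControlType
            Inherited         = $_.IsInherited
        }
    }
} | Export-Csv -Path "C:\temp\AD-OU-ACLs.csv" -NoTypeInformation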

I moved on and found

https://github.com/canix1/ADACLScanner

Which seems to be an exceedingly nice (PowerShell based) AD ACL solution…. an optional GUI, plenty of configuration options and great output options – a really good solution.

For me – I needed to tick “inherited permissions”… as it is important for me to demonstrate how incredibly stupid it is (in case you haven’t noticed, I’m still flabbergasted that someone would do this….) to allocate permissions at the top level of a domain – along with having complete documentation.

 

Well done & thanks to the author – Robin Granberg – for creating a genuinely awesome tool.

 

Now comes the hard bit – removing the permissions without breaking anything.

Removing folders with a trailing space on NTFS volumes

At the moment I’m cleaning up a very poorly designed and implemented file server structure.

And before you say it – a large amount of data has been moved into Teams/SharePoint/OneDrive etc already – but the storage costs were getting excessive – so there is still plenty of data on-prem.

One of the issues I’ve run into while cleaning up unused DFS-R replicas is folders that have spaces at the end of the name, such as “D:\Sales\December ” for example – which standard Windows tooling can’t handle…. but seems to be something Mac users create regularly (for unknown reasons).

These folders cannot be deleted via the GUI.

Open an elevated command prompt and run:

rmdir /q "\\?\D:\Sales\December "
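If you need to find these folders in the first place, a quick way is something like the following – the root path is a placeholder, and the actual removal is still easiest via the rmdir one-liner above:

# Find directories whose names end in whitespace under a given root
Get-ChildItem -LiteralPath "D:\Sales" -Directory -Recurse |
    Where-Object { $_.Name -match '\s$' } |
    Select-Object FullName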

AADConnect – get synced and excluded OUs via PowerShell

AADConnect has a JSON file and the ability to export – and there are also various AADConnect documenters out there… but sometimes you just want to get a core piece of info without having to start the GUI or wade through many pages of JSON.

Get-ADSyncConnector | select Name

Note the name of your “internal” domain connector (the one that doesn’t have “AAD” at the end)

(Get-ADSyncConnector -name <ConnectorName>).Partitions.ConnectorPartitionScope.ContainerInclusionList

(Get-ADSyncConnector -name <ConnectorName>).Partitions.ConnectorPartitionScope.ContainerExclusionList
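If you want both lists for every on-prem connector in one pass, something like this should do it – a small sketch using only the properties above, with the “AAD” name filter being an assumption based on the default connector naming:

# Dump included and excluded OUs for each on-prem connector
Get-ADSyncConnector | Where-Object { $_.Name -notlike "*AAD*" } | ForEach-Object {
    [PSCustomObject]@{
        Connector   = $_.Name
        IncludedOUs = ($_.Partitions.ConnectorPartitionScope.ContainerInclusionList) -join "; "
        ExcludedOUs = ($_.Partitions.ConnectorPartitionScope.ContainerExclusionList) -join "; "
    }
}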