Recently, I ran into the following scenario:
An RDS farm with 2 x DNS round-robin (DNS-RR) brokers and approximately 30 session hosts, all running Windows Server 2022.
Some session hosts, seemingly at random, would look fine after a reboot but would not accept any connections.
Toggling the session host to not accept new connections and then back to accepting new connections would bring the server “back”.
After looking through the logs and posting on a few forums (and getting some exceedingly poor responses), I reached a point where I knew I could implement a “hack”, but would prefer to find the root cause.
To that end, I engaged Andy from https://purerds.org/, who I’d previously worked with and who seems to have that “next level” RDS knowledge, partly as a sanity check that I hadn’t missed something and partly in the hope that he had seen this before.
The guts of it is:
- In SQL, the RDSHProperty table shows a “drainmode” of “0” for all servers, so the brokers do not recognise that these servers are not accepting connections (as we expected)
- In SQL, the TargetProperty table shows a “serverMaxActiveSessions” of “10000” for all servers, in line with the above
- In the log “Microsoft-Windows-TerminalServices-SessionBroker-Client/Operational”:
  - We can see the server leave the farm at 1am (reboot time) with:
    - Event ID 1283
    - Remote Desktop Services successfully left a farm on the Connection Broker server <broker1>;<broker2>
  - But there is no corresponding entry to re-join the farm (unlike the healthy servers)
- If I restart the service “TermService” on the local server, I get the following events (as expected, but just for the sake of documenting things):
  - Event ID 1280: Remote Desktop Services failed to join the Connection Broker on server <broker1>;<broker2>. Error: Current async message was dropped by async dispatcher, because there is a new message which will override the current one.
  - Event ID 1281: Remote Desktop Services successfully joined a farm on the Connection Broker server <broker1>;<broker2>
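The symptom described above (a 1283 “left the farm” event with no later 1281 “joined” event) can be expressed as a small check. This is only a sketch under assumptions: `Test-FarmRejoined` is a hypothetical helper name, and it expects objects with `Id` and `TimeCreated` properties, which is what `Get-WinEvent` returns.

```powershell
# Hypothetical helper: returns $true if the most recent "left farm" event (1283)
# was followed by a "joined farm" event (1281), or if no leave event is present.
function Test-FarmRejoined {
    [CmdletBinding()]
    param(
        # Objects exposing Id and TimeCreated, e.g. the output of Get-WinEvent.
        [Parameter(Mandatory)]
        [AllowEmptyCollection()]
        [object[]]$Events
    )

    # Find the latest leave event in the supplied window.
    $lastLeave = $Events |
        Where-Object { $_.Id -eq 1283 } |
        Sort-Object TimeCreated |
        Select-Object -Last 1

    # No leave event: nothing to re-join after.
    if (-not $lastLeave) { return $true }

    # Any join event logged after that leave counts as a successful re-join.
    $joinsAfter = @($Events | Where-Object {
        $_.Id -eq 1281 -and $_.TimeCreated -gt $lastLeave.TimeCreated
    })
    return ($joinsAfter.Count -gt 0)
}

# Usage on a session host (log name as quoted above):
# $recent = @(Get-WinEvent -FilterHashtable @{
#     LogName   = 'Microsoft-Windows-TerminalServices-SessionBroker-Client/Operational'
#     Id        = 1281, 1283
#     StartTime = (Get-Date).AddHours(-12)
# } -ErrorAction SilentlyContinue)
# Test-FarmRejoined -Events $recent
```

Compared with only counting 1281 events, pairing the leave and join events avoids treating a join from before the reboot as proof that the host came back.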
In the end, we were unable to find a root cause, so I ended up running the following PowerShell script as a scheduled task on each session host:
$LogPath = "C:\Windows\Temp"
$LogName = "RDSRestartOnJoinFail.log"
$startTime = (Get-Date).AddHours(-12)

# Logging
Function Write-Log {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [ValidateNotNullOrEmpty()]
        [string]$Message,

        # Used in the log line and to pick the console colour.
        [Parameter()]
        [ValidateSet("Information", "Warning", "Error", "Success", "Break")]
        [string]$Severity = "Information",

        # Where to write: the console, the log file, or both.
        [Parameter()]
        [ValidateSet("Console", "LogFile", "Both")]
        [string]$LogType = "LogFile"
    )
    $Date = (Get-Date).ToString("yyyy/MM/dd HH:mm:ss")
    $LogString = $Date + ", " + $Severity + ", " + $Message
    If ($LogType -eq "Console" -or $LogType -eq "Both") {
        If ($Severity -eq "Information") { Write-Host $LogString -ForegroundColor Blue }
        If ($Severity -eq "Warning") { Write-Host $LogString -ForegroundColor Yellow }
        If ($Severity -eq "Error") { Write-Host $LogString -ForegroundColor Red }
        If ($Severity -eq "Success") { Write-Host $LogString -ForegroundColor Green }
        If ($Severity -eq "Break") { Write-Host $LogString -ForegroundColor White }
    }
    If ($LogType -eq "LogFile" -or $LogType -eq "Both") {
        Add-Content -Path (Join-Path $LogPath $LogName) -Value $LogString
    }
}

# Main
# Look for a successful farm join (event 1281) in the past 12 hours.
# @() forces an array so .Count is reliable when zero or one event is returned.
$Events = @(Get-WinEvent -FilterHashtable @{
    ProviderName = "Microsoft-Windows-TerminalServices-SessionBroker-Client"
    LogName      = "Microsoft-Windows-TerminalServices-SessionBroker-Client/Operational"
    Id           = 1281
    StartTime    = $startTime
} -ErrorAction SilentlyContinue)
if ($Events.Count -eq 0) {
    Write-Log -Message "No events with ID 1281 found in the past 12 hours - Can assume that machine has not re-joined the farm. Restarting TermService service" -Severity Warning -LogType LogFile
    Restart-Service -Name TermService -Force
} else {
    Write-Log -Message "$($Events.Count) events with ID 1281 found in the past 12 hours - Can assume that machine HAS re-joined the farm" -Severity Information -LogType LogFile
}
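For completeness, here is a minimal sketch of how the script could be registered as a scheduled task with the built-in ScheduledTasks cmdlets. The function name `Register-RdsRejoinCheckTask`, the script path, the task name and the 2am run time are all assumptions for illustration (the run time is simply chosen to fall after the 1am reboot); adjust them to your environment.

```powershell
# Hypothetical helper: registers the check script as a daily scheduled task.
function Register-RdsRejoinCheckTask {
    [CmdletBinding()]
    param(
        # Assumed location of the saved script; change to wherever you keep it.
        [string]$ScriptPath = 'C:\Scripts\RDSRestartOnJoinFail.ps1',
        # Assumed run time, picked to be after the 1am reboot.
        [string]$At = '02:00',
        # Assumed task name.
        [string]$TaskName = 'RDS restart on join fail'
    )

    # Run the script with Windows PowerShell, without loading a profile.
    $action = New-ScheduledTaskAction -Execute 'powershell.exe' `
        -Argument "-NoProfile -ExecutionPolicy Bypass -File `"$ScriptPath`""

    # Fire once a day at the chosen time.
    $trigger = New-ScheduledTaskTrigger -Daily -At $At

    # Run as SYSTEM, since restarting TermService needs administrative rights.
    $principal = New-ScheduledTaskPrincipal -UserId 'SYSTEM' `
        -LogonType ServiceAccount -RunLevel Highest

    Register-ScheduledTask -TaskName $TaskName -Action $action `
        -Trigger $trigger -Principal $principal
}

# Usage (run from an elevated session on each session host):
# Register-RdsRejoinCheckTask -ScriptPath 'C:\Scripts\RDSRestartOnJoinFail.ps1'
```

Because the script looks back 12 hours for event 1281, the task should run within 12 hours of the reboot, otherwise a healthy host would be restarted unnecessarily.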