AlwaysOn VPN breaks after root certificate update

Scenario

  • After updating the internal CA root certificate, AlwaysOn VPN stops working with an error (at the user end) of “A Certificate could not be found that can be used with this Extensible Authentication Protocol”
  • In this case, we were using an Enterprise integrated CA and renewed the root using the same signing keys – which should ease the process – at least for all Windows clients
  • AOVPN is configured to use PEAP for authentication

 

Troubleshooting

  • Initially, 4 out of the 6 AOVPN servers had not received the new root cert from a GPupdate yet – so I forced that and restarted the service, but it made no difference
  • We discovered that the issue only occurred on devices which had the updated trusted root cert in the trusted root store. Additionally, for those that had updated, if we deleted the updated trusted root cert, AOVPN would connect again
  • We quickly found this article by the doyen of DirectAccess and AOVPN – https://directaccess.richardhicks.com/2020/10/19/always-on-vpn-ipsec-root-certificate-configuration-issue/
    • While it’s a good article – it ended up not being our issue and actually led us down the wrong path a little
    • At the same time, for someone who wasn’t overly familiar with AOVPN (this was implemented by someone else and I’ve not had much to do with AOVPN) it was great, because I could look at the scripts and suss out some of the relevant PowerShell cmdlets
  • After checking and re-checking every setting under the sun, a colleague could connect again after updating the client end
  • Once she worked that out, we then clarified and replicated the change on a different machine to be sure – and confirmed it was all good

 

Resolution

  • On a client machine, we updated the AOVPN configuration to include the updated root cert (i.e. tick the new root cert as well as the old one) in 3 places under
    • <AOVPN connection name> / Properties / Security / Properties
    • <AOVPN connection name> / Properties / Security / Properties / Configure
    • <AOVPN connection name> / Properties / Security / Properties / Configure / Advanced
  • Confirm that the AOVPN connection is working
  • Export the profile using the script from https://directaccess.richardhicks.com/tag/profilexml/
  • Look at the XML – you should now see the thumbprints of both the “old” and “new” root certificate listed in multiple sections
  • Copy the section <EAPHostConfig> from its opening XML tag to its closing XML tag and insert it into the “EAP xml” part of the Intune AOVPN configuration
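
As a quick sanity check before pasting the XML into Intune, the exported profile can be scanned for root thumbprints. The following is a minimal Python sketch, not part of the original fix. It assumes the thumbprints sit in elements named TrustedRootCA and IssuerHash (an assumption about the EAP XML, so adjust the names to whatever your export actually contains). The sample fragment and its thumbprints are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Element names ASSUMED to carry root CA thumbprints in the EAP XML.
THUMBPRINT_TAGS = ("TrustedRootCA", "IssuerHash")


def root_thumbprints(xml_text):
    """Map each thumbprint found to the set of element names it appears under."""
    found = {}
    for elem in ET.fromstring(xml_text).iter():
        local_name = elem.tag.rsplit("}", 1)[-1]  # drop any "{namespace}" prefix
        if local_name in THUMBPRINT_TAGS and elem.text and elem.text.strip():
            thumb = "".join(elem.text.split()).upper()  # "ab cd" -> "ABCD"
            found.setdefault(thumb, set()).add(local_name)
    return found


def missing_thumbprints(xml_text, expected):
    """Return the expected thumbprints that do not appear anywhere in the XML."""
    present = set(root_thumbprints(xml_text))
    wanted = {"".join(t.split()).upper() for t in expected}
    return sorted(wanted - present)


# Made-up fragment for illustration only; a real export is much larger.
SAMPLE = """
<EapHostConfig xmlns="http://www.microsoft.com/provisioning/EapHostConfig">
  <ServerValidation>
    <TrustedRootCA>aa bb cc 01</TrustedRootCA>
    <TrustedRootCA>dd ee ff 02</TrustedRootCA>
  </ServerValidation>
  <CAHashList>
    <IssuerHash>aa bb cc 01</IssuerHash>
  </CAHashList>
</EapHostConfig>
"""

if __name__ == "__main__":
    # "AABBCC01" (old root) and "DDEEFF02" (new root) are hypothetical values.
    print(missing_thumbprints(SAMPLE, ["AABBCC01", "DDEEFF02"]))
    print(root_thumbprints(SAMPLE))
```

Because the thumbprints are listed in multiple sections, the second function reports which element names each thumbprint appears under, which makes it easier to spot a section where only the old root was ticked.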

Removing folders with a trailing space on NTFS volumes

At the moment I’m cleaning up a very poorly designed and implemented file server structure.

And before you say it – a large amount of data has already been moved into Teams/SharePoint/OneDrive etc. – but the storage costs were getting excessive – so there is still plenty of data on-prem.

One of the issues I’ve run into while cleaning up unused DFS-R replicas is folders that have a space at the end of the name, such as “D:\Sales\December ” – which NTFS itself will store, but which the Win32 API and Explorer do not handle…. yet it seems to be something Mac users do regularly (for unknown reasons)

These folders cannot be deleted via the GUI.

Open an elevated command prompt and run the following, using straight quotes (add /s if the folder is not empty)

rmdir /q "\\?\D:\Sales\December "
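
If there are many of these folders, the same \\?\ trick can be scripted. This is a rough Python sketch under the assumption that Python is available on the file server. The helper names are my own, and the delete call only makes sense on Windows.

```python
import os


def extended_length_path(path):
    r"""Prefix an absolute Windows path with \\?\ so that Win32 name
    normalisation (which strips trailing spaces and dots) is skipped."""
    if path.startswith("\\\\?\\"):
        return path  # already in extended-length form
    if path.startswith("\\\\"):
        # UNC path: \\server\share\dir -> \\?\UNC\server\share\dir
        return "\\\\?\\UNC\\" + path[2:]
    return "\\\\?\\" + path


def remove_empty_folder(path):
    """Delete an empty folder whose name may end in a space (Windows only)."""
    os.rmdir(extended_length_path(path))


if __name__ == "__main__":
    # Only prints the path that would be used; nothing is deleted here.
    print(extended_length_path("D:\\Sales\\December "))
```

The path has to be absolute, because the \\?\ prefix turns off relative path handling along with the trailing-space stripping.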

Error: SWbemObjectEx: Invalid index when trying to update a NIC using SConfig on server core

When using SConfig on a Server Core install, I was getting the error “SWbemObjectEx: Invalid index” when trying to update a NIC.

I had similar issues when trying to configure the NIC using PowerShell.

Thanks very much to Mike and his post @ https://mikeconjoice.wordpress.com/2017/01/24/windows-server-core-error-swbemobjectex-invalid-index/

for pointing out that it was because IPv6 was not bound to the adapter.

Using the following PowerShell worked for me

Enable-NetAdapterBinding -Name Ethernet -ComponentID ms_tcpip6

 

The other important thing here is that unbinding IPv6 from adapters is a relatively common and completely silly practice. It frequently causes issues and doesn’t even achieve the goal of properly disabling IPv6 on the machine.

If you want to disable IPv6 – do it properly – via the registry as per

https://learn.microsoft.com/en-us/troubleshoot/windows-server/networking/configure-ipv6-in-windows

Location: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip6\Parameters\
Name: DisabledComponents
Type: REG_DWORD
Min Value: 0x00 (default value)
Max Value: 0xFF (IPv6 disabled)
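
For anyone who prefers to push this out as a .reg file, here is a small Python sketch that generates one from the key, name and type listed above. The function name is hypothetical, and the range check simply mirrors the min/max values quoted. Per the Microsoft article, a restart is needed before the change takes effect.

```python
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip6\Parameters"


def disabled_components_reg(value):
    """Return .reg file text that sets DisabledComponents (REG_DWORD) to value."""
    if not 0x00 <= value <= 0xFF:
        raise ValueError("DisabledComponents must be between 0x00 and 0xFF")
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{KEY}]\n"
        f'"DisabledComponents"=dword:{value:08x}\n'
    )


if __name__ == "__main__":
    # 0xFF is the "IPv6 disabled" value from the list above.
    print(disabled_components_reg(0xFF))
```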

Windows Server 2019 networking fails on Hyper-V 2012 (non R2) host once CU 10-2022 applied

Yes, I know… no-one should be running Server 2012 anymore…. but due to this client’s ludicrous outsourcing agreement, upgrading servers is simply too expensive – so they still have some 2012 (non-R2) Hyper-V hosts.

Anyway – quite a specific issue

In short – a Server 2019 guest sitting on a Hyper-V 2012 (non-R2) host will have issues with networking once Server 2019 CU 10-2022 is applied.

There is no workaround that I’m aware of – and the solution in this case was simply to move to a different host.

RDS Farm HA and Microsoft OLE DB Driver for SQL Server/SQL native client

This post was written in August 2022 – and may or may not be current by the time you read this!

 

Recently I was refreshing a large-ish RDP farm to newer OS versions, ready for an upgrade of the core business application.

As part of this, new Server 2019-based RD Brokers were to be setup – and setting up the HA proved to be more challenging than it has been in the past.

And before you say it – the vendor of the business application presented by RDS doesn’t support AVD/WVD – so “just put it in the cloud” is not an option.

 

Most of the articles around the web talk about using the SQL native client – which is now deprecated. Microsoft, in turn, recommends the OLE DB Driver for SQL Server.

Trying to use the OLE DB driver resulted in many lost hours for myself, the client’s best tech and their SQL guru.

The most current MS document that we could find talks about using ODBC 13 – which, according to these tables, is not supported on anything above SQL 2017 and Windows Server 2012…. so not exactly current.

Following a few links – we can see that the current ODBC driver version (at time of writing) is 18.1.1.1 – available from https://docs.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server?view=sql-server-ver16  which supports current OS’s and SQL versions.

 

Our findings were:

  1. We could not get the OLE DB driver to work… we suspect it was due to security issues – but do not know for sure – as the HA setup for RDCB does not appear to have any verbose logs that I could find.
  2. The ODBC driver v13 worked fine with a connection string of DRIVER={ODBC Driver 13 for SQL Server};SERVER=tcp:server.com.au,1433;APP=Remote Desktop Services Connection Broker;Trusted_Connection=Yes;Database=RDFarm;Encrypt=yes;TrustServerCertificate=yes;Connection Timeout=30 – however, given that ODBC driver v13 was not supported on the platforms we were deploying (Server 2019 and SQL 2019), this made us uncomfortable
  3. The ODBC driver v18 worked fine with a connection string of DRIVER={ODBC Driver 18 for SQL Server};SERVER=tcp:server.com.au,1433;APP=Remote Desktop Services Connection Broker;Trusted_Connection=Yes;Database=RDFarm;Encrypt=yes;TrustServerCertificate=yes;Connection Timeout=30 – This version of the driver has current OS and SQL version support – but was not mentioned in the RDP broker article. Some of you may be saying “so what”…. and the reality is that when you have many thousands of users reliant on a core business app 24×7, little things like that become important. It’s incredibly difficult these days to get onto someone within MS support who even knows what you’re talking about – so saying you have deviated from an official document only makes things more difficult.
  4. I have lodged a request for modification/clarification to the official doc here – and we’ll see what happens with that.
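
The two working connection strings differ only in the driver version, so they are easy to generate rather than hand-edit. Below is a minimal Python sketch. The function and parameter names are my own, and the defaults simply mirror the strings quoted in the findings above.

```python
def rdcb_connection_string(server, database, driver_version=18, port=1433, timeout=30):
    """Build an ODBC connection string in the same shape as the ones above."""
    fields = [
        ("DRIVER", "{ODBC Driver %d for SQL Server}" % driver_version),
        ("SERVER", "tcp:%s,%d" % (server, port)),
        ("APP", "Remote Desktop Services Connection Broker"),
        ("Trusted_Connection", "Yes"),
        ("Database", database),
        ("Encrypt", "yes"),
        ("TrustServerCertificate", "yes"),
        ("Connection Timeout", str(timeout)),
    ]
    # Join as KEY=VALUE pairs separated by semicolons, with no trailing one.
    return ";".join("%s=%s" % pair for pair in fields)


if __name__ == "__main__":
    print(rdcb_connection_string("server.com.au", "RDFarm"))
    print(rdcb_connection_string("server.com.au", "RDFarm", driver_version=13))
```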

VMware guest server CPU and memory issues

Got a call from a client who was having issues with the SQL instance on their SCCM server – and investigation showed that the SQL service was crashing due to various memory errors (event log and SQL logs) – but the descriptions weren’t overly helpful.

The SQL exception.log shows errors such as

09/12/19 12:23:58 spid 125 Exception 0xc0000005 EXCEPTION_ACCESS_VIOLATION writing address 000001E1F29E3390 at 0x000001E1F29E3390

 

After a bit of investigation, I noticed that the “system” task in Task Manager was constantly utilising between 20-40% CPU. The “system” task has no associated command line in Task Manager, so tracking it down required the use of the ever-helpful Sysinternals tools – in this case, Process Explorer.

Once Process Explorer is open, you can go to the properties of the “system” process and view all its threads – and most importantly, sort by CPU usage.

In this case, I could see that Vmmemctl.sys was using the vast majority of the CPU time within this process.

A quick google led me to this https://kb.vmware.com/s/article/2138677

While I wasn’t getting blue screens, I was definitely getting memory errors – so this lined up.

Checking the installed programs, I could then see that VMware Tools 10.2.5 was installed, but so was 9.1.

I removed VMware Tools 9.1 from the server and the CPU use immediately dropped – and the memory issues, at least so far, are no longer occurring.

Surprisingly, this didn’t seem to require a reboot after the VMware Tools 9.1 uninstall.

I guess the moral of this story (post) is – keeping your VMware Tools version up to date is wise….. but don’t forget to uninstall old versions as well.

Windows and NTP

It’s important that Windows time is set correctly – but how Windows time works seems to be a poorly understood area.

In this article, I’ll try to clear up the concepts and explain what is, in my opinion, the best way to implement time services throughout your domain(s).

Background

  • Domain-joined Windows machines, by default, set their time automatically through the domain hierarchy, which is rooted at the domain controller holding the FSMO role “PDC emulator” (PDCe)
  • In a multi-domain environment, the PDCe in the forest root domain is the overall master
  • UDP port 123 (NTP) is used for all communications
  • All other DC’s will, by default, look for the PDCe as their time source. There is no need to set anything here unless something has gone wrong.
  • All domain-joined workstations and member servers will, by default, sync from a domain controller in their own domain, which in turn follows the hierarchy up to the PDCe. There is no need to set anything here unless something has gone wrong.
  • The Windows Server 2016 time service offers (optionally) more accurate time services than previous versions – https://docs.microsoft.com/en-us/windows-server/networking/windows-time-service/accurate-time

Setting up NTP on the PDCe

I strongly recommend utilising group policy to set up NTP on your PDC emulator, not the command line. Using a group policy makes the settings a) obvious and b) easily transportable to new DC’s as you migrate/upgrade in the future

  • Create a new GPO, I name mine “Domain Controller – Set NTP on PDCe”
    • Narrow it down to your PDCe by either
      • Removing “authenticated users” and adding your current PDCe (This will need to be manually updated if/when the PDCe role moves)
      • Utilising the WMI query “Select * from Win32_ComputerSystem where DomainRole = 5” (This will auto-update when the PDCe moves)
    • Set the following within the group policy
      • Computer Configuration > Administrative Templates > System > Windows Time Service > Time Providers
      • Enable Windows NTP Client: Enabled
      • Enable Windows NTP Server: Enabled
      • Configure Windows NTP Client: Enabled
        • NtpServer: <YourExternalNTPServer1>,0x1 <YourExternalNTPServer2>,0x1 (for Adelaide-based clients, I used ntp.internode.on.net and ntp.adelaide.edu.au – a local ISP and a local University – but these could be any publicly available NTP server)
        • Type: NTP
        • CrossSiteSyncFlags: 2
        • ResolvePeerBackoffMinutes: 15
        • ResolvePeerBackoffMaxTimes: 7
        • SpecialPollInterval: 3600
        • EventLogFlags: 0
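
The NtpServer value is a space-separated list of “server,flags” pairs, which is easy to mistype. Here is a small Python sketch for building it. The function name is hypothetical, and the default flag of 0x1 is the one used in the settings above (it asks the time service to poll at SpecialPollInterval).

```python
def ntp_server_value(servers, flags=0x1):
    """Format a list of NTP server names as an NtpServer policy value."""
    cleaned = [name.strip() for name in servers if name.strip()]
    if not cleaned:
        raise ValueError("at least one NTP server is required")
    # Each entry is "<server>,0x<flags>"; entries are separated by one space.
    return " ".join("%s,0x%x" % (name, flags) for name in cleaned)


if __name__ == "__main__":
    print(ntp_server_value(["ntp.internode.on.net", "ntp.adelaide.edu.au"]))
```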

Commands to check status and troubleshoot

  • w32tm /monitor – this exceedingly useful command will show you the status of all DC’s in the domain, where they are configured to get their time source from and their offset from the authoritative time source
  • If a domain controller is having issues, reset it to the domain hierarchy and restart the time service
    • w32tm /config /syncfromflags:domhier /update
    • net stop w32time
    • net start w32time
  • w32tm /query /status

Using policy to set clients to look at AD for time

This is the default behaviour of Windows – and you should not need to set this. However, for some places I’ve found we have had to

  • Computer Configuration -> Administrative Templates -> System -> Windows Time Service -> Time Providers
    • Configure Windows NTP Client: Enabled
      • NtpServer: <YourDC1>,0x1 <YourDC2>,0x1
      • Type: NT5DS
      • CrossSiteSyncFlags: 2
      • ResolvePeerBackoffMinutes: 15
      • ResolvePeerBackoffMaxTimes: 7
      • SpecialPollInterval: 3600
      • EventLogFlags: 0

References

https://docs.microsoft.com/en-us/windows-server/networking/windows-time-service/how-the-windows-time-service-works

https://docs.microsoft.com/en-us/windows-server/networking/windows-time-service/accurate-time

https://blogs.technet.microsoft.com/nepapfe/2013/03/01/its-simple-time-configuration-in-active-directory/

https://theitbros.com/configure-ntp-time-sync-group-policy/

 

Always on VPN – technical follow up

As a follow up to my article a few days ago on Always on VPN vs DA – http://www.hayesjupe.com/always-on-vpn-and-da-a-comparison/ – an employee of mine was having a test with some spare time today and came up with the following findings.

  • Configured and tested the VPN server using L2TP/IPSec + PSK, User/Pass using MS-CHAP-V2
  • Attempted to export the VPN profile using the Microsoft script MakeProfile.ps1 (https://docs.microsoft.com/en-us/windows-server/remote/remote-access/vpn/always-on-vpn/deploy/vpn-deploy-client-vpn-connections#bkmk_fullscript)
    • Doesn’t work if you’re using Folder Redirection, as it tries to write to C:\Users\UserID\Desktop instead of using %desktop%
    • Adjusted the script to just write to C:\Temp and it works fine
  • Ran the generated VPN_Profile.ps1 and it comes back with “A general error occurred that is not covered by a more specific error code”. After doing some troubleshooting and googling, worked out that the MakeProfile.ps1 has “<AlwaysOn>true</AlwaysOn>” in it, when it actually needs to be “<AlwaysOn>True</AlwaysOn>” (upper-case T). Thanks Microsoft.
  • Finally got it imported. Attempted to connect and received an error that the destination address didn’t exist.
    • Checked the XML, the “Servers” item was populated correctly
    • Checked the VPN connection in Windows, the “Server” item wasn’t populated. Awesome.
  • Populated the Server field manually, tried to connect, failed.
    • The export also didn’t bring across the PSK
    • Populated the PSK, works.
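
Given how many settings went missing along the way, it helps to read the key values straight out of the generated XML and compare them with what the Windows VPN connection shows. The following Python sketch assumes the usual ProfileXML layout (a VPNProfile root with Servers under NativeProfile and AlwaysOn at the top level). The sample profile and its server name are made up.

```python
import xml.etree.ElementTree as ET


def profile_summary(profile_xml):
    """Pull the Servers and AlwaysOn values out of a ProfileXML document."""
    root = ET.fromstring(profile_xml)

    def text_of(path):
        node = root.find(path)
        if node is None or node.text is None:
            return None  # element missing or empty
        return node.text.strip()

    return {
        "Servers": text_of("NativeProfile/Servers"),
        "AlwaysOn": text_of("AlwaysOn"),
    }


# Made-up profile for illustration; vpn.example.com is a placeholder.
SAMPLE_PROFILE = """
<VPNProfile>
  <NativeProfile>
    <Servers>vpn.example.com</Servers>
  </NativeProfile>
  <AlwaysOn>True</AlwaysOn>
</VPNProfile>
"""

if __name__ == "__main__":
    print(profile_summary(SAMPLE_PROFILE))
```

The AlwaysOn value is returned exactly as written, so the true/True casing problem described above is visible at a glance.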

To sum up:

  • Microsoft’s MakeProfile.ps1 is helpful, but isn’t even remotely reliable for exporting all of the settings
  • No idea why the server isn’t being populated. It’s in the XML, it just doesn’t populate it
  • There doesn’t seem to be a way of using PSK instead of certs – the XML doesn’t seem to have any options for specifying a PSK (that I’ve been able to find)

 

So let me revise my earlier “it’s very much a v1 product” to “it’s very much a v0.1 product”

Always on VPN and DA – a comparison

Date: 04/09/2018

For a while now, Microsoft salespeople have been telling customers that Direct Access is old technology and that Windows 10 always on VPN is the way to go.

There are a number of points which are widely acknowledged when it comes to DA vs Always on VPN, most of which can be found here

https://docs.microsoft.com/en-us/windows-server/remote/remote-access/vpn/vpn-map-da

What is the Difference Between DirectAccess and Always On VPN?

https://directaccess.richardhicks.com/2017/12/04/3-important-advantages-of-always-on-vpn-over-directaccess/

As usual, the Microsoft article, while technically correct, is not really very helpful. The ones from Richard Hicks are far more helpful, but still miss a few key points in my opinion.

With that in mind, let’s have a bit more of a look at comparing these technologies.

 

Windows version support

Direct Access is supported on Windows 7, 8, 8.1 and 10 enterprise editions only. The target machines must also be domain joined.

Always on VPN is only supported with Windows 10 (1607 and newer). However, any edition of Windows 10 (standard etc.) is supported, and the target machines can be domain joined, in a workgroup, or part of Azure AD.

Depending on where your organization is with its Win 10 migration, always on VPN may not be an option for you (but probably will be in the future at some point). Likewise, DA might not be an option if you do not have enterprise licensing or non-domain joined machines.

 

Backend

Direct Access requires a direct access server (or multiple if it is going to be made highly available), certificates, a public IP/DNS entry and port 443 opened at the firewall.

Always on VPN can connect to multiple vendors’ back ends such as Cisco, Fortinet, F5 etc.

If you are already locked in to a certain vendor’s VPN – always on VPN being able to leverage that is great.

 

Future Development

Direct Access is in Windows Server 2016, is fully supported, and we expect it will be in Windows Server 2019. However, there are many places around the internet that state it is no longer under active development – and considering there were no improvements in 2016 (over 2012 R2), that does (unfortunately) seem to be the case.

Always on VPN comes across to me very much like a v1 product at the moment. We can assume that it will get further development (machine-based sessions were added in Windows 10 1709), but it’s hard to know with Microsoft. The current attitude seems to be to attack anything that isn’t Azure or O365, even their own on-premise products – and always on VPN would tend to sit in the “on-premise” category at the moment….

 

Supportability

The first version of DA (in UAG) was a support nightmare, but it became quite serviceable in 2012 R2 and 2016. It can be a steep learning curve for people that haven’t dealt with DA before, however it does actually have quite good support tools, even though they are a little disparate. There is a reasonable-ish level of community support for DA.

Always on VPN as stated above very much seems to be a v1, Microsoft doco is generally unusable, TechNet no longer exists, so guides written by 3rd party bloggers etc remain important – and due to relative newness of the solution, there aren’t many guides out there – but there are a few.

 

Management

The DA management interface is pretty good and powershell commands (both client and server) are pretty complete.

Always on VPN – XML files and PowerShell…. very v1. It can’t be configured natively via any of the three MS endpoint delivery systems – Intune, SCCM or group policy…. so yeah, this is not up to scratch.

 

Performance

DA performance has never been great. The solution utilises multiple encapsulations and has to convert from IPv4 to IPv6 in most instances. Over a reasonable internet connection, it’s fine for the vast majority of things, but don’t try to transfer large files etc. I tend to think of it as a connection technology – the focus is on ease of connection, not performance.

Always on VPN, depending on what type of VPN you are using, is not going to have as many encapsulations and is going to be utilizing a tried and tested VPN technology. The result of this is that performance should be substantially better than DA.

 

Network detection

DA utilizes NLS (network location server) to detect if the client is on the internal network or not. The issue with this is that if your NLS goes down, the client won’t be able to connect to the network at all. This was one of the things I was hoping would be addressed in a future version of DA: the ability to configure multiple NLS’s in an “OR” configuration.

Always on VPN utilizes the DNS suffix of the network connection to determine if Always on should be utilized or not. This is definitely a step forward – it would be nice in the future to be able to define multiple DNS suffixes.

 

Machine vs user based VPN

DA can bring up the infrastructure tunnel prior to the full DA connection. This is incredibly good for machines which are permanently off the corporate network, as they can still apply computer policy prior to logon, update via internal WSUS or SCCM services (and AV definitions) without logon etc.

Always on VPN is a user based solution – so the user must logon before the VPN tunnel is established….  so no go for the machine updating policy, getting updates etc when not logged on. Before you say “big deal”, you try getting a user without admin rights to update group policy… now try that times all the roaming users in your org.

As of Windows 10 1709 – device based VPN sessions are now also available and require the machine to be domain-joined running enterprise or education editions….  which is fine, as that’s where device based connections are needed – when the machine is domain joined. The tunnel is IKEv2 by default, but can be configured for SSTP.

 

Locking down access

DA has no options for locking down access. Just none – they either have DA or they do not. Theoretically you could prevent access to certain locations/devices at a layer 3 level… but that’s a lot of effort. You could also create multiple DA services, but again, it’s a lot of effort…. it would have been nice to see more development in this area.

Always on VPN has many more options when it comes to locking down the connections – and different configurations can be deployed to different users. The current options are pretty good.

 

Split and force tunneling

DA supports both, but is difficult to configure and effectively unsupported when used in force tunneling mode

Always on VPN supports force tunneling and, given its different architecture to DA, it should be less of an issue… but I haven’t actually tried it as yet.

 

Manage-out

DA manage-out functionality is great. It’s not hard to configure, but it is “harder than it needs to be” – and much of this complexity is due to the IPv4/6 transition technologies.

Always on VPN does not have the IPv4/6 transition issues and simply allows you to configure DNS registration in the xml via – <RegisterDNS>true</RegisterDNS> – which in turn effectively enables manage out – much easier.

 

Summing it up

As per many things, it’s whatever suits your environment and needs best.

If DA ticks boxes for your environment as it is right now – it’s still a good technology, particularly if you won’t be off Windows 7 for a few years yet – but don’t expect it to get any improvements – and expect to have to migrate off it at some point (but, assuming it’s in Server 2019, not for quite a while)

Always on VPN clearly has some pretty good stuff around access control and back end support. The Windows 10 requirement (1709+ for the most options at time of writing) will become less of an issue over the next few years, but it really needs to get its deployment methods sorted out – with native support in Intune and SCCM….. and ideally, group policy (but I think the last one is a long shot)

Creating large stand-alone media from SCCM – issues when the HP SPP 2018.03 is a package in the TS

We often create one task sequence for all workstation builds and another for all server builds, utilising task sequence variables to perform the decision making within the task sequence.

One of the downsides of this is that the task sequence binaries can get quite large, especially for server builds where we have current and legacy versions of the HP SPP, the Dell SUU and (at a minimum) server 2012 R2 and server 2016.

This isn’t an issue for network based builds, as non-required content is simply skipped. However, for media builds, it can lead to 40GB+ requirements, which the SCCM console doesn’t handle well.

This is where Rufus comes in.

Rufus can help out by allowing use of larger hard drives (just tick the “list USB hard drives” option) and can apply bootable iso’s generated by SCCM (utilising the “unlimited size” option) to a USB hard drive.

This has been incredibly useful for us in the past, utilising large hard drives as USMT stores at slow link sites for client deployment.

In this instance, I’ve been using Rufus to apply a 50GB server build iso to a hard drive, but kept getting presented with a warning

“This ISO image seems to use an obsolete version of ‘vesamenu.c32’. Boot menus may not display properly because of this.”

Irrespective of how you proceed (allow Rufus to download an update or not), the drive is not bootable.

Upon investigation of the Rufus logs and the resultant media, I found that syslinux.cfg was actually pointing to my HP SPP package.

This forum post then confirmed that Rufus finds a syslinux.cfg, assumes that it is “real” bootable media and hence shows the ‘vesamenu.c32’ prompt.

After a few hours of troubleshooting and trying to get around it, I simply removed the “usb” and “system” folders from my HP SPP packages (as we won’t ever be booting to it, it’s only for use in SCCM), re-created my standalone media iso – then used Rufus to write the bootable iso to the USB HDD, this time with no issues.
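
To find out in advance which packages would trip Rufus up, the package source folders can be scanned for stray syslinux.cfg files before building the media. This is a rough Python sketch. The function name is my own, and it only lists the files; what you remove from the package is up to you.

```python
import os


def find_syslinux_configs(package_root):
    """Return the full paths of every syslinux.cfg found under package_root."""
    hits = []
    for folder, _subdirs, files in os.walk(package_root):
        for name in files:
            if name.lower() == "syslinux.cfg":  # match regardless of case
                hits.append(os.path.join(folder, name))
    return sorted(hits)


if __name__ == "__main__":
    # "." is a placeholder; point this at your package source share instead.
    for path in find_syslinux_configs("."):
        print(path)
```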

I realise this is a fairly obscure issue, but hopefully it helps someone.