Server and SQL upgrades – lessons learned

I’m just finishing up a project upgrading a bunch of servers from Windows Server 2012 R2 to 2019 or 2022 (depending on what the associated app supported), including a bunch of SQL clusters.

I’ve always been SQL adjacent (working with/upgrading/installing SQL for other products to utilise), so I have some incidental knowledge, but it’s not my core skill set.

Things of note from the upgrades were:

 

When performing an in-place OS upgrade – upgrade speed can be significantly increased if you remove old user profiles

Some of the servers I was upgrading had hundreds of profiles on them that had not been used for a year or more, and all servers had at least 20 “Account unknown” profiles.
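
A minimal PowerShell sketch of how stale profiles could be listed before the upgrade. `Select-StaleProfile` is a made-up helper name and the 365-day default is an assumption based on the "year or more" above. It relies on the `LastUseTime` value of `Win32_UserProfile`, which is not always trustworthy, so treat the output as a list to review rather than a list to delete blindly.

```powershell
# Sketch only: filter profile objects down to the ones that look stale.
# A profile is kept in the result when it is not a system profile (Special),
# is not currently loaded, has a recorded LastUseTime, and that time is
# older than the cutoff.
function Select-StaleProfile {
    param(
        [Parameter(Mandatory)] [object[]] $UserProfile,
        [int] $OlderThanDays = 365,          # assumption, adjust to taste
        [datetime] $Now = (Get-Date)
    )
    $cutoff = $Now.AddDays(-$OlderThanDays)
    $UserProfile | Where-Object {
        (-not $_.Special) -and (-not $_.Loaded) -and $_.LastUseTime -and ($_.LastUseTime -lt $cutoff)
    }
}

# Usage on a real server (review the list before removing anything):
#   $stale = Select-StaleProfile -UserProfile (Get-CimInstance Win32_UserProfile)
#   $stale | Select-Object LocalPath, SID, LastUseTime
#   $stale | Remove-CimInstance -WhatIf    # drop -WhatIf once you are happy
```

Keeping the filter separate from the delete means the same list can be exported and sanity-checked first.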

 

SQL Error Logging

The best way to find the error log if any upgrade goes wrong is to look in the registry at

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\<instance/version>\MSSQLServer\Parameters\

The error log path is the startup parameter that begins with -e (usually SQLArg1). You can then copy/paste that path and get some helpful errors out of the log.
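
The registry lookup above can also be scripted. This is a sketch, not a definitive tool: `Get-ErrorLogPathFromStartupArgs` is a hypothetical helper, and the instance ID `MSSQL15.MSSQLSERVER` in the usage comment is only an example of what might sit under the `Microsoft SQL Server` key on your server.

```powershell
# Given the SQLArg* values from ...\MSSQLServer\Parameters, return the error
# log path, i.e. the startup argument that begins with a lower-case -e.
# The match is case-sensitive because -E is a different startup option.
function Get-ErrorLogPathFromStartupArgs {
    param([Parameter(Mandatory)] [string[]] $StartupArg)
    foreach ($arg in $StartupArg) {
        if ($arg -clike '-e*') { return $arg.Substring(2) }
    }
}

# Usage on the SQL server (example instance ID, substitute your own):
#   $key  = 'HKLM:\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQLServer\Parameters'
#   $item = Get-ItemProperty -Path $key
#   $startupArgs = $item.PSObject.Properties |
#       Where-Object Name -like 'SQLArg*' | ForEach-Object Value
#   $log = Get-ErrorLogPathFromStartupArgs -StartupArg $startupArgs
#   Get-Content -Path $log -Tail 50
```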

 

SSISDB is the bane of SQL cluster upgrades

SQL 2014 and below don’t support replicating SSISDB via AAG, so before you service pack, this DB must be removed from the AAG replication and the copy on the passive nodes deleted.

SQL 2016 and above support replicating SSISDB – so service packs can be applied without having to remove SSISDB from anywhere

SQL version upgrades (e.g. SQL 2014 or 2016 to SQL 2019) do not allow SSISDB to be part of an AAG, whatever the source version, so SSISDB must be removed from the replication group and the copy on the passive nodes deleted first.

If you forget this, you will likely see an error message similar to 

Script level upgrade for database ‘master’ failed because upgrade step ‘SSIS_hotfix_install.sql’ encountered error 15151, state 1, severity 16
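
Removing SSISDB from the AAG, as described above, comes down to two T-SQL statements. The sketch below only builds the statements as strings; `Get-SsisdbRemovalSql` is a hypothetical helper, and running the output through `Invoke-Sqlcmd` assumes the SqlServer PowerShell module is installed.

```powershell
# Build the T-SQL needed to take SSISDB out of an availability group.
# OnPrimary is run on the primary replica, OnSecondary on each secondary
# once the database has been removed from the group.
function Get-SsisdbRemovalSql {
    param([Parameter(Mandatory)] [string] $AvailabilityGroup)
    $ag = $AvailabilityGroup.Replace(']', ']]')   # escape ] inside a bracketed name
    [pscustomobject]@{
        OnPrimary   = "ALTER AVAILABILITY GROUP [$ag] REMOVE DATABASE [SSISDB];"
        OnSecondary = 'DROP DATABASE [SSISDB];'
    }
}

# Usage (server and group names are placeholders):
#   $sql = Get-SsisdbRemovalSql -AvailabilityGroup 'AG1'
#   Invoke-Sqlcmd -ServerInstance 'NODE-X' -Query $sql.OnPrimary
#   Invoke-Sqlcmd -ServerInstance 'NODE-Y' -Query $sql.OnSecondary
```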

 

Starting SQL to fix issues

So, you have run into an issue with the upgrade because, for example, SSISDB was still replicated... but now you can’t start the SQL service to delete it.

This is where /T902 comes in handy. Trace flag 902 starts SQL without running the upgrade scripts that are failing.

  • Get the short name of your SQL service (from services.msc)
  • Open an elevated command prompt
  • net start MSSQL$Instancename /T902

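You can then do what you need to the SQL configuration. Once fixed, restart the service normally (without /T902) so the upgrade scripts can run and complete.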

 

Reporting services

Reporting services in 2017 and above is not a straight upgrade from 2016 and below. There are plenty of articles around the web on the upgrade process – but…..

 

During inventory, make sure you discover SSISDB and Reporting services instances

In hindsight, one of the things I would have focused on more in my pre-upgrade inventory script was identifying SSISDB and reporting services instances.

Many of these in the recent project were present but not actually needed/in-use and could just be uninstalled.
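
A rough sketch of what that part of an inventory script could look like. Both function names are made up for illustration. The service-name pattern covers the names I would expect (ReportServer or ReportServer$INSTANCE for 2016 and below, SQLServerReportingServices for 2017 and above, PowerBIReportServer), and the SSISDB check assumes Windows authentication and that `System.Data.SqlClient` is available, as it is in Windows PowerShell 5.1.

```powershell
# Pick reporting services out of a list of service objects by name.
function Select-ReportingService {
    param([Parameter(Mandatory)] [object[]] $Service)
    $pattern = '^(ReportServer(\$.+)?|SQLServerReportingServices|PowerBIReportServer)$'
    $Service | Where-Object { $_.Name -match $pattern }
}

# Return $true when the instance has an SSISDB catalog database.
function Test-SsisCatalog {
    param([Parameter(Mandatory)] [string] $ServerInstance)
    $connString = "Server=$ServerInstance;Database=master;Integrated Security=True"
    $conn = New-Object System.Data.SqlClient.SqlConnection -ArgumentList $connString
    try {
        $conn.Open()
        $cmd = $conn.CreateCommand()
        $cmd.CommandText = "SELECT COUNT(*) FROM sys.databases WHERE name = N'SSISDB'"
        return ([int]$cmd.ExecuteScalar() -gt 0)
    }
    finally {
        $conn.Dispose()
    }
}

# Usage:
#   Select-ReportingService -Service (Get-Service) | Select-Object Name, Status
#   Test-SsisCatalog -ServerInstance 'SQLNODE1\INSTANCE1'   # placeholder name
```

Finding the instance is only half the job: whether anything still uses it is a question for the application owner.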

 

Cluster rolling upgrades

This is well documented – but just to make it nice and short (the MS doco makes it seem harder than it is)

  • Ensure SQL AAG and cluster resource active node is node “X”
  • Ensure failover is set to manual
  • Verify SQL AAG is healthy and all databases are sync’ed
  • Service pack the current version of SQL – so it will support Server 2019
  • Node Y – Upgrade 2012R2 to 2016 – Check node is still able to join cluster
  • Node Z – Upgrade 2012R2 to 2016 – Check node is still able to join cluster
  • Node X – Failover SQL AAG and cluster core resources to another node (e.g. Node Z)
  • Node X – Upgrade 2012R2 to 2016 – Check node is still able to join cluster
  • Upgrade cluster functional level
  • Node X – Upgrade 2016 to 2019 – Check node is still able to join cluster
  • Verify SQL AAG is healthy and all databases are sync’ed
  • Node X – Upgrade SQL 20xx to SQL 2019 with current CU
  • Node X – Failover SQL AAG and cluster core resources back to Node X (from Node Z)
    • Once you do this – you will not be able to fail over to other nodes until they are also upgraded. Replication to “lower” version nodes will also stop – don’t freak out when you see this (like I did on my first upgrade!)
  • Node Y – Upgrade 2016 to 2019 – Check node is still able to join cluster
  • Node Y – Upgrade SQL 20xx to SQL 2019 with current CU
  • Node Z – Upgrade 2016 to 2019 – Check node is still able to join cluster
  • Node Z – Upgrade SQL 20xx to SQL 2019 with current CU
  • Upgrade cluster functional level
  • For each database on Node Y and Node Z, go into SQL Server Management Studio and select “Resume Data Movement”. This tells SQL to try again, which will now work, as the same version of SQL is in use across the cluster
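
With a lot of databases, clicking through Management Studio for the last step gets old quickly. The T-SQL equivalent is `ALTER DATABASE ... SET HADR RESUME`, run on the secondary replica. The helper below is a hypothetical sketch that only generates the statements; `Invoke-Sqlcmd` in the usage comment assumes the SqlServer module.

```powershell
# Generate one "resume data movement" statement per database name.
function Get-ResumeDataMovementSql {
    param([Parameter(Mandatory)] [string[]] $Database)
    foreach ($db in $Database) {
        # Escape ] so the name is safe inside [brackets]
        "ALTER DATABASE [$($db.Replace(']', ']]'))] SET HADR RESUME;"
    }
}

# Usage on Node Y / Node Z (placeholder server name):
#   $suspended = Invoke-Sqlcmd -ServerInstance 'NODE-Y' -Query @'
#   SELECT DB_NAME(database_id) AS name
#   FROM sys.dm_hadr_database_replica_states
#   WHERE is_local = 1 AND is_suspended = 1;
#   '@
#   Get-ResumeDataMovementSql -Database $suspended.name |
#       ForEach-Object { Invoke-Sqlcmd -ServerInstance 'NODE-Y' -Query $_ }
```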

 

Microsoft Edge works best with the latest Windows Updates

When installing, seemingly at random, I get the following in the application event log and MSI log for CrEdge

Microsoft Edge works best with the latest Windows Updates. Once you download updates and restart your device, rerun the installer.

This is particularly frustrating as

  • The device has all current Windows updates applied
  • The install works on thousands of other machines – but just has a smattering where it doesn’t with this error
  • The error is in no way actually helpful... it doesn’t specify which updates I am supposedly missing, so it doesn’t help with troubleshooting in any way. Not quite as bad as “the task failed successfully”, but not far off.

 

Fortunately, Dr Google provided some assistance

Microsoft Edge install issues on some computers (by u/jasonin951 in r/MicrosoftEdge)

Microsoft Edge works best with the latest Windows Updates Error (by u/xxx59712 in r/edge)

 

The answer, for me, was setting the following reg key in the task sequence prior to Edge installing (/f stops reg from prompting if the value already exists)

reg add HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\EdgeUpdate /v Allowsxs /t REG_DWORD /d 1 /f

 

Blocking Edge installs without providing an actual reason is genuinely bizarre behaviour by MS.

 

SCCM – Config baseline to detect Windows optional features

Detecting and removing Windows 10/11 optional features via SCCM can be a bit of a pain via inventory, as the class is not inventoried by default.

One alternative is using a compliance baseline, which provides a quicker turn-around for any remediation you may want to do and won’t bloat your database with additional inventory.

I’m going to use the Windows 10 XPS printer as an example.

  • Create a configuration item
  • Give the item a name, such as “CI – Detect XPS Printer”
  • Select your platforms – most likely Windows 10 and 11
  • Under Settings
    • Give the Setting a name: “Settings – XPS Printer Disabled”
    • Setting Type: Script
    • Data Type: String
    • Script: (Get-WindowsOptionalFeature -FeatureName Printing-XPSServices-Features -Online).State
    • Remediation script : Disable-WindowsOptionalFeature -FeatureName Printing-XPSServices-Features -Online
  • Compliance rules
    • Name: Compliance – XPS Disabled
    • Selected setting: The setting we made in the step above
    • Value: Disabled
    • Run the remediation script when this setting is noncompliant: Enabled

Then create your baseline which contains this CI and deploy to your desired collection(s).

This can obviously be adapted to work with any optional feature – Use Get-WindowsOptionalFeature -Online to help find the exact name of the feature you are looking for.

Windows 10 version numbers – and how they actually show up in SCCM when added as WIM

Windows 10 version numbers when running within Windows are fairly well known and are available from https://en.wikipedia.org/wiki/Windows_10_version_history

However, when adding the WIMs to SCCM as OS images, the numbers don’t match up. Combined with the absurd method of obtaining Windows 10 21H2, which does not give you any information on which version you’re getting (detailed here – https://www.hayesjupe.com/windows-10-21h2-getting-enterprise-edition-and-extracting-when-using-m365-licensing/ ), it can create confusion/concern about whether the correct version has actually been downloaded / imported.

In order to allay this (somewhat), I have imported each version into my SCCM environment and have a screenshot of the results below.

As you can see, the version numbers of the images do not match up with the version numbers of the OSes they contain....

And when compared to actual versions within Windows

Windows 10 version name    Build number within Windows    Version shown next to imported image in SCCM
Windows 10 1903            18362                          18362.356
Windows 10 1909            18363                          18362.418
Windows 10 2004            19041                          19041.264
Windows 10 20H2            19042                          19041.631
Windows 10 21H1            19043                          19041.928
Windows 10 21H2            19044                          19041.1288
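
If you check imported images often, the table can be turned into a small lookup. This is only a sketch: `Resolve-Windows10Release` is a made-up name, and the values are just the ones from the table above, so refreshed media with a different revision number will not be recognised.

```powershell
# Map the version SCCM shows for an imported image to the Windows 10 release,
# using only the values observed in the table above.
function Resolve-Windows10Release {
    param([Parameter(Mandatory)] [string] $ImageVersion)
    $known = @{
        '18362.356'  = 'Windows 10 1903'
        '18362.418'  = 'Windows 10 1909'
        '19041.264'  = 'Windows 10 2004'
        '19041.631'  = 'Windows 10 20H2'
        '19041.928'  = 'Windows 10 21H1'
        '19041.1288' = 'Windows 10 21H2'
    }
    # Accept both "19041.1288" and "10.0.19041.1288"
    $key = $ImageVersion -replace '^10\.0\.', ''
    if ($known.ContainsKey($key)) { return $known[$key] }
    return $null    # unknown revision, check the media manually
}

# Usage:
#   Resolve-Windows10Release -ImageVersion '10.0.19041.1288'
```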

 

Windows 10 21H2 – getting Enterprise edition and extracting when using M365 licensing

Recently I had a client which had M365 licensing, but was a little on the smaller side, so did not have an EA or earlier Windows 10 Enterprise licensing through the VLSC.

In this situation, MS would have you install the Pro version of Windows 10 and let you upgrade to Enterprise via M365 subscription activation – https://docs.microsoft.com/en-us/windows/deployment/windows-10-subscription-activation

When deploying via SCCM, it seems counter-intuitive to deploy a version of the OS you don’t actually want to use... and to have an additional step where something can go wrong (and considering MS support has become completely unusable, trying to avoid the potential for having to engage them is wise)

Please note that the following procedure does require you to have a valid Windows 10 enterprise license key

In order to just deploy enterprise in the first place:

  • Go to https://www.microsoft.com/en-ca/software-download/windows10
  • Download the media creation tool
  • Run the media creation tool with the command line “MediaCreationTool21H2.exe /Eula Accept /Retail /MediaArch x64 /MediaLangCode en-US /MediaEdition Enterprise”
  • Enter an enterprise license key when prompted
  • Select the options to create an ISO
  • Play the waiting game
  • Extract the esd to a wim
    • Create a directory (e.g. D:\ESD)
    • Mount the iso
    • Copy the install.esd from the mounted ISO to D:\ESD
    • From a command prompt run “dism /Get-WimInfo /WimFile:install.esd” and take note of the image index for your desired version. Enterprise is index “3”, education is index “1” in 21H2 for example
    • run “dism /export-image /SourceImageFile:install.esd /SourceIndex:3 /DestinationImageFile:install.wim /Compress:max /CheckIntegrity”
      • Ensure the SourceIndex value matches with the index number of your desired version
    • Play the waiting game again
  • You now have a wim you can use to image from SCCM
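
Rather than eyeballing the index, the `dism /Get-WimInfo` output can be parsed. The sketch below assumes English DISM output (lines such as "Index : 3" followed by "Name : ...") and that the image is named "Windows 10 Enterprise" inside the ESD; `Get-WimIndexByName` is a hypothetical helper, so check both assumptions against your own output.

```powershell
# Find the image index for a given image name in "dism /Get-WimInfo" output.
# Returns nothing when the name is not found.
function Get-WimIndexByName {
    param(
        [Parameter(Mandatory)] [AllowEmptyString()] [string[]] $DismOutput,
        [Parameter(Mandatory)] [string] $ImageName
    )
    $index = $null
    foreach ($line in $DismOutput) {
        if ($line -match '^\s*Index\s*:\s*(\d+)\s*$') {
            # Remember the most recent index, the Name line follows it
            $index = [int]$Matches[1]
        }
        elseif (($line -match '^\s*Name\s*:\s*(.+?)\s*$') -and ($Matches[1] -eq $ImageName)) {
            return $index
        }
    }
}

# Usage from D:\ESD:
#   $info = dism /Get-WimInfo /WimFile:install.esd
#   $idx  = Get-WimIndexByName -DismOutput $info -ImageName 'Windows 10 Enterprise'
#   dism /export-image /SourceImageFile:install.esd /SourceIndex:$idx /DestinationImageFile:install.wim /Compress:max /CheckIntegrity
```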

SCCM – auto uninstall of applications once removed from collection

This would have to be one of the most requested features in SCCM, at least from our clients, and it has turned up in technical preview 2106.

https://docs.microsoft.com/en-us/mem/configmgr/core/get-started/2021/technical-preview-2106

 

To quote the specific part of the article:

 

On the Deployment Settings page of the Deploy Software Wizard, configure the following options:

Action: Install

Purpose: Required

Enable Uninstall this application if the targeted object falls out of the collection

 

CB SCCM releases continue to deliver useful stuff.

Migrating SCCM to new hardware using active/passive site servers

The main reason for this article is that a friend asked “we are looking at migrating SCCM hardware for the hierarchy – can the active/passive site server functionality be used for this?”

The TL;DR response is “yes”. Whether it’s supported or not is another matter, but I can say for sure that it does work, because I’ve done it.

In 2019, I had a client who had let their hierarchy go... it had been outsourced to people that claimed to have SCCM skills but clearly did not, nothing had been updated for years etc etc. It was a real mess.

To paint the scene... there were approx 35,000 clients, thousands of collections, packages, OSD, software updates etc, so starting a new hierarchy was not an option. There was a hell of a lot of crap, but there was a lot of stuff that was current too. The core servers were 2008 R2 with SCCM 2012 R2 SP1 and a dedicated SQL 2008 cluster behind the scenes. This was all on physical hardware, so reverting back to a snapshot in the case of a failure was not an option. So yeah, you get the picture: not an easy upgrade, just because there were so many dependencies.

The full upgrade process is a long story – but a very compressed version is:

  • Preparation – get everything ready such as media, license keys, access over SCCM and SQL etc. Design your end state, and have a step by step guide from your current state to your end state. My migration plan was split into 15 “phases”; some phases were a couple of hours’ work, some took a week.
  • Migrate SQL – create your new SQL server and use the built-in SCCM setup functionality to migrate it
  • Migrate custom SQL reports – this can be a pain…. there are a few scripts out there that can help such as https://www.scconfigmgr.com/2014/11/28/export-and-import-reports-in-configmgr-2012-with-powershell/
  • Upgrade SCCM to 1606 – as this is the first version that supports upgrading the underlying OS
  • Migrate the SUP – as the SUP does not survive the OS upgrade
  • Upgrade the primary site server OS (depending on your version this might be multiple steps)
  • Upgrade to the current version of SCCM, in my case, this was 1902 (at that time). It has to be 1806 or above to support active/passive site servers, but given how old that is now, I don’t see any reason why you would not move to the most current CB version. Again, this may be multiple upgrades depending on where you are at
  • Prep for site failover – this will generally be creating the new VMs that will become your site servers, and moving your content library to a “3rd” server
  • Add one of the “new” VMs as a passive site server
  • Once complete, activate the passive server to make it into the active server
  • Take care of moving the roles around as required (e.g. this became an MP and one node of a shared SUP for me)
  • Remove the “old” passive site server
  • Add the 2nd VM as a passive site server
  • Take care of moving the roles around as required
  • Decommission your old site server, starting with the SMS provider

There are many other steps around this – many of which will be specific to your environment….

 

If you were like me and wondering if this can be done and looking for someone else that had done it – here you go 🙂 Depending on where you are, it might be a big piece of work – but it can be done.

SCCM – CAS Collapse feature

I recently had this Twitter exchange around an SCCM feature I didn’t know existed – SCCM CAS collapse

When SCCM 2012 was first around, we had some clients that had previously had other “consultants” design and implement 4 primaries and a CAS for 2000 devices, and other insane rubbish. At the time, we used traditional migration methods to move them back to one primary. This feature probably would not have directly helped, as there was nothing about the hierarchy worth saving, but I really like this type of feature... it’s the type of thing that you don’t need often, but can save large amounts of time and effort when you run into a situation needing it.

 

As a side note – I highly recommend following @djammer on Twitter if you’re an SCCM nerd.

While social media seems to be generally used for spreading conspiracy theories, spreading hate and pictures of food (for some fucking bizarre reason), I have found that @djammer provides decent quality information in a timely way around SCCM... something I haven’t really seen through any other channel.

Adding F8 command prompt to legacy boot wims

Adding and managing legacy boot wims in SCCM is a pain... the removal of GUI options for non-current versions makes things more painful than they need to be, and the instances of newer versions of ADK (such as ADK 2004) being incompatible with certain things seem to be increasing.

Adding command prompt to a legacy boot wim is substantially less than intuitive, so I’m leaving this here as a quick reference for myself!

WBEMTEST

connect to \root\sms\site_<sitecode>

Open instance

sms_bootimagepackage.packageID="<boot image package ID>" (quotes are required, and they must be straight quotes if you copy/paste)

Set “EnableLabShell” to TRUE, save property, then save object

Re-open the boot image and verify the property is set
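
The same WBEMTEST steps can probably be scripted from Windows PowerShell 5.1 on a machine with access to the SMS provider. This is an untested-in-your-environment sketch: `Enable-BootImageCommandSupport` is a made-up function name, it assumes `Get-WmiObject` is available (it is not in PowerShell 7), and it assumes `EnableLabShell` behaves as a normal writable property of `SMS_BootImagePackage`, as the manual steps above suggest.

```powershell
# Set EnableLabShell on a boot image via the SMS provider, then read it back.
function Enable-BootImageCommandSupport {
    param(
        [Parameter(Mandatory)] [string] $SiteCode,
        [Parameter(Mandatory)] [string] $PackageId,
        [string] $SiteServer = '.'      # assumption: run on the provider by default
    )
    $wmiArgs = @{
        ComputerName = $SiteServer
        Namespace    = "root\sms\site_$SiteCode"
        Class        = 'SMS_BootImagePackage'
        Filter       = "PackageID='$PackageId'"
    }
    $img = Get-WmiObject @wmiArgs
    if (-not $img) { throw "Boot image $PackageId not found in site $SiteCode" }

    $img.Get()                      # load the full instance before writing it back
    $img.EnableLabShell = $true
    $img.Put() | Out-Null

    # Re-open the instance and return the stored value for verification
    (Get-WmiObject @wmiArgs).EnableLabShell
}

# Usage (placeholder site code and package ID):
#   Enable-BootImageCommandSupport -SiteCode 'ABC' -PackageId 'ABC00005'
```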