At work I am consolidating several SCOM Management Groups into a single Management Group. This requires multiple domains/forests/environments (without a full trust) to communicate back to the SCOM Management Group’s Management Servers.
Naturally, we use SCOM Gateway servers residing in the other domains, which communicate back to the Management Servers using certificates. A problem arises when we lose a Gateway server (be it from a server issue, a network issue, a SCOM HealthService issue on the Gateway, etc.): the agents whose primary Management Server is that Gateway fail to heartbeat, and since they sit in environments that the SCOM Management Servers cannot reach, the ping fails as well.
To work around this, we run multiple Gateway servers in the other domains, but by default the agents won’t fail over to the other Gateway servers. So I wrote a PowerShell script that sets each agent’s failover Management Server to the “secondary” Gateway server. I added a lot of comments, so the code probably speaks better than words.
I set up this script to run as a scheduled task every morning; this ensures that new agents and Gateway changes get pushed to the agents.
# Load the SCOM snap-in
Add-PSSnapin "Microsoft.EnterpriseManagement.OperationsManager.Client"
Set-Location "OperationsManagerMonitoring::"

# Create the event log; if it already exists this will not error
New-EventLog -LogName 'SCOM.Custom' -Source 'SetAgentGateway' -ErrorAction SilentlyContinue

# Be sure to change the RMS name
New-ManagementGroupConnection 'SCOMROOTMANAGEMENTSERVERNAME'

# Log the start
Write-EventLog -LogName 'SCOM.Custom' -Source 'SetAgentGateway' -EventId 100 -Message 'Started'

# Get a distinct list of domains that have gateways
$domains = New-Object 'System.Collections.Generic.List[string]'
$gws = Get-ManagementServer | where { $_.IsGateway -eq $true }
foreach ($gw in $gws)
{
    if ((!$domains.Contains($gw.Domain)) -and $gw.Domain -ne '')
    {
        $domains.Add($gw.Domain)
    }
}

# Loop through the distinct domains
foreach ($domain in $domains)
{
    # Get the gateway servers for the domain
    $mss = Get-ManagementServer | where { $_.Domain -eq $domain -and $_.IsGateway -eq $true }

    # Loop through the gateways for the domain
    foreach ($ms in $mss)
    {
        # Get the failover gateways (every other gateway in the same domain)
        $failMS = Get-ManagementServer | where { $_.Domain -eq $domain -and $_.DisplayName -ne $ms.DisplayName -and $_.IsGateway -eq $true }

        # Proceed with this gateway only if the domain has more than one gateway server
        if ($failMS -ne $null)
        {
            # Get the gateway's agents
            $agents = $ms | Get-Agent

            # Set the management servers (skip gateways that have no agents)
            if ($agents)
            {
                Set-ManagementServer -AgentManagedComputer $agents -PrimaryManagementServer $ms -FailoverServer $failMS
            }
        }
    }
}

# Log the completion
Write-EventLog -LogName 'SCOM.Custom' -Source 'SetAgentGateway' -EventId 500 -Message 'Completed'
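The pairing rule the script applies can be tried out without a SCOM environment. The following is only a minimal sketch in plain Python, not SCOM code: the gateway names, the domain names, and the `plan_failover` helper are all made up for illustration. It groups gateways by domain, skips domains with a single gateway, and gives each gateway every other gateway in its domain as failover.

```python
from collections import defaultdict


def plan_failover(gateways):
    """Map each gateway to its failover gateways.

    gateways: list of (name, domain) tuples (hypothetical data).
    Returns {primary_name: [failover_name, ...]}.
    """
    by_domain = defaultdict(list)
    for name, domain in gateways:
        if domain:  # ignore gateways that report an empty domain
            by_domain[domain].append(name)

    plan = {}
    for names in by_domain.values():
        if len(names) < 2:
            continue  # a lone gateway has nothing to fail over to
        for primary in names:
            plan[primary] = [n for n in names if n != primary]
    return plan


# Hypothetical environment: two gateways in one domain, one in another
gateways = [("GW1", "a.example"), ("GW2", "a.example"), ("GW3", "b.example")]
plan = plan_failover(gateways)
print(plan)
```

In this made-up environment GW1 and GW2 back each other up, while GW3 is left alone because its domain has no second gateway.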
At the day job we rely on SCOM email alerts for monitoring. Once a monitor detects a problem, we get a single email alert, and no further emails arrive until the problem is resolved. Take low disk space, for example: the monitor turns critical and an email gets sent. If that email does not get attention, nothing prompts anyone again and the problem never gets manually resolved.
To work around this issue with monitors while keeping the same monitoring processes, I was using TimHe’s GreenMachine. GreenMachine gives a lot of flexibility and has nice command-line output, but the application was kind of slow. I did not need any of the extras; I just needed to reset all monitors.
The code below uses the managed SDK and .NET 4’s parallel tasks to reset the monitors quickly. I use a scheduled task to run the application once every morning.
using System;
using System.Collections;
using System.Collections.ObjectModel;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using System.Configuration;
using System.Threading.Tasks;
using Microsoft.EnterpriseManagement;
using Microsoft.EnterpriseManagement.Configuration;
using Microsoft.EnterpriseManagement.Monitoring;

namespace Chad.SCOM.ResetAllMonitors
{
    class Program
    {
        static void Main(string[] args)
        {
            String rootManagementServerName = ConfigurationManager.AppSettings["rmsServerName"];

            // Connect to the Management Group
            ManagementGroup mg = new ManagementGroup(rootManagementServerName);

            // Get all monitoring classes
            ReadOnlyCollection<MonitoringClass> monitoringClasses = mg.GetMonitoringClasses();

            // Loop through all monitoring classes in parallel
            // (sequential alternative: foreach (MonitoringClass monClass in monitoringClasses))
            Parallel.ForEach(monitoringClasses, (MonitoringClass monClass) =>
            {
                // Skip abstract classes
                if (!monClass.Abstract)
                {
                    MonitoringObjectCriteria mgtObjCriteria = new MonitoringObjectCriteria("", monClass);
                    ReadOnlyCollection<MonitoringObject> mgtObjs = mg.GetMonitoringObjects(mgtObjCriteria);

                    // Loop through all objects of the class in parallel
                    // (sequential alternative: foreach (MonitoringObject mgtObj in mgtObjs))
                    Parallel.ForEach(mgtObjs, (MonitoringObject mgtObj) =>
                    {
                        try
                        {
                            // Only reset if the object is not healthy
                            if (mgtObj.HealthState == HealthState.Error || mgtObj.HealthState == HealthState.Warning)
                            {
                                Console.WriteLine(mgtObj.FullName + " " + mgtObj.HealthState);
                                mgtObj.ResetMonitoringState();
                            }
                        }
                        catch (Exception ex)
                        {
                            Console.ForegroundColor = ConsoleColor.Red;
                            Console.WriteLine("Failed to reset the monitor for '" + mgtObj.FullName + "'. " + ex.ToString());
                            Console.ResetColor();
                        }
                    });
                }
            });
        }
    }
}
I needed to be able to find all users that had access to a resource. Access was granted to the resource via a group, and that group had other groups as members. The PowerShell below returns all user members of a group, including the members of the security groups nested inside it.
$domainName = "domain.com"
$groupName = "group-sam-name"

# Look up a group by its SAM account name in the given domain
Function Get-GroupPrincipal($cName, $groupName)
{
    $dsam = "System.DirectoryServices.AccountManagement"
    $rtn = [reflection.assembly]::LoadWithPartialName($dsam)
    $cType = "domain" # context type
    $iType = "SamAccountName"
    $dsamgroupPrincipal = "$dsam.GroupPrincipal" -as [type]
    $principalContext = New-Object "$dsam.PrincipalContext"($cType, $cName)
    $dsamgroupPrincipal::FindByIdentity($principalContext, $iType, $groupName)
}

# Recursively collect the users in a group and in its nested security groups
Function GetAllGroupSecurityMembers($group)
{
    $group = Get-GroupPrincipal $group.Context.ConnectedServer $group.Name
    # Start with an empty array (not "", which would add a blank entry)
    [Array] $AllMembers = @()
    foreach ($member in $group.Members)
    {
        if ($member.GetType().Name -eq "UserPrincipal")
        {
            $AllMembers += $member
        }
        if ($member.IsSecurityGroup)
        {
            $AllMembers += GetAllGroupSecurityMembers $member
        }
    }
    return $AllMembers
} # end GetAllGroupSecurityMembers

Function GetAllSecurityUsersInGroup($domainName, $groupName)
{
    $group = Get-GroupPrincipal $domainName $groupName
    return GetAllGroupSecurityMembers $group
} # end GetAllSecurityUsersInGroup

foreach ($user in GetAllSecurityUsersInGroup $domainName $groupName)
{
    $user.DisplayName
}
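The recursion itself is easy to experiment with outside of Active Directory. Below is a minimal sketch in plain Python, assuming a made-up dictionary of group memberships (the group and user names and the `expand_members` helper are hypothetical, not part of the script above). It also tracks which groups it has already visited, which is one way to guard against two groups that contain each other; the PowerShell above does not do this.

```python
def expand_members(group, groups, seen=None):
    """Return the set of users in `group`, following nested groups.

    groups: {group_name: [member_name, ...]} where a member is either
    another key of `groups` (a nested group) or a user name.
    """
    seen = set() if seen is None else seen
    if group in seen:
        return set()  # already expanded, so stop (prevents endless loops)
    seen.add(group)

    users = set()
    for member in groups.get(group, []):
        if member in groups:
            users |= expand_members(member, groups, seen)  # nested group
        else:
            users.add(member)  # plain user
    return users


# Hypothetical memberships; "admins" and "helpdesk" contain each other
groups = {
    "resource-access": ["alice", "helpdesk"],
    "helpdesk": ["bob", "admins"],
    "admins": ["carol", "helpdesk"],
}
all_users = expand_members("resource-access", groups)
print(sorted(all_users))
```

Using a set also removes duplicates when a user is reachable through more than one nested group.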
I had an issue where I discovered all servers in a domain and tried to push the SCOM agent to all of them. There were firewalls between the management server(s) and the targeted servers, so the installations failed. I then manually installed the agent on the target servers and waited for them to show up in the “Pending Management” section of the SCOM console.
The server was not in “Pending Management”, but there were 2000 events on the Management Server.
So, to view the pending agents you can run the PowerShell command “get-agentpendingaction”. Then, to approve an agent, you can run “approve-agentpendingaction”. If you want to approve all pending agents, pipe the pending actions into the approve command:
get-agentpendingaction | approve-agentpendingaction
At work we use SCCM OSD to create machines. We automate this via a custom application that creates a machine entry and adds it to an OSD collection. A problem was that the collection membership would have to be cleaned up manually. So the script below loops through the direct membership rules of a collection and deletes the membership if the resource has the client installed (that is, the OS was deployed successfully). I set this up to run as a scheduled task every morning.
[String]$SMSSiteCode = "SITE"
[String]$SMSManagementServer = "SCCMSERVER"
[String]$CollectionName = "SITE000010"

$SmsCollection = Get-WmiObject -ComputerName $SMSManagementServer -Namespace "Root\Sms\Site_$SmsSiteCode" -Query "Select * From SMS_Collection Where CollectionID='$CollectionName'"
$SmsCollection.Get()

# Loop through the direct members of the collection
ForEach ($Rule in $($SMSCollection.CollectionRules | Where {$_.__CLASS -eq "SMS_CollectionRuleDirect"}))
{
    # Get the SMS_R_System object for the rule
    $machineResourceID = $Rule.ResourceID
    $smsObject = Get-WmiObject -ComputerName $SMSManagementServer -Namespace "Root\Sms\Site_$SmsSiteCode" -Query "Select * From SMS_R_System Where ResourceID='$machineResourceID'"

    # If the resource has the client installed
    if ($smsObject.ClientType -eq 1)
    {
        # Delete the membership rule
        $SMSCollection.DeleteMembershipRule($Rule)
    }
}
The company that I work for found it hard to make full use of SCOM performance information. SCOM collects a lot of valuable information, but with the built-in reports it’s hard to get a fast and effective view. After talking to other SCOM admins and users, I saw that this was a common complaint. So, I created [REDACTED - SITE NO LONGER EXISTS] which contains a Management Pack that can be used to view some simple reports.
This first article focuses on the basics of the data warehouse.
When performance information is collected, it gets written to the operations database (OperationsManager) and the data warehouse (OperationsManagerDW). In the operations database, the information gets stored in one of 60 different tables (dbo.PerformanceData_XX) based on the collection day/period. When the performance information gets saved to the data warehouse, the data gets stored in the Perf.PerfRaw_{guid} table (to query, use the Perf.vPerfRaw view).
Since this is the data warehouse, this is where we want to report from. The raw performance collections can get pretty big, though. Imagine we set up a performance collection every 15 seconds for all servers; after a short amount of time the raw data gets huge.
So, SCOM aggregates the information from the Perf.PerfRaw_{guid} table into the Perf.PerfHourly_{guid} table(s) (to query, use Perf.vPerfHourly). The aggregation is initiated by the data warehouse Management Pack. When SCOM aggregates the information, it adds three new columns.
Since this is an aggregation, the average can be skewed by the spread of the samples. Imagine a performance utilization collection of 0%, 0%, 0%, 100%, 0%, 0%, 0%, 0%, 0%, 0%; the average is 10%. 10% isn’t a very accurate representation of those samples, so SCOM also stores MinValue, MaxValue, and StandardDeviation.
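To see how the extra columns change the picture, here is a small Python sketch that computes the same kind of summary for the ten samples in the example. It is only an illustration of the arithmetic, not how SCOM computes its aggregates; in particular, using the population standard deviation is my assumption, since the sample formula is the other common choice.

```python
import statistics

# Percent utilization samples from the example above
samples = [0, 0, 0, 100, 0, 0, 0, 0, 0, 0]

average = statistics.mean(samples)
minimum = min(samples)
maximum = max(samples)
# Population standard deviation (assumption); statistics.stdev gives the sample version
std_dev = statistics.pstdev(samples)

print(f"avg={average} min={minimum} max={maximum} stddev={std_dev:.1f}")
```

An average of 10 with a maximum of 100 and a standard deviation of 30 tells a report reader that the hour was mostly idle with a single spike, which the average alone hides.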
When you are writing a report it is important to analyze what you are trying to report on and determine what value set best represents the goal.
So far we have the raw data and the hourly aggregation, and there’s one more aggregation: daily. This information is stored in the Perf.PerfDaily_{guid} table (query as Perf.vPerfDaily). Everything said about the hourly aggregation applies to the daily one as well.
Storage is cheap, but not that cheap (especially if the database is stored on SAN storage). So, we can’t keep everything forever (well, you can, but you’ll need to make a change). The grooming parameters are stored in the data warehouse. If you want to change how long the data is kept, you will need to update the values in the database. Microsoft documents this process, but doesn’t show a user-friendly way to view your current parameters.
Now that you know about the different types of aggregation, the query below should make more sense.
SELECT
    B.DatasetDefaultName,
    C.AggregationTypeDefaultName,
    MaxDataAgeDays,
    LastGroomingDateTime
FROM StandardDatasetAggregation A
LEFT OUTER JOIN vDataset B
    ON A.DatasetId = B.DatasetId
LEFT OUTER JOIN vAggregationType C
    ON A.AggregationTypeId = C.AggregationTypeId
ORDER BY
    B.DatasetDefaultName,
    C.AggregationTypeId
To update the grooming thresholds, follow the information at MSDN.
That’s the basics. Please see the SCOM tag for more articles that will focus on querying the data warehouse.
I’m not experienced in programming native code, so I thought I’d share this code snippet for others. In order to enable / disable dial-in permissions I thought that you could just set the value of the msNPAllowDialIn attribute. But, as the documentation indicates, you should use the (native) RAS methods. This is because the actual permission is held in the userParameters attribute. This attribute can hold information for multiple purposes, so directly editing it probably isn’t the best idea. So, if you directly set msNPAllowDialIn to true, the permission will not work; but setting it to false will disable the permission.
This took a good amount of research and a lot of testing, but below was my solution. It uses a hard-coded PDC value, but you may want to look into using MprAdminGetPDCServer to get the current PDC. I wrapped the code in a .NET class to make it easier for those less familiar with native programming.
using System;
using System.Runtime.InteropServices;

public class MsRapi
{
    public const byte RASPRIV_DialinPrivilege = 0x08;

    // The native RAS_USER_0 holds the phone number as an inline WCHAR array
    // of MAX_PHONE_NUMBER_LEN (128) + 1 characters, so marshal it by value
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    public struct RAS_USER_0
    {
        public byte bfPrivilege;

        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 129)]
        public string phonenumber;
    }

    [DllImport("mprapi.dll", EntryPoint = "MprAdminUserSetInfo")]
    public static extern uint MprAdminUserSetInfo(
        [In] [MarshalAs(UnmanagedType.LPWStr)] string lpwsServerName,
        [In] [MarshalAs(UnmanagedType.LPWStr)] string lpwsUserName,
        uint dwLevel,
        IntPtr lpbBuffer);
}
To call the method:
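MsRapi.RAS_USER_0 user0 = new MsRapi.RAS_USER_0();
user0.bfPrivilege = MsRapi.RASPRIV_DialinPrivilege;
user0.phonenumber = null;

string server = "PRIMARYDOMAINCONTROLLER";
string user = "USERACCOUNT";

// Copy the structure into unmanaged memory for the native call
int len = Marshal.SizeOf(user0);
IntPtr ptr = Marshal.AllocHGlobal(len);
try
{
    Marshal.StructureToPtr(user0, ptr, false);

    // dwLevel selects the structure type the buffer holds; check the
    // MprAdminUserSetInfo documentation for the level that matches RAS_USER_0
    uint result = MsRapi.MprAdminUserSetInfo(server, user, 1, ptr);

    // 0 (NO_ERROR) is the documented success value
    Console.WriteLine("MprAdminUserSetInfo returned " + result);
}
finally
{
    // Always release the unmanaged buffer
    Marshal.FreeHGlobal(ptr);
}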
I was unable to get the MprAdminUserSetInfo to return an error when there was a problem (like entering a bad PDC), so you would definitely want to test before implementing.
