Drew

  • It has been a long time since I’ve posted anything, having transitioned to a role focusing on Azure. Like a number of you, I’m taking point on getting our on prem servers to send data to OMS directly. This […]

  • Update: Thanks to Larry Mosley at Microsoft we have figured this out, and a KB article will be forthcoming. I am going to see if this applies to the Exchange 2010 schema/StandardDataSet entries as well; I suspect it will.

    I’ve attached a text file containing SQL statements to run against your DW. Remove the MP, give it about 15 minutes, then run…

  • I have a tier 3 SR open for this, and sadly not getting much help. I am hoping someone out there can demystify the DataWarehouseDataSets element within Microsoft.Exchange.15 management pack. Specifically, why the <Install> section only executes the very first time the MP is imported in a management group, and why the <Uninstall> section never…

  • Disclaimer: OpsMgr management group environments are all different, and this is my experience with two management groups. I am currently working with Veeam and Microsoft Tier 3 support.

    Note: A lot of troubleshooting detail is omitted b/c it would require deep dives into each troubleshooting element. Churn-like symptoms can be tricky to troubleshoot, requiring peeling layers away until you reach the root cause.

    The Issue:

    In August I updated our existing Veeam 6.5 environment to v7 in two management groups:

    Staging (approx. 800 Windows servers)
    Production (approx. 2000 Windows servers)

    Things seemed fine, and I went about my business. It was during production agent updates post UR3 that things slowed to a crawl. Management servers were full of 2115 events for the CollectDiscoveryData workflow. SQL blocking was extensive. The Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\Health Service State\Completed File Uploads directory was full of files on each MS.

    Not good.

    Our DBAs use tools to automatically collect profiler traces under extended blocking conditions, and one of them explained the cause was a session performing updates/inserts on the Relationship table in the OperationsManager database. A corresponding trigger, triu_Relationship, fires, causing the blocking. MSFT’s tier 3 SQL confirmed this as well after analyzing PSSDIAG results. What caused the change? Everything seemed to be running fine.

    Analysis led to checking the size of the production RecursiveMembership table; it had 4 M rows. MSFT felt this was an extremely high value. I took a look at staging, and was surprised to find it was also experiencing 2115’s for the same CollectDiscoveryData workflow. You would not have known there was a problem unless you happened to be looking for trouble in the event logs. Staging’s RecursiveMembership table had 1.3 M rows. OK, maybe rowcount is the issue, or perhaps it’s the content in RecursiveMembership that mattered most. I pieced together some queries and deduced that at least 25% of RecursiveMembership pertained to relationships involving elements discovered by Veeam. That seemed disproportionate. Time to test that hypothesis.

    I removed the Veeam MPs from staging and let things settle. The 2115’s stopped. RecursiveMembership rowcount dropped to 1 M rows. This proved the hypothesis, so I then took a closer look at the discovery XML and developed another hypothesis. Veeam v7 builds a very detailed VMware topology in OpsMgr with multiple relationships between objects. Perhaps with > 2,000 VMs spread across 60+ ESX hosts, this built a relationship structure that caused any convergence of discovery data to start the blocking. I then cross-checked the 2115 storm timestamps with modified property report timestamps. IIS discoveries in particular lined up nicely, as they tend to have larger discovery payloads.

    I did not apply this analysis to staging as the Veeam MPs had already been removed, but it’s safe to say the same situation occurred, just on a smaller scale.
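
    The timestamp cross-check described above can be sketched roughly as follows. This is only an illustration of the idea, not the queries actually used: the function name, the ten-minute window, and the sample timestamps are all hypothetical placeholders, and in practice the two lists would come from the event logs and the modified property report.

```python
from datetime import datetime, timedelta


def storms_near_changes(storm_times, change_times, window=timedelta(minutes=10)):
    """Pair each 2115 storm with property changes seen shortly before it.

    A change counts as "near" a storm when it happened at most `window`
    before the storm started. The window size is an assumption.
    """
    matches = []
    for storm in storm_times:
        nearby = [c for c in change_times if timedelta(0) <= storm - c <= window]
        if nearby:
            matches.append((storm, nearby))
    return matches


# Placeholder timestamps for illustration only (not data from the post).
storms = [datetime(2000, 1, 1, 10, 15), datetime(2000, 1, 1, 14, 40)]
changes = [datetime(2000, 1, 1, 10, 9), datetime(2000, 1, 1, 12, 0)]

matched = storms_near_changes(storms, changes)
for storm, nearby in matched:
    print(storm.isoformat(), "preceded by", [c.isoformat() for c in nearby])
```

    Storms that repeatedly pair up with the same discovery’s property changes are the ones worth a closer look.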

    The Proposed Solution:

    Rolling back to 6.5 was an option, but I like to fix things and move forward. My next task was to determine the possibility of eliminating unneeded elements of topology discovery, stop OpsMgr from being hammered, and still meet our requirements:

    Host discovery and some monitoring/data collection
    VM discovery and nothing else

    This required some experimentation in staging; fortunately not too much. Veeam is still reviewing our data, but importing the MPs and disabling these discoveries is working:

    VMGuest to OpsMgr Agent Relationship
    Virtual Switches
    Resource Pools
    Datastores
    Populate VMWare VMs that run OpsMgr agents

    The staging MG RecursiveMembership table increased to 1.1 M rows. There were no CollectDiscoveryData 2115’s after two days. Our requirements for monitoring and inventory were met. Good news!

    I followed the same procedure in production. After removing the MPs, RecursiveMembership dropped to 2.1 M rows. The 2115’s stopped. I waited a few hours. After reimporting with the new changes, RecursiveMembership is 2.4 M rows. No CollectDiscoveryData 2115’s. Result!
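
    For a rough sense of scale, the row counts quoted in this post can be run through some simple arithmetic. The sketch below uses only the rounded figures given above and assumes the whole drop after removing the MPs is attributable to Veeam-discovered objects, so the percentages are approximate.

```python
# RecursiveMembership row counts in millions, as reported in the post.
counts = {
    "staging": {"before": 1.3, "removed": 1.0, "reimported": 1.1},
    "production": {"before": 4.0, "removed": 2.1, "reimported": 2.4},
}


def summarize(c):
    """Derive rough Veeam-attributable row counts from three measurements."""
    veeam_rows = c["before"] - c["removed"]          # rows that vanished with the MPs
    trimmed_rows = c["reimported"] - c["removed"]    # rows added back with discoveries disabled
    return {
        "veeam_rows_m": round(veeam_rows, 2),
        "veeam_share_pct": round(100 * veeam_rows / c["before"], 1),
        "trimmed_rows_m": round(trimmed_rows, 2),
    }


summary = {name: summarize(c) for name, c in counts.items()}
for name, s in summary.items():
    print(name, s)
```

    With these figures, roughly 0.3 M rows (about 23%) in staging and 1.9 M rows (about 48%) in production went away with the MPs, and only about 0.1 M and 0.3 M rows respectively came back after the trimmed reimport.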

    There is one side effect resulting from limiting discovery. I’m certain Veeam will address and correct it. All the management servers report this every twelve minutes:

    Note: None of our management servers are Veeam collectors

    The Windows Event Log Provider is still unable to open the Veeam VMware event log on computer ‘MSSERVER.SOMEDOMAIN.com’. The Provider has been unable to open the Veeam VMware event log for 22320 seconds.

    Most recent error details: The specified channel could not be found. Check channel configuration.

    One or more workflows were affected by this.

    Workflow name: many

    Instance name: many

    Instance ID: many

    Management group: OURMG

    Conclusion:

    If you have a medium to large OpsMgr management group, and your vCenter environment has > 2,000 VMs, you may run into discovery performance issues after discovering the full VMware topology using default settings.

    I have fixed my share of churn over the years, and this was a great puzzle to solve. I need to learn more about the RecursiveMembership and Relationship structures since they play a big part in management group performance.

    • Hi Drew,

      Alec King of Veeam here, AKA “The MP King” 😉 as I run our Management Pack R&D group.

      Thanks for the very interesting and detailed post!

      Discovery churn is something we always seek to minimize in the Veeam MP. I’m diving right now with my team into the root cause of the churn you were seeing, and I’ll post more detail back here soon….

    • Hi Drew,

      A quick update from Veeam R&D as promised!

      Most of the additional rows in the RecursiveMembership table are generated when we create the Containment relationship between a VM (Veeam MP object) and the Ops Mgr agent (Windows Computer object) running inside the VM, if present.

      That relationship is a very useful one, as you can imagine – allowing a link to be shown between the virtual infrastructure and the applications and services that depend on it. We use it in dashboards, reports, groups…etc.

      However we’ve established that the RecursiveMembership table is not populated with just one entry, when we create that single VM-to-Agent relationship. In fact that table populates with an entry referring to each child object under the Windows Computer, and then is multiplied by a factor of the topology depth. On the Veeam (VMware) side this can be pretty deep, as our topo starts at vCenter, then through Datacenter, Cluster, Host….

      As you saw, this generates a huge number of additional rows in this table, and you experienced performance issues with the additional relationships.
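
      The multiplication effect described above can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the actual OpsMgr schema or logic: it treats the table as one row per (ancestor, descendant) pair implied by containment, and the object names and the count of 20 child objects under the Windows Computer are made up for illustration.

```python
def closure_rows(edges):
    """Count (ancestor, descendant) pairs implied by containment edges."""
    children = {}
    nodes = set()
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
        nodes.update((parent, child))

    def descendants(node):
        # Iterative walk down the containment tree from `node`.
        seen = set()
        stack = list(children.get(node, []))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(children.get(current, []))
        return seen

    return sum(len(descendants(n)) for n in nodes)


# Hypothetical VMware-side chain, five levels deep.
chain = ["vCenter", "Datacenter", "Cluster", "Host", "VM"]
vmware_edges = list(zip(chain, chain[1:]))

# Hypothetical Windows Computer with 20 child objects beneath it.
computer_children = [f"child{i}" for i in range(20)]
computer_edges = [("WindowsComputer", c) for c in computer_children]

base = closure_rows(vmware_edges + computer_edges)
linked = closure_rows(vmware_edges + computer_edges + [("VM", "WindowsComputer")])

print("rows without VM-to-agent link:", base)
print("rows with VM-to-agent link:   ", linked)
print("rows added by one relationship:", linked - base)
```

      In this model the single VM-to-computer relationship adds one row for the computer and each of its children, for every level above it in the chain: depth × (1 + children). Across a couple of thousand VMs, that compounds quickly.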

      We believe Microsoft implemented this table to optimize calculation/rendering of topology diagrams – however in a large/deep topology it creates bottlenecks.

      And your issue was exacerbated by some problems in communicating direct with vSphere Hosts to gather CIM (hardware) data – this caused our topology to ‘flap’ and led to repeated discovery update triggers, which made things worse.

      So, we continue to dive in. I’ll probably reach out to you direct via our Support org, so we can discuss in detail. Apologies that you experienced this issue – but thanks for your patience and detailed research, and we can already see here at Veeam how we will solve this!

    • This issue has been fixed, going to test ASAP. Below is from the readme for the just released R2.

      Topology views optimized and new diagram dashboards added

      Veeam MP 7.0 for System Center built a very detailed and deep VMware and/or Hyper-V topology which extended from clusters, through the physical host servers, to the virtual machines, and even included the Ops Mgr agents running inside the virtual machines (if present). In large environments, the depth of this topology could be an issue for Ops Mgr to maintain and could cause SQL performance issues (specifically in the RecursiveMembership table), including problems with insertion/update of discovery data.

      In Veeam Management Pack 7.0 R2, the relationship between a VM and the Ops Mgr Agent (Windows Computer object), was replaced by discovering a relationship between a VM and the specific Veeam MP object “Ops Mgr Agent in VM” which is discovered inside each Windows OS. Because this object (unlike Windows Computer object) does not have any child topology objects, the overall Veeam topology depth and total number of contained objects is greatly reduced, which addresses the SQL performance issue.

    • Thanks Drew! I was planning to post a notification here for our R2 release – but you beat me to it 😉

      I believe we have addressed the performance issue you found – and without losing any functionality. In fact, the new in-context Diagram Dashboards have added new capabilities – you can now browse from the VM “down” the hierarchy (into the OM agent) and also “up” the hierarchy (into the Host for this specific VM).

      Thanks again for your initial research into this issue – looking forward to your feedback!

      Cheers,

      Alec

  • I mistakenly started this as a blog posting, which has been deleted….

    OpsMgr 2012 R2 UR2 Operator Console running on any platform (Windows 8.x, Server 2008 R2/Server 2012)

    You create a dashboard view in the console comprising multiple PowerShell Grid Widgets. You successfully build each widget and see the results display, so the code’s f…

  • Hello folks,

    Maybe I’m missing it, but cannot find this in either cmdlets or browsing the assemblies in VS. Closest I’ve found is this:

    IsNullPropertySkipped under

    Microsoft.EnterpriseManagement.ServiceManager.Connectors.OpsMgr

    And nothing here

    Microsoft.EnterpriseManagement.ServiceManager.Sdk.Connectors.Connector

    Thanks much–Drew

     

  • FYI my friends:

    Be aware that the gateway update does not include the agent hotfixes; you must manually copy them to each gateway. Very annoying in enterprise deployments like ours. The management server update does include the agent hotfixes.

    I’ve updated three environments so far, and none of them pushed the agent hotfixes when updating via…

  • Thank you for white listing me, SCC admin!

    Yep, location changed, and upgrading does two things:

    Leaves behind pieces of the old dir path that need to be manually deleted
    Overwrites configservice.config, so pre-R2 changes are lost. I did not see any changes from the SP1 defaults, so I used my handy dandy PS script to reset them with the…

  • Drew replied to the topic Images in Forum Posts in the forum Avicode 4 years, 10 months ago

    I get the same upload error using IE 10. Made the image an attachment for now.

  • Hey Curt, here’s how we did it for the physical SQL boxes hosting SCCM databases:

    Each server has two processor sockets, which matched NUMA count = 2 (from the query). PowerShell returned two rows with 6 each. […]

  • Pete, etc. thank you all for taking the time to make those Dr. Gonzo blog posts coherent. Often I write them late night as I transition from one troubleshooting/scripting deep dive to another.

  • I’m not a DBA, but it seems logical MAXDOP should be optimized for any SQL instance. The moment we saw the transformation with OpsMgr and SCCM, I adjusted every SQL instance (even the sandboxes) I am responsible for.

    The DBAs seem sold too, and I’ve written some PowerShell for them to use with xp_cmdshell in their install image.

  • UPDATE—

    I chose to update this posting versus blogging about a new topic. We have found that correctly setting MAXDOP in SQL has transformed our System Center environment. There are a number of opinions out there on how to set this. I’ll use our VM SQL 2012 servers as an example; the same 2012 servers I reference in this posting.

    Number of…

  • Closed the book on an odd one last night. Most of you will never see this, but if you do, here’s the solution. Many thanks to Sergey in the APM section of the System Center forums!

    The Problem:

    After […]

  • Appreciate that, TG. One of these days I will get the formatting down so the lunatic ravings at least ‘read better’. Perhaps I should just attach the Word doc versus copy/paste.

  • Disclaimer: I am not a DBA, nor do I play one on TV. In June, I spent two weeks working with Microsoft support to unclog our newly migrated OpsMgr 2012 SP1 production management group. We went from an all physical […]

  • Yep, partial sync. I just tacked up ServMan (2012 SP1 and UR2) for an internal POC. In OpsMgr we have a custom sealed MP with one class, deriving from Windows Computer. It discovers unique properties that identify applications (custom registry entries), and in OpsMgr we use it heavily for building computer groups.

    In order to import this MP into…

  • Quick update, folks. This may be of interest to those who experience intermittent trouble with the SDK.

    We’re on OpsMgr 2012 SP1 and UR3, with two (RMSE and web console) of the six management servers load balancing SDK connections via an F5. The remaining four MS handle agents. All MS remain in the default resource pools, we just chose to split…
