Sometimes you have a general feeling that something is wrong with the infrastructure, and by looking around you catch one symptom after another until you can put together an overall picture of the problem. That is what happened recently with one of my customers, in a case that was resolved together with Microsoft Premier Support. I found it interesting enough to share with you. Here are all the symptoms observed before pinning the problem down, in more or less chronological order:
1. Groups created in SCOM were not available for selection in reports. They appeared in the console, but not in the Reporting part of SCOM (which suggests problems with processing data from the Ops DB to the Data Warehouse DB).
2. A large amount of data was stored in the staging area of the Data Warehouse DB. Running the following T-SQL queries revealed hundreds of thousands of rows in the Alert and State parts of the staging area:
SELECT COUNT(*) FROM Alert.AlertStage;
SELECT COUNT(*) FROM Event.EventStage;
SELECT COUNT(*) FROM Perf.PerformanceStage;
SELECT COUNT(*) FROM State.StateStage;
3. Data Warehouse Data Collection State errors showed up in Health Explorer for the Management Servers themselves.
4. A large number of 31551 events appeared in the Operations Manager event log, reporting failures while storing data in the Data Warehouse. They look similar to the following event:
Log Name: Operations Manager
Source: Health Service Modules
Date: 27/01/2013 22:00:15
Event ID: 31551
Task Category: Data Warehouse
Level: Error
Keywords: Classic
User: N/A
Computer: XXX
Description:
Failed to store data in the Data Warehouse. The operation will be retried.
Exception 'SqlException': Management Group with id
'VVVVVVVV-VVVV-VVVV-VVVV-VVVVVVVVVVVV' is not allowed to access Data
Warehouse under login 'YYY\WRITER'
One or more workflows were affected by this.
Workflow name: Microsoft.SystemCenter.DataWarehouse.CollectPerformanceData
Instance name: XXX
Instance ID: {WWWWWWWW-WWWW-WWWW-WWWW-WWWWWWWWWWWW}
Management group: ZZZ
Reason:
It turned out that we had hit an issue that Microsoft acknowledged to be a kind of bug, one that seems to occur randomly in different environments. On rare occasions, the default configuration of the SCOM Run As accounts for the Data Warehouse, created during installation of the SCOM servers, can disappear from the Run As profiles configuration. Unfortunately, Microsoft has not yet identified the root cause of this behavior.
Resolution:
To resolve the problem, you have to re-enter the missing settings. The screenshot below shows properly configured Data Warehouse Account and Data Warehouse Report Deployment Account Run As profiles:
Data Warehouse Run As Profiles default configuration
After re-introducing the configuration everything should get back to normal.
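One way to check that things really are recovering is to re-run the staging counts from symptom 2 a few times and watch whether they fall. Below is a minimal sketch that combines the four counts into a single result set. The table names come from the queries above; the column aliases are my own, the database name mentioned in the comment is only the common default, and treating shrinking counts as a sign of recovery is my assumption rather than something Microsoft confirmed.

```sql
-- Run against the Data Warehouse database (often named OperationsManagerDW;
-- that name is an assumption, adjust it to your environment).
-- Returns one row per staging table with the number of rows still waiting.
SELECT 'Alert.AlertStage' AS StagingTable, COUNT(*) AS RowsWaiting FROM Alert.AlertStage
UNION ALL
SELECT 'Event.EventStage', COUNT(*) FROM Event.EventStage
UNION ALL
SELECT 'Perf.PerformanceStage', COUNT(*) FROM Perf.PerformanceStage
UNION ALL
SELECT 'State.StateStage', COUNT(*) FROM State.StateStage;
```

If the numbers keep growing instead of falling, it is worth checking the Operations Manager event log again for new 31551 events.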