May 13
2013

Subtle Benefits of Virtualization

Posted by Solutions Architects in VMware , VM , vCOps , vCenter , Template Standardization , Policy Compliance , Performance Triage , Memory Savings , Hardware Upgrades , Benefits of Virtualization

By David Klee (@kleegeek)

Earlier this week, a blog post by guest blogger Christian Wickham was posted at VMware.com entitled: “Hidden benefits of virtualisation – reboot time and the impact on server availability and regular operations”. In this blog, Christian discussed a hidden benefit of virtualization – the positive impact to business on the shorter reboot times of servers. This post hit the proverbial nail on the head, but after taking the time to digest it, a few more subtle benefits of virtualization popped into my head. Some of these might be even more important to you than others, but all are greatly beneficial to businesses everywhere.

These benefits include:

  • Memory savings through Transparent Page Sharing (TPS)
  • Quick and simple hardware upgrades
  • Quicker infrastructure performance triage
  • Policy compliance through template standardization

Memory Savings

VMware vSphere has the ability to detect duplicate memory blocks on a host and deduplicate them – lending to reduced physical memory consumption on a host.

Shared_Storage

The amount of memory block deduplication potential grows if you run multiple VMs with the same operating system or programs on the same host. As you can see in the screen shot above, the amount of memory that can be effectively shared between virtual machines on a host can be quite substantial. This savings, in turn, can yield a greater VM consolidation ratio, which saves you more money because you have less hardware to license. Saving money is always good!

Simple Hardware Upgrades

Think about how much time over the last couple of decades you have spent doing server upgrades – not because of the software, but because of the hardware. The operating system and core software was probably just fine, but since the hardware was aging out, you had to put together a long-winded project. Downtime was planned. Late weekend nights were usually required to make it work with the least impact to the business.

Nowadays, thanks to virtualization, this is a complete nonissue. Hardware upgrades are a drag and drop event that can be performed whenever you wish, all with no interruption to service. Just add the new hardware to the virtual cluster and migrate your virtual machines over. Remove the old hardware from the cluster and you’re done!

Quicker Performance Triage

VMware vCenter builds performance statistic histories for you over time. vCenter Operations Management (including the free vCOps Foundation edition) builds a historical performance baseline automatically. If you suspect something in the environment is not healthy, a quick glance at the vCOps dashboard can help you see what is abnormal in the environment.

vCOps
(Image source: http://www.vmware.com/products/datacenter-virtualization/vcenter-operations-manager/overview.html)

Even just the historical views within vCenter can be enough to help you start down the right path of performance troubleshooting and remediation by showing you the performance metrics that matter.

Performacne_Metrics

Policy Compliance

Last, but certainly not least, comes the means to ensure server policy compliance, thanks to the standardization that comes from using virtual machine templates. Pre-load all of your best practices, policies, and organizational requirements into that VM template. At the deploy time, all of these normally time-consuming requirements become part of the deployed based VM. It is a gold standard, and the potential for human error or inconsistencies because of things like hardware differences or a different installer are now completely removed.

Hopefully these subtle benefits can help you as you work towards total virtualization of your datacenter. What less mentioned and more commonly overlooked benefits can you think of? Tweet your answers to @kleegeek.

Apr 26
2013

VMware Virtualization - RDM to VMDK Migration

Posted by Solutions Architects in VMware , vMotion , VMDK , SQL Server , RDM-V , RDM-P , RDM , Migration , Fusion-io , ESXi 5.0

By David Klee (@kleegeek)

A fun customer of ours asked me today if you could convert an RDM directly into a VMDK. They had a SQL Server instance residing on a Fusion-io SSD card through a pass-through Raw Device Map (RDM) on a VMware ESXi 5.0 host. They wanted to migrate the SQL Server to the SSD-based area on their SAN so that the VM could benefit from VMware HA between ESXi hosts in the vSphere cluster rather than pinned to a single compute node.

At first, I did not think this was directly possible. However, we found a VMware knowledge base article (1005241) that says you can do it. The methodology behind it makes perfect sense.

The method behind the migration depends on your current RDM configuration. Two kinds of RDMs exist – RDM-Physical (RDM-P) and RDM-Virtual (RDM-V). As my wizard-like colleague Jim Hannan (@HoBHannan) noted in his recent blog post ‘VMFS vs. RDM’, RDM-V specifies full virtualization interception of the SCSI commands to the storage device, and snapshots can be taken. With RDM-P, the SCSI commands are directly passed-through to the storage. VMware-level operations are very limited, but full SAN tooling exists.

Raw Device Map – Virtual Options

If your storage pointers are configured in the RDM-V manner, your options are better than RDM-P. You have to methods – cold migration and Storage vMotion (if the licensing level allows for it).

First, if you have the option for a period of downtime, the simplest method is to gracefully shut down the VM and perform a cold migration. You don’t even have to migrate between hosts. The migration process will create a new virtual disk and will clone the RDM into the virtual disk.

The original mapped LUN is left alone after the migration completes, and can be removed as necessary to reclaim space.

You can also use a Storage vMotion operation to perform the same migration task.

If you are moving SQL Server or Oracle data from RDM to VMDK, please use the advanced tab in the migration wizard to specify ‘Thick Provisioned Eager Zeroed’ as the virtual disk format. It ensures that the entire space required for the database objects is contiguous on the storage, yielding optimal storage performance.

Raw Device Map – Physical Options

If you have the RDM configured as RDM-P, you cannot migrate the storage using a Storage vMotion. Cold migrations are still fully supported.

The end result is that you can successfully (and simply, to boot!) migrate from RDMs to VMDKs using fully supported methods.

Apr 19
2013

Oracle Benchmarking and Tier 1 Readiness

Posted by Solutions Architects in VMware , Tier 1 , Swingbench , Storage , SQL Server , SLOB , OS Metrics , Oracle RAT , Oracle Orion , Oracle , NMON , Metrics , iperf , IOMeter , ddm , CPU , Benchmarking , Baselines , AWR Reports

By Jim Hannan (@HoBHannan)
I strongly believe one of the most beneficial exercises an IT staff can undergo is benchmarking. There are truly some substantial benefits to benchmarking:

  • Building of baselines
  • Establishing key metrics
  • Communicating the key metrics across the organization
  • Identifying scalability
  • Testing for Tier 1 readiness
  • Validating the infrastructure (network, storage, CPU, software, drivers, and configurations)

I think many organizations agree with this concept, but benchmarking can be resource intensive and inaccurate if done improperly. This tends to deter organizations from doing this kind of work.

Because I am a consultant, I have the benefit of working with many different organizations. I get to see what works and what does not. In fact, I spend quite of bit of time with customers doing benchmarking and Tier 1 readiness testing. In this blog I will discuss the various methods and present what works and what does not.

 

stopwatch

Why benchmark? Benchmarking provides a measurement of your hardware and scalability. Often during benchmarking, customers expose weaknesses or misconfiguration. Additionally, benchmarking establishes baselines for future benchmarking and performance troubleshooting. But most importantly, benchmarking tests the platform for Tier 1 readiness.

I think it is important to clarify a few things before continuing on. What does it mean to baseline? A baseline establishes a performance profile when things are running well. The baseline can record things like TPS (transactions per second), I/O peaks, CPU usage, maximum I/O throughput, and many other performance indicators. The baselines can come from tools like AWR reports, NMON statistics (OS metrics), storage tools, and user response time.

What does Tier 1 readiness mean? Tier 1 readiness is simply a key set of benchmarks designed to validate and record the performance of a platform (hardware and software) in its ability to service the workloads.

There are several different approaches to benchmarking. In the table below we categorized the different approaches and identified the level-of-effort for each category. A more detailed description of each test is below the table.

Test

Category Description

Industry Trends

Load Generation

Category 1 – Validates configurations and hardware. Pass/false results. (Example: Swingbench)

Common, typically short in duration (1 – 2 days).

Tier 1 Readiness

Category 2 – Benchmarking tools (Examples: IOMeter, dd, Oracle Orion, SLOB, iperf)

Common, short execution. (Less than 1 day)

Right-Sizing

Category 3 – Very accurate and low cost. Dependency that existing baselines and metrics exist. (Examples: AWR, perfmon, NMON)

Very accurate, predictable outcome.

Capture and Replay

Category 4 Validation – Recommended for VBCA and first time deployments on new infrastructure. (Example: Oracle RAT)

Highly recommended for first time deployments and VBCA.

Load Generation

Working from the top down, lets look at Load Generation techniques. Load Generation tools like Swingbench and even Spec 2006 (CPU/hardware analysis) have a preset of tests. They may not behave anything like your database or application. They have the benefit of being typically the easiest to setup. They fall short in accuracy of results.  That said, people who are very familiar with Swingbench can accurately measure performance by running tests across many different hardware configurations. The results have a point of reference – meaning that they can compare the previous results to the new results.

Tier 1 Readiness

I should tell you that some of my favorite tools live in this category. These tests tend to be very focused on particular elements like, for example, storage, network, or CPU. I like to tell customers these test expose the tipping points. What I mean by that is, these tests give you the absolute best case for throughput. Another reason that I am a really big proponent of these tests is because they are the most approachable. They do not take a lot of time to setup. They don't need an Oracle installation. In fact, our SQL Server team will use a lot of the same tools.

Right-Sizing

It could be argued that right-sizing does not fit into the benchmarking category, but I would disagree with that argument. It’s a process, and probably the most important of the four. After several edits of this paragraph, I realized something. I was originally going to say that right-sizing is a virtualization only step. That is simply not true. Right sizing is just as important for physical hardware as it is for virtual.

What data do you need for right-sizing? For Oracle, customers are most successful when they have collected OS metrics and AWR reports. The metrics should be collected every day. And metrics should exist as far back as a month ago – or even a year ago.

Why so much data? For your business critical applications, understanding their peaks is important. For example: what does month end look like? Does the application get very busy around Christmas or the beginning of the school year?

Capture and Replay

This is the most accurate of the testing options available. A capture and replay benchmark takes the application code and data and tests it on the new hardware. I have seen all kinds of various sets of tests. Some customers only replay critical or most prevalent SQL – others test all SQL over a defined period.

Summary

I mentioned in the beginning of the blog that I had the luxury of working with many different customers. I get to see what works and, in some cases, what does not work as well. For most customers a combination of the 4 different approaches in the right mix. Understanding when to use the appropriate tool is important. Additionally, the customers that are most successful with benchmarking have adopted their own methodology and processes. This allows the organization to be clear about what the key metrics are and how to run the test consistently each time. This is a deep topic with host of considerations. If you have questions or comments, I can be reached via Twitter @HoBHannan.

Apr 11
2013

VMware Certifications, Q2 2013

Posted by Solutions Architects in VMware Certified Professional , VMware Certifications , VMware , VCP5-DT , VCP5-DCV , VCP4-DCV , VCP-IaaS , VCP-Cloud , VCP , vCloud Director , VCDX5-Cloud , VCAP5-DTD , VCAP5-DTA , VCAP-CID , Datacenter Virtualization , Datacenter Design , Datacenter Administration

By David Klee (@kleegeek)

A few weeks ago I posted a short post on the current SQL Server certification paths from Microsoft. This week I’m following up with current VMware certification paths. The VMware certification paths are rapidly evolving, and VMware’s direction in the enterprise becomes immediately clear when looking through the list.

The original VMware certification started with the VMware Certified Professional back in the day for VMware ESX 1 – way back in 2003.

VMPro

Susan Gudenkauf  is VCP number one and tells her story at a great interview at TrainSignal.com.

Fast forward about ten years, and now four different certification tracks exist from VMware.

  • Cloud
  • Data Center Virtualization
  • End User Computing
  • Cloud Application Platform

While I feel all of these certification tracks are very valuable, I’m a bit less focused on the Cloud Application Platform certifications at this time, as they deal primarily with Spring software development and we do very little application development.  We try to stay ‘below the line’ when it comes to data.

For each of the three tracks that I am focusing on here, multiple certifications exist in a hierarchical progression.

Data Center Virtualization

First, the Data Center virtualization track is the granddaddy of the VMware certifications.

At the heart of the Data Center track is the VMware Certified Professional – Data Center Virtualization (VCP5-DCV – formerly VCP then VCP-DV). This certification is the starting point towards VMware virtualization mastery, and is the cornerstone of a solid VMware infrastructure understanding. This exam is quite challenging, and the candidate should be quite versed in VMware setup, configuration, and troubleshooting.

Pro5

As with other VMware base certifications, obtaining your VCP5-DCV requires that you attend a qualifying VMware-authorized class if you are not upgrading from the VCP4-DCV certification. A list of the qualifying courses can be found here.

Next come the VMware Certified Advanced Professional certifications for Data Center Administration and Data Center Design. Both certifications are great, and I personally feel that obtaining one or both of these certifications demonstrates that a person can properly manage any scale of a VMware infrastructure. I hold the VCAP5-DCD certification, and am looking to obtain the VCAP5-DCA certification by the end of the summer.

pro5dd pro5da

Finally comes the certification that demonstrates absolute mastery of the vSphere stack – VMware Certified Design Expert – Data Center Virtualization (VCDX5-DCV). This is such a high level certification that most people don’t strive to achieve because they simply do not need it. For those that do, they are usually technical architects or consultants who design large and/or complex cloud infrastructures and wish to demonstrate the technical capacity required of their profession. Only a few hundred individuals worldwide have completed the requirements for VCDX.

DE5

I am a glutton for punishment with my nonstop self-education regimen, and hope to achieve VCDX-DCV in the near future. Earning the VCDX-DCV designation is a personal goal that I have had since I learned about the VCP certification many years ago.

Cloud

Next, VMware’s huge push to the cloud is now backed by a number of certifications in the path to product mastery

VCPpath

First, comes the newly released VMware Certified Professional – Cloud (VCP-Cloud). It is a mix of the core vSphere information for traditional virtualization, mixed with a strong dose of subject matter surrounding the VMware vCloud suite of products. I took and passed the VCP-Cloud exam this past December, and it is a tough exam.

procloud

To be eligible to earn the VCP-Cloud designation, you must either already have obtained the VCP5-DCV certification and pass the VMware Certified Professional – Infrastructure as a Service (VCP-IaaS) exam, or take a ‘qualifying’ course from VMware and then pass the VCP-Cloud exam. Appropriate courses include:

These pre-requisite courses can be a barrier to VMware certifications because these classes are not cheap, but my colleagues and I have had very good experiences with these classes. Urge your employers to send you to these classes if you do anything at all with VMware virtualization management.

Additional information on the VCP-Cloud certification, along with learning blueprints and practice questions, can be found here.

Once you have gotten the VCP-Cloud certification under your belt, next in line are three VMware Certified Advanced Professional certifications – Cloud Infrastructure Design (VCAP-CID), Cloud Infrastructure Administration (coming soon), and Cloud Governance (also coming soon). Not much information exists on these certifications except for VCAP-CID because they are so new.

apcid

Finally, the pinnacle of VMware cloud certifications is the VMware Certified Design Expert – Cloud (VCDX5-Cloud). Very few individuals strive to achieve this level of certification, and even fewer make it.

As I grow with my vCloud experience, I am considering working towards the upper certifications in this track.

End User Computing

Last but not least, comes a certification track that VMware is actively building upon – End User Computing. It is centered on the VMware View platform for virtual desktops.

VCPenduser

First comes the VMware Certified Professional – Desktop (VCP5-DT).  Next comes the VMware Certified Advanced Professional – Desktop Administration (VCAP5-DTA – coming soon) and Desktop Design (VCAP5-DTD). Virtual desktops are exploding in adoption, and these certifications are a great way to prove that you are qualified to manage the largest and/or most complex View installations.

Conclusion

In the past, technical certifications were rather polarizing. Quite a few people thought they were irrelevant and unnecessary. HR managers and recruiters thought they were vital for recruiting the ‘right’ staff. Some certifications were (and still are) easy to pass, and do not have much value in the industry. The VMware certifications are far from it. Having passed a number of industry certifications myself, VMware creates some tough exams. The format of the tests is such that I consider them true tests of demonstrating the knowledge required to apply these technological platforms to business challenges. I hold the VMware certifications high, and when I see folks with a VMware certification, I can trust that they know their stuff.

If you work with VMware administration or architecture, I encourage you to explore these certifications, see which ones make sense for you, and go after them!

Apr 05
2013

IOMeter: Simple and Effective

Posted by Solutions Architects in vSphere , IOMeter , IO testing , Benchmarking

By Jim Hannan (@HoBHannan)
I was working on some documentation for a client and started thinking back on how long we have been using IOMeter — my best guess is since 2007. It is a great tool. You could argue that there are newer, better suited tools for database I/O testing available like:

  • SLOB
  • Oracle ORION
  • SQLIO

Nevertheless, I continue to use and recommend IOMeter. I think it is because of its approachability and simplicity. Is this blog I will cover the usage, history, and how to configure and run IOMeter.

The History of the Open Unofficial Storage Thread

Back in March 2009, a group of vSphere administrators got together and created a standard set of test to run on their storage unit. The thread is still active (although you are redirect to a new thread branch) here.

The purpose of the thread was to compare and test storage arrays before virtualizing workloads. The results were uploaded via CSV files or pasted into the forum to compare results against other storage units. It became essentially a database to compare data. The thread lives on today and hundreds maybe thousands of results. You can often find your exact model and compare results. Obviously a model is not the only determining factor in throughput, but it gives administrators a really good way to determine if performance is where it should be.

IOMeter Only on Windows

All of the tests in Open unofficial storage thread are conducted using a tool called IOMeter. IOMeter is licensed under the GPL (GNU Public License) and was written and distributed by Intel back in 1998. The primary application runs on Windows — there is a Linux binary, but it was written for older Linux kernel that can do only synchronous I/O not the faster more efficient asynchronous mechanism of today. For those of you unfamiliar with the difference in the async kernel and sync kernel, it is dramatic. Think of your I/O as water dripping out of a faucet. Now compare it to faucet running full blast. For this reason the only approachable solution is to run it on Windows, which for many customers make it less attractive for I/O testing. That being said, I am still an advocate for running IOMeter in Linux based shops. Remember that you are testing for I/O throughput not the OS. Something else I should tell you: NTFS v5 is faster than Linux ext3.

Since 2007, HoB has run thousands of test using IOMeter for customers and recorded the results. Below I have included the steps for IOMeter setup for Windows 2008.

IOMeter Setup

To install IOMeter, double-click on the executable and follow the prompts. After the installation is finished, click on the Iometer icon to start the GUI and complete the basic configuration.
To run IOMeter on Windows Server 2008

  1. Right-click IOMeter and select Run as administrator to run IOMeter with administrative privileges.
    Fig_18
  2. In the User Account Control window, select Allow.
    Fig_19
  3. After IOMeter starts, a window called C:Program Files(x86)I…opened with Iometer is displayed. This thread drives the I/O and file creation. Do not close it.
    Fig_20
  4. Select Open and go to the OpenPerformanceTest.icf configuration file location. The default configuration file is a standard agreed upon by a group of users who test VMware I/O performance. You can modify the file, but the default configuration file is often a good starting point for testing. The file and user forum are available at http://communities.vmware.com/thread/73745.
    Fig_21
  5. Select Worker 1, and then select the drive. This creates the test output file if it does not already exist. Iometer creates a 4GB test output file by default.
    Fig_22
  6. Under the Access Specifications tab, select a test and click Add. This example selects Max Throughput - 100% Read.
    Fig_23
  7. Under the Test Setup tab, confirm that the settings are correct. The default values might be OK. You can modify the length of the test in the Run Time section.
    Fig_24
  8. For a write test, select the Max Throughput - 100% Read test, select Edit Copy and change the slider Percent Read/Write Distribution to 100% Write.
    For a read test, select Max Throughput - 100% Read. This test usually exposes any problems with the storage configuration, ESX/ESXi host, HBAs, or drivers.
    Fig_25
  9. Under the Results Display tab, change the Update Frequency to 4 and select Last Update. Click Run Test (the green flag icon).

    Fig_26

IOMeter asks for a filename to save the results. The results are written to a .CSV file with the data from the test. After the results file is specified, the test begins.

IOMeter first prepares a file for use by the test. If the defaults are used from the .icf configuration file, the file is 4GB and is named iobw.tst. The first time IOMeter runs it generates this file, so the actual test is delayed until the file is ready. Subsequent tests on the same drive use the existing file, so a new file is not generated.

The iobw.tst file must be manually cleaned up after testing is complete.

Summary

At HoB we really encourage customers to adopt a standard set of benchmarking test. There is so much to gain by doing so. A big part of benchmarking is testing I/O — and IOMeter is a good tool to assist with that. If you have any questions or comments let me know @HoBHannan.

Apr 01
2013

Recording the SQL Server System State

Posted by Solutions Architects in System State , SQL Server 2012 , SQL Server , SLA , RAID-5 , Query , Perfmon , Metrics , Glenn Berry , DBA , Database , Data Collection , Chris Shaw , Checklists , baselining

by David Klee (@kleegeek)

Your database is running slow, and you need to drop every other emergency and fix it right now!”

Do you get nailed with exasperated comments such as this without warning? Don’t you ever wish the person would say “I inserted a billion records into table dbo.XYZ and now the nonclustered index dbo.IX_XYZ_MakeReportingFaster needs to be rebuilt because it’s slowing down reporting”?

That’s always fun (not). Not only are folks in the organization in a panic, but you are left with no useful information to help you start the triage process except for a particular server is running an arbitrary “slow” value. You must start from ground zero on the server in question and work your way into the problem and eventual solution, and that’s time consuming.

What if this same user comes to you tomorrow and says “Sorry about the emergency yesterday… but here’s another question. I have not budgeted for anything for your team for the next couple of years, but when are we going to run out of space on your database servers? I just thought of that today…”

Do you have the system in place that could just pop out the answer?

You should consider automating a system that can help automate some of the collection of common runtime metrics and system states so that you can help yourself determine what is out of the norm during an emergency, as well as provide for a long-term capacity management baseline for all key metrics on your servers.

Recently, I was reading one of Chris Shaw’s (B, L, T) fantastic chapters on the utility database from the new book Pro SQL Server 2012 Practices and thought that I would share some of my practices that can help backup some of his recommendations with some more practical examples.

Checklists and Data Collection

First, reference a previous blog post of mine where I outline all of the usual tasks that I perform during normal day-to-day maintenance of these servers. Now, I’ll show you how to create a system that can assist in automating these processes!

Let’s take a pretty routine task – checking database file sizes.

To keep things simple, I am adapting query number 17 from Glenn Berry’s SQL Server 2012 diagnostic queries, January 2013. You can see more fantastic queries for these types of purposes in those scripts.

Create a placeholder for the data.

CREATE TABLE [dbo].[FileSize](

[ServerName] [nvarchar](128) NULL,

[DatabaseName] [nvarchar](128) NULL,

[FileID] [int] NOT NULL,

[FileName] [sysname] NOT NULL,

[PhysicalName] [nvarchar](260) NOT NULL,

[TypeDesc] [nvarchar](60) NULL,

[StateDesc] [nvarchar](60) NULL,

[MB] [bigint] NULL,

[SampleDT] [datetime] NOT NULL

) ON [PRIMARY]

To start populating this table, you could create a job to periodically execute the following query and store the results. I normally sample database file sizes once a week unless that environment has a high number of transactions and frequent file growth.

-- Adapted from Glenn Berry SQL Server 2012 Diagnostic Queries, January 2013

-- SQLServerPerformance.wordpress.com

-- File Names and Paths for TempDB and all user databases in instance (Query 17)

INSERT INTO dbo.FileSize

SELECT

@@ServerName as ServerName,

DB_NAME([database_id])AS [DatabaseName],

[file_id] as FileID,

name as FileName,

physical_name as PhysicalName,

type_desc as TypeDesc,

state_desc as StateDesc,

CONVERT( bigint, size/128.0) AS MB,

GETDATE() as SampleDT

FROM sys.master_files WITH (NOLOCK)

WHERE [database_id] > 4

AND [database_id] <> 32767

OR [database_id] = 2

ORDER BY DB_NAME([database_id]) OPTION (RECOMPILE);

The following query can present to you the previous month’s samples, and calculate the amount of growth per file over that month.

with cteFileSizes

(ServerName, DatabaseName, FileID, FileName, SampleDT, MB, WeeksPrevious)

as (

select

ServerName, DatabaseName, FileID, FileName, SampleDT, MB,

DENSE_RANK() OVER (ORDER by SampleDT DESC) 'WeeksPrevious'

from

dbo.FileSize

)

select

w1.ServerName, w1.DatabaseName, w1.FileID, w1.FileName,

w1.MB - w5.MB as OneMonthGrowthMB,

w1.MB as Week0MB, w2.MB as WeekMinus1MB, w3.MB as WeekMinus2MB,

w4.MB as WeekMinus3MB, w5.MB as WeekMinus4MB,

w1.SampleDT as Week0DT, w2.SampleDT as WeekMinus1DT,

w3.SampleDT as WeekMinus2DT,

w4.SampleDT as WeekMinus3DT, w5.SampleDT as WeekMinus4DT

from

cteFileSizes w1

left join cteFileSizes w2 on w1.ServerName = w2.ServerName

and w1.DatabaseName = w2.DatabaseName and w1.FileID = w2.FileID

left join cteFileSizes w3 on w1.ServerName = w3.ServerName

and w1.DatabaseName = w3.DatabaseName and w1.FileID = w3.FileID

left join cteFileSizes w4 on w1.ServerName = w4.ServerName

and w1.DatabaseName = w4.DatabaseName and w1.FileID = w4.FileID

left join cteFileSizes w5 on w1.ServerName = w5.ServerName

and w1.DatabaseName = w5.DatabaseName and w1.FileID = w5.FileID

where

w1.WeeksPrevious = 1

and w2.WeeksPrevious = 2

and w3.WeeksPrevious = 3

and w4.WeeksPrevious = 4

and w5.WeeksPrevious = 5

order by

ServerName, DatabaseName, FileID

Voila! You now have a quick report that you can whip up in SSRS, schedule it to automatically deliver, and include it in your weekly routine. The sky is the limit with the items that you can record and analyze. I always say more data is better than less, so monitor and record anything you can possibly consider valuable.

Perfmon

Next, every server that I manage has Perfmon running in the background at all times, constantly sampling performance data. Perfmon data is always good to have, because not all system activity comes from SQL Server. Other items, such as system backups, antivirus scans, and other programs running in the background can have a negative effect on performance. Understanding what a system was doing at the time of a problem is one of the crucial components that can make or break a triage investigation.

Here’s an example of this in action. A while back I was at a customer site and they were having random SQL Server database mirror failovers in the middle of the night on a couple of older database systems. We did not have much to go on with the logging available at the time. I set up Perfmon to capture data every five minutes. A few days later, we experienced an unplanned failover. Sifting through the Perfmon data, we discovered that disk write activity onto RAID-5 set of local disks were high an hour before the event, and then went through the roof a few moments before the unplanned failover. We were then able to correlate the activity to a couple of mis-timed jobs. A database-level backup was overlapping into the sporadic runtimes of a system-level backup, and the onboard SAS controller was being periodically overwhelmed and went unresponsive while the cache flushed to disk.

We set up Perfmon to collect a number of important counters every five minutes, and store the data to a log file that is date-stamped and rotated every night. The counters that we usually start with include the following items.

Object Name

Counter Name

PhysicalDisk

Current Disk Queue Length

PhysicalDisk

Average Disk Read Queue Length

PhysicalDisk

Average Disk Write Queue Length

PhysicalDisk

Average Disk sec/Read

PhysicalDisk

Average Disk sec/Write

PhysicalDisk

Average Disk Bytes/Read

PhysicalDisk

Average Disk Bytes/Write

PhysicalDisk

Disk Read Bytes / sec

PhysicalDisk

Disk Write Bytes / sec

Memory

Page Faults/sec

Memory

Pages / sec

Memory

Available Mbytes

Paging File

% Usage

Processor

%User Time

Processor

%Privileged Time

Processor

%Processor Time

Processor

Interrupts / sec

System

Processor Queue Length

SQLServer:Access Methods

Forwarded Records / sec

SQLServer:Access Methods

Full scans / sec

SQLServer:Access Methods

Page splits / sec

SQLServer:Memory Manager

Memory Grants Pending

SQLServer:Buffer Manager

Buffer Cache Hit Ratio

SQLServer:Buffer Manager

Checkpoints / sec

SQLServer:Buffer Manager

Lazy Writes / sec

SQLServer:Buffer Manager

Page Life Expenctancy

SQLServer:Buffer Manager

Readahead pages / sec

SQLServer:Databases

Transactions/sec

SQLServer:General Statistics

User Connections

SQLServer:Latches

Average Latch wait Time

SQLServer:Locks

Average Wait Time (ms)

SQLServer:Locks

Lock Wait time (ms)

SQLServer:Locks

Lock waits / sec

SQLServer:SQL Statistics

SQL Compilations / sec

SQLServer:SQL Statistics

SQL Re-Compilations / sec

SQLServer:SQL Statistics

Batch Requests / sec

You want to remember to collect all instances of these items, not the cumulative rollups that can wash out key information.

Infrastructure Statistics

Finally, I prefer to record all statistics underneath the SQL Servers and the operating systems. This includes items like SAN performance, VMware or Hyper-V performance statistics for both the VM itself and the hosts it resides on, and even down to networking activity.

Why would I suggest these items? Ponder this situation. Your organization has a SAN with 50 servers connected. A system administrator misconfigures an antivirus scan setting and accidentally triggers a full scan on all 50 of those servers at the same time.

Your SQL Servers now grind to a crawl. Someone runs into your office and demands that you investigate why the database server is performance poorly (it’s always the DBA’s fault, right?). The SQL Server and Windows performance data stats now record a burst of suddenly high disk latency and reduced throughput. You look at the vCenter statistics and find that most of the virtual machines are strangely maxed out on CPU utilization. Outside of those metrics, you do not know what is occurring, but you now have a great set of metrics to go to the storage group and ask them to investigate further.

Quite frequently, DBAs just do not have access to these items. However, you can task the different administrators for these systems to setup automatic reports to be routinely delivered to you. Find a way to get these statistics, because after all, data is the most important part of the business, and you should be aware of how the infrastructure is performing.

Wrap Up

Maintain your SLAs by baselining the environment and being proactive on resource contention. Use this information to help triage pain points in the infrastructure, and ensure that your systems are running at their peak performance!

What do you normally do on a routine basis that can help you capture the state of your systems? I’d love to hear your practices and experiences!

Mar 15
2013

Routine SQL Server DBA Tasks

Posted by Solutions Architects in Windows , SQL Server DBA , SQL Server , Routine , Management , Database Management , Database , Daily Tasks

By David Klee (@kleegeek)

Every once in a while I get asked about checklists and routine tasks that DBAs should be performing. I have a set routine for the various activities around day to day management.

First, I have a routine for the servers that I manage. There are tasks that I perform on a set time period, and just about all of these items are collected automatically and then a report delivery mechanism (usually SSRS) has my daily/weekly/etc. report in my inbox when I arrive at work. I review the reports over my morning cup of tea and see if I can catch anything that might normally catch me off guard later in the day or week.

Daily Checklist

  • Check the previous night’s SQL Server database and transaction log backups and SQL Server Agent jobs for errors.
  • Check all databases to make sure all are up and not marked as suspect.
  • Check previous night’s DBCC CHECKDB for errors.
  • Check SQL Server and Windows application and system event log entries for warnings and errors and determine if any entries warrant further investigation.
  • Check mirroring status for all databases being mirrored.
  • Check for service status for all Windows services that are required for operation (i.e. Windows Full-Text Indexing, Search Server, etc.).
  • Look for any security policy violations.
  • Look for resources on the server, such as file sizes and disk space, and audit growth for long-term projections.
  • Check system performance levels against established baselines. Use long-running queries or tasks, Perfmon, etc. to generate the data.
  • Double check that no configuration changes have been made on the server, and if so, document and investigate.
  • Ensure that all data replication tasks are operating normally.
  • Record and compare last day of CPU activity with known baseline.

Weekly Checklist

  • Check for full system backup status.
  • Verify that the MSDB database is being backed up.
  • Verify that index and statistic maintenance has completed.
  • Perform sample restores of database backups on preproduction servers to spot-check backup set integrity.
  • Examine SQL Server wait statistics to see if new ‘pain points’ have become large points of contention within SQL Server.
  • Audit job execution time for dramatic variances against baseline runtimes.
  • Investigate databases for objects that violate established rules.
  • Verify that SQL Server is sending database mail properly.

Monthly Checklist

  • Compare SQL Server run book specifications against current configuration. Update run book if necessary.
  • Perform system maintenance, such as disk defragmentation, Windows updates, SQL Server cumulative updates
  • Verify that data access speed is running at normal levels.
  • Perform full system growth projection update as part of normal capacity management.

Quarterly Checklist

  • Perform a full index analysis for all databases. Determine unused indexes and disable if appropriate. Determine missing indexes and investigate process to update or add indexes to better cover data usage.
  • Review systems for tuning opportunities. Common tuning indicators include buffer cache hit ratio, page life expectancy, I/O and network performance, long running queries, average CPU utilization, etc. Perform tuning as necessary.
  • Perform full test of disaster recovery plan.
  • Perform full system restore against a virtual machine so that all core systems and services can be brought up in isolation and tested for integrity.

 

Mar 01
2013

Oracle Automated Deployments on VMware Part II

Posted by Solutions Architects in VMware , Virtualization , Oracle on VMware , Oracle , DBCA , Automation

by Jim Hannan (@HoBHannan)

DBCA2

In my previous post I discussed tools to automate deployments for virtualized Oracle workloads. I outlined some of the current industry methods and their tools. In this blog post I will take a deeper look at one of those tools, Oracle’s DBCA (Database Configuration Assistant). I cannot advocate for DBCA being the perfect solution for every company, but I think majority of IT teams will find it very well suited for their deployments. Before jumping into DBCA, we should first quickly revisit the table from the last blog post. The table below outlines the opportunities for automation. Each step has various methods available to it. In this blog we are targeting the final step, automated deployment of the Oracle database after the virtual machine and software has been deployed.

Table 1 - Oracle Automated Deployment and Tools

Steps

Tool(s)

Create a virtual machine

[1] Standard – Create VM through vSphere client of APIs (this step involves the process of allocating memory, CPU and storage)

Boot virtual machine

[1] Boot off media

[2] Network boot and receive boot image from network server

[3] Deploy virtual machine from template (template contains base OS with no additional software)

Automated OS install

[1] Linux kickstart

[2] Spacewalk

[3] VMware template with base image

Scripts for customization

[1] OS scripts to further customize the guest OS. Often includes network settings and deployment of organization scripts for managing Oracle.

Oracle software install

[1] Oracle Universal Install with a response file

[2] Zip of Oracle home with CPU patch already applied. Requires the creation of an Oracle inventory

Oracle Database

[1] Use Oracle DBCA command line

[2] Deploy Oracle database from zip files.

Most DBAs are familiar with DBCA and the GUI wizard. Probably less familiar is DBCA's ability to run silently from the command line. The command line options really lend themselves to automation. You can create a database template or a catalog of templates. In the following example I have created the template SAND-forblog.dbc. This is stored with the other default templates. After creating and selecting the configuration you can than override the setting by using DBCA command line options. An example of overriding a parameter is changing the character set or datafile location. From the command line you can type “dbca –help” to get a full list of options. I have listed some of the key options in the table below. Remember this command line options can override settings defined in the template, very flexible.
Creating the database:

[oracle@oel63-64-11g Desktop]$ dbca -silent -createDatabase -templateName SAND-forblog.dbc -gdbname FORBLOG -sid FORBLOG -responseFile NO_VALUE -sysPassword oracle -systemPassword oracle -storageType ASM -asmSysPassword oracle -diskGroupName DATA

Output of the database creation:

Copying database files

1% complete

3% complete

35% complete

Creating and starting Oracle instance

37% complete

42% complete

47% complete

52% complete

53% complete

56% complete

58% complete

Registering database with Oracle Restart

64% complete

Completing Database Creation

68% complete

71% complete

75% complete

85% complete

96% complete

100% complete

Look at the log file "/opt/oracle/cfgtoollogs/dbca/FORBLOG/FORBLOG.log" for further details.

You can delete a database with the command:

[oracle@oel63-64-11g Desktop]$ dbca -silent -deleteDatabase -sourceDB FORBLOG -sysDBAUserName sys -sysDBAPassword oracle

And output of the delete command:

Output

Connecting to database

4% complete

9% complete

14% complete

19% complete

23% complete

28% complete

47% complete

Updating network configuration files

48% complete

52% complete

Deleting instance and datafiles

76% complete

100% complete

Look at the log file "/opt/oracle/cfgtoollogs/dbca/DEV2.log" for further details.

I thought it would be worth mentioning that customers that have successfully implemented automation typically have matured their processes first by building standards. This reminds me of software development. First, build the flow chart (standard process), and then look for automation opportunities.

If you have any questions about DBCA automation or previous topics I can be reached on twitter @HoBHannan.

Feb 22
2013

SQL Server Standardization

Posted by Solutions Architects in VM Template , Virtualization , VBCA , Standardization , SQL Server , SQL Configuration

By David Klee (@kleegeek)

How much time are you wasting every month with your SQL Servers because of nonstandard server configurations? Sit back and add it up.

I’ll pause while you pick your jaw up off the floor.

Every server is different. They have different purposes. They could be from different vendors, from different purchase cycles in different years, and have different operating systems at different patch levels. Even identical hardware can have different BIOS and driver revisions. Windows and SQL Server can be wildly different with patches and Service Packs. How can you keep everything up to date when every configuration is unique?

Do you even try?

How standard are your SQL Server configurations? From an ongoing operations standpoint, as your servers converge towards a common configuration, you save time and reduce risk over the lifecycle of that server.

Want an example? Just look at Southwest Airlines as a prime example of standardization. The airline standardized on Boeing 737 aircraft for their entire fleet. By standardizing on one aircraft, they have one aircraft type to maintain and operate at all of their airports. Other airlines have many more types in operation. The maintenance crews can master just one aircraft, simplifying operations. Spare part stores can be kept at a minimum. You get the picture.

Why do I bring up this topic? Consider asking your server admins and DBAs to determine the number of distinct server configurations in your environment. Now, filter out those systems that require a special configuration (vendor requirements, SQL Server collation settings, security, etc.).

I will guess that this number is a bit shocking.

Now consider the time spent handling patches, firmware and driver updates, and routine maintenance that is tailored to each system. Every different configuration requires special considerations and care to manage properly, and each unique configuration requires its own amount of update rollback planning prior to each maintenance cycle. You DO have rollback plans, don’t you?

These numbers are astounding. This amount of time that you just calculated could be most of someone’s full-time job duties. Or more frequently, it could be the amount of time that pushes an employee into dangerous levels of overtime.

Where does virtualization fit in?

By now, you have read some of my posts and know that I am a huge proponent of virtualization. Virtualization can help you standardize your infrastructure. You can create a master SQL Server virtual machine template on any modern hypervisor, and use it to deploy your entire server infrastructure with the same standardized and pre-approved configuration. Of course, server customizations are normal, but the differences between your templates and your production deployments are few. The benefits of this practice can be tremendous.

Some benefits are obvious.

  • Pre-approved templates mean guaranteed consistency with your server builds. The server was built to your standards going in, and you can guarantee that those standards are maintained with each server deployment.
  • You also dramatically reduce your new server deployment time. Server deployments can now take mere minutes, rather than the weeks or months of a traditional physical server procurement and deployment cycle.
  • You also eliminate the need to constantly monitor and upgrade things such as BIOS and hardware driver revisions. The only component that needs updating is the VMware Tools, and as of vSphere 5.1, those updates do not require a reboot.

Now, some of the benefits of template-based virtualization are much less obvious.

  • Your maintenance and upgrade operation rollback plans drop from terrible, time-consuming processes each and every time you have to update or patch — down to just a few clicks, thanks to VM-level snapshots.
  • The time and resources required to perform the inevitable hardware refresh goes to zero once the VM migration has been completed. Simply hot-migrate (via vMotion or Live Migrate) the VM to the new hypervisor hosts and you are done! This act alone will pay great dividends across the organization if you factor in staff time into the ROI calculations.
  • You can now hot-clone a production server to an isolated network and test various configuration changes (and witness the effects) on a common server configuration, thanks to VM-level cloning operations. This task can reduce the risk of unexpected impacts of configuration changes on those servers with common templates.

Now, how do you handle standardization when your business treats server configurations like it is the Wild West?

  • The business must sign off that these templates will be used across the board, and that one-off configuration changes be documented once they are approved.
  • The business should also standardize on common configurations, policies, pre-installed tools, etc. Some businesses have different organizational units that each have their own server build standards. Physical and virtual server sprawl can turn into template sprawl if not kept in check, and the goal of VM standardization goes right out the window.
  • Audit server configuration changes periodically, and check your audit history for compliance. Anyone sneaking configuration changes onto a server without going through a change control process should be discovered.
  • Patch as needed, but try to keep the deployed servers as close to the same patch level as possible. This keeps the server configuration close, and makes testing patches and updates that much easier.

Keep your server builds as close as possible, and watch the time spent on rote operations drop! Your administrators will thank you for it.

Feb 07
2013

Right Sizing Your Virtual Machines

Posted by Solutions Architects in VMware , VM , vCPU , SQL Server , Right Sizing , Performance

by David Klee (@kleegeek)
You have allocated too many resources to your virtual machines, and now your business-critical server performance is suffering! How can this be? That does not make sense!

In this post, I will demonstrate how allocating too many vCPUs to a virtual machine with a low workload actually hinders performance instead of helps it.

Test Setup

To prove this cause and effect, we used a dedicated HP DL580 G7 server with four 10-core Intel Xeon E7-4850 CPUs at 2.0GHz per core and 512GB of RAM. An EMC DMX4 SAN was used for the storage underneath the virtual machines. VMware vSphere 5.0 Update 1 was used as the host hypervisor. One virtual machine was created, and was the only VM running on this host. The VM was configured with two vSockets and four vCPUs per socket, as well as 128GB of vRAM. SQL Server 2008R2 was installed on the VM and configured with all of our best practices.

Dell’s freely available DVDstore (a database benchmarking tool) was used to generate a synthetic workload against our SQL Server testbed.

A 50GB workload was generated and loaded into a new SQL Server database. A DVDstore workload test was performed for one hour. The vCPUs were then changed from 8 to 32 in a 4x8 configuration. The database was restored and the test rerun. The output from each test is in the form of ‘Orders Placed per Minute.’ For each test, the maximum degree of parallelism for the SQL Server instance, or MaxDOP, was adjusted from one to six (a requirement from the project).

Test Results

Threads

MaxDOP

8 vCPUs

32 vCPUs

2

1

19277

13589

2

2

19251

17858

2

3

18841

17453

2

4

15839

15640

2

5

15953

15779

2

6

16263

16055

8

1

76590

63910

8

2

76592

70705

8

3

75441

69335

8

4

57508

61412

8

5

55021

61579

8

6

56859

61151

16

1

152782

135484

16

2

151462

140577

16

3

147618

136376

16

4

86078

112365

16

5

81383

106862

16

6

84634

101230

32

1

298444

274629

32

2

291692

278024

32

3

280824

272659

32

4

108952

147444

32

5

102808

133270

32

6

106140

124293

64

1

487146

542351

64

2

429131

532679

64

3

368718

515461

64

4

113664

153877

64

5

117480

136862

64

6

0

127634

100

1

0

0

100

2

375301

539928

100

3

337446

480744

100

4

0

150850

100

5

0

132887

100

6

0

127498

The results are pretty clear. At a low volume of work, the SQL Server instance performs slower with more vCPUs assigned to the virtual machine. As the volume of work grows, the 32 vCPU VM eventually overtakes the 8 vCPU VM in performance.

Why?

The answer lies in the overhead of vCPU scheduling at the hypervisor layer. All vCPU activity is scheduled into a runnable queue, even if a vCPU is almost idle. You can see this measured indirectly via the vCPU Ready VMware performance counter. As the vCPU count goes up, the hypervisor schedules all activity in this queue. If some vCPUs are almost idle, they still have to get scheduled to run.

However, the priority of the request can decrease if VMware determines that a vCPU is idle, and the overhead of these tasks and queues has a cumulative effect. Now, if all vCPUs are busy, priority is given and these effects become negligible – for this VM. The effects of becoming deprioritized in the runnable CPU queue can potentially be felt by other VMs on the same host, however, so keep this in mind and constantly monitor CPU Ready times of all of your mission-critical virtual machines.

Therefore, baseline and benchmark your workload and determine the actual resource consumption of your workloads. Allocate your VM resources appropriately, and you might just see a noticeable jump in performance!

Note: This blog post was taken from an earlier engagement. For the full case study based on that engegement, please refer to this blog post: "SQL Server Performance on Itanium vs. x86 on VMware: A Case Study".

<< Start < Prev 1 2 3 4 5 Next > End >>