jump to navigation

SQL Server 2005 I/O issues – Part 2 November 4, 2008

Posted by msrviking in Performance tuning.
trackback

As a continuation of my earlier post SQL Server 2005 I/O issues – Part 1 here is what had happened after my initial analysis. With more indicators from the system administrators, my apprehensions started coming true – that there is a disk IO issue in the system.

To satisfy my curiosity and to strengthen my apprehensions into facts, I went ahead and configured performance counters,

PhysicalDisk Object:

1. Avg. Disk Queue Length

2. Avg. Disk Sec/Read

3. Avg. Disk Sec/Write

4. %Disk Time

5. Avg. Disk Reads/Sec

6. Avg. Disk Writes/Sec

Database:

1. Log Bytes Flushed/sec

2. Log Flush Waits/sec

And few others like User connections and Processor time for a 24 hour period spanning the off-peak and peak hours of the application usage, the counters being polled every 30 seconds.

After gathering the results from the .csv log, I picked the data for key counters Avg Disk Read (ms), Avg Disk Write (ms), % Disk Time, Log Bytes Flushed/sec, Log Flush Waits/sec for the period of 8 AM – 5.30 PM knowing that the application would have got the hardest hit during these hours.

Now I have the data ready, I decided to bring up some graphs so that my analysis becomes easier, and conclusions will be quicker. Here is what the analysis (using DMV’s and graphs) talks about

<Start of Report>

DMV results:

Percentage of I/O for each database

Database Name

IO in MB

% age

Tempdb

109986

63.79

NGDemo

51205

29.7

Msdb

2018

1.17

Master

31

0.02

Model

8

0

Database and IO Stall

Database Name

IO in MB

io_stall_s

io_stall_pct

tempdb

109986.17

40046.95

50.94

NGDemo

51205.22

19398.37

24.68

msdb

2018.13

7711.44

9.81

master

31.86

52.67

0.07

model

8.05

2.61

0

Although there are no threshold values defined for any system for the counters % age, io_stall_s or io_stall_pct but any high numbers are the pointers for us to investigate. Here io_stall means “Total time, in milliseconds, waiting for the IO to be completed.” and for purpose of calculations and reports the io_stall value is converted into seconds.

There is benchmark number defined for the above counters and here is the quick gist of these,

Avg. Disk Sec/Read and Avg. Disk Sec/Write: The average time, in seconds, of a read of data from the disk. Use the following to analyze numbers in the output.

·         Less than 10 milliseconds (ms) = very good

·         Between 10-20 ms = okay

·         Between 20-50 ms = slow, needs attention

·         Greater than 50 ms = serious IO bottleneck

Physical Disk: %Disk Time: The percentage of elapsed time that the selected disk drive was busy servicing read or writes requests. A general guideline is that if this value > 50 percent, there is an I/O bottleneck.

Database: Log Bytes Flushed/sec: The total number of log bytes flushed. A large value indicates heavy log activity in tempdb.

Database:Log Flush Waits/sec: The number of commits that are waiting on log flush. Although transactions do not wait for the log to be flushed in tempdb, a high number in this performance counter indicates and I/O bottleneck in the disk(s) associated with the log.

The graphs talk about what is happening in the disk of the system and tempdb, which lead me to conclusions and action points.

Graph 1: For better performance, the values of Avg Disk Read time in ms should be less than 20. But the results show that we have average read time between 20 -50 ms and sometimes above 50 ms which means that we need to pay attention to IO and tempdb performance.

Avg Disk Read Time (ms)

Avg Disk Read Time (ms)

Graph 2: For better performance, the values of Avg Disk Write time in ms should be less than 20. But the results show that we have average read time between 20 -50 ms and sometimes above 50 ms which means that we need to pay attention to IO and tempdb performance.

Avg Disk Write Time (ms)

Avg Disk Write Time (ms)

Graph 3: For better performance the % Disk time should be less than 50. The below results show that we have touched /crossed 50, only 2% of the duration which isn’t alarming.

% Disk Time

% Disk Time

Graph 4: For better performance the Log Flush Waits/Sec should be always as low as possible, and per below results we are not having issues with transaction log of tempdb. Moreover, the Log Bytes Flushed/Sec shows high tempdb log activity with bottleneck at transaction log of tempdb.

Tempdb log file activity

Tempdb log file activity

With the above findings, and conclusions I could probably take the below approach for fixing the performance issues from tempdb.

1) Configure the tempdb per best practices and recommendations – Move the tempdb into a drive with RAID level (RAID – 10) of higher disk performance and if possible on to different LUN, configure tempdb with more data files and appropriate growth parameters

2) Observe the performance of the tempdb after the changes

3) Monitor and identify the performance bottlenecks of SAN. In case of any performance issues with SAN configuration, we fix the issues and monitor again

4) If necessary run IO meter on the system to test the performance of the disk and SQL Server, analyze the results and implement changes followed by monitoring

What do you guys think about all the analysis and conclusions? Do let me know.

UPDT: I have been seeing for quite sometime I had posted this entry, that hits on this article have been really high. So I thought I should conclude on the findings and what was done on the systems based on the recommendations.  Well finally the performance of the tempdb got better after having TempDB moved to RAID-10 and the problem was solved. I hope all this helps you guys!

Enjoy until next posting, cheers!

Advertisements

Comments»

1. tracey - March 29, 2009

From the first two charts
Percentage of I/O for each database

and
IO in MB
io_stall_s
io_stall_pct

What counters are these……….thanks

2. Michael - June 18, 2010

I would like to create graphs like those for performance trending on my SQL box how did you implement them?


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: