Viking’s Weblog

November 2, 2009

Timeout | Blocking | Wait types

Filed under: Performance tuning — msrviking @ 4:22 AM

I am into a performance analysis and tuning assignment wherein I need to identify a “mysterious” performance issue. Sounds familiar by words, “Yes” but its really difficult if one doesn’t know where to start (the state I am in the beginning of the exercise).  Well, all these aside, but here is the problem statement given by the team “We have timeouts and deadlocks happening”.

With this small and simple one liner I started thinking like this,

- What is a time out? I had to get back to basics  for this and per definition which I love to read again and again is

“Before executing a query, SQL Server estimates how much memory it needs to run and tries to reserve this amount of memory from the buffer pool. If the reservation succeeds the query is executed immediately. If there is not enough memory readily available from the buffer pool, then the query is put into a queue with a timeout value, where the timeout value is guided by the query cost. The basic rule is: higher the estimated cost is, larger the time out value is. When the waiting time of this query exceeds the timeout value, a time out error is thrown and the query is removed from the queue.”

Well keeping in this in mind, I went hunting for memory, CPU configuration of the box.

- What is the configuration of the RAM, CPU and how is it doing?

64G RAM, and 8 core machine is having time out and deadlocks. Wow! Can’t believe it and so I decided to go to the box and check it myself on how it is doing. To my surprise all looks fine (RAM and CPU) running below the threshold levels. So what is that I am hunting for?

Over years of experience gave me a gut feeling which is usually very close to solution or at least to the  problem to start with, in such confused state of mind.  So my gut feeling was saying all looks fine so what is bad going on here.  A strong thought – maybe processes are getting blocked, and all these processes are accessing same tables through different t-sql batch of statements (procedures, functions).

- Does blocking cause timeouts?

This was the first question I had with series of questions like if timeouts are happening why isn’t evident in CPU /RAM utilization. What is going wrong and where? These questions started haunting me which led me to these interesting links which I am consuming (may be 10th time) to understand what should be my next question.

Here are the links for you guys

http://www.informit.com/articles/article.aspx?p=686168

http://sqlserverpedia.com/wiki/Wait_Types

I am curently using these links to arrive at some information for analysis and tune the damn queries causing performance issue through blocking.

I shall keep you guys posted on how I progress, but remember yes blocking & wait type are related (although I didn’t mention about wait types explicitly in the post) and yes of course you have time outs because of blocking which I shall confirm it after this.

Maybe you guys know all these, and even I do but thought of blogging as my thoughts were flowing else I need to spend “regurgetating” you see ;) .

October 28, 2009

Trace skipped records – SQL Server profiler

Filed under: Performance tuning — msrviking @ 11:22 PM

Recently I was working on performance analysis of a “mystery monster” stored procedure in one of the projects at my shop. I was working on this stored procedure in the usual way I try to pick up pain points of stored procedurs /sql statements. And I decided to run SQL Server profiler to profile the performance of the stored procedure. All was done and profiler ran fine and I could see lot of information captured although not overwhelming. I took a deep breath to understand 2000 lines of profiled data, and as I was scanning through I saw something like this and have never seen it before.

Here is the snapshot which I had in the profiler,

 

Trace Skipped records

Profiler results

 

I thought this is something new which I haven’t seen before and took it as learning (partly ignorance too :( ).  But, it is so coincident that someone, somewhere had the same problem and this is what I appreciate about sharing information across the globe.  That someone is ScaryDBA  who had the same issue, but he went and dug around on to see what is this all about?  Well here is what he had to say in his blog post – Snags with Profiler GUI.

Learnings – Don’t be ignorant like me and dig around if you find something new else it will be half-knowledge.

Thanks to ScaryDBA for his thoughts!

Enjoy.

 

September 25, 2009

Performance analysis methodology

Filed under: Performance tuning — msrviking @ 2:14 AM

Hello Guys,

Few months ago I had worked on performance analysis of database that needs to be upgraded from SQL Server 2000 to SQL Server 2008. Before doing the analysis, and providing recomemndations I thought I will document the methodology and share with the project team. As an effect of this thought I had prepare this methodology.

Start of the document:

Performance Issues – Database upgrade from SQL Server 2000 (32 bit) to SQL Server 2008 (64 bit)

Overall databases performance

Since, the source databases are moved from the existing system (SQL Server 2000 – 32 bit environment) to the destination system (SQL Server 2008 – 64 bit environment), the overall database performance would improve for the reasons that SQL Server 2008 has improvised database engine and the instance would be hosted on the 64 bit environment.

However there could be stored procedures, user defined functions, triggers and other database objects which may degrade performance after the upgrade because of the following reasons.

-          Query engine interpretation of execution plans in SQL Server 2008 is different to that of SQL Server 2000.

-          Statistics on columns and indexes will not be affective for query optimizer to prepare optimal plans.

-          Row counts or page counts will be inconsistent and /or incorrect.

To ensure that there is no performance issues due to upgrade, baseline performance will be identified for stored procedures, user defined functions, and also post-upgrade activities will be considered post the production deployment.

Poorly performing existing stored procedures

The identified poor performing stored procedures could have improvements in performance because of the improvised database engine features of SQL Server 2008 and higher processing power (64 bit) on which SQL Server will be hosted.

But the performance could be still not per existing environment because of the following reasons:

  • Poorly written queries.
  • Highly complex T-SQL logic.

Assessment

Overall databases performance

The process of measuring the current performance will be to obtain a performance baseline, and this baseline will provide statistics of current usage pattern of the stored procedures, user defined functions and will also give a reference for the future statistics.

  1. The performance baseline information will be collected after running traces. The traces will have to be created in the identified baseline environment, along with trace dependent physical tables and stored procedures for the traces to run. Essentially in short these will be the components for the traces to be run
  • Stored Procedures which will create, start, stop and clear traces in the SQL Server instance
  • Physical Tables will consist of trace based information columns like events, data columns.
  1. The traces will be run on SQL Server 2000 in all databases before migration and on SQL Server 2008 after migration. The few options on where to run the traces could be run.
  • Running trace in production environment. This is the optimal environment, for the reason that the traces will return actual figures of what is happening in the production environment on the existing stored procedures, functions, and any other DML operations.
  • Running trace in pre-production, UAT or any environment which is closest to production environment. The closest to production environment means that the boxes should be similar with that of production boxes in terms of hardware configuration. Although there will not be enough load on the server to sense how the sp’s are doing while under load, but at least there will be good readings (Duration, CPU, RAM usage) that may be close to production box.

In order to simulate production environment load, have few users could access and test the application with multiple flows and scenarios. As an alternative to simulate load, a single flow /scenario could be executed to gather enough information for the trace. To compare the performance measurements after migration the same flow /scenarios should be followed.

To baseline pre-migration and post-migration performance information, the key parameters that will be captured for the events (SP start and SP end, SQL start and SQL end) would be

  • Database Name
  • Object Name
  • Duration (ms)
  • CPU (ms)
  • Logical Reads
  • Physical Reads

This information will be gathered in trace file format (*.trc) and saved into a folders. After the trace files are captured these will be saved into physical tables, and scripts will be run on the trace tables in order to retrieve the consolidated numbers on how many sp’s, function’s, trigger’s are called through the application, and also performance indicators (above mentioned parameters).

Poorly performing stored procedures

To identify the pain areas and to improve performance in these stored procedures, below information would be required,

-       Functionality of the stored procedures and how these sp’s are being used in the application.

-       The current issues that are causing performance bottlenecks.

-       SQL Server 2008 new features or the best practices that could be considered to fit into these sp’s.

And to gather the above information the below activities will be carried out.

-          Understand the functionality of the stored procedures.

-          Analyze and evaluate the current performance of the sp’s in SQL Server 2000 environment.

-          Identify the pain areas in the stored procedures.

-          Fix the pain areas with new features of SQL Server 2008 and with best practices.

Analysis and Findings

Overall databases performance

The below steps will be executed on the trace results that are gathered from the production environment to arrive at a baseline performance.

-       Load the trace files data of each database into physical tables.

-       Extract relevant SQL statements from “textdata” column of the trace table based on the event class 41, and 12. These events represent T-SQL event class and the events are SQL Batch Completed, SQL Stmt Completed.

-       Extract the object names from the exec statements (extracted from the above step).

-       For each object name the table will be queried on the “textdata” column to see where has the sp been called with the pattern “exec <objectname>” for the event class 41, and 12.

-       The event classes 41, 12 have very useful information like duration (ms), logical reads, physical writes and CPU (ms) for each exec statement.

The trace /profiler results may have captured only the sp’s that were being called by the application during the tracing period, and may not be all the n (total sp’s) in the database.

Poorly performing stored procedures

After understanding the functionality, analyzing and evaluating the stored procedures a list of few best practices and new features will be arrived at, and which could be implemented in the stored procedures for performance gains.

The below list of best practices and new features could be used for implementation.

Best Practices:

-          Use object owner against the objects used in the sp.

-          Temp tables to be replaced with table variables wherever applicable.

-          Use sp_executesql, and parameterized queries to avoid excessive recompilation.

-          Replace cursor with SET based operations.

-          Avoid use of SELECT INTO statements instead use INSERT INTO.

New Features of SQL Server 2008:

The below list gives description of the new features of SQL Server 2008, and where could these be used.

-          Common Table Expressions (CTE):

  1. As replacement for derived table.
  2. Replace table variables /temp tables.
  3. Avoid repeat of code /same query again.
  4. Hierarchical queries (recursive queries – calling query in a query).

-          Table Valued Parameters (TVP):

With TVP data can be passed around as a single variable. What this means is that if multiple records are saved, this can be done very easily in one procedure call.

This also can be done using an XML column or a delimited list, but the difference with a Table Valued Parameter is that a typed row set which can be used directly in an INSERT, SELECT, UPDATE statement, and any other statement that can take a row set. This means no shredding of XML or having a function to split delimited string. This essentially reduces the code, and gives flexibility of using typed data, that would mean reduced calls to the database from the DAL increasing the performance. However as it is indicative that there will be performance improvements but may not be possible to implement for the reason there will be changes in the stored procedures and the code calling the sp’s (T-SQL or the C#).

-          CONVERT function:

The convert function helps in converting binary data to string characters in hex format directly.

-           GROUPING SETS:

  1. The Grouping Sets feature is really helpful when you want to generate a set of aggregate results and at the same time you want to group by varying columns.
  2. It is much easier to maintain and provides better performance when compared to running different queries against the same data and then finally performing a UNION ALL to get the desired results.
  3. It provides better performance as it is executes once against the data source.
  4. It is much easier to program and use Grouping Sets than writing multiple select statements.

-          Transact-SQL row constructors:

This feature doesn’t give performance improvement, but helps in code size reduction which would essentially mean the execution plan will be slim and easy for the query engine to interpret.

-          Compound Assignment Operator:

This feature essentially reduces the code content in a batch of T-SQL statements.

-           Sparse columns:

The sparse column is an ordinary column just like other columns but it reduces the storage requirement for null values. A nullable column can be made as sparse column by adding the SPARSE keyword when the table is created or altered. Once the column is a SPARSE column SQL Server will not allocate space for null values. Note that in using this feature it adds an overhead for data retrieval of non-null values. Therefore it is needed to carefully apply this feature for columns by calculating the space that can be saved. It is recommended to make the column a SPARSE column only if the space that could be saved is at least 20 to 40 percent.

-          Filtered indexes:

A Filtered Index is an optimized form of non clustered index. It is basically useful to cover those queries which return a small percentage of data from a well defined subset of data within a table.

Filtered Indexes is one of the greatest performance improvement introduced in SQL server 2008. A Filtered Index allows us to create a filter to index a subset of rows within a table. i.e., non clustered indexes can be created with a WHERE clause.

A very well designed filtered index will help improve query performance on very large tables; this will also generate a better execution plan as it will be much smaller than the full table non clustered index. It is more accurate than a full table non clustered index because it will only cover those rows available in the filtered index WHERE clause.

A Filtered index will help reduce the index maintenance costs as it is smaller and is maintained only when Data Manipulation Language (DML) statements affect the data in the index. It is better to have large number of filtered index, especially in scenarios when the data is known on which filtered index is created is changed very less frequently.  Similarly, if a filtered index contains only the frequently affected data, the smaller size of the filtered index reduces the cost of updating the statistics.

Another major advantage of creating a Filtered Index is that it will reduce the disk storage space for non clustered indexes when a full table index is not required. A full table scan can be replaced with multiple filtered indexes without significantly increasing the disk storage space for the indexes.

But to implement all these one needs to essentially understand the functionality of the column and indexes along with the execution plan.

-          MERGE statement:

The MERGE statement internally works as an individual insert, update and delete statement within a single Merge statement. The SOURCE and the TARGET table or query which should be joined together should be specified. Within the MERGE statement specify the type of the data modification that needs to be performed when the records between the source and target are matched and what actions needs to be performed when they are not matched. With the introduction of MERGE statement the complex TSQL codes which was used earlier to do checks for the existence or inexistence of data within the data warehouse can be replaced with single Merge statement. The use of Merge statement will also improve the query performance because MERGE Statement runs through data only once in the database which also improves performance.

-          Composable DML:

The OUTPUT operator works much like a “local trigger” on the current statement. The drawback is that there is no way to filter the returned result set directly. The OUTPUT data has to be inserted in a staging table and work from there.

With composable DML a statement of UPDATE, DELETE and even MERGE as a data source for your query can be used. This doesn’t give any performance improvements but will be useful for sp’s that use the functionality of OUTPUT clause, and this would be an advance feature.

-          Table hints:

The FORCESEEK table hint forces the query optimizer to use only an index seek operation as the access path to the data in the table or view referenced in the query. You can use this table hint to override the default plan chosen by the query optimizer to avoid performance issues caused by an inefficient query plan. For example, if a plan contains table or index scan operators, and the corresponding tables cause a high number of reads during the execution of the query, as observed in the STATISTICS IO output, forcing an index seek operation may yield better query performance. This is especially true when inaccurate cardinality or cost estimations cause the optimizer to favor scan operations at plan compilation time.

SQL Server query optimizer typically selects the best execution plan for a query, we recommend using hints only as a last resort by experienced developers and database administrators. Moreover this would involve analyzing the sp for statement level performance and judging if the FORCESEEK hint helps the sp or not will involve time and effort.

-          FOR XML PATH:

This feature doesn’t give any significant performance gain but would reduce the complexity of the code which uses FOR XML EXPLICIT to return XML data.

Fixing and Re-evaluation

Overall databases performance

The above information gives on how much time each sp is taking, how many CPU cycles is the sp taking, what are the logical reads, what are the physical writes. With this data as baseline per sp on each database, the performance will be evaluated of the same sp, by executing same exec statement (as it was captured from the production) on 64 bit SQL Server 2008 Enterprise (testing environment) against each database.

Any differences in performance based on the parameters of duration /logical reads would lead to evaluate on why there are performance difference with the issue based sp(s), and changes will be proposed to get the performance in line or get better in comparison with the existing production system. After the first iteration of proposed changes is implemented the stored procedures will be evaluated again for performance.

Poorly performing stored procedures

After the best practices and new features are implemented in the stored procedures, the sp’s will be evaluated for performance on the 64 bit SQL Server 2008 Enterprise (testing environment). In case there are any performance degradations in the sp’s due to changes, these will be analyzed for performance analysis and changes will be made followed by re-evaluation until performance reaches to acceptable levels.

End of the document:

Let me know what you guys think about all these!

Enjoy!

January 20, 2009

SAN and contention

Filed under: Performance tuning — msrviking @ 1:51 PM

This is probably little away but lot close to one of the series of performance issues that I had written. Grumpy Old DBA had been blogging some interesting articles on SAN use, performance, monitoring for SQL Server. In fact I had used one of the entries as a guideline to pin down the problem of the SQL Server io_stall and well there is one article over here by him which is interesting to read. I believe you guys would be able to co-relate it appropriately.

Also let me know how does it make sense in your scenarios?

Happy reading.

January 13, 2009

Tempdb posts

Filed under: Performance tuning — msrviking @ 4:39 AM

Hello Guys-

It has been quite sometime that I have posted anything on the blog. Man, it was quite a busy 4 weeks and I am on an adventure to a different world. Of course I am talking about work, and that too in SQL Server but of a different role. It was because of these changes that happened over past few weeks and along with were long Christmas and New Year weekends which I have relished well.

Alright, alright let me come back to what I want say :) . Here it goes,

Sunil Agrawal of SQL Server Storage Engine team is blogging wonderful topics from the 3rd of Dec’08 on Tempdb Management. Here is the link for your use and remember these are wonderful posts from the horse mouth. So utilize the learnings too!

Just a note the link I have provided is for Dec’08, and there are other entries on the same topic for this month too. Just run through the link list on this great blogging point.

Enjoy!

December 5, 2008

Oh! Bad performance.

Filed under: Performance tuning — msrviking @ 12:23 PM

Few months ago, one of the team came up to me asking for help on “How to enhance performance of large data (bulk) operations using BULK INSERT?”. After couple of hours thinking and analysis of what was going on in the environment and in the scripts, here is what I came up as recommendations.

· Disable foreign key constraints

· Truncate instead of delete. Because delete is a logged operation, and truncate is not a logged operation (per transaction basis). Disadvantage is that, recovery until point of failure will be only from the last backup.

· Disable AUTO CREATE and AUTO UPDATE STATISTICS

· Optionally disable the constraints (null, not null, check constraints), triggers

· Drop table indexes to improve performance for large bulk imports, except clustered indexes.

· Use TABLE LOCK hint for non-parallel operation

· It is not necessary to use table-locking to bulk-import data into a table from multiple clients in parallel, but doing so can improve performance for parallel load operation

· When indexes exist on a table, you cannot perform a parallel load operation by using the TABLOCK option for parallel load operation

· If you have control over the creation of your input files, make them of a size that is evenly divisible by the number of load threads you want to run in parallel.  To use the processor most efficiently, distribute the data evenly among the clients. Make sure that data files are similar in size if you want to import them in parallel from multiple clients into an instance of SQL Server.

· For parallel load operation – split the files in equal size

· Inserting n million rows in smaller groups say 10000 rows each (for e.g.). Use ROWS_PER_BATCH = 2500, First and Last row option of BULK INSERT

· For bulk-import operations, minimal logging is more efficient than full logging and reduces the possibility that a bulk-import operation will fill the log space. To minimally log a bulk-import operation on a database that normally uses the full recovery model; you can first switch the database to the bulk-logged recovery model. After bulk importing the data, switch the recovery model back to the full recovery model.

· Presort the input files and create the clustered index first before loading the records. If they are presorted it should allow us to shorten the create index step.

· Run the BULK insert on other client machines to remove the need to SQL Agent and SQL Server to handle the direct reading of the files. Move the input files to another computer to see what the load pattern looks like with the network involved.

· Database configuration – If the data or log files grow at % level it would be inefficient. Configure the database at MB level. Create database with enough reserved space

Why don’t you guys too throw your thoughts in the ring?

December 4, 2008

PAL (Performance Analysis Log Tool)

Filed under: Performance tuning — msrviking @ 11:01 AM

Sometime last week I was approached by one of my peers asking for help in analyzing the Windows performance counters of a SQL Server system and share my thoughts.

Although I was given a day time to get back, but at the back of my mind I was wanting to do two things. I should share well analyzed and conclusive thoughts, but quickly. How do I do it? For some reason I don’t remember how I got to this link – PAL but it turned out to be real worth to use it. It took me just half-hour to conclude what is right and wrong in the system based on the reports that is generated from this tool. And I got back with results quickly, of course the correct ones though :) .

You could visit the link and see it for yourself what you need to do. Thanks to PSS engineers who have spent time to build this tool and sharing it among the community. Great job, people! Here is a quick peep into what it is all about,

“Ever have a performance problem, but don’t know what performance counters to collect or how to analyze them? The PAL (Performance Analysis of Logs) tool is a new and powerful tool that reads in a performance monitor counter log (any known format) and analyzes it using complex, but known thresholds (provided). The tool generates an HTML based report which graphically charts important performance counters and throws alerts when thresholds are exceeded. The thresholds are originally based on thresholds defined by the Microsoft product teams and members of Microsoft support, but continue to be expanded by this ongoing project. This tool is not a replacement of traditional performance analysis, but it automates the analysis of performance counter logs enough to save you time. This is a VBScript and requires Microsoft LogParser (free download).”

Cheers!

November 4, 2008

SQL Server 2005 I/O issues – Part 2

Filed under: Performance tuning — msrviking @ 12:06 PM

As a continuation of my earlier post SQL Server 2005 I/O issues – Part 1 here is what had happened after my initial analysis. With more indicators from the system administrators, my apprehensions started coming true – that there is a disk IO issue in the system.

To satisfy my curiosity and to strengthen my apprehensions into facts, I went ahead and configured performance counters,

PhysicalDisk Object:

1. Avg. Disk Queue Length

2. Avg. Disk Sec/Read

3. Avg. Disk Sec/Write

4. %Disk Time

5. Avg. Disk Reads/Sec

6. Avg. Disk Writes/Sec

Database:

1. Log Bytes Flushed/sec

2. Log Flush Waits/sec

And few others like User connections and Processor time for a 24 hour period spanning the off-peak and peak hours of the application usage, the counters being polled every 30 seconds.

After gathering the results from the .csv log, I picked the data for key counters Avg Disk Read (ms), Avg Disk Write (ms), % Disk Time, Log Bytes Flushed/sec, Log Flush Waits/sec for the period of 8 AM – 5.30 PM knowing that the application would have got the hardest hit during these hours.

Now I have the data ready, I decided to bring up some graphs so that my analysis becomes easier, and conclusions will be quicker. Here is what the analysis (using DMV’s and graphs) talks about

<Start of Report>

DMV results:

Percentage of I/O for each database

Database Name

IO in MB

% age

Tempdb

109986

63.79

NGDemo

51205

29.7

Msdb

2018

1.17

Master

31

0.02

Model

8

0

Database and IO Stall

Database Name

IO in MB

io_stall_s

io_stall_pct

tempdb

109986.17

40046.95

50.94

NGDemo

51205.22

19398.37

24.68

msdb

2018.13

7711.44

9.81

master

31.86

52.67

0.07

model

8.05

2.61

0

Although there are no threshold values defined for any system for the counters % age, io_stall_s or io_stall_pct but any high numbers are the pointers for us to investigate. Here io_stall means “Total time, in milliseconds, waiting for the IO to be completed.” and for purpose of calculations and reports the io_stall value is converted into seconds.

There is benchmark number defined for the above counters and here is the quick gist of these,

Avg. Disk Sec/Read and Avg. Disk Sec/Write: The average time, in seconds, of a read of data from the disk. Use the following to analyze numbers in the output.

·         Less than 10 milliseconds (ms) = very good

·         Between 10-20 ms = okay

·         Between 20-50 ms = slow, needs attention

·         Greater than 50 ms = serious IO bottleneck

Physical Disk: %Disk Time: The percentage of elapsed time that the selected disk drive was busy servicing read or writes requests. A general guideline is that if this value > 50 percent, there is an I/O bottleneck.

Database: Log Bytes Flushed/sec: The total number of log bytes flushed. A large value indicates heavy log activity in tempdb.

Database:Log Flush Waits/sec: The number of commits that are waiting on log flush. Although transactions do not wait for the log to be flushed in tempdb, a high number in this performance counter indicates and I/O bottleneck in the disk(s) associated with the log.

The graphs talk about what is happening in the disk of the system and tempdb, which lead me to conclusions and action points.

Graph 1: For better performance, the values of Avg Disk Read time in ms should be less than 20. But the results show that we have average read time between 20 -50 ms and sometimes above 50 ms which means that we need to pay attention to IO and tempdb performance.

Avg Disk Read Time (ms)

Avg Disk Read Time (ms)

Graph 2: For better performance, the values of Avg Disk Write time in ms should be less than 20. But the results show that we have average read time between 20 -50 ms and sometimes above 50 ms which means that we need to pay attention to IO and tempdb performance.

Avg Disk Write Time (ms)

Avg Disk Write Time (ms)

Graph 3: For better performance the % Disk time should be less than 50. The below results show that we have touched /crossed 50, only 2% of the duration which isn’t alarming.

% Disk Time

% Disk Time

Graph 4: For better performance the Log Flush Waits/Sec should be always as low as possible, and per below results we are not having issues with transaction log of tempdb. Moreover, the Log Bytes Flushed/Sec shows high tempdb log activity with bottleneck at transaction log of tempdb.

Tempdb log file activity

Tempdb log file activity

With the above findings, and conclusions I could probably take the below approach for fixing the performance issues from tempdb.

1) Configure the tempdb per best practices and recommendations – Move the tempdb into a drive with RAID level (RAID – 10) of higher disk performance and if possible on to different LUN, configure tempdb with more data files and appropriate growth parameters

2) Observe the performance of the tempdb after the changes

3) Monitor and identify the performance bottlenecks of SAN. In case of any performance issues with SAN configuration, we fix the issues and monitor again

4) If necessary run IO meter on the system to test the performance of the disk and SQL Server, analyze the results and implement changes followed by monitoring

What do you guys think about all the analysis and conclusions? Do let me know.

UPDT: I have been seeing for quite sometime I had posted this entry, that hits on this article have been really high. So I thought I should conclude on the findings and what was done on the systems based on the recommendations.  Well finally the performance of the tempdb got better after having TempDB moved to RAID-10 and the problem was solved. I hope all this helps you guys!

Enjoy until next posting, cheers!

November 3, 2008

SQL Server 2005 I/O issues – Part 1

Filed under: Performance tuning — msrviking @ 10:58 AM

I had this issue presented to me by one of my customers. Alright what is the issue? – “During peak hours of the day the performance of the application slows down.” With this single statement and few other pointers on the application environment I was requested to hunt down the problem from a database perspective.

As I started looking into what could be the cause of performance issues “if any” on the database server, I was getting multiple ideas, but was unable to pin-point what could be the cause, where and what to look for.

So I requested my system administrators to help me in finding what is alarming – CPU, Memory or Disk subsystem on the box. If I know where the cause through the performance counter is, I will know what kind of queries should be run against the instance DMV’s. My request was probably the toughest with no indicators around still, I went and took the armory I use for finding these type issues. Before I used the armory, I categorized the problematic areas as

  • CPU
  • Memory
  • Disk Subsystem

I ran around 7 out of 13 emergency diagnostic queries (some of these queries are from Glenn Berry’s list, some from the Troubleshooting Performance in SQL Server 2005, and some from Working with tempdb in SQL Server 2005) and here is what are all these about, what did yield and what could we do further.

My other thoughts were to wait for reports from Windows front to conclude further on what needs to be done. I was also thinking to have the performance counters for CPU, Disk Reads, Writes, RAM to be configured for logging so that we could arrive at better conclusions.

My findings and inferences were as below

– HIGH CPU *******

– Isolate top waits for server instance

What are the results?

There is one wait type by SQLTRACE_BUFFER_FLUSH, wait time is 2037408, and running percentage 97.45.

What do we infer?

Occurs when the SQL Trace flush task pauses between flushes. This wait is expected and long waits do not indicate a problem.

What do we do?

One thing that is making me wonder is why do we see this kind of wait type. I haven’t thoroughly analyzed on this part, and I shall get back with more inputs in case if it’s going to be worth enough.

– HIGH CPU *******

– Check SQL Server Schedulers to see if they are waiting on CPU (runnable tasks above 10 is very bad)

What are the results?

None of the cpus were busy when I ran this query. The values of runnable tasks were zero.

What do we infer?

The run of this script was not at the right hour, and the information gathered at off peak hours (AM hours) wasn’t judgeable.


What do we do?

We should probably run the script during the peak hours for ½ to find if any of the cpu cores is facing issues.

– TRANSACTION LOG FULL *****

– Find the log reuse description for the transaction log

What are the results?

None of the transaction logs of the user and tempdb were waiting for any operations to complete, when I ran this query. The values of run able tasks were NOTHING.

What do we infer?

The run of this script was not at the right hour, and the information gathered at off peak hours (AM hours) wasn’t judgeable.

What do we do?

We should probably run the script during the peak hours for ½ to find if any there are performance issue faced by transaction logs to write the data.

– IO ISSUES *****************

– Analyze DB IO, ranked by IO Stall %

This query returned the most interesting results.

What are the results?

The query returned parameters like which database and which file of the database is having issues. The most prominent database that had high stall value and percentage was tempdb, and the log file being the most victimized.

File type io_mb io_stall_s io_stall_pct

log 12063.21 14488.00 23.76

data 69810.41 13228.62 21.69

What do we infer?

The query returns a value called io_stall which means “Total time, in milliseconds, that users waited for I/O to be completed on the file”. The value we get in the above results is in seconds. I have the feeling that tempdb log file is facing issues while writing data.

This kind of stalling would happen when the configuration of the log file is not optimal to perform the kind of operations generated by the application calls or the disk subsystem is tiring up doing the operations of tempdb log file.

What do we do?

My advice was to monitor tempdb utilization, and monitor the disk subsystem (drive counters) where the tempdb is configured for sometime before we make any changes. We don’t have any threshold values defined to say if the above numbers are concerning are not, but further monitoring would help to deduct further.

As I was writing this post, I thought I should keep this as Scene – I of the Act – I and will shortly post on Scene – II of the Act – I.

Until then enjoy reading, and let me know if you have other ways to find out such issues.

September 11, 2008

Performance tuning methodology

Filed under: Performance tuning — msrviking @ 6:30 PM

Hello Guys,

I had a request from one of the project teams to help them in DB performance tuning. I was delighted to help and came up with a meeting plan. I had bunch of questions before I could jump for sharing the methodology. Here are the questions I fired back at the team.

  1. Which database technology would we look at, for performance
    tuning? I am assuming SQL Server.

  2. If SQL Server, which version of SQL Server would be looked at
    during performance tuning?

  3. What kind of performance tuning does the team want to do?
    (Queries, Index, Database, Instance, OS or Hardware)

  4. Is the performance tuning, for an application in production or
    application which is being developed?

  5. Is there any information you could provide that would be good
    for understanding of what has been done, what needs to be done?

To my surprise, even though I had mentioned that my thoughts on performance tuning would revolve around SQL Server, I was given a reply back by team saying that tuning would be on Sybase 15. Okkk, now what? I started googling around for some information on how to help in tuning Sybase databases. Well, after little searching and finding interesting articles, I stepped back a little while and thought "What am I doing?". And quickly it struck me, that we are looking only methodology and not at technology based tuning. Wouldn’t methodology be common across any db technology for performance tuning? The only set of differences would be from monitoring, analyzing tools during performance tuning. I have strong belief that if we know the methodology we could get into tuning of different db technology.

What do you guys think about it? Also if you have different set of questions than what I had, do let me know.

Happy reading!

Older Posts »

Blog at WordPress.com.