
MySQL Series: Performance Engineering – Find What April 14, 2015

Posted by msrviking in MySQL, Performance, Performance tuning.

In the post MySQL Performance Engineering series – The Goal I described the problem statement and how I planned to go about it. I had a series of areas to attend to in order to stabilize performance, and, as the old adage goes, I started with the low-hanging fruit that was spoiling the backyard: the queries.

Being a first timer at stabilizing the performance of a MySQL system, I was looking for a way to find out which queries were doing badly. In SQL Server we have DMVs that capture slow-performing queries, or traces that profile the database for a set of events and columns (such as duration); the MySQL world has something similar, known as the "Slow Query Log". Luckily, I had this information handy from the Dev team, who were receiving hourly notifications about which queries had run slow on a particular day.

Now to more details.

Slow Query Log, what is it?

Paraphrasing The Slow Query Log page: the slow query log contains the SQL statements that took more than long_query_time seconds to execute and required at least min_examined_row_limit rows to be examined. If I remember right, we had set long_query_time to 3 seconds and left min_examined_row_limit at its default. This meant mysqld would write to the log file any queries or stored procedures that needed attention because they exceeded the 3-second execution limit.

There are a few other settings that can be configured before you enable logging of slow queries; they can be pored over at the Slow Query Log link above.
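
Just as an illustration, here is a minimal sketch of how these server variables can be set at runtime; the log file path and the values are assumptions based on what I recall we used, not the exact production configuration.

SET GLOBAL slow_query_log = 'ON';                                -- start writing the slow query log
SET GLOBAL slow_query_log_file = '/db01/mysql/slow-query.log';   -- hypothetical log file location
SET GLOBAL long_query_time = 3;                                  -- log statements running longer than 3 seconds
SET GLOBAL min_examined_row_limit = 0;                           -- default: no rows-examined filter

-- Confirm the settings
SHOW GLOBAL VARIABLES LIKE 'slow_query_log%';
SHOW GLOBAL VARIABLES LIKE 'long_query_time';
SHOW GLOBAL VARIABLES LIKE 'min_examined_row_limit';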

A sample of the slow query log output is shown here:

/usr/sbin/mysqld, Version: 5.6.13-log (MySQL Community Server (GPL)). started with:

Tcp port: 3372 Unix socket: /db01/mysql/mysql.sock

Time Id Command Argument

SET timestamp=1390907486;

call sp_name (412,15000);

# Time: 140128 5:12:07

# User@Host: user[pwd] @ [xxx.xxx.xxx.xx] Id: 11963111

# Query_time: 54.035533 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 204534

Please read the above sample output for key information such as:

call -> the stored procedure that was invoked

Time -> the time at which the stored procedure or query executed

Query_time -> the time in seconds the stored procedure or query took to complete

Rows_examined -> the number of rows looked through during the query's execution

So what do you do with this raw data? I would think of running a utility that reads through the entire log and gives me a summary of what is logged. If I had to do this manually, I would have had to stress my brain through a cycle of finding the right app or script, testing, reviewing, and abstracting the results. I was lucky here too, and didn't have to spend that much time and effort. The next section describes what I did.

 

Raw data of Slow Query Log, what next?

In the sub-section above you would have noticed that the output of the slow query log is raw but information rich. To get a decent report out of this raw data I used the mysqldumpslow utility provided with MySQL. mysqldumpslow is a Perl (.pl) program that parses the slow query log and groups queries that are similar except for the particular values of their literals. More here on this nice little program. I didn't use any additional parameters when parsing the log, though one could tweak the values of the options listed.

 

I had to do one more thing, install a Perl engine, before I could run mysqldumpslow.pl from its location on my system. I then copied the raw slow query logs to the folder containing the standalone .pl file and ran mysqldumpslow from the command prompt in that path.
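
For reference, the invocation looked something like the single line below; the slow log file name is hypothetical, and the -s at -t 20 options (sort by average query time, show only the top 20 statements) are shown purely as an illustration, since I ran the utility without any extra parameters.

perl mysqldumpslow.pl -s at -t 20 slow-query.log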

 

The output looks like this

 

Count: 2 Time=57.39s (114s) Lock=0.00s (0s) Rows=313.0 (626), user[pwd]@[xxx.xxx.xx.xx] || SELECT * FROM t WHERE col = 'S'

Count: 617 Time=57.33s (35371s) Lock=0.12s (76s) Rows=1.0 (617), user[pwd]@4hosts || SELECT * FROM t1 WHERE col1 = 'S'

Count: 713 Time=56.26s (40116s) Lock=0.72s (516s) Rows=1.0 (713), user[pwd]@4hosts || SELECT * FROM t4 WHERE col2 = 'S'

Count: 3 Time=55.02s (165s) Lock=0.00s (0s) Rows=1.0 (3), user[pwd]@2hosts || select t1.col1 from t1 join(t2) where t1.col4 !=N and t1.col5=N and t1.col1=t2.col1 and t2.col2='S' and t1.col3=N

 

The above sample output contains:

Count -> the number of times the query was called

Time in () -> total response time in seconds

Time not in () -> average response time in seconds

Lock in () -> total locking time in seconds

Lock not in () -> average locking time in seconds

Rows in () -> total rows sent

Rows not in () -> average rows sent

User, pwd, and hostname/host IP address -> these speak for themselves

Query definition -> the grouped query text to which the counts, timings, and row figures above apply

 

Please note that the two pipes "||" after the query are not produced by mysqldumpslow.pl. I modified the .pl file to append the || so that each result comes out on a single line; otherwise there would be a \n (newline character) after the hostname/IP. I wanted all the data on one line so that I could use text-to-columns in Excel for better readability and analysis.

 

Now that the slow query log has been formatted by the utility, it is ready to be analyzed, and the analysis wasn't as difficult as figuring out how to format it for better reporting.

Analyze the Slow Query log

Here is the sample report I put into Excel for better analysis:

Query Profile

Calls Count | Average Time (s) | Total Time (s) | Row Lock (s) | # Rows Sent | # Rows Examined | Query Text
11391 | 43.47 | 495193 | 0.00 (2) | 1 | 11391 | StoredProcedure
3757 | 34.77 | 130632 | 0.45 (1695) | 418 | 1570426 | Query 1
1788 | 14.42 | 25779 | 0.29 (514) | 1 | 1788 | Query 2
1684 | 48.6 | 81849 | 0.04 (73) | 1 | 1683 | Query 3
1117 | 52.14 | 58244 | 0.12 (137) | 1 | 1117 | Query 4

Are there tools that could help you summarize all this? Well, I haven't used any, but yes, there is one I have kept learning from here.

I decided to look for queries that had the lowest call counts and the highest average or total response times. This was the onset of my performance engineering of the MySQL systems toward a single goal.
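
As an aside, if the slow log had also been written to a table (that is, if log_output included TABLE), a rough version of this ranking could be pulled straight from mysql.slow_log with a query like the sketch below. This is only an assumption on my part, since the analysis above was done from the file-based log and Excel, and grouping on the raw sql_text does not normalize literal values the way mysqldumpslow does.

-- Sketch: rank slow statements by average time; TIME_TO_SEC drops sub-second precision
SELECT sql_text,
       COUNT(*)                     AS calls_count,
       AVG(TIME_TO_SEC(query_time)) AS avg_time_s,
       SUM(TIME_TO_SEC(query_time)) AS total_time_s,
       SUM(TIME_TO_SEC(lock_time))  AS total_lock_s,
       SUM(rows_sent)               AS rows_sent,
       SUM(rows_examined)           AS rows_examined
FROM mysql.slow_log
GROUP BY sql_text
ORDER BY avg_time_s DESC, calls_count ASC
LIMIT 20;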

 

Happy reading.

A year and more gone by..no posts March 23, 2015

Posted by msrviking in Configuration, Data Integration, Design, Integration, MySQL, Oracle, Performance tuning.

Only I know how much I have missed blogging about everything I learnt on MySQL and Oracle RDS over the last 14 months, and how much I wished all of that work had been on SQL Server. Well, it isn't always the way you want it to be.

There was so much learning over this period about the way things are done in MySQL and Oracle at design, development, and performance engineering that I might have to spend several days blogging about it. Over the next few weeks to months I will spend time on a variety of learnings, starting from design and coding through performance tuning to achieve concurrency and scalability. Although these topics will primarily be around MySQL and Oracle, I might map some of the work done to SQL Server features and implementations.

A brief preview of what the topics will be; each will possibly have its own detailed sub-series with that mapping:

MySQL:

  1. What is the physical design checklist?
  2. What are the best instance-level configurations?
  3. What are the optimal working configurations for the host hardware and OS?
  4. What are the ways to optimize execution plans?
  5. What are the coding best practices?
  6. What are the ways to troubleshoot deadlocking problems?

Oracle RDS:

  1. What are the best practices for coding in PL/SQL?
  2. What are the best practices for designing and building an integration DB?
  3. What are the ways to optimize execution plans?
  4. What are the ways to monitor for performance bottlenecks?
  5. What are the ways to achieve concurrency, scalability, and performance?
  6. What is not do-able when compared to on-premise instances?
  7. How to administer the instance?

Happy reading.

What am I doing now-a-days over last few months..? January 30, 2014

Posted by msrviking in General, MySQL, Performance tuning.
Tags: , , , ,
add a comment

I had been busy doing stuff on SSIS for a few months, and then on MySQL since the last post I put up on the blog. What am I doing on MySQL, and why am I writing about it on a SQL Server blogging site? Well, I am trying to get a hold on how MySQL works while I stabilize the performance of a system. Surely I am not a MySQL geek who can look deeply at the OS, hardware, and MySQL configurations, but with the little knowledge I had, and what I am gaining as time goes by, I am trying to troubleshoot the performance of queries, indexing, and table partitioning. These are the few things I have been trying to put in place even before I get to other levels of performance engineering.

A thread of thought is already running on whether I could shard the database and introduce read-write splitting to provide a scale-out solution, either by using out-of-the-box features like MySQL Cluster or by customizing: partitioning tables, sharding them across different nodes, and read-write splitting using MemCache.

This is a lot of thinking in MySQL terms, and I don't have such flexibility in SQL Server, although there are some implementations that use read-write splitting with load balancers at the application level rather than at the database. I am highlighting some of the capabilities that are not there in SQL Server and can be used to the fullest in MySQL. But then there are many things missing in MySQL that are so good to use and work with in SQL Server.

Some of the features I am missing badly in MySQL are

  • Profiler
  • Graphical execution plan
  • Indexes with include
  • Free and easy to use monitoring tools for OS, Hardware
  • Multiple query plan algorithms
  • Proper documentation (implementation or bugs)

This post is to share the top few things I am seeing in MySQL and the few things I miss when I think of SQL Server. It's kind of a homesick post, I would say.

I will keep writing as and when I learn new things, and I will definitely include comparisons with SQL Server features.

Happy reading.

CDIs-Non Functional Requirement-Reliability May 6, 2013

Posted by msrviking in Architecture, Business Intelligence, Data Integration, Integration.

This is a strange NFR, but a very important one. The word "reliable" looks like plain English and less like something applicable here?! Well, no. We are talking about reliability in the sense of data reliability in a data integration system – a CDI. Please note a point I mentioned earlier: in a CDI there is no change in the meaning of the data; rather, we are trying to align the organization-wide perspective of what the data in the CDI system means and how true it is. It is necessary that all business and technical teams agree on the definition of these data terms. Once this is agreed, the technical teams have to implement the logic – business and solution – in the system to bring that "value and truth" to the data or information consolidated from different sources.

In this post, I shall share a few thoughts on the questions we need to find answers to.

  • What is the tolerance level or percentage of erroneous data that is permitted?

With the above context set, this question gathers the information needed to plan error handling and threshold levels in case of any issue. The usual way is to have a failed process restarted, or aborted. To do this we need to identify the tolerance (threshold levels) and handle errors so that the essential, business-valid data still comes through.

  • Is there any manual process defined to handle potential failure cases?

There should always be a process that restarts any process that failed or was aborted due to technical or business validation errors, whether this is done manually or is automated. In less tolerant systems this process should be either semi-automated or fully automated, and manual only in some cases. A known fact and a familiar thought, but we should elicit it from the stakeholders explicitly so that everyone understands the business information at the same level.

  • Under what conditions is it permissible for the system/process to not be completely accurate, and what is the frequency/probability of such recurrence?

This question is not to be mistaken to mean that certain levels of error in the system could be ignored and left unattended. Rather, this pointer helps in understanding which processes should be considered business critical, and the inputs gathered here could and should be used in designing a solution to handle errors – especially business errors – across all the modules, based on the importance agreed with the business stakeholders.

Although there are just three questions in this NFR, all the inputs work in tandem with the other NFRs, helping to build a less error-prone, fault-tolerant, and robust CDI. There is a lot of business importance in these questions, which feeds directly into building the solution and designing the components, modules, and data model.

So please share your thoughts or feel free to comment on this NFR.

Cheers!

CDIs-Non Functional Requirement–Scalability March 27, 2013

Posted by msrviking in Architecture, Business Intelligence, Data Integration, Integration.

In the last post I spoke about the Non Functional Requirement – Performance, and in this post I am going to talk a bit about Scalability, which is the parameter most closely related to performance. Generally, during a requirements gathering phase this requirement is documented as: need a scalable solution that should have high performance or quick response time. Well, I don't think I would write the note that way myself, but from anyone else I would expect this type of "single line" statement.

What is the definition of scalability, and what are the finer line items that need to be considered for a CDI solution to be scalable?

Scalability definition from Wikipedia (http://en.wikipedia.org/wiki/Scalability)

Scalability is the ability of a system, network, or process to handle a growing amount of work in a capable manner or its ability to be enlarged to accommodate that growth.

and the definition of the related word Scalable system is

A system whose performance improves after adding hardware, proportionally to the capacity added.

All of this sounds simple, and yes, it is true, but these sentences carry great meaning inside, and realizing them in an implementation is the task for any engineer, manager, technology specialist, or architect. Now that this is done, let me get to the next point, and the most important one for this post.

A scalable database system is one that can handle additional users, a greater number of transactions, and future data growth, all with speed, precision, and accuracy. Any system can be scaled up or out; when it comes to a database system like SQL Server, the scale-up option is the easiest but can get costlier, while scale-out needs to be well planned. When I say well planned, it starts from understanding the non-functional requirements, through solution design, data model design, and development, to capacity planning and infrastructure setup.

Now that we know the definitions of scalability and of a scalable database system, I would like to list the questions that help me decide the needs of a scalable database system in a CDI solution.

  • What is the expected annual growth in data volume (for each of the source systems) in the next three to five years?

This is along the same lines as what any scalable database system should meet, but the importance is greater when dealing with a CDI – a data integration solution. An approximate projection of the data growth of the different data sources helps in deciding whether the CDI solution's database can handle that growing data. The volume of data plays a very important part in meeting your response times or the SLAs for data processing and the subsequent publishing for analytics consumption.

  • What is the projected increase in number of source systems?

A similar question to the one above, but an idea of how many data source systems will be integrated, along with the volume of data generated from those systems, helps in knowing the total volume of data that needs to be handled.

  • How many jobs are to be executed in parallel?

I couldn't disagree with an already documented point on this question, so I am repeating what someone said about it so meaningfully.

There is a difference between executing jobs sequentially and doing so in parallel. The answer to this question impacts performance and could affect the degree to which the given application can scale.

  • Is there any need for distributed processing?

An answer to this question gives a sense of the existing systems' distributed processing capabilities or capacities, and of the organization's policies towards distributed processing. But this is only at the level of understanding the client's requirements, and it should be taken merely as leads or inputs for deciding whether distributed processing should be considered. Distributed processing of data is a double-edged decision, and it should go into the solution only after plenty of thought has been given to the trade-offs.

The list of questions for scalability is short, but all of them are essential to address in order to have a high-performing and scalable system. I have nothing to give as expert advice, except this last suggestion: get these responses so that the behavior of any CDI solution is predictable.

Please feel free to comment and let me know your thoughts too.

CDIs–Non Functional Requirement–Performance March 12, 2013

Posted by msrviking in Architecture, Business Intelligence, Integration.

In today’s post I will talk about the NFR which I consider as one of the most important for the stability of the system for a CDI implementation – Performance.

This topic is vast and intense, and there are several factors that need to be considered while architecting or designing a system. A thought passes through my mind whenever I work on this, even now: writing principles to achieve performance and scalability is easy, and mostly scholarly; the challenge lies in the realization phase. If a system's performance and scalability are to be achieved, then the appropriate solution blueprint, the right design, the best coding, and precise testing and deployment have to be in place. See, as I said, the moment this topic is opened up, so much lateral and tangential thinking comes into play. I am going to hold that for now and talk about the significance of the above NFR, and then how it can be used in building a CDI solution.

Performance – a short description and explanation that I picked from one of the sources I shared in my earlier post, modified for my better understanding and, of course, to suit the solution I was building for the client.

So here it is: "performance is about the resources used to service a request and how quickly an operation can be completed", e.g., response time or the number of events processed. It is known that in a CDI-type solution, where data is processed in GBs to TBs and in a large number of cycles, response time can be a less appropriate measure; but identifying a pre-determined response time for a particular process helps in defining the SLAs (SLA not from a monitoring or maintenance perspective) for the downstream applications on when the data will be available.

I will give a short explanation of what I intended by defining performance this way. Say, for example, I have to bring data from different data sources, apply a lot of cleansing rules on the in-flowing data, massage it into the intermediary schema, then map and publish it per the needs of the business, and all this is done on a few million rows across different tables, or on GBs of data. I can't assess the performance of batch applications on a per-row or per-second basis. In a transactional system I would have gone by a response time in seconds, but that doesn't mean I can simply define that the job or batch application should finish its processing within the one-to-four-hour maintenance window. I would prefer to abide by the business's needs for the maintenance window; by looking for information on this NFR, I am pre-defining the system's behavior – predictably – and also ensuring that, to meet this NFR, I have batch applications, real-time processing, and an appropriate design to handle the volume and velocity of the data. Yes, too much thought process and explanation, but the short message of the story is: "Meet the business needs of the maintenance window by defining the SLAs for the system." This is key for an integration solution, and for a CDI in particular.

Now I shall get back to the questions I posed to assess what performance, or rather what SLA levels, the system should adhere to. The list of questions below repeats my previous post, but I will add a bit more text on why each question was asked.

  • What is the batch window available for completion of the entire batch cycle?

The batch window is the number of hours available for the CDI system to process data. It is usually a mutually agreed input from the business and the IT folks, and knowing it helps in deciding whether the system can meet the proposed number of hours given the existing data sources, the complexity of the processing logic, and the volumes of data that need to be processed.

  • What is the batch window available for completion of individual jobs?

The response to this question helps in deciding the dependencies between batch jobs, and also in designing the batches efficiently.

  • What is the frequency of the batch that could be considered by default?

The inputs on job frequency help in assessing how the load on the system varies across different weekdays. In the end this information helps in deciding the frequencies of the jobs and how to use the system's resources optimally, knowing that a typical day may not carry the requisite load.

  • What are the data availability SLAs provided by the source systems of the batch load?

The source systems' data availability helps in assessing whether the proposed SLAs can be met. Essentially, the SLAs are close to realistic only if the data availability of the source systems is close to what is being considered as the SLA requirement for the CDI.

  • What is the expected load (peak, off-peak, average)?

This is more of a response from the IT team, who would have set internal benchmarks on the peak, off-peak, and average load a particular system should handle. The historical data of the existing source or integration systems is a good reference, and the answer to this question helps in designing optimal batches and frequencies and in proposing the relevant hardware for the implementation.

  • What is the hardware and software configuration of the environment where the batch cycle could be run?

A strange question, isn't it!? Yes indeed, but trust me, the response to this question sets the pulse of what is expected from the CDI's hardware and software – whether it is to be procured based on fresh inputs, or whether the solution has to be co-hosted on existing systems with pre-defined or prescribed hardware and software.

The following question helps in digging deeper into the standards that need to be adhered to for new solutions.

  • Are the resources available exclusive or shared?

The last and final question in evaluating the expected performance behavior is to find out whether the new solution has to be hosted on existing systems or entirely on new ones.

This has been a long post, and I am thinking I should pause for now. In the next post I shall talk about the NFR – Scalability.

Please feel free to comment and let me know your thoughts.

Data type and conversions December 7, 2012

Posted by msrviking in DBA Rant, Performance tuning, T-SQL.
Tags: , , , ,
add a comment

Today I downloaded the SQL Server Data Type Conversion Chart from this location. I remember this chart very well; it is also available in the BOL that would have been installed on your system, and on MSDN online.

As soon as I saw the chart, it reminded me of the days when I used to explain its importance to the development team and always pushed the developers and leads to read up on this topic in BOL. Today I tell every team that uses SQL Server to download it and keep it around for as long as they are on the development side of databases.

One of the points that I leave with the teams is

“Make sure you don’t declare variables in your code whose data types do not match the data types designed in the tables. A mismatch means implicit conversions by the query engine, and when that happens it can lead to performance problems and deadlocks.”
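
As a hedged illustration of that point (the table, index, and variable names here are hypothetical, not taken from any review I did): a variable declared as NVARCHAR compared against a VARCHAR column makes SQL Server implicitly convert the column, which typically turns an index seek into a scan.

-- Hypothetical table: OrderRef is VARCHAR and indexed
CREATE TABLE dbo.Orders
(
    OrderId  INT IDENTITY(1,1) PRIMARY KEY,
    OrderRef VARCHAR(20) NOT NULL
);
CREATE INDEX IX_Orders_OrderRef ON dbo.Orders (OrderRef);

-- Mismatched: NVARCHAR variable vs. VARCHAR column.
-- Data type precedence forces a CONVERT_IMPLICIT on OrderRef,
-- so the index on the column cannot be used for an efficient seek.
DECLARE @Ref NVARCHAR(20) = N'PO-1001';
SELECT OrderId FROM dbo.Orders WHERE OrderRef = @Ref;

-- Matched: declare the variable with the column's data type instead.
DECLARE @RefOk VARCHAR(20) = 'PO-1001';
SELECT OrderId FROM dbo.Orders WHERE OrderRef = @RefOk;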

There have been several situations where performance reviews and fixes ended with just a change of variable data types in the T-SQL code. I don't think I am going to stop sharing this point with developer forums anytime, anywhere. There are experienced folks who make this mistake even today, and one can't get rid of basic mistakes unless avoiding them is part of practice. Practice has to be diligent, and it has to be cultivated.

So if you see anyone writing a piece of code that could badly hit your performance benchmarks, please don't hesitate to give this piece of advice again.

Let me know what you think.

Cheers and Enjoy!