Microsoft SQL Server Hardware Evaluation and Sizing – Best Practices Guide (Part 2)

October 2nd, 2014 / by admin

Continuing on from the FIRST post in this series, this installment focuses on arguably the most important aspects of SQL Server hardware sizing – CPU, RAM and storage selection. Let’s start with processor selection.

CPU

The central processing unit is the heart of every system, and that is particularly evident with database workloads, as CPU selection drives the selection of other components, e.g. how many CPU sockets the server can have, physical memory capacity, I/O capacity, upgrade path etc. SQL Server can work with both Intel and AMD processors; however, there are some distinct differences between the two options. Intel has been dominating the single-threaded performance race for a long time now, whereas most modern AMD CPUs have a high physical core count, which sometimes makes them more suitable than their Intel counterparts for certain types of deployments, e.g. SQL Server Business Intelligence Edition (licensed per server plus CALs rather than per core). Starting with SQL Server 2012, Microsoft introduced the SQL Server Core Factor Table, which allows for up to a 25 percent discount when licensing a server with AMD CPUs; however, AMD CPUs are still difficult to recommend given their lower overall single-threaded performance and increased licensing cost. As a result, I will only be discussing Intel CPU selection for SQL Server deployments.

Based on the current range of CPUs from Intel (Q1, 2015), the following processor models constitute my recommended selection for both OLTP and DW/DSS workloads, grouped by socket count.

SQL_Hardware_Eval_2_sockets_OLTP

SQL_Hardware_Eval_2_sockets_DW

SQL_Hardware_Eval_4_sockets_OLTP

SQL_Hardware_Eval_4_sockets_DW

SQL_Hardware_Eval_8_sockets_OLTP

SQL_Hardware_Eval_8_sockets_DW

As new and improved processors are introduced, the above list will gradually become less applicable, so make sure you check the Intel website for new or upcoming releases. Also, rather than taking a blind shot at a CPU model, it is always good practice to analyse the CPU’s performance using benchmarking tools, or at least to look at the scores obtained by systems utilizing those chips, if available online. The two most commonly used benchmarks are TPC-E (OLTP focused) and TPC-H (decision support focused).

TPC Benchmark™ E (TPC-E) is an On-Line Transaction Processing (OLTP) workload developed by the TPC. The TPC-E benchmark uses a database to model a brokerage firm with customers who generate transactions related to trades, account inquiries, and market research. The brokerage firm in turn interacts with financial markets to execute orders on behalf of the customers and updates relevant account information. The benchmark is “scalable,” meaning that the number of customers defined for the brokerage firm can be varied to represent the workloads of different-size businesses. The benchmark defines the required mix of transactions the benchmark must maintain. The TPC-E metric is given in transactions per second (tps). It specifically refers to the number of Trade-Result transactions the server can sustain over a period of time. Although the underlying business model of TPC-E is a brokerage firm, the database schema, data population, transactions, and implementation rules have been designed to be broadly representative of modern OLTP systems.

Current (18 March, 2015) top TPC-E performers are as per the screenshot below.

SQL_Hardware_Eval_TPCE_Top_Perfomers

The TPC Benchmark™H (TPC-H) is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions. The performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@Size), and reflects multiple aspects of the capability of the system to process queries. These aspects include the selected database size against which the queries are executed, the query processing power when queries are submitted by a single stream, and the query throughput when queries are submitted by multiple concurrent users. The TPC-H Price/Performance metric is expressed as $/QphH@Size.

Current (18 March, 2015) top TPC-H performers are as per the screenshot below.

SQL_Hardware_Eval_TPCH_Top_Perfomers

Storage

Properly configured storage, another key component of deploying a robust SQL Server infrastructure, can have a drastic impact on how your database performs. There is a multitude of factors to consider, e.g. budget, server usage, your existing hardware, performance expectations etc., but the first one to consider should be the expected workload type. As discussed in PART 1 of this series, SQL Server can perform a few different types of workloads, e.g. OLTP, DW/DSS, OLAP or a mixture of those. These workloads have different I/O access patterns and the storage choices should reflect them. Here are the most common storage access characteristics for each workload type (a query sketch after the list shows how to check the read/write mix of an existing instance).

  • OLTP workloads – frequent writes to data files and log files, as well as frequent random reads from data files if the database does not fit into memory.
  • DW/DSS workloads – large sequential reads from data files.
  • OLAP workloads – frequent, random reads from cubes.
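To get a feel for which of these patterns dominates on an existing instance, the cumulative per-file I/O statistics that SQL Server keeps can be inspected. Below is a minimal T-SQL sketch against sys.dm_io_virtual_file_stats; the counters are cumulative since the last instance restart, so treat the result as a long-term approximation of the read/write mix rather than a point-in-time measurement.

```sql
-- Approximate read vs write volume per database, cumulative since the last restart
SELECT
    DB_NAME(vfs.database_id) AS database_name,
    SUM(vfs.num_of_bytes_read) / 1048576    AS mb_read,
    SUM(vfs.num_of_bytes_written) / 1048576 AS mb_written,
    CAST(100.0 * SUM(vfs.num_of_bytes_read)
         / NULLIF(SUM(vfs.num_of_bytes_read) + SUM(vfs.num_of_bytes_written), 0)
         AS DECIMAL(5, 2)) AS read_pct
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
GROUP BY vfs.database_id
ORDER BY SUM(vfs.num_of_bytes_read) + SUM(vfs.num_of_bytes_written) DESC;
```

A read-heavy profile on the user databases hints at a DW/DSS-style pattern, whereas a substantial write share (and heavy log file activity) points towards OLTP.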

SQL Server can typically be deployed onto five different types of storage which, as was the case with the workload types, will affect performance levels. These are:

  • Internal drives – your traditional magnetic drives or more modern solid state drives (SSDs)
  • PCI-E storage cards
  • Storage Area Network (SAN)
  • Direct-attached storage
  • Server message block (SMB) file shares (Windows Server 2012 and higher)

Nevertheless, storage type is not the only culprit behind poor storage-related performance. Issues can also stem from the RAID level the disks have been provisioned with, the number of disks, disk speed (e.g. 10K vs 15K drives), the bandwidth of the RAID controller and many more factors, so it is important to look not only at the storage medium type but also at the extra features it carries and how it has been provisioned/configured. The following ’10 commandments’ typically apply to a holistic storage configuration for SQL Server.

SQL_Hardware_Eval_Storage_Config_Recom

Even though the above applies to storage configuration in general, regardless of whether the workload targets OLTP or DW/DSS environments, data warehouse systems rely heavily on storage access speed as the copious volumes of data cannot be cached in memory. Selecting the fastest storage medium and the configuration with the best sequential read speed, e.g. PCI-E cards in a RAID1 array, will provide a significant performance boost. If the cost of PCI-E card(s) is considered prohibitive, solid state drives can deliver much better performance than magnetic disk drives on a budget, while also offering simplified management, lower operational costs and predictable maintenance. The following advantages typically characterize SSD storage and motivate its choice over magnetic disk drives.

  • Simplified management – Solid state storage does not require fragmentation management. The SQL Server startup option -E should still be used, but no further optimization or management of page allocation is required. This simplification makes long-term management of FTDW environments significantly easier. In addition, larger disk groups and lower volume/LUN counts can be used with no negative performance implications. This change simplifies filegroup creation and maintenance.
  • I/O resiliency – Solid state storage has minimal performance degradation under high concurrency or page fragmentation. In addition, mixed random read (seek) workload does not negatively impact large request (scan) I/O patterns.
  • Predictable maintenance – Many solid-state storage options provide software based write-life monitoring with lower frequency of difficult-to-predict physical failures.
  • Lower operational cost – While more expensive at list price, solid-state storage offers a more efficient balance of I/O throughput to capacity per unit. Effective FTDW workload I/O rates for a 300 GB 10k SAS HDD average 50 MB/s. Enterprise MLC SSD delivers between 150 and 200 MB/s at 600 GB capacity. Additionally, solid-state storage draws significantly less power, generates less heat, and often supports higher density solutions.

In addition to the storage configuration guidelines outlined above, the following adjustments can be employed to maximise DW/DSS performance when solid state drives are used for primary data storage.

  • If mirroring is required, RAID1+0 or RAID5 can be used. RAID5 provides the best capacity at no performance penalty for FTDW workloads on solid state.
  • LUN and volume count can be reduced to as low as one PRI volume per storage unit. It is helpful in some cases to have the PRI volume count be a multiple of the CPU core count. The minimum PRI volume count is two.
  • The transaction log can be placed on solid state as well, but FTDW workloads are not typically log-bound. Cost can be reduced by placing the log on a traditional HDD (see the sketch after this list). The same is true of local storage for the Windows Server and SQL Server installation.
  • Recommendations for page fragmentation management and cluster index parallel loading can be ignored, because logical database fragmentation does not impact solid state I/O performance.
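As a rough illustration of the last two points, the sketch below relocates a database’s transaction log from the solid state volume to a cheaper HDD volume. The database name, logical file name and paths are hypothetical placeholders; the database has to be taken offline and the physical file moved before it is brought back online.

```sql
-- Hypothetical example: move the log of database SalesDW from the SSD volume (S:) to an HDD volume (L:)
USE master;

-- Point the logical log file at its new location (takes effect when the database comes back online)
ALTER DATABASE SalesDW
    MODIFY FILE (NAME = SalesDW_log, FILENAME = N'L:\SQLLogs\SalesDW_log.ldf');

ALTER DATABASE SalesDW SET OFFLINE WITH ROLLBACK IMMEDIATE;
-- ... copy S:\SQLLogs\SalesDW_log.ldf to L:\SQLLogs\SalesDW_log.ldf at the OS level ...
ALTER DATABASE SalesDW SET ONLINE;
```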

In order to evaluate existing storage performance, or to approximate the requirements for a storage system that would resolve the most common storage-related issues, i.e. low IOPS, low throughput and high latency, there are a number of tools that can facilitate this analysis. The most common ones are SQLIO from Microsoft and CrystalDiskMark, both free. The latter can be downloaded as a portable edition (no installation required) from HERE and the rest is straightforward as the interface is simple and intuitive. CrystalDiskMark generates sequential and random, read and write loads against your storage system in 512KB and 4KB chunks. This gets you quick answers, but not necessarily SQL Server-related ones – after all, SQL Server doesn’t read or write in 512KB or 4KB chunks. Nevertheless, this tool is a quick and straightforward way to get a glimpse into your storage throughput capabilities, as per the images below (samples from my Samsung M.2 SATA/600 SSD and a Patriot Rage XT USB 3.0 flash drive for comparison).

SQL_Hardware_Eval_CrystalDiskMark_Sample_Perf_Results

SQLIO, on the other hand, is a more comprehensive tool which, even though it has nothing to do with SQL Server specifically (despite what the name may imply), supports a large number of options and provides a more detailed view of the storage subsystem. Brent Ozar’s blog and THIS POST have a good overview of getting started with SQLIO, so I will not go into details here. Even though the tool has no GUI (most of the testing is done from the command line), don’t let that scare you off – with a little bit of notepad magic and Google the implementation is very straightforward. Below is the output of my laptop’s storage (single Samsung M.2 SSD in a Lenovo T440s) performance test using SQLIO, with the core performance measures highlighted.

SQL_Hardware_Eval_SQLIO_Sample_Output
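To complement these synthetic benchmarks with SQL Server’s own view of the storage it actually uses, the per-file I/O stall counters can be turned into average latency figures. A minimal sketch (cumulative since the last instance restart, so a freshly restarted instance will not have much to report):

```sql
-- Average read/write latency (ms) per database file, as observed by SQL Server
SELECT
    DB_NAME(vfs.database_id) AS database_name,
    mf.physical_name,
    vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
    vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
    ON mf.database_id = vfs.database_id
   AND mf.file_id = vfs.file_id
ORDER BY avg_read_latency_ms DESC;
```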

Memory

Upgrading the server’s memory is probably the cheapest way to boost SQL Server performance; e.g. at the time of writing this post, Amazon had 128GB of DDR3 memory available for around 1,500 dollars, which, compared to the price of a SQL Server licence or the cost of changing application code, seems like a rounding error. For DW/DSS systems, Microsoft’s Fast Track Data Warehouse Reference Guide (SQL Server 2012) recommends a minimum of 64GB for a single-socket server, growing linearly with the socket count available in the system, e.g. two-socket server – 128GB, four-socket server – 256GB etc. For the maximum memory recommendation, doubling the minimum is a good estimate, as per the table below.

SQL_Hardware_Eval_MS_Memory_Recom

The following considerations are also important to bear in mind when evaluating system memory requirements:

  • Query from cache – Workloads that service a large percentage of queries from cache may see an overall benefit from increased RAM allocations as the workload grows.
  • Hash joins and sorts – Queries that rely on large-scale hash joins or perform large-scale sorting operations will benefit from large amounts of physical memory. With smaller memory, these operations spill to disk and heavily utilise tempdb, which introduces a random I/O pattern across the data drives on the server (see the query sketch after this list).
  • Loads – Bulk inserts can also introduce sorting operations that utilise tempdb if they cannot be processed in available memory.
  • xVelocity memory-optimized columnstore index – Workloads that heavily favor columnstore index query plans run more efficiently with memory pools at the higher end of the ranges listed in the table above.
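The hash join and sort point above is worth keeping an eye on in production: sessions queuing for workspace memory are a strong hint that either the RAM allocation or the query plans need attention. A minimal sketch using sys.dm_exec_query_memory_grants (a NULL grant_time means the request is still waiting for memory):

```sql
-- Sessions requesting workspace memory for sorts/hash joins; NULL grant_time = still waiting
SELECT
    mg.session_id,
    mg.requested_memory_kb,
    mg.granted_memory_kb,
    mg.grant_time,
    mg.query_cost,
    t.text AS query_text
FROM sys.dm_exec_query_memory_grants AS mg
CROSS APPLY sys.dm_exec_sql_text(mg.sql_handle) AS t
ORDER BY mg.requested_memory_kb DESC;
```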

Naturally, these recommendations need to be reconciled against the SQL Server edition you plan on running, as different editions have different limits on the amount of memory the database engine or other SQL Server components can use (see PART 1 for specifics), e.g. SQL Server 2014 Standard Edition allows up to 128GB for the database engine and an additional 64GB each for SSAS and SSRS.

When it comes to OLTP workloads, the general rule of thumb is that as long as the database (including indexes) fits into memory, performance should be adequate; however, when sizing memory for a new server, more is always better. If using Standard or BI Edition (SQL Server 2014), allocating as much memory as the licence permits, i.e. 128GB for the database engine, is good practice. When running SSAS and/or SSRS on the same server, Microsoft allows an additional 64GB for Standard Edition and as much as the OS can address for the BI equivalent. Things get a little tricky when dealing with Enterprise Edition where, in theory, you can allocate as much as 4TB of RAM to a single instance if running on Windows Server 2012 or higher. If unsure what the adequate size should be, buy as much as possible; otherwise, running a number of tests and evaluating system performance in view of memory usage should shed some light on the actual requirements and on the point at which memory comes under strain.
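Whatever amount of physical RAM you end up with, it is common practice to cap what the database engine can take via the ‘max server memory’ setting so that the OS (and any other services on the box) keep some headroom. A minimal sketch; the 112GB figure is just an illustrative value for a 128GB server, not a universal recommendation:

```sql
-- Cap the database engine at 112 GB (114688 MB), leaving headroom for the OS and other services
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'max server memory (MB)', 114688;
RECONFIGURE;
```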

One of the best places to start conducting capacity planning by measuring performance statistics is Performance Monitor, otherwise known as Perfmon. Perfmon allows measuring various performance statistics on a regular interval and saving the results into a file for further analysis. The database administrator picks the time interval, the file format, and which statistics are monitored. SQL Server provides numerous counters for monitoring and evaluating the performance of your system. Although there are no hard and fast rules for performance monitoring, you should audit your system periodically to ensure that it lives up to user expectations. The first step in any performance monitoring effort should be collecting baseline information. A baseline is a collection of performance counter values taken while the system isn’t heavily utilised and performs well. Next, you should take benchmarks at various times of day and during different days of the week to figure out acceptable ranges for the performance counters you intend to monitor. For example, your tests could reveal that buffer cache hit ratio values of 93.5% and higher are acceptable, but any value below 93.5% indicates sub-optimal performance. Then set up continuous monitoring and define alerts which notify you if any performance counter value falls below (or rises above) a predefined threshold. Although we try to provide some guidelines for performance counter values, the adequate range for each counter will vary from one environment to the next; this is why it’s important to create a baseline and determine acceptable values for each counter. You should watch several memory-related counters very closely as they present a clear indication of acceptable or sub-optimal performance.

A large majority of these memory performance counters are exposed through the Memory Manager, Plan Cache and Buffer Manager objects, with their associated counters listed in the table below.

SQL_Hardware_Eval_Memory_Counters
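Most of these counters can also be read from inside SQL Server via the sys.dm_os_performance_counters DMV rather than Perfmon. A minimal sketch pulling two of the most commonly watched Buffer Manager values (the LIKE is used because the object_name prefix differs on named instances; note that ‘Buffer cache hit ratio’ has to be divided by its ‘base’ counter to yield a percentage):

```sql
-- Page life expectancy and buffer cache hit ratio, read directly from the engine
SELECT
    RTRIM(object_name)  AS object_name,
    RTRIM(counter_name) AS counter_name,
    cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Manager%'
  AND counter_name IN (N'Page life expectancy',
                       N'Buffer cache hit ratio',
                       N'Buffer cache hit ratio base');
```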

Quest Software also released a handy poster which provides a description of key Windows/SQL Server Perfmon counters along with their recommended target values (you can download it from HERE).

Another factor to consider is the effect of different memory configurations on memory bandwidth. As you add more DIMMs to a system to populate more of the available memory slots, memory bandwidth can decrease, which can hurt memory access performance somewhat. This effect varies based on the processor and its integrated memory controller. The latest Intel processors are much less susceptible to this problem. Regardless, I still favor using large amounts of RAM, because even ‘slower’ RAM is an order of magnitude faster than your storage subsystem. If you are concerned about this, consult the documentation for your server for recommendations on how to populate the memory slots.

Conclusion

This concludes the second post in this series. The information outlined in these two installments provides a rudimentary ‘check list’ of the core areas that typically require investigation when making a decision on provisioning/procuring new hardware for a SQL Server installation; usually a much more comprehensive approach needs to be taken when evaluating future needs. When faced with an infrastructure upgrade, an experienced DBA will carefully establish performance baselines and determine the production capacity needed by an organisation/application to meet current and future demands. Some tools which may assist in capacity planning have already been listed above; they typically provide a single point of focus on a particular pressure point when dealing with a specific realm of hardware upgrade/update. Sometimes it may be beneficial to use more comprehensive tooling which gives the administrator a more holistic view of the system’s performance, e.g. SQL Sentry Performance Advisor or Database Performance Analyzer. A full-featured trial version of either of these applications can usually be downloaded, installed and evaluated for a period of time, and having the ability to view, troubleshoot and report on the performance of the system’s core components can be a great asset and speed up the analysis phase. Below is a screenshot of the SQL Sentry Performance Advisor activity dashboard for one of the SQL Server instances (click on the image to expand).

SQL_Hardware_Eval_SQLSentry_Dashboard

Regardless of which tool you use to ascertain your performance needs, every SQL Server hardware selection decision, particularly when deploying a new server, should be preceded by a detailed analysis of all the core components making up the platform, not only from the hardware point of view but also factoring in the licensing model and the projected workload type. There is a lot to take into consideration, but a little bit of elbow grease, backed up by some investigative work and consideration of best practice guidelines, should yield a great-performing system with no unexpected or costly surprises.


Microsoft SQL Server Hardware Evaluation and Sizing – Best Practices Guide (Part 1)

October 2nd, 2014 / by admin

Note: Part two of this series, including CPU selection and an overview of benchmarking tools, can be found HERE.

Introduction

As my last client engagement required me to provide a suitable server hardware specification for an upcoming data warehouse deployment, I thought it would be a good idea to revise SQL Server hardware sizing best practices and the considerations that should be given to ensure data warehouse performance is not compromised by inadequate component selection. The cardinal mistake most professionals make during the hardware assessment stage is that no distinction is made between the projected database workload types, i.e. OLTP vs DW/DSS, and no consideration is given to the licensing aspects, which in the case of SQL Server can easily surpass the infrastructure cost. Also, often very little analysis is conducted to examine existing or projected storage, CPU and memory requirements. If SQL Server can run on a tablet (think Microsoft Surface), why would anyone in their right mind procure a hundred-thousand-dollar server? A database is a database, just like any other application, right? Not quite!

To make matters worse, the sprawl of cloud-based services with their elastic hardware-sizing capabilities has exacerbated this complacency further, and even though the concept of dynamic hardware sizing is great in principle, there is still a huge number of businesses that can only host their data locally and are not ready for IaaS deployments. As a result, correct hardware sizing is paramount, as future expansion can become very cumbersome if the system is under-provisioned. Similarly, if over-provisioned, the hardware could be under-utilized due to licensing restrictions, making the whole endeavor a rather expensive mistake.

There are a number of techniques and best practice guidelines that can help with appropriate hardware sizing, so let’s look at some of the key factors (budget issues aside) that should be analysed in order to make the most accurate decision.

The Importance of Proper Hardware Selection and Sizing

The choice of hardware will have a huge (adverse or favorable) effect on overall SQL Server cost and performance. Total processor core count (with the exception of SQL Server BI Edition) drives licensing cost, making a poor selection a potentially very expensive mistake. Unfortunately, most decisions on which CPU to procure are taken haphazardly and with little consideration for the minor details which can influence the overall performance a great deal. Classic examples of this technical shortsightedness are businesses unknowingly purchasing servers with high core-count CPUs or higher socket-count motherboards, falsely believing that a higher core/socket count is always the cheapest way to maximize SQL Server capacity, or opting for older, antiquated hardware hoping to save a little bit of money. Similarly, sometimes too much emphasis is put on CPU selection, with little consideration for I/O subsystem performance, e.g. required IOPS and sequential throughput, RAID level for performance and redundancy, architectural choices (DAS, SAN etc.), required amount of storage space etc., in view of the magnitude of the database workload.

Short Version

When sizing hardware for any database taking advantage of SQL Server technologies, there are two approaches one should consider, depending on the type of workload the database is designed to perform. A grossly simplified distinction concerning performance between OLTP and data warehouse databases is as follows.

SQL_Hardware_Eval_OLTPvsDW_Summary

Based on the above requirements, the following rudimentary recommendations can be made when sizing hardware for the fastest OLTP database.

SQL_Hardware_Eval_OLTP_Short_Version

Likewise, the following can be more applicable when building a DW/DSS workload-oriented system.

SQL_Hardware_Eval_DW_Short_Version

To put some concrete, albeit very approximate, numbers around these recommendations, the following table provides a rough guideline on core hardware component sizing when building or upgrading SQL Server infrastructure.

SQL_Hardware_Eval_DW_OLTP_General_Specs

If the above still sounds too complex, different vendors offer hardware sizing tools which can provide ballpark estimates based on a number of parameters collected when filling out a survey on the current or projected SQL Server environment. For what it’s worth, I found those hardware recommendations on the excessive side, but as the saying goes – “Nobody ever got fired for buying IBM equipment” – and that can now be extended to Dell and HP as well.

Dell provides their web-based SQL Server adviser under the following link: http://goo.gl/u6X3s3

HP also provides their desktop sizing tool for SQL Server which can be downloaded here: http://goo.gl/jmQos4

TL;DR Version

SQL Server sizing can be a difficult task, so let’s analyse some of the core factors that have the biggest impact on the subsequent performance level (this post is spread across two parts, the second one accessible from HERE).

I have divided the most important aspects influencing hardware selection into five categories, which I will describe in this and the following post: workload type, SQL Server edition/licensing, CPU selection, storage selection and memory selection.

Workload Type

One of the first factors to take into account when making a decision on appropriate hardware sizing is the SQL Server workload type. Typical workload types that SQL Server is usually deployed with are OLTP (online transaction processing), DW/DSS (data warehouse/decision support), OLAP (online analytical processing) and a mixture of those, with some backup and maintenance activity thrown in, e.g. an OLTP database that also serves as a reporting source or a DW/DSS database with frequent updates.

The following overview provides a good summary of the contrasts between the two most common workload types.

SQL_Hardware_Eval_Workload_Type_Diffs

Based on those characteristics, it is easy to see that the two types of workloads require a slightly different approach to hardware selection. Below are some of the key factors that should be considered when choosing and sizing hardware based on the workload type and intensity.

SQL_Hardware_Eval_Workload_Type_Hardware_Diffs

Licence Type

SQL Server 2014 comes in a few different editions, each with its own set of licensing requirements that impact the hardware it may be installed on. The main differences between these flavors are shown below.

SQL_Hardware_Eval_Licence_Restrictions

As you can see from the above, the dichotomies between SQL Server licences can make it tricky to decide which edition would best suit your needs and workload. The important thing to remember is that the core difference between the licensing models is the paradigm of core-based vs server-based licensing, which is tied directly to the SQL Server 2014 editions, i.e. Enterprise and Standard editions are core-based (Standard Edition can be licensed as either core-based or server-based) whereas Business Intelligence Edition uses server-based licensing, allowing for a large number of cores to be utilized without increased licensing cost. What makes it even more complicated is the fact that Standard Edition and BI Edition are limited to the lesser of four sockets or sixteen logical cores for the database engine, and to 128GB of RAM, so procuring, for example, a four-socket server with a high core count will mean that not all available resources can be used – a rather expensive mistake (the sketch below shows one way to verify how many cores an installed instance is actually using). Similarly, licensing cost can far outweigh hardware cost, so analyzing hardware requirements and associated cost without looking at the bigger picture can lead to false or incomplete assumptions and bring the cost up by a considerable margin.
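One way to verify, after the fact, whether an edition cap is leaving cores on the table is to compare the CPUs the OS presents with the schedulers the engine has actually brought online. A minimal sketch; on, say, a four-socket, 40-core box running Standard Edition you would typically see a number of schedulers reported as VISIBLE OFFLINE:

```sql
-- Logical CPUs visible to the operating system
SELECT cpu_count AS logical_cpus_visible_to_os
FROM sys.dm_os_sys_info;

-- Schedulers by status; VISIBLE OFFLINE indicates cores the edition/licence is not using
SELECT status, COUNT(*) AS scheduler_count
FROM sys.dm_os_schedulers
GROUP BY status;
```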

Bearing that in mind, the licensing model most appropriate to the given workload and budget requires careful consideration in view of the benefits and disadvantages it can pose, so analyzing a detailed comparison of each SQL Server edition, not only from the hardware perspective, is a must. A very detailed outline of most of the key features of each edition can be found HERE.

This concludes the first part of this series; the second installment, describing CPU, memory and storage considerations, is available HERE.
