SQL Server Health Check Checklist
Some clients want a FULL SQL Health Check, while some only want to go deep on analysis for a SINGLE database.
This is a SINGLE database health check.
Even though it’s a single db health analysis, we still check for some server/instance level items. That’s intentional.
After analysis is done and presented to you, we will go over everything on the report until you run out of questions.
All of the checks are non-intrusive, very light weight and will not affect production SQL Server. Data collection can be done during normal business operating hours.
This is about 90% of what will happen. The remaining 10% will be spent on items discovered during analysis, requiring more analysis. So we will drill into that more as we find stuff.
Here is what a database health check will (approximately) check for:
Few SERVER level checks (I know this may not make sense, but these data points will direct where problems may exist):
- isolate top server waits since sql instance start
- I/O latencies for all drives
- write and read stalls at each db file level
- are there any I/O warnings?
- which db consumes most CPU, RAM and I/O (not always obvious)
- how heavy is tempdb usage?
- review server waits for 30-60min (ideally during production workload)
- how well do parallelism settings match workload?
- look into SQL Error log, is there anything unusual?
- is deadlocking happening?
DB level – architectural design overview
- db filegroup and file layout. how is this utilized? are files placed on best matching drives?
- find top used objects: look at the sample data, data types in use, indexes, foreign keys, etc.
- check virtual log files counts (VLFs)
- drive latencies and how well db files are placed across storage drives
- file growth settings
- how often do db files grow? what causes the delay?
- db properties: multiple db options checked
- I/O statistics per each file (number of reads & writes, stalls per file)
- read vs write ratio at db file level (this will show if workload is read or write intensive, which changes how tuning is done)
- how majority of TSQL is coming in? is it adhoc SQL queries, or stored procs?
DB level – Top objects and performance checks
- what are the tables with the most activity?
- review top objects (usually up to 5-10 is sufficient) to see how well data type choices were made, indexing, constraints, triggers, statistics on those tables
- review top 5-10 tables/views and see how well indexing is used (is column order OK, etc.)
- review top 5-10 worst performers
- what are the largest tables?
- fill factor check
- which TSQL runs most frequently?
- which TSQL takes longest to execute?
- which TSQL consumes the most CPU, RAM and I/O (disk)?
- which TSQL is in cache now? anything that costs too much in resources?
- which TSQL execution time elapsed has high variance (if TSQL duration is sometimes fast and sometimes very slow, this is an opportunity for tuning)
- how up to date are statistics
- fragmentation levels (review top objects with more attention)
- are plan guides being used?
- are there single-use plans for one query that are wasting memory?
- are there tables without clustering keys?
- is there implicit data conversion happening?
- are there any foreign keys that are not trusted?
- are forced join hints? in use?
- are table triggers in use? if yes, review how bad.
- are there UDFs in use? if yes, review how bad.
DB indexing – or often the fastest way to add speed to db performance
- find missing indexes
- find useless indexes (that cost more to maintain than provide value)
- are there any disabled indexes?
- are there any indexes that can be merged into one?
- how much data is in data vs indexes?
- how many indexes per each object?
- are any foreign keys not indexed?
- missing index warnings for cached plans
- are any tables over-indexed?
- most frequently modified indexes and statistics
- ix compression opportunities
- how are db backups set up for main db?
- last corruption check date
- is transaction log larger than db data file?
Here are some items that go into a FULL database health check, that will not be done on a SINGLE:
- skip: Win OS settings
- skip: Group Policy checks
- skip: SQL Service Pack and Cumulative Update
- skip: sql service and configuration manager settings
- skip: storage/SAN load testing
- skip: SQL memory dumps
- skip: SQL Instance settings
- skip: Trace flag usage
- skip: memory config
- skip: NUMA config
- skip: how SQLAgent jobs are setup?
- skip: security
- skip: elevated permissions
- skip: excessive sysadmin privilege
- skip: maintenance plans (backups, index/statistics optimization, corruption checks, cleanup)
- skip: clustering setup
- skip: virtualization setup checks
- skip: high availability setup
- skip: hardware config
- skip: drive setup for speed
- skip: buffer pool extension
- skip: suspected/corrupted pages
- skip: CPU schedulers
- skip: log check (EventViewer, cluster log, etc)
- skip: number of connections to SQL Server
- skip: page life expectancy
- skip: db owners
- skip: SQL Agent job owners
Want a sample report? Send an email using our contact us form, and say that you saw this post, and I will send you a sample report.
For fellow DBAs – what did I miss? What can be done better?