
Monitoring SQL Server for FREE with Telegraf, Influxdb & Grafana. The wonderful thing about TIGgers.

... Is Tiggers are wonderful things.


So, I love monitoring. I openly admit to having a thing about data and graphs that show me at a glance how badly my day is going, is about to go, or why it was painful before.


We've all been there, you get a ticket, message, call, hulk slam from the boss as something isn't working any more. You check, only to find that everything looks OK and there is no immediate issue. But now you have to perform a root cause analysis of the non-issue and need to see what the state of play was at the reported time of 4am earlier that morning... righto, better fire up the DeLorean.


If you can't see performance metrics as they were, how are you supposed to know what happened? What will help?


The SQL Errorlog?

The System Health XEvent session?

The (one) Ring buffer?

.. Henry, the mild mannered Janitor?


These all retain data, don't they? Yes, they certainly do, but only for specific cases, and not a general overview of what we may need to look at.


Phooey! There are a lot of different subsystems, it could be anywhere.


Monitoring systems to the rescue, they're great. But unfortunately they also cost, and for those with large estates, that cost soon adds up. Some shops only licence the more critical servers, and this is a valid approach.


If only it also excluded these 'Tier 2' servers from having issues....

... that would be great.

... but it doesn't.

... darn.


We need visibility of what's going on across all servers. Curses to that restrictive budget.

What are our free options? Either build something ourselves, or go opensource.


"I haven't got time to design, build and maintain my own solution, and opensource??! Are you mad you orange stripey bouncing fool?"


The lack of time I can completely understand, but a rejection of opensource? Times have changed, opensource has a place in modern solutions, and SQL may not always be the best fit for our requirements. <Traitor>


So you want an open source monitoring solution that can monitor many different subsystems, including SQL Server. In step: Telegraf, InfluxDB and Grafana. Or TIG for short.


Telegraf

This is the collection agent on the source servers, in our case, the SQL servers. It polls the instance(s) locally on the server and then, depending on the configuration, sends the metrics to the data repository we choose...


InfluxDB

... which in this case, is InfluxDB. This is an opensource time series database designed to work with data structured by time, perfect for the collections that we have.


Grafana

We have all of our majestic data now, but who can make immediate sense of pages and pages of scrolling data vertically traversing the screen? Keanu Reeves and his Mr. Anderson alter ego perhaps, but for those of us who 'aren't the one', I'd like a visualisation please. That's where Grafana comes in, as it's perfect for showing us our timeseries data. It also self-refreshes, making our dashboards near realtime should we desire to have Telegraf polling as such.


Near realtime monitoring and dashboards, for free? And before you ask, no, I haven't been on Pooh's special prescription honey either. It's true, we can have such things.


There are quite a few articles out there on how to spin this up on Linux, but a lot of us still work with predominantly Windows estates, so I'm going to talk through setting it up purely on Windows, and provide config templates and dashboards you could use.


 

Installation


InfluxDB

Let's start with InfluxDB.


From here, choose the latest version, and Windows Binaries



This will give you the PowerShell command to grab the files,


wget https://dl.influxdata.com/influxdb/releases/influxdb2-2.7.1-windows-amd64.zip -UseBasicParsing -OutFile influxdb2-2.7.1-windows-amd64.zip

and then unpack them to the path required.

Expand-Archive .\influxdb2-2.7.1-windows-amd64.zip -DestinationPath 'C:\Program Files\InfluxData\influxdb\'

Let's have a look... there's another folder, \influxdb2_windows_amd64. And underneath there is an influxd.exe file, which is the main InfluxDB binary. I'm just keeping these paths as default; you can change them to whatever is best for yourselves.




Run influxd.exe from a command prompt. 

influxd.exe --bolt-path "C:\Program Files\InfluxData\influxdb\influxdb2_windows_amd64\.influxdbv2\influxd.bolt" --engine-path "C:\Program Files\InfluxData\influxdb\influxdb2_windows_amd64\.influxdbv2\engine"

Confirm that InfluxDB starts, then Ctrl-C to stop it.


Note: the bolt and engine paths move the internals away from the user profile paths that InfluxDB uses by default, which we don't really want here.


InfluxDB Service Creation

So we now have a file that, when running, starts our time series database engine. But that really isn't suitable by itself; as this is Windows, we want this running as a service, and nothing else will suffice. Alas, Influx doesn't supply a Windows installer for it (still!), so we'll have to wrap our own.


In steps NSSM - The Non-Sucking Service Manager

...No really, that's what it's called. I haven't made it up...honest...go look!


Download nssm 2.24 (zip) NSSM - the Non-Sucking Service Manager

Unzip nssm.exe from within there and put it in a new folder called nssm (I put it in the same folder as influxd above - so C:\Program Files\InfluxData\influxdb\influxdb2_windows_amd64\nssm)

 

Create a batchfile nssmInstall.bat containing the following text: 


nssm install InfluxDB 

Save this in the nssm\win64 folder.


Run nssmInstall.bat and fill in the tabs on the window that is displayed. Par exemple:





Application Tab

Path: C:\Program Files\InfluxData\influxdb\influxdb2_windows_amd64\influxd.exe

Startup directory: C:\Program Files\InfluxData\influxdb\influxdb2_windows_amd64\

Arguments: --bolt-path "C:\Program Files\InfluxData\influxdb\influxdb2_windows_amd64\.influxdbv2\influxd.bolt" --engine-path "C:\Program Files\InfluxData\influxdb\influxdb2_windows_amd64\.influxdbv2\engine"


Details Tab

Display name: InfluxDB Service 

Description: InfluxDB Engine Service control.

Startup type: Automatic

You can leave the other tabs at their default settings, or tailor further to your environments.


Click "Install Service".


And as easy as poohsticks, we now have a Windows service that we can use for controlling our InfluxDB service. Lovely.



Now, we'd better configure it to our needs.


Open a browser to http://localhost:8086. If running remotely, just replace localhost with the server name where your Influx service now happily lives. This takes you to the onboarding page. Ensure that you get these details correct.



Click "Get Started". Then we're prompted for the initial details, treat this user like an 'sa' equivalient if you will.


You need to supply an Organization Name, which allows a basic grouping, and an initial 'Bucket', which is like a database. Both are used in our Telegraf config also.

Beware: The case sensitive police are on duty here!



Continue, and now SAVE THE OPERATOR API TOKEN and store somewhere secure. I don't mind showing mine below, as this install will be wiped following the creation of this blog. :-)



We're feeling bold and brave, let's click Advanced.


Influx config is now actually complete, and returning to http://{servername}:8086 will bring you to the same page as the Advanced button. Providing the service is started of course, it's clever, but not that clever.


This presents your main page.

Load Data Panel



Buckets tab

Exactly what it says, the place where all our buckets reside, and there's our telegraf bucket we created earlier. By clicking on Settings you can see and set the retention period for the data. It defaults to 'forever', so set as you need. As this is timeseries data, that's it, Influx handles the deletion of data behind the scenes, with nothing else to do, no fiddling with delete scripts, FKs, partitioning, nothing. Happy Days!


Telegraf tab

Here, you can create a specific config file for your Telegraf requirements. Did I mention that Telegraf and Influx are designed to work together? It's starting to show. Feel free to have a play around, and choose the source(s) you would like, ie... SQL Server. I'll discuss the SQL Server telegraf configuration below, and attach a copy that can be used so will leave it up to you.



But pause for a second, and have a look at the possible inputs in this screen. We're using it to monitor SQL Server, and I promise that we will do exactly that... but what are all of these other sources?


That is everything else that Telegraf can be used to monitor...

...that's a big list.

...that's a big, magnificent, cover-many-systems-monitoring-requirements-with-one-tool list.

...oh, and it's free. Did I mention that?


API Tokens tab.

Think of these like your service accounts that have access to your buckets and data. They are just a long string, like the administrator's ('sa' equivalent) token we created above during initial configuration.


Generate a new Token or 2.


Again, the token itself is only displayed once! Record it or lose the ability to make use of it.


One for Telegraf, that can WRITE into the telegraf bucket.

And another for Grafana, that can READ from the telegraf bucket. (You could use one token for both, but I like to separate them based on the principle of least privilege.)



You'll end up with something similar to this:







Data Explorer Panel

The other side tab I like to make use of is the Data Explorer, and it is exactly what it sounds like. You can view and see what data is where, perfect for discovery when creating our visualisations. Have a play later, after you've got set up and looked at the provided dashboards.

So, we'd better get a move on and get some data eh?


Telegraf

So we have our data repository; now we need some monitoring data to send to it, and this is where Telegraf comes in. Simply put, this provides the collection and transmission of lightweight server/application metrics. Boom, done.


Download the Telegraf agent 


wget https://dl.influxdata.com/telegraf/releases/telegraf-1.25.0_windows_amd64.zip -UseBasicParsing -OutFile telegraf-1.25.0_windows_amd64.zip

  

Expand the archive, this reveals Telegraf.conf and Telegraf.exe. 

You can rename the Telegraf.conf file to something more descriptive, ie.. sqlserver_telegraf.conf 


At this point, you may want to either use a generated telegraf.conf file from the Influx administration pages above, or just use the one I've attached. The standard config file in the downloaded archive contains lots of sources, so it's potentially scary to look at the first time, and you don't need that much for this.


Edit the telegraf.conf file  (or whatever you've renamed it to)


Find the [[outputs.influxdb_v2]] section:


Edit this so it references your InfluxDB repository, something like http://myinfluxservername:8086


Copy the telegraf API Token from what you created and saved earlier (...you did save it off didn't you? Yes, of course you did...you're awesome.)


Copy this Token value into the [[outputs.influxdb_v2]] section under:

token = ""  - replacing the $INFLUX_TOKEN placeholder with your saved token value.


So it will look something similar to my version below. Remember to uncomment the line for all that you amend!
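A rough sketch of the shape that section takes (the server URL, token, organisation and bucket below are placeholders - swap in your own values):

[[outputs.influxdb_v2]]
  ## Where the InfluxDB repository lives
  urls = ["http://myinfluxservername:8086"]
  ## The Telegraf API token created earlier (WRITE access to the telegraf bucket)
  token = "YOUR_TELEGRAF_WRITE_TOKEN"
  ## Organisation and bucket exactly as entered on the onboarding page (case sensitive!)
  organization = "MyOrg"
  bucket = "telegraf"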




Now find the [[inputs.sqlserver]] section


Add a connection string for the instances on the server that you wish to monitor: ie... just using localhost below as Telegraf will run on the same server as SQL:
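A hedged sketch of the shape this takes (the login, password and port here are placeholders - swap in your own; the login matches the low-privileged telegraf login created further down in the SQL Access Requirements section):

[[inputs.sqlserver]]
  ## One connection string per instance to monitor (placeholder credentials shown)
  servers = [
    "Server=localhost;Port=1433;User Id=telegraf;Password=YourStrongPasswordHere;app name=telegraf;log=1;",
  ]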



Note the comma (,) after the connection string. This is deliberate and allows multiple connection strings to be separated in the same config file, ie... for monitoring multiple instances on the same server. The above example also specifies a port number; you can use this, or the SQL server\instance name.


NOTE: When using the instance name, remember to escape the backslash character. ie... Server=Servername\\instance. Otherwise you'll get a horrible scary error message.


The above uses a SQL login, but Windows Integrated authentication can also be used to provide more security by just setting the service account for the Telegraf service. This would use a connection string like the one below. This is a much more secure method and I'd thoroughly recommend it, as use of SQL logins requires exposing the password in the config file, regardless of how low privileged an account it is. (see below)

Server=localhost;Port=1433;Database=master;Integrated Security=SSPI;log=1;
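Dropped into the config, that might look something like the below (just the same string sitting in the servers list):

[[inputs.sqlserver]]
  ## Integrated authentication - no password in the file, the Telegraf service account is used
  servers = [
    "Server=localhost;Port=1433;Database=master;Integrated Security=SSPI;log=1;",
  ]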

Some other areas that you may wish to take note of in the configuration file.


Database Type

SQLServer, AzureSQLDB, AzureSQLManagedInstance or AzureSQLPool.


Yes, that's right, Telegraf can pull metrics from any of the above types. This blog post is just covering SQLServer, but stop and have a think about exactly how powerful this tool can be in the great Observability Wars...


But what is it pulling? - You guessed it, you can configure that as well. Also in the configuration file, you'll find details of what each Database Type will pull by default as shown below.
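A sketch for a plain SQL Server instance (the query names in the commented-out line are illustrative assumptions - use the names listed in your config file's comments):

  ## Which flavour of SQL we are collecting from
  database_type = "SQLServer"

  ## Anything you don't want collected goes in exclude_query, using the query names
  ## listed in the config file comments, e.g.:
  # exclude_query = ["SQLServerSchedulers", "SQLServerRequests"]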


Exclude as required in the exclude_query line above. It really is that simple. I like simple.


Polling

Near the top of the config file, you will find the [agent] section.

Interval - how often you are polling - I've used 60 seconds below.

Round_interval - rounds the collection times to the interval, so collections land on tidy boundaries (e.g. exactly on the minute).
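A sketch of that section with a 60 second poll:

[agent]
  ## How often Telegraf collects from its inputs
  interval = "60s"
  ## Round collections to the interval boundary (e.g. always on the minute)
  round_interval = true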




Jitter

A little further down, you will see the Collection_jitter line. Use this to introduce a random delay between connections. This is useful to smooth out the agent activity to the repository if polling many clients.
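For example, still inside the [agent] section (5s is just an illustrative value):

  ## Each collection sleeps a random amount up to this value before running,
  ## spreading the load when lots of agents report to the same repository
  collection_jitter = "5s"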



You'll have noticed by now that the config file is very self-explanatory, and makes most things easy to understand without me writing about it here.


Final Install steps

Now we need to just add the files to the server:


On the client SQL server to be monitored. 

  • Create a folder

ie: C:\Program Files\Telegraf 

  • Copy the telegraf.exe and sqlserver_telegraf.conf file to our new folder.

  • Open up PS/Cmd Admin window 

  • Navigate to the new folder, C:\Program Files\Telegraf, and run:

telegraf.exe --service install --config "C:\Program Files\Telegraf\sqlserver_telegraf.conf" 

A full listing of commands is available by running telegraf --help


Now we have a Windows service for our metrics collection.


SQL Access Requirements

Our monitoring service needs some access in order to be able to, err...monitor. These requirements are very lightweight or I wouldn't like this solution so much. Pretty much any monitoring system worth its salt requires this and often more.


Run the below on the SQL Server instance(s) to be monitored. Again, I'm using a SQL login that matches my config file above, as I don't have an Active Directory domain at home, but I'd suggest that you use a domain account.


USE master;
GO
-- Low-privileged login used only by the Telegraf collector (swap for a domain account if you can)
CREATE LOGIN [telegraf] WITH PASSWORD ='T1gg3rSLOVEtoBounce!';
GO
-- DMV access for the performance counter, wait stats and IO collections
GRANT VIEW SERVER STATE TO [telegraf];
GO
-- Instance-wide metadata visibility (server and database properties)
GRANT VIEW ANY DEFINITION TO [telegraf];
GO

As promised, VERY minimal permissions, which should be justifiable to any strict Security manager you may have.


Now start your Telegraf service you installed above.


BOOM, done. We now have a Windows service that is controlling our Telegraf metrics collection, and it's throwing our metrics at our timeseries database repository.


Hoohoo-hoo-hoooo!


Grafana


Grab the latest version Windows installer from the below link:


Microsoft Edge may get uppity around the file being unsafe. Just to confirm, it IS safe and a valid toolset. *cough* use Chrome.. *cough*


Run the installer, leave options as default and it's just a Next, Next, Finish job, and this will also install as a service.


Confirm the “Grafana” Service is Running. 

Test Access, open a browser to port 3000.




Enter admin/admin as the initial username and password. Grafana will prompt you to change the password.






 





Visualizations


So now we have Telegraf pulling our Metrics, storing them in InfluxDB, and Grafana connected to this as a Datasource.


Now let's make pictures...pretty, pretty pictures man.... Oops, sorry, I may have jumped shows to the Magic Roundabout then...my bad.


Sign in to Grafana (http://mygrafanaserver:3000) if you haven't done so already.



Click the options bars next to Home; the two areas we are interested in are Dashboards and Data Sources.


First off, we need to create the datasource that connects Grafana to our InfluxDB repository, so click on Data Sources.






Data Sources

There are two types of query language that can be used with InfluxDB: InfluxQL, or Flux. InfluxQL is less advanced than Flux, but relates closely to native SQL, so this is what we will set up as the data source to the Telegraf bucket. (And the fact that all the dashboards below are based on InfluxQL too.)


Select Add New connection, then choose InfluxDB.



  The options to choose are below.

Query Language : InfluxQL - Set this as the default connection.


Custom HTTP Headers


Click + Add header

Header: Authorization Value: Token {Grafana Token from InfluxDB}

(The word Token, followed by a space, then the Grafana API Token value)


Again, this is one of the Token API values you created and made a note of earlier. We're using the Grafana token as it only has read access to the bucket.



Database: telegraf


It should look something like the below. Click Save and Test to ensure all is happy and well.


We now have:

Telegraf connected to our Influx...

Influx connected to our Grafana...

...grafana connected to our thighbone..

...doin' the skeleton dance!


Oops...sorry about that, where was I? ah yes...


Dashboards


This is all about getting our performance metrics into some easily readable formats, ie... graphs and charts. So we'll add a couple of dashboards into Grafana that will display our telegraf data.


Grafana has a magnificent community of users, and a section purely for Dashboards to be created. Dashboards | Grafana Labs


My detailed dashboard below was based originally on Grafana Dashboard ID 5040, authored by Red Morris (https://grafana.com/orgs/redmorris). It may have changed a lot since, but the dashboards on there give a wonderful starting point to see how things fit together.


Click on Dashboards


Click on New (you can create folders from here if you desire); we're going to click Import as we are going to bring in an already created dashboard.

At this point, you can just enter the ID of the Grafana dashboard you'd like to use, or copy/paste the JSON of the dashboard into the window.


I'll add the JSON in from my dashboard (see attachments at the end of this article).


Click Load












Now enter the name you would like for the Dashboard, the folder you'd like to import it to, and the Datasource name that you want to connect it to. Luckily we created this earlier.


Click Import






The Lions, the Tiggers, the bears, oh my!

What's that given us now?

Clicking on the SQL Server Detailed Dash, under instance - you'll see the name of a collected SQL Server - in this case, simply LAPTOP2. Any other servers you have Telegraf installed on and configured to send metrics to the InfluxDB repository will appear in this drop down list. Cool!


Note the time slicer at the top right. Last 30mins here.


Now let's expand some of the panel headers.


Nice!


Now you're talking!

Wait Stats! Lovely Lovely Wait Stats too!

And look, all of this is time sliced.. Try clicking and dragging a section. You'll find it auto zooms in for you REALLY quickly.


Now go to the time slicer at the top right. All of these time values and more are there to be selected. So if you needed to see what went on overnight? ...you can.

Last week? ...you can.

Yesterday, at 3:57pm? You can.


You just can. The DeLorean has left the building tomorrow.




We also have size/capacity data.





IO...

Latency...


Plus a load of other metrics along with it. Have a play though, you'll find so much data. Now I know the above information is far too much for a single dashboard, it's giving metric overload. BUT, all of these panels can just be copied into new dashboards, so you can chop up what is interesting to you.


So let's have an overarching Estate Health dashboard as well.


Ah... that's better, showing us the main points from 10,000ft.



Only one SQL instance shown here, my trusty old laptop. But, as before... as you add Telegraf onto more servers, configured to point at this InfluxDB data repository, they will appear automatically in the dashboards above.


Links: Click on the underlined server name in the blue box, and it will direct you to a relevant dashboard for granular details. Performance and Health link to the SQL Server Details dashboard for the relevant server.


(NOTE: These are set to localhost:3000; amend the links in the Overrides for the fields to a URL valid for your environment.)


Automatic Updating.

So you would like the dashboards to automatically refresh, so you always see the most recent data? No problemo. Just set the auto refresh schedule drop down at the top right of the dashboard.


Done. Next!



(Obviously there is no point having an auto refresh value that is less than your telegraf polling interval, but you knew that already. )


Capacity

It's always there, lurking with intent to cause us headaches.

"How much space do you need over the next X months, years"...

"Were trying to organise budgets and need to know how much you may grow"... etc


There is also a SQL Server Capacity Estate visual.


You will need to set relevant drive letters in the visualisation properties; mine are just set to C: as an example. This works out the delta between the start and end periods of the timeslice you are looking at. In this case, the last 24 hours.


It shows, on LAPTOP2, the data change is -300MB (so I've used 300MB of space in the last day). If the use of this was to continue at this rate, it would run out of space in 912 'periods'. A period here is whatever the timeslice is set to, so I'd be out of space in 912 days. Basic, yes, but better than nothing. Particularly when the time slicer is set to a week/a month, so you can get an idea of what disk your SQL estate is gobbling up.


Easy street.


And all of this won't cost you a penny, cent, dollar, pound, Euro ... nothing.


Custom Monitoring.


Oh, but you have custom SQL statements that you need to run for your monitoring? I haven't discussed it here, but check out the SQL (not SQLServer) input plugin. You can send SQL commands via Telegraf.
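A very rough, unverified sketch of the idea (the driver name, DSN format and option names here are assumptions from memory - check the inputs.sql plugin README for the exact settings before using it):

[[inputs.sql]]
  ## Assumed driver name and DSN format - verify against the plugin documentation
  driver = "sqlserver"
  dsn = "sqlserver://telegraf:YourStrongPasswordHere@localhost:1433?database=master"

  [[inputs.sql.query]]
    ## Your custom statement - each row returned becomes a point in InfluxDB
    query = "SELECT COUNT(*) AS user_sessions FROM sys.dm_exec_sessions WHERE is_user_process = 1"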


oooooh yeah........



SQL Data Repositories.

But I already have some metrics stored in SQL tables, with timestamps!


Have no fear, just create a Data source and start visualising that as well. You'll need to include some Grafana functions so it's timeseries-happy, and have those datetime columns indexed correctly (clustered works best here), but it's all very workable.


Check out:


and



Finally...

Opensource isn't what it used to be. Telegraf is incredibly powerful, and Grafana is perfect for displaying it all. A monitoring system framework in place in a matter of an hour? Combine this with requests monitoring (Telegraf does also collect requests), or even a simple whoisactive logging to a table.


Or... the subtitle of this blog post should have been: "what else could I reuse my monitoring budget for?"


Hope this has given some ideas of what you can do with monitoring tech. Remember, this is just SQL discussed above, but Telegraf has all of those other inputs available. Check out my ramblings on trying to get towards Observability while you're on.


Cheers

Christopher Roddin.


 

Attachments:


sqlserver_telegraf.conf.txt (TXT, 10KB)

Capacity Estate.json (JSON, 26KB)

Detailed.json (JSON, 269KB)

Estate Health.json (JSON, 75KB)




