Wednesday, April 10, 2019

Who is Stealing Your CPU Cycles?

Recently we noticed one of our virtual servers on Linode performing more poorly than we would expect. On further investigation, we found an abnormally high steal (st) CPU time percentage, ranging as high as 40%. Some examples from the top command are below.

top - 11:46:24 up 4 days, 23:07, 1 user, load average: 2.70, 1.51, 0.98
Tasks: 114 total, 3 running, 66 sleeping, 0 stopped, 0 zombie
%Cpu(s): 7.9 us, 5.0 sy, 0.0 ni, 36.6 id, 17.8 wa, 0.0 hi, 0.6 si, 32.3 st
KiB Mem : 4039472 total, 117044 free, 1708724 used, 2213704 buff/cache
KiB Swap: 524284 total, 523760 free, 524 used. 2074000 avail Mem

top - 11:46:36 up 4 days, 23:07, 1 user, load average: 2.36, 1.47, 0.97
Tasks: 114 total, 1 running, 67 sleeping, 0 stopped, 0 zombie
%Cpu(s): 8.5 us, 3.2 sy, 0.0 ni, 33.4 id, 19.6 wa, 0.0 hi, 0.4 si, 34.9 st
KiB Mem : 4039472 total, 115956 free, 1708536 used, 2214980 buff/cache
KiB Swap: 524284 total, 523760 free, 524 used. 2074200 avail Mem

top - 11:46:39 up 4 days, 23:07, 1 user, load average: 2.49, 1.52, 0.99
Tasks: 114 total, 1 running, 67 sleeping, 0 stopped, 0 zombie
%Cpu(s): 8.1 us, 4.4 sy, 0.0 ni, 25.9 id, 25.1 wa, 0.0 hi, 0.7 si, 35.8 st
KiB Mem : 4039472 total, 115944 free, 1708304 used, 2215224 buff/cache
KiB Swap: 524284 total, 523760 free, 524 used. 2074428 avail Mem

top - 11:46:49 up 4 days, 23:08, 1 user, load average: 2.75, 1.60, 1.02
Tasks: 114 total, 2 running, 67 sleeping, 0 stopped, 0 zombie
%Cpu(s): 7.2 us, 5.2 sy, 0.0 ni, 26.5 id, 20.5 wa, 0.0 hi, 0.6 si, 39.9 st
KiB Mem : 4039472 total, 115008 free, 1708524 used, 2215940 buff/cache
KiB Swap: 524284 total, 523760 free, 524 used. 2074200 avail Mem


We logged a ticket with Linode support. They migrated this virtual server to another physical server, and server performance was back to expected levels after that.

So, what is the steal time CPU metric?
Steal time is the percentage of time the virtual CPU of your virtual server spends waiting for the real CPU of the physical server while the hypervisor is busy serving somebody else. The hypervisor doesn't divide the CPU between virtual servers as strictly as it divides memory or some other resources. So, possibly another virtual server is consuming more CPU cycles than its share, and your virtual server is not getting enough of its own.
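To watch this metric without staring at top, steal time can be read directly from /proc/stat. Below is a minimal sketch (assuming Linux, where the eighth value after the `cpu` label is steal time in clock ticks); the function names are made up for illustration:

```python
import time

def read_cpu_times():
    # First line of /proc/stat is the aggregate "cpu" line; drop the label.
    with open("/proc/stat") as f:
        return [int(x) for x in f.readline().split()[1:]]

def steal_percent(before, after):
    # Steal is the 8th field (index 7): percentage of the interval's total.
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    return 100.0 * deltas[7] / total if total else 0.0

def sample_steal(interval=1.0):
    # Sample twice, like top does between refreshes.
    before = read_cpu_times()
    time.sleep(interval)
    return steal_percent(before, read_cpu_times())
```

A sustained value anywhere near the 30-40% seen above is a strong sign of a noisy neighbour on the host.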

We checked whether any other virtual server had the same issue, and we found one more. Linode support migrated that server to another physical server too. The following graph shows the server's performance before and after migration to the new physical server.


Performance improvement after CPU steal is fixed

Friday, March 22, 2019

MySQL and MySQL Slave Monitoring Zabbix Templates

Most available MySQL monitoring templates use one user parameter per item. To collect values for 10 items, they will run the mysql command 10 times (or run the same command 10 times and extract the desired information from each result). These monitoring templates instead fetch the complete result as JSON content in a single item, and then use dependent items to create the individual items on the Zabbix server.

Github links:



Installation

  1. The Zabbix client (MySQL slave server) host must have jq installed. If not, please install it with:
     sudo apt install jq
  2. Add the following lines to the Zabbix client configuration. MySQL credentials are stored in /etc/zabbix/.my.cnf in this setup. Make sure the zabbix user has read access to the MySQL password file.

    • MySQL Template
      UserParameter=Mysql.Server-Status, mysql --defaults-file=/etc/zabbix/.my.cnf --defaults-group-suffix=_monitoring -N -e "show global status" | jq -c '. | split("\n")[:-1] | map (split("\t") | {(.[0]) : .[1]} ) | add ' -R -s
    • MySQL Slave Template
      UserParameter=Mysql.Slave-Status, mysql --defaults-file=/etc/zabbix/.my.cnf --defaults-group-suffix=_monitoring -e "show slave status \G" | sed -e "s/^\s*//g" | sed -e "s/:\s*/:/g" | jq -c '. | split("\n")[1:-1] | map (split(":") | {(.[0]) : .[1]} ) | add ' -R -s

  3. Import the templates into Zabbix.
  4. Apply the "Mysql Slave" template to any MySQL slave host.
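For context, the jq pipeline in the server-status UserParameter simply folds the tab-separated output of `show global status` into one flat JSON object, which the dependent items then pick apart. A rough Python equivalent of that transformation (illustration only, not part of the template):

```python
import json

def status_to_json(raw):
    # Each line of `mysql -N -e "show global status"` output is
    # "<variable>\t<value>"; fold all lines into one JSON object.
    pairs = (line.split("\t", 1) for line in raw.strip().split("\n") if line)
    return json.dumps({name: value for name, value in pairs})
```

For example, `status_to_json("Uptime\t12345\nThreads_running\t2")` yields `{"Uptime": "12345", "Threads_running": "2"}` — one master item value that Zabbix dependent items can extract fields from with JSONPath preprocessing.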

Template Contents

Mysql Template Items
Mysql Template Triggers
Mysql Template Graph
Mysql Slave Template Items
Mysql Slave Template Triggers
Mysql Slave Template Graph


Monday, February 13, 2017

Generic ZuulFallbackProvider for Spring Cloud Netflix Zuul Proxy



Spring Cloud Netflix provides an easy way of routing using the Zuul proxy. It has a circuit-breaker feature for cases of heavy load. Though the underlying Hystrix provides a fallback, until recently Zuul didn't provide the same. Zuul implemented this functionality in Camden SR2. Documentation for using the Zuul fallback is available here.

If you have multiple Zuul routes in your application, it is cumbersome to write multiple fallback providers. The GenericZuulFallbackProvider.java below can be used to provide a generic fallback in your application. You then provide a bean for each route that returns this generic fallback after setting the route (and other fields such as status code and response body). Examples of such beans are available below in ZuulProxyApplication.java.

There is a pending enhancement to provide a default Zuul fallback. The code below might still be useful for cases where you need to override only some fields.

Wednesday, April 6, 2016

Migrating Zabbix 2.4.6 on SQLite to Zabbix 3.0 on MySQL

Earlier, we had set up Zabbix 2.4.6 with SQLite just to play around with it. But soon we started using it for monitoring various servers and applications. We were obviously aware that SQLite would not scale, and we occasionally noticed database lock errors in the UI. Finally, we decided to set up an instance with MySQL for production-grade scalability. By then, we had grown attached to the data we had already accumulated over a couple of months. So, we migrated from SQLite to MySQL and at the same time upgraded Zabbix to 3.0. Here is how we did it.

Steps:
  1. Export the SQLite DB: This was fairly simple using the sqlite .dump command. It produced an SQL script with DDL and DML statements.
  2. Massage the SQL script: The script created above was still an SQLite script and would not work with MySQL. We needed to do the following:
    • Used this sed script to convert SQLite syntax to MySQL. This worked for everything except changing AUTOINCREMENT to AUTO_INCREMENT, as the SQLite script didn't have CREATE TABLE and AUTOINCREMENT on the same line.
    • Used sed to change 'bigint' to 'bigint unsigned', as that is what Zabbix uses.
      sed -e "s/bigint/bigint unsigned/g"  zabbix.mysql.sql > zabbix_final.sql
    • Divided the SQL script into four parts, as it was a huge script: the first with all ‘INSERT INTO `history`’ statements, the second with all ‘INSERT INTO `history_uint`’ statements, the third with all indices, and the fourth with the rest of the statements. I used a simple grep command.
  3. Execute the SQL scripts: We executed all scripts with autocommit turned off and committed after each script.
    • First we executed the fourth script, as it had all the DDL statements for the tables.
    • Then the scripts with inserts into the history and history_uint tables were executed simultaneously.
    • Finally, all indices were created. A few index creations failed for me with the error "Specified key was too long; max key length is 767 bytes".
  4. Install and upgrade: Install Zabbix 3.0 as documented here, except for the part about executing the MySQL script. Start the Zabbix server; this will upgrade the DB too. Check for errors in the Zabbix server logs. If there is any SQL-related error, fix it manually and then restart the Zabbix server.
  5. Upgrade / configure agents: Upgrade the agents. This is not mandatory, but good to do. If the Zabbix server IP has changed, corresponding changes need to be made in the agent configuration, and the agents need to be restarted.
  6. Finally, a Duhhhhhhhh: The blobs in the images table were not usable after migration, so all rows from this table were deleted. All insert statements into the images table were grepped from the Zabbix 3.0 MySQL script and executed on this database. Not a big deal unless one has added custom images. :)
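The four-way split in step 2 was done with plain grep, but the bucketing logic can be sketched as follows; `split_dump` is a hypothetical helper written only to make the four buckets explicit:

```python
def split_dump(lines):
    # Partition dump lines into the four parts described above:
    # history inserts, history_uint inserts, index statements, and the rest.
    buckets = {"history": [], "history_uint": [], "index": [], "rest": []}
    for line in lines:
        if line.startswith("INSERT INTO `history_uint`"):
            buckets["history_uint"].append(line)
        elif line.startswith("INSERT INTO `history`"):
            buckets["history"].append(line)
        elif line.lstrip().upper().startswith("CREATE INDEX"):
            buckets["index"].append(line)
        else:
            buckets["rest"].append(line)  # DDL and everything else
    return buckets
```

Executing the "rest" bucket first, then the two insert buckets in parallel, and the indices last matches the order used in step 3: bulk inserts are much faster into tables that don't yet have secondary indices.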



Monday, April 4, 2016

Simple JsonPath in Python


The following Python method is a simplified JSONPath implementation: it takes a JSON object and a JSON path as input and returns the JSON object at that path as output.

For a JSON array, use ".[element_num]". For example, from
    {"emp" : ["Indra", "Narada", "Yama"] }
to get "Narada", the JSON path should be "emp.[1]". Remember, counting starts at zero.

For objects with names that may vary, you may use "{object_number}". For example, from
   {"machines": { "1a2d": { "os" : "Linux", "ram" : "16GB", "uptime" : "20045"}}}
to get the uptime, the JSON path can be "machines.{0}.uptime". But please ensure the sequence of elements in the JSON is always fixed.
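The method itself isn't reproduced above, so here is a minimal sketch implementing exactly the path syntax described; the function name `json_path` is made up for illustration:

```python
def json_path(obj, path):
    # Walk a dot-separated path: "[n]" picks an array index,
    # "{n}" picks the nth key of an object, anything else is a plain key.
    for part in path.split("."):
        if part.startswith("[") and part.endswith("]"):
            obj = obj[int(part[1:-1])]        # array element by index
        elif part.startswith("{") and part.endswith("}"):
            key = list(obj)[int(part[1:-1])]  # nth key (order-dependent!)
            obj = obj[key]
        else:
            obj = obj[part]                   # ordinary object key
    return obj
```

For example, `json_path({"emp": ["Indra", "Narada", "Yama"]}, "emp.[1]")` returns "Narada", and `json_path({"machines": {"1a2d": {"uptime": "20045"}}}, "machines.{0}.uptime")` returns "20045". Note that the "{n}" form relies on key order, which is why the element sequence must be fixed.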