Logstash Elasticsearch Kibana Investigation
Investigation Guideline
Logstash+ElasticSearch+Kibana (LEK) consists a popular and versatile log collecting and searching platform. During the investigation, I try to address below problems.
- Design production-ready solution
- HA solution for each component
- Clustering, sharding and scaling
- Security, authentication & authorization (AA), encryption
- Performance benchmarking
- Small scale POC deployment
- Find integration possibilities with Openstack
- Swift be used to store log or index?
- Keystone be used for AA?
- Ceilometer be for log collecting and storing?
- Meniscus: Openstack logging as a Service (LAAS)?
- Care for business integration needs
- Per tenant AA. View log/query by tenant permission
- Multi-tenant support
- Service * Server Group (Server Type) * Per Server * Log Type
- Must we install an agent on each production server?
Architecture
LEK are lightweight and relatively free to architect. Various materials provide ways to do it.
- Basic Architectures
- Shipper -> (Logstash Receiver ->) Redis -> Logstash Indexer -> Elasticsearch (cluster)
- Shipper: run on production server. Read local log, send to remote. Can use Logstash, Lumberjack, beaver and so on.
[http://cookbook.logstash.net/recipes/log-shippers/] - Logstash Receiver: receive log from shipper and put into Redis. Most shipper can send directly to Redis, except Lumberjack.
- Redis: acts as the queue. This role is also called broker. You can also use AMPQ.
- Logstash Indexer: transform log format. Called ‘indexer’ but actually elasticsearch is who does indexing.
- How Logstash transform log - grok - structure decompositable regex
[http://semicomplete.com/presentations/logstash-puppetconf-2012/#/38] - ElasticSearch: indexing and search. Mature for HA and scaling cluster.
- Shipper: run on production server. Read local log, send to remote. Can use Logstash, Lumberjack, beaver and so on.
- Offical’s very basic guide
[http://logstash.net/docs/1.1.0/tutorials/getting-started-centralized]- Logstash supports all kinds of input/filter/output.
- Logstash support input from rsyslog/collectd/log4j. You can use server’s default logging.
[http://logstash.net/docs/1.4.1/]
- Collect & visualize your logs with Logstash, Elasticsearch & Redis
[http://michael.bouvy.net/blog/en/2013/11/19/collect-visualize-your-logs-logstash-elasticsearch-redis-kibana/]- Nice architecture picture
- Suggested to use Lumberjack shipper
- Centralizing Logs with Lumberjack, Logstash, and Elasticsearch
[http://www.vmdoh.com/blog/centralizing-logs-lumberjack-logstash-and-elasticsearch]- Shipping with Lumberjack (logstash-forwarder)
- Another shipper option Rsyslog
- Centralized logging system based on Logstash-forwarder+Logstash+RabbitMQ+ElasticSearch+Kibana
[http://jakege.blogspot.com/2014/04/centralized-logging-system-based-on.html]- Use RabbitMQ rather than Redis for queuing
- Pull from Lumberjack port 5000
- Shipper -> (Logstash Receiver ->) Redis -> Logstash Indexer -> Elasticsearch (cluster)
- A mail thread that has production Logstash setup to refer to
[https://groups.google.com/forum/#!topic/logstash-users/-392l9LHa8Q]- Provided with detail config file
- 2 Redis server in failover mode
- 5 Logstash indexers
- Config exists ERRORs. Read through the thread to know it.
- LogStach HA Cluster Cookbook
[http://www.masteringthecloud.com/2014/01/logstash-cluster-cookbook.html]- HAProxy load balancing and fail-over
- 3 Redis to do load balance
- Issue: tied Redis, Logstash and Elasticsearch on one server.
- Problem: Fail of one Redis server may lose log entries in it.
- My designed architecture
- Use Lumberjack as log shipper. Reason see below shipper comparasion.
- Use Lumberjack’s load balancing and fail-over. So no dedicated load balancer device needed.
- Multiple Logstash to demo scaling of it.
- Redis acts as queue. Use master-slave mode for HA. Multiple Redis master-slave group to distributed load.
- Elasticsearch cluster to store and index log and do searching. Kibana as front-end web server.
- Other concerns
- In above architecture, we lack a persistent log storage location. Elasticsearch cares for searching but not recommended as permanent log storage.
- Hadoop/HDFS is usually recommended for log storing.
[http://blog.mgm-tp.com/2010/04/hadoop-log-management-part2/] - We lack AA in above architecture. May need to add a proxy server on top of Kibana.
- Meniscus - Openstack logging as a service. This is another log searching framework, using Elasticsarch and Kibana, but discarded Logstash.
[http://developer.rackspace.com/blog/project-meniscus-an-update.html]- Can we use/borrow from that?
Some materials to start learning LEK
- How to set up tutorial and some tips.
[http://www.cnblogs.com/buzzlight/p/logstash_elasticsearch_kibana_log.html] - Logstash, ElasticSearch, Kibana Intro
[https://speakerdeck.com/elasticsearch/using-elasticsearch-logstash-and-kibana-to-create-realtime-dashboards] - A comprehensive bibliography for ElasticSearch
[http://blog.csdn.net/gaoyingju/article/details/23750563#1536434-tsina-1-19369-66a1f5d8f89e9ad52626f6f40fdeadaa]
Shipper Comparasion
Logstash provides many shippers to choose from. Commonly suggested shippers and comparasion are listed blow.
- Logstash-Forwarder (Lumberjack) [Recommended]
- [good] Written in go, the fastest shipper, consume little resource
[http://www.vmdoh.com/blog/centralizing-logs-lumberjack-logstash-and-elasticsearch] - [good] Compressed transmitting
[https://github.com/elasticsearch/logstash-forwarder] - [good] OpenSSL to auth and encrypt transmission
[https://github.com/elasticsearch/logstash-forwarder] - [bad] Cannot send directly to Redis. Must use extra logstash to receive its output.
[https://github.com/elasticsearch/logstash-forwarder/issues/18] - [good] Output (push) to logstash, with load balancing and fail-over
[https://github.com/elasticsearch/logstash-forwarder] - [good] After restarted, will resume to last position in log file. Won’t start-over.
[https://groups.google.com/forum/#!topic/logstash-users/Kqd8Wb5y-V8]
[https://github.com/elasticsearch/logstash-forwarder/blob/master/prospector.go]
- [good] Written in go, the fastest shipper, consume little resource
- Logstash Shipper
- [bad] Written in java, may consume too many resource.
[http://www.vmdoh.com/blog/centralizing-logs-lumberjack-logstash-and-elasticsearch] - [good] Can round-robin output to a list of Redis hosts, for load balancing and fail-over.
[http://serverfault.com/questions/459303/scaling-logstash-with-redis-elasticsearch]
[https://groups.google.com/forum/#!searchin/logstash-users/availability/logstash-users/8Km9VFqapig/w9WEaN2K3E8J] - [good] After restarted, will resume to last position in log file. Won’t start-over.
[https://groups.google.com/forum/#!topic/logstash-users/Kqd8Wb5y-V8]
- [bad] Written in java, may consume too many resource.
- Beaver Shipper
- Written in python
- [good] SSH tunneling to secure transmission.
[http://beaver.readthedocs.org/en/latest/user/usage.html#ssh-tunneling-support] - [good] Use push model. Push to Redis directly.
[https://github.com/josegonzalez/beaver/blob/master/docs/user/usage.rst] - [bad] Push model but seems cannot output a list of Redis hosts. So no load balancing and fail-over.
[https://github.com/josegonzalez/beaver/blob/master/docs/user/usage.rst]- Modify the code to add round-robin feature. Or
- Add our own load balancing device to enhance this
- [bad] After restarted, will beaver resume to last position in log file? Won’t it start-over? Seems not implemented.
[https://groups.google.com/forum/#!topic/logstash-users/Kqd8Wb5y-V8]
[https://github.com/josegonzalez/beaver/issues/6]
Not only can you use Logstash related shipper. Logstash accepts input from syslog/rsyslog/collectd/log4j,
* This means you may not need to install a shipper on each server. Use syslog/log4j …
* For what input types are supported
[http://logstash.net/docs/1.4.1/]
High Availability
Basics.
- Logstash pipeline and blocking.
[http://logstash.net/docs/1.4.0/life-of-an-event]- Logstash queue log entry in it, called pipeline.
- If an output is failing, the output thread will wait until this output is healthy again. So won’t lose data.
[https://groups.google.com/forum/#!topic/logstash-users/jwGHb00KfT8] - A full queue in pipeline will cause blocking. Thus finally blocks input end.
- Most shipper remember last position in log file and resume on restart. See above shipper comparasion section.
[https://groups.google.com/forum/#!topic/logstash-users/Kqd8Wb5y-V8]
What if Logstash shipper crashed?
- Can use monitd to watch it and restart.
- Most shipper remember last position in log file and resume on restart. See above shipper comparasion section.
[https://groups.google.com/forum/#!topic/logstash-users/Kqd8Wb5y-V8]- But there is edge condition. If shipper crashed, then log rotated, then shipper restarted. This may lose log entries.
- Shipper has queue inside. Crashing may lose the queue. How shipper handle queue loss may result in log lose or send-twice.
What if Logstash shipper, or one of Logstash, get stucking forever?
- May need to combined with “Monitoring Logstash itself”
- Suggesting check timestamp of last document to know ‘loggign’
[https://groups.google.com/forum/#!topic/logstash-users/jwGHb00KfT8]
Monitoring Logstash itself?
- Few material relates to this issue
- A script for monitoring and use document’s timestamp to know ‘lagging’
[https://groups.google.com/forum/#!topic/logstash-users/Z9WR7CJ0KRw] - Send heatbeat and metric of Logstash via JMX.
[https://groups.google.com/forum/#!topic/logstash-users/nBbQ-jXfjgI]- It’s only prototype and murders performance
What if Logstash receiver, if we used it, crashed?
- Monitd to watch it restart
- Shipper can use load balancing and failover to send to other Logstash receiver. See above shipper comparasion section.
What if the Redis, acting as the queue, crashed?
- Use multiple Redis, event no master-slave replication, can ensure log keep flowing. But log entries in queue may lose.
[http://www.masteringthecloud.com/2014/01/logstash-cluster-cookbook.html] - Use RabbitMQ instead of Reddis, as the queue. But many complains RabbitMQ is slow and awful to use.
[https://groups.google.com/forum/#!topic/logstash-users/aSAlAHmyuT8]
[https://groups.google.com/forum/#!topic/logstash-users/lvuG7UGZwVU]
[https://twitter.com/jordansissel/status/302294195945738240] - Redis HA solutions
- Redis Sentinel [Recommended]
[http://redis.io/topics/sentinel]- Master slave replication. Monitoring and auto failover.
- If master crashed before replicates to slave, may sill lose data
- Redis Cluster
[http://redis.io/topics/cluster-tutorial]- Shipped in Redis 3.0.0, still beta.
- cluster = ( 1 master + n slave (replica) ) * m (hash sharding)
- Master slave replication. Monitoring and auto failover.
- Hash sharding to do load balancing.
- If master crashed before replicates to slave, may sill lose data
- Other earlier and simpler solutions
[http://afei2.sinaapp.com/?p=360]
- Redis Sentinel [Recommended]
What if Logstash indexer crashed?
- Monitd to watch it and restart.
- Log entries are queued in Redis, won’t lose.
What if ElasticSearch crashed?
- ElasticSearch has mature clusting, HA and recovery mechanism.
[http://spinscale.github.io/elasticsearch/2012-03-jugm.html#/8]
[http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_coping_with_failure.html]
Performance & Scaling
Logstash is written in java and ruby, will there be performance problems?
- Many suggest Logstash jvm consume too much server resource. Use Lumberjack as shipper instead.
[http://michael.bouvy.net/blog/en/2013/11/19/collect-visualize-your-logs-logstash-elasticsearch-redis-kibana/]
[http://michael.bouvy.net/blog/en/2013/12/06/use-lumberjack-logstash-forwarder-to-forward-logs-logstash/] - Openstack Logging as a Service - Meniscus, doesn’t use LogStash
[http://developer.rackspace.com/blog/project-meniscus-an-update.html]-
- “Internal testing by our operations team found that it didn’t handle certain loads”
-
- “Second, because the Project Meniscus team has a goal of producing a solution that can handle massive amounts of data (two terabytes per day) and be a project that OpenStack can use, “
-
- Issues that complaining about performance
[https://logstash.jira.com/browse/LOGSTASH-1771] - More Issues that complaining about performance
[https://logstash.jira.com/browse/LOGSTASH-480?jql=text%20~%20%22performance%22]
But, what I have not mentioned above is that there are still a lot that use Logstash to handle greate load in production.
- To see it, search the Logstash mail archive or see below Practice section.
[https://groups.google.com/forum/#!forum/logstash-users] - Even if Logstash was slow, you can scale it. Usually the bottle neck is ElasticSearch.
- My suggestion is, Logstash is good but replace shipper to, e.g. Lumberjack.
Redis scaling solution.
- Redis Cluster. It has been mentioned above.
- Use multiple Redis server for load balancing. You can add a load balancer device before Redis:
[http://www.masteringthecloud.com/2014/01/logstash-cluster-cookbook.html] Or use Logstash shipper’s load balancing feature (see shipper comparasion section):
[http://serverfault.com/questions/459303/scaling-logstash-with-redis-elasticsearch] - You can use scaling with master-slave replication together for HA.
Scaling ElasticSearch
- ElasticSearch is designed distributed and super easy to scale.
- It can evey do auto node discover.
[http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html] -
- “After 17 years in this industry I’ve never seen anything scale horizontally as easy as ElasticSearch”
[http://serverfault.com/questions/459303/scaling-logstash-with-redis-elasticsearch]
- “After 17 years in this industry I’ve never seen anything scale horizontally as easy as ElasticSearch”
Performance benchmarks.
- Logstash 1.1.x transport performance
[https://twitter.com/jordansissel/status/302294195945738240]- Used Logstash 1.1.x, but now it is 1.4.x
- Redis, in batch mode, yields very good results.
- RabbitMQ is slow.
- Lumberjack is really fast.
- Logstash performance testing
[https://gist.github.com/paulczar/4513552] - Elasticsearch+logstash perf exploration
[https://github.com/jordansissel/experiments/tree/master/elasticsearch/perf#readme] - ElasticSearch - How many shards?
[http://blog.trifork.com/2014/01/07/elasticsearch-how-many-shards/]- From graph: it is strange that, with more shards, elastisearch gets slower
- Shards are not free, they add overheads.
Performance tuning tips.
- Use Lumberjack as log shipper, it is really fast, and has a lot of features.
- Change Logstash thread count. On default it may have only one thread.
[https://logstash.jira.com/browse/LOGSTASH-480] - Tips in a LEK setup tutorial
[http://www.cnblogs.com/buzzlight/p/logstash_elasticsearch_kibana_log.html] - Tips in an ElasticSearch material summary
[http://blog.csdn.net/gaoyingju/article/details/23750563#1536434-tsina-1-19369-66a1f5d8f89e9ad52626f6f40fdeadaa] - ElasticSearch and Logstash Tuning
[http://jablonskis.org/2013/elasticsearch-and-logstash-tuning/index.html]
Security
Protect log transimission
- Lumberjack log shipper supports OpenSSL authentication and encrpytion
[https://github.com/elasticsearch/logstash-forwarder] - Beaver log shipper supports ssh tunneling
[http://beaver.readthedocs.org/en/latest/user/usage.html#ssh-tunneling-support]
Kibana authentication & authorization
- Use authenticatioin for Kibana3 setup
[https://groups.google.com/forum/#!topic/logstash-users/XeDfZcVRdsA] - One approach is to authenticate by web server (nginx, apache)
[https://github.com/elasticsearch/kibana/blob/master/sample/nginx.conf] - Another approach is to use code addons
- Add authentication to kibana3 and allow users to view only thier logs.
[https://github.com/christian-marie/kibana3_auth] - Another thread discussing about this
[http://stackoverflow.com/questions/19867663/how-and-where-to-implement-basic-authentication-in-kibana-3]
- Add authentication to kibana3 and allow users to view only thier logs.
- The third approach is to use proxy to hide Kinaba
- Kibana-authentication-proxy
[https://github.com/fangli/kibana-authentication-proxy]
- Kibana-authentication-proxy
ElastiSearch authentication & authorization
- The basic idea is - “After a number of discussions on the ElasticSearch mailing list, I’ve discovered that the current solution is to host ElasticSearch behind another application layer and then to secure that layer.”
[http://stackoverflow.com/questions/4960298/how-to-secure-an-internet-facing-elastic-search-implementation-in-a-shared-hosti]- So, use Kibana or a proxy
- Use a proxy
[http://stackoverflow.com/questions/9956062/authentication-in-elasticsearch] - Another approach, replace embedded http server to jetty, so that jetty can use SSL and authentication.
[http://stackoverflow.com/questions/4960298/how-to-secure-an-internet-facing-elastic-search-implementation-in-a-shared-hosti] - Yet another approach, use 3rd pary plugin. E.g. Elasticsearch-security-plugin
[http://stackoverflow.com/questions/9956062/authentication-in-elasticsearch]
[https://github.com/salyh/elasticsearch-security-plugin] - But the common way, authenticate through Kibana
Data encryption.
- Encrypting ElasticSearch index
- Mail threads discussing about it
[http://elasticsearch-users.115913.n3.nabble.com/Enabling-encrypted-indexes-td4035771.html] - No feature in ElasticSearch. You may use full disk encryption.
- ElasticSearch also transmits data unencrypted over the wire between nodes.
- “Encrypting every piece of data within an ES cluster is way too expensive.”
- Mail threads discussing about it
Others & Tutorials
- A More Secure LogStash Install
[http://sphughes.com/2012/01/01/a-more-secure-logstash-install/]- Focusing on how to securily install them, rather than AA or encryption.
- Securing Elasticsearch / Kibana with nginx
[http://www.ragingcomputer.com/2014/02/securing-elasticsearch-kibana-with-nginx] - Securing Your Elasticsearch Cluster - A Brief Overview of Running Elasticsearch Securely
[https://www.found.no/foundation/elasticsearch-security/]
Maintainability
ElasticSearch monitoring.
- ElasticSearch has a lot of monitoring plugins
[http://www.elasticsearch.org/guide/en/elasticsearch/client/community/current/health.html] - ElasticHQ is one the most recommended
[http://www.elastichq.org]
[http://www.cnblogs.com/buzzlight/p/logstash_elasticsearch_kibana_log.html]- License: Apache 2.0
- features: monitoring and operate, indices management, no install & run in browser
[http://www.elastichq.org/features.html]
Logstash monitoring.
- see section “Monitoring Logstash itself?”
Deployment automation.
- LogStash puppet
[http://cookbook.logstash.net/recipes/puppet-modules/]
LEK Practices
See how other people are using LEK log searching platform.
- Using elasticsearch and logstash to serve billions of searchable events for customers
[http://www.elasticsearch.org/blog/using-elasticsearch-and-logstash-to-serve-billions-of-searchable-events-for-customers/]- Billions, but for mail
- Production env configurations
[https://groups.google.com/forum/#!topic/logstash-users/Yj397MdAD74] - Enterprise logstash/broker/kibana3/ setup
[https://groups.google.com/forum/#!topic/logstash-users/p-rEw6XpucM] - Logstash best practices and configuration for load distribution
[https://groups.google.com/forum/#!topic/logstash-users/hHXJz1upk9E] - Logstash configuration best practices
[http://stackoverflow.com/questions/22257956/logstash-configuration-best-practices] - Our Experience of Creating Large Scale Log Search System Using ElasticSearch
[http://architects.dzone.com/articles/our-experience-creating-large] - Rsyslog + Logstash in failover scenario
[https://groups.google.com/forum/#!topic/logstash-users/P8mTZaFGGqE] - How is the scalability of logstash
[https://groups.google.com/forum/#!topic/logstash-users/k9QxzDl7_6I] - Implementation with High Availability/Load Balancing
[https://groups.google.com/forum/#!topic/logstash-users/aSAlAHmyuT8] - Problems scaling logstash redis and elasticsearch for 3-4 million log events per hour
[https://groups.google.com/forum/#!topic/logstash-users/-392l9LHa8Q]
Openstack Integration
Swift to store log or index?
- If used as queue, Swift seems not appropriate
- If used to store index, ElastiSearch only supports filesystem/memory to store index
[http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-store.html] - If used as permanent log storage, Haddop/HDFS is somehow more popular.
Ceilometer for log collecting and storing?
- Its data is organized for metric and samples, seems not the most appropriate
- Ceilometer use MongoDB. Is it suite for queuing log (Reddis?) or storing log (Hadoop)?
- Can collect from collectd, but what about syslog or log4j? If we use Logstash to ship log, then what becomes better?
- A Blueprint: using ceilometer as log storage + elasticsearch
[https://blueprints.launchpad.net/ceilometer/+spec/elasticsearch-driver]- More info about this BP (mail threads)
[https://www.mail-archive.com/openstack-dev@lists.openstack.org/msg04565.html]
[http://lists.openstack.org/pipermail/openstack-dev/2013-September/015657.html] - Meniscus also mentioned a design goal that: “Provide common sinks for already existing systems such as Ceilometer”
[https://github.com/ProjectMeniscus/meniscus] - In above mail threads Ceilometer event/notification system is also mentioned.
[http://docs.openstack.org/developer/ceilometer/events.html] - However, the BP and mail threads seem inactive nowadays.
- More info about this BP (mail threads)
Figuring out how to integrate Openstack is still a challenge now.
Business Integration
Authentication & Authorization, with each user can only view log in his/her tenant/service group.
- Keystone and add proxy to Kinaba
Must we install an agent for each production server?
- Logstash can use default log shipper such as syslog, log4j. In this case we don’t need to add new shipper to servers.
- If we need to install Lumberjack, for performance needs, then we need to install it on each server.
Multi-tenant support.
- Logstash multi-tenant support discussion on maillist.
[https://groups.google.com/forum/#!topic/logstash-users/qiptMyaMqWs]- Basically, we add tag in log entry to separate them.
- Each server installs a Logstash, where tag is acquired.
- Use puppet for logstash’s conf on each server.
- Meniscus decides to implement multi-tenant support. We can borrow from it.
[https://github.com/ProjectMeniscus/meniscus/wiki/Tenant-Identification-Flow]
[https://github.com/ProjectMeniscus/meniscus/wiki/Tenant-API]
[https://github.com/ProjectMeniscus/meniscus/wiki/Research-on-Elasticsearch-Templates]
There may need a unified log format.
- My draft log format
- Identity part
- service name, group, component, code location,
- server, ip, process
- level, timestamp,
- session id, (request id)
- Content part
- descriptive words
- stack trace
- Tenant info is taken as service name or group. User can only access server’s log if has permission.
- Identity part
Create an Issue or comment below