08 May 2014

Investigation Guideline

Logstash+ElasticSearch+Kibana (LEK) is a popular and versatile log collection and search platform. In this investigation, I try to address the problems below.

  • Design production-ready solution
    • HA solution for each component
    • Clustering, sharding and scaling
    • Security, authentication & authorization (AA), encryption
    • Performance benchmarking
    • Small scale POC deployment
  • Find integration possibilities with OpenStack
    • Can Swift be used to store logs or indexes?
    • Can Keystone be used for AA?
    • Can Ceilometer be used for log collecting and storing?
    • Meniscus: OpenStack Logging as a Service (LaaS)?
  • Care for business integration needs
    • Per-tenant AA: view and query logs according to tenant permissions
    • Multi-tenant support
      • Service * Server Group (Server Type) * Per Server * Log Type
    • Must we install an agent on each production server?

Architecture

The LEK components are lightweight and can be arranged fairly freely. Various materials describe ways to do it.

My designed LEK architecture picture

Some materials to start learning LEK

Shipper Comparison

Logstash provides many shippers to choose from. Commonly suggested shippers and a comparison are listed below.

You are not limited to Logstash-related shippers. Logstash also accepts input from syslog/rsyslog/collectd/log4j.

  • This means you may not need to install a shipper on each server. Use syslog/log4j …
  • For the supported input types, see
    [http://logstash.net/docs/1.4.1/]

High Availability

Basics.

What if the Logstash shipper crashes?

  • Use Monit to watch it and restart it.
  • Most shippers remember the last position in the log file and resume from it on restart. See the Shipper Comparison section above.
    [https://groups.google.com/forum/#!topic/logstash-users/Kqd8Wb5y-V8]
    • But there is an edge case: if the shipper crashes, then the log is rotated, and then the shipper restarts, log entries may be lost.
    • The shipper has an internal queue, and a crash may lose its contents. Depending on how the shipper handles queue loss, log entries may be lost or sent twice.
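
The resume-on-restart behaviour and its rotation edge case can be illustrated with a small Python sketch. This is not how any particular shipper is implemented; the function names, the JSON position file, and the inode check are all assumptions made for illustration, and it assumes a POSIX filesystem where a rotated file keeps its inode.

```python
import json
import os


def load_state(state_path):
    """Load the saved position, or a fresh one if none was saved yet."""
    if os.path.exists(state_path):
        with open(state_path) as f:
            return json.load(f)
    return {"inode": None, "offset": 0}


def save_state(state_path, state):
    """Write the position atomically so a crash cannot leave a half-written file."""
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, state_path)


def read_new_lines(log_path, state):
    """Return (complete new lines, updated state) without saving anything."""
    stat = os.stat(log_path)
    offset = state["offset"]
    # A different inode, or a file shorter than the saved offset, means the
    # log was rotated or truncated. The old offset no longer applies, so
    # start from the beginning of the current file. Anything appended to the
    # OLD file while the shipper was down is not read here: that is the
    # rotation edge case.
    if state["inode"] != stat.st_ino or stat.st_size < offset:
        offset = 0
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only complete lines; a partial last line waits for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, {"inode": stat.st_ino, "offset": offset + end}
```

The order of operations in the caller decides the failure mode. Calling `save_state` only after the lines have been delivered means a crash in between re-sends them on restart (send-twice); saving before delivery means the same crash loses them.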

What if the Logstash shipper, or one of the Logstash instances, gets stuck forever?

Monitoring Logstash itself?

What if the Logstash receiver, if we use one, crashes?

  • Use Monit to watch it and restart it.
  • The shipper can use load balancing and failover to send to another Logstash receiver. See the Shipper Comparison section above.
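
As a rough illustration of what shipper-side load balancing and failover means, here is a minimal Python sketch. It is not taken from any real shipper; `tcp_send`, `send_with_failover`, and the receiver addresses are hypothetical names, and the transport is plain TCP only for simplicity.

```python
import random
import socket


def tcp_send(address, payload, timeout=2.0):
    """Send one payload to a (host, port) address over TCP."""
    with socket.create_connection(address, timeout=timeout) as conn:
        conn.sendall(payload)


def send_with_failover(payload, receivers, send=tcp_send, shuffle=True):
    """Try each receiver in turn until one accepts the payload.

    Returns the address that accepted it, or raises ConnectionError if
    every receiver failed.
    """
    candidates = list(receivers)
    if shuffle:
        # Randomising the order spreads load across healthy receivers.
        random.shuffle(candidates)
    errors = []
    for address in candidates:
        try:
            send(address, payload)
            return address
        except OSError as exc:
            # Connection refused, timeout, etc.: fall through to the next one.
            errors.append((address, exc))
    raise ConnectionError(f"all receivers failed: {errors}")
```

Note that a successful TCP write only shows the receiver's host accepted the bytes, not that the receiver processed them. A shipper that needs stronger guarantees would require an application-level acknowledgement.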

What if Redis, acting as the queue, crashes?

What if the Logstash indexer crashes?

  • Use Monit to watch it and restart it.
  • Log entries are queued in Redis, so they are not lost while the indexer is down.

What if ElasticSearch crashes?

Performance & Scaling

Logstash is written in Java and Ruby. Will there be performance problems?

However, what I have not mentioned above is that many people still use Logstash to handle heavy load in production.

  • For examples, search the Logstash mailing list archive or see the LEK Practices section below.
    [https://groups.google.com/forum/#!forum/logstash-users]
  • Even if Logstash is slow, you can scale it out. Usually the bottleneck is ElasticSearch.
  • My suggestion: Logstash is good, but replace the shipper with, e.g., Lumberjack.

Redis scaling solution.

Scaling ElasticSearch

Performance benchmarks.

Performance tuning tips.

Security

Protecting log transmission

Kibana authentication & authorization

ElasticSearch authentication & authorization

Data encryption.

Others & Tutorials

Maintainability

ElasticSearch monitoring.

Logstash monitoring.

  • See the section “Monitoring Logstash itself?”

Deployment automation.

LEK Practices

See how other people are using the LEK log search platform.

OpenStack Integration

Can Swift store logs or indexes?

Can Ceilometer be used for log collecting and storing?

Figuring out how to integrate with OpenStack is still an open challenge.

Business Integration

Authentication & authorization, where each user can only view logs in his/her own tenant/service group.

  • Use Keystone and add a proxy in front of Kibana

Must we install an agent on each production server?

  • Logstash can receive logs from standard log shippers such as syslog and log4j. In this case we don’t need to add a new shipper to the servers.
  • If we need Lumberjack for performance reasons, then we have to install it on each server.
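
To make the agent-free option concrete, here is a minimal sketch of an application sending its own logs over syslog using only Python's standard library. The logger name, host, and port are placeholders, and it assumes that something on the Logstash side (for example a syslog or UDP input) is listening at that address.

```python
import logging
import logging.handlers


def make_syslog_logger(name, host, port):
    """Return a logger that sends each record to host:port as a syslog datagram."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # SysLogHandler uses UDP by default, so no agent or open connection is
    # needed on the application server. UDP delivery is not guaranteed.
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    # Calling this twice for the same name would attach a second handler.
    logger.addHandler(handler)
    return logger
```

An application would call `make_syslog_logger("my-service", "logstash.example.internal", 5514)` once at start-up (both values hypothetical) and then log as usual with `logger.info(...)`.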

Multi-tenant support.

A unified log format may be needed.

  • My draft log format
    • Identity part
      • service name, group, component, code location
      • server, ip, process
      • level, timestamp
      • session id, (request id)
    • Content part
      • descriptive words
      • stack trace
    • Tenant info is taken from the service name or group. A user can only access a server’s log if he/she has permission.
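
The draft format above could be expressed as a record type along these lines. This is only a sketch: the field names, the JSON encoding, and the `can_view` helper are my own assumptions for illustration, not a settled schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class LogRecord:
    # Identity part
    service: str
    group: str
    component: str
    code_location: str
    server: str
    ip: str
    process: str
    level: str
    timestamp: str
    session_id: str
    request_id: str = ""  # optional, as in the draft
    # Content part
    message: str = ""      # "descriptive words"
    stack_trace: str = ""

    def to_json(self):
        """Serialise to one JSON object per log entry."""
        return json.dumps(asdict(self), sort_keys=True)


def can_view(record, allowed_tenants):
    """Tenant is taken to be the service name or the group.

    allowed_tenants is a set of tenant names the user may read
    (hypothetical; in practice it might come from Keystone).
    """
    return record.service in allowed_tenants or record.group in allowed_tenants
```

With a fixed set of identity fields like this, a proxy in front of Kibana could filter queries by `service` or `group` without parsing the message body.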

