amazon ec2 - Optimizing Solr 4 on EC2 Debian instance(s)


My Solr 4 instance is slow, and I don't know why. I'm attempting to modify the configurations of the JVM, Tomcat 6, and Solr 4 in order to optimize performance, with queries per second as the key metric. I'm running on an EC2 small tier with Debian Squeeze, but I'm ready to switch to Ubuntu if needed.

There is nothing special about my use case. The index is small. Queries include a moderate number of unions (e.g. 10), plus faceting, but I don't think that's unusual.

My understanding is that these areas need tweaking:

  • Configuring the JVM garbage collection schedule and memory allocation ("GC tuning is a precise art form", ref)
  • Other JVM settings
  • Solr's query result cache, filter cache, and document cache settings
  • Solr's auto-warming settings

There are a number of ways to monitor the performance of Solr, but none of these methods indicate which settings need to be adjusted, and there's no guide that I know of that steps through an exhaustive list of settings that could possibly improve performance. I've reviewed the following pages (one, two, three, four), and have gone through several rounds of trial and error so far without improvement.

Questions:

  • How do I tell the JVM to use 2 GB of memory on a small EC2 instance?
  • How do I debug and optimize JVM garbage collection?
  • How do I know when I/O throttling, such as the new EBS IOPS pricing, is an issue?
  • Using the figures from the New Relic examples below, how do I detect problematic behavior, and how should I approach solutions?

Answers:

  • I'm looking for a link to documentation on setting up and optimizing Solr 4 from a devops or server admin perspective (not index or application design).
  • I'm looking for the top trouble spots in catalina.sh, solrconfig.xml, and solr.xml (others?) that are likely causes of problems.
  • Or any tips you think might address the questions.

[Two New Relic screenshots appeared here; the images are not available.]

First, you should not focus on switching your Linux distribution. A different distribution might bring some changes, but considering the information you gave, nothing proves that these changes would be significant.

You are mentioning lots of possible optimisations, which can be overwhelming. You should consider tweaking an area only once you have proven that the problem lies in that particular part of your stack.

JVM heap sizing

You can use the parameter -Xmx1700m to give a maximum of 1.7 GB of RAM to your JVM. HotSpot might not need it, so don't be surprised if your heap capacity does not reach that number.

You should set the minimum heap size to a low value, so that HotSpot can optimise its memory usage. For instance, to set a minimal heap size of 128 MB, use -Xms128m.
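As a minimal sketch, the two heap flags could be passed to Tomcat through CATALINA_OPTS, which catalina.sh hands to the JVM when it starts the server. The variable name HEAP_OPTS is only illustrative, and where this snippet lives (for example a setenv.sh next to catalina.sh, or a distribution-specific defaults file) is an assumption that depends on how Tomcat was installed.

```shell
# Heap bounds from the answer: small initial heap, 1.7 GB ceiling.
# HEAP_OPTS is a hypothetical helper variable, not something Tomcat reads.
HEAP_OPTS="-Xms128m -Xmx1700m"

# catalina.sh passes CATALINA_OPTS to the JVM that runs Tomcat (and Solr).
# Any value already set is kept and the heap flags are appended.
CATALINA_OPTS="${CATALINA_OPTS:-} $HEAP_OPTS"
export CATALINA_OPTS
```

After a restart, the flags should be visible in the java command line shown by ps.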

Garbage collector

From what you say, you have limited hardware (1 core at 1.2 GHz max, see this page)

M1 Small Instance

  • 1.7 GiB memory
  • 1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit)
  • ...

One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.

Therefore, using the low-latency GC (CMS) won't do any good. It won't be able to run concurrently with your application since you have only 1 core. You should switch to the throughput GC using -XX:+UseParallelGC -XX:+UseParallelOldGC.
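The following sketch first checks how many cores the operating system reports, to confirm the single-core premise, and then appends the throughput collector flags to CATALINA_OPTS. CORES and GC_OPTS are illustrative names, and using CATALINA_OPTS as the place for the flags is an assumption about the Tomcat setup.

```shell
# Count online cores; fall back to 1 if getconf cannot report the value.
CORES=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
echo "online cores: $CORES"

# Throughput (parallel) collector for the young and old generations.
# GC_OPTS is a hypothetical helper variable, not something Tomcat reads.
GC_OPTS="-XX:+UseParallelGC -XX:+UseParallelOldGC"
CATALINA_OPTS="${CATALINA_OPTS:-} $GC_OPTS"
export CATALINA_OPTS
```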

Is GC a problem?

To answer that question, you need to turn on GC logging. It's the only way to see whether GC pauses are responsible for your application's response time. You should turn these on with -Xloggc:gc.log -XX:+PrintGCDetails.

But I don't think the problem lies here.

Is hardware a problem?

To answer this question, you need to monitor resource utilization (disk I/O, network I/O, memory usage, CPU usage). You have a lot of tools for that, including top, free, vmstat, iostat, mpstat, ifstat, ...

If you find that some of these resources are saturating, then you need a bigger EC2 instance.

Is software a problem?

In your stats, the document cache hit rate and the filter cache hit rate are healthy. However, I think the query result cache hit rate is pretty low. This implies a lot of query operations.

You should monitor the query execution time. Depending on its value, you may want to increase the cache size or tune the queries so that they take less time.
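A small sketch of how the logging flags might be added, again assuming that CATALINA_OPTS is where JVM options are set for this Tomcat. GC_LOG_OPTS is an illustrative name. Because gc.log is a relative path, the file ends up in whatever working directory the JVM is started from.

```shell
# GC logging flags from the answer; gc.log is relative to the JVM's
# working directory, so an absolute path may be easier to find later.
# GC_LOG_OPTS is a hypothetical helper variable.
GC_LOG_OPTS="-Xloggc:gc.log -XX:+PrintGCDetails"
CATALINA_OPTS="${CATALINA_OPTS:-} $GC_LOG_OPTS"
export CATALINA_OPTS
```

Once Tomcat has been restarted and has served some queries, the pause durations recorded in gc.log can be compared with the response times you observe.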
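One way to take a quick look, sketched here under the assumption that the tools come from the usual Linux packages (procps and sysstat), is a helper that runs a few short samples. The function name resource_snapshot and the 5-second interval with 3 samples are arbitrary choices for illustration.

```shell
# Print a short resource snapshot using whichever tools are installed.
# Each sampling tool takes 3 samples, 5 seconds apart.
resource_snapshot() {
  for tool in "free -m" "vmstat 5 3" "iostat -x 5 3" "mpstat 5 3"; do
    name=${tool%% *}            # first word is the command name
    if command -v "$name" >/dev/null 2>&1; then
      echo "== $tool =="
      $tool                     # unquoted on purpose: split into command and arguments
    else
      echo "== $name not installed, skipping =="
    fi
  done
}

# Usage: run it while Solr is under load.
# resource_snapshot
```

On a virtualized host such as EC2, the steal column ("st" in vmstat) is worth a look alongside the usual CPU, memory, and I/O figures.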
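Solr reports a QTime value in milliseconds in the responseHeader of each response, which gives a rough per-query measure of execution time. The sketch below extracts it from a JSON response. The function names and the example URL (host, port, and handler path) are assumptions, so adjust them to your deployment.

```shell
# Read a Solr JSON response on stdin and print the first QTime value (ms).
extract_qtime() {
  sed -n 's/.*"QTime":[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Run one query against the given URL and print its QTime.
solr_qtime() {
  curl -s "$1" | extract_qtime
}

# Example (hypothetical URL, not executed here):
# solr_qtime 'http://localhost:8080/solr/select?q=*:*&rows=0&wt=json'
```

Running the same query twice gives a first impression of how much the caches help: a repeat that is served from the query result cache should report a much lower QTime.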

Hope that helps!

