Investigation of Elgg performance

Apache Benchmark testing and profiling

I ran many benchmark tests against the web server to investigate its behavior in different cases, and found that the speed of the site depends on how many plugins are active.

I installed Xdebug to profile our Elgg instance and used KCachegrind to visualize and analyze the results.

What did I find?

https://docs.google.com/file/d/0B_2SUlnFsSbFeVphNFNvblMzdlk/edit?pli=1

I found that 20% of all processing time is spent loading plugins.

I also noticed that some of our plugins take more time than others. I think this is because their start.php files include other files and fetch plugin settings, and this code runs several times on every page.

Other suggestions and ideas from the profiler log

  1. Store plugin settings in a cache. To optimize the code and reduce extra database requests, I suggest caching plugin settings: we change them rarely, so it is not a problem to run an upgrade script after changing a plugin's settings.
     
  2. Do not use include inside start.php, and especially not inside the init function. Some of our plugins pull in libraries and extra functionality with PHP's include, which sometimes slows down script execution because init functions run on every page. It may be better to use the standard elgg_register_library function.
     
  3. Store all handlers (action handlers, page handlers, event handlers) in a cache. If we could cache information about all allowed handlers, we would not need to load every plugin on every page, and we would save the 20% of all processing time that plugin loading takes.
     
  4. Cache the list of existing plugins. Including the start.php file of every plugin takes a lot of time.
     
  5. Reduce the number of database requests. Currently even the simple login page makes more than 60 requests. Why do we need all of them? Maybe we could find where they come from and reduce their number. (Fortunately, all those database requests take only about 2% of total processing time, so this is not a priority.)
     
  6. Replace the function elgg_get_viewtype() with a predefined constant(!?). This function returns "default" every time, so why do we need to call it more than 380 times while loading a single page?

  7. Load core libraries only when they are needed. /engine/start.php contains a list of standard libraries that are loaded on every page:
    'access.php', 'actions.php', 'admin.php', 'annotations.php', 'cache.php',
    'calendar.php', 'configuration.php', 'cron.php', 'database.php',
    'entities.php', 'export.php', 'extender.php', 'filestore.php', 'group.php',
    'input.php', 'languages.php', 'location.php', 'mb_wrapper.php',
    'memcache.php', 'metadata.php', 'metastrings.php', 'navigation.php',
    'notification.php', 'objects.php', 'opendd.php', 'output.php',
    'pagehandler.php', 'pageowner.php', 'pam.php', 'plugins.php',
    'private_settings.php', 'relationships.php', 'river.php', 'sessions.php',
    'sites.php', 'statistics.php', 'system_log.php', 'tags.php',
    'user_settings.php', 'users.php', 'upgrade.php', 'views.php',
    'web_services.php', 'widgets.php', 'xml.php', 'xml-rpc.php',
    'deprecated-1.7.php', 'deprecated-1.8.php'. 

    Do we really need all of them on every page? Our project does not use functions from many of those libraries, e.g. xml.php, xml-rpc.php, web_services.php, etc.
     

  8. Use a static file for the login page. We do not need any data from the database there, so maybe we could serve a simple HTML page for login, because a static page loads 10-20 times faster.
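Suggestion 1 can be pictured as a read-through cache with explicit invalidation. The sketch below is in Python rather than Elgg's PHP, and `SettingsCache`, `fake_db` and the plugin/setting names are hypothetical. It assumes, as the suggestion does, that settings change rarely and that whatever saves them (an admin form or an upgrade script) calls `invalidate()`. The dict only lives as long as one process; across PHP requests the same idea would need a persistent store such as memcache or a cache file.

```python
from typing import Callable, Dict, List


class SettingsCache:
    """Read-through cache for plugin settings (hypothetical sketch)."""

    def __init__(self, load_from_db: Callable[[str], Dict[str, str]]) -> None:
        # load_from_db(plugin_id) stands in for the real database query.
        self._load = load_from_db
        self._cache: Dict[str, Dict[str, str]] = {}

    def get(self, plugin_id: str, name: str, default: str = "") -> str:
        # Query the database only the first time a plugin's settings are needed.
        if plugin_id not in self._cache:
            self._cache[plugin_id] = self._load(plugin_id)
        return self._cache[plugin_id].get(name, default)

    def invalidate(self, plugin_id: str) -> None:
        # Call this whenever a setting is saved, so the next get() reloads it.
        self._cache.pop(plugin_id, None)


# Usage: count how often the fake "database" is queried.
queries: List[str] = []


def fake_db(plugin_id: str) -> Dict[str, str]:
    queries.append(plugin_id)
    return {"items_per_page": "10"}


cache = SettingsCache(fake_db)
for _ in range(5):
    cache.get("blog", "items_per_page")
print(len(queries))  # 1
```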
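Suggestions 2 and 7 both come down to lazy loading: at startup, only record where each library lives (cheap), and include it the first time a page actually needs it. The `elgg_register_library` function mentioned in suggestion 2 follows this register-now, load-later pattern; the Python sketch below only illustrates the idea, with a hypothetical `LibraryRegistry` class and loader callables standing in for PHP's `include`.

```python
from typing import Callable, Dict, List, Set


class LibraryRegistry:
    """Register libraries up front, load each one only on first use."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], None]] = {}
        self._loaded: Set[str] = set()

    def register(self, name: str, loader: Callable[[], None]) -> None:
        # Cheap: just remember how to load the library.
        self._loaders[name] = loader

    def load(self, name: str) -> None:
        # The expensive work (the PHP include) happens at most once.
        if name in self._loaded:
            return
        if name not in self._loaders:
            raise KeyError(f"library not registered: {name}")
        self._loaders[name]()
        self._loaded.add(name)

    def is_loaded(self, name: str) -> bool:
        return name in self._loaded


included: List[str] = []
libs = LibraryRegistry()
for lib in ("xml.php", "xml-rpc.php", "web_services.php", "users.php"):
    # The default argument binds the current name to each loader.
    libs.register(lib, lambda lib=lib: included.append(lib))

# A page that only needs user functions loads just that library.
libs.load("users.php")
libs.load("users.php")
print(included)  # ['users.php']
```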
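Suggestions 3 and 4 could work roughly as follows, assuming every plugin registers the same handlers on every request (a later reply points out that no such rule exists today). On a cache miss, boot every plugin once and record which plugin registered each page handler; store that map, rebuilding it whenever plugins are enabled, disabled or upgraded; on later requests, look the handler up and boot only the plugin that owns it. The plugins, handler names and JSON serialization below are hypothetical, written in Python for illustration.

```python
import json
from typing import Callable, Dict, List

# Hypothetical plugins: each "start" function returns the page handlers it registers.
PLUGINS: Dict[str, Callable[[], List[str]]] = {
    "blog": lambda: ["blog"],
    "groups": lambda: ["groups", "group_invite"],
    "messages": lambda: ["messages"],
}

booted: List[str] = []


def boot(plugin_id: str) -> List[str]:
    # Stands in for including the plugin's start.php and running its init.
    booted.append(plugin_id)
    return PLUGINS[plugin_id]()


def build_handler_map() -> str:
    # Slow path: boot every plugin once and serialize handler -> plugin.
    handler_map: Dict[str, str] = {}
    for plugin_id in PLUGINS:
        for handler in boot(plugin_id):
            handler_map[handler] = plugin_id
    return json.dumps(handler_map)


def dispatch(handler: str, cached_map: str) -> str:
    # Fast path: boot only the plugin that owns the requested handler.
    owner = json.loads(cached_map)[handler]
    boot(owner)
    return owner


cached = build_handler_map()  # done once, e.g. after enabling a plugin
booted.clear()
print(dispatch("group_invite", cached))  # groups
print(booted)                            # ['groups']
```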
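A literal constant for suggestion 6 would break requests that use a non-default viewtype, which the #6 reply below also hints at. A safer variant is to resolve the viewtype once per request and reuse it. Below is a minimal Python sketch with a hypothetical `Request` class and `view` parameter, using `functools.cached_property`. Whether this saves measurable time should be checked with the profiler, since a later reply doubts that these calls are expensive.

```python
from functools import cached_property
from typing import Dict


class Request:
    """Per-request context; the viewtype is resolved once, then reused."""

    def __init__(self, params: Dict[str, str]) -> None:
        self.params = params
        self.lookups = 0  # counts how often the real lookup runs

    @cached_property
    def viewtype(self) -> str:
        # Hypothetical lookup: use the "view" parameter if present, else "default".
        self.lookups += 1
        return self.params.get("view", "default")


req = Request({})
for _ in range(380):
    vt = req.viewtype
print(req.lookups)  # 1
```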


Compare HDD and SSD

I installed a new instance of the Elgg site on an SSD and moved the MySQL data to the SSD too, then ran several kinds of benchmark tests, and did not find big differences. So I think SSDs are not a priority for now.

 

Use PHP Precompilers

I tried two different precompilers (opcode caches). The first one is APC (Alternative PHP Cache). It is very easy to install and configure, and it roughly doubled performance. The other one, XCache, showed a slightly smaller improvement in server performance.

  • APC does help a lot!

    #6:
    elgg_get_viewtype() = 'default' x 380??
    It needs some rewriting: determine whether the viewtype is non-default, and only then fetch it.

    #7:
    There's a lot of code loaded all over the place when it's not needed;
    some smart lazy-loading of code portions would improve this.
    E.g. on the index page there's no need for any of the plugins' detailed levels of code,
    their libraries, views, etc. These should never be loaded until needed,
    i.e. until the user goes to a page that actually needs that code to execute.
    This is similar to, e.g., all of the language file data being loaded,
    yet not everything is needed for every page, is it?
    Only some elements are actually used ;)

    #8:
    Static HTML for login?
    A fair number of themes/plugins do bypass the core and define their own login code,
    though perhaps without always preventing the Elgg core from loading.

    #N:
    It could be a fairly heavy workload to identify and streamline some of the code.
    Someone posted on another topic, months ago, about such on-demand code loading.

     

  • From our experience

    #1: This can give you a good performance boost, especially on sites with a lot of plugins.

    #5: The new version of Elgg makes fewer DB requests than the previous one. This was already discussed in another thread.

    #7: The following core library files can be safely omitted:

    'calendar.php', 'export.php', 'location.php', 'opendd.php', 'web_services.php'. If you are only using updated plugins, the two heavy deprecated libraries ('deprecated-1.7.php', 'deprecated-1.8.php') can also be skipped.

    #8: You can create cache files for whole pages and for specific parts of pages. For example, you can create a static fragment for the newest members in the activity river with an expiry of one day, so repeated DB hits are avoided when the activity page loads. Or we can serve static pages to users who are not logged in.

    A few of our inferences are posted here.

     

  • As far as overall caching goes, an interesting (but hypothetical, because PHP needs to be compiled with shmop enabled) solution would be to use shmop for a lot of Elgg's shared data; most servers (@cheap hosting!) will not have shmop. Or maybe even use Gearman (definitely overkill!) to delegate, farm out and manage Elgg's system tasks. I wonder what some experimenting might reveal. Point #7 needs someone to volunteer time to research which code modules are needed when, where and under what circumstances, so that perhaps we can build some kind of dependency table for all modules, includes and so on, for optimized code loading.

  • Some good suggestions & data here. Could you please post a 2nd screenshot after sorting the functions by time?

    Function calls are relatively cheap/fast. I would guess that total time spent on elgg_get_viewtype() is very little.

    Further optimization of elgg_get_entities is coming soon!

    We're moving rarely needed code into components that are lazy-loaded if/when needed. The lib files @TeamWegalli says you can do without are excellent candidates for this refactoring. For BC (backward compatibility) we at least need stub functions that lazy-load the component to do the work.

    Registrations like action/page/hook handlers can't be cached without setting rules for when they are registered and requiring that they be the same on every request. I tend to think the opposite: the plugin API should announce what kind of request it is (which handler is executing) and let plugins register only what they'll need for that request.

    There's also some low hanging fruit like http://trac.elgg.org/ticket/4495

    I'd like to lazy-load the login form that appears when you click "Log in". It'd also be better to make that form an iframe so browser auto-complete would work everywhere.

  • Good remarks and good comments. I think most problems and ideas were mentioned already. I have one more to add.

    For some time we've been investigating how we could improve performance by restructuring the plugin architecture. I think a good solution would be to move all the code in start.php into a plugin class that inherits from ElggPlugin. This class could take care of registering libraries when they're needed, loading page handlers, actions, etc. This would have a few advantages, including:

    • we have an option to lazy-load some code (e.g. action registration, handlers, etc., but also declaring which core libraries are needed)
    • it's easy to find out what you can do with a plugin by looking at its parent class
    • it's possible to easily override parts of a plugin with inheritance
    • it streamlines how this code looks and works
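The page-cache idea in the #8 reply under "From our experience" (cache whole pages or fragments, such as the newest members, with a one-day expiry, and serve cached pages to visitors who are not logged in) can be sketched as a time-to-live cache. This is a Python illustration with hypothetical names (`PageCache`, `serve`, `render_newest_members`); the clock is injectable so expiry can be demonstrated, and a real deployment would store entries in files or memcache rather than in a dict.

```python
import time
from typing import Callable, Dict, List, Tuple


class PageCache:
    """Cache rendered HTML per key for a fixed time-to-live (sketch)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.time) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get_or_render(self, key: str, render: Callable[[], str]) -> str:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # fresh: no database work at all
        html = render()      # stale or missing: rebuild once
        self._entries[key] = (now, html)
        return html


def serve(path: str, logged_in: bool, cache: PageCache, render: Callable[[], str]) -> str:
    # Only anonymous visitors get cached pages; logged-in users see live data.
    if logged_in:
        return render()
    return cache.get_or_render(path, render)


fake_now = [0.0]
renders: List[float] = []
cache = PageCache(ttl_seconds=86400, clock=lambda: fake_now[0])


def render_newest_members() -> str:
    renders.append(fake_now[0])
    return "<ul><li>newest member</li></ul>"


serve("/activity", False, cache, render_newest_members)  # rendered
fake_now[0] = 3600.0
serve("/activity", False, cache, render_newest_members)  # served from cache
fake_now[0] = 90000.0
serve("/activity", False, cache, render_newest_members)  # expired, rendered again
print(len(renders))  # 2
```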
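The dependency table suggested for point #7 in the shmop/Gearman reply could be as simple as a map from each module to the modules it needs. Loading a module then means loading its dependencies first, each at most once, and nothing outside that set is loaded at all. The edges below are invented for illustration and are not Elgg's real library dependencies; the sketch is a depth-first walk with cycle detection, in Python.

```python
from typing import Dict, List, Set

# Hypothetical dependency table: module -> modules it needs first.
DEPENDS_ON: Dict[str, List[str]] = {
    "river.php": ["entities.php", "views.php"],
    "entities.php": ["database.php"],
    "views.php": ["languages.php"],
    "database.php": [],
    "languages.php": [],
    "users.php": ["entities.php"],
}


def load_with_deps(module: str, loaded: List[str], visiting: Set[str]) -> None:
    """Append `module` and its dependencies to `loaded` in a safe order."""
    if module in loaded:
        return
    if module in visiting:
        raise ValueError(f"dependency cycle at {module}")
    visiting.add(module)
    for dep in DEPENDS_ON[module]:
        load_with_deps(dep, loaded, visiting)
    visiting.discard(module)
    loaded.append(module)  # stands in for the actual include


order: List[str] = []
load_with_deps("river.php", order, set())
print(order)
```

Here `users.php` is never loaded, because nothing that the requested page needs depends on it.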
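The alternative proposed in the reply that starts "Some good suggestions & data here" (the core announces which handler is executing and plugins register only what that request needs) might look like the following Python sketch. `on_request`, `start_request` and the `"page:blog"`-style request kinds are hypothetical names, not Elgg API.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# Registration callbacks, grouped by the kind of request they serve.
_registrars: DefaultDict[str, List[Callable[[], None]]] = defaultdict(list)
registered: List[str] = []


def on_request(kind: str) -> Callable[[Callable[[], None]], Callable[[], None]]:
    """Decorator: run this registration only for requests of `kind`."""
    def wrap(fn: Callable[[], None]) -> Callable[[], None]:
        _registrars[kind].append(fn)
        return fn
    return wrap


@on_request("page:blog")
def register_blog_pages() -> None:
    registered.append("blog page handler")


@on_request("action:groups/join")
def register_group_actions() -> None:
    registered.append("groups/join action")


def start_request(kind: str) -> None:
    # The core announces the request kind; only matching plugins do any work.
    for fn in _registrars[kind]:
        fn()


start_request("page:blog")
print(registered)  # ['blog page handler']
```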
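The class-based plugin idea in the last reply might look like this sketch. It is written in Python rather than PHP: `PluginBase` stands in for a class derived from `ElggPlugin`, and the hook names (`libraries`, `page_handlers`) and the `lib/blog.php` path are assumptions. Registration data is declared by methods and only read when the core asks for it, and a site-specific subclass overrides a single handler through inheritance.

```python
from typing import Callable, Dict


class PluginBase:
    """Hypothetical base class; subclasses declare what they provide."""

    def libraries(self) -> Dict[str, str]:
        return {}  # library name -> file to include when first needed

    def page_handlers(self) -> Dict[str, Callable[[str], str]]:
        return {}  # handler name -> callable rendering the page

    def handle(self, name: str, segment: str) -> str:
        # The handler table is only built when a page is actually requested.
        return self.page_handlers()[name](segment)


class BlogPlugin(PluginBase):
    def libraries(self) -> Dict[str, str]:
        return {"blog": "lib/blog.php"}

    def page_handlers(self) -> Dict[str, Callable[[str], str]]:
        return {"blog": self.render_blog}

    def render_blog(self, segment: str) -> str:
        return f"blog:{segment}"


class CustomBlogPlugin(BlogPlugin):
    # Overrides one part of the plugin without copying the rest.
    def render_blog(self, segment: str) -> str:
        return "custom " + super().render_blog(segment)


print(CustomBlogPlugin().handle("blog", "all"))  # custom blog:all
```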
Performance and Scalability
