Speeding up websites built on the Drupal content management system is a well-traveled and rich literary landscape. Search the internet for performance topics, both Drupal-specific and more generic, and you will find dozens of articles offering advice, techniques and, all too often, sympathy for your plight. Unfortunately, the wealth of conversation on the topic is a pretty good indication of how often Drupal sites need help performing well. Two of the most common pieces of advice are to keep the number of modules low and the complexity of the design modest. This is very useful advice, except in those situations where it cannot apply. Sometimes stakeholders in a site, whether they be partners or clients, require complex sites with complex functionality. In such cases, what is the best strategy for employing Drupal?
The key to staying on top of this issue is to spread out, or distribute, the impact that the site has on the application, on your server(s) and bandwidth, and on the user’s browser. Each component of the system (database, web server and browser) should be taken into account and, wherever possible, functionality should be designed in such a way as to minimize its impact on each page load.
Given the above, the first directive is an obvious one: AJAX is your friend. In a complex website, much will need to be rendered on a page, which has an impact on both the server and the user experience. However, there is no guarantee that all of the available logic will need to be executed on each page load. If you have a mouseover preview on a list of news items, for example, most users will not need or want to see every preview. So why generate that HTML, with the attendant queries, server-side processing, bandwidth drain and load time, on every page load? This is especially true of pages with complex layouts, which, in addition to having large style sheets, generate large amounts of HTML. The amount of HTML being sent can easily become excessive, so deferring it until a user actually clicks on something, or mouses over something, can greatly reduce the load time of a page.
The one caveat with using AJAX involves search engines. Content delivered by AJAX requires a user’s interaction before it shows up on a page, and therefore it is effectively hidden from external search engines. An effective strategy for addressing this is to make sure that all AJAX content on the site is also delivered in dedicated pages somewhere on the site. For example, a list of relevant blog or forum entries on the home page of your site can be delivered by AJAX (in a tabbed interface, say) as long as there is a page in the site that lists all of the forum topics. As long as AJAX is not the only way to access the content, you will be OK.
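To make the deferral idea concrete, here is a minimal sketch in Python (not Drupal code) that compares a list rendered with every preview inline against one that only carries a URL for fetching each preview on demand. The `NEWS` data, the `/preview/<id>` endpoint and the `/node/<id>` fallback link are all assumptions made up for illustration; the plain link is what keeps the content reachable for search engines.

```python
from html import escape

# Hypothetical news items; "preview" stands in for markup that is costly to build.
NEWS = [
    {"id": n, "title": f"Story {n}", "preview": f"Preview text for story {n}. " * 20}
    for n in range(1, 26)
]

def render_list(items, inline_previews):
    """Render the list, optionally embedding every preview in the page itself."""
    rows = []
    for item in items:
        # The plain href gives crawlers (and users without JavaScript) a dedicated page.
        row = (
            f'<li data-preview-url="/preview/{item["id"]}">'
            f'<a href="/node/{item["id"]}">{escape(item["title"])}</a>'
        )
        if inline_previews:
            row += f'<div class="preview">{escape(item["preview"])}</div>'
        rows.append(row + "</li>")
    return "<ul>" + "".join(rows) + "</ul>"

def render_preview(items, item_id):
    """Fragment a (hypothetical) /preview/<id> AJAX endpoint would return on mouseover."""
    item = next(i for i in items if i["id"] == item_id)
    return f'<div class="preview">{escape(item["preview"])}</div>'

eager = render_list(NEWS, inline_previews=True)
deferred = render_list(NEWS, inline_previews=False)
print(len(eager), len(deferred), len(render_preview(NEWS, 3)))
```

The deferred page is a fraction of the size of the eager one, and the cost of a preview is only paid when a visitor actually asks for it.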
Gzip (or deflate) HTTP compression is another simple way to enhance the speed and performance of your website. While it does marginally increase server load by requiring additional processing of the outgoing content stream, this load is more than offset by the reduced transmission time required to send each page to the end user. As all modern browsers are capable of handling compressed data streams, and it’s pretty simple to enable on the server side (provided you have access to do so on your server), it’s an easy win.
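As a rough illustration of why compression pays off, the following sketch gzips a stand-in HTML page with Python's standard `gzip` module and compares sizes. The page content is invented, and in practice the compression is normally done by the web server itself (for example Apache's mod_deflate) rather than by application code.

```python
import gzip

# Stand-in for a rendered page: HTML is repetitive, so it compresses well.
page = ('<div class="node"><h2>Title</h2><p>Body text for a node.</p></div>\n' * 200).encode("utf-8")

compressed = gzip.compress(page, compresslevel=6)
restored = gzip.decompress(compressed)  # what the browser does on receipt

print(f"original: {len(page)} bytes, gzipped: {len(compressed)} bytes")
```

Real pages are less repetitive than this sample, so expect a smaller but still substantial saving.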
Next, given Drupal’s built-in user system, and given that we are assuming a complex application, your site will probably have both a logged-in and a not-logged-in state. If you are not using a clustered web server setup, our recommendation is to split these two states onto two separate URLs that are hosted on different web servers (but use the same database). Such a setup gives you several advantages. First, by default you spread the load over two different servers. Second, your logged-in users are probably a smaller group than the anonymous users, but they use the more complex functions in the site, like forums, blogs and comments. By moving their heavier usage into its own sandbox, you ensure that they have the resources they need. Third, since your anonymous users see no content that is dynamically generated and personally specific, you can use full page caching on the anonymous server. This provides a huge benefit when search bot/indexer traffic is factored into the overall traffic of the site (all of which arrives as an anonymous user). On a moderately popular site, a dozen indexers can be on the site at any given time, and if you use a client-side site monitoring system like HBX, you may not know about them, as they will not show up in those traffic reports – bots don’t execute JavaScript. Unfortunately, some of these bots obey the directives in robots.txt better than others, so having a plan to account for them is essential. Full page caching can play a big part in this. Fortunately, Drupal has a robust full-page caching mechanism right out of the box, and enabling it is a simple, easy way to enhance your site’s delivery.
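The logic behind anonymous full-page caching can be sketched in a few lines of Python. This is an illustration of the idea only, not Drupal's implementation; `AnonymousPageCache` and `render_page` are hypothetical names.

```python
class AnonymousPageCache:
    """Serve cached full pages to anonymous visitors; always render for logged-in users."""

    def __init__(self, render):
        self.render = render   # callable(path, user) -> HTML string
        self.pages = {}        # path -> cached HTML
        self.render_count = 0  # how many times the expensive render actually ran

    def get(self, path, user=None):
        if user is not None:
            # Personalised output must never be cached or shared.
            self.render_count += 1
            return self.render(path, user)
        if path not in self.pages:
            self.render_count += 1
            self.pages[path] = self.render(path, None)
        return self.pages[path]

def render_page(path, user):
    greeting = f"Hello {user}" if user else "Welcome"
    return f"<html><body>{greeting}: {path}</body></html>"

cache = AnonymousPageCache(render_page)
for _ in range(12):  # e.g. a dozen crawlers requesting the same page
    cache.get("/news")
cache.get("/news", user="alice")
print(cache.render_count)
```

Twelve anonymous requests cost one render; only the logged-in request forces another.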
Caching in general is another essential area, one that can turn an essentially unusable site into a usable one. In addition to the standard full page and block level caching available to you, Drupal also allows programmers to set their own caching on content, at a level that’s as fine-grained as desired. We used this extensively on our community content modules, caching the content of every tab individually, then serving those tabs with AJAX. The payoff on this setup was enormous. It also allowed us to set different expiration times on the different content blocks, refreshing the “most recent” content more frequently than the “top” or “view all” tabs.
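A minimal sketch of per-block caching with different expiration times might look like the following. It is written in Python rather than against Drupal's cache API, and the tab names and lifetimes in `TTLS` are assumptions chosen for illustration. A fake clock is injected so the expiry behaviour can be shown without waiting.

```python
import time

class BlockCache:
    """Cache rendered blocks, each with its own time-to-live in seconds."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.entries = {}  # key -> (expires_at, value)

    def get(self, key, ttl, build):
        now = self.clock()
        entry = self.entries.get(key)
        if entry and entry[0] > now:
            return entry[1]            # still fresh: skip the expensive build
        value = build()
        self.entries[key] = (now + ttl, value)
        return value

# Hypothetical lifetimes: "most recent" refreshes far more often than the other tabs.
TTLS = {"most_recent": 60, "top": 3600, "view_all": 3600}

fake_now = [0.0]
cache = BlockCache(clock=lambda: fake_now[0])
builds = []  # records which tabs were actually rebuilt

def tab(name):
    def build():
        builds.append(name)
        return f"<div>{name} tab</div>"
    return cache.get(name, TTLS[name], build)

tab("most_recent"); tab("top")
fake_now[0] = 120.0          # two minutes later
tab("most_recent"); tab("top")
print(builds)
```

After two minutes only the short-lived tab is rebuilt; the long-lived one is still served from cache.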
If you are going to do lots of caching, you should consider using Memcache, particularly on blocks of content in high traffic areas, like your home page, which are dynamically generated but do not change much. Memcache is a RAM-based cache, which is much faster than caching to static files or the database. In particular, using memcache with path caching can greatly reduce the strain on your database server. When using URL aliases, Drupal can quickly generate dozens of SQL queries per page to retrieve path information: up to one per menu item in your navigation, plus a few extra, on every page load. Caching that in RAM rather than calculating it every time can have a significant impact on the site. Memcache does require the installation of the memcache module, as well as the memcached daemon itself on the server, and the use of an alternate cache.inc file (which is included with the memcache module). Caching the URL alias paths requires a further hack to path.inc, but the payoff is totally worth it.
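The effect of caching alias lookups can be sketched as follows. This is a Python illustration, not the path.inc change itself: a plain dictionary stands in for memcache, another stands in for the alias table, and `AliasLookup` is a hypothetical name.

```python
class AliasLookup:
    """Resolve URL aliases, remembering results so each path is queried only once."""

    def __init__(self, alias_table):
        self.alias_table = alias_table  # stands in for the database table of aliases
        self.cache = {}                 # stands in for memcache (shared across page loads)
        self.queries = 0                # number of simulated SQL queries issued

    def _query(self, path):
        self.queries += 1
        return self.alias_table.get(path, path)

    def alias(self, path):
        if path not in self.cache:
            self.cache[path] = self._query(path)
        return self.cache[path]

# A navigation menu with 30 aliased items, rendered on 5 page loads.
menu = [f"node/{n}" for n in range(1, 31)]
lookup = AliasLookup({path: f"section/page-{i}" for i, path in enumerate(menu, start=1)})

for _ in range(5):
    links = [lookup.alias(path) for path in menu]

print(lookup.queries)
```

Without the cache this loop would issue 150 lookups; with it, only the first page load touches the database.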
In some cases, even this might not be enough. For example, one of our modules displays content related to the current node, based on taxonomy. As initially programmed, this module would comb through the taxonomy and node structure in order to find, order, and return content that matched the currently viewed node, and display it in a block. The processing required to do all that brought the site to its knees, so we devised an alternative method. Instead of running that script dynamically on every page load, we elected to create a cron job, which loops through all the main node types and outputs the generated content to a flat HTML file (labeled by type and node ID). We then rewrote the related content module to pick up its data from the file system. This has the advantage of running only once for each node per cron run, and it can be set to run in the middle of the night, during the site's lowest traffic period.
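The precompute-to-flat-file pattern described above can be sketched like this. It is a simplified Python illustration, not the module's actual code: the `NODES` data, the matching rule (any shared taxonomy term) and the `<type>-<id>.html` file naming are assumptions.

```python
import tempfile
from pathlib import Path

def related_path(out_dir, node_type, nid):
    """Flat file for one node, labeled by type and node ID."""
    return Path(out_dir) / f"{node_type}-{nid}.html"

def build_related(nodes, nid):
    """Expensive step: find other nodes sharing at least one taxonomy term."""
    terms = nodes[nid]["terms"]
    matches = [n for n, data in nodes.items() if n != nid and terms & data["terms"]]
    items = "".join(
        f'<li><a href="/node/{n}">{nodes[n]["title"]}</a></li>' for n in sorted(matches)
    )
    return f"<ul>{items}</ul>"

def cron_run(nodes, out_dir):
    """Nightly job: regenerate the related-content file for every node."""
    for nid, data in nodes.items():
        related_path(out_dir, data["type"], nid).write_text(
            build_related(nodes, nid), encoding="utf-8"
        )

def related_block(out_dir, node_type, nid):
    """Page-load path: just read the precomputed file, or show nothing if it is missing."""
    path = related_path(out_dir, node_type, nid)
    return path.read_text(encoding="utf-8") if path.exists() else ""

NODES = {
    1: {"type": "story", "title": "A", "terms": {"drupal", "cache"}},
    2: {"type": "story", "title": "B", "terms": {"cache"}},
    3: {"type": "page", "title": "C", "terms": {"css"}},
}
out_dir = tempfile.mkdtemp()
cron_run(NODES, out_dir)
print(related_block(out_dir, "story", 1))
```

The trade-off is freshness: a newly published node has no related block until the next cron run writes its file.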
Another area where standard Drupal techniques do not translate well to large complex sites is CSS aggregation. Drupal comes with a very effective CSS aggregator that examines a page’s usage of module style sheets and aggregates them into composite style sheets, one for each combination of invoked modules. In a very complex site, however, one that has dozens of modules, each with its own specific styling needs, this process can lead to a very large number of combinations, meaning the site will have a huge number of aggregated style sheets. This undermines the effectiveness of the web browser’s inherent caching of external files, as each combined file must be downloaded as it is encountered. In our tests, we discovered that, in cases where Drupal’s normal CSS aggregation scheme results in numerous large style sheets, a more effective strategy is to aggregate and compress all of the style sheets into a single composite. This style sheet will be larger than any one of the normally aggregated style sheets, but there will only be one that must be downloaded, and it should only have to be downloaded once. Our benchmarking indicated that, while the very first load of the style sheet (typically on the home page) was slightly slower due to the larger file size, there was a 73% savings on CSS download time on the first page load of any other page with a different module configuration, where Drupal would normally generate and serve a different style sheet. This can be combined with some creative manipulation of the caching headers returned by the server to significantly improve the effectiveness of the cache.
Both of these options, incidentally, were substantially better than non-aggregated style sheets, so even if you don’t want to worry about creating a single aggregated file, Drupal’s native aggregation is still a very good option. Its compression algorithm is actually quite good, and when we wrote a module to do the aggregation for the single sheet, we used that same code – the primary difference lies in the production of one sheet instead of multiple. A similar benefit can be achieved with the compression and aggregation of JavaScript files. Simply by compressing our JS with the Yahoo! YUI Compressor and aggregating it, we were able to show an average reduction of 42% in the download time of our JavaScript.
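The trade-off between per-combination and single-file CSS aggregation can be modeled with a small Python sketch. The style sheet sizes and page-to-module mapping below are made up for illustration and are not our benchmark data; the model simply counts how many kilobytes of CSS a browser with a working cache has to download over a short browsing session.

```python
# Hypothetical module style sheet sizes in KB.
SHEET_SIZES = {"system": 4, "forum": 12, "blog": 9, "gallery": 15, "poll": 6}

# Hypothetical pages and the modules whose style sheets each one invokes.
PAGES = {
    "/": {"system", "blog", "poll"},
    "/forum": {"system", "forum"},
    "/gallery": {"system", "gallery"},
    "/blog/1": {"system", "blog"},
}

def per_combination_download(visits):
    """One aggregate per module combination: every new combination is a new download."""
    seen, total = set(), 0
    for page in visits:
        combo = frozenset(PAGES[page])
        if combo not in seen:
            seen.add(combo)
            total += sum(SHEET_SIZES[m] for m in combo)
    return len(seen), total

def single_file_download(visits):
    """One site-wide aggregate: downloaded once, then served from the browser cache."""
    return (1, sum(SHEET_SIZES.values())) if visits else (0, 0)

visits = ["/", "/forum", "/gallery", "/blog/1", "/forum"]
print(per_combination_download(visits))  # (files fetched, KB downloaded)
print(single_file_download(visits))
```

In this toy session the single file costs more on the first page (46 KB against 19 KB) but less overall, because shared sheets such as `system` are not downloaded again inside every combination.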
Finally, consider the use of a CDN to offload as much traffic as possible onto a network other than your own. There are many very affordable options for content networks and there is certainly one that can help with your site. Using a CDN spreads the load of serving your site not only to other machines but also to other networks, smoothing out any issues that might plague you with momentary spikes in traffic. An additional benefit of using a CDN is derived from the way browsers execute concurrent HTTP requests. Browsers limit the number of requests they are willing to send concurrently to a single domain, traditionally to just two. This is a substantial bottleneck, particularly on a complex, richly designed site, which will require multiple HTTP requests to download all the images, scripts and style files. While some of this is offset by the CSS aggregation (in our case, by an average of 17 requests), an additional benefit can be derived by sending HTTP requests to two domains, which browsers will happily do. Thus, with the addition of a CDN, the number of concurrent requests that a browser can potentially make immediately doubles.
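The arithmetic behind the concurrency benefit can be shown with a deliberately simple model. It assumes equal-sized assets spread evenly across hosts and a per-host limit of two connections, so it is a back-of-the-envelope sketch rather than a prediction of real load times; `download_rounds` is a hypothetical helper.

```python
import math

def download_rounds(asset_count, hosts, per_host_limit=2):
    """Rounds of parallel requests needed to fetch all assets under a per-host limit."""
    return math.ceil(asset_count / (hosts * per_host_limit))

# 40 images, scripts and style files served from one domain versus two.
print(download_rounds(40, hosts=1), download_rounds(40, hosts=2))
```

With one domain the browser needs 20 rounds of requests; adding a second domain halves that to 10.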
Of all of the above techniques, deferring the execution of as much content as possible through the use of AJAX is the most dramatically effective single technique. If you move a significant portion of each page’s content to AJAX, you will see improvement in the weight, execution time and load time of your pages. Collectively, all of the above techniques can take an extremely complex application and make it perform well for your users. Both your servers and your users will thank you.