"Enterprise-Level Drupal," Jason Burnett: Drupal Camp Twin Cities 2011

I have worked with large sites that have become completely bogged down. The next time I launch a large site with Drupal, I want to make sure that does not happen. So to learn how, I attended a session on “Better, Faster, Stronger: Designing Large / Enterprise-Level Drupal Environments” by Jason Burnett at Drupal Camp Twin Cities 2011. Jason is director of infrastructure for NeoSpire, a hosting company.

Here is the session description.

Power and performance are key deliverables for any enterprise or large organizational website and Drupal is no exception. The goal of this session is to help you understand what a highly available environment consists of so that you can avoid the common mistakes and pitfalls when designing a highly available and scalable architecture to support Drupal.

My notes follow after the break.

First things first

Prerequisites that you’ll need (or at least want).

Good reliable network with plenty of capacity
At least one good system administrator

Don’t necessarily need a data center.

Stacks

By default, we all use:

Apache
PHP
Drupal
MySQL

To handle unexpected traffic (from a tweet, for example), you need to add other elements to that stack.

Pressflow

Pressflow is a drop-in replacement for Drupal 6.x. It improves database performance, adds support for reverse proxy caching, and includes optimizations for MySQL and PHP 5.

You can still use all your modules and themes; it just works faster. Pressflow is updated pretty quickly after a security update comes out.

Available through Four Kitchens

Pretty much all of Pressflow's patches to Drupal 6 have been incorporated into Drupal 7.

Varnish

Varnish is a reverse proxy cache. It caches content based on HTTP headers. It keeps the cache in memory, which is always faster than disk, and relies on the kernel's virtual memory to manage it.

Does not work for authenticated users: this is for anonymous users. Authenticated traffic bypasses Varnish.

Need to tell Apache to listen on port 8080 instead of 80, so that Varnish can take over port 80.

By default the Apache logs will show the Varnish IP address for every request, which reduces the information available for analytics. This can be changed so that the client IP is forwarded to Apache for the logs.
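The notes do not say how the client IP gets forwarded. A common convention is for the proxy to add an X-Forwarded-For header, and the sketch below assumes that. The function name and the trusted proxy address are made up for illustration; this shows the idea in Python, not an Apache or Varnish configuration.

```python
def client_ip(remote_addr, headers, trusted_proxies=("127.0.0.1",)):
    """Return the address worth logging for a request.

    remote_addr is the peer that connected to the web server. When that
    peer is a trusted proxy (assumed here: a cache on the same machine),
    the original client is read from the X-Forwarded-For header instead.
    """
    forwarded = headers.get("X-Forwarded-For", "")
    if remote_addr in trusted_proxies and forwarded:
        # The header is a comma-separated chain; the left-most entry
        # is the address of the original client.
        return forwarded.split(",")[0].strip()
    # Never trust the header from an unknown peer: it is easy to forge.
    return remote_addr
```

Only trusting the header when the request arrives from the proxy itself matters, because anyone can send a forged header directly.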

Cookies prevent caching, so they need to be stripped from requests. This takes some customization so that Google Analytics, which sets its own cookies, keeps working.
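To make the cookie problem concrete, here is a rough Python sketch of the logic: drop the Google Analytics cookies (the classic tracker named them __utma, __utmb, __utmc, __utmv and __utmz) and treat the request as cacheable only if nothing else is left. In a real setup this would live in the Varnish configuration; the function names here are hypothetical and the rule is deliberately simplified.

```python
import re

# Cookie names used by the classic Google Analytics tracker.
GA_COOKIE = re.compile(r"^__utm[a-z]+$")


def parse_cookies(header):
    """Split a Cookie header into (name, value) pairs."""
    pairs = []
    for part in header.split(";"):
        # Split on the first "=" only: GA values contain "=" themselves.
        name, _, value = part.strip().partition("=")
        if name:
            pairs.append((name, value))
    return pairs


def strip_analytics(header):
    """Return the Cookie header with the analytics cookies removed."""
    kept = [(n, v) for n, v in parse_cookies(header) if not GA_COOKIE.match(n)]
    return "; ".join(f"{n}={v}" for n, v in kept)


def is_cacheable(header):
    """Simplified rule: cache only when no cookies remain after stripping."""
    return strip_analytics(header) == ""
```

A logged-in user still carries a session cookie after stripping, so that request goes through to Apache, which matches the point above that authenticated traffic bypasses the cache.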

Varnish can help with basic security: for example, you can turn off outside access to run cron.

Varnish is so much faster than Apache that you have to tell the server to allow more file handles.

Apache

Apache needs to be tuned to match your hardware. Setting MaxClients too high is asking for trouble. Every application is different. As a starting point, take the total memory dedicated to Apache (in megabytes, presumably allowing about 40 MB per Apache process) and divide it by 40 to estimate MaxClients.
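The divide-by-40 rule is simple arithmetic. The sketch below assumes the memory figure is in megabytes and that 40 stands for a rough per-process footprint; both are my reading of the notes, not something stated in the session, and the function name is hypothetical.

```python
def estimate_max_clients(apache_memory_mb, per_process_mb=40):
    """Estimate MaxClients from the memory set aside for Apache.

    per_process_mb=40 follows the rule of thumb in the notes; measure
    your own processes before relying on it.
    """
    if apache_memory_mb <= 0 or per_process_mb <= 0:
        raise ValueError("memory figures must be positive")
    # Round down: rounding up would let Apache outgrow its memory budget.
    return apache_memory_mb // per_process_mb


# Example: 4 GB (4096 MB) reserved for Apache.
print(estimate_max_clients(4096))  # 102
```

Since every application is different, a heavier site would pass a larger per_process_mb and end up with a lower MaxClients.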

APC (Alternative PHP Cache)

APC is an opcode cache. Normally PHP parses and compiles Drupal's code every time someone accesses the site. APC keeps the compiled code in memory so later requests can skip the parsing and compiling, which reduces load on memory and CPU.

Drupal 7 should run with APC because of how resource-intensive it is. It is only a five-minute install. Make sure to use the Debian version.
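The idea behind an opcode cache can be shown with a small analogy in Python, which also compiles source to bytecode before running it. This is only an illustration of the technique, not how APC is implemented; the function and cache names are made up.

```python
import os

# path -> (modification time, compiled code object)
_code_cache = {}


def load_compiled(path):
    """Return (code, was_cached) for a source file.

    The file is parsed and compiled only when it is new or has changed
    on disk; otherwise the stored code object is reused.
    """
    mtime = os.path.getmtime(path)
    cached = _code_cache.get(path)
    if cached is not None and cached[0] == mtime:
        return cached[1], True  # cache hit: no parsing, no compiling
    with open(path, "r", encoding="utf-8") as handle:
        source = handle.read()
    code = compile(source, path, "exec")  # the expensive step
    _code_cache[path] = (mtime, code)
    return code, False
```

The first request for a file pays the compile cost; every later request reuses the stored result until the file changes.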

Very minimal configuration is needed: you need to allocate necessary memory to APC, and that is pretty much it.

Running APC simply improves performance; it does not prevent you from doing anything with Drupal. It works for both authenticated and anonymous users.

You can see performance increases of 35–40%.

Memcached

This is a distributed memory object caching system. It reduces the load on the database. It creates a simple key/value datastore.

It caches your tables in memory, in bins. All of your reads go to memory; all of your inserts, deletes and updates still go to the database.
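Here is a minimal sketch of the pattern as described in the notes: reads are answered from memory when possible, while writes still go to the database. A plain dict stands in for Memcached and an in-memory SQLite table stands in for MySQL. The class and column names are hypothetical, and the real Memcache module may well work differently.

```python
import sqlite3


class CachedStore:
    """Key/value reads from memory, writes to the database."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")  # stand-in for MySQL
        self.db.execute("CREATE TABLE cache (cid TEXT PRIMARY KEY, data TEXT)")
        self.memory = {}   # stand-in for a Memcached bin
        self.db_reads = 0  # counts how often the database is queried

    def set(self, cid, data):
        # Writes always go to the database.
        self.db.execute(
            "INSERT OR REPLACE INTO cache (cid, data) VALUES (?, ?)", (cid, data)
        )
        self.db.commit()
        # Drop the stale in-memory copy so the next read refreshes it.
        self.memory.pop(cid, None)

    def get(self, cid):
        if cid in self.memory:
            return self.memory[cid]  # served from memory, no query
        self.db_reads += 1
        row = self.db.execute(
            "SELECT data FROM cache WHERE cid = ?", (cid,)
        ).fetchone()
        if row is None:
            return None
        self.memory[cid] = row[0]
        return row[0]
```

A thousand reads of the same key cost one database query, which is where the reduced database load comes from.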

Helps if a lot of people are hitting the site at the same time.

Can reduce database stress by 30%, which increases capacity when traffic is high.

You need the Memcache module for Drupal. Only certain Drupal tables work with Memcache.

Solr

Better than native Drupal search. It is built on a standard application server: you can decide what J2EE server to use.

New stack

Varnish
Apache
APC
PHP
Pressflow
Memcached
Solr
MySQL

Can other stacks work too?

Yes, and there are arguments that can be made both for and against. This stack has been used in production and feels the most stable and enterprise-ready. Always looking at new options, like the Comanche web server.

Check out getpantheon.com for a pre-configured installation using the same software stack. This can then be used on the Amazon EC2 cloud, for example.

Multiple servers

The beauty of this stack is that it can be installed entirely on a single server or split across separate servers.

First step is to split off Memcached, Solr and MySQL onto a database server.

Then you can add load balancers in front of multiple web app servers. The web servers all have the same files and use the same database, so together they handle greater capacity.

NFS allows multiple web servers to seamlessly serve the same content. User-uploaded content is instantly available to all web servers. Any code changes only need to be made in one location.
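The notes do not say how the load balancers choose a web server. The simplest policy is round robin, sketched below with made-up host names; treat it as an assumption for illustration rather than what was presented.

```python
from itertools import cycle


def round_robin(servers):
    """Return a function that hands out servers in rotation."""
    pool = cycle(servers)

    def pick():
        return next(pool)

    return pick


# Hypothetical web servers sitting behind the load balancer.
pick = round_robin(["web1", "web2", "web3"])
print([pick() for _ in range(4)])  # ['web1', 'web2', 'web3', 'web1']
```

This only works because every web server has the same files and talks to the same database, so any of them can answer any request.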

Syslog

Drupal can log to syslog, which reduces load on the database. Sending logs to a central location allows for easy review. Even when there are multiple servers, their logs can all be sent to one syslog server.
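Drupal does this through its own syslog support, but the idea of shipping logs to one central server is easy to show with Python's standard library. The host, port, facility and logger name below are placeholders I picked for the sketch.

```python
import logging
from logging.handlers import SysLogHandler


def make_central_logger(host, port=514, name="drupal"):
    """Build a logger that sends records to a remote syslog server over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = SysLogHandler(
        address=(host, port),
        facility=SysLogHandler.LOG_LOCAL0,  # assumed facility for this app
    )
    # Prefix each line with the logger name so servers can be told apart.
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Each web server would call something like make_central_logger("127.0.0.1", name="web1") with the address of the real log server, so that all messages end up in one place.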

Multiple database servers: MySQL circular replication

Circular replication is the method by which we synchronize data. There are two IP addresses (master and slave). Heartbeat is used to automatically fail over the addresses when necessary.

NFS and Solr using DRBD

Data synchronization is handled with DRBD (Distributed Replicated Block Device), which is essentially RAID 1 over the network. Only one server is able to access the data at a time, which is why IP management is needed. Heartbeat handles the IP management automatically.
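The failover decision can be pictured as a small rule: if the server holding an address has gone quiet for too long and the other one is still alive, move the address. The sketch below is my own simplification with hypothetical node names and an arbitrary 30-second limit; it is not how Heartbeat itself is written or configured.

```python
def address_holder(now, last_heartbeat, active, standby, dead_after=30.0):
    """Return the node that should hold the shared IP address.

    last_heartbeat maps node name -> time of its last heartbeat, in the
    same unit as now (seconds). dead_after is an assumed timeout.
    """
    active_is_dead = now - last_heartbeat[active] > dead_after
    standby_is_alive = now - last_heartbeat[standby] <= dead_after
    if active_is_dead and standby_is_alive:
        return standby  # fail over
    # Keep the address where it is if the active node is healthy, or if
    # both nodes look dead and moving the address would not help.
    return active
```

For example, with db1 last heard from 50 seconds ago and db2 2 seconds ago, the address moves to db2.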

All together now: one example

Load balancers
Multiple web servers
Multiple NFS servers
Multiple MySQL servers
Multiple Solr servers

Each of these has 24 gigs of RAM.

Other tools

Drush
Monitoring
- Availability (Panopta is just $10 per month for up to 10 IP addresses: actual HTTP requests, not just pings)
- Core updates
- Module updates
- Munin (server health: identify what is causing a problem); Zabbix does the same, but can also send alerts based on thresholds. Both are open source.
CDN: content delivery network
- Akamai is an example; Limelight is less expensive. They spider a site frequently, then distribute files around the world, so that they are closer to the people requesting the page.
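On the availability point in the list above, the difference between a ping and an actual HTTP request is worth spelling out: a server can answer pings while the site itself is down. A bare-bones check that makes a real request might look like this in Python. The function name and the timeout are my own choices, and this says nothing about how Panopta works.

```python
import urllib.error
import urllib.request


def site_is_up(url, timeout=5.0):
    """Return True only if the URL answers a real HTTP request successfully."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # URLError covers HTTP error responses and refused connections,
        # OSError covers timeouts, ValueError covers malformed URLs.
        return False
```

Run from a machine outside your own network, a check like this catches problems with Apache, PHP or the database that a ping would miss.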

Lessons learned

Conntrack tables: Disable all the iptables connection tracking modules unless you need them. The connection tracking table has a size limit, and once it fills up new connections get dropped.
NTP: Network Time Protocol: Time synchronization with atomic clocks for your servers. Extremely important to keep the servers within a certain threshold of each other. Very important if the system uses Heartbeat.
Load testing: Load test your solutions and make sure you can achieve your goal. JMeter is an example.
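If you do need connection tracking, it helps to watch how full the table is. On Linux kernels that expose them, the current count and the limit can be read from the two files named below. The helper names are hypothetical, and on older kernels the paths may differ.

```python
from pathlib import Path

# Present only when the connection tracking module is loaded.
COUNT_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_count")
MAX_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_max")


def conntrack_usage(count_text, max_text):
    """Return the fraction of the conntrack table in use (0.0 to 1.0)."""
    count = int(count_text.strip())
    limit = int(max_text.strip())
    if limit <= 0:
        raise ValueError("conntrack limit must be positive")
    return count / limit


def read_conntrack_usage():
    """Read the live values, or return None if tracking is not loaded."""
    if not (COUNT_PATH.exists() and MAX_PATH.exists()):
        return None
    return conntrack_usage(COUNT_PATH.read_text(), MAX_PATH.read_text())
```

A monitoring tool could alert when the usage climbs toward 1.0, well before connections start getting dropped.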

My take

I am not a server guy, but this was explained very, very well in a way that I could understand. I came away from this with a lot of ideas about how to improve performance for Drupal sites. I had tried to figure this out by reading online articles, but found it pretty confusing. Now I have a clear example of one way to combine a number of options to improve performance. One of the best sessions I attended at Drupal Camp.