Filesystem Tips and Tricks



Last updated on March 4th 2001


Contents

Introduction
General Filesystem Layout
The "noatime" Option
The "logging" Option
The "forcedirectio" Option
Important Links

Introduction

There are a few tricks you can use under Solaris to gain a little extra performance from your filesystems and also increase their data reliability. This article walks through setting up a filesystem for a Webserver, along with some useful mount options built into Solaris.

The "noatime" and "logging" options discussed below are great new features of Solaris, available from version 7 on, including Solaris 8. If you haven't upgraded yet, this might be one more reason for you to do so!

General Filesystem Layout

When designing your filesystem, pay attention to what role it will play for your particular application. Depending on your needs, you can arrange physical disks and their formatting in such a way as to tune them for the utmost performance.

If you're running a Webserver, for example, performance would benefit from dedicating an entire drive to Website storage. You might mount it with both the "noatime" and "logging" options described below, plus "nosuid" as long as you don't have any suid CGI scripts in your site. This offloads Web requests from your system drive to a separate drive and, by splitting responsibilities, keeps a problem on one drive from taking the other down with it. The mount options tune the drive for further speed.

If you're on an even larger scale and want to beef up your Webserver, you might want to consider software or, better yet, hardware RAID as the storage device. A Webserver is a mostly read-oriented workload, with a lot of disk reads and not many writes, so you should choose a RAID level accordingly. A RAID 5 setup is normally used, as it provides a high read transaction rate and redundancy in case of a drive failure. A drive failure will still have an impact on performance, though.

One thing you should watch is that you do not mirror across partitions of the same drive. Mirror across drives instead, and even across controllers if possible. Otherwise, you'll seriously degrade your performance, since you've now doubled the seeks on that one drive.

When it comes to partitioning the drive, consider the layout of the Website(s) on it, the Webserver software you'll be using and server robustness/security. The simplest approach is to use one whole filesystem (i.e. the entire drive) for your Webserver filesystem. The benefit is that this is probably the most flexible way to do it.

You should also carefully consider an alternative approach to partitioning your Webserver filesystem: break the drive into separate filesystems along the lines of:

/content  (logging,noatime,nosuid,ro)
/logs     (logging,rw)

This method ensures that if your server log files consume all the space available to them, the Website content area is not affected. Anything that does its own writes in your content area (forums, guestbooks, etc.) keeps working, so your users won't notice. You will lose logging data once /logs fills up, though. With good scripting, you can rotate your logs regularly and take them offline for backup or analysis.

Another benefit is that you can set the mount options for each of these partitions as needed, for example making the /content partition read-only for security reasons (provided nothing on the site needs to write there). This way, if your server is compromised, you have a little extra protection against someone modifying your Website content. Of course, if someone gets root on your box, then there's not much you can do, but short of that, this will keep most people at bay.
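
As a rough sketch, the two-filesystem layout above might look like this in /etc/vfstab. The device names (c1t0d0s0 and c1t0d0s1) are hypothetical placeholders for two slices of a dedicated drive; substitute your own. The options are copied from the list above.

```
#device             device              mount     FS    fsck  mount    mount
#to mount           to fsck             point     type  pass  at boot  options
#
# Hypothetical slices of a dedicated Website drive
/dev/dsk/c1t0d0s0   /dev/rdsk/c1t0d0s0  /content  ufs   2     yes      logging,noatime,nosuid,ro
/dev/dsk/c1t0d0s1   /dev/rdsk/c1t0d0s1  /logs     ufs   2     yes      logging,rw
```

Keeping the options in vfstab rather than typing them at mount time means they survive a reboot.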

Lastly, if you really want to get into the thick of things, you can tweak the ufs filesystem "highwater" and "lowwater" marks with the "ufs_HW" and "ufs_LW" options in /etc/system. This is also known as the "ufs write throttle." Its tuning is beyond the current scope of this article, but should you be interested in this level of tuning, I would suggest that you read the Sun Performance and Tuning book (this specifically is covered on pages 172-173).
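
For orientation only, this is roughly the shape such entries take in /etc/system. The numbers below are placeholders picked for illustration, not recommendations, and you should check the tuning documentation for your Solaris release before changing anything. Lines starting with * are comments, and changes to /etc/system only take effect after a reboot.

```
* UFS write throttle (illustrative placeholder values only)
* ufs_HW: bytes of outstanding writes per file at which writers are put to sleep
* ufs_LW: level the backlog must drain to before writers resume (keep below ufs_HW)
set ufs:ufs_HW=1572864
set ufs:ufs_LW=1048576
```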

The "noatime" Option

The "noatime" mount option is useful not just for Usenet news spools. It's also good for a Webserver, since serving pages is mostly a read-only operation. By eliminating the need to update access times each time a file is accessed, you reduce filesystem activity. It's a small piece of the performance puzzle, but it all adds up. Here's another 5-10 horsepower.

From the mount_ufs man page, we see this explanation:

"By default, the file system is mounted with normal access time (atime) recording. If noatime is specified, the file system will ignore access time updates on files, except when they coincide with updates to the ctime or mtime. See stat(2). This option reduces disk activity on file systems where access times are unimportant (for example, a Usenet news spool). noatime turns off access time recording regardless of dfratime or nodfratime."

One thing you'll need to verify is that your backup software does not rely on access time to decide whether a file has changed when making incremental backups. Generally, though, backup tools rely on the modification time (mtime), so this is okay.

The "logging" Option

This is a pretty exciting feature that hasn't been given much attention, and one that many people overlook. By using the "logging" mount option, you enable journaling on top of the standard ufs filesystem type. Without this logging feature, your filesystem could become corrupted in the event of a crash or power failure. If it's a large filesystem, a reboot could take a very long time as it runs fsck on it.

A journaled filesystem provides a log-based, byte-level filesystem that was developed for transaction oriented, high performance servers. Scalable and robust, its advantage over non-journaled filesystems is the quick restarting capability. A filesystem can be restored to a consistent state in a matter of seconds or minutes as compared to possibly minutes or hours with the traditional fsck method of filesystem checking after a crash.

From the mount_ufs man page, we see this explanation:

"If logging is specified, then logging is enabled for the duration of the mounted file system. Logging is the process of storing transactions (changes that make up a complete UFS operation) in a log before the transactions are applied to the file system. Once a transaction is stored, the transaction can be applied to the file system later. This prevents file systems from becoming inconsistent, therefore eliminating the need to run fsck. And, because fsck can be bypassed, logging reduces the time required to reboot a system if it crashes, or after an unclean halt. The default behavior is nologging.

The log is allocated from free blocks on the file system, and is sized approximately 1 Mbyte per 1 Gbyte of file system, up to a maximum of 64 Mbytes.

Logging can be enabled on any UFS, including root (/). The log created by UFS logging is continually flushed as it fills up. The log is totally flushed when the file system is unmounted or as a result of the lockfs -f command."
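
To get a feel for the sizing rule quoted above, here is a small shell sketch that estimates the log size for a given filesystem size. The function name ufs_log_size_mb is made up for this article, it only handles whole Gbytes, and it is just the "1 Mbyte per Gbyte, capped at 64 Mbytes" arithmetic; the system picks the real size itself.

```shell
# Rough estimate of the UFS log size in Mbytes for a file system whose size
# is given in whole Gbytes: about 1 Mbyte per Gbyte, capped at 64 Mbytes.
# Hypothetical helper for illustration; it does not query the system.
ufs_log_size_mb() {
    fs_gb=$1
    if [ "$fs_gb" -gt 64 ]; then
        echo 64
    else
        echo "$fs_gb"
    fi
}
```

For example, "ufs_log_size_mb 36" prints 36, while "ufs_log_size_mb 200" prints 64 because of the cap.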

To enable logging, simply specify "logging" as a mount option in /etc/vfstab, like so:

# /etc/vfstab
#
#device		device		 mount   FS	 fsck	mount	 mount
#to mount	to fsck		 point   type	 pass	at boot	 options
#
/dev/md/dsk/d2	/dev/md/rdsk/d2	 /RAID   ufs	 2	yes	 logging

You'll need to remount the partition for the option to take effect. If the partition is in use (e.g. / or /usr), then you'll need to reboot the system. There are ways around this, but it's easier just to reboot. Once it is in effect, running the mount command with no arguments should list all mounted filesystems. Find the one you entered in /etc/vfstab above and look at its options field. The word "logging" should appear. You're all set!
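
If you'd rather script that check than eyeball it, something along these lines should do. This is only a sketch: the function name has_mount_option is made up here, and it assumes mount prints lines of the form "mount-point on device opt/opt/... on date". It also relies on awk's -v flag, so on Solaris you would likely need nawk or /usr/xpg4/bin/awk in place of the stock awk.

```shell
# Succeeds (exit status 0) if the mount point in $1 lists the option in $2.
# Reads mount-style output on standard input, assuming lines shaped like:
#   /RAID on /dev/md/dsk/d2 read/write/setuid/logging on <date>
# Hypothetical helper; swap awk for nawk if your awk lacks -v.
has_mount_option() {
    awk -v mp="$1" -v opt="$2" '
        $1 == mp && $2 == "on" {
            # Field 4 holds the slash-separated option list
            n = split($4, opts, "/")
            for (i = 1; i <= n; i++)
                if (opts[i] == opt) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

You would feed it the real thing with "mount | has_mount_option /RAID logging" and test the exit status.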

The "forcedirectio" Option

This option isn't necessarily of any benefit to a Webserver filesystem unless you're sending out large files, such as those in a code repository, or large multimedia files such as movies or MP3s. If this is the case, then enabling this option will speed up operations when reading from the filesystem. For small, static pages this option will have little to no effect.
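
For a filesystem that does serve large files, the option goes into /etc/vfstab like any other mount option. The entry below is only a sketch: the device name and the /media mount point are hypothetical, and whether to combine it with "logging" is up to you.

```
#device             device              mount   FS    fsck  mount    mount
#to mount           to fsck             point   type  pass  at boot  options
#
# Hypothetical drive holding large downloads
/dev/dsk/c2t0d0s0   /dev/rdsk/c2t0d0s0  /media  ufs   2     yes      logging,forcedirectio
```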

From the mount_ufs man page, we see this explanation:

"If forcedirectio is specified and supported by the file system, then for the duration of the mount forced direct I/O will be used. If the filesystem is mounted using forcedirectio, then data is transferred directly between user address space and the disk. If the filesystem is mounted using noforcedirectio, then data is buffered in kernel address space when data is transferred between user address space and the disk. forcedirectio is a performance option that benefits only from large sequential data transfers. The default behavior is noforcedirectio."

Important Links

High Availability: Configuring Boot, Root and Swap (PDF)


Content and images are copyright 2001 by Michael Holve