<font><font color="#555555"><b>Copied from a different website: http://tunelinux.pe.kr/bbs/read.php?table=linuxinfo&no=81 </b></font></font>
Credits to the original writer
<font><font color="#555555"> <big><font face="Monotype Corsiva">ㅇ Tuning Linux For Maximum Performance
ㅇ The /proc Directory Structure
ㅇ The Virtual-Memory Parameters
ㅇ The Swap_Cluster
ㅇ Tuning The Network

One can safely assume that most people run Linux after installing it from one of the common distribution CD-ROMs such as RedHat or Caldera. Your machine probably performs faster and more reliably than it would with one of the commercial, proprietary Microsoft Windows variants. Most users are aware that the Linux kernel and the myriad utilities installed from the distribution CD are generalized builds adapted to run well on a wide variety of configurations. You may feel there is potential for improvement given your particular configuration and hardware mix.

Well, yes, that's true. For one thing, the kernels installed from the RedHat or Caldera distros are compiled for the standard Intel i386 CPU, and recompiling for the particular CPU type you have will give you an edge in performance and reliability. The sysadmin is also well advised to cut all unwanted fat from the kernel. If you don't have any SCSI hardware installed in your system, there is little sense in having drivers for it in your kernel binary. The same applies to other hardware types, such as network gear, sound, etc. Once you have a kernel that suits your particular situation, you can fine-tune it according to your workload and preferences.

Most of today's Unix variants have self-tuning capabilities that generally work quite well. How does an operating system tune itself, you ask? During the boot phase the kernel queries the system to find out how much RAM and swap space is available, and how fast the CPU is. Once it has this information, the kernel adjusts cache sizes and virtual-memory parameters. The system administrator can further refine this self-tuning process, or override it, after boot while the system is running.
It is sometimes advisable to do so because the system administrator has more intimate knowledge of the kind of workload the machine is likely to experience in the course of an average day. However, be warned: playing with sensitive kernel parameters requires some basic understanding of the innards of the operating system. Setting values at random can mess up your system and actually degrade performance or, in extreme situations, crash it. If your system gets totally messed up as a result of "tuning," do not worry. On the next boot the old behavior comes back, since the tuning measures apply only until the system is restarted.

Linux offers two ways to tune the system dynamically at run time. One is the standard sysctl system call (check the sysctl man page). It can be used from a program or a script (with proper authorization, usually as root), and the requested changes take effect immediately.

=====================================================
The /proc Directory Structure

Another way to tune the kernel is through the /proc directory structure. /proc is a pseudo filesystem that serves as an interface to kernel data structures, so you do not have to read and interpret /dev/kmem. Most of it is read-only, but some files let kernel variables be changed. There is a numerical subdirectory for each running process, named after the process ID. Each subdirectory contains the following pseudo-files and directories:

cmdline    Command line arguments
environ    Values of environment variables
fd         Directory, which contains all file descriptors
mem        Memory held by this process
stat       Process status
status     Process status in human readable form
cwd        Link to the current working directory
exe        Link to the executable of this process
maps       Memory maps
root       Link to the root directory of this process
statm      Process memory status information

Entering more /proc/pid/statm (substituting the relevant process ID for pid) shows the memory status of a given process.
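The per-process files can also be read from a script. The following is a minimal sketch, assuming a POSIX shell; the function name show_statm is hypothetical, and the names given to the first three fields follow the proc(5) manual page, which counts them in pages.

```shell
# show_statm is a hypothetical helper, not a standard command.
# It labels the first three fields of a statm file; per proc(5)
# these are total program size, resident set size and shared pages.
show_statm() {
    # $1: path to a statm file, for example /proc/$$/statm
    read -r size resident shared _rest < "$1" || return 1
    printf 'size=%s resident=%s shared=%s\n' "$size" "$resident" "$shared"
}
```

On a Linux system, show_statm /proc/$$/statm prints the values for the current shell.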
In fact, most commands that query a process (for example, ps and top) use the /proc filesystem to produce their output. In addition to the PID-related information, the /proc filesystem holds other systemwide information in the following files:

apm          Advanced power management info
cmdline      Kernel command line
cpuinfo      Info about the CPU
devices      Available devices (block and character)
dma          Used DMA channels
filesystems  Supported filesystems
interrupts   Interrupt usage
ioports      I/O port usage
kcore        Kernel core image
kmsg         Kernel messages
ksyms        Kernel symbol table
loadavg      Load average
locks        Kernel locks
meminfo      Memory info
misc         Miscellaneous
modules      List of loaded modules
mounts       Mounted filesystems
partitions   Table of partitions known to the system
rtc          Real time clock
slabinfo     Slab pool info
stat         Overall statistics
swaps        Swap space utilization
uptime       System uptime
version      Kernel version

Depending on your kernel configuration and installed hardware, the /proc directory might also contain the following three subdirectories: net/, scsi/, and sys/. Reading the files in these subdirectories gives you a glimpse into the kernel tables and processes that manage overall system activity. You can, for instance, check which devices, and how many, are currently configured in your system by doing the following:

[root@hatta /proc]# more devices
Character devices:
  1 mem
  2 pty
  3 ttyp
  4 ttyS
  5 cua
  7 vcs
 10 misc
 36 netlink
128 ptm
136 pts
Block devices:
  1 ramdisk
  2 fd
  3 ide0
  9 md
[root@hatta /proc]#

Once you have familiarized yourself with the structure of the /proc filesystem, you can proceed to write into these files to change the kernel's behavior. To change a value, simply echo the new value into the file. An example is given below in the section on the file system data. You need to be root to do this. You can create your own boot script so that the changes are applied every time your system boots.
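Echoing a value into a file can be wrapped in a small function for use in a boot script. This is only a sketch, assuming a POSIX shell; set_tunable is a hypothetical name, and reading the file back afterwards is an added convenience to confirm that the write was accepted.

```shell
# set_tunable is a hypothetical helper: it writes a new value into a
# tunable file and then prints whatever the file contains afterwards.
set_tunable() {
    # $1: file to write, normally somewhere under /proc/sys
    # $2: new value
    printf '%s\n' "$2" > "$1" || return 1
    cat "$1"
}
```

Run as root, a call such as set_tunable /proc/sys/fs/file-max 8192 would apply the change until the next reboot; 8192 is only a placeholder value, not a recommendation.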
As an example, let's look at how to tune your file system for a machine with a very large number of online users. The files dquot-max and dquot-nr in /proc/sys/fs show the maximum number of cached disk quota entries and the number of allocated plus free disk quota entries, respectively. If the number of free cached disk quota entries is low and you have a large number of users on your system, you can raise the maximum as necessary. This will benefit the performance of file-system operations.

There are generally three areas where you might want to tune your system:

The virtual memory
The file system
The network

Let's now look at the most common tunable parameters. Since I don't know the shape of your particular system workload, I am not going to advise on which values to use, but rather on how to use the parameters themselves. It is then up to you to find out which values suit your needs best.

===============================================================
The Virtual-Memory Parameters
By Moshe Bar, August 29, 2000

The speed and efficiency with which the kernel manages virtual memory in general, and the movement of VM pages to and from swap space in particular, has a big impact on overall performance in a busy system. All relevant parameters can be found in /proc/sys/vm.

The first important tuning opportunities arise from the bdflush file. This file controls the behavior of the bdflush kernel daemon, which decides under which conditions the contents of the buffer cache are written back to disk after they have been modified. bdflush has nine parameters, of which the first three are the most important.
The first, nfract, sets the maximum share of modified (or dirty, in kernel parlance) buffers in the buffer cache, expressed as a percentage. Setting this value high lets the kernel delay writing these buffers to disk. On the other hand, it will then have more buffers to write out at once when memory runs short. The second parameter, ndirty, tells the kernel the maximum number of dirty buffers bdflush writes to disk at one time. A high value causes irregular, bursty I/O, while a low value gives smoother disk activity. The third bdflush parameter is nrefill. It tells the kernel how many empty buffers to allocate ahead of actual buffer cache use. The higher the value, the more memory is allocated up front (which leaves less system memory available), but the less often the kernel has to allocate later on (less work for the kernel). As you can see, most of the time it comes down to a choice between giving the kernel more memory so that it works less, or vice versa.

Another file in the /proc/sys/vm directory is freepages. It controls when and how aggressively the kernel starts swapping, and it can potentially have a big impact on system performance. The first value in this file, min, is the number of free pages below which programs can no longer allocate more memory. Below min, only the kernel will get more memory. The second parameter, low, tells the kernel that below this number of free pages it should start swapping pages out aggressively. Finally, the third parameter, high, is the number of free pages below which the kernel starts swapping gently. The kernel will therefore always try to keep this many pages free in the system.
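Because the three freepages values describe successive thresholds, a script can check their ordering before writing them. The sketch below assumes a POSIX shell and assumes that min, low and high are meant to be strictly increasing, which the description above implies but does not state; check_freepages is a hypothetical name.

```shell
# check_freepages is a hypothetical helper. It succeeds only when the
# proposed thresholds are strictly increasing (min < low < high); that
# ordering is an assumption drawn from how the three values are described.
check_freepages() {
    # $1 $2 $3: proposed min, low and high numbers of free pages
    [ "$1" -lt "$2" ] && [ "$2" -lt "$3" ]
}
```

On a kernel that provides /proc/sys/vm/freepages, a guarded write could then look like check_freepages 128 256 384 && echo "128 256 384" > /proc/sys/vm/freepages, where the three numbers are placeholders rather than recommendations.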
--------------------------------------------------------------------------------
The Swap_Cluster

One of the most worthwhile values to tune once your system runs short of RAM is the third parameter of the kswapd file. This parameter, called swap_cluster, tells the kernel how many pages it should write to disk in one go. The higher the number, the fewer individual I/O operations result. Too large a number, say 512 or so, would actually slow down the system, because the request queue would get flooded and each I/O would become so big that the kernel would need to cut it into smaller pieces.

Obviously, all the files in the /proc/sys/vm directory are important, and the careful system administrator should make sure there is a harmonious mix among the parameters according to his or her needs. But the most important ones are probably covered here.

The File System

For workstations or servers with heavy I/O duty, tuning the file system might become a necessity. But before going into individual values, it is imperative to stress that tuning has to start at the hardware selection and configuration phase. Make sure to have SCSI when I/O is an issue. SCSI disks perform more efficiently in multi-processing environments. Also, make sure to design the file-system layout carefully according to how I/O-heavy the files are. For example, don't put the swap space and the files with the most I/O on the same disk or the same SCSI controller. Try to spread out the I/O according to your needs. This is why it is always better to have many smaller disks rather than one or two big ones. Having one giant disk, apart from raising security and availability concerns, will just generate more I/Os for that one disk.
This is comparatively slower than using several smaller ones.

The file-system tuning opportunities in 2.2 kernels reside in /proc/sys/fs. One peculiarity of the 2.2.x series of kernels is that the file-system control tables can only expand and are never released. This is true of the parameters we will discuss below. They generally should be increased if you have many disks or file systems. If I/O is not heavy in your workload (such as in scientific calculations or in CAD/CAM applications), you should not need to touch any of these files.

One important file to know is file-max. It tells the kernel how many file descriptors (pointers to open files or devices) may be open concurrently. The default is 4096 for kernel version 2.2.16. If you get error messages saying there are no more file descriptors, increase this number.

inode-state, inode-nr and inode-max

As with file handles, the kernel allocates inode structures dynamically, but can't free them yet. The value in inode-max denotes the maximum number of inode handlers. This parameter should be three to four times larger than the value in file-max, since stdin, stdout, and network sockets also need an inode structure in the kernel to handle them. If you regularly run out of inodes, you should increase this value.

super-nr and super-max

Again, super-block structures are allocated by the kernel, but not freed. The file super-max contains the maximum number of super-block handlers, whereas super-nr shows the number currently allocated. Every mounted file system needs a super block, so if you plan to mount lots of file systems, you may want to increase these numbers.
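The three-to-four-times rule for inode-max is simple arithmetic and can be computed in a script. A minimal sketch, assuming a POSIX shell; suggest_inode_max is a hypothetical name, and defaulting to a multiplier of 4 is an arbitrary choice within the range given above.

```shell
# suggest_inode_max is a hypothetical helper: it multiplies a file-max
# value by a factor (3 or 4 per the rule of thumb above; 4 if omitted).
suggest_inode_max() {
    # $1: file-max value, $2: optional multiplier
    echo $(( $1 * ${2:-4} ))
}
```

With the 2.2.16 default of 4096 for file-max, suggest_inode_max 4096 3 gives 12288 and suggest_inode_max 4096 gives 16384.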
--------------------------------------------------------------------------------
Tuning The Network

Linux runs best when networked to other machines, preferably other Linux boxes. There are many important networking parameters to tune in Linux. This is especially true if your Linux box acts as an Internet or intranet server. We will only look at the most important ones here. To my knowledge, there is no complete treatise on network tuning out there, so reading the kernel source is still the best documentation at this time. Maybe one day somebody will document the Linux networking subsystem and its tuning possibilities.

When we speak of networking, we mostly refer to TCP/IP. Linux supports many more protocols, such as the new IPv6, AppleTalk, NetWare, and others. We will only look into TCP/IP-relevant parameters here.

Down in /proc/sys/net/ipv4 you will find icmp_echo_ignore_all and icmp_echo_ignore_broadcasts. Set them to 1 (on) or 0 (off) to control whether the kernel ignores all ICMP ECHO requests, or just those sent to broadcast and multicast addresses. Please note that if you accept ICMP echo requests with a broadcast or multicast destination address, your network may be used as a multiplier for denial-of-service packet flooding attacks against other hosts. So, if you are concerned about security, it is probably better to turn on the ignore feature.

You surely have heard about the ip_forward parameter. It enables or disables forwarding of IP packets between interfaces. Put a "1" in there if you want forwarding.

The ipfrag_high_thresh and ipfrag_low_thresh parameters are important, too. They tell the kernel the maximum memory used to reassemble IP fragments.
When ipfrag_high_thresh bytes of memory are allocated for this purpose, the fragment handler drops packets until usage falls back to ipfrag_low_thresh.

You also have the TCP-only parameters. tcp_keepalive_probes is the number of keep-alive probes TCP sends out before it decides that the connection is broken. tcp_keepalive_time tells the kernel how often TCP sends out keep-alive messages when keep-alive is enabled. The default is two hours. You might want to lower this value to a mere few minutes in a LAN environment, so that broken connections are detected sooner and stop occupying memory.

Another parameter to monitor is gc_stale_time. It controls how often to check for stale ARP entries. After an ARP entry becomes stale, it will be resolved again (useful when an IP address migrates to another machine). When ucast_solicit is > 0, the kernel first tries to send an ARP packet directly to the known host; when that fails and mcast_solicit is > 0, an ARP request is broadcast.

Conclusion

In this article, we have looked at the most important kernel parameters that control the efficiency and speed of your system. Your particular workload determines which values are best. From time to time, it might make sense to review your workload and change the parameters accordingly. Very often, one forgets to revisit the parameters when the hardware or the kernel version is upgraded. Make sure you remember to update the tuning parameters, too. A well-tuned system might actually extend the lifetime of your hardware configuration. And knowing which dials to turn could, if done correctly, save you some hard-earned dollars.

Moshe Bar is an Israeli system administrator and OS researcher who started learning Unix on a PDP-11 with AT&T Unix Release 6 back in 1981. He holds a Master's degree in computer science and writes Unix-related books. Visit his website at http://www.moelabs.com/ </font></big></font></font>
