Tune your Linux

<font><font color="#555555"><b>Copied from a different website: http://tunelinux.pe.kr/bbs/read.php?table=linuxinfo&no=81
</b></font></font>
Credits to the original writer
<font><font color="#555555">
<big><font face="Monotype Corsiva">ㅇ Tuning Linux For Maximum Performance
ㅇ The /proc Directory Structure
ㅇ The Virtual-Memory Parameters
ㅇ The Swap_Cluster
ㅇ Tuning The Network 
   
 
One can safely assume that most people run Linux after installing it 
from one of the common distribution CD-ROMs such as RedHat or Caldera. 
Your machine probably performs faster and more reliably compared to what 
it would do with one of the commercial, proprietary, Microsoft Windows 
variants. Most users are aware that the Linux kernel and the myriad of 
utilities installed from the distribution CD are only very generalized 
versions adapted to run well on a variety of diverse configurations. You 
may feel there is potential for improvement given your particular 
configuration and hardware mix. Well, yes, that's true. 

For one thing, the kernels installed from the RedHat or Caldera distros 
are compiled for the generic i386 Intel CPU, and re-compiling for the 
particular CPU type you have will give you an edge on performance and 
reliability. The sysadmin is also well advised to cut away all unwanted 
fat from the kernel. If you don't have any SCSI hardware installed in 
your system, there is little sense in having drivers for it in your 
kernel binary. The same applies for other hardware types, such as 
network gear, sound, etc. 

Once you have a kernel that suits your particular situation, you can 
then fine-tune your kernel according to your workload and preferences. 

Most of today's Unix variants have self-tuning capabilities that 
generally work quite well. How does an operating system tune itself, you 
ask? 

During the boot phase the kernel queries the system to find out how much 
RAM and swap space is available, and how fast the CPU is. Once it has 
this information, the kernel makes adjustments to cache sizes and 
virtual-memory parameters. 

The system administrator can further refine this self-tuning process or 
over-ride it after boot, during execution. It is sometimes advisable to 
do so because the system administrator has a more intimate knowledge of 
the kind of workload his or her machine is likely to experience in the 
course of an average day. However, be warned: playing with sensitive 
kernel parameters requires some basic understanding of the innards of the 
operating system. Randomly setting values can mess up your system and 
actually degrade performance, or in extreme situations crash your 
system. 

If your system gets totally messed up as a result of "tuning," do not 
worry. On the next boot, the old behavior comes back, since the tuning 
measures apply only until the system is restarted. 

Linux offers two ways to tune the system dynamically at run-time. One is 
the standard sysctl system call (check the sysctl man page). This system 
call can be used from a program or a script (with proper authorization, 
usually as root) and it will immediately apply the changes requested. 
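
As a rough sketch of how the two interfaces relate: the sysctl(8) command-line utility, where it is installed, names each tunable with dots, and the same tunable appears as a file under /proc/sys with the dots replaced by slashes. The helper names sysctl_path and read_tunable below are made up for illustration, and kernel.ostype is just a harmless read-only value to try.

```shell
#!/bin/sh
# Hypothetical helper: convert a dotted sysctl name to its /proc/sys path.
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Hypothetical helper: print the current value, or a note when the
# running kernel does not provide that file.
read_tunable() {
    path=$(sysctl_path "$1")
    if [ -r "$path" ]; then
        cat "$path"
    else
        echo "not available: $path"
    fi
}

sysctl_path net.ipv4.ip_forward   # prints /proc/sys/net/ipv4/ip_forward
read_tunable kernel.ostype
```

Reading needs no special rights for most values; changing them does.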

=====================================================
 The /proc Directory Structure

  
 
Another way to tune the kernel is through the /proc directory 
structure. /proc is a pseudo filesystem used as an interface to kernel 
data structures, rather than reading and interpreting /dev/kmem. Most of 
it is read-only, but some files let kernel variables be changed. There 
is a numerical subdirectory for each running process; subdirectories are 
named by the process ID. Each subdirectory contains the following pseudo-
files and directories: 


cmdline   Command line arguments
environ   Values of environment variables
fd        Directory, which contains all file descriptors
mem       Memory held by this process
stat      Process status
status    Process status in human readable form
cwd       Link to the current working directory
exe       Link to the executable of this process
maps      Memory maps
root      Link to the root directory of this process
statm     Process memory status information

 

Entering more /proc/pid/statm (substituting pid for the relevant process 
ID) shows the memory status of a given process. In fact, most commands 
that query a process (for example, ps and top) use the /proc filesystem 
to produce their output. 
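
A minimal sketch of reading one of these per-process files from a script. The statm file is a single line of numbers counted in pages, of which the first two are the total program size and the resident set size. The function name statm_summary is an assumption for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: label the first two fields of a statm line.
# Both values are counted in memory pages, not bytes.
statm_summary() {
    read -r size resident rest
    echo "total=$size pages, resident=$resident pages"
}

# Inspect the current shell itself, if /proc is mounted.
if [ -r "/proc/$$/statm" ]; then
    statm_summary < "/proc/$$/statm"
fi
```

The same pattern works for any other process ID you are allowed to read.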

In addition to the per-process information, the /proc file system holds 
other systemwide information in the following files: 



apm           Advanced power management info
cmdline       Kernel command line
cpuinfo       Info about the CPU
devices       Available devices (block and character)
dma           Used DMA channels
filesystems   Supported filesystems
interrupts    Interrupt usage
ioports       I/O port usage
kcore         Kernel core image
kmsg          Kernel messages
ksyms         Kernel symbol table
loadavg       Load average
locks         Kernel locks
meminfo       Memory info
misc          Miscellaneous
modules       List of loaded modules
mounts        Mounted filesystems
partitions    Table of partitions known to the system
rtc           Real time clock
slabinfo      Slab pool info
stat          Overall statistics
swaps         Swap space utilization
uptime        System uptime
version       Kernel version

 

Depending on your kernel configuration and installed hardware, the /proc 
directory might also contain the following three subdirectories: net/, 
scsi/, and sys/. 

Reading the content of these subdirectory files gives you a glimpse into 
the kernel tables and processes that manage the overall system 
activities. You can, for instance, check which and how many devices are 
currently configured in your system by doing the following: 



[root@hatta /proc]# more devices
Character devices:
1 mem
2 pty
3 ttyp
4 ttyS
5 cua
7 vcs
10 misc
36 netlink
128 ptm
136 pts
Block devices:
1 ramdisk
2 fd
3 ide0
9 md
[root@hatta /proc]#
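
If you only want the totals rather than the full listing, a short awk filter does the counting. This is a sketch; count_devices is a name invented here, and it assumes the two section headings look exactly as in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: count the entries in each section of a
# /proc/devices style listing read from standard input.
count_devices() {
    awk '
        /^Character devices:/ { section = "char";  next }
        /^Block devices:/     { section = "block"; next }
        NF >= 2 && $1 ~ /^[0-9]+$/ { count[section]++ }
        END { printf "char=%d block=%d\n", count["char"], count["block"] }
    '
}

if [ -r /proc/devices ]; then
    count_devices < /proc/devices
fi
```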

Once you have familiarized yourself with the structure of the /proc file 
system you can then proceed to actually write into these files to change 
the kernel's behavior. To change a value, simply echo the new value into 
the file. An example is given below in the section on the file system 
data. You need to be root to do this. You can create your own boot 
script to get this done every time your system boots. As an example, 
let's look at how to tune your file system for a machine with very many 
online users. 

The values dquot-max and dquot-nr in /proc/sys/fs show the maximum 
number of cached disk quota entries and the number of allocated disk 
quota entries together with the number of free ones, respectively. If 
the number of free cached disk quotas is low and you have a large number 
of users on your system, you can raise the dquot-max limit as necessary. 
This will benefit the performance of the system in file-system operations. 
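
The echo technique can be wrapped in a small function that shows the value before and after the change. This is only a sketch: set_tunable is a made-up name, the value 1024 is arbitrary, and the demonstration writes to a scratch file so it is safe to run as an ordinary user.

```shell
#!/bin/sh
# Hypothetical helper: show the old value, write the new one, show the result.
set_tunable() {
    file=$1
    value=$2
    if [ ! -w "$file" ]; then
        echo "cannot write $file (missing, read-only, or not root)" >&2
        return 1
    fi
    echo "old: $(cat "$file")"
    echo "$value" > "$file"
    echo "new: $(cat "$file")"
}

# Dry run against a scratch file. As root, on a kernel that provides it,
# you would pass /proc/sys/fs/dquot-max instead of "$scratch".
scratch=$(mktemp)
echo 0 > "$scratch"
set_tunable "$scratch" 1024
rm -f "$scratch"
```

Putting the real call into a boot script makes the setting survive a restart.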

There are generally three areas where you might want to tune your 
system: 


The virtual memory 
The file system 
The network 

Let's now see the most common tunable parameters. Since I don't know 
about the shape of your particular system workload, I am not going to 
advise on which values to use, but rather how to use the parameters 
themselves. It is then up to you to find out which value suits your 
needs best. 

===============================================================
The Virtual-Memory Parameters

By Moshe Bar

August 29, 2000

The speed and efficiency with which the kernel manages virtual memory in 
general, and the movement of VM pages to and from swap space in 
particular, have a big impact on overall workload performance in a busy 
system. All relevant parameters can be found in /proc/sys/vm. 

The first important tuning opportunities arise from the bdflush 
parameter. This file controls the behavior of the bdflush kernel daemon. 
This daemon decides under which conditions and where the contents of the 
buffer cache are written back to disk after they have been modified. 

The bdflush file has nine parameters, of which the first three are the 
most important. The first, nfract, dictates the maximum percentage of 
the buffer cache that may hold modified (or dirty, in kernel parlance) 
buffers before bdflush is activated. Setting this value high will cause 
the kernel to delay writing these buffers to disk. On the other hand, it 
will also have more buffers to write to disk at once when memory becomes 
short. 

The second parameter, ndirty, tells the kernel the maximum number of 
dirty buffers bdflush writes to the disk at one time. A high value will 
cause delayed, bursty I/O, while a low value will let the system perform 
smoother I/O to disk. 

The third bdflush parameter is nrefill. It tells the kernel how many 
empty buffers to allocate ahead of actual buffer cache use. The higher 
the value, the more memory will be allocated (which results in less 
available system memory), but also in less frequent allocations later on 
(less work for the kernel). 

As you can see, most of the time it comes down to a decision between 
giving the kernel more memory so that it works less, or vice versa. 
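
To see the three values discussed above on a kernel that still has the bdflush file, something like the following sketch will do. The function name bdflush_summary is invented here, and the fallback line of nine numbers is sample data for illustration, not a recommendation.

```shell
#!/bin/sh
# Hypothetical helper: label the first three of the nine bdflush values.
bdflush_summary() {
    read -r nfract ndirty nrefill rest
    echo "nfract=$nfract ndirty=$ndirty nrefill=$nrefill"
}

if [ -r /proc/sys/vm/bdflush ]; then
    bdflush_summary < /proc/sys/vm/bdflush
else
    # Sample line used only when the running kernel lacks this file.
    echo "40 500 64 256 500 3000 500 1884 2" | bdflush_summary
fi
```

When writing new values back, all nine numbers go on one line in the same order.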

The other important file in the /proc/sys/vm directory is freepages. It 
controls when and how aggressively the kernel will start swapping. This 
file can potentially have a big impact on system performance. 

The first value of this file, min, tells the kernel when to prevent 
programs from allocating more memory. Below that min number of free 
pages, only the kernel will get more memory. 

The second parameter, low, tells the kernel that below this number it 
should start to swap out pages aggressively to the swap space. 

Finally, the third parameter, high, tells the kernel how many free 
memory pages the system should have; below this number the kernel starts 
swapping gently. The kernel will therefore always try to keep this 
number of pages free in the system. 
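
The three thresholds can be read as a simple ladder. The sketch below is my own simplified model of the description above, not kernel code; classify_free and the page counts used in the example calls are assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical model: what the kernel does at a given number of free
# pages, for the min, low and high values read from freepages.
classify_free() {
    free=$1 min=$2 low=$3 high=$4
    if   [ "$free" -lt "$min" ];  then echo "kernel-only allocations"
    elif [ "$free" -lt "$low" ];  then echo "aggressive swapping"
    elif [ "$free" -lt "$high" ]; then echo "gentle swapping"
    else                               echo "no swapping needed"
    fi
}

# Example thresholds (made up): min=128, low=256, high=384 pages.
for free in 100 200 300 500; do
    echo "$free free pages: $(classify_free "$free" 128 256 384)"
done
```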


--------------------------------------------------------------------------------

The Swap_Cluster

One of the most worthwhile values to tune in your system, once you reach 
a RAM shortage, is the third parameter of the kswapd file. 

This parameter, called swap_cluster, tells the kernel the number of 
pages it should transfer to the disk in one go. The higher the number, 
the fewer individual I/O operations will result. Too large a number, say 
512 or so, would actually slow down the system because the request queue 
would get flooded and the I/O would become so big that the kernel needs 
to cut it into smaller pieces. 
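
The trade-off is easy to see with a little arithmetic. Assuming, as a simplification, that every write moves exactly swap_cluster pages, the number of write operations needed for a given number of pages is the rounded-up quotient. The function name io_ops and the figure of 1000 pages are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical model: writes needed to move a number of pages when each
# write carries at most "cluster" pages (integer division, rounded up).
io_ops() {
    pages=$1
    cluster=$2
    echo $(( (pages + cluster - 1) / cluster ))
}

for c in 4 16 32; do
    echo "swap_cluster=$c -> $(io_ops 1000 "$c") writes for 1000 pages"
done
```

The model says nothing about the upper limit; that comes from the request queue, as described above.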

Obviously, all the files in the /proc/sys/vm directory are important and 
the careful system administrator should make sure there is a harmonious 
mix among the parameters according to his or her needs. But the most 
important ones are probably covered here. 

The File System
For workstations or servers with heavy I/O duty, tuning the file system 
might become a necessity. But before actually going into individual 
values, it is imperative to stress that the tuning has to start at the 
hardware selection and configuration phase. Make sure to have SCSI when 
I/O is an issue. SCSI disks perform more efficiently in multi-processing 
environments. Also, make sure to carefully design the file-system layout 
according to the I/O heaviness of files. For example, don't put the swap 
space and the files with the most I/Os on the same disk or the same SCSI 
controller. Try to spread out the I/O according to your needs. 

This is why it is always better to have many smaller disks rather than 
one or two big ones. Having one giant disk, apart from raising security 
and availability concerns, will just concentrate all I/O on that one 
disk. This is comparatively slower than spreading it over several 
smaller ones. 

The file-system tuning opportunities in 2.2 kernels reside in 
/proc/sys/fs. One peculiarity of the 2.2.x series of kernels is that 
the file-system controlling tables can only expand, but are never 
released. Such is true for the parameters we will discuss below. They 
generally should be increased if you have many disks or file systems. If 
I/O is not heavy in your workload (such as in scientific calculations or 
in CAD/CAM applications), you should not need to touch any of these 
files. 

One important file to know is file-max. It tells the kernel how many 
file descriptors (pointers to open files or devices) are allowed 
concurrently. The default is 4096 for kernel version 2.2.16. If you have 
error messages saying there are no more file descriptors, change this 
number. 


inode-state, inode-nr and inode-max

As with file handles, the kernel allocates the inode structures 
dynamically, but can't free them yet. 

The value in inode-max denotes the maximum number of inode handlers. 
This parameter should be three to four times larger than the value in 
file-max, since stdin, stdout, and network sockets also need an inode 
structure in the kernel to handle them. If you regularly run out of 
inodes, you should increase this value. 
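
As a quick worked example of the three-to-four-times rule, using the 2.2.16 file-max default of 4096 mentioned earlier. The function name inode_max_range is invented for this sketch.

```shell
#!/bin/sh
# Hypothetical helper: lower and upper bound for inode-max, taken as
# three and four times the given file-max value.
inode_max_range() {
    file_max=$1
    echo "$(( file_max * 3 )) $(( file_max * 4 ))"
}

inode_max_range 4096   # prints: 12288 16384
```

If you raise file-max later, run the numbers again so that inode-max keeps pace.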


super-nr and super-max

Again, super-block structures are allocated by the kernel, but not 
freed. The file super-max contains the maximum number of super-block 
handlers, while super-nr shows the number of currently allocated ones. 
Every mounted file system needs a super block, so if you plan to mount 
lots of file systems, you may want to increase these numbers. 


--------------------------------------------------------------------------------
Tuning The Network

Linux runs best when networked to other machines, preferably other Linux 
boxes. 

There are many important networking parameters to tune in Linux. This is 
especially true if your Linux box acts as an Internet or an intranet 
server. We will only look at the most important ones here. To my 
knowledge, there is no complete treatise on the subject of network 
tuning out there, so reading the kernel source is still the best 
documentation at this time. Maybe one day, somebody will document the 
Linux networking subsystem and its tuning possibilities. 

When we speak of networking, we mostly refer to TCP/IP. Linux supports 
many more protocols, such as the new IPv6, AppleTalk, NetWare, and 
others. We will only look into TCP/IP-relevant stuff here. 

Down in /proc/sys/net/ipv4 you will find icmp_echo_ignore_all and 
icmp_echo_ignore_broadcasts. 

Set these to 1 (on) or 0 (off) to control whether the kernel ignores all 
ICMP ECHO requests, or just those to broadcast and multicast addresses. 

Please note that if you accept ICMP echo requests with a 
broadcast/multicast destination address your network may be used as a 
multiplier for denial-of-service packet flooding attacks to other hosts. 
So, if you are concerned about security it is probably better to turn on 
the ignore feature. 
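
Before changing anything, it helps to see how the two switches are currently set. A small sketch follows; show_flags is a hypothetical helper that takes a directory and a list of file names and reports each value, or says so when the file is not there.

```shell
#!/bin/sh
# Hypothetical helper: print name=value for each flag file in a directory.
show_flags() {
    dir=$1
    shift
    for name in "$@"; do
        if [ -r "$dir/$name" ]; then
            echo "$name=$(cat "$dir/$name")"
        else
            echo "$name=unavailable"
        fi
    done
}

show_flags /proc/sys/net/ipv4 icmp_echo_ignore_all icmp_echo_ignore_broadcasts
```

To turn the broadcast protection on, root can then echo a 1 into the icmp_echo_ignore_broadcasts file.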

You surely have heard about the ip_forward parameter. What it does is 
enable or disable forwarding of IP packets between interfaces. Put 
a "1" in there if you want forwarding. 

The ipfrag_high_thresh and ipfrag_low_thresh parameters are important, 
too. They tell the kernel the maximum memory used to reassemble IP 
fragments. When ipfrag_high_thresh bytes of memory are allocated for 
this purpose, the fragment handler skips packets until ipfrag_low_thresh 
is reached again. 
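
The two thresholds form a high and low water mark. The sketch below is my own simplified model of that behavior, not kernel code: frag_decision is an invented name, and the byte values in the example calls are sample figures only.

```shell
#!/bin/sh
# Hypothetical model of the high/low water marks.
#   used     - bytes currently held for reassembly
#   high/low - the two thresholds
#   dropping - "yes" if the handler is already skipping packets
frag_decision() {
    used=$1 high=$2 low=$3 dropping=$4
    if [ "$used" -ge "$high" ]; then
        echo drop
    elif [ "$dropping" = yes ] && [ "$used" -gt "$low" ]; then
        echo drop      # still above the low mark, keep skipping
    else
        echo accept
    fi
}

frag_decision 262144 262144 196608 no    # high mark reached
frag_decision 200000 262144 196608 yes   # between marks while dropping
frag_decision 196608 262144 196608 yes   # back down at the low mark
```

The gap between the two values keeps the handler from flipping between the two states on every packet.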

You also have the TCP-only parameters: 


tcp_keepalive_probes, which is the number of keep-alive probes TCP sends 
out, until it decides that the connection is broken. 

tcp_keepalive_time, which tells the kernel how long a connection may sit 
idle before TCP starts sending keep-alive messages, when keep alive is 
enabled. The default is two hours. You might want to lower this value in 
a LAN environment to a mere few minutes, because dead connections that 
go undetected keep occupying memory. 
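
Taken together, the two keep-alive values determine roughly how long a dead peer goes unnoticed: the idle time, plus one probe interval for every unanswered probe. The interval between probes is not covered in this article, so it appears below as an assumed third input; the function name and the sample figures of 9 probes and 75 seconds are likewise illustrative.

```shell
#!/bin/sh
# Hypothetical estimate, in seconds, of the time until a dead connection
# is dropped: idle time + probes * interval between probes.
keepalive_detect_seconds() {
    idle=$1 probes=$2 interval=$3
    echo $(( idle + probes * interval ))
}

keepalive_detect_seconds 7200 9 75   # two-hour idle time
keepalive_detect_seconds 300 9 75    # five-minute idle time for a LAN
```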

Another parameter to monitor is gc_stale_time, found in the 
per-interface directories under /proc/sys/net/ipv4/neigh. It controls 
how often to check for stale ARP entries. After an ARP entry is stale it 
will be resolved again (useful when an IP address migrates to another 
machine). When ucast_solicit is > 0 it first tries to send an ARP packet 
directly to the known host; when that fails and mcast_solicit is > 0, an 
ARP request is broadcast. 

Conclusion
In this article, we have looked at the most important kernel parameters 
to control the efficiency and speed of your system. Your particular 
workload determines which values are best. From time to time, it might 
make sense to review your workload and change the parameters again 
correspondingly. 

Very often, one forgets to change the parameters when the hardware or 
the kernel versions are upgraded. Make sure you remember to update the 
tuning parameters, too. 

A well-tuned system might actually extend the lifetime of your hardware 
configuration. And knowing which dials to turn could, if done correctly, 
save you some hard-earned dollars.

Moshe Bar is an Israeli system administrator and OS researcher, who 
started learning Unix on a PDP-11 with AT&T Unix Release 6 back in 1981. 
He holds a Master's degree in computer science and writes Unix-related 
books. Visit his website at http://www.moelabs.com/



</font></big></font></font>
