A management node can be used to access the consoles of
all of the cluster nodes and to monitor their status. This is particularly useful (even necessary)
in the event that a node loses network connectivity or if a node stops responding
altogether.
Keyboard/Video/Mouse (KVM) switches are one method used to simplify access to the consoles of multiple cluster nodes. With this method, a keyboard, mouse, and video cable is run from the back of each node and connected to a KVM switch. Then a keyboard, video, and mouse cable is run from a designated port on the KVM switch to the management node. A special keyboard sequence is used to obtain an on-screen listing of the available systems on the KVM switch from which the administrator can choose.
Advantages of using a KVM switch are that there are usually no modifications needed to the operating system or BIOS. There is less of a need to understand terminal emulation, and there are no complications with properly viewing long or scrolling messages (as can sometimes happen with serial consoles).
Disadvantages of using a KVM switch include increased cable management, although some KVM switches use an adapter to combine the keyboard/video/mouse cables into one cable. Also, KVM switches may not support remote access. Usually this capability is found only in “enterprise” level KVM switches, which are more expensive.
A serial port concentrator, also known as a serial port multiplexer, is used as a sort of switch for all of the serial connections that are used (either for an EMP or for serial consoles). The serial port concentrator will allow you to log into it and access any one of the connected systems. You will need a serial port concentrator if you are planning on using an EMP or serial console for your systems. Usually, there is not much involved in setting up a serial port concentrator, although the interface and associated commands will vary from vendor to vendor. Unfortunately, serial port concentrators are sometimes sold for other purposes as well, such as terminal servers, so there may be a lot of functionality that you do not need. This can be confusing as you read through the documentation on these products or try to find one to purchase.
A serial console provides access to a cluster node’s console by using one of its serial ports. With this method, a cable is run from a serial port on each node to a serial port concentrator. A cable is then run from a port on the serial port concentrator to a serial port on your management node. You also have the option to run a cable from the serial port concentrator to a network switch. Unlike using a KVM, you will need to make changes to the configuration of your system to use a serial console.
Advantages of this method are the ease of cable management (just one cable per system). Also, many port concentrators allow remote access.
Disadvantages of using a serial console include the number of changes required to the operating system and the BIOS.
In order to properly interact with your system via a serial console from boot time to login, you will need to address four distinct stages of the startup:
1. The system loads the BIOS.
2. The BIOS transfers control to the boot loader.
3. The boot loader loads the kernel.
4. The kernel starts user-space processes that provide a login prompt for connections over the serial ports.
As you start to modify your system’s settings to enable
serial console support, it will be very helpful to you if you keep your keyboard
and monitor attached until after you have completed the entire process. This way you will be able to see how each component
(i.e., the BIOS, boot loader, kernel, and OS) is behaving. For example, once you enable serial port redirection
in your BIOS, your BIOS may allow you to interact with it over the serial
port in addition to the attached
keyboard/monitor. However, your boot
loader, once you have instructed it to use the serial port, may only allow
you to interact with it over the serial port (you may not see anything on
an attached monitor). Knowing these
behaviors can help you troubleshoot serial console problems and help you avoid
confusion when you see some output on your serial console only and some on
both your serial console and an attached monitor.
In order for a serial connection to effectively replace a directly attached keyboard/monitor, you will need to be able to interact with your cluster node’s BIOS via the serial port. Unfortunately, many computer systems do not have a feature allowing you to redirect your BIOS interaction through a serial port. Many “server” computer systems offered by various vendors do provide a system BIOS that supports serial consoles; however, this is not always the case. You will need to check your BIOS’ support for serial consoles. If your BIOS does not support console redirection, you will not be able to interact with your BIOS without using, for instance, an attached monitor (perhaps via a KVM).
If your BIOS supports it, enable
support for console redirection in your BIOS.
The procedure for doing this will vary from BIOS to BIOS but should
be just like changing any other BIOS setting.
This means that you will need to find the appropriate menu and then
select the entries related to serial port redirection. If you have a choice, pick the type of terminal
emulation that is used (e.g., vt102, ansi, etc.). This
must match the terminal emulation used by your terminal application (e.g.,
minicom). The
emulation you use does not matter much as long as the sending and receiving
ends agree on it. Each component (including
the serial port concentrator) will also need to agree on the speed of the
serial connection (e.g., 9600, 115200 baud). Obviously, with higher speeds you will have
a better experience; but the limiting factor is the maximum speed allowed
by any of the involved components.
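The speed constraint described above can be sketched in a few lines of Python. The component names and maximum speeds below are made-up values for illustration only; the point is simply that the usable speed is the highest standard rate that every component in the chain supports.

```python
# Standard serial speeds, lowest to highest.
STANDARD_SPEEDS = [9600, 19200, 38400, 57600, 115200]


def common_speed(max_speeds):
    """Return the highest standard speed that every component can handle.

    max_speeds maps a component name to the maximum speed it supports.
    """
    limit = min(max_speeds.values())
    usable = [s for s in STANDARD_SPEEDS if s <= limit]
    if not usable:
        raise ValueError("no standard speed is supported by all components")
    return usable[-1]


# Hypothetical limits for each component in the chain.
example = {
    "bios": 115200,
    "boot loader": 115200,
    "concentrator": 38400,
    "terminal application": 115200,
}
print(common_speed(example))  # 38400
```

In this made-up case the concentrator is the limiting component, so every other component would be configured for 38400 as well.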
At this point, if you reboot the system and monitor the
serial console, you should see the BIOS output and be able to enter your BIOS
by hitting the appropriate keys. Each
vendor may supply you with special key combinations to substitute for keys
that are not supported by your terminal emulation (e.g., F10, F11, F12 for vt100). Depending
on your BIOS, you may also see output on an attached monitor.
Take note of your BIOS’ behavior for future reference.
After the BIOS is done loading, however, no more output will be sent
to your serial console. When the BIOS
transfers control to the boot loader, you will only be able to see/interact
with the boot loader on an attached monitor.
The same goes for the kernel and OS.
We will configure these components to use the serial console in the
following sections.
When the BIOS has finished its
tasks, it transfers control of the cluster node to the boot loader. The purpose of the boot loader is to load a
kernel. You generally see a message from the boot loader
(e.g., “lilo:”) that you can use to select the
kernel to use. You need to be able
to view these messages (and provide input) via the serial port. The boot loader can be configured to do this.
Once you select a kernel (or the default kernel is loaded), the kernel will print out a lot of information as it probes and initializes your system. The Linux kernel allows you to redirect these messages to a serial port. This is done by passing command line arguments to the kernel. You can do this by instructing the boot loader to do so.
As a result, to allow interaction with the boot loader and kernel over a serial port, we need to make changes to the boot loader’s configuration file. We present the changes needed for two popular boot loaders: LILO and GRUB.
To redirect LILO’s output to a serial port, you will make changes in the global section of /etc/lilo.conf. We will also add (or modify) an append statement for the kernel in the global section of /etc/lilo.conf. If for some reason you do not want all kernels to redirect their output, add an append statement to the appropriate image section(s) instead of the global section of /etc/lilo.conf.
To interact with LILO via the serial console, you will need to make the following changes to the global section of /etc/lilo.conf:
# Comment out message=/boot/message if /boot/message is not a text file
# message=/boot/message
# Add the line that tells LILO to send its output to / get its input from
# the serial port.
# The first parameter is the serial port used,
# e.g. for ttyS0, enter 0; for ttyS1, enter 1
# The second parameter is the speed to use (see the lilo.conf(5)
# man page for other possible parameters). Use the highest
# speed supported by your components for best results.
serial=0,9600
To instruct the kernel to send its output to the serial console, add the following line to the global section of /etc/lilo.conf:
# The kernel sends its output to all consoles defined below.
# However, the last one defined is the one the kernel sets
# as the default console for stdout (where user programs send
# their output by default). So you will want to list the serial
# console as the last console. Substitute the correct serial console
# below (if you use ttyS1, replace all ttyS0 below).
append="console=tty0 console=ttyS0,9600"
Your changes will not be effective until you run the following
command to install the new boot record:
/sbin/lilo
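The relationship between the serial device name and the two lilo.conf lines can be illustrated with a short Python sketch. The function name lilo_serial_lines is hypothetical and is not part of LILO; it only derives the port number and the console name from the same device so that the two lines cannot disagree.

```python
import re


def lilo_serial_lines(device, speed):
    """Build the two global lilo.conf lines for a serial console.

    device is a Linux serial device name such as "ttyS0".
    """
    match = re.fullmatch(r"ttyS(\d+)", device)
    if match is None:
        raise ValueError("expected a device name like ttyS0")
    port = int(match.group(1))
    return [
        # LILO's own input/output: port number, then speed.
        f"serial={port},{speed}",
        # Kernel consoles: the serial console is listed last.
        f'append="console=tty0 console={device},{speed}"',
    ]


for line in lilo_serial_lines("ttyS1", 9600):
    print(line)
```

The printed lines would still need to be placed in the global section of /etc/lilo.conf by hand, followed by a run of /sbin/lilo.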
At this point, if you reboot your system and monitor the
serial console, you will see the BIOS output (as configured in the last section)
followed by the “lilo:” prompt. You should be able to press the shift
key and select a kernel. If you can do this, the boot loader is successfully
using the serial console. Once a kernel
is selected, you will see the message indicating that the kernel is being
loaded. After this one line is finished,
the kernel takes over and all of the kernel’s probe/initialization messages
should be displayed on the serial console. If you see the line stating that the kernel
is loading, but then nothing else happens, you have configured the boot loader
correctly but there is a problem with the configuration of the kernel’s serial
port redirection. As mentioned earlier,
it would be good to have a monitor/keyboard attached during this process. In this case, you can at least see that the
system is still booting (if the messages are appearing on the monitor) even
if you do not see anything on the serial console.
In addition, you will need the keyboard/monitor to log onto the system
(unless you log in over the network) until you configure your system for login
over the serial console (discussed in a following section).
If you have successfully redirected the output from your kernel, you will see all of its messages over the serial console. After the kernel finishes all of its tasks, it starts the first user-space process, init. The init process is responsible for, among other things, starting the processes that provide you with a login prompt. However, we have not yet taken the steps necessary to provide a login prompt to users connecting via a serial port. This will be covered in a later section. Even though you will not be able to log in over the serial console yet, you will be able to see, over the serial console, all of the output from init and all of the other startup processes since the kernel has set the default console to the serial port we specified in the boot loader’s configuration file.
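When troubleshooting, it can help to confirm which console the kernel was actually told to treat as the default. The following is a minimal Python sketch, assuming the rule stated above (the last console= argument wins); the function name default_console is our own. On a running Linux node, the command line the kernel was booted with can be read from /proc/cmdline.

```python
def default_console(cmdline):
    """Return the device named by the last console= argument, or None."""
    device = None
    for word in cmdline.split():
        if word.startswith("console="):
            # Drop options such as the speed that follow the comma.
            device = word[len("console="):].split(",")[0]
    return device


# Example command line; the root device shown here is hypothetical.
print(default_console("ro root=/dev/hda1 console=tty0 console=ttyS0,9600"))
```

If this reports tty0 rather than your serial port, the console arguments are most likely listed in the wrong order.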
To get GRUB to use a serial console, add the following lines to the GRUB configuration file:
# The serial line gives GRUB information about the serial port
# unit=0 corresponds to ttyS0 and unit=1 to ttyS1
serial --unit=0 --speed=9600
# The terminal line tells GRUB which consoles are available
# serial is the default (since it is listed first)
# and console indicates the keyboard/monitor
# The timeout is how long to wait for input from one of the
# listed consoles before using the default
terminal --timeout=10 serial console
# Comment out the splashimage line (the graphical interface to GRUB)
# splashimage=<path to image>
To configure your kernel to use a serial console, modify your kernel lines in the GRUB configuration file as follows:
# The kernel sends its output to all consoles defined below.
# However, the last one defined is the one the kernel sets
# as the default console for stdout (where user programs send
# their output by default). So you will want to list the serial
# console as the last console. Substitute the correct serial console
# below (if you use ttyS1, replace all ttyS0 below).
kernel <existing kernel parameters> console=tty0 console=ttyS0,9600
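If you have many kernel lines to update, the edit can be expressed as a small text transformation. The Python sketch below is only an illustration: add_serial_console is a hypothetical helper, and the example kernel line is made up. It removes any existing console= arguments before appending the new ones, so applying it twice gives the same result as applying it once.

```python
def add_serial_console(kernel_line, device="ttyS0", speed=9600):
    """Return a GRUB kernel line ending in console=tty0 console=<serial>.

    Existing console= arguments are dropped first, and any leading
    indentation is preserved.
    """
    stripped = kernel_line.lstrip()
    indent = kernel_line[: len(kernel_line) - len(stripped)]
    words = [w for w in stripped.split() if not w.startswith("console=")]
    # The serial console goes last so that it becomes the default console.
    words += ["console=tty0", f"console={device},{speed}"]
    return indent + " ".join(words)


# Hypothetical kernel line from a GRUB configuration file.
print(add_serial_console("\tkernel /vmlinuz ro root=/dev/hda1"))
```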
At this point, if you reboot your system and monitor the serial console, you will see the output from your BIOS (as configured in the last section) and you will receive the prompt from GRUB to select a kernel. If you are able to do this, you have configured the boot loader correctly. After GRUB has loaded the kernel, control of the system is handed over to the kernel. Thus you should start to see all of the probe/initialization messages from the kernel over the serial console. If you can use the GRUB menu to select a kernel over the serial console, but you do not see the kernel probe/initialization messages over the serial console, you have configured the boot loader correctly but there is a problem with the configuration of the kernel’s serial port redirection. As mentioned earlier, it would be good to have a monitor/keyboard attached during this process. In this case, you can at least see that the system is still booting (if the messages are appearing on the monitor) even if you do not see anything on the serial console. In addition, you will need the keyboard/monitor to log onto the system (unless you log in over the network) until you configure your system for login over the serial console (discussed in a following section).
When the system starts to boot the OS, the init process is started. The init process will read through the file /etc/inittab to do a number of essential startup tasks. One of these tasks is to start “getty” processes [1] to wait for users to log on. A getty process displays the “login:” prompt where you enter your username. The getty process, in turn, starts up a login process that prompts you for your password.
In order to log in over a serial console, a getty process needs to be waiting for someone to connect via the serial port. In Linux, the device names for the serial ports are /dev/ttyS0, /dev/ttyS1, etc. We can instruct the init process to start a getty process for this purpose by adding a line to the /etc/inittab file as follows:
# Run gettys in standard runlevels
S0:12345:respawn:/sbin/agetty -h ttyS0 9600 vt100
1:2345:respawn:/sbin/mingetty tty1
2:2345:respawn:/sbin/mingetty tty2
The lines starting with “1:” and “2:” above are examples of the getty processes that are used by an attached keyboard/monitor. We added another line starting with “S0:” to start a getty process to monitor the serial port ttyS0. “S0” is used as the label since we used ttyS0 in the example. You might want to use “S1” if you use ttyS1.
The field after the label (“12345”) indicates that this getty process should be started in runlevels 2, 3, 4 and 5 as well as runlevel 1 (single-user mode). “respawn” indicates that when the process exits it should be restarted by the init process. The final field specifies that we should use the agetty program to monitor for connections. The agetty program is a getty process that is appropriate for monitoring serial lines. As indicated above, you will at least need to specify the serial port to use, the speed at which to communicate, and the terminal emulation to use. In our example, these parameters are ttyS0, 9600 baud, and vt100, respectively. Consult the agetty man page for other parameters that might be appropriate for you. Once you have made this change, tell the init process to reread /etc/inittab (type telinit q to do this).
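The four colon-separated fields of an inittab entry can be made explicit with a short Python sketch. Both function names are hypothetical helpers written for this illustration; the agetty arguments simply mirror the example entry above.

```python
def parse_inittab_entry(line):
    """Split an inittab entry into its four colon-separated fields."""
    label, runlevels, action, process = line.split(":", 3)
    return {
        "label": label,
        "runlevels": runlevels,
        "action": action,
        "process": process,
    }


def serial_getty_entry(device, speed, term="vt100"):
    """Build an inittab entry that runs agetty on a serial device."""
    label = device.replace("tty", "", 1)  # ttyS0 -> S0
    return f"{label}:12345:respawn:/sbin/agetty -h {device} {speed} {term}"


entry = serial_getty_entry("ttyS1", 9600)
print(entry)
print(parse_inittab_entry(entry))
```

Deriving the label from the device name keeps the two in step, which follows the convention of using “S1” for ttyS1.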
The file /etc/securetty lists all of the consoles from which root can log in. Thus, now that you have created a getty process to monitor the serial line, you will want to add the name of this serial line to the list in /etc/securetty. For example, if you started a getty process on ttyS0, you will add ttyS0 on its own line in /etc/securetty.
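Because /etc/securetty is a plain list with one device per line, the change is easy to make repeatable. The following Python sketch works on the file contents as a string and leaves reading and writing the file to you; ensure_securetty is a hypothetical name chosen for this example.

```python
def ensure_securetty(contents, device):
    """Return securetty contents with device listed on its own line.

    The contents are returned unchanged (apart from a normalized final
    newline) if the device is already listed.
    """
    lines = contents.splitlines()
    if device not in (line.strip() for line in lines):
        lines.append(device)
    return "\n".join(lines) + "\n"


print(ensure_securetty("tty1\ntty2\n", "ttyS0"), end="")
```

Running the function again on its own output adds nothing, so it is safe to apply to every node regardless of whether the entry is already present.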
At this point, if you reboot the system and monitor the
serial console, you will see the messages from your BIOS; you will be able
to interact with your boot loader; you will see all of the kernel’s probing/initialization
messages; and you will see all of the output from your startup scripts.
Finally, the getty process you just created will display the “login:” prompt to you and the normal
login process will continue. Both normal
users and root should be able to log in via the serial console. You should now be able to completely control
your system over a serial connection from BIOS configuration to login.
A remote management port is a special connection on a system that can allow certain services, such as power cycling, to take place remotely. Many vendors offer proprietary cards to insert into your system to obtain a remote management port. Some motherboards also provide an Emergency Management Port (EMP) that is usually accessed through one of the serial ports. The method you use to access these cards will vary by vendor. With systems that have an EMP, a cable can be run from the appropriate serial port to a serial port concentrator. Whichever form of remote management port is used, this is an extremely useful feature to have to ensure that a system can be managed remotely. The serial console is helpful in every situation except when the system is locked up. In that case, remote access to the serial console will not help; you will need a way to remotely cycle the power on the system or to run other diagnostics (which may be vendor specific).
The advantages of using a remote management port are clear. You will have the ability to remotely cycle the power and possibly perform other diagnostics on a system regardless of the state of the OS.
If you choose to or need to use a vendor-supplied card,
the disadvantage will be the cost of these cards.
Simple Network Management Protocol (SNMP) is a very useful tool that is used by many vendors to report various statistics and information about their hardware (and software). At the management stations, you may find it helpful to use SNMP to collect various vital statistics about your cluster systems. This will give you a visual indicator of any problems that may be occurring on a particular system in the cluster. The information you can collect will depend on the information provided by your vendor. Usually only server products will provide useful information whereas systems designed for home use may not provide any information. There are other products, such as BigBrother, that can monitor your systems using simple methods, such as pinging your systems, and indicating any systems which may be having problems.
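The simplest form of monitoring mentioned above, pinging each system and reporting the ones that do not answer, can be sketched in Python as follows. This is not how any particular monitoring product works; the function names are our own, the node names are made up, and the ping flags (-c for the count, -W for the timeout in seconds) assume the Linux ping command.

```python
import subprocess


def ping_once(host):
    """Return True if host answers a single ping (Linux ping flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def unreachable_nodes(hosts, probe=ping_once):
    """Return the hosts for which the probe reports no answer."""
    return [host for host in hosts if not probe(host)]


# Example with a stand-in probe so the sketch runs without a network.
fake_status = {"node01": True, "node02": False}
print(unreachable_nodes(fake_status, probe=fake_status.get))  # ['node02']
```

On a management node you would call unreachable_nodes with your real node names and the default probe, for example from a periodic job.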
8.2 User Accounts
Methods for creating user accounts and propagating them
across the cluster will be discussed. This includes an
introduction to the authentication process (PAM), password
files, and NIS(+). Other methods such as LDAP will only be given
an overview with no implementation details.
8.2.1 Authentication Process
8.2.2 NIS and NIS+
8.2.3 Overview of LDAP
8.3 Software and Configuration File Consistency
Maintaining copies of software and configuration files
across nodes can consume a considerable amount of time and
effort.
8.3.1 CF Engine
8.4 Node Cloning
Different strategies for cloning the nodes will be
introduced. This includes the kickstart utility from Red Hat,
dump utilities, tar, SystemImager, and Ghost. The creation of a
boot disk that has the correct utilities for network building or
restores of compute nodes will be given. Other methodologies
such as bootp and the Intel network card boot utility will be
presented.
8.4.1 Boot Disk Creation
8.4.2 Kickstart
8.4.3 Filesystem dump utility
8.4.4 GNU tar
8.4.5 SystemImager
8.4.6 Ghost
8.5 High Availability
Failover of cluster resources will be presented in this
chapter.
8.5.1 Data and System Backup
8.5.2 Availability of headnode
8.5.3 Compute Node Failure
8.6 Security
For a system that will be connected to the internet it is
very important that certain steps be followed to ensure the
security of the cluster. SSH, tcp_wrappers, inetd, and auditing
of system daemons will be introduced.
[1] There are a number of actual programs that are referred to as “getty” processes, such as getty, agetty, and mingetty. We will refer to the use of any one of these as the use of a getty process.