Friday, September 12, 2008

Setting Up TPC-H: Part 2

Now that we have the tools to create the data, let's create a place in the database to put it and then insert the data.

Step 1: Create TPCH Tablespace and user/schema
Create a dedicated tablespace and a schema to hold and access this data - here both are simply called TPCH. As the SYS user:

SQL> CREATE SMALLFILE TABLESPACE "TPCH" DATAFILE '/u01/app/oracle/oradata/BRS01/tpch.dbf' SIZE 1000M AUTOEXTEND ON NEXT 10M MAXSIZE UNLIMITED LOGGING EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO;

SQL> CREATE USER "TPCH" PROFILE "DEFAULT" IDENTIFIED BY "password" DEFAULT TABLESPACE "TPCH" TEMPORARY TABLESPACE "TEMP" QUOTA UNLIMITED ON "TPCH" ACCOUNT UNLOCK;
SQL> GRANT "CONNECT" TO "TPCH";
SQL> GRANT CREATE TABLE TO "TPCH";
SQL> GRANT CREATE VIEW TO "TPCH";

Step 2: Create the tables
TPCH comes with two files (dss.ddl and dss.ri) that contain the DDL and referential integrity setup. However, since we will use the "direct path" option of SQL*Loader (sqlldr) to put the data into the database, it doesn't make sense to have any primary or foreign keys in place when loading. Run this script (as the tpch user) to create all the required tables.

Step 3: Generate and load data into database
Jeff Moss has put together a "wrapper" script that uses dbgen to create and store the data in flat files and then calls Oracle's sqlldr to put the data into the database - see details here. Here is what needs to be done
  • Download the control files (*.ctl) and the two scripts and put them in the same directory as tpch, i.e. where dbgen and qgen are located. Due to the wiki tool used on Jeff's page the naming is a bit mangled - just rename the files using lower case and a .ctl or .sh extension.
  • Run the scripts, following Jeff's examples almost verbatim. Obviously, use a connection string appropriate for your own database and pay attention to the last parameter - it is the total number of (parent + child) processes created, i.e. the parallel streams used to create and load the data. Too high a number here can very easily bring a system to its knees - my rule of thumb is to make this equal to the number of CPU cores. The first parameter is the TPC-H scale factor: 1 ~ 1 GB database, 10 ~ 10 GB database etc.
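The "number of CPU cores" rule of thumb is easy to look up from the shell. This is only a sketch: suggest_streams is a name I made up (it is not part of Jeff's scripts), and it assumes getconf knows about _NPROCESSORS_ONLN, falling back to 1 if it doesn't.

```shell
# Print the number of online CPU cores, falling back to 1 if it cannot
# be determined. suggest_streams is a hypothetical helper name.
suggest_streams() {
    cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || cores=1
    case "$cores" in
        ''|*[!0-9]*) cores=1 ;;   # empty or non-numeric answer
    esac
    echo "$cores"
}

echo "Suggested number of parallel streams: $(suggest_streams)"
```

Pass the printed value as the last parameter of the wrapper script, or something lower if the machine has other work to do.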
Step 4: Create primary keys, foreign keys and indexes
These constraints are specified in the dss.ri file of TPCH. Unfortunately, some syntactic idiosyncrasies and outdated schema names mean that this is not simply plug 'n play on Oracle. To make life easier I created an Oracle-compatible script that will set up all the primary and foreign keys - the script is here.

Setting Up TPC-H: Part 1

TPC-H is the data warehouse benchmark of the Transaction Processing Performance Council (their web site has lots of results submitted by vendors trying to display the prowess of their hardware and/or software). As in the case of all benchmarks, TPC-H is not perfect - it's not even a star schema, so it doesn't really represent 99% of real data warehouses, and comparing results between systems, groups, companies etc. is fraught with difficulty and complication - but we're gonna do it anyways! Just remember all the usual benchmark caveats.

Download and untar the files from the TPC-H web site, then copy (or rename) makefile.suite to makefile and edit the four lines that specify the compiler on your system (CC), the database, the machine and the workload:

CC = gcc
DATABASE= ORACLE
MACHINE = LINUX
WORKLOAD = TPCH
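If you would rather script the makefile edit than do it by hand, something like the following should work. It is only a sketch: configure_tpch_makefile is a name I made up, and it assumes each of the four variables starts a line in makefile.suite in the form NAME = value.

```shell
# Copy makefile.suite to makefile with the four build variables filled in.
# configure_tpch_makefile is a hypothetical helper; the values are the ones
# used in this post.
configure_tpch_makefile() {
    src=${1:-makefile.suite}
    dst=${2:-makefile}
    sed -e 's/^CC[[:space:]]*=.*/CC = gcc/' \
        -e 's/^DATABASE[[:space:]]*=.*/DATABASE = ORACLE/' \
        -e 's/^MACHINE[[:space:]]*=.*/MACHINE = LINUX/' \
        -e 's/^WORKLOAD[[:space:]]*=.*/WORKLOAD = TPCH/' \
        "$src" > "$dst"
}
```

Run it from the dbgen directory as configure_tpch_makefile makefile.suite makefile and eyeball the result before typing make.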

When setting the database you'll notice that there is no predefined type for Oracle. Huh? The company that has 45% of the RDBMS market is not listed here? Either Oracle requested this or the TPC guys are extremely biased in favor of IBM or Microsoft (their web site does use ASP.NET :-). Because of this we need to define an Oracle section ourselves. Edit tpcd.h and add a section for Oracle (with all variables defined as empty strings; this is the simplest setup that works)

#ifdef ORACLE
#define GEN_QUERY_PLAN ""
#define START_TRAN ""
#define END_TRAN ""
#define SET_OUTPUT ""
#define SET_ROWCOUNT ""
#define SET_DBASE ""
#endif /* ORACLE */

Then just type make to compile. This generates two executables, dbgen and qgen, which are used, respectively, to generate the flat files for loading into the database and the queries to run. See the README for gory details.

Thursday, September 11, 2008

SQLPlus (Shared Library) Not Found! (Linux, 10g2)

After installing Oracle database 10g2 on Linux, you may get the following error when trying to invoke sqlplus

$ sqlplus
sqlplus: error while loading shared libraries: libsqlplus.so: cannot open shared object file: No such file or directory

Mmmm, that's strange, the install went perfectly. This error occurs under the following conditions
  • The user invoking sqlplus is not the oracle user (or a member of the oinstall group)
  • No Metalink patches have been applied yet
So let's do some investigation

$ su -
Password:
~ # locate libsqlplus.so
/u01/app/oracle/product/10.2.0/db_1/lib/libsqlplus.so
~ # ll /u01/app/oracle/product/10.2.0/db_1/lib/libsqlplus.so
-rw-r----- 1 oracle oinstall 1047293 Jun 22 2005 /u01/app/oracle/product/10.2.0/db_1/lib/libsqlplus.so

So clearly the shared object libsqlplus.so is there but note that it is only readable by the oracle user and members of oinstall group. This is a bug in the Linux release - it does seem to indicate some pretty poor quality control on the part of Oracle's release engineering team. I would have thought that this type of "correct permissions" error is part of the standard tests run immediately before a new version is OK'd for release.......Oracle, you do have such standard checks?

A quick scan through $ORACLE_HOME will show that some of the executables are not actually executable by everyone (permissions are rwxr-x---, when they should be rwxr-xr-x). If the installation is for training, experimenting etc and thus has no license, it can be easily fixed by simply doing

# chmod -R a+rX $ORACLE_HOME

The capital X gives everyone (all) execute permission only on directories and on files that already have execute permission for someone (here, the owner). Plain data files are made readable but not executable.
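To see what the capital X does without touching a real $ORACLE_HOME, here is a small throwaway demonstration in a temporary directory. The file names are made up, and stat -c is the GNU coreutils form (BSD/Mac OS X stat uses different flags).

```shell
# Show the effect of "chmod -R a+rX" on an executable and a plain file.
# demo_capital_x, "tool" and "data" are hypothetical names for illustration.
demo_capital_x() {
    dir=$(mktemp -d) || return 1
    touch "$dir/tool" "$dir/data"
    chmod 750 "$dir/tool"    # executable by owner and group only
    chmod 640 "$dir/data"    # plain file, not executable by anyone
    chmod -R a+rX "$dir"
    echo "tool: $(stat -c %a "$dir/tool")"
    echo "data: $(stat -c %a "$dir/data")"
    rm -rf "$dir"
}

demo_capital_x
```

The executable ends up world-readable and world-executable while the data file only gains read permission, which is exactly what we want for the Oracle libraries and binaries.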

If it is a licensed version then download patch 4516865 which will install a script called $ORACLE_HOME/install/changePerm.sh, which you obviously need to run. But this script has a bug too - it does not update $ORACLE_HOME/lib/libclntsh.so.10.1 - you can just chmod 755 this file manually. Not sure if there is another Metalink patch to do this "officially".

Wednesday, September 10, 2008

SQLPlus, RMAN etc. with Command History and Auto-Completion

Oracle's sqlplus command line interface to the database is fairly primitive and lacks two of the most often used and powerful features found in shells - a history of commands and auto-completion of keywords (upon hitting the tab key). These features can be added quite easily.

Command History
Hans Lub has written a readline wrapper called rlwrap that allows for editing of any keyboard input. RPM packages do not seem to be widely available so I built one myself - it is available here (version 0.30, built on 32 bit CentOS on 10-Sep-2008, requires readline version >= 4.2). To install, simply do

rpm -ivh rlwrap-0.30-1.i386.rpm

See the man page for details and usage examples.

Tab Auto-Completion
Johannes Gritsch has produced some extensions to rlwrap. These extensions consist of
  • The list of Oracle keywords, names of all V$ views, complete data dictionary, DBMS_* and UTL_* packages, SQL functions etc.
  • A shell script called sql+ that tells rlwrap what valid SQL word delimiters are (since rlwrap was written for the bash shell there are some differences e.g. $ and # are delimiters in bash but not in SQL).
Since the names of Oracle "variables" differ across versions there are extensions available for 9i, 10g and 11g. Download them here and install as follows (for 10g on CentOS/Redhat/Fedora):

wget http://www.linuxification.at/download/rlwrap-extensions-1.00.tar.gz
mkdir rlwrap-ext
tar xzf rlwrap-extensions-1.00.tar.gz -C rlwrap-ext
chown -R root:root rlwrap-ext
sed -e 's#/usr/local/share/rlwrap#/usr/share/rlwrap#' rlwrap-ext/sql+ > rlwrap-ext/sql+.tmp
mv rlwrap-ext/sql+.tmp rlwrap-ext/sql+
chmod 755 rlwrap-ext/sql+
cp rlwrap-ext/sql+ /usr/local/bin/

cp rlwrap-ext/asmcmd /usr/share/rlwrap/
cp rlwrap-ext/rman /usr/share/rlwrap/
cp rlwrap-ext/sqlplus* /usr/share/rlwrap/

To use it, just do one of the following which then invokes rlwrap with the correct options for SQL and keyword list for tab auto-completion

sql+ username/password
sql+ # without arguments it assumes the SYS user connecting as SYSDBA

Nice and simple! Thanks Hans and Johannes - time permitting I'd like to put these two programs together into one RPM package......

Extracting a File from RPM Package

Ever wanted to see the contents of just one file inside an RPM package but didn't want to actually install the package? Then simply do the following, which will extract all the files into the current directory

rpm2cpio package_name.rpm | cpio -idm

You can also add the "v" option (cpio -idmv) for a verbose progress listing.

Sunday, September 7, 2008

Installing Oracle 10g2 on CentOS 5.0 under VMware

This is a quick and dirty guide that has only the bare essentials of installing Oracle database 10g2 on CentOS 5.0. This was done in a virtual environment under VMWare Workstation but other than the things that are clearly VMWare specific, the installation is exactly the same as on a physical machine.

CentOS (32 bit) Installation
During the package selection do not select any of the "super-package" options (server, server with gui etc), uncheck all of these and select "Customize now". Choose the following package groups, the default selections are adequate except in the case of the Legacy Software Support group:
  • Gnome Desktop Environment
  • Editors
  • Development Libraries
  • Development Tools
  • Legacy Software Development
  • Server Configuration Tools
  • Administration Tools
  • Base
  • Java
  • Legacy Software Support, NOTE: add compat-db and openmotif (libXp will automatically be included as a dependency)
  • System Tools
  • X Window System
VMWare specific: install VMWare-Tools, run vmware-config-tools.pl, run vmware-toolbox and enable time synchronization with the host.

Compulsory Configuration Options
  • Assign a static IP, do not use DHCP
  • Turn off the firewall
  • Disable SELinux
Optional Configuration Options
  • Disable IPV6 networking (instructions here)
  • Minimize services (do you really need BlueTooth facilities running?)
  • Install sysstat package (contains sar and iostat)
  • Personal customizations for bash, vim etc.
Prepare for Oracle
It is necessary to add some Oracle-specific users and groups and then change some kernel parameters. We also need to "fool" the Oracle install into believing that we are running on a certified OS by temporarily changing /etc/redhat-release. As root do this

groupadd dba
groupadd oinstall
useradd -g oinstall -G dba -s /bin/bash oracle
mkdir -p /u01/app/oracle/product/10.2.0/db_1
chown -R oracle:oinstall /u01/app/oracle
chmod -R 775 /u01/app/oracle
passwd oracle
cp /etc/redhat-release /etc/redhat-release.orig
echo "Red Hat Enterprise Linux AS release 4 (Nahant)" > /etc/redhat-release

Update /etc/sysctl.conf to include

# semaphores: semmsl, semmns, semopm, semmni
kernel.sem = 250 32000 100 128
net.ipv4.ip_local_port_range = 1024 65000
net.core.rmem_default=262144
net.core.rmem_max=262144
net.core.wmem_default=262144
net.core.wmem_max=262144

and then run sysctl -p to load these new parameters.

Set limits on the number of processes and files that can simultaneously be running or open in /etc/security/limits.conf

* soft nofile 1024
* hard nofile 65536
* soft nproc 2047
* hard nproc 16384

Update the PAM login configuration by adding this line to /etc/pam.d/login

session required pam_limits.so

Switch to the oracle user and edit ~/.bash_profile to include the following (obviously change the SID to whatever you want to name your own database)

ORACLE_BASE=/u01/app/oracle
ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1
ORACLE_SID=BRS01
export ORACLE_BASE ORACLE_HOME ORACLE_SID
PATH=/usr/sbin:$PATH; export PATH
PATH=$ORACLE_HOME/bin:$PATH; export PATH
LD_LIBRARY_PATH=$ORACLE_HOME/lib:/lib:/usr/lib
CLASSPATH=$ORACLE_HOME/JRE:$ORACLE_HOME/jlib:$ORACLE_HOME/rdbms/jlib
export LD_LIBRARY_PATH CLASSPATH

Read this file to get into the new environment by doing

source .bash_profile

and then double check the parameters by echoing them out e.g. echo $ORACLE_SID produces BRS01 in my case.
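Instead of echoing the variables one at a time, a tiny check function can do it in one go. This is just a sketch; check_oracle_env is a made-up name and it only looks at the three variables set above.

```shell
# Print each Oracle variable, complain on stderr about any that are unset
# or empty, and return non-zero if at least one is missing.
# check_oracle_env is a hypothetical helper name.
check_oracle_env() {
    missing=0
    for name in ORACLE_BASE ORACLE_HOME ORACLE_SID; do
        eval "value=\${$name-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            missing=1
        else
            echo "$name=$value"
        fi
    done
    return "$missing"
}
```

Run check_oracle_env right after sourcing .bash_profile; a non-zero exit status means something did not get set.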

Oracle Installation
As the user oracle, download and unzip the installation files

unzip 10201_database_linux32.zip

and then change into the "database" directory and run

./runInstaller

and fill in the name of the database (same as the SID in the bash_profile file), a password for the administrators and then it is just clickety, click, click! Assuming you accept all the defaults, of course.

Post Installation
We now need to restore the file containing the OS version

mv /etc/redhat-release.orig /etc/redhat-release

If you want to use the GUI management tools (Database- or Grid-Control) you need to explicitly add a new service to the listener for this new database. If you don't you will get an ORA-12505 in the GUI management tool (TNS:listener does not currently know of SID given in connect descriptor (DBD ERROR: OCIServerAttach)) as well as the following error in $ORACLE_HOME/hostName_dbName/sysman/log/emoms.log

ORA-12514, TNS:listener does not currently know of service requested in connect descriptor
The Connection descriptor used by the client was:
(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=c5-10g2.homeunix.net)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=BRS01)))

To fix this add a new service to the SID_LIST in $ORACLE_HOME/network/admin/listener.ora that includes the new database name and ORACLE_HOME location. For example, for my database called BRS01, listener.ora looks like this

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = PLSExtProc)
      (ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_1)
      (PROGRAM = extproc)
    )
    (SID_DESC =
      (SID_NAME = BRS01)
      (ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_1)
    )
  )

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1))
      (ADDRESS = (PROTOCOL = TCP)(HOST = c5-10g2.homeunix.net)(PORT = 1521))
    )
  )

Fire It Up!
The database can be started in two ways: the command-line-only approach is simply this (as the oracle user)

$ lsnrctl start
$ sqlplus sys/ as sysdba
SQL> startup

The GUI approach requires that the listener still be started from the command line (lsnrctl start) but then you can go to http://dbHostName:1158/em and simply click "Startup" and then enter usernames and passwords.

And that's all there is to it!

Saturday, September 6, 2008

Tuning ext3 Filesystem

Checking the filesystem

Since ext3 is a journaled filesystem it will never be labeled as dirty even if there are real problems. We can force the filesystem checks to be run at every boot and/or at periodic intervals. During the boot process the init script scans the list of devices in /etc/fstab, and then runs fsck (unless options on the filesystem instruct it otherwise).

Examine and change the properties of a filesystem using tune2fs [options] /dev/name - here name is either

  • sdXY for disk X, partition Y
  • mdX for software raid partitions
  • the name of logical volumes under LVM.

View the parameters with the “-l” option. There are four parameters of interest

  1. Mount Count: how many times the filesystem has been mounted since it was last checked
  2. Maximum Mount Count: run checks every X mounts (CentOS defaults this value to -1, so a check is never triggered by the mount count)
  3. Last Checked: date/time the last filesystem check was run
  4. Check Interval: how often checks should be run (CentOS default = 0, so a check is never triggered by elapsed time)

Change the defaults to run checks on every boot or at 7 day intervals (you may want to stagger these so that not all filesystems are being checked simultaneously).

  • Run fsck on every boot: tune2fs -c 1 /dev/md0 or, for an LVM logical volume,

    tune2fs -c 1 /dev/VolGroup00/LogVol00

  • Run fsck every 7 days: tune2fs -i 7d /dev/md0 or, for an LVM logical volume,

    tune2fs -i 7d /dev/VolGroup00/LogVol00

    Note: This does not mean that fsck will be run every 7 days for you. Rather, it means that if the system reboots and it has been more than 7 days since the filesystem was last checked, then fsck will run at that boot. If you really want the filesystem checked every 7 days then it must be done via a cron job and the filesystem should (must?) be unmounted as well....which is not really practical for a busy server :-(

Reclaiming some disk space

By default the filesystem reserves some space for root so normal users can never truly fill up a filesystem. This gives root some space to work in during times of crisis (ext3 by default reserves 5% for root). Details are provided by tune2fs

[root ~]# tune2fs -l /dev/VolGroup00/LV_Home |grep [Bb]lock\ count
Block count: 108797952
Reserved block count: 5439897

Preserving this extra 5% is necessary on / and /var but it seems to be overkill on really big filesystems. On my ~400GB /home filesystem (with a block size = 4096 bytes), the 5439897 reserved blocks are ~ 20 GB. To reserve only 1% space for root do

[root ~]# tune2fs -m1 /dev/VolGroup00/LV_Home

which will reclaim 4% (about 16GB) for normal users and leave root with 4GB of breathing room. To reserve less than 1% for root you need to calculate the number of blocks yourself and use the “-r” option.
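The block arithmetic for the “-r” option is easy to get wrong by a factor of ten, so here is a small sketch using shell integer arithmetic. Both function names are made up for illustration, sizes are in GiB (2^30 bytes) and everything rounds down.

```shell
# Size of a number of blocks in whole GiB (rounded down).
# Arguments: block count, block size in bytes.
blocks_to_gib() {
    echo $(( $1 * $2 / 1073741824 ))
}

# Number of blocks for a reservation given in tenths of a percent
# (5 means 0.5%), rounded down. Arguments: total block count, tenths.
blocks_for_permille() {
    echo $(( $1 * $2 / 1000 ))
}

blocks_to_gib 5439897 4096         # size of the default 5% reservation
blocks_for_permille 108797952 5    # block count for a 0.5% reservation
```

The second number is what you would hand to tune2fs -r for the same filesystem, e.g. tune2fs -r 543989 /dev/VolGroup00/LV_Home to keep roughly 0.5% (about 2GB) for root.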

NFS Behind a Firewall

Setting up NFS is very simple but since ports are dynamically assigned on the server this creates a major headache for firewall rules.

NFS v2 and v3 require these services to be running (use chkconfig --list to see if they are configured to start at boot time)

  • portmap - dynamically assigns ports for the NFS service
  • nfslock - allows NFS clients to create locks on the files on the server
  • nfs - the “umbrella” NFS daemon (this is not one daemon but rather several RPC processes: rpc.mountd, rpc.nfsd, rpc.statd, rpc.rquotad, rpc.idmapd)

To make firewall rules we need to force NFS to use the same port numbers every time it is run. To do this, put the following into /etc/sysconfig/nfs (create this file if it doesn’t exist)

# NFS port numbers
STATD_PORT=10002
STATD_OUTGOING_PORT=10003
MOUNTD_PORT=10004
RQUOTAD_PORT=10005
LOCKD_TCPPORT=10006
LOCKD_UDPPORT=10006

Now we need to have the following ports open in the firewall

  • 111 TCP and UDP (portmapper)
  • 2049 TCP and UDP (NFS). By default CentOS clients will only use the TCP port but the Mac OS X default is to use UDP
  • 10002 - 10006 TCP and UDP. These are the static port numbers that NFS will now use every time it starts up
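The port list above translates into six iptables rules. The sketch below only prints the commands so you can review them first; nfs_firewall_rules is a made-up name, and the chain name (INPUT by default here) is an assumption - a stock CentOS firewall may use a different chain, and you may need to insert rather than append so the rules land before any final REJECT rule.

```shell
# Print (not execute) iptables commands that open the NFS-related ports
# for both TCP and UDP. The chain name is an assumption; pass your own.
nfs_firewall_rules() {
    chain=${1:-INPUT}
    for proto in tcp udp; do
        for ports in 111 2049 10002:10006; do
            echo "iptables -A $chain -p $proto --dport $ports -j ACCEPT"
        done
    done
}

nfs_firewall_rules
```

Once the output looks right for your rule set, run the printed commands as root and save the rules (service iptables save).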

Aside: If performance from Mac OS X clients is slow when configured to use TCP, try setting the kernel parameter net.inet.tcp.delayed_ack to zero on the Mac clients (/usr/sbin/sysctl -w net.inet.tcp.delayed_ack=0).

Running CentOS Headless

CentOS on a Dell server will behave just fine if you simply unplug the keyboard and monitor but there are times when you really want to see those messages normally sent to the console. Note, some “desktop” machines will completely refuse to boot without a keyboard while others have a BIOS setting that determines the outcome when no keyboard is found. For a server it is unnecessary to have a keyboard and monitor but during times of troubleshooting it can be useful to see the boot messages. These messages can be directed to the serial port and read (via a null modem cable) in a terminal emulator program (minicom on Linux, TeraTerm Pro on Windows). In fact you can even see the grub menu and run a login shell over the serial port.

The setup in CentOS is as follows

  1. Add
    S0:12345:respawn:/sbin/agetty ttyS0 9600 linux

    to /etc/inittab. This tells init to run a login shell with the program agetty on ttyS0 (COM1 in Windows language) at a baud rate of 9600 in run levels 1-5.
  2. To allow root to directly login on the serial port add ttyS0 to /etc/securetty (echo "ttyS0" >> /etc/securetty)
  3. Edit /boot/grub/grub.conf - tell grub to use the serial port at a certain baud rate and to direct the menu to the serial port as well as the console, and also tell the kernel to direct messages to the serial port as well as the console. Also comment out the splashimage line since these graphics will not work over a serial port. Here is a sample grub.conf

    default=0
    timeout=10
    #splashimage=(hd0,0)/grub/splash.xpm.gz
    serial --unit=0 --speed=9600
    terminal --timeout=2 serial console
    hiddenmenu
    title CentOS (2.6.18-8.1.6.el5)
    root (hd0,0)
    kernel /vmlinuz-2.6.18-8.1.6.el5 ro root=/dev/VolGroup00/LogVol00 rhgb quiet console=tty1 console=ttyS0,9600
    initrd /initrd-2.6.18-8.1.6.el5.img

To use higher baud rates simply change the 3 instances of “9600” to your desired speed (two instances in grub.conf and one in /etc/inittab).

Setup the terminal emulator with these parameters (shown here with a baud rate of 57600)

PXE Booting and Installing CentOS Linux

This describes a very simple PXE boot setup on CentOS Linux. If machine A has no CD or DVD drive how can we install an operating system? Well, have machine B store a separate kernel and an installation repository (the install CD/DVD), then tell machine A to boot from this kernel on B and install an OS using the install data on B. This setup is “simple” in the sense that it will not have extensive options and/or menus at boot time - it is intended for installations and/or recovery of one version of the OS only. PXELinux has many options for menu configuration, e.g. you could offer CentOS version 4.4 or 5.0 or even different Linux distributions. This was done using CentOS 5.0.

Setup on the client (machine A) is very simple and is done via the BIOS menus - just enable PXE booting on your network card and then put the network card first in boot priority order listing.

On the server (machine B) you will need the following (this is the order in which the client talks to the various server daemons)

  1. DHCP server to assign an IP address to the client, tell it that a TFTP server exists and where to find it.
  2. TFTP server to provide a kernel (and options) to the client so that it can boot.
  3. HTTP, FTP or NFS server to offer up the installation repository.

DHCP Server

  1. Install the DHCP server (package name is simply “dhcp”).
  2. Edit /etc/dhcpd.conf to allow network booting and tell clients where they can find the TFTP server (the next-server option) and the name of the file it should load

    ddns-update-style interim;
    ignore client-updates;
    allow booting;
    allow bootp;
    subnet 192.168.0.0 netmask 255.255.255.0 {
    # --- default gateway
    option routers 192.168.0.1;
    option subnet-mask 255.255.255.0;
    option domain-name "yourDomainName.com";
    option domain-name-servers 192.168.0.1;
    range dynamic-bootp 192.168.0.128 192.168.0.254;
    default-lease-time 21600;
    max-lease-time 43200;
    next-server 192.168.0.8;
    filename "/pxelinux.0";
    }
  3. Open port UDP 67 in the firewall and restart the DHCP server (service dhcpd restart)

TFTP Server

  1. Install the TFTP server (package name = tftp-server).
  2. The TFTP server is an “on-demand” network service and is thus managed by xinetd. Set “disable=no” in /etc/xinetd.d/tftp and restart xinetd (service xinetd restart)
  3. Install syslinux and copy pxelinux.0 to the TFTP server’s root

    cp /usr/lib/syslinux/pxelinux.0 /tftpboot
  4. From the installation sources (on disk #1 if you are using CDs) copy the kernel and ramdisk image to the TFTP server’s root

    cp location/of/installation/disks/images/pxeboot/vmlinuz /tftpboot
    cp location/of/installation/disks/images/pxeboot/initrd.img /tftpboot
  5. Make a directory to hold the PXE boot configuration files (mkdir /tftpboot/pxelinux.cfg)
  6. The name of the configuration file is tricky and depends on your local setup - the PXE client will look for config files (in this order) that are named after the client’s GUID, the NIC’s MAC address, a hexadecimal representation of the client’s IP address, and then truncated versions of the hexadecimal IP address (details here). This is a problem - often you don’t know the client’s GUID or MAC address until you can boot it, but you can’t boot it because the TFTP server configuration is not complete. I suggest using WireShark or tcpdump on the server to capture requests from the client and thus learn its MAC address. Once you know the MAC address, create a configuration file whose name is “01-” followed by the MAC address (use dashes to replace colons). For example, my MAC address is 00:13:72:0d:ee:f1 and my config file is named /tftpboot/pxelinux.cfg/01-00-13-72-0d-ee-f1. Older versions (<3.20)>
  7. Edit this config file to tell the client where to find a kernel (and options to the kernel)

    echo "DEFAULT vmlinuz initrd=initrd.img ramdisk_size=100000" >> /tftpboot/pxelinux.cfg/name-of-config-file
  8. Open port UDP 69 in the firewall.
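The config file naming described in step 6 can be worked out with a couple of one-liners. This is a sketch with made-up function names; it assumes the MAC-based name is lower case with dashes and the IP-based name is the address written as eight upper-case hex digits.

```shell
# Config file name derived from a MAC address:
# "01-" followed by the MAC in lower case with dashes instead of colons.
pxe_cfg_name_from_mac() {
    echo "01-$(echo "$1" | tr '[:upper:]' '[:lower:]' | tr ':' '-')"
}

# Config file name derived from a dotted-quad IPv4 address:
# each octet as two upper-case hex digits. Runs in a subshell so the
# IFS change does not leak into the calling shell.
pxe_cfg_name_from_ip() (
    IFS=.
    set -- $1
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
)

pxe_cfg_name_from_mac 00:13:72:0D:EE:F1
pxe_cfg_name_from_ip 192.168.0.128
```

If you can't find out the MAC address at all, the IP-based name is a fallback for whatever address your DHCP range hands to the client.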

Installation Repository on FTP Server

  1. Install vsftpd, open port TCP 21 in the firewall and start the service (default settings will allow anonymous ftp so there is no need to configure anything else)
  2. Mount or copy the installation sources into the public FTP directory (/var/ftp/pub). If using ISO images, use the “-o loop” option to the mount command: mount -o loop /location/installation/disks/CentOS5_dvd.iso /var/ftp/pub/

Doing the Installation

Now boot up the client and everything should work perfectly ;-) Use WireShark or tcpdump on the server the first time to help debug any network/daemon issues. During the install on the client select the “FTP” option when asked for the source of the installation files, then enter the name or IP address of the FTP server and the directory of the installation sources (simply /pub in this example).

Is Your Linux Software RAID Really Recoverable?

After installing a system with software RAID1 (mirroring) some additional setup is required to ensure that you can actually recover from a disk failure. For CentOS, Fedora and other RH-like distros the Red Hat Enterprise Linux SysAdmin Guide has good instructions on setting up software raid. For other distros, follow their specific installation guides.

Post Install Setup

Make backups of the partition tables of all disks, the membership of the RAID arrays and the file system mount points. Here we have two SATA disks, /dev/sda and /dev/sdb.

# mkdir /root/raidinfo
# sfdisk -d /dev/sda > /root/raidinfo/partitions.sda
# sfdisk -d /dev/sdb > /root/raidinfo/partitions.sdb
# cat /proc/mdstat > /root/raidinfo/mdstat.orig
# cat /etc/fstab > /root/raidinfo/fstab.orig

The GRUB boot loader is only installed on one disk, by default this is the first disk the system finds and is always labeled /dev/sda. If this disk fails the system will be unbootable - we need to install GRUB on all the other disks in the array (only /dev/sdb in this example). The following set of commands will install GRUB into the MBR of /dev/sdb

# grub
grub> device (hd0) /dev/sdb
grub> root (hd0,0)
Filesystem type is ext2fs, partition type 0xfd
grub> setup (hd0)
Checking if "/boot/grub/stage1" exists... no
Checking if "/grub/stage1" exists... yes
Checking if "/grub/stage2" exists... yes
Checking if "/grub/e2fs_stage1_5" exists... yes
Running "embed /grub/e2fs_stage1_5 (hd0)"... 16 sectors are embedded. succeeded
Running "install /grub/stage1 (hd0) (hd0)1+16 p (hd0,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.
grub> quit

NOTE: if the version of GRUB is ever updated, only the MBR of /dev/sda will be updated automatically. You will need to reinstall GRUB to the MBR for all other drives in the array. If the kernel is updated no additional changes are required other than updating /boot/grub/grub.conf to point to the new kernel - this happens automatically on CentOS.

Testing The Setup Before Disk Failure

Immediately after installing the OS you should test the software raid setup to verify that the machine is bootable from any disk, the automatic synch’ing is working as expected and most importantly, you know what you are doing so that in a real emergency you do not lose any valuable data. The easiest way to accomplish these goals is to

  • shutdown and disconnect one disk drive
  • restart the machine (do you get the GRUB menu and boot successfully?)
  • do a few things... edit some text files, download some stuff, whatever
  • reconnect the drive and verify that the automatic synchronization starts and then finishes without issue
  • repeat all the above for each disk

Let's walk through an example where we remove /dev/sda from the system. After rebooting without this drive take a look at /proc/mdstat. This can be confusing: we have removed /dev/sda from the system but your boot logs and mdstat will say you have /dev/sda installed. This is because the OS labels the first disk it finds as /dev/sda, but this is physically the original /dev/sdb. Confused yet ;-)

# cat /proc/mdstat
....
md3 : active raid1 sda5[1]
307347392 blocks [2/1] [_U]
.....

For each of the md devices (we’ll focus on md3) it is saying that the device is active as RAID1 and sda5 is its only member (this sda5 was sdb5 before removing the original sda disk). The [2/1] indicates that two members should be contained in the md3 device, but currently only one is available, i.e. sda5 (the original sdb5). The [_U] indicates that the first member is not available but the second one is (the “U”). Here the first and second members refer to the original sda5 and sdb5, respectively.

Now reconnect the original /dev/sda drive and look at mdstat

# cat /proc/mdstat
....
md3 : active raid1 sdb5[1]
307347392 blocks [2/1] [_U]
.....

The only member of md3 is now shown as sdb5, this is physically the same disk that showed up as sda5 when the system had only one disk. Mmm…..md3 has only one member, we will need to manually add the original sda5 to md3 to recreate the RAID array

# mdadm -a /dev/md3 /dev/sda5

Once this is done the OS will start synchronizing sda5 with sdb5. Once again, mdstat is the source of information and will allow you to monitor the progress of the rebuild

....
md3 : active raid1 sda5[2] sdb5[1]
307347392 blocks [2/1] [_U]
[===============>.....] recovery = 77.4% (238065664/307347392) finish=25.6min speed=44977K/sec
....

Note that md3 now has two members (sda5 and sdb5) and that the offline member (sda5) is being recovered. When the rebuild is complete the [2/1] [_U] will become [2/2] [UU]. Repeat this for the other md devices until they are all rebuilt and successfully resynch’ed.
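With several md devices it is easy to miss one that is still degraded. The [expected/active] counters described above can be checked with a little awk; this is a sketch only, degraded_arrays is a made-up name, and it assumes the mdstat layout shown in this post (device name on one line, the counters on a following line).

```shell
# List md devices whose [expected/active] member counts differ.
# Reads /proc/mdstat by default; a saved copy can be given as argument 1.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }            # remember the current device
        match($0, /\[[0-9]+\/[0-9]+\]/) {      # counters such as [2/1]
            split(substr($0, RSTART + 1, RLENGTH - 2), count, "/")
            if (count[1] + 0 != count[2] + 0) print name
        }
    ' "${1:-/proc/mdstat}"
}
```

Calling degraded_arrays with no arguments on the RAID machine prints nothing once every array is back to [2/2] [UU].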

Now do exactly the same for the other disk drive(s): remove the disk, reboot, reinstall the disk and rebuild the RAID arrays.

Recovery After Disk Failure

In the event of a disk failure there are three things you need to do

  • Install a new disk….d’oh!
  • Partition the new disk in exactly the same way as the old dead one
  • Add RAID partitions back into the /dev/mdx devices

Assume /dev/sda has died. To add the correct partitions to the new disk we will use the backups of the partition tables we made above

# sfdisk /dev/sda < /root/raidinfo/partitions.sda

If the new disk is bigger than the old disk you will be left with additional free space on this drive - this will not cause any problems.

The second step is to add these partitions back into the md devices. To do this correctly you need to know which partition resided inside which md device. The answer lies in the backup of the mdstat file

# cat /root/raidinfo/mdstat.orig
Personalities : [raid1]
md1 : active raid1 sdb2[1] sda2[0]
4096448 blocks [2/2] [UU]
......

This snippet shows that sda2 was a member of md1. Now add sda2 back into md1

# mdadm -a /dev/md1 /dev/sda2

Immediately after adding a partition to the array, it will automagically begin synching /dev/sda2 with /dev/sdb2 (the other member of md1). You can monitor the progress with

# watch -n 30 cat /proc/mdstat

In this example the information will be updated every 30 seconds - do not set the time interval for updates too small, this will slow down the synching process. Repeat this for all partitions on the new disk. You can add all the partitions simultaneously, however only one md device will be synchronized at a time. Be patient - 300GB on a SATA II (3 Gbps) connection takes about 80 minutes (an average of ~60 MB/sec, consistent with the sustained transfer rate quoted by the WD specs for these drives).
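For planning purposes the rebuild time is just size divided by sustained throughput. A rough sketch of that arithmetic follows; resync_minutes is a made-up name, it assumes 1 GB = 1000 MB, and it rounds down.

```shell
# Estimate resync time in whole minutes.
# Arguments: amount to synchronize in GB, sustained speed in MB/sec.
resync_minutes() {
    size_gb=$1
    speed_mb_s=$2
    echo $(( size_gb * 1000 / speed_mb_s / 60 ))
}

resync_minutes 300 60
```

For 300GB at 60 MB/sec this lands in the same ballpark as the roughly 80 minutes I measured; real rebuilds vary with disk load.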

Disabling IPV6 Networking on Redhat, CentOS and Fedora

Even though many OS's can now do IPv6 networking, most organisations don't yet have IPv6 networks so having this capability enabled in the OS (which it is by default on Redhat) can just be an annoyance. To completely disable IPv6 in your system do the following

1. echo "alias net-pf-10 off" >> /etc/modprobe.conf
2. Set NETWORKING_IPV6=no in /etc/sysconfig/network
3. For each interface (except loopback) set IPV6INIT=no in each configuration file. The configuration file for the eth0 interface is /etc/sysconfig/network-scripts/ifcfg-eth0. Others are similarly named.
4. Stop and permanently disable the IPv6 firewall/iptables
# service ip6tables stop
# chkconfig --level 12345 ip6tables off
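After rebooting you can check whether the change took effect. The sketch below relies on /proc/net/if_inet6, which should only exist while the kernel's IPv6 support is loaded; ipv6_stack_loaded is a made-up name and the optional argument is there only so the check can be tried against another path.

```shell
# Succeed if the kernel IPv6 stack appears to be loaded.
# Checks /proc/net/if_inet6 unless another path is given.
ipv6_stack_loaded() {
    [ -e "${1:-/proc/net/if_inet6}" ]
}

if ipv6_stack_loaded; then
    echo "IPv6 is still active"
else
    echo "IPv6 appears to be disabled"
fi
```

If it still reports IPv6 as active, double check the modprobe.conf line and that the machine really was rebooted.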

Frustrations of CentOS 5.0 under VMWare on Vista

  • The goal: install CentOS 5.0 under VMWare Workstation running on Windows Vista (64 bit). I've done it before, multiple times, without a hitch in about 20 minutes.
  • The reality: well, there were more than a few hitches and it took an entire afternoon.
  • The problem: after install and upon reboot the GDM (Gnome Display Manager) GUI does not come up - you are left with a command line prompt. Manually trying startx at the command line gave an error saying X Windows was already running :-(
Ultimately this CentOS (virtual) machine will run an Oracle database so the GUI is not needed but it is nice to use in the beginning while configuring the OS (shutting down unnecessary services etc.) and installing Oracle. The plan was to configure the OS, install Oracle and then never run the GUI again after making runlevel 3 the new default. One hour - max, that's all it should take. Hah!!

I've done this before (in February 2008) so what is different now? CentOS 5.0 is exactly the same, I'm using the same ISO image as in February. However, both Vista and VMWare Workstation are in a different state - Vista is now at SP1 with numerous other updates and VMWare is at version 6.05 (vs. 6.01 in February). After several failed installs I became convinced it had something to do with SELinux - each time I disabled SELinux during the first-time boot procedure, GDM would not launch. If I left SELinux enabled during the first boot then disabled it later, I had varying degrees of success. I could not find any solid evidence to conclusively prove this hypothesis, but this is the procedure that eventually worked:
  • Install (from ISO on networked drive) and leave the firewall and SELinux enabled
  • Reboot - got no GDM GUI
  • Reboot again (out of frustration) and what do you know, there it was - the wonderful GDM GUI
  • Install VMWare-Tools
  • Shutdown, point the CD to drive D: (not the ISO file used for install)
  • Restart then minimized services and disabled IPV6
  • Reboot and disable SELinux (via the GUI or /etc/selinux/config)
  • Reboot and disable the firewall
  • Finally! We are in the state we want and the GDM GUI is coming up after restarting!
What a royal bloody pain!