Monday, July 11, 2011

BASH Shell Scripting to generate Confluence code that gets you bar and pie charts on your wiki

At work I often update our wiki, which runs on Confluence. Just last week I wanted to change the way bar charts look on it. Confluence has a simple mechanism for producing bar and pie charts; we use them to show how our clusters are being used by the user community at our university. To generate the numbers, I first need to run a script that pulls the required figures out of a database and writes them to a set of files. For example, to create the bar chart shown below, I put the following code into Confluence. You can see how these charts look on our wiki here: https://wikis.nyu.edu/display/NYUHPC/Historical+Usage+Reports

Confluence code for bar chart

{chart:title=CPU Time in Hours|type=bar|width=900}
|| CLUSTER || January || February || March || April || May || June ||
|| USQ | 380033 | 291235 | 356897 | 343902 | 375666 | 445860 ||
|| BOWERY | 572622 | 272514 | 331905 | 355627 | 301544 | 309076 ||
|| CARDIAC | 716065 | 359153 | 329973 | 723314 | 279251 | 617893 ||
|| CUDA | 1 | 0 | 1 | 251 | 54 | 0 ||
{chart}
 

Confluence code for pie chart

{chart:title=Number of Jobs on BOWERY|width=450}
|| CLUSTER || January || February || March || April || May || June ||
|| BOWERY | 5,571 | 2,722 | 3,445 | 3,670 | 3,800 | 5,909 ||
{chart}

Simple, right? Not really. Because I need to do this on the first of every month, and for one reason or another I end up doing it somewhere in the middle of the month instead. Which is not cool. So I thought of setting up a cron job. I wanted this cron job to deliver me an email containing exactly the text I need to paste into Confluence to generate charts like the ones shown here.

Generally, I need to log on to a specific machine and run a command that generates the numbers and writes them to a file. Here is what I do:
[manchu@hpc-metrics ~]$ /usr/local/bin/metrics-analysis.py monthly
To retrieve a monthly report for the previous month.
Getting HPC Metrics Statistics.
For the jobs ended on and after 2011-06-01 before 2011-07-01.
Getting the total counts from usq...
Getting the total counts from bowery...
Getting the total counts from cardiac...
Getting the total counts from cuda...
Getting the total counts from usq...
Getting the total counts from bowery...
Getting the total counts from cardiac...
Getting the total counts from cuda...

**************** Starting the sum task 1: grouped by ALL *************
Getting the total sum from usq...
Getting the total sum from bowery...
Getting the total sum from cardiac...
Getting the total sum from cuda...
----------------------
Summary of this period, sorted by ALL
For jobs ended on and after 2011-06-01 and before 2011-07-01
                               Name            usq         bowery        cardiac           cuda            All
                        Jobs number         72,671          5,909         32,444            107        111,131
                        User number             76             36              9              1             94
                        CPU time(h)        445,860        309,076        617,893              0      1,372,829
                       Wall time(h)      4,279,233        115,571        793,340             83      5,188,227
                       Used time(h)        381,261         39,053        177,488              3        597,805
                Requested CPU cores        108,591        115,402        183,247            830        408,070
                Avg. used CPU cores           1.17           7.91           3.48           0.00
Where,
Avg. used CPU cores: the average CPU resource consumed by a job, which is "CPU time/Used time"


The results were also stored in the file of hpc_usage_2011-06-01_2011-06-30.txt


[manchu@hpc-metrics ~]$ vi hpc_usage_2011-06-01_2011-06-30.txt
1 HPC Usage Summary
  2 
  3 **************** Starting the sum task 1: grouped by ALL *************
  4 Summary of this period, sorted by ALL
  5 For jobs ended on and after 2011-06-01 and before 2011-07-01
  6                                Name            usq         bowery        cardiac           cuda            All
  7                         Jobs number         72,671          5,909         32,444            107        111,131
  8                         User number             76             36              9              1             94
  9                         CPU time(h)        445,860        309,076        617,893              0      1,372,829
 10                        Wall time(h)      4,279,233        115,571        793,340             83      5,188,227
 11                        Used time(h)        381,261         39,053        177,488              3        597,805
 12                 Requested CPU cores        108,591        115,402        183,247            830        408,070
 13                 Avg. used CPU cores           1.17           7.91           3.48           0.00
 14 Where,
 15 Avg. used CPU cores: the average CPU resource consumed by a job, which is "CPU time/Used time"
 16 
 17 

I need to take the numbers from lines 7 to 13 and arrange them in the specific format Confluence requires for its bar and pie charts. Like I said, I could do it by cutting and pasting the numbers into an Excel file and then copying them from there into Confluence. Believe me, it's painful. So I decided to write a script that does all this work, with a cron job delivering the results to me by email. I just copy the whole thing from the email and paste it into Confluence. That's it. It doesn't even take a minute this way. Moreover, the cron job reminds me that I need to do this on the first of every month. Cool, huh?
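The slicing itself boils down to addressing columns from the right-hand side of each row. Here is a minimal sketch of the idea, run against two sample rows in the hpc_usage_*.txt layout shown above (with a real file you would feed the function `sed -n '7,13p' $filename` instead of the here-document):

```shell
#!/bin/bash
# Minimal sketch of the extraction step: print the four per-cluster columns
# of each summary row as a Confluence table row. awk splits on whitespace,
# so counting fields back from NF copes with multi-word row labels.
extract() {
 awk '{
  # Rows ending in an "All" column: cluster values sit in fields NF-4..NF-1.
  # The "Avg. used CPU cores" row has no "All" column: they sit in NF-3..NF.
  off = /^ *Avg\./ ? 0 : 1
  printf "| %s | %s | %s | %s |\n", $(NF-3-off), $(NF-2-off), $(NF-1-off), $(NF-off)
 }'
}
extract <<'EOF'
                        CPU time(h)        445,860        309,076        617,893              0      1,372,829
                Avg. used CPU cores           1.17           7.91           3.48           0.00
EOF
```

This prints one pipe-delimited row per input row, which is most of the work the full script below automates.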

Crontab

My cron job runs the script at 00:05 on the 1st of each month. The line you need to put in your crontab is:
[manchu@hpc-metrics ~]$ crontab -e
5 0 1 * * /home/manchu/hpc-metrics-cronjob.sh > /dev/null 2>&1
crontab: installing new crontab
[manchu@hpc-metrics ~]$

Cron Job Script

Here is my cron job script:
[manchu@hpc-metrics ~]$ more hpc-metrics-cronjob.sh 
#!/bin/bash

# This is a cron job script for delivering the hpc metrics at the beginning of each month. Written by Sreedhar Manchu.
/home/manchu/metrics.sh | mail -s "HPC Metrics Details for `date -d\"1 month ago\" +%B`" my_email@domain.com
[manchu@hpc-metrics ~]$ 
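The subject line relies on GNU date's relative-date parsing, which the main script below also uses heavily. A quick sanity check, anchored to a fixed reference date so the results are reproducible (LC_ALL=C pins the month name to English):

```shell
#!/bin/bash
# GNU date relative items, as used in the cron and metrics scripts.
LC_ALL=C date -d "2011-07-01 1 month ago" +%B   # June  (the mail subject's month)
date -d "2011-07-01 last month" +%F             # 2011-06-01 (start date in the summary file name)
date -d "2011-07-01 yesterday" +%F              # 2011-06-30 (end date in the summary file name)
```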

Shell Script to generate the required code for Confluence

Here is the script I wrote to generate the confluence code:
[manchu@hpc-metrics ~]$ more metrics.sh 
#!/bin/bash

/usr/local/bin/metrics-analysis.py monthly
echo "----------------------------------------------------------------------------------------"
echo
echo "###################################### EXCEL VALUES ###################################"
echo
echo "----------------------------------------------------------------------------------------"
echo

filename=hpc_usage_`date -d "last month" "+%F"`_`date -d "yesterday" "+%F"`.txt

for ((i=-5; i<=-2; i++))
do
 echo -n "| *`date -d "last month" "+%b %Y"`* |"
 for j in {7..12}
 do 
  echo -n "`head -$j $filename | tail -1 | perl -lane 'print " $F['$i'] |"'`"
 done
 echo -n "`head -13 $filename | tail -1 | perl -lane 'print " $F['$(($i+1))'] |"'`"
 echo
done


months=`date -d "last month" "+%-m"`
clusters=4
categories=7
for ((i=0; i<$months; i++))
do
filename=hpc_usage_`date -d "$(($months-$i)) months ago" "+%F"`_`date -d "yesterday $(($months-$i-1)) months ago" "+%F"`.txt
 for ((j=0; j<$clusters; j++))
 do
  for ((k=0; k<$categories; k++))
  do
   if [ $k -ne 6 ]
   then
    value[$(($i+$j*$months+$clusters*$months*$k))]="`head -$(($k+7)) $filename | tail -1 | perl -lane 'print "$F['$((-5+$j))']"' | sed 's/,//g'`"
   else
    value[$(($i+$j*$months+$clusters*$months*$k))]="`head -13 $filename | tail -1 | perl -lane 'print "$F['$((-4+$j))']"' | sed 's/,//g'`"
   fi
  done
 done
done

title_string=("Number of Jobs" "Number of Users" "CPU Time in Hours" "Walltime in Hours" "Used Time in Hours" "Total Requested CPU Cores" "Avg. CPU Cores Used Per Job")
cluster=("USQ" "BOWERY" "CARDIAC" "CUDA")

echo
echo "----------------------------------------------------------------------------------------"
echo
echo "####################################### BAR CHARTS ####################################"
echo
echo "----------------------------------------------------------------------------------------"
echo
echo "&nbsp;"
for ((k=0; k<$categories; k++))
do
 echo "{chart:title=${title_string[$k]}|type=bar|width=900}"
 echo -n "|| CLUSTER ||"
 for ((n=0; n<$months; n++))
 do
  echo -n " `date -d "$(($months-$n)) month ago" "+%B"` ||"
 done
 echo
 for ((j=0; j<$clusters; j++))
 do
  echo -n "|| ${cluster[$j]} |"
  for ((i=0; i<$months; i++))
  do
   echo -n " ${value[$(($k*$clusters*$months+$j*$months+$i))]} |"
  done
  echo "|"
 done
 echo "{chart}"
 echo "&nbsp;"
 echo
done
echo "----------------------------------------------------------------------------------------"
echo

input_start_date=`date -d "$months months ago" "+%F"`
input_end_date="`date "+%F"`"

/usr/local/bin/metrics-analysis.py << EOF
$input_start_date
$input_end_date
EOF

filename="hpc_usage_`date -d "$months months ago" "+%F"`_`date -d "yesterday" "+%F"`.txt"
for ((j=0; j<$clusters; j++))
do
 for ((k=0; k<$categories; k++))
 do
  if [ $k -ne 6 ]
  then
   value[$(($k*$clusters+$j))]="`head -$(($k+7)) $filename | tail -1 | perl -lane 'print "$F['$((-5+$j))']"' | sed 's/,//g'`"
  else
   value[$(($k*$clusters+$j))]="`head -13 $filename | tail -1 | perl -lane 'print "$F['$((-4+$j))']"' | sed 's/,//g'`" 
  fi
 done
done
echo "----------------------------------------------------------------------------------------"
echo
echo "####################################### PIE CHARTS ####################################"
echo
echo "----------------------------------------------------------------------------------------"
echo
echo "&nbsp;"
for ((k=0; k<$categories; k++))
do
 echo "{chart:title=${title_string[$k]} over the last `date -d"last month" "+%-m"` months|width=450}"
 echo -n "|| CLUSTER ||"
 for ((j=0; j<$clusters; j++))
 do
  echo -n " ${cluster[$j]} ||"
 done
 echo
 echo -n "|| category |"
 for ((j=0; j<$clusters; j++))
 do
  echo -n " ${value[$(($k*$clusters+$j))]} |"
 done
 echo "|"
 echo "{chart}"
 echo "&nbsp;"
 echo
done
echo "----------------------------------------------------------------------------------------"
echo


[manchu@hpc-metrics ~]$ 

Output I get when I run the script

Here is the output I get when I run this script:
[manchu@hpc-metrics ~]$ ./metrics.sh 
To retrieve a monthly report for the previous month.
Getting HPC Metrics Statistics.
For the jobs ended on and after 2011-06-01 before 2011-07-01.
Getting the total counts from usq...
Getting the total counts from bowery...
Getting the total counts from cardiac...
Getting the total counts from cuda...
Getting the total counts from usq...
Getting the total counts from bowery...
Getting the total counts from cardiac...
Getting the total counts from cuda...

**************** Starting the sum task 1: grouped by ALL *************
Getting the total sum from usq...
Getting the total sum from bowery...
Getting the total sum from cardiac...
Getting the total sum from cuda...
----------------------
Summary of this period, sorted by ALL
For jobs ended on and after 2011-06-01 and before 2011-07-01
                               Name            usq         bowery        cardiac           cuda            All
                        Jobs number         72,671          5,909         32,444            107        111,131
                        User number             76             36              9              1             94
                        CPU time(h)        445,860        309,076        617,893              0      1,372,829
                       Wall time(h)      4,279,233        115,571        793,340             83      5,188,227
                       Used time(h)        381,261         39,053        177,488              3        597,805
                Requested CPU cores        108,591        115,402        183,247            830        408,070
                Avg. used CPU cores           1.17           7.91           3.48           0.00
Where,
Avg. used CPU cores: the average CPU resource consumed by a job, which is "CPU time/Used time"


The results were also stored in the file of hpc_usage_2011-06-01_2011-06-30.txt


----------------------------------------------------------------------------------------

###################################### EXCEL VALUES ###################################

----------------------------------------------------------------------------------------

| *Jun 2011* | 72,671 | 76 | 445,860 | 4,279,233 | 381,261 | 108,591 | 1.17 |
| *Jun 2011* | 5,909 | 36 | 309,076 | 115,571 | 39,053 | 115,402 | 7.91 |
| *Jun 2011* | 32,444 | 9 | 617,893 | 793,340 | 177,488 | 183,247 | 3.48 |
| *Jun 2011* | 107 | 1 | 0 | 83 | 3 | 830 | 0.00 |

----------------------------------------------------------------------------------------

####################################### BAR CHARTS ####################################

----------------------------------------------------------------------------------------

&nbsp;
{chart:title=Number of Jobs|type=bar|width=900}
|| CLUSTER || January || February || March || April || May || June ||
|| USQ | 205227 | 341853 | 433496 | 494724 | 323675 | 72671 ||
|| BOWERY | 5571 | 2722 | 3445 | 3670 | 3800 | 5909 ||
|| CARDIAC | 42578 | 98160 | 27774 | 32225 | 53133 | 32444 ||
|| CUDA | 29 | 1 | 114 | 218 | 131 | 107 ||
{chart}
&nbsp;

{chart:title=Number of Users|type=bar|width=900}
|| CLUSTER || January || February || March || April || May || June ||
|| USQ | 70 | 87 | 96 | 89 | 85 | 76 ||
|| BOWERY | 27 | 32 | 33 | 35 | 38 | 36 ||
|| CARDIAC | 10 | 9 | 10 | 7 | 12 | 9 ||
|| CUDA | 4 | 1 | 4 | 6 | 3 | 1 ||
{chart}
&nbsp;

{chart:title=CPU Time in Hours|type=bar|width=900}
|| CLUSTER || January || February || March || April || May || June ||
|| USQ | 380033 | 291235 | 356897 | 343902 | 375666 | 445860 ||
|| BOWERY | 572622 | 272514 | 331905 | 355627 | 301544 | 309076 ||
|| CARDIAC | 716065 | 359153 | 329973 | 723314 | 279251 | 617893 ||
|| CUDA | 1 | 0 | 1 | 251 | 54 | 0 ||
{chart}
&nbsp;

{chart:title=Walltime in Hours|type=bar|width=900}
|| CLUSTER || January || February || March || April || May || June ||
|| USQ | 5245803 | 13446775 | 20903317 | 20559816 | 40612407 | 4279233 ||
|| BOWERY | 80722 | 43649 | 60817 | 63988 | 67950 | 115571 ||
|| CARDIAC | 284755 | 494757 | 320250 | 78273 | 1067320 | 793340 ||
|| CUDA | 116 | 4 | 224 | 996 | 158 | 83 ||
{chart}
&nbsp;

{chart:title=Used Time in Hours|type=bar|width=900}
|| CLUSTER || January || February || March || April || May || June ||
|| USQ | 291717 | 214256 | 282014 | 295398 | 312894 | 381261 ||
|| BOWERY | 30980 | 21816 | 29132 | 38124 | 36421 | 39053 ||
|| CARDIAC | 153014 | 150655 | 117527 | 23588 | 256442 | 177488 ||
|| CUDA | 71 | 0 | 21 | 300 | 91 | 3 ||
{chart}
&nbsp;

{chart:title=Total Requested CPU Cores|type=bar|width=900}
|| CLUSTER || January || February || March || April || May || June ||
|| USQ | 428382 | 780907 | 838015 | 1357449 | 359100 | 108591 ||
|| BOWERY | 175895 | 103060 | 97843 | 107121 | 96763 | 115402 ||
|| CARDIAC | 151891 | 277135 | 66307 | 87260 | 91421 | 183247 ||
|| CUDA | 92 | 4 | 307 | 1623 | 1019 | 830 ||
{chart}
&nbsp;

{chart:title=Avg. CPU Cores Used Per Job|type=bar|width=900}
|| CLUSTER || January || February || March || April || May || June ||
|| USQ | 1.30 | 1.36 | 1.27 | 1.16 | 1.20 | 1.17 ||
|| BOWERY | 18.48 | 12.49 | 11.39 | 9.33 | 8.28 | 7.91 ||
|| CARDIAC | 4.68 | 2.38 | 2.81 | 30.66 | 1.09 | 3.48 ||
|| CUDA | 0.01 | 0.02 | 0.04 | 0.84 | 0.59 | 0.00 ||
{chart}
&nbsp;

----------------------------------------------------------------------------------------

To analyze the hpc-metrics database
Please input the start date, such as 2010-11-01: Please input the day after the end date, such as 2010-12-01: Getting HPC Metrics Statistics.
For the jobs ended on and after 2011-01-01 before 2011-07-01.
Getting the total counts from usq...
Getting the total counts from bowery...
Getting the total counts from cardiac...
Getting the total counts from cuda...
Getting the total counts from usq...
Getting the total counts from bowery...
Getting the total counts from cardiac...
Getting the total counts from cuda...

**************** Starting the sum task 1: grouped by ALL *************
Getting the total sum from usq...
Getting the total sum from bowery...
Getting the total sum from cardiac...
Getting the total sum from cuda...
----------------------
Summary of this period, sorted by ALL
For jobs ended on and after 2011-01-01 and before 2011-07-01
                               Name            usq         bowery        cardiac           cuda            All
                        Jobs number      1,871,646         25,117        286,314            600      2,183,677
                        User number            153             66             22             11            172
                        CPU time(h)      2,193,594      2,143,289      3,025,648            307      7,362,837
                       Wall time(h)    105,047,351        432,696      3,038,695          1,580    108,520,322
                       Used time(h)      1,777,541        195,527        878,715            486      2,852,269
                Requested CPU cores      3,872,444        696,084        857,261          3,875      5,429,664
                Avg. used CPU cores           1.23          10.96           3.44           0.63
Where,
Avg. used CPU cores: the average CPU resource consumed by a job, which is "CPU time/Used time"


The results were also stored in the file of hpc_usage_2011-01-01_2011-06-30.txt


----------------------------------------------------------------------------------------

####################################### PIE CHARTS ####################################

----------------------------------------------------------------------------------------

&nbsp;
{chart:title=Number of Jobs over the last 6 months|width=450}
|| CLUSTER || USQ || BOWERY || CARDIAC || CUDA ||
|| category | 1871646 | 25117 | 286314 | 600 ||
{chart}
&nbsp;

{chart:title=Number of Users over the last 6 months|width=450}
|| CLUSTER || USQ || BOWERY || CARDIAC || CUDA ||
|| category | 153 | 66 | 22 | 11 ||
{chart}
&nbsp;

{chart:title=CPU Time in Hours over the last 6 months|width=450}
|| CLUSTER || USQ || BOWERY || CARDIAC || CUDA ||
|| category | 2193594 | 2143289 | 3025648 | 307 ||
{chart}
&nbsp;

{chart:title=Walltime in Hours over the last 6 months|width=450}
|| CLUSTER || USQ || BOWERY || CARDIAC || CUDA ||
|| category | 105047351 | 432696 | 3038695 | 1580 ||
{chart}
&nbsp;

{chart:title=Used Time in Hours over the last 6 months|width=450}
|| CLUSTER || USQ || BOWERY || CARDIAC || CUDA ||
|| category | 1777541 | 195527 | 878715 | 486 ||
{chart}
&nbsp;

{chart:title=Total Requested CPU Cores over the last 6 months|width=450}
|| CLUSTER || USQ || BOWERY || CARDIAC || CUDA ||
|| category | 3872444 | 696084 | 857261 | 3875 ||
{chart}
&nbsp;

{chart:title=Avg. CPU Cores Used Per Job over the last 6 months|width=450}
|| CLUSTER || USQ || BOWERY || CARDIAC || CUDA ||
|| category | 1.23 | 10.96 | 3.44 | 0.63 ||
{chart}
&nbsp;

----------------------------------------------------------------------------------------

[manchu@hpc-metrics ~]$

Saturday, July 9, 2011

Berkeley Lab Checkpoint/Restart (BLCR) Installation and Run Procedure

What is BLCR?

BLCR (Berkeley Lab Checkpoint/Restart) allows programs running on Linux to be "checkpointed" (written entirely to a file), and then later "restarted". BLCR can be found at http://ftg.lbl.gov/checkpoint.

Web Links

https://ftg.lbl.gov/projects/CheckpointRestart/
https://ftg.lbl.gov/CheckpointRestart/CheckpointDownloads.shtml
https://ftg.lbl.gov/CheckpointRestart/downloads/blcr-0.8.2.tar.gz
https://ftg.lbl.gov/CheckpointRestart/downloads/blcr-0.8.2-1.src.rpm
https://upc-bugs.lbl.gov//blcr/doc/html/BLCR_Admin_Guide.html
https://upc-bugs.lbl.gov//blcr/doc/html/BLCR_Users_Guide.html
https://upc-bugs.lbl.gov//blcr/doc/html/FAQ.html

Installation Procedure

# cd Desktop
# wget https://ftg.lbl.gov/CheckpointRestart/downloads/blcr-0.8.2.tar.gz
# tar xzvf blcr-0.8.2.tar.gz
# cd blcr-0.8.2
# mkdir builddir
# cd builddir/
# ../configure --with-linux=/usr/src/kernels/2.6.18-128.el5-x86_64/ --with-system-map=/boot/System.map-2.6.18-128.1.6.el5_lustre.1.8.0.1smp --with-vmlinux=/boot/vmlinuz-2.6.18-128.1.6.el5_lustre.1.8.0.1smp --enable-multilib --enable-testsuite --enable-init-script

*******************************************************************
***** WARNING WARNING WARNING WARNING WARNING WARNING WARNING *****
*******************************************************************
* The kernel source does not match currently the running kernel.  *
* Compilation will produce modules unsuitable for the currently   *
* running kernel, which may not be what you intended.             *
*******************************************************************
***** WARNING WARNING WARNING WARNING WARNING WARNING WARNING *****
*******************************************************************
======================================================================
Please review the following configuration information:
  Kernel source directory = /usr/src/kernels/2.6.18-128.el5-x86_64/
  Kernel build directory = /usr/src/kernels/2.6.18-128.el5-x86_64/
  Kernel symbol table = /boot/System.map-2.6.18-128.1.6.el5_lustre.1.8.0.1smp/boot/vmlinuz-2.6.18-128.1.6.el5_lustre.1.8.0.1smp
  Kernel version probed from kernel build = 2.6.18-128.el5
  Kernel running currently = 2.6.18-128.1.6.el5_lustre.1.8.0.1smp
======================================================================

Warning: Proceeding despite this warning would lead to installation failure.

This can be fixed with the following procedure. BLCR needs to be able to examine a linux kernel source tree that has been configured, and this configuration must match the kernel that you will run BLCR against. If you do not have a configured linux kernel source tree, you may be able to create one fairly easily. Many distributions provide a 'config' file that is all you need to easily produce a configured kernel source tree.

# uname -r
2.6.18-128.1.6.el5_lustre.1.8.0.1smp
 
# cp -a /usr/src/linux-2.6.18-128.1.6.el5_lustre.1.8.0.1/ /tmp/
# cd /tmp/linux-2.6.18-128.1.6.el5_lustre.1.8.0.1/
# cp configs/kernel-2.6.18-2.6-rhel5-x86_64-smp.config .config
# make prepare-all scripts
# cd /state/partition1/blcr-0.8.2/builddir/
# ../configure --with-linux=/tmp/linux-2.6.18-128.1.6.el5_lustre.1.8.0.1/ --with-system-map=/boot/System.map-2.6.18-128.1.6.el5_lustre.1.8.0.1smp --with-vmlinux=/boot/vmlinuz-2.6.18-128.1.6.el5_lustre.1.8.0.1smp --enable-multilib --enable-testsuite --enable-init-script

*******************************************************************
***** WARNING WARNING WARNING WARNING WARNING WARNING WARNING *****
*******************************************************************
* The kernel source does not match currently the running kernel.  *
* Compilation will produce modules unsuitable for the currently   *
* running kernel, which may not be what you intended.             *
*******************************************************************
***** WARNING WARNING WARNING WARNING WARNING WARNING WARNING *****
*******************************************************************
======================================================================
Please review the following configuration information:
  Kernel source directory = /tmp/linux-2.6.18-128.1.6.el5_lustre.1.8.0.1/
  Kernel build directory = /tmp/linux-2.6.18-128.1.6.el5_lustre.1.8.0.1/
  Kernel symbol table = /boot/System.map-2.6.18-128.1.6.el5_lustre.1.8.0.1smp/boot/vmlinuz-2.6.18-128.1.6.el5_lustre.1.8.0.1smp
  Kernel version probed from kernel build = 2.6.18-128.1.6.el5_lustre.1.8.0.1custom
  Kernel running currently = 2.6.18-128.1.6.el5_lustre.1.8.0.1smp
======================================================================

Warning: The kernel version probed from the kernel build (2.6.18-128.1.6.el5_lustre.1.8.0.1custom) doesn't match the currently running kernel (2.6.18-128.1.6.el5_lustre.1.8.0.1smp). Proceeding despite this warning would lead to installation failure.

This can be fixed with the following procedure: we need to change the kernel version in the Makefile of the Linux kernel source tree copied to /tmp.
# cd /tmp/linux-2.6.18-128.1.6.el5_lustre.1.8.0.1/
# vi Makefile

Handy Hint: Change the line "EXTRAVERSION = -128.1.6.el5_lustre.1.8.0.1custom" to "EXTRAVERSION = -128.1.6.el5_lustre.1.8.0.1smp"; in other words, replace the tag "custom" with "smp".
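If you prefer not to open vi, the same substitution can be done with sed. The snippet below demonstrates it on a sample file (the /tmp/Makefile.sample path is just for the demo; run the same sed against the kernel tree's Makefile):

```shell
#!/bin/bash
# Demonstrate the EXTRAVERSION edit as a one-liner on a sample Makefile line;
# applied to the real Makefile, it performs the same custom -> smp change.
printf 'EXTRAVERSION = -128.1.6.el5_lustre.1.8.0.1custom\n' > /tmp/Makefile.sample
sed -i 's/1custom$/1smp/' /tmp/Makefile.sample
grep '^EXTRAVERSION' /tmp/Makefile.sample    # EXTRAVERSION = -128.1.6.el5_lustre.1.8.0.1smp
```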

# cp configs/kernel-2.6.18-2.6-rhel5-x86_64-smp.config .config
# make prepare-all scripts
# cd /state/partition1/blcr-0.8.2/builddir/

Configuring BLCR

# ../configure --with-linux=/tmp/linux-2.6.18-128.1.6.el5_lustre.1.8.0.1/ --with-system-map=/boot/System.map-2.6.18-128.1.6.el5_lustre.1.8.0.1smp --with-vmlinux=/boot/vmlinuz-2.6.18-128.1.6.el5_lustre.1.8.0.1smp --enable-multilib --enable-testsuite --enable-init-script
 
======================================================================
Please review the following configuration information:
  Kernel source directory = /tmp/linux-2.6.18-128.1.6.el5_lustre.1.8.0.1/
  Kernel build directory = /tmp/linux-2.6.18-128.1.6.el5_lustre.1.8.0.1/
  Kernel symbol table = /boot/System.map-2.6.18-128.1.6.el5_lustre.1.8.0.1smp/boot/vmlinuz-2.6.18-128.1.6.el5_lustre.1.8.0.1smp
  Kernel version probed from kernel build = 2.6.18-128.1.6.el5_lustre.1.8.0.1smp
  Kernel running currently = 2.6.18-128.1.6.el5_lustre.1.8.0.1smp
====================================================================== 

Compiling BLCR

# make

Testing the Build

# make insmod check
 
======================
All 58 tests passed
(2 tests were not run)
======================
Make sure the blcr modules are loaded by grepping for blcr in the lsmod output. There should be two modules, "blcr" and "blcr_imports".
# lsmod | grep blcr
blcr                  139268  0
blcr_imports           46208  1 blcr

Note: "make insmod check" loads the BLCR kernel modules before running the checks. Loading them again with insmod would therefore fail, and there is no need to.
If only "make check" is used, the BLCR kernel modules need to be loaded separately, in the order shown below.

# insmod /usr/local/lib64/blcr/2.6.18-128.1.6.el5_lustre.1.8.0.1smp/blcr_imports.ko
# insmod /usr/local/lib64/blcr/2.6.18-128.1.6.el5_lustre.1.8.0.1smp/blcr.ko

Installing BLCR

# make install

Useful Information: By default BLCR will install into /usr/local.

Loading the kernel modules by default at boot time

Useful Information: Adding '--enable-init-script' to the configure flags installs the blcr init script in /usr/local/etc/init.d/blcr. We need to copy this script to /etc/init.d/, modify it, and then run chkconfig so it works as a boot-time service.

# vi /etc/init.d/blcr  
# chkconfig --add blcr
Follow the procedure below to modify the script and then save it.
First, copy the blcr kernel modules from /usr/local/lib64/blcr/`uname -r`/ to /lib/modules/`uname -r`/kernel/drivers/misc/:
# cp /usr/local/lib64/blcr/`uname -r`/*.ko /lib/modules/`uname -r`/kernel/drivers/misc/
# depmod -a
# vi /etc/init.d/blcr

Modify line 10:  module_dir=
                             to
                 module_dir=/usr/local/lib64/blcr/2.6.18-128.1.6.el5_lustre.1.8.0.1smp

Note: Next to module_dir= add the path of the directory containing blcr kernel modules.

Modify line 38:  modprobe $1 || (do_checkmod $1 || insmod ${module_dir}/${1}.ko)
                             to
                 modprobe $1 > /dev/null 2>&1 || (do_checkmod $1 || insmod ${module_dir}/${1}.ko)

Modify line 43:  modprobe -r $1 || (do_checkmod $1 && rmmod $1)
                             to
                 modprobe -r $1 > /dev/null 2>&1 || (do_checkmod $1 && rmmod $1)

Modify line 88:  if [ "x$rc1$rc2" != "x111" ] ; then
                             to
                 if [ "x$rc1$rc2" != "x11" ] ; then

Note: The " > /dev/null 2>&1" after modprobe is not strictly necessary. Even when modprobe fails, insmod still loads the blcr modules. But because the script tries modprobe first, it prints "FATAL: Module blcr_imports not found" and "FATAL: Module blcr not found" before insmod succeeds with an OK message. Redirecting modprobe's output removes this confusing noise.

If you don't want to copy the blcr kernel modules to /lib/modules/`uname -r`/kernel/drivers/misc/, you can skip the cp and depmod steps above and simply edit /etc/init.d/blcr, making exactly the same changes to lines 10, 38, 43, and 88.

Note: There is no need to modify lines 38 and 43, since the modules get loaded through the insmod command anyway, as long as you don't mind the error messages from modprobe. No matter what, I believe we do need to modify line 88.

# chkconfig --add blcr
# chkconfig --list blcr
blcr               0:off    1:off    2:off    3:on    4:on    5:on    6:off
# service blcr status
BLCR subsytem is active
# lsmod | grep blcr
blcr                  139268  0
blcr_imports           46208  1 blcr
# service blcr stop
Unloading BLCR:                                            [  OK  ]
# lsmod | grep blcr
# service blcr start
Loading BLCR:                                              [  OK  ]
# lsmod | grep blcr
blcr                  139268  0
blcr_imports           46208  1 blcr
# service blcr reload
Unloading BLCR:                                            [  OK  ]
Loading BLCR:                                              [  OK  ]
# lsmod | grep blcr
blcr                  139268  0
blcr_imports           46208  1 blcr
#
Useful Information
1) If you haven't used the --enable-init-script configure option, a template init script, etc/blcr.rc, is provided in the BLCR source directory, blcr-0.8.2/etc/. Modify it as shown above to suit your system.

    # cp /state/partition1/blcr-0.8.2/etc/blcr.rc /etc/init.d/blcr
    # chmod 755 blcr
    # vi /etc/init.d/blcr

 2) Line 10 should read: module_dir=/usr/local/lib64/blcr/2.6.18-128.1.6.el5_lustre.1.8.0.1smp. Replace the text after "module_dir=" with the path to your blcr kernel modules; in my case it is "/usr/local/lib64/blcr/2.6.18-128.1.6.el5_lustre.1.8.0.1smp".
 3) Modify all the other lines just as above.

Updating ld.so.cache

Nearly all Linux distributions use a caching mechanism for resolving dynamic library dependencies. If you have installed BLCR's shared library in a directory that is cached by the mechanism, then you will need to update this cache. To do so, run the ldconfig command as root; no command-line arguments are needed.

Handy Hint: If configured with --enable-multilib, add the line "/usr/local/lib64" to the file "/etc/ld.so.conf" (or create a file under /etc/ld.so.conf.d/ containing that line). If configured without --enable-multilib, use "/usr/local/lib" instead.

# vi /etc/ld.so.conf
# more /etc/ld.so.conf
/lib64
/usr/lib64
/usr/kerberos/lib64
/opt/nmi/lib
/usr/lib64/qt-3.1/lib
/usr/lib64/mysql
/usr/X11R6/lib64
/usr/local/lib64
# ldconfig

Note: If configured without --enable-multilib, replace the line /usr/local/lib64 with /usr/local/lib.
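That choice can be captured in a short sketch that generates the ld.so.conf.d drop-in. DEST defaults to a scratch directory here so the file can be reviewed before it is copied into /etc/ld.so.conf.d/; the blcr.conf file name and the MULTILIB toggle are just illustrative.

```shell
# Pick the library directory based on how BLCR was configured.
MULTILIB=yes            # set to "no" if configured without --enable-multilib
if [ "$MULTILIB" = yes ]; then LIBDIR=/usr/local/lib64; else LIBDIR=/usr/local/lib; fi
# Stage the drop-in; then, as root: cp "$DEST/blcr.conf" /etc/ld.so.conf.d/ && ldconfig
DEST=$(mktemp -d)
echo "$LIBDIR" > "$DEST/blcr.conf"
cat "$DEST/blcr.conf"
```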

Note: If configured with --prefix= or --libdir= options that cause BLCR's shared library (libcr.so) to be installed somewhere other than /lib, /usr/lib, any directory listed in /etc/ld.so.conf, or any directory listed in a file under /etc/ld.so.conf.d/, then running ldconfig has no effect. It is always safe to run it, though.

Note: If you passed no --prefix= or --libdir= options to BLCR's configure script, check /etc/ld.so.conf and /etc/ld.so.conf.d/ for /usr/local/lib (the default location) to determine whether you actually need to run the ldconfig command.

Note: If you passed --prefix= or --libdir= options to BLCR's configure script that cause BLCR's shared library (libcr.so) to be installed somewhere other than /lib, /usr/lib, any directory listed in /etc/ld.so.conf, or any directory listed in a file under /etc/ld.so.conf.d/, then you need to create a file such as blcr.sh in /etc/profile.d/ with permissions 755 (-rwxr-xr-x).

# cd /etc/profile.d/
# more blcr.sh
#!/bin/sh
export LD_LIBRARY_PATH=/usr/local/lib/:/usr/local/lib64/
# chmod 755 blcr.sh
# source /etc/profile.d/blcr.sh
# echo $LD_LIBRARY_PATH
/usr/local/lib/:/usr/local/lib64/
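To confirm the library is actually reachable at run time, a small helper like the one below can check both places. The function name is my own, and the /usr/local/lib prefix matches the install paths used above.

```shell
# Returns 0 if libcr.so should be resolvable: either the linker cache already
# knows it, or LD_LIBRARY_PATH (e.g. from blcr.sh) points at /usr/local/lib*.
libcr_visible() {
    ldconfig -p 2>/dev/null | grep -q 'libcr\.so' && return 0
    printf '%s' "$LD_LIBRARY_PATH" | grep -q '/usr/local/lib' && return 0
    return 1
}
```

After sourcing /etc/profile.d/blcr.sh as shown above, libcr_visible should succeed.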

Building a binary RPM from source RPMS

We can build RPMs from a source RPM (with a .src.rpm suffix) rather than from the .tar.gz version of the BLCR distribution. Source RPMs are available on the BLCR website. These source RPMs are configured to build for the running kernel, with --prefix=/usr, and to configure with --enable-multilib on 64-bit platforms. Built RPMs will be placed in a subdirectory of /usr/src/redhat/RPMS.

Warning: To build binary RPMs from the source RPM, we need to do a little tweaking on our systems, because the kernel version probed from the kernel build (2.6.18-128.1.6.el5_lustre.1.8.0.1custom) doesn't match the currently running kernel (2.6.18-128.1.6.el5_lustre.1.8.0.1smp). Proceeding with this mismatch would lead to installation failure.

Handy Hint: The trick is to create links to vmlinuz, the System map, and the kernel build in their respective directories, using the tag "custom" in place of the original tag "smp".

Follow this procedure to build RPMS.
# cd /lib/modules/
# ln -s 2.6.18-128.1.6.el5_lustre.1.8.0.1smp 2.6.18-128.1.6.el5_lustre.1.8.0.1custom
# cd /boot/
# ln -s System.map-2.6.18-128.1.6.el5_lustre.1.8.0.1smp System.map-2.6.18-128.1.6.el5_lustre.1.8.0.1custom
# ln -s vmlinuz-2.6.18-128.1.6.el5_lustre.1.8.0.1smp vmlinuz-2.6.18-128.1.6.el5_lustre.1.8.0.1custom
# rpmbuild --rebuild --define 'kernel_ver 2.6.18-128.1.6.el5_lustre.1.8.0.1custom' blcr-0.8.2-1.src.rpm --target `uname -p`
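Before running rpmbuild, it is worth confirming that all three "custom" aliases are in place. The helper below (its name and return convention are mine) reports any that are missing.

```shell
# check_links <kernel-version>: print any of the three expected files/links
# that are absent and return how many were missing (0 means all present).
check_links() {
    kver=$1; missing=0
    for f in "/lib/modules/$kver" "/boot/System.map-$kver" "/boot/vmlinuz-$kver"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f"
            missing=$((missing + 1))
        fi
    done
    return $missing
}
```

check_links 2.6.18-128.1.6.el5_lustre.1.8.0.1custom should print nothing once the three ln -s commands above have been run.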

Note: If installed from RPMs, the path to executables is /usr/bin and to libraries it is /usr/lib64 (64-bit) as well as /usr/lib (32-bit). Most probably /usr/lib64 is already listed in /etc/ld.so.conf; if it is not, make sure to add it as a separate line to that file. There is no need to add /usr/lib, as it is always on the system path, and moreover we only need the 64-bit libraries since our machines are 64-bit.

Running BLCR

$ vi blcr.c
$ more blcr.c
#include <stdio.h>
#include <unistd.h>   /* for sleep() */

int main(int argc, char *argv[])
{
    int i;
    for (i = 0; i < 100; i++) {
        printf("i = %d\n", i);
        fflush(stdout);
        sleep(1);
    }
    return 0;
}
$ gcc blcr.c -o blcr
$ cr_run ./blcr > output.txt &
[1] 17830
$ tail -f output.txt       # 'more output.txt' to see different output before checkpointing and after restart.
$ ps | grep blcr | grep -v grep
17830 pts/0 00:00:00 blcr
$ cr_checkpoint --term 17830       # creates a context.<pid> file and kills the process
[1]+ Terminated       cr_run ./blcr >output.txt
$ ls context.*
context.17830
$ cr_restart context.17830 &       # voila! starts from where it was checkpointed
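The checkpoint step can also be repeated automatically. The sketch below is my own wrapper, not part of BLCR: it checkpoints a running process every few seconds without terminating it, so there is always a recent context.<pid> file available for cr_restart.

```shell
# checkpoint_every <pid> <seconds>: periodically checkpoint <pid> with BLCR,
# refreshing context.<pid> in the current directory, until the process exits.
checkpoint_every() {
    pid=$1; interval=$2
    while kill -0 "$pid" 2>/dev/null; do
        sleep "$interval"
        # Without --term the program keeps running after each checkpoint.
        cr_checkpoint "$pid" || break
    done
}
```

For example, after "cr_run ./blcr > output.txt &", running "checkpoint_every $! 30" would refresh the context file twice a minute.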

PBS Script Generator: Interdependent dropdown/select menus in Javascript


About Me

LA, CA, United States
Here I write about the battles that have been going on in my mind. It's pretty much a scribble.

Sreedhar Manchu
Higher Education: Not a simple life anymore