How To Install Slurm On Debian 8.6

Slurm is an open-source, high-performance, highly scalable cluster management and job scheduling system for Linux clusters. It is straightforward to install. An earlier article covered installing and using Slurm on Ubuntu; this one covers installation on Debian. Some of Slurm's notable features are:

  • Highly configurable, with about 100 plugins available
  • Preemptive and gang scheduling (time-slicing of parallel jobs)
  • Accounting and configuration integrated with a database
  • Resource allocations optimized for network topology and on-node topology (sockets, cores and hyperthreads)
  • Advanced reservations
  • Can boot a different OS for each job
  • Support for job arrays

Installing Slurm

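Begin by adding the Debian repositories to your system. Open /etc/apt/sources.list in an editor and add the two deb lines shown beneath the command. The suite name stable tracks whatever the current Debian stable release is, so on a jessie system it is safer to name jessie explicitly.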

root@linuxhelp:~# nano /etc/apt/sources.list
deb http://ftp.debian.org/debian/ stable main contrib non-free
deb http://ftp.de.debian.org/debian jessie main
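
If you would rather not edit the file by hand, the same entry can be appended from the shell. The following is a minimal sketch and not part of the original steps: add_apt_source is a hypothetical helper name, and it takes the file path as an argument so it can be tried on a scratch copy before touching /etc/apt/sources.list.

```shell
# Append an APT source line to a file only if that exact line is not already there.
# add_apt_source is a hypothetical helper name used for illustration.
add_apt_source() {
    file="$1"
    line="$2"
    # -q: quiet, -x: match the whole line, -F: treat the line as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (run as root against the real file):
# add_apt_source /etc/apt/sources.list "deb http://ftp.de.debian.org/debian jessie main"
```

Because the helper checks for the line first, running it twice does not create a duplicate entry.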

Once the lines are added, refresh the package index with the following command.

root@linuxhelp:~# apt-get update
Ign http://ftp.debian.org stable InRelease                                                                    
Ign http://ftp.de.debian.org jessie InRelease                                                                 
Get:1 http://ftp.debian.org stable Release.gpg [2,373 B]                      
Get:2 http://ftp.de.debian.org jessie Release.gpg [2,373 B]                        
Get:3 http://ftp.debian.org stable Release [148 kB]                                
Get:4 http://security.debian.org jessie/updates InRelease [63.1 kB]                       
Get:5 http://ftp.de.debian.org jessie Release [148 kB]                                                        
Get:6 http://ftp.debian.org stable/main amd64 Packages [6,776 kB]                                             
Get:7 http://security.debian.org jessie/updates InRelease [63.1 kB]                                           
Get:8 http://security.debian.org jessie/updates/main Sources [190 kB]                                         
Get:9 http://ftp.de.debian.org jessie/main amd64 Packages [6,776 kB]                                          
Get:10 http://security.debian.org jessie/updates/contrib Sources [1,439 B]  
.
.
.
Get:19 http://ftp.debian.org stable/non-free amd64 Packages [83.6 kB]                                         
Get:20 http://ftp.debian.org stable/contrib Translation-en [38.5 kB]                                          
Get:21 http://ftp.debian.org stable/main Translation-en [4,582 kB]                                            
Get:22 http://ftp.debian.org stable/non-free Translation-en [72.1 kB]                                         
Fetched 24.7 MB in 2min 8s (191 kB/s)                                                                         
Reading package lists... Done

Now install the slurm package with the following command.

root@linuxhelp:~# apt-get install slurm
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  slurm
0 upgraded, 1 newly installed, 0 to remove and 157 not upgraded.
Need to get 22.1 kB of archives.
After this operation, 88.1 kB of additional disk space will be used.
Get:1 http://ftp.debian.org/debian/ stable/main slurm amd64 0.4.2-1 [22.1 kB]
Fetched 22.1 kB in 0s (38.5 kB/s)
Selecting previously unselected package slurm.
(Reading database ... 138444 files and directories currently installed.)
Preparing to unpack .../slurm_0.4.2-1_amd64.deb ...
Unpacking slurm (0.4.2-1) ...
Processing triggers for man-db (2.7.0.2-5) ...
Setting up slurm (0.4.2-1) ...

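The slurm package is now installed. Run slurm without arguments to print its usage summary. Note that this Debian package (slurm 0.4.2 by Hendrik Scholz) is a network load monitor, as its options show; the Slurm workload manager described in the introduction is packaged separately in Debian (as slurm-llnl or slurm-wlm, depending on the release).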

root@linuxhelp:~# slurm
slurm 0.4.2 - Hendrik Scholz <hendrik@scholz.net>

usage: slurm [-hHz] [-csl] [-d delay] [-t theme] -i interface

    -h            print help
    -z            zero counters at startup
    -d delay      delay between refreshs in seconds (1 < delay < 300)
    -c            old classic/combined view
    -s            split window mode with stats
    -l            large split window mode
    -L            enable TX/RX 'leds'
    -i interface  select network interface
    -t theme      select a theme

To remove the slurm package, run the following command.

root@linuxhelp:~# apt-get remove slurm
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages will be REMOVED:
  slurm
0 upgraded, 0 newly installed, 1 to remove and 157 not upgraded.
After this operation, 88.1 kB disk space will be freed.
Do you want to continue? [Y/n] y
(Reading database ... 138459 files and directories currently installed.)
Removing slurm (0.4.2-1) ...
Processing triggers for man-db (2.7.0.2-5) ...

Installing the slurm package and using it to monitor network traffic was easy, wasn't it? The Slurm workload manager, for its part, is used by many supercomputers and computer clusters, including Tianhe-2, one of the fastest supercomputers in the world.

The Slurm workload manager provides three key functions. First, it allocates resources to users for a period of time so that they can perform work. Second, it provides a framework for starting, executing and monitoring work on the set of allocated nodes. Third, it manages the queue of pending jobs. Slurm consists of two main daemons: a central controller daemon (slurmctld) and a worker daemon on each compute node (slurmd). The former queues jobs, allocates resources and monitors the state of the nodes; the latter runs work on its node and reports status back to the controller.

FAQ
Q
How can I run a job within an existing job allocation?
A
There is an srun option, --jobid, that can be used to specify a job's ID. For a batch job or within an existing resource allocation, the environment variable SLURM_JOB_ID has already been defined, so all job steps will run within that job allocation unless otherwise specified. The one exception to this is when submitting batch jobs. When a batch job is submitted from within an existing batch job, it is treated as a new job allocation request and will get a new job ID unless explicitly set with the --jobid option.
If you specify that a batch job should use an existing allocation, that job allocation will be released upon the termination of that batch job.
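As a sketch of the answer above: inside an allocation SLURM_JOB_ID is already set, so a plain srun joins it, while from another shell the ID is passed with --jobid. The helper below only prints the command it would run; step_command is a hypothetical name, and hostname stands in for whatever program the step should execute.

```shell
# Build the srun command line for running a step inside an existing allocation.
# With no argument it falls back to SLURM_JOB_ID, which Slurm sets inside a job.
# step_command is a hypothetical helper name used for illustration.
step_command() {
    jobid="${1:-${SLURM_JOB_ID:-}}"
    if [ -z "$jobid" ]; then
        echo "no job ID given and SLURM_JOB_ID is not set" >&2
        return 1
    fi
    echo "srun --jobid=$jobid hostname"
}

# Example: print the command for job 1234 (a placeholder ID), then run it by hand.
# step_command 1234
```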
Q
Why is the Slurm backfill scheduler not starting my job?
A
The most common problem is failing to set job time limits. If all jobs have the same time limit (for example the partition's time limit), then backfill will not be effective.
Note that partitions can have both default and maximum time limits, which can be helpful in configuring a system for effective backfill scheduling.
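Since a missing time limit is the usual culprit, a quick check of the batch script can help. The following is a rough sketch, assuming the limit is declared in an #SBATCH directive with --time or -t; has_time_limit is a hypothetical helper name, and a limit passed on the sbatch command line instead would not be seen by it.

```shell
# Succeed if a batch script has an #SBATCH line that sets a time limit (--time or -t).
# has_time_limit is a hypothetical helper name used for illustration.
has_time_limit() {
    grep -Eq '^#SBATCH[[:space:]]+(.*[[:space:]])?(--time[=[:space:]]|-t[[:space:]]*[0-9])' "$1"
}

# Example: warn before submitting job.sh (a placeholder script name).
# has_time_limit job.sh || echo "job.sh sets no time limit; backfill may not help it" >&2
```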
Q
Why should I use Slurm or other Free Open Source Software (FOSS)?
A
Free Open Source Software (FOSS) does not mean that it is without cost. It does mean that you have access to the code so that you are free to use it, study it, and/or enhance it. These reasons contribute to Slurm (and FOSS in general) being subject to active research and development worldwide, displacing proprietary software in many environments. If the software is large and complex, like Slurm or the Linux kernel, then while there is no license fee, its use is not without cost.
Q
Why are my resource limits not propagated?
A
The hard resource limits applied to Slurm's slurmd daemon are lower than the user's soft resource limits on the submit host. Typically the slurmd daemon is started by the init daemon with the operating system's default limits. This may be addressed either by using the ulimit command in the /etc/sysconfig/slurm file or by enabling PAM in Slurm.
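To check whether this is the cause, compare the user's soft limit on the submit host with the hard limit that slurmd runs under. The sketch below uses the open-files limit as an example resource; limit_fits is a hypothetical helper name, and the /proc lookup assumes a Linux compute node with a single slurmd process.

```shell
# Succeed if a soft limit fits under a hard limit; either value may be "unlimited".
# limit_fits is a hypothetical helper name used for illustration.
limit_fits() {
    soft="$1"
    hard="$2"
    [ "$hard" = "unlimited" ] && return 0
    [ "$soft" = "unlimited" ] && return 1
    [ "$soft" -le "$hard" ]
}

# On the submit host, the user's soft limit for open files:
#   ulimit -S -n
# On a compute node, the limits slurmd is running with:
#   grep 'open files' "/proc/$(pidof slurmd)/limits"
```

If limit_fits fails for the two values, the soft limit is above what slurmd can grant, which matches the situation described in the answer.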