feat: add job array example

Closes #3

Commit 7135ac49 authored 1 month ago by Rasmus Ringdahl
--- a/3_job_array/README.md
+++ b/3_job_array/README.md
+# Job arrays
+A job array is a method to queue multiple jobs with similar resource needs. Job arrays are suitable when the same type of computation needs to be run multiple times with different input data.
+## How to run
+To run the example do the following steps:
+1. Log in to Lundgren
+2. Change directory to the example code
+3. Run `sbatch job_array.sh`
+4. Check queue status by running `squeue`
+5. When the job is completed, check the files _job_array_XXX_YY.log_ (one per array index)
+Try to extend the job array with one more run by adding a new file in the _data_ folder, updating the _config.txt_ file and extending the _--array_ range to match.
+## Detailed description of the example
+The batch script is the main file for the job allocation and preparation. Inside the Python script the environment variable holding the allocated number of cores is fetched and logged. Furthermore, there is a folder, _data_, that contains some figurative input data, and a _config.txt_ file that maps the array index to the input data.
+### The batch script
+The batch script, _job_array.sh_, contains four sections. The first section contains input arguments to the Slurm scheduler. The second section loads Python into the environment so it is accessible. In the third section the _config.txt_ file is read and the filename corresponding to the array index is stored. Lastly, the job step is performed with the relevant filename as input argument.
+The input arguments are defined with a comment beginning with #SBATCH followed by the argument key and value. For easier readability the long form of the arguments, prefixed with --, is used.
+- __job-name:__ The name of the job is set to _demo_job_array_
+- __time:__ The requested time is set to 5 minutes, _00:05:00_
+- __ntasks:__ The number of tasks to be performed in this job is set to _1_.
+- __cpus-per-task:__ The requested number of cores per task is set to _2_
+- __mem-per-cpu:__ The requested memory per core is set to _50 MB_
+- __output:__ The standard output should be sent to the file _job_array_%A_%a.log_, where %A will expand to the job number and %a will expand to the array index.
+- __array:__ The array is set to _1-3_. This represents a list of array ids that should be created. Each id will be a separate job. The array can use any numbering that suits the user.
+_Note: Multiple similar jobs will be run and output files need to be handled in a way so they are not overwritten._
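The expansion of `%A` and `%a` in the output file name can be illustrated with a minimal Python sketch. The helper names `expand_array_spec` and `log_file_names` are hypothetical, the job number 4711 is made up, and only plain ranges and comma-separated lists are handled; the step (`:`) and throttle (`%`) suffixes that Slurm also accepts are left out.

```python
def expand_array_spec(spec: str) -> list[int]:
    """Expand a simple --array value such as "1-3" or "1,4,7-9" into ids.

    Hypothetical helper for illustration; step and throttle suffixes
    are not handled.
    """
    ids = []
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-")
            ids.extend(range(int(first), int(last) + 1))
        else:
            ids.append(int(part))
    return ids


def log_file_names(job_id: int, spec: str) -> list[str]:
    """Mimic how job_array_%A_%a.log expands for every array index."""
    return [f"job_array_{job_id}_{task_id}.log" for task_id in expand_array_spec(spec)]


# 4711 is a made-up job number used only for this illustration.
names = log_file_names(4711, "1-3")
print(names)
```

Because the array index is part of the name, each job in the array writes to its own log file and nothing is overwritten.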
+Python needs to be loaded into the environment in order to be accessible. This is done in the next step with the __module__ command.
+The job step with the single task is allocated and performed with the __srun__ command.
+#### The configuration file and data files
+The _config.txt_ is a text file containing a simple table. The first column contains the array index and the second column contains the file path to the data file to be loaded into the job. It is important that the indices in the file match the _--array_ argument.
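The lookup that the batch script performs with awk can also be sketched in Python. This is only an illustration under assumptions: `lookup_file` is a hypothetical helper that is not part of the example code, it compares the index as text rather than numerically, and the table is passed as a string so the sketch runs without the repository files.

```python
from typing import Optional

# Same content as config.txt, inlined so the sketch is self-contained.
CONFIG_TEXT = """task\tfile
1 data/parameters_first_job.txt
2 data/parameters_second_job.txt
3 data/parameters_third_job.txt
"""


def lookup_file(config_text: str, task_id: int) -> Optional[str]:
    """Return the second column of the row whose first column equals task_id."""
    for line in config_text.splitlines():
        columns = line.split()  # splits on both spaces and tabs
        if len(columns) >= 2 and columns[0] == str(task_id):
            return columns[1]
    # No matching row, e.g. --array covers an index that is missing in the table.
    return None


print(lookup_file(CONFIG_TEXT, 2))
print(lookup_file(CONFIG_TEXT, 4))
```

An index without a matching row yields `None` here, which shows why the table and the array range have to agree.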
+The data files in this example are simple JSON objects but could be CSV files or other file formats.
+For simpler applications the data files could be omitted and the _config.txt_ could contain all relevant data.
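As a sketch of that simpler setup, the second column could hold the parameters directly. The layout below (array index followed by comma-separated sleep times) is an assumption made for illustration and is not used by the example, and `parse_inline_config` is a hypothetical helper.

```python
# Hypothetical config layout: array index, then comma-separated sleep times.
INLINE_CONFIG = """task sleep
1 2,12,7,4,4
2 3,10,4,11,2
"""


def parse_inline_config(config_text: str) -> dict[int, list[int]]:
    """Map each array index to the list of sleep times on its row."""
    table = {}
    for line in config_text.splitlines()[1:]:  # skip the header row
        if not line.strip():
            continue
        task_id, values = line.split()
        table[int(task_id)] = [int(value) for value in values.split(",")]
    return table


params = parse_inline_config(INLINE_CONFIG)
print(params[2])
```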
+#### The Python script
+The Python script represents the task to be done. In this case the task is to wait for a time based on the input data file and log that the waiting is done.
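To get a feeling for what to expect from the first data file, the waiting time can be estimated without actually sleeping. The sketch assumes that each idle worker picks up the next task in list order, which is a simplification of how a worker pool hands out work, and `estimate_wall_time` is a hypothetical helper.

```python
import heapq


def estimate_wall_time(durations: list[int], workers: int) -> int:
    """Estimate the elapsed time when idle workers take tasks in list order."""
    free_at = [0] * workers  # time at which each worker becomes idle
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)  # the worker that becomes idle first
        heapq.heappush(free_at, start + duration)
    return max(free_at)


sleep_times = [2, 12, 7, 4, 4]  # from data/parameters_first_job.txt
print(estimate_wall_time(sleep_times, 2))  # estimate with two workers
print(sum(sleep_times))                    # time with a single worker
```

Under that assumption two workers need about 16 seconds for the first data file, compared to 29 seconds when the tasks run one after another.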
+The environment variable __SLURM_CPUS_PER_TASK__ is used to restrict the worker pool to the allocated number of cores.
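A minimal sketch of reading the variable defensively could look as follows. The helper `allocated_cores` and its handling of bad values are assumptions for illustration; the example script itself logs an error and returns when the variable is missing.

```python
import os
from typing import Mapping, Optional


def allocated_cores(environ: Mapping[str, str] = os.environ) -> Optional[int]:
    """Return SLURM_CPUS_PER_TASK as an int, or None if it is missing or invalid."""
    value = environ.get("SLURM_CPUS_PER_TASK")
    if value is None or not value.isdigit() or int(value) < 1:
        return None
    return int(value)


# Passing a dict instead of os.environ makes the helper easy to try outside Slurm.
print(allocated_cores({"SLURM_CPUS_PER_TASK": "2"}))
print(allocated_cores({}))
```

The returned number is what would be passed as _processes_ when the worker pool is created.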
+### How to set the number of cores in different programming languages and software
+Most programming languages and software try to make use of all cores that are available. This can lead to oversubscription of the resources. On a shared resource one must match the maximum used resources with the allocated ones. This section gives a reference on how to do it in commonly used software.
+- __CPLEX:__ Use the parameter _global thread count_. Read more in the [documentation](https://www.ibm.com/docs/en/icos/22.1.2?topic=parameters-global-thread-count)
+- __Gurobi:__ Use the configuration parameter _ThreadLimit_. Read more in the [documentation](https://docs.gurobi.com/projects/optimizer/en/current/reference/parameters.html#threadlimit)
+- __MATLAB:__ Create an instance of the parpool object with the _poolsize_ set to the number of cores and use the pool when running in parallel. Read more in the [documentation](https://se.mathworks.com/help/parallel-computing/parpool.html)
+- __Python:__ If the multiprocessing package is used, create an instance of the Pool class with _processes_ set to the number of cores and use the pool when running in parallel. Read more in the [documentation](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool)
--- a/3_job_array/config.txt
+++ b/3_job_array/config.txt
+task	file
+1 data/parameters_first_job.txt
+2 data/parameters_second_job.txt
+3 data/parameters_third_job.txt
--- a/3_job_array/data/parameters_first_job.txt
+++ b/3_job_array/data/parameters_first_job.txt
+{"name": "First file", "sleep": [2,12,7,4,4]}
\ No newline at end of file
--- a/3_job_array/data/parameters_second_job.txt
+++ b/3_job_array/data/parameters_second_job.txt
+{"name": "Second file", "sleep": [3,10,4,11,2]}
\ No newline at end of file
--- a/3_job_array/data/parameters_third_job.txt
+++ b/3_job_array/data/parameters_third_job.txt
+{"name": "Third file", "sleep": [12,3,14,10,20]}
\ No newline at end of file
--- a/3_job_array/job_array.sh
+++ b/3_job_array/job_array.sh
+#! /bin/bash
+#SBATCH --job-name=demo_job_array
+#SBATCH --time=00:05:00
+#SBATCH --ntasks=1
+#SBATCH --cpus-per-task=2
+#SBATCH --mem-per-cpu=50MB
+#SBATCH --output=job_array_%A_%a.log
+#SBATCH --array=1-3
+# Loading Python into the environment
+module load python/anaconda3-2024.02-3.11.7
+# Specify the path to the config file
+config=config.txt
+# Extract the file name for the current $SLURM_ARRAY_TASK_ID
+file=$(awk -v task="$SLURM_ARRAY_TASK_ID" '$1==task {print $2}' "$config")
+# Start job stage
+srun python job_array_task.py "${file}"
\ No newline at end of file
--- a/3_job_array/job_array_task.py
+++ b/3_job_array/job_array_task.py
+from datetime import datetime
+from multiprocessing import Pool
+import json
+import logging
+import os
+import sys
+import time
+logger = logging.getLogger(__name__)
+def sleep(task):
+    # Each task is a tuple of (task index, seconds to sleep).
+    time.sleep(task[1])
+    logger.info('Task %d done.', task[0])
+def main(filename: str):
+    # Read environment variables.
+    NUMBER_OF_CORES = os.environ.get('SLURM_CPUS_PER_TASK','Unknown')
+    if NUMBER_OF_CORES == 'Unknown':
+        logger.error('Unknown number of cores, exiting.')
+        return
+    NUMBER_OF_CORES = int(NUMBER_OF_CORES)
+    logger.info('Running program with %d cores.',NUMBER_OF_CORES)
+    # Reading configuration file and create a list of tasks
+    # This represents the reading of parameters and calculations
+    logger.info('Reading configuration from %s.',filename)
+    with open(filename, 'r') as file:
+        data = json.load(file)
+    tasks = []
+    total_time = 0
+    # Using the name duration avoids shadowing the time module.
+    for i, duration in enumerate(data['sleep']):
+        tasks.append((i, duration))
+        total_time = total_time + duration
+    # Creating a multiprocessing pool to perform the tasks.
+    # The context manager makes sure the pool is cleaned up afterwards.
+    with Pool(processes=NUMBER_OF_CORES) as pool:
+        # Submitting the tasks to the worker pool
+        tic = datetime.now()
+        logger.info('Submitting tasks to pool.')
+        pool.map(sleep, tasks)
+        toc = datetime.now()
+    logger.info('All tasks are done, took %d seconds, compared to %d seconds with single thread.',
+        (toc-tic).seconds, total_time)
+if __name__ == '__main__':
+    logging.basicConfig(level=logging.INFO)
+    filename = sys.argv[1]
+    main(filename)
--- a/README.md
+++ b/README.md
@@ -25,4 +25,9 @@ Learn more about the [example](https://gitlab.liu.se/rasri17/lundgren-examples/-
 #### Example 2 - Mutli core job
 A multi core job is a job that splits the computation to multiple cores. This type of job is the most suitable and most common ones to run on Lundgren. This includes optimization problems and heavy computations.
 Learn more about the [example](https://gitlab.liu.se/rasri17/lundgren-examples/-/tree/main/2_multi_core_job).
\ No newline at end of file
+#### Example 3 - Job arrays
+A job array is a method to queue multiple jobs with similar resource needs. Job arrays are suitable when the same type of computation needs to be run multiple times with different input data.
+Learn more about the [example](https://gitlab.liu.se/rasri17/lundgren-examples/-/tree/main/3_job_array).
\ No newline at end of file