# mnist-slurm-demo

**Repository Path**: cubeml-io/mnist-slurm-demo

## Basic Information

- **Project Name**: mnist-slurm-demo
- **Description**: Slurm test scripts
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-03-12
- **Last Updated**: 2024-03-12

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Slurm

## Install the Python environment

```sh
cd mnist-slurm-demo/conda-env
# Edit the SBATCH parameters in conda-env-create.sh. -N is the node count;
# set it to the current maximum number of nodes where possible, so that the
# environment is installed on every node.
sbatch conda-env-create.sh
# Watch the installation log
tail -f conda-env.log
```

## Submit a job

```sh
# Adjust the parameters as needed
cd mnist-slurm-demo
# Submit the job
sbatch mnist-job.sh
# Check the job status
scontrol show job <job_id>
# Cancel the job
scancel <job_id>
# Follow the log
tail -f mnist.log
```

## Example batch script

```sh
#!/bin/bash
#SBATCH --time 10          # time in minutes to reserve
#SBATCH --cpus-per-task 2  # number of CPU cores
#SBATCH --mem 4G           # memory per node, shared by all cores
#SBATCH --gres gpu:1       # number of GPUs
#SBATCH -o mnist.log       # write output to log file

# Run jobs using the srun command.
srun -l python mnist.py
```

# Useful commands

See the [documentation](https://slurm.schedmd.com) for more information.

## Submitting jobs

```sh
sbatch mnist-job.sh
```

## Show information about a running job

Replace `<jobid>` below with your job id.

```sh
sstat -j <jobid>
```

## Show information about all submitted jobs

```sh
sacct
```

## Show specific information about a job

```sh
sacct --units G --format=jobid,avecpu,alloccpus,avevmsize -j <jobid>
```

## Get an interactive shell on a compute node

```sh
srun --gres gpu:1 --pty bash
```

Logging into the compute node is useful for debugging the environment.
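## Capture the job id when submitting

The status, cancel, and accounting commands above all need the job id that `sbatch` prints on submission, in the form `Submitted batch job 12345`. The following is a minimal sketch of how that id could be captured in a script rather than copied by hand. The function names `extract_job_id` and `submit_and_get_id` are hypothetical helpers introduced here for illustration; they are not part of Slurm or of this repository.

```sh
#!/bin/sh

# Hypothetical helper: pull the job id out of sbatch's default output,
# which looks like "Submitted batch job 12345".
# Prints nothing if the text does not match that form.
extract_job_id() {
    printf '%s\n' "$1" | awk '/^Submitted batch job/ { print $4 }'
}

# Hypothetical helper: submit a batch script and print only its job id.
# Returns non-zero if sbatch itself fails.
submit_and_get_id() {
    output=$(sbatch "$1") || return 1
    extract_job_id "$output"
}

# Example usage on a cluster (assumes mnist-job.sh is in the current directory):
#   job_id=$(submit_and_get_id mnist-job.sh)
#   scontrol show job "$job_id"
#   scancel "$job_id"
```

If parsing the message is undesirable, `sbatch --parsable` prints the job id without the surrounding text, which may be simpler on clusters where that output format is acceptable.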