UNIX Shell Basics
Basic
pwd # Show current directory
cd /path/to/directory # Change directory
cd .. # Go up one level
cd - # Go to previous directory
cd ~ # Go to home directory
ls # List files and folders
ls -lhrt # List files with details, human-readable sizes, sorted by time, reverse order
mkdir folder_name # Create a new folder
mkdir -p /path/to/folder # Create nested folders
mv old_name new_name # Rename or move file/folder
rm file_name # Remove file
rm -r folder_name # Remove folder recursively
rm -f file_name # Force remove file without prompt
rm -rf folder_name # Force remove folder recursively without prompt
cp source_file destination_file # Copy file
cp -a /source/folder /destination/folder # Copy folder recursively with attributes
scp user@remote:/path/to/remote/file /local/destination/ # Copy file from remote server
scp /local/file user@remote:/path/to/remote/destination/ # Copy file to remote server
wc -l file_name # Count number of lines in a file
cat file_name # Display file content
zcat file_name.gz # Display compressed file content
less file_name # View file content with pagination
zless file_name.gz # View compressed file content with pagination
head -n 10 file_name # Show first 10 lines of a file
tail -n 10 file_name # Show last 10 lines of a file
history # Show command history
top # Display running processes
htop # Interactive process viewer (may need installation)
chmod 755 file_name # Change file permissions
chmod +x file_name # Make file executable
chown user:group file_name # Change file owner and group
tar xzvf archive.tar.gz # Extract tar.gz archive
tar czvf archive.tar.gz /path/to/folder # Create tar.gz archive
tar xjvf archive.tar.bz2 # Extract tar.bz2 archive
tar cjvf archive.tar.bz2 /path/to/folder # Create tar.bz2 archive
unzip archive.zip # Extract zip archive
zip -r archive.zip /path/to/folder # Create zip archive
ping google.com # Check network connectivity
ifconfig # Show network interfaces (may need installation)
ip addr show # Show network interfaces (modern alternative to ifconfig)
wget http://example.com/file # Download file from the internet
curl -O http://example.com/file # Download file using curl
sort file_name # Sort lines in a file
sort -u file_name # Sort and remove duplicate lines
uniq file_name # Remove duplicate lines (requires sorted input)
diff file1 file2 # Compare two files line by line
diff -r dir1 dir2 # Compare two directories recursively
ln -s /path/to/target /path/to/symlink # Create a symbolic link
tree /path/to/directory # Display directory structure as a tree (may need installation)
tree -L 2 /path/to/directory # Display directory structure up to level 2
du -h /path/to/directory # Show disk usage of a directory
du -sh /path/to/directory # Show total disk usage of a directory
du -sh /path/to/directory/*/ # Show disk usage of all subdirectories
du -h --max-depth=1 /path/to/directory | sort -h # Show disk usage of subdirectories sorted by size
rsync -av /source/folder/ /destination/folder/ # Sync folders
rsync -auzP /source/folder/ user@remote:/destination/folder/ # Sync to remote server with compression and progress
## Show current shell
echo $SHELL
## Change default shell to bash or zsh
chsh -s /bin/bash
chsh -s /bin/zsh
grep
grep (Globally search a Regular Expression and Print) is a command-line utility that searches plain-text input for lines matching a specified pattern:
* Search files or standard input for specific patterns
* Supports regular expressions for complex pattern matching
* Options for case sensitivity, whole-word matching, line numbers, etc.
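The options listed above can be tried on a small throwaway file. This is a minimal sketch: the file name and its contents are made up for illustration, while the flags (-i, -w, -n, -c, -v, -E) are standard grep options.

```shell
# Create a small sample file in a temporary directory (hypothetical content)
grep_demo_dir=$(mktemp -d)
cat > "$grep_demo_dir/app.log" <<'EOF'
INFO start
error: disk full
Error: network down
INFO errors=2
WARN retry
EOF

# Case-insensitive search
grep -i "error" "$grep_demo_dir/app.log"
# Whole-word, case-insensitive match ("errors=2" is skipped)
grep -iw "error" "$grep_demo_dir/app.log"
# Show line numbers for each match
grep -n "INFO" "$grep_demo_dir/app.log"
# Count matching lines instead of printing them
grep -c "INFO" "$grep_demo_dir/app.log"
# Invert the match: lines that do NOT contain INFO
grep -v "INFO" "$grep_demo_dir/app.log"
# Extended regular expression: lines starting with WARN or INFO
grep -E "^(WARN|INFO)" "$grep_demo_dir/app.log"
```

Adding -r to any of these searches a whole directory tree instead of a single file.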
find
find is a powerful command to search for files and directories based on various criteria:
* Search by name, type, size, modification time, permissions, etc.
* Execute actions on found items (delete, move, etc.)
* Supports complex expressions with logical operators
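Logical operators and actions are easiest to see on a scratch directory. The sketch below assumes a made-up directory layout created under a temporary folder; -mindepth and -maxdepth are extensions available in GNU and BSD find.

```shell
# Build a throwaway directory tree (hypothetical names)
find_demo_dir=$(mktemp -d)
mkdir -p "$find_demo_dir/raw" "$find_demo_dir/tmp"
touch "$find_demo_dir/raw/a.txt" "$find_demo_dir/raw/b.log" \
      "$find_demo_dir/tmp/c.tmp" "$find_demo_dir/notes.txt"

# OR: files ending in .txt or .log (parentheses must be escaped from the shell)
find "$find_demo_dir" -type f \( -name "*.txt" -o -name "*.log" \)
# NOT: every file except .tmp files
find "$find_demo_dir" -type f ! -name "*.tmp"
# Only directories, exactly one level below the start point
find "$find_demo_dir" -mindepth 1 -maxdepth 1 -type d
# Run an action on each match: delete the .tmp files
find "$find_demo_dir" -type f -name "*.tmp" -exec rm {} \;
```

Running the same expression without -exec first is a cheap way to preview what an action would touch.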
## Find all .txt files in current directory and subdirectories
find . -name "*.txt"
## Find files/folders by name
find /path/to/search -name "pattern"
## Find files larger than 100MB
find /path/to/search -type f -size +100M
## Find files matching a pattern and list their details
find ./Clean -name "*Clean.fastq.gz" -type f -exec ls -lh {} \;
cut
cut is used to extract sections from each line of input:
* Extract columns or fields from text files
* Specify delimiter for fields
* Useful for processing structured data like CSV or TSV files
* Works line by line
## Extract first column from a tab-delimited file
cut -f1 file.txt
## Extract first and third columns from a comma-separated file
cut -d',' -f1,3 file.csv
## Extract characters from position 1 to 5
cut -c1-5 file.txt
sed
sed is a stream editor used to perform basic text transformations:
* Substitution, deletion, insertion, and more
* Works line by line, applying patterns and edits
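Two refinements come up often with sed: restricting an edit to certain lines with an address, and editing a file in place. The following is a sketch on a made-up config file; the -i.bak form (in-place edit with a backup suffix) is accepted by both GNU and BSD sed.

```shell
# Sample file in a temporary directory (hypothetical content)
sed_demo_dir=$(mktemp -d)
printf 'name=old\npath=/old/dir\nname=old\n' > "$sed_demo_dir/config.txt"

# Substitute only on line 1 (an address restricts where the edit applies)
sed '1s/old/new/' "$sed_demo_dir/config.txt"
# Use a different delimiter when the pattern contains slashes
sed 's|/old/dir|/new/dir|' "$sed_demo_dir/config.txt"
# Edit the file in place, keeping a backup copy as config.txt.bak
sed -i.bak 's/name=old/name=new/' "$sed_demo_dir/config.txt"
```

Only the last command changes the file; the first two print the edited text to the terminal and leave config.txt untouched.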
## Remove all spaces from a string
echo "a b c" | sed 's/ //g'
#> "abc"
## Replace all occurrences of "foo" with "bar"
echo "foo baz foo" | sed 's/foo/bar/g'
#> "bar baz bar"
## Delete lines matching "pattern" from file.txt
sed '/pattern/d' file.txt
## Print only lines matching "pattern" from file.txt
sed -n '/pattern/p' file.txt
## Print lines 2 to 5 of a file
sed -n '2,5p' file.txt
awk
awk is used for pattern scanning and processing:
* More powerful than cut for data extraction and reporting
* Processes text as fields and records, with support for arithmetic and logic
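Fields, records, arithmetic, and logic can be combined in one short session. This is a sketch on an invented comma-separated table (columns sample, group, count); -F sets the field separator and NR is the current record number.

```shell
# Sample comma-separated file (hypothetical data: sample,group,count)
awk_demo_dir=$(mktemp -d)
printf 'sample,group,count\ns1,ctrl,10\ns2,case,30\ns3,ctrl,20\n' > "$awk_demo_dir/counts.csv"

# -F sets the field separator; NR > 1 skips the header record
awk -F',' 'NR > 1 {print $1, $3}' "$awk_demo_dir/counts.csv"
# Logic: combine conditions on different fields
awk -F',' 'NR > 1 && $2 == "ctrl" && $3 >= 15' "$awk_demo_dir/counts.csv"
# Arithmetic: mean of the third column
awk -F',' 'NR > 1 {sum += $3; n++} END {print sum / n}' "$awk_demo_dir/counts.csv"
# Grouped sum using an associative array keyed by the second column
awk -F',' 'NR > 1 {total[$2] += $3} END {for (g in total) print g, total[g]}' \
    "$awk_demo_dir/counts.csv" | sort
```

The final sort is there because awk does not guarantee any order when looping over array keys.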
## Print the first column of a file
awk '{print $1}' file.txt
## Sum values in the first column
awk '{sum += $1} END {print sum}' file.txt
## Print lines where the third column is greater than 100
awk '$3 > 100' file.txt
## Print the lines that match a pattern "error"
awk '/error/' file.txt
## Print the number of lines in a file
awk 'END {print NR}' file.txt
File Compression
- Common tar flags for extraction:
-x = extract
-f = archive file name
-v = verbose
-z = gzip compression
-j = bzip2 compression
-J = xz compression
-C = change to the given directory before extracting
-p = preserve permissions
## Extract .tar, .tar.gz, and .tgz files
tar -xf file.tar
tar -xzvf file.tar.gz
tar -xzf file.tgz
## Extract .tar.bz2 file
tar -xjf file.tar.bz2
## Extract .tar.xz file
tar -xJf file.tar.xz
## Extract to specific directory
tar -xzf file.tar.gz -C /path/to/directory
## List contents without extracting
tar -tzf file.tar.gz
tar -tf file.tar
## Extract zip files
unzip file.zip
unzip file.zip -d /path/to/directory
## List contents without extracting
unzip -l file.zip
## Extract .gz file
gunzip file.gz
gunzip -k file.gz # keep original file
## Extract .bz2 file
bunzip2 file.bz2
## Extract .xz file
unxz file.xz
## Extract .7z file
7z x file.7z
## Extract .rar file
unrar x file.rar
- Common tar flags for compression
-c = create archive
-f = file name
-z = gzip compression (.tar.gz)
-j = bzip2 compression (.tar.bz2)
-J = xz compression (.tar.xz)
-v = verbose output
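The create, list, and extract flags in this section fit together into a simple round trip. The sketch below runs entirely inside a temporary directory; the folder and file names are hypothetical.

```shell
# Work in a temporary directory so nothing real is touched (names are hypothetical)
tar_demo_dir=$(mktemp -d)
mkdir -p "$tar_demo_dir/project/data" "$tar_demo_dir/restore"
echo "hello" > "$tar_demo_dir/project/data/a.txt"

# Create a gzip-compressed archive; -C switches directory first so that
# the archive stores the relative path project/... instead of an absolute one
tar -czf "$tar_demo_dir/project.tar.gz" -C "$tar_demo_dir" project
# List the contents without extracting
tar -tzf "$tar_demo_dir/project.tar.gz"
# Extract into a different directory
tar -xzf "$tar_demo_dir/project.tar.gz" -C "$tar_demo_dir/restore"
# Compare original and restored trees; no output means they match
diff -r "$tar_demo_dir/project" "$tar_demo_dir/restore/project"
```

Storing relative paths keeps the archive portable, since it can then be unpacked under any directory.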
tar -cf archive.tar file1 file2 directory/
## gzip compressed
tar -czf archive.tar.gz file1 file2 directory/
## bzip2 compressed
tar -cjf archive.tar.bz2 file1 file2 directory/
## .tar.xz compressed
tar -cJf archive.tar.xz file1 file2 directory/
## Backup and Restore a Folder
tar -czvf mybackup.tar.gz myfolder
tar -xzvf mybackup.tar.gz
File Permissions
Only the owner can read, write, and execute
chmod 700 /path/to/folder
700 means:
- Owner: 7 (read = 4, write = 2, execute = 1; 4+2+1 = 7)
- Group: 0 (no permissions)
- Others: 0 (no permissions)
Everyone can read and execute; only the owner can change
chmod 755 /path/to/folder
755 means:
- Owner: 7 (read = 4, write = 2, execute = 1; 4+2+1 = 7)
- Group: 5 (read = 4, execute = 1; 4+1 = 5)
- Others: 5 (read = 4, execute = 1; 4+1 = 5)
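The octal arithmetic above can be checked directly in the shell. This is a small sketch on a temporary directory (the variable names are made up for the demo), which also shows the equivalent symbolic form.

```shell
# Throwaway directory for the demo (hypothetical name)
perm_demo_dir=$(mktemp -d)

# Build the octal digits from the individual permission bits
owner=$((4 + 2 + 1))   # read + write + execute = 7
group=$((4 + 1))       # read + execute = 5
others=$((4 + 1))      # read + execute = 5
mode="${owner}${group}${others}"
echo "mode: $mode"

chmod "$mode" "$perm_demo_dir"
# The symbolic form sets the same permissions as 755
chmod u=rwx,go=rx "$perm_demo_dir"
# Show the result; the listing starts with drwxr-xr-x
ls -ld "$perm_demo_dir"
```

In the ls output the leading d marks a directory, followed by three rwx triplets for owner, group, and others.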
Give a user named someuser read, write, and execute permissions on the /path/to/folder directory
setfacl -m u:someuser:rwx /path/to/folder
Verify the Permissions
getfacl /path/to/folder
Remove all ACL entries for a user
setfacl -x u:someuser /path/to/folder
Change owner
sudo chown username:groupname /path/to/folder
Data transfer
- Transferring data efficiently between local and remote systems:
wget: Download files directly from the internet to your local machine or server.
scp: Securely copy files between local and remote systems over SSH.
rsync: Synchronize files and directories efficiently, with options for selective copying and deletion.
## Basic command
rsync -av /source/folder/ /destination/folder/
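## Mirror source to destination (deletes destination files missing from source)
rsync -av --delete /source/folder/ /destination/folder/
## Two-way sync: run both directions with --update and without --delete,
## otherwise the first pass removes files that exist only in the other folder
rsync -av --update /folder1/ /folder2/
rsync -av --update /folder2/ /folder1/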
## Network sync
rsync -av --delete /local/folder/ user@remote:/remote/folder/
## Copy only new files (files that don't exist in the destination)
rsync -av --ignore-existing /source/folder/ /destination/folder/
## Copy only if source is newer
rsync -av --update /source/folder/ /destination/folder/
--archive (-a): Preserves permissions, timestamps, symbolic links, etc.
--verbose (-v): Shows which files are transferred
--delete: Removes files in destination that don’t exist in source
--dry-run: Preview changes without actually syncing
--ignore-existing: Skips files that already exist in destination (regardless of age)
--update: Only copies if source file is newer than destination
--no-perms: Don’t preserve Unix permissions
--no-times: Don’t preserve modification times
--inplace: Write directly without temp files
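Because --delete removes files, it is worth previewing a sync with --dry-run first. The sketch below uses two made-up temporary folders and skips the preview if rsync is not installed.

```shell
# Two throwaway folders standing in for source and destination (hypothetical)
sync_demo_dir=$(mktemp -d)
mkdir -p "$sync_demo_dir/src" "$sync_demo_dir/dst"
echo "new" > "$sync_demo_dir/src/new.txt"
echo "only in destination" > "$sync_demo_dir/dst/extra.txt"

if command -v rsync >/dev/null 2>&1; then
    # --dry-run lists what would be copied (new.txt) and removed
    # (extra.txt) without changing either folder
    rsync -av --delete --dry-run "$sync_demo_dir/src/" "$sync_demo_dir/dst/"
else
    echo "rsync is not installed; skipping the preview"
fi

# Nothing has changed yet: extra.txt is still there and new.txt was not copied
ls "$sync_demo_dir/dst"
```

Once the preview looks right, rerunning the same command without --dry-run performs the actual sync.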
- Sync remote server folders to local using rsync
## Remote server details
remote_user="username"
remote_host="hpcio2" # hpc2021-io1.hku.hk,hpc2021-io2.hku.hk
## Define folders to sync (absolute paths)
## Format: "local_absolute_path:remote_absolute_path"
folders_to_sync=(
# "/mnt/m/Reference:/lustre1/g/path_my/Reference"
# "/mnt/m/WES/DFSP/Annovar:/lustre1/g/path_my/pipeline/somatic_variants_calling/data/DFSP/Annovar"
"/mnt/m/WES/SARC/BAM:/lustre1/g/path_my/pipeline/somatic_variants_calling/data/SARC/BAM"
# Add more folders as needed in the same format
)
## Sync each folder
for folder_pair in "${folders_to_sync[@]}"; do
# Split the pair into local and remote paths
local_path="${folder_pair%%:*}"
remote_path="${folder_pair#*:}"
echo "Syncing ${remote_user}@${remote_host}:${remote_path} to ${local_path}"
# Create local directory structure
mkdir -p "${local_path}"
# Sync the folder contents from remote to local (trailing slash on source copies contents)
rsync -av --update --progress --inplace --no-perms --no-times \
"${remote_user}@${remote_host}:${remote_path}/" \
"${local_path}/"
echo "-----------------------------------"
done
- Sync local folders to remote server
## Remote server details
remote_user="username"
remote_host="hpcio1" # hpc2021-io1.hku.hk
## Define folders to sync (absolute paths)
## Format: "local_absolute_path:remote_absolute_path"
folders_to_sync=(
"/mnt/f/Reference:/lustre1/g/path_my/Reference"
# Add more folders as needed in the same format
)
## Sync each folder
for folder_pair in "${folders_to_sync[@]}"; do
# Split the pair into local and remote paths
local_path="${folder_pair%%:*}"
remote_path="${folder_pair#*:}"
echo "Syncing ${local_path} to ${remote_user}@${remote_host}:${remote_path}"
# Create remote directory structure
ssh "${remote_user}@${remote_host}" "mkdir -p '${remote_path}'"
# Sync the folder contents (trailing slash on source copies contents)
rsync -av --update --progress \
"${local_path}/" \
"${remote_user}@${remote_host}:${remote_path}/"
echo "-----------------------------------"
done
Vim Text Editor
Modes
- Vim text editor has three main modes:
Esc: back to Normal mode
i: Insert mode
v: Visual (selection) mode
| Command | What it does | Most common follow-up |
|---|---|---|
| i | Insert before cursor | typing text |
| a | Append after cursor | typing text |
| I | Insert at beginning of line | — |
| A | Append at end of line | — |
| o | Open new line below | typing |
| O | Open new line above | typing |
| Esc | Back to Normal mode | — |
| v | Visual mode | move + operator |
| V | Visual Line mode | — |
| Ctrl-v | Visual Block mode | — |
Editing
| Command | Meaning | examples |
|---|---|---|
| x | Delete character under cursor | — |
| dd | Delete current line | 5dd = delete 5 lines |
| yy | Yank (copy) current line | p to paste |
| p / P | Paste after/before cursor | — |
| u | Undo | (most used command after Esc) |
| Ctrl-r | Redo | — |
| . | Repeat last change/command | (extremely powerful) |
| dw / d$ | Delete to end of word / end of line | diw, daw |
| ciw / caw | Change (delete + insert) inside/around word | most common text object usage |
| ci" / ci( / ci{ | Change inside quotes/parentheses/braces | extremely common in code |
Search & Replace
| Command | Meaning |
|---|---|
| /pattern | Search forward |
| ?pattern | Search backward |
| n / N | Next / previous match |
| * / # | Search forward/backward for word under cursor |
| :%s/old/new/g | Replace all occurrences (whole file) |
| :%s/old/new/gc | Replace with confirmation |
Saving & Exiting
| Command | Meaning |
|---|---|
| :w | Save (write) |
| :q | Quit (only if no changes) |
| :q! | Quit without saving |
| :wq or :x or ZZ | Save and quit |
| :w !sudo tee % | Save file needing root (very common workaround) |
Nano Text Editor
- Open a file with: nano filename.txt
- Use the arrow keys to navigate.
- Save the changes with ctrl + o.
- Confirm the filename by pressing Enter.
- Exit nano with ctrl + x.
Searching Text
- To find words or phrases within a file:
- Initiate search with ctrl + w.
- Type the search term and press Enter.
- Exit search mode with ctrl + c.
Customizing Nano
Enhance Nano’s functionality:
nano -miA file.txt
-m: Enable mouse support.
-i: Auto-indent new lines.
-A: Enable the smart Home key (syntax highlighting is set up in nanorc or chosen with -Y).
Job Schedule
OpenPBS
OpenPBS (Portable Batch System) is an open-source workload manager and job scheduler used in High-Performance Computing (HPC) environments to automate and optimize the execution of tasks across clusters and clouds.
- PBS job script
#!/bin/bash
#PBS -N job_name # Job name
#PBS -l nodes=1:ppn=4 # Number of nodes and processors per node
#PBS -l walltime=01:00:00 # Walltime (hh:mm:ss)
#PBS -l mem=8gb # Memory per node
#PBS -j oe # Combine stdout and stderr
#PBS -M user@example.com # Email address for notifications
#PBS -m abe # Send email on (a) abort, (b) begin, (e) end
#PBS -q batch # Queue name
#PBS -V # Export all environment variables to the job
## example
cd $PBS_O_WORKDIR # Change to the directory where the job was submitted
module load python/3.8.5 # Load necessary modules
python my_script.py # Command to run your program
## Some variables
$PBS_O_WORKDIR # The directory from which the job was submitted
$PBS_O_HOME # The home directory of the user who submitted the job
$PBS_JOBDIR # The working directory for the job on the compute node
$TMPDIR # The temporary directory for the job on the compute node
$PBS_JOBID # The unique identifier assigned to the job
$PBS_NODEFILE # The file containing the list of nodes allocated to the job
$PBS_NUM_NODES # The number of nodes allocated to the job
$PBS_NUM_PPN # The number of processors per node allocated to the job
$PBS_QUEUE # The name of the queue to which the job was submitted
$PBS_JOBNAME # The name of the job as specified in the job script
The qstat command is used to request the status of jobs, queues, or a batch server. Common job commands:
* qsub job_script.pbs - Submit a job
* qsub -v sample="$sample" -M "$EMAIL" pbs_1_cutadapt.pbs - Submit a job with environment variables
* qstat - Check job status
* qstat -u guorui -a - Check all jobs for user guorui
* qdel job_id - Delete a job
* qhold job_id - Hold a job
* qrls job_id - Release a held job
## List finished job details in summary
qstat -x job_id
## List finished job details in full
qstat -H job_id
qstat -x -f job_id
qstat -x -f job_id | grep "exec_host"
## Find out all compute nodes allocated to a job
for job_id in $(find . -name "log.*.e*" -printf "%f\n" | sed 's/.*\.e\([0-9]*\)$/\1/' | sort -n | uniq); do
echo -n "Job ID: $job_id: "; qstat -x -f $job_id | grep exec_host
done
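The loop above depends on a sed expression to pull the numeric job ID out of each error-log file name. A quick check on made-up names shows what it extracts, assuming the logs are named like log.<step>.e<job_id> (an assumption about the naming scheme, inferred from the find pattern).

```shell
# Hypothetical log file names in the form log.<step>.e<job_id>
log_names='log.cutadapt.e12345
log.align.e12346
log.cutadapt.e12345'

# Same pipeline as in the loop: keep the digits after the final ".e",
# then sort numerically and drop duplicates
job_ids=$(printf '%s\n' "$log_names" | sed 's/.*\.e\([0-9]*\)$/\1/' | sort -n | uniq)
echo "$job_ids"
```

Each distinct job ID appears once, so qstat is queried only once per job even when several log files share an ID.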
## Start an interactive job with 16 processors, 64GB memory, 12 hours walltime in cgs_queue
qsub -I -q cgs_queue -l nodes=hpcf3-c01:ppn=16,walltime=12:00:00,mem=64gb
qsub -I -q cgs_queue -l nodes=hpcf3-c02:ppn=16,walltime=12:00:00,mem=64gb
qsub -I -q cgs_queue -l nodes=hpcf3-c03:ppn=16,walltime=12:00:00,mem=64gb
Shell session
Tmux
- Tmux Configuration
Consider adding these aliases to your shell configuration for quick tmux access.
# Terminal aliases for tmux
alias t="tmux"
alias ta="t a -t"
alias tls="t ls"
alias tn="t new -s"
alias tk="t kill-session -t"
alias tks="t kill-server"
Change the default prefix in .tmux.conf:
set-option -g prefix C-a
bind-key C-a send-prefix
- Session Management
| Action | Command/Shortcut |
|---|---|
| List sessions | tmux list-sessions |
| Attach to session | tmux attach-session -t target-session |
| Switch between sessions | Ctrl + A + s |
| Switch to latest session | Ctrl + A + l |
| Detach from session | Ctrl + A + d |
- Window Management
| Action | Shortcut |
|---|---|
| List all windows | Ctrl + A + w |
| Rename current window | Ctrl + A + , |
| Switch to next window | Ctrl + A + → |
| Switch to previous window | Ctrl + A + ← |
| Create new window | Ctrl + A + c |
| Kill current window | Ctrl + A + q |
- Pane Management
| Action | Shortcut |
|---|---|
| Switch to pane above | Shift + ↑ |
| Switch to pane below | Shift + ↓ |
| Switch to pane left | Shift + ← |
| Switch to pane right | Shift + → |
| Kill current pane | Ctrl + A + x |
- Copy Mode
| Action | Shortcut |
|---|---|
| Enter copy mode | Drag mouse to select text |
| Paste copied text | Ctrl + A + ] |
Screen
Add these aliases to your shell configuration for easier screen management.
# Terminal aliases for screen
alias s="screen" # start a screen session
alias ss="s -S" # start a named screen session
alias sr="s -r" # reattach to a screen session
alias sls="s -ls" # list current running screen sessions
- Basic Commands
| Action | Command |
|---|---|
| Start screen session | screen |
| Start named session | screen -S session_name |
| Reattach to session | screen -r |
| List running sessions | screen -ls |
- Window Management
| Action | Shortcut |
|---|---|
| Create new window | Ctrl + A + c |
| Kill current window | Ctrl + A + k |
| List all windows | Ctrl + A + w |
| Go to window 0-9 | Ctrl + A + 0-9 |
| Go to next window | Ctrl + A + n |
| Toggle between windows | Ctrl + A + Ctrl + A |
| Rename current window | Ctrl + A + A |
- Region Management
| Action | Shortcut |
|---|---|
| Split horizontally | Ctrl + A + S |
| Split vertically | Ctrl + A + \| |
| Switch between regions | Ctrl + A + Tab |
| Close all regions but current | Ctrl + A + Q |
| Close current region | Ctrl + A + X |
- Session & Copy Mode
| Action | Shortcut |
|---|---|
| Detach from session | Ctrl + A + d |
| Start copy mode | Ctrl + A + [ |
| Paste copied text | Ctrl + A + ] |
| Show help | Ctrl + A + ? |
| Quit screen | Ctrl + A + Ctrl + \ |
- VSCode: Official Keyboard Shortcuts Reference
- Vim: Vim Cheat Sheet
- Tmux: Tmux Cheat Sheet
- Terminal: Bash Reference Manual
Software Environment
Conda
- Installation
# Miniforge linux
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh"
# Miniforge Mac
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh"
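# Install it (on macOS the downloaded file is named Miniforge3-MacOSX-arm64.sh)
bash Miniforge3-$(uname)-$(uname -m).sh
# Load conda into the shell if the Miniforge install is found (default prefix: ~/miniforge3)
if [ -f "$HOME/miniforge3/etc/profile.d/conda.sh" ]; then
. "$HOME/miniforge3/etc/profile.d/conda.sh"
fi
- Configuration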
# Init
conda init --all
# Use mamba for faster solving
conda update -n base conda
conda install -n base conda-libmamba-solver
# Config
conda config --set solver libmamba
conda config --set always_yes true
conda config --set auto_activate_base false
# Setup channels
conda config --add channels bioconda
conda config --add channels conda-forge
#conda config --set channel_priority strict
# Install software from a specific channel
conda install -c conda-forge numpy
- Create Environment
# Create R environment with specific R version and packages
conda create -n renv r=4.5 r-languageserver r-tidyverse r-irkernel r-httpgd r-downlit r-xml2 r-markdown r-devtools radian
# Install bioconductor packages, all package names are lower case
conda install -n renv bioconductor-deseq2 bioconductor-edger
# Install repository packages
conda install -n renv r-qs r-fs r-tidyverse
# Remove a specific environment
conda remove --name renv --all
# Remove a package from an environment
conda remove --name renv package_name
# Export and save the conda env file with all software information
conda env export -n renv > renv.yml
# Export the conda env file without build strings
# (more portable across machines); the two commands are equivalent
conda env export -n renv --no-builds > renv.yml
conda env export -n renv --no-builds --file renv.yml
# Create a new environment from an environment file
conda env create --file renv.yml
# Rename an environment
conda create --name new_env_name --clone old_env_name
conda activate new_env_name
conda remove --name old_env_name --all
Pixi
## Enable pixi shell completion (add to ~/.bashrc or ~/.zshrc)
eval "$(pixi completion --shell bash)" # or zsh
Git Version Control
## Keep your original Git repository but provide a clean copy (without history) to others
git archive --format=zip --output=my-project-clean.zip main
git init
git add .
git commit -m "first commit"
git remote add origin <repository-url>
git push -u origin main
git config --global user.name "" # set a user name that will be associated with each history marker
git config --global user.email "" # set an email address that will be associated with each history marker
git config --global color.ui auto # set automatic command line coloring for Git for easy reviewing
git init # initialize a new repository in the current directory
# 1. Delete the local branch
git branch -d branch-name # Use -D to force delete
# 2. Delete the remote branch
git push origin --delete branch-name
# 3. Verify branches are gone
git branch -a # List all branches
echo "" # create file ""
git status # check
git add # add a file as it looks now to your next commit (stage)
git commit # commit your staged content as a new commit snapshot
### If Git opens the Vim editor: i to insert, Esc to leave insert mode, :wq to save and quit
git log # show all commits in the current branch’s history, check versions, q to quit
touch .gitignore
git branch "" # create branch ""
git branch # check branches
git checkout "" # switch to another branch and check it out into your working directory ""
git branch -D "" # force-delete branch ""
git checkout -b temp # create and switch to a temporary branch
git merge # merge the specified branch’s history into the current one
git clone # retrieve an entire repository from a hosted location via URL
git remote -v # check remote repository information
git push # Transmit local branch commits to the remote repository branch
git fetch # fetch down all the branches from that Git remote
git diff # diff of what is changed but not staged
git pull # fetch and merge any commits from the tracking remote branch
git reset # unstage a file while retaining the changes in working directory
git rm -r --cached . # stop tracking already-added files (run after updating .gitignore, then re-add and commit)
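The git rm -r --cached step is easier to follow in a scratch repository. The following is a minimal sketch, assuming git is installed; the file names and the demo identity are hypothetical.

```shell
# Throwaway repository (file names and identity are hypothetical)
git_demo_dir=$(mktemp -d)
cd "$git_demo_dir"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# A file that was committed before it was added to .gitignore
echo "secret" > local.env
echo "code" > main.txt
git add .
git commit -q -m "first commit"

# Ignore the file, then rebuild the index so the ignore rule takes effect
echo "local.env" > .gitignore
git rm -r -q --cached .
git add .
git commit -q -m "stop tracking ignored files"

# local.env is no longer tracked but still exists on disk
git ls-files
```

The file stays in earlier commits, so this stops future tracking but does not remove it from the repository history.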
## Go back to a specific commit
git reset --hard e4098a7
## Pull remote updates and ignore local changes
git fetch origin
git reset --hard origin/your-branch-name
## Change remote URL
git remote -v # check current remote configuration
git remote set-url origin <new-repository-url>
git remote remove origin
git remote add origin <new-repository-url>
## Reset to match remote
git fetch origin
git reset --hard origin/main
WSL
- System info
## Install the neofetch package
sudo apt install neofetch -y
## Show system info
neofetch --memory_unit gib
- Change sudo password
In the Windows terminal, run the following command to change the sudo password:
wsl -u root
## Input the new password
passwd <username>
- WSL configuration
- WSL global config: C:\Users\<UserName>\.wslconfig
- WSL distro config: \\wsl.localhost\Ubuntu-24.04\etc\wsl.conf