Linux

Created

October 9, 2022

Modified

January 28, 2026

UNIX Shell Basics

Basic

pwd # Show current directory
cd /path/to/directory # Change directory
cd .. # Go up one level
cd - # Go to previous directory
cd ~ # Go to home directory
ls # List files and folders
ls -lhrt # List files with details and human-readable sizes, sorted by modification time (oldest first, newest last)
mkdir folder_name # Create a new folder
mkdir -p /path/to/folder # Create nested folders
mv old_name new_name # Rename or move file/folder
rm file_name # Remove file
rm -r folder_name # Remove folder recursively
rm -f file_name # Force remove file without prompt
rm -rf folder_name # Force remove folder recursively without prompt
cp source_file destination_file # Copy file
cp -a /source/folder /destination/folder # Copy folder recursively with attributes
scp user@remote:/path/to/remote/file /local/destination/ # Copy file from remote server
scp /local/file user@remote:/path/to/remote/destination/ # Copy file to remote server
wc -l file_name # Count number of lines in a file
cat file_name # Display file content
zcat file_name.gz # Display compressed file content
less file_name # View file content with pagination
zless file_name.gz # View compressed file content with pagination
head -n 10 file_name # Show first 10 lines of a file
tail -n 10 file_name # Show last 10 lines of a file
history # Show command history
top # Display running processes
htop # Interactive process viewer (may need installation)
chmod 755 file_name # Change file permissions
chmod +x file_name # Make file executable
chown user:group file_name # Change file owner and group
tar xzvf archive.tar.gz # Extract tar.gz archive
tar czvf archive.tar.gz /path/to/folder # Create tar.gz archive
tar xjvf archive.tar.bz2 # Extract tar.bz2 archive
tar cjvf archive.tar.bz2 /path/to/folder # Create tar.bz2 archive
unzip archive.zip # Extract zip archive
zip -r archive.zip /path/to/folder # Create zip archive
ping google.com # Check network connectivity
ifconfig # Show network interfaces (may need installation)
ip addr show # Show network interfaces (modern alternative to ifconfig)
wget http://example.com/file # Download file from the internet
curl -O http://example.com/file # Download file using curl
sort file_name # Sort lines in a file
sort -u file_name # Sort and remove duplicate lines
uniq file_name # Remove adjacent duplicate lines (sort first to remove all duplicates)
diff file1 file2 # Compare two files line by line
diff -r dir1 dir2 # Compare two directories recursively
ln -s /path/to/target /path/to/symlink # Create a symbolic link
tree /path/to/directory # Display directory structure as a tree (may need installation)
tree -L 2 /path/to/directory # Display directory structure up to level 2
du -h /path/to/directory # Show disk usage of a directory
du -sh /path/to/directory # Show total disk usage of a directory
du -sh /path/to/directory/*/ # Show disk usage of all subdirectories
du -h --max-depth=1 /path/to/directory | sort -h # Show disk usage of subdirectories sorted by size
rsync -av /source/folder/ /destination/folder/ # Sync folders
rsync -auzP /source/folder/ user@remote:/destination/folder/ # Sync to remote server: skip files newer on the destination, compress, keep partial transfers, show progress
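## Show your login shell (the shell currently running may differ; check it with: ps -p $$)
echo $SHELL

## Change default shell to bash or zsh (takes effect at next login)
chsh -s /bin/bash
chsh -s /bin/zsh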
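The `uniq` entry above only removes duplicates that sit on adjacent lines, which is why it is usually paired with `sort`. A small sketch with a hypothetical file `uniq_demo.txt` shows the difference and the common count-and-rank pipeline:

```shell
# Hypothetical demo file: no two identical lines are adjacent
printf 'b\na\nb\nc\na\nb\n' > uniq_demo.txt

# uniq alone keeps all 6 lines, because the duplicates are not adjacent
uniq uniq_demo.txt | wc -l

# Sorting first groups duplicates, so uniq can collapse them: a, b, c
sort uniq_demo.txt | uniq

# Count occurrences and rank them, most frequent first (3 b, 2 a, 1 c)
sort uniq_demo.txt | uniq -c | sort -rn
```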

grep

grep (named after the ed command g/re/p, "globally search for a regular expression and print") is a command-line utility that searches plain-text data for lines matching a specified pattern:

  • Search files or piped input for specific patterns
  • Supports regular expressions for complex pattern matching
  • Options for case sensitivity, whole-word matching, line numbers, etc.
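A few common grep options, sketched against a small hypothetical log file (`grep_demo/sample.log`, created only for this example):

```shell
# Create a hypothetical sample log for the demo
mkdir -p grep_demo
printf 'Error: disk full\nwarning: low memory\nerror: timeout\nall good\n' > grep_demo/sample.log

grep "error" grep_demo/sample.log          # case-sensitive: only "error: timeout"
grep -i "error" grep_demo/sample.log       # -i ignores case: both error lines
grep -n "warning" grep_demo/sample.log     # -n prefixes line numbers: "2:warning: low memory"
grep -v -i "error" grep_demo/sample.log    # -v inverts the match: lines without "error"
grep -c -i "error" grep_demo/sample.log    # -c counts matching lines: 2
grep -w "good" grep_demo/sample.log        # -w matches whole words only
grep -E "disk|memory" grep_demo/sample.log # -E enables extended regex (alternation with |)
grep -rn "timeout" grep_demo               # -r searches a directory recursively
```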

find

find is a powerful command to search for files and directories based on various criteria:

  • Search by name, type, size, modification time, permissions, etc.
  • Execute actions on found items (delete, move, etc.)
  • Supports complex expressions with logical operators

## Find all .txt files in current directory and subdirectories
find . -name "*.txt"

## Find files/folders by name
find /path/to/search -name "pattern" 

## Find files larger than 100MB
find /path/to/search -type f -size +100M

## Find files matching a pattern and list their details
find ./Clean -name "*Clean.fastq.gz" -type f -exec ls -lh {} \;
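The description above also mentions searching by modification time, combining tests with logical operators, and acting on matches. A sketch on a throwaway directory (`find_demo`, hypothetical):

```shell
# Hypothetical directory tree for the demo
mkdir -p find_demo/sub
touch find_demo/a.txt find_demo/b.log find_demo/sub/c.txt find_demo/sub/d.tmp

# Files modified within the last 7 days (-mtime +7 selects older files; find counts whole 24-hour periods)
find find_demo -type f -mtime -7

# Logical OR: .txt or .log files; escaped parentheses group the -o expression
find find_demo -type f \( -name "*.txt" -o -name "*.log" \)

# Negation: every file that is not a .txt
find find_demo -type f ! -name "*.txt"

# Preview matches with -print, then remove them with -delete
find find_demo -type f -name "*.tmp" -print
find find_demo -type f -name "*.tmp" -delete

# "-exec ... {} +" passes many files to one ls call instead of one call per file
find find_demo -type f -name "*.txt" -exec ls -lh {} +
```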

cut

cut extracts sections from each line of input:

  • Extract columns (fields) from text files
  • Specify the field delimiter with -d (the default is tab)
  • Useful for processing structured data like CSV or TSV files
  • Works line by line

## Extract first column from a tab-delimited file
cut -f1 file.txt 

## Extract first and third columns from a comma-separated file
cut -d',' -f1,3 file.csv

## Extract characters from position 1 to 5
cut -c1-5 file.txt
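cut often feeds other tools. A sketch that counts how many rows fall into each group of a hypothetical CSV (`cut_demo.csv`): skip the header with `tail`, extract the column with `cut`, then count with `sort | uniq -c`.

```shell
# Hypothetical CSV with a header row
printf 'id,group,score\n1,A,90\n2,B,85\n3,A,70\n' > cut_demo.csv

# Column 2 without the header: A, B, A
tail -n +2 cut_demo.csv | cut -d',' -f2

# Rows per group (2 A, 1 B)
tail -n +2 cut_demo.csv | cut -d',' -f2 | sort | uniq -c
```

cut splits on every delimiter, so quoted CSV fields that contain commas (for example "Smith, J") are split incorrectly; such files need a CSV-aware tool.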

sed

sed is a stream editor used to perform basic text transformations:

  • Substitution, deletion, insertion, and more
  • Works line by line, applying patterns and edits

## Remove all spaces from a string
echo "a b c" | sed 's/ //g'
#> abc
## Replace all occurrences of "foo" with "bar"
echo "foo baz foo" | sed 's/foo/bar/g'
#> bar baz bar

## Delete lines matching "pattern" (prints the result; file.txt itself is not modified)
sed '/pattern/d' file.txt

## Print only lines matching "pattern"
sed -n '/pattern/p' file.txt

## Print lines 2 to 5 of a file
sed -n '2,5p' file.txt
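The description mentions insertion, and in-place editing is a frequent need; neither appears in the examples above. A sketch on a hypothetical file `sed_demo.txt`; the `-E` and `-i.bak` forms are accepted by both GNU sed and BSD/macOS sed:

```shell
# Hypothetical input file
printf 'foo 1\nbar 2\nfoo 3\n' > sed_demo.txt

# Insert a header before line 1 (POSIX form: the text goes on the next line)
sed '1i\
name value' sed_demo.txt

# Append a line after every line matching "bar"
sed '/bar/a\
(after bar)' sed_demo.txt

# Capture groups with -E: swap the two fields ("foo 1" becomes "1 foo")
sed -E 's/^([a-z]+) ([0-9]+)$/\2 \1/' sed_demo.txt

# Edit the file in place, keeping the original as sed_demo.txt.bak
sed -i.bak 's/foo/baz/g' sed_demo.txt
cat sed_demo.txt
```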

awk

awk is a language for pattern scanning and processing:

  • More powerful than cut for data extraction and reporting
  • Processes text as records (lines) and fields (columns); supports arithmetic and logic

## Print the first column of a file
awk '{print $1}' file.txt

## Sum values in the first column
awk '{sum += $1} END {print sum}' file.txt 

## Print lines where the third column is greater than 100
awk '$3 > 100' file.txt

## Print lines that match the pattern "error"
awk '/error/' file.txt

## Print the number of lines in a file
awk 'END {print NR}' file.txt
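A few more awk patterns for delimited data, sketched on a hypothetical CSV (`awk_demo.csv`): a custom field separator, per-group sums with an associative array, and passing a shell variable in with `-v`.

```shell
# Hypothetical CSV: sample, group, read count
printf 'sample,group,reads\ns1,A,100\ns2,B,250\ns3,A,50\n' > awk_demo.csv

# -F sets the field separator; NR > 1 skips the header row
awk -F',' 'NR > 1 {print $1, $3}' awk_demo.csv

# Sum column 3 per value of column 2 (for-in order is unspecified, so sort the output)
awk -F',' 'NR > 1 {sum[$2] += $3} END {for (g in sum) print g, sum[g]}' awk_demo.csv | sort

# Pass a shell variable into awk with -v instead of splicing it into the program text
min=80
awk -F',' -v min="$min" 'NR > 1 && $3 > min {print $1}' awk_demo.csv

# Set input and output separators in BEGIN; $1 = $1 rebuilds the line with OFS (CSV to TSV)
awk 'BEGIN {FS=","; OFS="\t"} {$1 = $1; print}' awk_demo.csv
```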

File Compression

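  • Common tar flags for extraction:
    • -x = extract
    • -f = archive file name (must be followed directly by the name, so put it last in a bundle like -xzf)
    • -v = verbose
    • -z = gzip compression
    • -j = bzip2 compression
    • -J = xz compression
    • -t = list archive contents
    • -C = change to this directory before extracting
    • -p = preserve permissions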
## Extract .tar, .tar.gz, and .tgz files
tar -xf file.tar       # GNU tar also auto-detects gzip/bzip2/xz compression with plain -xf
tar -xzvf file.tar.gz
tar -xzf file.tgz

## Extract .tar.bz2 file
tar -xjf file.tar.bz2

## Extract .tar.xz file
tar -xJf file.tar.xz

## Extract to specific directory
tar -xzf file.tar.gz -C /path/to/directory

## List contents without extracting
tar -tzf file.tar.gz
tar -tf file.tar

## Extract zip files
unzip file.zip
unzip file.zip -d /path/to/directory

## List contents without extracting
unzip -l file.zip

## Extract .gz file
gunzip file.gz
gunzip -k file.gz  # keep the original .gz file (gzip 1.6+)

## Extract .bz2 file
bunzip2 file.bz2

## Extract .xz file
unxz file.xz

## Extract .7z file (may need installation)
7z x file.7z

## Extract .rar file (may need installation)
unrar x file.rar
  • Common tar flags for compression
    • -c = create archive
    • -f = file name
    • -z = gzip compression (.tar.gz)
    • -j = bzip2 compression (.tar.bz2)
    • -J = xz compression (.tar.xz)
    • -v = verbose output
## Uncompressed .tar
tar -cf archive.tar file1 file2 directory/

## gzip compressed
tar -czf archive.tar.gz file1 file2 directory/

## bzip2 compressed
tar -cjf archive.tar.bz2 file1 file2 directory/

## .tar.xz compressed
tar -cJf archive.tar.xz file1 file2 directory/

## Back up and restore a folder
tar -czvf mybackup.tar.gz myfolder
tar -xzvf mybackup.tar.gz
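Before relying on a backup it is worth checking that it restores cleanly. A sketch, assuming a small hypothetical folder `myfolder`: test and list the archive, extract it into a separate directory with `-C`, and compare the two trees with `diff -r`.

```shell
# Hypothetical folder to back up
mkdir -p myfolder
echo "hello" > myfolder/note.txt

tar -czf mybackup.tar.gz myfolder

# Test the gzip layer for corruption, then list the archive contents
gzip -t mybackup.tar.gz
tar -tzf mybackup.tar.gz

# Restore into a separate directory so the original is not overwritten
mkdir -p restore_test
tar -xzf mybackup.tar.gz -C restore_test

# diff -r prints nothing and exits 0 when the trees match
diff -r myfolder restore_test/myfolder && echo "backup verified"
```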

File Permissions

Only the owner can read, write, and execute (for a directory, execute means permission to enter it)

chmod 700 /path/to/folder

700 means:

  • Owner: 7 (read = 4, write = 2, execute = 1; 4+2+1 = 7)
  • Group: 0 (no permissions)
  • Others: 0 (no permissions)

Everyone can read and execute; only the owner can write

chmod 755 /path/to/folder

755 means:

  • Owner: 7 (read = 4, write = 2, execute = 1; 4+2+1 = 7)
  • Group: 5 (read = 4, execute = 1; 4+1 = 5)
  • Others: 5 (read = 4, execute = 1; 4+1 = 5)
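The octal modes above can also be written symbolically (u = owner, g = group, o = others). A sketch on a hypothetical directory `perm_demo`, checking each result with GNU `stat -c` (the format option differs on macOS):

```shell
mkdir -p perm_demo

chmod 700 perm_demo
stat -c '%a %A' perm_demo        # 700 drwx------

# Symbolic equivalent of 755: owner rwx, group and others r-x
chmod u=rwx,go=rx perm_demo
stat -c '%a %A' perm_demo        # 755 drwxr-xr-x

# Relative changes add (+) or remove (-) bits without touching the rest
chmod g+w perm_demo
stat -c '%a %A' perm_demo        # 775 drwxrwxr-x
```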

Give a user named someuser read, write, and execute permissions on the /path/to/folder directory

setfacl -m u:someuser:rwx /path/to/folder

Verify the Permissions

getfacl /path/to/folder

Remove the ACL entry for a specific user

setfacl -x u:someuser /path/to/folder

Change owner

sudo chown username:groupname /path/to/folder

Data transfer

  • Transferring data efficiently between local and remote systems:
    • wget: Download files directly from the internet to your local machine or server.
    • scp: Securely copy files between local and remote systems over SSH.
    • rsync: Synchronize files and directories efficiently, with options for selective copying and deletion.
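## Basic sync: copy new and changed files (trailing slash on source copies its contents)
rsync -av /source/folder/ /destination/folder/

## Mirror source to destination (also deletes destination files that are not in source)
rsync -av --delete /source/folder/ /destination/folder/

## rsync is one-way. For a rough two-way sync, run both directions WITHOUT --delete
## (with --delete the first run would wipe files that exist only in folder2).
## -u skips files that are newer on the receiver; deletions are not propagated,
## so use a dedicated two-way tool such as unison if you need that.
rsync -auv /folder1/ /folder2/
rsync -auv /folder2/ /folder1/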

## Network sync
rsync -av --delete /local/folder/ user@remote:/remote/folder/

## Copy only new files (files that don't exist in the destination)
rsync -av --ignore-existing /source/folder/ /destination/folder/

## Copy only if source is newer
rsync -av --update /source/folder/ /destination/folder/
Note
  • -a, --archive: Preserve permissions, timestamps, symbolic links, etc. (implies recursive)
  • -v, --verbose: List the files being transferred
  • --progress: Show per-file transfer progress
  • --delete: Remove files in destination that don’t exist in source
  • -n, --dry-run: Preview changes without actually syncing
  • -u, --update: Skip files that are newer on the destination (only copy if source is newer)
  • --ignore-existing: Skip files that already exist in destination (regardless of age)
  • --no-perms: Don’t preserve Unix permissions
  • --no-times: Don’t preserve modification times
  • --inplace: Update destination files in place instead of writing a temporary copy first
  • Sync remote server folders to local using rsync
## Remote server details
remote_user="username"
remote_host="hpcio2" # hpc2021-io1.hku.hk,hpc2021-io2.hku.hk  

## Define folders to sync (absolute paths)
## Format: "local_absolute_path:remote_absolute_path"
folders_to_sync=(
    # "/mnt/m/Reference:/lustre1/g/path_my/Reference"
    # "/mnt/m/WES/DFSP/Annovar:/lustre1/g/path_my/pipeline/somatic_variants_calling/data/DFSP/Annovar"
    "/mnt/m/WES/SARC/BAM:/lustre1/g/path_my/pipeline/somatic_variants_calling/data/SARC/BAM"
    # Add more folders as needed in the same format
)

## Sync each folder
for folder_pair in "${folders_to_sync[@]}"; do
    # Split the pair into local and remote paths
    local_path="${folder_pair%%:*}"
    remote_path="${folder_pair#*:}"
    
    echo "Syncing ${remote_user}@${remote_host}:${remote_path} to ${local_path}"
    
    # Create local directory structure
    mkdir -p "${local_path}"
    
    # Sync the folder contents from remote to local (trailing slash on source copies contents)
    rsync -av --update --progress --inplace --no-perms --no-times \
        "${remote_user}@${remote_host}:${remote_path}/" \
        "${local_path}/"
        
    echo "-----------------------------------"
done
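The scripts split each `local:remote` pair with shell parameter expansion. A quick check with a hypothetical pair: `%%:*` removes the longest suffix that starts with a colon (everything from the first colon on), and `#*:` removes the shortest prefix that ends with a colon (everything up to the first colon).

```shell
# Hypothetical pair in the same "local:remote" format as folders_to_sync
folder_pair="/mnt/m/data:/lustre1/g/project/data"

local_path="${folder_pair%%:*}"   # /mnt/m/data
remote_path="${folder_pair#*:}"   # /lustre1/g/project/data

echo "local:  ${local_path}"
echo "remote: ${remote_path}"
```

Both expansions split at the first colon, so the local path must not contain a colon; a colon later in the remote path is kept intact.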
  • Sync local folders to remote server
## Remote server details
remote_user="username"
remote_host="hpcio1" # hpc2021-io1.hku.hk 

## Define folders to sync (absolute paths)
## Format: "local_absolute_path:remote_absolute_path"
folders_to_sync=(
    "/mnt/f/Reference:/lustre1/g/path_my/Reference"
    # Add more folders as needed in the same format
)

## Sync each folder
for folder_pair in "${folders_to_sync[@]}"; do

    # Split the pair into local and remote paths
    local_path="${folder_pair%%:*}"
    remote_path="${folder_pair#*:}"
    
    echo "Syncing ${local_path} to ${remote_user}@${remote_host}:${remote_path}"
    
    # Create remote directory structure
    ssh "${remote_user}@${remote_host}" "mkdir -p '${remote_path}'"
    
    # Sync the folder contents (trailing slash on source copies contents)
    rsync -av --update --progress \
        "${local_path}/" \
        "${remote_user}@${remote_host}:${remote_path}/"
        
    echo "-----------------------------------"
done

Vim Text Editor

Modes

  • Vim text editor has three main modes:
    • Esc, back to Normal mode
    • i, Insert mode
    • v, Visual (selection) mode
Command | What it does | Most common follow-up
i | Insert before cursor | typing text
a | Append after cursor | typing text
I | Insert before the first non-blank character of the line
A | Append at end of line
o | Open new line below | typing
O | Open new line above | typing
Esc | Back to Normal mode
v | Visual mode | move + operator
V | Visual Line mode
Ctrl-v | Visual Block mode

Editing

Command | Meaning | Examples
x | Delete character under cursor
dd | Delete current line | 5dd = delete 5 lines
yy | Yank (copy) current line | p to paste
p / P | Paste after / before cursor
u | Undo (most used command after Esc)
Ctrl-r | Redo
. | Repeat the last change (extremely powerful)
dw / d$ | Delete to start of next word / to end of line | diw, daw
ciw / caw | Change (delete + insert) inside / around word | most common text-object usage
ci" / ci( / ci{ | Change inside quotes / parentheses / braces | extremely common in code

Search & Replace

Command | Meaning
/pattern | Search forward
?pattern | Search backward
n / N | Next / previous match
* / # | Search forward / backward for word under cursor
:%s/old/new/g | Replace all occurrences (whole file)
:%s/old/new/gc | Replace with confirmation

Saving & Exiting

Command | Meaning
:w | Save (write)
:q | Quit (only if no changes)
:q! | Quit without saving
:wq or :x or ZZ | Save and quit
:w !sudo tee % | Save a file that needs root (very common workaround)

Nano Text Editor

  1. Open a file with: nano filename.txt
  2. Use the arrow keys to navigate.
  3. Save the changes with ctrl + o.
  4. Confirm the filename by pressing Enter.
  5. Exit nano with ctrl + x.

Searching Text

  • To find words or phrases within a file:
  1. Initiate search with ctrl + w.
  2. Type the search term and press Enter.
  3. Exit search mode with ctrl + c.

Customizing Nano

Enhance Nano’s functionality:

nano -miA file.txt
  • -m: Enable mouse support.
  • -i: Auto-indent new lines.
  • -A: Smart Home key (Home jumps to the first non-blank character of the line).

Job Schedule

OpenPBS

OpenPBS (Portable Batch System) is an open-source workload manager and job scheduler used in High-Performance Computing (HPC) environments to automate and optimize the execution of tasks across clusters and clouds.

  • PBS job script
#!/bin/bash
#PBS -N job_name                  # Job name
#PBS -l nodes=1:ppn=4             # Number of nodes and processors per node
#PBS -l walltime=01:00:00         # Walltime (hh:mm:ss)
#PBS -l mem=8gb                   # Total memory for the job
#PBS -j oe                        # Combine stdout and stderr
#PBS -M user@example.com          # Email address for notifications
#PBS -m abe                       # Send email on (a) abort, (b) begin, (e) end
#PBS -q batch                     # Queue name
#PBS -V                           # Export all environment variables to the job

## Example job commands
cd $PBS_O_WORKDIR                 # Change to the directory where the job was submitted
module load python/3.8.5          # Load necessary modules
python my_script.py               # Command to run your program
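## Useful variables inside a running job (some, such as $PBS_NUM_NODES and $PBS_NUM_PPN, come from Torque and may not be set under OpenPBS)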
$PBS_O_WORKDIR     # The directory from which the job was submitted
$PBS_O_HOME        # The home directory of the user who submitted the job
$PBS_JOBDIR        # The working directory for the job on the compute node
$TMPDIR            # The temporary directory for the job on the compute node
$PBS_JOBID         # The unique identifier assigned to the job
$PBS_NODEFILE      # The file containing the list of nodes allocated to the job
$PBS_NUM_NODES     # The number of nodes allocated to the job
$PBS_NUM_PPN       # The number of processors per node allocated to the job
$PBS_QUEUE         # The name of the queue to which the job was submitted
$PBS_JOBNAME       # The name of the job as specified in the job script
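A sketch of a job script that uses some of these variables, assuming the Torque-style `nodes=...:ppn=...` request from the template above (under that request `$PBS_NODEFILE` is expected to list one line per allocated core, but check this on your cluster). The `sample` variable is hypothetical and would be passed in with `qsub -v sample=NAME job.pbs`; the fallbacks let the script also run outside PBS for a quick test.

```shell
#!/bin/bash
#PBS -N sample_job
#PBS -l nodes=1:ppn=4
#PBS -l walltime=01:00:00
#PBS -j oe

# Run from the submission directory; fall back to "." outside PBS
cd "${PBS_O_WORKDIR:-.}" || exit 1

# Count allocated cores from the node file; assume 1 core when it is absent
if [ -n "${PBS_NODEFILE:-}" ] && [ -f "$PBS_NODEFILE" ]; then
    ncpus=$(wc -l < "$PBS_NODEFILE")
else
    ncpus=1
fi

# "sample" is a hypothetical variable set via qsub -v; default for local testing
sample="${sample:-test_sample}"

echo "Job ${PBS_JOBID:-local} running ${sample} with ${ncpus} core(s) on $(uname -n)"
```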

Common PBS commands (qstat requests the status of jobs, queues, or a batch server):

  • qsub job_script.pbs - Submit a job
  • qsub -v sample="$sample" -M "$EMAIL" pbs_1_cutadapt.pbs - Submit a job, passing the variable sample and a notification email address
  • qstat - Check job status
  • qstat -u guorui -a - Check all jobs for user guorui
  • qdel job_id - Delete a job
  • qhold job_id - Hold a job
  • qrls job_id - Release a held job

## List finished job details in summary
qstat -x job_id

## Show full details of a finished job
qstat -H job_id
qstat -x -f job_id
qstat -x -f job_id | grep "exec_host"

## Find out all compute nodes allocated to a job
for job_id in $(find . -name "log.*.e*" -printf "%f\n" | sed 's/.*\.e\([0-9]*\)$/\1/' | sort -n | uniq); do
    echo -n "Job ID: $job_id: "; qstat -x -f $job_id | grep exec_host
done

## Start an interactive job on a specific node with 16 processors, 64GB memory, and 12 hours walltime in cgs_queue
qsub -I -q cgs_queue -l nodes=hpcf3-c01:ppn=16,walltime=12:00:00,mem=64gb
qsub -I -q cgs_queue -l nodes=hpcf3-c02:ppn=16,walltime=12:00:00,mem=64gb
qsub -I -q cgs_queue -l nodes=hpcf3-c03:ppn=16,walltime=12:00:00,mem=64gb

Shell session

Tmux

  • Tmux configuration: consider adding these aliases to your shell configuration for quick tmux access.
# Terminal aliases for tmux
alias t="tmux"
alias ta="t a -t"
alias tls="t ls"
alias tn="t new -s"
alias tk="t kill-session -t"
alias tks="t kill-server"
Note 🔧 Custom Prefix Key

Change the default prefix (Ctrl + B) to Ctrl + A in ~/.tmux.conf:

set-option -g prefix C-a
bind-key C-a send-prefix
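  • Session Management (Ctrl + A + key: press the prefix, release it, then press the key)
Action | Command/Shortcut
List sessions | tmux list-sessions
Attach to session | tmux attach-session -t target-session
Switch between sessions | Ctrl + A + s
Switch to latest session | Ctrl + A + l
Detach from session | Ctrl + A + d
  • Window Management
Action | Shortcut
List all windows | Ctrl + A + w
Rename current window | Ctrl + A + ,
Switch to next window | Ctrl + A + →
Switch to previous window | Ctrl + A + ←
Create new window | Ctrl + A + c
Kill current window | Ctrl + A + q
  • Pane Management
Action | Shortcut
Switch to pane above | Shift + ↑
Switch to pane below | Shift + ↓
Switch to pane left | Shift + ←
Switch to pane right | Shift + →
Kill current pane | Ctrl + A + x
  • Copy Mode
Action | Shortcut
Enter copy mode | Drag mouse to select text (requires set -g mouse on)
Paste copied text | Ctrl + A + ]

Some shortcuts above depend on custom bindings in ~/.tmux.conf. The tmux defaults are prefix + n / p for next / previous window, prefix + & to kill a window, prefix + arrow keys to move between panes, prefix + L for the last session, and prefix + [ to enter copy mode.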

Screen

Tip 💡 Screen Aliases

Add these aliases to your shell configuration for easier screen management.

# Terminal aliases for screen
alias s="screen"       # start a screen session
alias ss="s -S"        # start a named screen session
alias sr="s -r"        # reattach to a screen session
alias sls="s -ls"      # list current running screen sessions
  • Basic Commands
Action | Command
Start screen session | screen
Start named session | screen -S session_name
Reattach to session | screen -r
List running sessions | screen -ls
  • Window Management (screen shortcuts are case-sensitive)
Action | Shortcut
Create new window | Ctrl + A + c
Kill current window | Ctrl + A + k
List all windows | Ctrl + A + w
Go to window 0-9 | Ctrl + A + 0-9
Go to next window | Ctrl + A + n
Toggle between windows | Ctrl + A + Ctrl + A
Rename current window | Ctrl + A + A
  • Region Management
Action | Shortcut
Split horizontally | Ctrl + A + S
Split vertically | Ctrl + A + | (pipe key)
Switch between regions | Ctrl + A + Tab
Close all regions but current | Ctrl + A + Q
Close current region | Ctrl + A + X
  • Session & Copy Mode
Action | Shortcut
Detach from session | Ctrl + A + d
Start copy mode | Ctrl + A + [
Paste copied text | Ctrl + A + ]
Show help | Ctrl + A + ?
Quit screen | Ctrl + A + Ctrl + \

Software Environment

Conda

  • Installation

# Miniforge linux
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh"

# Miniforge Mac
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh"

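# Install it (run the installer that matches your platform)
bash Miniforge3-Linux-x86_64.sh     # Linux
bash Miniforge3-MacOSX-arm64.sh     # macOS (Apple Silicon)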

# Make conda available in the current shell if it is installed (Miniforge installs to ~/miniforge3 by default)
if [ -f "$HOME/miniforge3/etc/profile.d/conda.sh" ]; then
    . "$HOME/miniforge3/etc/profile.d/conda.sh"
fi
  • Configuration

# Init
conda init --all

# Use the libmamba solver for faster solving (already the default since conda 23.10)
conda update -n base conda
conda install -n base conda-libmamba-solver

# Config
conda config --set solver libmamba
conda config --set always_yes true
conda config --set auto_activate_base false

# Setup channels
conda config --add channels bioconda
conda config --add channels conda-forge
#conda config --set channel_priority strict

# Install software from a specific channel
conda install -c conda-forge numpy
  • Create Environment
# Create R environment with specific R version and packages
conda create -n renv r=4.5 r-languageserver r-tidyverse r-irkernel r-httpgd r-downlit r-xml2 r-markdown r-devtools radian

# Install Bioconductor packages (bioconductor- prefix, all names lower case)
conda install -n renv bioconductor-deseq2 bioconductor-edger

# Install CRAN packages (r- prefix)
conda install -n renv r-qs r-fs r-tidyverse

# Remove a specific environment
conda remove --name renv --all

# Remove a package from an environment
conda remove --name renv package_name

# Export and save the conda env file with all software information
conda env export -n renv > renv.yml

# Export without build strings (more portable across platforms);
# the prefix: line with the local install path is still written
conda env export -n renv --no-builds > renv.yml
conda env export -n renv --no-builds --file renv.yml

# Create a new environment from an environment file
conda env create --file renv.yml

# Rename an environment by cloning it (conda 4.14+ also offers: conda rename -n old_env_name new_env_name)
conda create --name new_env_name --clone old_env_name
conda activate new_env_name
conda remove --name old_env_name --all
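The exported YAML files above can also be written by hand. A minimal sketch of the format, written with a shell heredoc to a hypothetical file `renv_minimal.yml` (channels listed first take priority; the packages are only examples):

```shell
# Write a minimal, hand-maintained environment file (hypothetical example)
cat > renv_minimal.yml <<'EOF'
name: renv
channels:
  - conda-forge
  - bioconda
dependencies:
  - r-base=4.5
  - r-tidyverse
  - bioconductor-deseq2
EOF

cat renv_minimal.yml
```

It would then be created with `conda env create --file renv_minimal.yml`; if an environment called renv already exists, change the `name:` line first.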

Pixi

## Enable pixi shell completion (add to ~/.bashrc; use --shell zsh for zsh)
eval "$(pixi completion --shell bash)"  # or zsh

Git Version Control

## Export a clean snapshot of the main branch (files only, no .git history) as a zip, keeping the original repository intact
git archive --format=zip --output=my-project-clean.zip main

git init
git add .
git commit -m "first commit"
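git remote add origin <repository-url>
git branch -M main   # rename the current branch to main if it is still called master
git push -u origin main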

git config --global user.name "" # set a user name that will be associated with each commit
git config --global user.email "" # set an email address that will be associated with each commit
git config --global color.ui auto # enable automatic command-line coloring for easier reviewing
git init # initialize a new repository in the current directory


# 1. Delete the local branch
git branch -d branch-name     # Use -D to force delete

# 2. Delete the remote branch
git push origin --delete branch-name

# 3. Verify branches are gone
git branch -a                # List all branches
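The delete commands above are the last step of a typical branch workflow. A sketch of the whole cycle in a throwaway repository (created with `mktemp -d`; the user name and email are placeholders set only for this repo):

```shell
# Throwaway repository for the demo
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"          # placeholder identity, local to this repo
git config user.email "demo@example.com"

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
main_branch=$(git branch --show-current)  # main or master, depending on init.defaultBranch

# Create a branch, commit on it, then merge it back
git checkout -q -b feature
echo "v2" >> notes.txt
git commit -q -am "Extend notes"
git checkout -q "$main_branch"
git merge -q feature

# -d succeeds here because feature is fully merged (-D would force it)
git branch -d feature
git log --oneline
```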

echo "" > <file> # create <file> (containing an empty line)
git status # show staged, unstaged, and untracked changes
git add <file> # add a file as it looks now to your next commit (stage)
git commit # commit your staged content as a new commit snapshot
## If Git opens Vim for the commit message: press i to type, Esc to stop typing, then :wq to save and quit
git log # show all commits in the current branch’s history; press q to quit
touch .gitignore # create a .gitignore listing files Git should not track
git branch <branch-name> # create a branch
git branch # list branches
git checkout <branch-name> # switch to another branch and check it out into your working directory
git branch -D <branch-name> # force-delete a branch, even if it is not merged
git checkout -b temp # create a branch named temp and switch to it
git merge <branch-name> # merge the specified branch’s history into the current one
git clone <repository-url> # retrieve an entire repository from a hosted location via URL
git remote -v # show remote repository names and URLs
git push # transmit local branch commits to the remote repository branch
git fetch # fetch down all the branches from that Git remote
git diff # show changes that are not yet staged
git pull # fetch and merge any commits from the tracking remote branch
git reset <file> # unstage a file while retaining the changes in the working directory
git rm -r --cached . # untrack all files (they stay on disk) so new .gitignore rules apply; then git add . and commit
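The `git rm -r --cached .` entry is the usual fix when a file was committed before it was added to `.gitignore`. A sketch in a throwaway repository (hypothetical file names, placeholder identity):

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"          # placeholder identity, local to this repo
git config user.email "demo@example.com"

echo "code" > main.sh
echo "data" > run.log
git add .
git commit -q -m "Initial commit"         # run.log is now tracked

# Adding an ignore rule now has no effect on files Git already tracks
echo "*.log" > .gitignore

# Untrack everything (files stay on disk), then re-add: ignored files are skipped
git rm -r -q --cached .
git add .
git commit -q -m "Stop tracking ignored files"

git ls-files                              # .gitignore and main.sh; run.log stays on disk, untracked
```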

## Go back to a specific commit (--hard discards all uncommitted changes)
git reset --hard e4098a7

## Discard local changes and match the remote branch
git fetch origin
git reset --hard origin/your-branch-name

## Change remote URL
git remote -v # check current remote configuration
git remote set-url origin <new-repository-url>
## or, equivalently, remove and re-add the remote
git remote remove origin
git remote add origin <new-repository-url>

## Reset to match remote
git fetch origin
git reset --hard origin/main

WSL

  • System info

## Install the neofetch package
sudo apt install neofetch -y

## Show system info
neofetch --memory_unit gib
  • Change sudo password

In a Windows terminal (PowerShell or Command Prompt), open WSL as the root user:

wsl -u root

## Then, inside the root shell, set a new password for your Linux user (you will be prompted twice)
passwd <username>
  • WSL configuration
    • WSL global config: C:\Users\<UserName>\.wslconfig
    • WSL distro config: \\wsl.localhost\Ubuntu-24.04\etc\wsl.conf