Linux

UNIX Shell Basics

Basic

pwd # Show current directory
cd /path/to/directory # Change directory
cd .. # Go up one level
cd - # Go to previous directory
cd ~ # Go to home directory
ls # List files and folders
ls -lhrt # List files with details, human-readable sizes, sorted by modification time, oldest first (newest at the bottom)
mkdir folder_name # Create a new folder
mkdir -p /path/to/folder # Create nested folders
mv old_name new_name # Rename or move file/folder
rm file_name # Remove file
rm -r folder_name # Remove folder recursively
rm -f file_name # Force remove file without prompt
rm -rf folder_name # Force remove folder recursively without prompt
cp source_file destination_file # Copy file
cp -a /source/folder /destination/folder # Copy folder recursively with attributes
scp user@remote:/path/to/remote/file /local/destination/ # Copy file from remote server
scp /local/file user@remote:/path/to/remote/destination/ # Copy file to remote server

wc -l file_name # Count number of lines in a file
cat file_name # Display file content
zcat file_name.gz # Display compressed file content
less file_name # View file content with pagination
zless file_name.gz # View compressed file content with pagination
head -n 10 file_name # Show first 10 lines of a file
tail -n 10 file_name # Show last 10 lines of a file
history # Show command history
top # Display running processes
htop # Interactive process viewer (may need installation)
chmod 755 file_name # Change file permissions
chmod +x file_name # Make file executable
chown user:group file_name # Change file owner and group
tar xzvf archive.tar.gz # Extract tar.gz archive
tar czvf archive.tar.gz /path/to/folder # Create tar.gz archive
tar xjvf archive.tar.bz2 # Extract tar.bz2 archive
tar cjvf archive.tar.bz2 /path/to/folder # Create tar.bz2 archive
unzip archive.zip # Extract zip archive
zip -r archive.zip /path/to/folder # Create zip archive
ping google.com # Check network connectivity
ifconfig # Show network interfaces (may need installation)
ip addr show # Show network interfaces (modern alternative to ifconfig)
wget http://example.com/file # Download file from the internet
curl -O http://example.com/file # Download file using curl

sort file_name # Sort lines in a file
sort -u file_name # Sort and remove duplicate lines
uniq file_name # Remove adjacent duplicate lines (sort first to remove all duplicates)
diff file1 file2 # Compare two files line by line
diff -r dir1 dir2 # Compare two directories recursively
ln -s /path/to/target /path/to/symlink # Create a symbolic link
tree /path/to/directory # Display directory structure as a tree (may need installation)
tree -L 2 /path/to/directory # Display directory structure up to level 2
du -h /path/to/directory # Show disk usage of a directory
du -sh /path/to/directory # Show total disk usage of a directory
du -sh /path/to/directory/*/ # Show disk usage of all subdirectories
du -h --max-depth=1 /path/to/directory | sort -h # Show disk usage of subdirectories sorted by size
rsync -av /source/folder/ /destination/folder/ # Sync folders
rsync -auzP /source/folder/ user@remote:/destination/folder/ # Sync to remote server with compression and progress
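To see why `uniq` needs sorted input, here is a minimal sketch on a throwaway file created with `mktemp` (the sample lines are made up for illustration):

```bash
set -euo pipefail

tmp=$(mktemp)                      # scratch file with hypothetical content
printf 'b\na\nb\nb\na\n' > "$tmp"

# uniq on unsorted input: only the adjacent "b b" pair is collapsed
uniq_unsorted=$(uniq "$tmp" | tr '\n' ' ')
echo "uniq (unsorted): $uniq_unsorted"

# sort | uniq (same result as sort -u): every duplicate is removed
sort_uniq=$(sort "$tmp" | uniq | tr '\n' ' ')
echo "sort | uniq:     $sort_uniq"

# sort | uniq -c counts how often each line occurs; awk reshapes "  2 a" into "a=2"
counts=$(sort "$tmp" | uniq -c | awk '{print $2 "=" $1}' | tr '\n' ' ')
echo "counts:          $counts"

rm -f "$tmp"
```

`sort | uniq -c` is the usual way to count occurrences; add `| sort -rn` to list the most frequent lines first.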

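## Show the default login shell (the shell actually running may differ)
echo $SHELL
## Show the shell running this session
echo $0

## Change the default login shell to bash or zsh (takes effect at next login)
chsh -s /bin/bash
chsh -s /bin/zsh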

grep

grep (Globally search a Regular Expression and Print) is a command-line utility that searches plain-text input for lines matching a specified pattern:
- Search files or standard input for specific patterns
- Supports regular expressions for complex pattern matching
- Options for case sensitivity, whole-word matching, line numbers, etc.
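A minimal sketch of the common options, run against a small log file in a `mktemp` directory (the file name `app.log` and its contents are assumptions for illustration):

```bash
set -euo pipefail

dir=$(mktemp -d)                                  # scratch directory
printf 'INFO start\nerror: disk full\nERROR: timeout\nwarning\nerrors=0\n' > "$dir/app.log"

grep 'error' "$dir/app.log"             # case-sensitive: "error: disk full", "errors=0"
grep -i 'error' "$dir/app.log"          # -i ignore case: also "ERROR: timeout"
grep -iw 'error' "$dir/app.log"         # -w whole word: "errors=0" no longer matches
grep -n 'warning' "$dir/app.log"        # -n prefix each match with its line number
grep -v -i 'error' "$dir/app.log"       # -v invert: lines that do NOT match
grep -E 'disk|timeout' "$dir/app.log"   # -E extended regex, here an alternation
grep -r -l 'timeout' "$dir"             # -r search a directory tree, -l print file names only

# Capture a few results for checking
ci_count=$(grep -c -i 'error' "$dir/app.log")     # -c counts matching lines
word_count=$(grep -c -iw 'error' "$dir/app.log")
line_hit=$(grep -n 'warning' "$dir/app.log")
rm -rf "$dir"
```

Note that `grep` exits with status 1 when nothing matches, which stops scripts running under `set -e`; append `|| true` where no match is acceptable.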

find

find is a powerful command to search for files and directories based on various criteria:
- Search by name, type, size, modification time, permissions, etc.
- Execute actions on found items (delete, move, etc.)
- Supports complex expressions with logical operators

## Find all .txt files in current directory and subdirectories
find . -name "*.txt"

## Find files/folders by name
find /path/to/search -name "pattern" 

## Find files larger than 100MB
find /path/to/search -type f -size +100M

## Find files in pattern and list details
find ./Clean -name "*Clean.fastq.gz" -type f -exec ls -lh {} \;
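The criteria and logical operators can be combined as in this sketch, which builds a small hypothetical directory tree with `mktemp` and `touch`. `-printf` is a GNU find extension:

```bash
set -euo pipefail

dir=$(mktemp -d)                                  # hypothetical test tree
mkdir -p "$dir/sub"
touch "$dir/a.txt" "$dir/sub/b.txt" "$dir/c.log"
head -c 2048 /dev/zero > "$dir/big.bin"           # a 2 KiB file

# OR: files ending in .txt or .log (escaped parentheses group the -o expression)
find "$dir" -type f \( -name '*.txt' -o -name '*.log' \) | sort

# NOT: files whose name does not end in .txt
find "$dir" -type f ! -name '*.txt' | sort

# Size: files larger than 1 KiB
find "$dir" -type f -size +1k

# -exec ... {} + runs one ls for many paths; {} \; runs one ls per file
find "$dir" -type f -name '*.txt' -exec ls -l {} +

# Capture results for checking
txt_or_log=$(find "$dir" -type f \( -name '*.txt' -o -name '*.log' \) | wc -l)
not_txt=$(find "$dir" -type f ! -name '*.txt' | wc -l)
big=$(find "$dir" -type f -size +1k -printf '%f\n')
rm -rf "$dir"
```

Prefer `-exec ... {} +` over `\;` when the command accepts many file arguments, since it starts far fewer processes.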

cut

cut is used to extract sections from each line of input:
- Extract columns or fields from text files
- Specify a delimiter for fields (tab by default)
- Useful for processing structured data like CSV or TSV files
- Works line by line

## Extract first column from a tab-delimited file
cut -f1 file.txt 

## Extract first and third columns from a comma-separated file
cut -d',' -f1,3 file.csv

## Extract characters from position 1 to 5
cut -c1-5 file.txt
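A short sketch of the three selection modes on made-up TSV and CSV rows:

```bash
set -euo pipefail

tsv=$(printf 'id\tname\tscore\n1\talice\t90\n2\tbob\t85\n')
csv=$(printf 'id,name,score\n1,alice,90\n2,bob,85\n')

# Tab is cut's default delimiter, so -f alone works on TSV
first_col=$(printf '%s\n' "$tsv" | cut -f1 | tr '\n' ' ')

# -d sets another delimiter; -f accepts lists (1,3) and open ranges (2-)
cols_1_3=$(printf '%s\n' "$csv" | cut -d',' -f1,3 | tail -n 1)
cols_2_on=$(printf '%s\n' "$csv" | cut -d',' -f2- | head -n 1)

# -c selects character positions
chars=$(printf 'abcdefgh\n' | cut -c1-5)

echo "$first_col | $cols_1_3 | $cols_2_on | $chars"
```

cut does not understand quoted CSV fields, so a value like `"Smith, J"` is split at the comma; awk or a CSV-aware tool is safer for such files. cut also keeps fields in input order: `-f3,1` prints the same as `-f1,3`.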

sed

sed is a stream editor used to perform basic text transformations:
- Substitution, deletion, insertion, and more
- Works line by line, applying patterns and edits

## Remove all spaces from a string
echo "a b c" | sed 's/ //g'
#> "abc"
## Replace all occurrences of "foo" with "bar"
echo "foo baz foo" | sed 's/foo/bar/g'
#> "bar baz bar"

## Delete lines matching "pattern" from file.txt
sed '/pattern/d' file.txt

## Print only lines matching "pattern" from file.txt
sed -n '/pattern/p' file.txt

## Print lines 2 to 5 of a file
sed -n '2,5p' file.txt
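A sketch of capture groups, in-place editing and insertion on a scratch file. The `-i.bak` and one-line `1i text` forms assume GNU sed; BSD/macOS sed needs different syntax:

```bash
set -euo pipefail

f=$(mktemp)                                   # hypothetical input file
printf 'alpha 1\nbeta 2\ngamma 3\n' > "$f"

# -E enables extended regex; \1 and \2 refer to the captured groups (swap the two words)
swapped=$(sed -E 's/^([a-z]+) ([0-9]+)$/\2 \1/' "$f" | head -n 1)

# -i edits the file in place; -i.bak first keeps a backup copy as "$f.bak"
sed -i.bak '/beta/d' "$f"

# Insert a header line before line 1
sed -i '1i name value' "$f"

result=$(tr '\n' ';' < "$f")
backup_lines=$(wc -l < "$f.bak")
echo "$swapped | $result | backup has $backup_lines lines"
rm -f "$f" "$f.bak"
```

Running without `-i` first (printing to the terminal) is a cheap way to check an expression before it rewrites the file.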

awk

awk is a language for pattern scanning and processing:
- More powerful than cut for data extraction and reporting
- Processes text as records (lines) and fields (columns); supports arithmetic and logic

## Print the first column of a file
awk '{print $1}' file.txt

## Sum values in the first column
awk '{sum += $1} END {print sum}' file.txt 

## Print lines where the third column is greater than 100
awk '$3 > 100' file.txt

## Print the lines that match a pattern "error"
awk '/error/' file.txt

## Print the number of lines in a file
awk 'END {print NR}' file.txt
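A sketch of awk's field separator, variables and associative arrays on made-up comma-separated data (the column names are hypothetical):

```bash
set -euo pipefail

data=$(printf 'sample,group,reads\ns1,A,100\ns2,B,250\ns3,A,50\n')

# -F sets the input field separator; NR > 1 skips the header line
total=$(printf '%s\n' "$data" | awk -F',' 'NR > 1 {sum += $3} END {print sum}')

# -v passes a value into awk; OFS sets the output field separator
filtered=$(printf '%s\n' "$data" | awk -F',' -v min=80 -v OFS='\t' 'NR > 1 && $3 > min {print $1, $3}')

# Associative array: sum reads per group ("for (k in g)" order is unspecified, so sort)
per_group=$(printf '%s\n' "$data" | awk -F',' 'NR > 1 {g[$2] += $3} END {for (k in g) print k "=" g[k]}' | sort | tr '\n' ' ')

echo "total=$total"
echo "$filtered"
echo "$per_group"
```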

File Compression

Common tar flags for extraction:
- -x = extract
- -t = list contents
- -f = archive file name (put it last in a flag group, followed by the file name)
- -v = verbose
- -z = gzip compression
- -j = bzip2 compression
- -J = xz compression
- -C = change to directory (extract into it)
- -p = preserve permissions

## Extract .tar, .tar.gz or .tgz file (GNU tar also detects the compression with plain -xf)
tar -xf file.tar
tar -xzvf file.tar.gz
tar -xzf file.tgz

## Extract .tar.bz2 file
tar -xjf file.tar.bz2

## Extract .tar.xz file
tar -xJf file.tar.xz

## Extract to specific directory
tar -xzf file.tar.gz -C /path/to/directory

## List contents without extracting
tar -tzf file.tar.gz
tar -tf file.tar

## Extract zip files
unzip file.zip
unzip file.zip -d /path/to/directory

## List contents without extracting
unzip -l file.zip

## Extract .gz file
gunzip file.gz
gunzip -k file.gz  # keep original file

## Extract .bz2 file
bunzip2 file.bz2

## Extract .xz file
unxz file.xz

## Extract .7z file
7z x file.7z

## Extract .rar file
unrar x file.rar

Common tar flags for compression
- -c = create archive
- -f = file name
- -z = gzip compression (.tar.gz)
- -j = bzip2 compression (.tar.bz2)
- -J = xz compression (.tar.xz)
- -v = verbose output

## Uncompressed .tar
tar -cf archive.tar file1 file2 directory/

## gzip compressed
tar -czf archive.tar.gz file1 file2 directory/

## bzip2 compressed
tar -cjf archive.tar.bz2 file1 file2 directory/

## .tar.xz compressed
tar -cJf archive.tar.xz file1 file2 directory/

## Backup and Restore a Folder
tar -czvf mybackup.tar.gz myfolder
tar -xzvf mybackup.tar.gz
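A round trip with the flags above, sketched in a scratch directory (`myfolder` and `note.txt` are hypothetical). Using `-C` when creating keeps the stored paths relative, so the archive can be extracted anywhere:

```bash
set -euo pipefail

work=$(mktemp -d)                            # hypothetical working area
mkdir -p "$work/myfolder/sub"
echo "hello" > "$work/myfolder/sub/note.txt"

# -C changes into $work first, so the archive stores relative paths (myfolder/...)
tar -czf "$work/mybackup.tar.gz" -C "$work" myfolder

# -t lists contents without extracting
tar -tzf "$work/mybackup.tar.gz"
entries=$(tar -tzf "$work/mybackup.tar.gz" | grep -c 'note.txt$')

# Extract into a different directory (it must already exist)
mkdir -p "$work/restore"
tar -xzf "$work/mybackup.tar.gz" -C "$work/restore"

# diff -r prints nothing and exits 0 when the restored tree matches the original
diff -r "$work/myfolder" "$work/restore/myfolder" && echo "restore OK"
restored=$(cat "$work/restore/myfolder/sub/note.txt")
rm -rf "$work"
```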

File Permissions

Only the owner can read, write, and execute

chmod 700 /path/to/folder

700 means:

Owner: 7 (read = 4, write = 2, execute = 1; 4+2+1 = 7)
Group: 0 (no permissions)
Others: 0 (no permissions)

Everyone can read and execute; only the owner can write

chmod 755 /path/to/folder

Owner: 7 (read = 4, write = 2, execute = 1; 4+2+1 = 7)
Group: 5 (read = 4, execute = 1; 4+1 = 5)
Others: 5 (read = 4, execute = 1; 4+1 = 5)
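The octal arithmetic can be checked directly on a scratch file. This sketch assumes GNU coreutils `stat` (Linux), where `-c '%a %A'` prints the octal and symbolic modes; the file name is hypothetical:

```bash
set -euo pipefail

dir=$(mktemp -d)
f="$dir/script.sh"                     # hypothetical file
touch "$f"

chmod 700 "$f"; mode_700=$(stat -c '%a %A' "$f")   # owner rwx, group/others none
chmod 755 "$f"; mode_755=$(stat -c '%a %A' "$f")   # owner rwx, group/others r-x

# Symbolic form: remove read/execute for group and others, turning 755 back into 700
chmod go-rx "$f"; mode_sym=$(stat -c '%a' "$f")

# Each octal digit is the sum read(4) + write(2) + execute(1); r-x is 4 + 0 + 1
digit=$(( 4 + 0 + 1 ))

echo "$mode_700 | $mode_755 | $mode_sym | r-x=$digit"
rm -rf "$dir"
```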

Give a user named someuser read, write, and execute permissions on the /path/to/folder directory

setfacl -m u:someuser:rwx /path/to/folder

Verify the Permissions

getfacl /path/to/folder

Remove all ACL entries for a user

setfacl -x u:someuser /path/to/folder

Change owner and group

sudo chown username:groupname /path/to/folder

Data transfer

Transferring data efficiently between local and remote systems:
- wget: Download files directly from the internet to your local machine or server.
- scp: Securely copy files between local and remote systems over SSH.
- rsync: Synchronize files and directories efficiently, with options for selective copying and deletion.

## Basic command
rsync -av /source/folder/ /destination/folder/

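## Mirror source to destination (--delete removes destination files that are not in the source)
rsync -av --delete /source/folder/ /destination/folder/

## Two-way merge: run both directions WITHOUT --delete; --update keeps the newer copy of each file
## (deletions are not propagated; use a dedicated two-way tool such as unison if you need that)
rsync -av --update /folder1/ /folder2/
rsync -av --update /folder2/ /folder1/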

## Network sync
rsync -av --delete /local/folder/ user@remote:/remote/folder/

## Copy only new files (files that don't exist in the destination)
rsync -av --ignore-existing /source/folder/ /destination/folder/

## Copy only if source is newer
rsync -av --update /source/folder/ /destination/folder/

Note

-a, --archive: Recursive copy that preserves permissions, timestamps, symbolic links, etc.
-v, --verbose: Lists the files being transferred
--progress: Shows per-file transfer progress
--delete: Removes files in destination that don’t exist in source
-n, --dry-run: Previews changes without actually syncing
-u, --update: Only copies a file if the source copy is newer than the destination copy
--ignore-existing: Skips files that already exist in destination (regardless of age)
--no-perms: Don’t preserve Unix permissions
--no-times: Don’t preserve modification times
--inplace: Writes updates directly into the destination file instead of building a temporary copy first

Sync remote server folders to local using rsync

## Remote server details
remote_user="username"
remote_host="hpcio2" # hpc2021-io1.hku.hk,hpc2021-io2.hku.hk  

## Define folders to sync (absolute paths)
## Format: "local_absolute_path:remote_absolute_path"
folders_to_sync=(
    # "/mnt/m/Reference:/lustre1/g/path_my/Reference"
    # "/mnt/m/WES/DFSP/Annovar:/lustre1/g/path_my/pipeline/somatic_variants_calling/data/DFSP/Annovar"
    "/mnt/m/WES/SARC/BAM:/lustre1/g/path_my/pipeline/somatic_variants_calling/data/SARC/BAM"
    # Add more folders as needed in the same format
)

## Sync each folder
for folder_pair in "${folders_to_sync[@]}"; do
    # Split the pair into local and remote paths
    local_path="${folder_pair%%:*}"
    remote_path="${folder_pair#*:}"
    
    echo "Syncing ${remote_user}@${remote_host}:${remote_path} to ${local_path}"
    
    # Create local directory structure
    mkdir -p "${local_path}"
    
    # Sync the folder contents from remote to local (trailing slash on source copies contents)
    rsync -av --update --progress --inplace --no-perms --no-times \
        "${remote_user}@${remote_host}:${remote_path}/" \
        "${local_path}/"
        
    echo "-----------------------------------"
done
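The loop splits each `local:remote` entry with bash parameter expansion. A minimal sketch of that step on its own, with hypothetical paths:

```bash
set -euo pipefail

# Hypothetical pair in the same format as folders_to_sync
folder_pair="/mnt/m/data:/lustre1/g/project/data"

local_path="${folder_pair%%:*}"   # %%:* drops the longest suffix starting at ':', leaving text before the first ':'
remote_path="${folder_pair#*:}"   # #*: drops the shortest prefix ending at ':', leaving text after the first ':'

echo "local:  $local_path"
echo "remote: $remote_path"

# Caveat: a local path that itself contains ':' is split at the wrong place
bad_pair="/mnt/c:odd/data:/remote/data"
bad_local="${bad_pair%%:*}"       # gives "/mnt/c", not "/mnt/c:odd/data"
echo "bad local: $bad_local"
```

Because the split happens at the first `:`, remote paths may contain colons but local paths may not.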

Sync local folders to remote server

## Remote server details
remote_user="username"
remote_host="hpcio1" # hpc2021-io1.hku.hk 

## Define folders to sync (absolute paths)
## Format: "local_absolute_path:remote_absolute_path"
folders_to_sync=(
    "/mnt/f/Reference:/lustre1/g/path_my/Reference"
    # Add more folders as needed in the same format
)

## Sync each folder
for folder_pair in "${folders_to_sync[@]}"; do

    # Split the pair into local and remote paths
    local_path="${folder_pair%%:*}"
    remote_path="${folder_pair#*:}"
    
    echo "Syncing ${local_path} to ${remote_user}@${remote_host}:${remote_path}"
    
    # Create remote directory structure
    ssh "${remote_user}@${remote_host}" "mkdir -p '${remote_path}'"
    
    # Sync the folder contents (trailing slash on source copies contents)
    rsync -av --update --progress \
        "${local_path}/" \
        "${remote_user}@${remote_host}:${remote_path}/"
        
    echo "-----------------------------------"
done

Vim Text Editor

Modes

Vim text editor has three main modes:
- Normal mode: navigation and commands (Esc returns to it)
- Insert mode: typing text (i)
- Visual mode: selecting text (v)

Command	What it does	Most common follow-up
i	Insert before cursor	typing text
a	Append after cursor	typing text
I	Insert before first non-blank character of line	—
A	Append at end of line	—
o	Open new line below	typing
O	Open new line above	typing
Esc	Back to Normal mode	—
v	Visual mode	move + operator
V	Visual Line mode	—
Ctrl-v	Visual Block mode	—

Navigation

Command	Movement	Common combos
h j k l	← ↓ ↑ →	—
w / b	forward/backward by word start	ciw, daw
e / ge	forward/backward to word end	—
W / B	forward/backward by big word	(ignores punctuation)
0	Start of line	—
^	First non-blank character	—
$	End of line	—
gg	Top of file	—
G	Bottom of file	50G → line 50
Ctrl-u / Ctrl-d	Half page up/down	—
Ctrl-f / Ctrl-b	Full page down/up	—

Editing

Command	Meaning	Examples
x	Delete character under cursor	—
dd	Delete current line	5dd = delete 5 lines
yy	Yank (copy) current line	p to paste
p / P	Paste after/before cursor	—
u	Undo	(most used command after Esc)
Ctrl-r	Redo	—
.	Repeat last change/command	(extremely powerful)
dw / d$	Delete to start of next word / to end of line	diw, daw
ciw / caw	Change (delete + insert) inside/around word	most common text object usage
ci" / ci( / ci{	Change inside quotes/parentheses/braces	extremely common in code

Search & Replace

Command	Meaning
/pattern	Search forward
?pattern	Search backward
n / N	Next / previous match
* / #	Search forward/backward for word under cursor
:%s/old/new/g	Replace all occurrences (whole file)
:%s/old/new/gc	Replace with confirmation

Saving & Exiting

Command	Meaning
:w	Save (write)
:q	Quit (only if there are no unsaved changes)
:q!	Quit without saving
:wq or :x or ZZ	Save and quit
:w !sudo tee %	Save file needing root (very common workaround)

Nano Text Editor

Open a file with: nano filename.txt
Use the arrow keys to navigate.
Save the changes with ctrl + o.
Confirm the filename by pressing Enter.
Exit nano with ctrl + x.

Searching Text

To find words or phrases within a file:

Initiate search with ctrl + w.
Type the search term and press Enter.
Exit search mode with ctrl + c.

Customizing Nano

Enhance Nano’s functionality:

nano -miA file.txt

-m: Enable mouse support.
-i: Auto-indent new lines.
-A: Enable smart Home key (Home first jumps to the first non-blank character).

Job Schedule

OpenPBS

OpenPBS (Portable Batch System) is an open-source workload manager and job scheduler used in High-Performance Computing (HPC) environments to automate and optimize the execution of tasks across clusters and clouds.

PBS job script

#!/bin/bash
#PBS -N job_name                  # Job name
#PBS -l nodes=1:ppn=4             # Number of nodes and processors per node
#PBS -l walltime=01:00:00         # Walltime (hh:mm:ss)
#PBS -l mem=8gb                   # Memory per node
#PBS -j oe                        # Combine stdout and stderr
#PBS -M user@example.com          # Email address for notifications
#PBS -m abe                       # Send email on (a) abort, (b) begin, (e) end
#PBS -q batch                     # Queue name
#PBS -V                           # Export all environment variables to the job

## example
cd $PBS_O_WORKDIR                 # Change to the directory where the job was submitted
module load python/3.8.5          # Load necessary modules
python my_script.py               # Command to run your program

## Some variables
$PBS_O_WORKDIR     # The directory from which the job was submitted
$PBS_O_HOME        # The home directory of the user who submitted the job
$PBS_JOBDIR        # The working directory for the job on the compute node
$TMPDIR            # The temporary directory for the job on the compute node
$PBS_JOBID         # The unique identifier assigned to the job
$PBS_NODEFILE      # The file containing the list of nodes allocated to the job
$PBS_NUM_NODES     # The number of nodes allocated to the job
$PBS_NUM_PPN       # The number of processors per node allocated to the job
$PBS_QUEUE         # The name of the queue to which the job was submitted
$PBS_JOBNAME       # The name of the job as specified in the job script

qstat requests the status of jobs, queues, or a batch server. Common job-management commands:
- qsub job_script.pbs - Submit a job
- qsub -v sample="$sample" -M "$EMAIL" pbs_1_cutadapt.pbs - Submit a job, passing environment variables (-v) and a notification email (-M)
- qstat - Check job status
- qstat -u guorui -a - Check all jobs for user guorui
- qdel job_id - Delete a job
- qhold job_id - Hold a job
- qrls job_id - Release a held job

## List finished job details in summary
qstat -x job_id

## List finished job details in full
qstat -H job_id
qstat -x -f job_id
qstat -x -f job_id | grep "exec_host"

## Find the compute nodes allocated to each job that left a log.*.e<jobid> file in the current directory tree
for job_id in $(find . -name "log.*.e*" -printf "%f\n" | sed 's/.*\.e\([0-9]*\)$/\1/' | sort -n | uniq); do
    echo -n "Job ID: $job_id: "; qstat -x -f "$job_id" | grep exec_host
done
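The `sed` step in the loop above can be tested without PBS. This sketch feeds it made-up file names of the form `log.<name>.e<jobid>`, using `grep -E` in place of `find` as the filter (an assumption for illustration):

```bash
set -euo pipefail

# Hypothetical PBS error-log names, plus one unrelated file
names=$'log.sample1.e1042\nlog.sample2.e987\nlog.sample1.e1042\nnotes.txt'

# Keep names ending in .e<digits>, strip everything but the digits, then de-duplicate
job_ids=$(printf '%s\n' "$names" \
  | grep -E '\.e[0-9]+$' \
  | sed 's/.*\.e\([0-9]*\)$/\1/' \
  | sort -n | uniq | tr '\n' ' ')

echo "$job_ids"
```

The `grep -E '\.e[0-9]+$'` filter also guards against names that match `log.*.e*` but do not end in digits, which the `sed` expression would otherwise pass through unchanged. `sort -nu` combines the last two steps.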

## Start an interactive job on a specific node with 16 processors, 64 GB memory, 12 hours walltime in cgs_queue
qsub -I -q cgs_queue -l nodes=hpcf3-c01:ppn=16,walltime=12:00:00,mem=64gb
qsub -I -q cgs_queue -l nodes=hpcf3-c02:ppn=16,walltime=12:00:00,mem=64gb
qsub -I -q cgs_queue -l nodes=hpcf3-c03:ppn=16,walltime=12:00:00,mem=64gb

Shell session

Tmux

Tmux Configuration

Consider adding these aliases to your shell configuration (e.g. ~/.bashrc) for quick tmux access.

# Terminal aliases for tmux
alias t="tmux"
alias ta="t a -t"
alias tls="t ls"
alias tn="t new -s"
alias tk="t kill-session -t"
alias tks="t kill-server"

🔧 Custom Prefix Key

Change the default prefix in .tmux.conf:

set-option -g prefix C-a
unbind-key C-b
bind-key C-a send-prefix

Session Management

Action	Command/Shortcut
List sessions	`tmux list-sessions`
Attach to session	`tmux attach-session -t target-session`
Switch between sessions	`Ctrl + A + s`
Switch to latest session	`Ctrl + A + l`
Detach from session	`Ctrl + A + d`

Window Management

Action	Shortcut
List all windows	`Ctrl + A + w`
Rename current window	`Ctrl + A + ,`
Switch to next window	`Ctrl + A + →`
Switch to previous window	`Ctrl + A + ←`
Create new window	`Ctrl + A + c`
Kill current window	`Ctrl + A + q`

Pane Management

Action	Shortcut
Switch to pane above	`Shift + ↑`
Switch to pane below	`Shift + ↓`
Switch to pane left	`Shift + ←`
Switch to pane right	`Shift + →`
Kill current pane	`Ctrl + A + x`

Copy Mode

Action	Shortcut
Enter copy mode	Drag mouse to select text
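Paste copied text	`Ctrl + A + ]`

Note: some shortcuts above rely on custom bindings or options in .tmux.conf (mouse selection, for example, needs `set -g mouse on`). With the prefix set to Ctrl + A, the tmux defaults are `Ctrl + A + n` / `Ctrl + A + p` for next/previous window, `Ctrl + A + &` to kill a window, `Ctrl + A + arrow` to move between panes, `Ctrl + A + [` to enter copy mode, and `Ctrl + A + L` to switch to the last session.
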
Screen

💡 Screen Aliases

Add these aliases to your shell configuration for easier screen management.

# Terminal aliases for screen
alias s="screen"       # start a screen session
alias ss="s -S"        # start a named screen session
alias sr="s -r"        # reattach to a screen session
alias sls="s -ls"      # list current running screen sessions

Basic Commands

Action	Command
Start screen session	`screen`
Start named session	`screen -S session_name`
Reattach to session	`screen -r`
List running sessions	`screen -ls`

Window Management

Action	Shortcut
Create new window	`Ctrl + A + c`
Kill current window	`Ctrl + A + k`
List all windows	`Ctrl + A + w`
Go to window 0-9	`Ctrl + A + 0-9`
Go to next window	`Ctrl + A + n`
Toggle between windows	`Ctrl + A + Ctrl + A`
Rename current window	`Ctrl + A + A`

Region Management

Action	Shortcut
Split horizontally	`Ctrl + A + S`
Split vertically	`Ctrl + A + |`
Switch between regions	`Ctrl + A + Tab`
Close all regions but current	`Ctrl + A + Q`
Close current region	`Ctrl + A + X`

Session & Copy Mode

Action	Shortcut
Detach from session	`Ctrl + A + d`
Start copy mode	`Ctrl + A + [`
Paste copied text	`Ctrl + A + ]`
Show help	`Ctrl + A + ?`
Quit screen	`Ctrl + A + Ctrl + \`

Additional Resources

VSCode: Official Keyboard Shortcuts Reference
Vim: Vim Cheat Sheet
Tmux: Tmux Cheat Sheet
Terminal: Bash Reference Manual

Software Environment

Conda

Installation


# Miniforge linux
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh"

# Miniforge Mac
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh"

# Install it: run the installer you downloaded
bash Miniforge3-Linux-x86_64.sh   # Linux
bash Miniforge3-MacOSX-arm64.sh   # macOS (Apple Silicon)

# Load conda in the current shell if it is installed in the default Miniforge location
if [ -f "$HOME/miniforge3/etc/profile.d/conda.sh" ]; then
    . "$HOME/miniforge3/etc/profile.d/conda.sh"
fi

Configuration


# Init
conda init --all

# Use the libmamba solver for faster solving (already the default in conda 23.10 and later)
conda update -n base conda
conda install -n base conda-libmamba-solver

# Config
conda config --set solver libmamba
conda config --set always_yes true
conda config --set auto_activate_base false

# Setup channels
conda config --add channels bioconda
conda config --add channels conda-forge
#conda config --set channel_priority strict

# Install software from a specific channel
conda install -c conda-forge numpy

Create Environment

# Create R environment with specific R version and packages
conda create -n renv r=4.5 r-languageserver r-tidyverse r-irkernel r-httpgd r-downlit r-xml2 r-markdown r-devtools radian

# Install Bioconductor packages (conda package names are lowercase with a bioconductor- prefix)
conda install -n renv bioconductor-deseq2 bioconductor-edger

# Install CRAN packages (conda package names use an r- prefix)
conda install -n renv r-qs r-fs r-tidyverse

# Remove a specific environment
conda remove --name renv --all

# Remove a package from an environment
conda remove --name renv package_name

# Export the environment (all packages with exact versions and build strings) to a file
conda env export -n renv > renv.yml

# Export without build strings (more portable across platforms);
# the prefix: line with the local path is still included
conda env export -n renv --no-builds > renv.yml
conda env export -n renv --no-builds --file renv.yml

# Create a new environment from an environment file
conda env create --file renv.yml

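# Rename an environment (clone, then remove the old one)
conda create --name new_env_name --clone old_env_name
conda activate new_env_name
conda remove --name old_env_name --all
# conda 4.14 and later can do this in one step:
# conda rename -n old_env_name new_env_name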

Pixi

## Enable pixi shell completion (add to ~/.bashrc; use --shell zsh for zsh)
eval "$(pixi completion --shell bash)"

## Activate the project environment from inside the project directory
pixi shell

Git Version Control

## Export a clean snapshot of the main branch (tracked files only, no .git history) to share with others
git archive --format=zip --output=my-project-clean.zip main

git init
git add .
git commit -m "first commit"
git branch -M main # rename the current branch to main (if git created master)
git remote add origin <repository-url>
git push -u origin main

git config --global user.name "" # set a user name that will be associated with each commit
git config --global user.email "" # set an email address that will be associated with each commit
git config --global color.ui auto # enable automatic command-line coloring for easier reviewing
git init # initialize a repository in the current directory


# 1. Delete the local branch
git branch -d branch-name     # Use -D to force delete

# 2. Delete the remote branch
git push origin --delete branch-name

# 3. Verify branches are gone
git branch -a                # List all branches

echo "" > file_name # create a (empty) file
git status # show which files are modified, staged, or untracked
git add <file> # add a file as it looks now to your next commit (stage)
git commit # commit your staged content as a new commit snapshot
### If Git opens Vim for the message: press i to type, Esc to leave insert mode, then :wq to save and quit
git log # show all commits in the current branch’s history (q to quit)
touch .gitignore # create the file that lists paths Git should ignore
git branch <branch-name> # create a branch
git branch # list branches
git checkout <branch-name> # switch to another branch and check it out into your working directory
git branch -D <branch-name> # force-delete a branch (even if unmerged)
git checkout -b temp # create a branch named temp and switch to it
git merge <branch-name> # merge the specified branch’s history into the current one
git clone <repository-url> # retrieve an entire repository from a hosted location via URL
git remote -v # list remote repositories and their URLs
git push # transmit local branch commits to the remote repository branch
git fetch # fetch down all the branches from that Git remote
git diff # show changes that are not yet staged
git pull # fetch and merge any commits from the tracking remote branch
git reset <file> # unstage a file while retaining the changes in the working directory
git rm -r --cached . # untrack all files so new .gitignore rules apply (then git add . and commit)
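A minimal sketch of the branch and merge commands above in a throwaway repository. The user name, email and file names are hypothetical; `git symbolic-ref` names the first branch `main` regardless of the local `init.defaultBranch` setting:

```bash
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main          # first branch will be "main"
git config user.name "Demo User"               # hypothetical identity, local to this repo only
git config user.email "demo@example.com"

echo "v1" > README.md
git add README.md
git commit -q -m "first commit"

git checkout -q -b feature                     # create and switch to a new branch
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "add feature"

git checkout -q main
git merge -q feature                           # fast-forward: main now includes feature.txt
git branch -d feature                          # -d succeeds because feature is fully merged

commit_count=$(git rev-list --count HEAD)
merged_file=$(cat feature.txt)
git log --oneline
cd - >/dev/null
rm -rf "$repo"
```

`git branch -d` refuses to delete a branch with unmerged commits; `-D` (listed above) overrides that check.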

## Go back to a specific commit (discards all uncommitted changes)
git reset --hard e4098a7

## Pull remote updates and ignore local changes
git fetch origin
git reset --hard origin/your-branch-name

## Change remote URL
git remote -v # check current remote configuration
git remote set-url origin <new-repository-url>
## or: remove the remote and add it again
git remote remove origin
git remote add origin <new-repository-url>

## Reset to match remote
git fetch origin
git reset --hard origin/main

WSL

System info


## Install the neofetch package
sudo apt install neofetch -y

## Show system info
neofetch --memory_unit gib

Change sudo password

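In Windows Terminal (PowerShell or Command Prompt), open the distro as root, then set a new password for your Linux user (the password sudo asks for):

wsl -u root

## Set the new password (you will be prompted to type it twice)
passwd <username>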

WSL configuration
- WSL global config: C:\Users\<UserName>\.wslconfig
- WSL distro config: \\wsl.localhost\Ubuntu-24.04\etc\wsl.conf

--- title: Linux date: 2022-10-09 published-title: Created date-modified: last-modified title-block-banner: "#212529" # toc: true # toc-location: left toc-title: "Contents" execute: eval: false format: html: code-tools: source: true toggle: true --- ## UNIX Shell Basics ### Basic ```bash pwd # Show current directory cd /path/to/directory # Change directory cd .. # Go up one level cd - # Go to previous directory cd ~ # Go to home directory ls # List files and folders ls -lhrt # List files with details, human-readable sizes, sorted by time, reverse order mkdir folder_name # Create a new folder mkdir -p /path/to/folder # Create nested folders mv old_name new_name # Rename or move file/folder rm file_name # Remove file rm -r folder_name # Remove folder recursively rm -f file_name # Force remove file without prompt rm -rf folder_name # Force remove folder recursively without prompt cp source_file destination_file # Copy file cp -a /source/folder /destination/folder # Copy folder recursively with attributes scp user@remote:/path/to/remote/file /local/destination/ # Copy file from remote server scp /local/file user@remote:/path/to/remote/destination/ # Copy file to remote server ``` ```bash wc -l file_name # Count number of lines in a file cat file_name # Display file content zcat file_name.gz # Display compressed file content less file_name # View file content with pagination zless file_name.gz # View compressed file content with pagination head -n 10 file_name # Show first 10 lines of a file tail -n 10 file_name # Show last 10 lines of a file history # Show command history top # Display running processes htop # Interactive process viewer (may need installation) chmod 755 file_name # Change file permissions chmod +x file_name # Make file executable chown user:group file_name # Change file owner and group tar xzvf archive.tar.gz # Extract tar.gz archive tar czvf archive.tar.gz /path/to/folder # Create tar.gz archive tar xjvf archive.tar.bz2 # Extract tar.bz2 archive tar 
cjvf archive.tar.bz2 /path/to/folder # Create tar.bz2 archive unzip archive.zip # Extract zip archive zip -r archive.zip /path/to/folder # Create zip archive ping google.com # Check network connectivity ifconfig # Show network interfaces (may need installation) ip addr show # Show network interfaces (modern alternative to ifconfig) wget http://example.com/file # Download file from the internet curl -O http://example.com/file # Download file using curl ``` ```bash sort file_name # Sort lines in a file sort -u file_name # Sort and remove duplicate lines uniq file_name # Remove duplicate lines (requires sorted input) diff file1 file2 # Compare two files line by line diff -r dir1 dir2 # Compare two directories recursively ln -s /path/to/target /path/to/symlink # Create a symbolic link tree /path/to/directory # Display directory structure as a tree (may need installation) tree -L 2 /path/to/directory # Display directory structure up to level 2 du -h /path/to/directory # Show disk usage of a directory du -sh /path/to/directory # Show total disk usage of a directory du -sh /path/to/directory/*/ # Show disk usage of all subdirectories du -h --max-depth=1 /path/to/directory | sort -h # Show disk usage of subdirectories sorted by size rsync -av /source/folder/ /destination/folder/ # Sync folders rsync -auzP /source/folder/ user@remote:/destination/folder/ # Sync to remote server with compression and progress ``` ```bash ## Show current shell echo $SHELL ## Change default shell to bash chsh -s /bin/bash chsh -s /bin/zsh ``` ### grep `grep` Globally search a Regular Expression and Print: is a command-line utility for searching plain-text data sets for lines that match a specified pattern: * Search files or input for specific patterns * Supports regular expressions for complex pattern matching * Options for case sensitivity, whole word matching, line numbers, etc. 
### find `find` is a powerful command to search for files and directories based on various criteria: * Search by name, type, size, modification time, permissions, etc. * Execute actions on found items (delete, move, etc.) * Supports complex expressions with logical operators ```bash ## Find all .txt files in current directory and subdirectories find . -name "*.txt" ## Find files/folders by name find /path/to/search -name "pattern" ## Find files larger than 100MB find /path/to/search -type f -size +100M ## Find files in pattern and list details find ./Clean -name "*Clean.fastq.gz" -type f -exec ls -lh {} \; ``` ### cut `cut` is used to extract sections from each line of input: * Extract columns or fields from text files * Specify delimiter for fields * Useful for processing structured data like CSV or TSV files * Works line by line ```bash ## Extract first column from a tab-delimited file cut -f1 file.txt ## Extract first and third columns from a comma-separated file cut -d',' -f1,3 file.csv ## Extract characters from position 1 to 5 cut -c1-5 file.txt ``` ### sed `sed` is a stream editor used to perform basic text transformations: * substituction, deletion, insertion, and more * works line by line, applying patterns and edits ```bash ## Remove all spaces from a string echo "a b c" | sed 's/ //g' #> "abc" ## Remove all occurences of "foo" with "bar" echo "foo baz foo" | sed 's/foo/bar/g' #> "bar baz bar" ## Delete lines containing a "pattern" pattern from file.txt sed '/pattern/d' file.txt ## Print only lines matching a "pattern" pattern from file.txt sed -n '/pattern/p' file.txt ## Print lines 2 to 5 of a file sed -n '2,5p' file.txt ``` ### awk `awk` used for pattern scanning and processing. 

* More powerful than `cut` for data extraction and reporting
* Processes text as fields and records, with support for arithmetic and logic

```bash
## Print the first column of a file
awk '{print $1}' file.txt

## Sum values in the first column
awk '{sum += $1} END {print sum}' file.txt

## Print lines where the third column is greater than 100
awk '$3 > 100' file.txt

## Print the lines that match the pattern "error"
awk '/error/' file.txt

## Print the number of lines in a file
awk 'END {print NR}' file.txt
```

## File Compression

* Common tar flags for extraction:
    + `-x` = extract
    + `-f` = archive file name
    + `-v` = verbose
    + `-z` = gzip compression
    + `-j` = bzip2 compression
    + `-J` = xz compression
    + `-t` = list contents without extracting
    + `-C` = change to the given directory (extract there)
    + `-p` = preserve permissions

```bash
## Extract .tar file
tar -xf file.tar

## Extract .tar.gz / .tgz file
tar -xzvf file.tar.gz
tar -xzf file.tgz

## Extract .tar.bz2 file
tar -xjf file.tar.bz2

## Extract .tar.xz file
tar -xJf file.tar.xz

## Extract to specific directory
tar -xzf file.tar.gz -C /path/to/directory

## List contents without extracting
tar -tzf file.tar.gz
tar -tf file.tar
```

```bash
## Extract zip files
unzip file.zip
unzip file.zip -d /path/to/directory

## List contents without extracting
unzip -l file.zip

## Extract .gz file
gunzip file.gz
gunzip -k file.gz # keep original file

## Extract .bz2 file
bunzip2 file.bz2

## Extract .xz file
unxz file.xz

## Extract .7z file
7z x file.7z

## Extract .rar file
unrar x file.rar
```

* Common tar flags for compression:
    * `-c` = create archive
    * `-f` = archive file name
    * `-z` = gzip compression (.tar.gz)
    * `-j` = bzip2 compression (.tar.bz2)
    * `-J` = xz compression (.tar.xz)
    * `-v` = verbose output

```bash
## Uncompressed tar archive
tar -cf archive.tar file1 file2 directory/

## gzip compressed
tar -czf archive.tar.gz file1 file2 directory/

## bzip2 compressed
tar -cjf archive.tar.bz2 file1 file2 directory/

## xz compressed
tar -cJf archive.tar.xz file1 file2 directory/
```

```bash
## Back up and restore a folder
tar -czvf mybackup.tar.gz myfolder
tar -xzvf mybackup.tar.gz
```

## File Permissions

Only the owner can read, write, and execute (for a folder: enter) it:

```bash
chmod 700 /path/to/folder
```

`700` means:

* Owner: 7 (read = 4, write = 2, execute = 1; 4+2+1 = 7)
* Group: 0 (no permissions)
* Others: 0 (no permissions)

Everyone can read and execute, but only the owner can modify:

```bash
chmod 755 /path/to/folder
```

* Owner: 7 (read = 4, write = 2, execute = 1; 4+2+1 = 7)
* Group: 5 (read = 4, execute = 1; 4+1 = 5)
* Others: 5 (read = 4, execute = 1; 4+1 = 5)

Give a user named `someuser` read, write, and execute permissions on the `/path/to/folder` directory with an ACL:

```bash
setfacl -m u:someuser:rwx /path/to/folder
```

Verify the permissions:

```bash
getfacl /path/to/folder
```

Remove the ACL entry for a user:

```bash
setfacl -x u:someuser /path/to/folder
```

Change the owner and group:

```bash
sudo chown username:groupname /path/to/folder
```

## Data transfer

Tools for transferring data efficiently between local and remote systems:

* `wget`: Download files directly from the internet to your local machine or server.
* `scp`: Securely copy files between local and remote systems over SSH.
* `rsync`: Synchronize files and directories efficiently, with options for selective copying and deletion.

```bash
## Basic command (a trailing slash on the source copies its contents, not the folder itself)
rsync -av /source/folder/ /destination/folder/

## Mirror source to destination (also deletes files that exist only in the destination)
rsync -av --delete /source/folder/ /destination/folder/

## Two-way merge: run both directions with --update so newer files win
## Do not add --delete here: the first run would remove files that exist only in folder2
## Deletions are not propagated; use a dedicated tool (e.g. unison) for true two-way sync
rsync -av --update /folder1/ /folder2/
rsync -av --update /folder2/ /folder1/

## Network sync over SSH
rsync -av --delete /local/folder/ user@remote:/remote/folder/

## Copy only new files (files that don't exist in the destination)
rsync -av --ignore-existing /source/folder/ /destination/folder/

## Copy only if source is newer
rsync -av --update /source/folder/ /destination/folder/
```

:::{.callout-note}
- `-a`, `--archive`: Recursive copy that preserves permissions, timestamps, symbolic links, etc.
- `-v`, `--verbose`: Lists the files being transferred
- `--progress`: Shows per-file transfer progress
- `--delete`: Removes files in destination that don't exist in source
- `-n`, `--dry-run`: Preview changes without actually syncing
- `-u`, `--update`: Only copies if the source file is newer than the destination file
- `--ignore-existing`: Skips files that already exist in destination (regardless of age)
- `--no-perms`: Don't preserve Unix permissions
- `--no-times`: Don't preserve modification times
- `--inplace`: Write directly to destination files without temp files
:::

- Sync remote server folders to local:

```bash
## Remote server details
remote_user="username"
remote_host="hpcio2" # hpc2021-io1.hku.hk, hpc2021-io2.hku.hk

## Define folders to sync (absolute paths)
## Format: "local_absolute_path:remote_absolute_path"
folders_to_sync=(
  # "/mnt/m/Reference:/lustre1/g/path_my/Reference"
  # "/mnt/m/WES/DFSP/Annovar:/lustre1/g/path_my/pipeline/somatic_variants_calling/data/DFSP/Annovar"
  "/mnt/m/WES/SARC/BAM:/lustre1/g/path_my/pipeline/somatic_variants_calling/data/SARC/BAM"
  # Add more folders as needed in the same format
)

## Sync each folder
for folder_pair in "${folders_to_sync[@]}"; do
  # Split the pair into local and remote paths at the first colon
  local_path="${folder_pair%%:*}"
  remote_path="${folder_pair#*:}"

  echo "Syncing ${remote_user}@${remote_host}:${remote_path} to ${local_path}"

  # Create local directory structure
  mkdir -p "${local_path}"

  # Sync the folder contents from remote to local (trailing slash on source copies contents)
  rsync -av --update --progress --inplace --no-perms --no-times \
    "${remote_user}@${remote_host}:${remote_path}/" \
    "${local_path}/"

  echo "-----------------------------------"
done
```

- Sync local folders to remote server:

```bash
## Remote server details
remote_user="username"
remote_host="hpcio1" # hpc2021-io1.hku.hk

## Define folders to sync (absolute paths)
## Format: "local_absolute_path:remote_absolute_path"
folders_to_sync=(
  "/mnt/f/Reference:/lustre1/g/path_my/Reference"
  # Add more folders as needed in the same format
)

## Sync each folder
for folder_pair in "${folders_to_sync[@]}"; do
  # Split the pair into local and remote paths at the first colon
  local_path="${folder_pair%%:*}"
  remote_path="${folder_pair#*:}"

  echo "Syncing ${local_path} to ${remote_user}@${remote_host}:${remote_path}"

  # Create remote directory structure (path quoted for the remote shell)
  ssh "${remote_user}@${remote_host}" "mkdir -p '${remote_path}'"

  # Sync the folder contents (trailing slash on source copies contents)
  rsync -av --update --progress \
    "${local_path}/" \
    "${remote_user}@${remote_host}:${remote_path}/"

  echo "-----------------------------------"
done
```

## Vim Text Editor

### Modes

* Vim has three main modes; switch between them with:
    * `Esc`: back to Normal mode
    * `i`: Insert mode
    * `v`: Visual mode (select text)

| Command | What it does                                  | Most common follow-up |
|---------|-----------------------------------------------|-----------------------|
| i       | Insert before cursor                          | typing text           |
| a       | Append after cursor                           | typing text           |
| I       | Insert before first non-blank character of line | —                   |
| A       | Append at end of line                         | —                     |
| o       | Open new line below                           | typing                |
| O       | Open new line above                           | typing                |
| Esc     | Back to Normal mode                           | —                     |
| v       | Visual mode                                   | move + operator       |
| V       | Visual Line mode                              | —                     |
| Ctrl-v  | Visual Block mode                             | —                     |

### Navigation

| Command         | Movement                                    | Common combos         |
|-----------------|---------------------------------------------|-----------------------|
| h j k l         | ← ↓ ↑ →                                     | —                     |
| w / b           | forward/backward by word start              | ciw, daw              |
| e / ge          | forward/backward to word end                | —                     |
| W / B           | forward/backward by WORD (whitespace-separated) | (ignores punctuation) |
| 0               | Start of line                               | —                     |
| ^               | First non-blank character                   | —                     |
| $               | End of line                                 | —                     |
| gg              | Top of file                                 | —                     |
| G               | Bottom of file                              | 50G → line 50         |
| Ctrl-u / Ctrl-d | Half page up/down                           | —                     |
| Ctrl-f / Ctrl-b | Full page down/up                           | —                     |

### Editing

| Command          | Meaning                                        | Examples                         |
|------------------|------------------------------------------------|----------------------------------|
| x                | Delete character under cursor                  | —                                |
| dd               | Delete current line                            | 5dd = delete 5 lines             |
| yy               | Yank (copy) current line                       | p to paste                       |
| p / P            | Paste after/before cursor                      | —                                |
| u                | Undo                                           | (most used command after Esc)    |
| Ctrl-r           | Redo                                           | —                                |
| .                | Repeat last change                             | (extremely powerful)             |
| dw / d$          | Delete to start of next word / to end of line  | diw, daw                         |
| ciw / caw        | Change (delete + insert) inside/around word    | most common text object usage    |
| ci" / ci( / ci{  | Change inside quotes/parentheses/braces        | extremely common in code         |

### Search & Replace

| Command          | Meaning                                       |
|------------------|-----------------------------------------------|
| /pattern         | Search forward                                |
| ?pattern         | Search backward                               |
| n / N            | Next / previous match                         |
| * / #            | Search forward/backward for word under cursor |
| :%s/old/new/g    | Replace all occurrences (whole file)          |
| :%s/old/new/gc   | Replace with confirmation                     |

### Saving & Exiting

| Command          | Meaning                                          |
|------------------|--------------------------------------------------|
| :w               | Save (write)                                     |
| :q               | Quit (only if no changes)                        |
| :q!              | Quit without saving                              |
| :wq or :x or ZZ  | Save and quit                                    |
| :w !sudo tee %   | Save file needing root (very common workaround)  |

## Nano Text Editor

1. Open a file with `nano filename.txt`.
2. Use the arrow keys to navigate.
3. Save the changes with `Ctrl + O`.
4. Confirm the filename by pressing `Enter`.
5. Exit nano with `Ctrl + X`.

### Searching Text

To find words or phrases within a file:

1. Start a search with `Ctrl + W`.
2. Type the search term and press `Enter`.
3. Cancel the search prompt with `Ctrl + C`.

### Customizing Nano

Enable useful options when starting nano:

```bash
nano -miA file.txt
```

* `-m`: Enable mouse support.
* `-i`: Auto-indent new lines.
* `-A`: Smart Home key (jumps to the first non-whitespace character of the line).

## Job Scheduling

### OpenPBS

OpenPBS (Portable Batch System) is an open-source workload manager and job scheduler used in High-Performance Computing (HPC) environments to automate and optimize the execution of tasks across clusters and clouds.
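A minimal end-to-end sketch of that workflow, assuming a PBS-style `qsub` is on the `PATH` (the job name `hello`, the file `hello.pbs`, and the 5-minute walltime are illustrative choices, not site defaults): write a small job script with a heredoc, then submit it, or only report where the script was written when no PBS client is available.

```bash
## Hypothetical file name, for illustration only
job_script="hello.pbs"

## Quoted 'EOF' keeps $PBS_O_WORKDIR and $(hostname) unexpanded until the job runs
cat > "${job_script}" <<'EOF'
#!/bin/bash
#PBS -N hello
#PBS -l walltime=00:05:00
#PBS -j oe
cd "$PBS_O_WORKDIR"
echo "Running on $(hostname)"
EOF

## Submit only when a PBS client is available on this machine
if command -v qsub >/dev/null 2>&1; then
  job_id=$(qsub "${job_script}")   # qsub prints the new job ID
  echo "Submitted ${job_id}"
else
  echo "qsub not found; job script written to ${job_script}"
fi
```

The quoted `'EOF'` delimiter is the important design choice: it stops the submitting shell from expanding the variables, so they are evaluated on the compute node when the job actually runs.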
* PBS job script (comments are kept on their own lines, since trailing text on a `#PBS` line may be read as part of the directive):

```bash
#!/bin/bash
## Job name
#PBS -N job_name
## Number of nodes and processors per node
#PBS -l nodes=1:ppn=4
## Walltime (hh:mm:ss)
#PBS -l walltime=01:00:00
## Total memory for the job
#PBS -l mem=8gb
## Combine stdout and stderr into one file
#PBS -j oe
## Email address for notifications
#PBS -M user@example.com
## Send email on (a) abort, (b) begin, (e) end
#PBS -m abe
## Queue name
#PBS -q batch
## Export all environment variables to the job
#PBS -V

## Example commands
cd "$PBS_O_WORKDIR"        # Change to the directory where the job was submitted
module load python/3.8.5   # Load necessary modules
python my_script.py        # Command to run your program
```

```bash
## Some variables available inside a running job
## (some, such as PBS_NUM_NODES and PBS_NUM_PPN, are Torque-specific and may be unset under OpenPBS)
echo "$PBS_O_WORKDIR"   # The directory from which the job was submitted
echo "$PBS_O_HOME"      # The home directory of the user who submitted the job
echo "$PBS_JOBDIR"      # The working directory for the job on the compute node
echo "$TMPDIR"          # The temporary directory for the job on the compute node
echo "$PBS_JOBID"       # The unique identifier assigned to the job
echo "$PBS_NODEFILE"    # The file containing the list of nodes allocated to the job
echo "$PBS_NUM_NODES"   # The number of nodes allocated to the job
echo "$PBS_NUM_PPN"     # The number of processors per node allocated to the job
echo "$PBS_QUEUE"       # The name of the queue to which the job was submitted
echo "$PBS_JOBNAME"     # The name of the job as specified in the job script
```

Common commands for managing jobs (`qstat` requests the status of jobs, queues, or a batch server):

* `qsub job_script.pbs` - Submit a job
* `qsub -v sample="$sample" -M "$EMAIL" pbs_1_cutadapt.pbs` - Submit a job, passing the `sample` environment variable and a notification email address
* `qstat` - Check job status
* `qstat -u guorui -a` - Check all jobs for user guorui
* `qdel job_id` - Delete a job
* `qhold job_id` - Hold a job
* `qrls job_id` - Release a held job

```bash
## List finished job details in summary
qstat -x job_id

## List finished job details in full
qstat -H job_id
qstat -x -f job_id
qstat -x -f job_id | grep "exec_host"

## Find out all compute nodes allocated to jobs, using job IDs taken from log.*.e<job_id> files
for job_id in $(find . -name "log.*.e*" -printf "%f\n" | sed 's/.*\.e\([0-9]*\)$/\1/' | sort -n | uniq); do
  echo -n "Job ID: $job_id: "
  qstat -x -f "$job_id" | grep exec_host
done

## Start an interactive job on a specific node with 16 processors, 64GB memory, 12 hours walltime in cgs_queue
qsub -I -q cgs_queue -l nodes=hpcf3-c01:ppn=16,walltime=12:00:00,mem=64gb
qsub -I -q cgs_queue -l nodes=hpcf3-c02:ppn=16,walltime=12:00:00,mem=64gb
qsub -I -q cgs_queue -l nodes=hpcf3-c03:ppn=16,walltime=12:00:00,mem=64gb
```

## Shell session

### Tmux

* Tmux configuration

Consider adding these aliases to your shell configuration for quick tmux access.

::: {.code-block}
```bash
# Terminal aliases for tmux
alias t="tmux"
alias ta="t a -t"
alias tls="t ls"
alias tn="t new -s"
alias tk="t kill-session -t"
alias tks="t kill-server"
```
:::

::: {.callout-note}
### 🔧 Custom Prefix Key
Change the default prefix (`Ctrl + B`) to `Ctrl + A` in `~/.tmux.conf`:

```bash
unbind-key C-b              # free the default prefix
set-option -g prefix C-a
bind-key C-a send-prefix    # press the prefix twice to send Ctrl + A to the program
```
:::

The shortcuts below assume the `Ctrl + A` prefix set above: press the prefix, release it, then press the key. Some entries (`Ctrl + A + l` for the latest session, `Ctrl + A + arrow` to switch windows, `Ctrl + A + q` to kill a window, `Shift + arrow` to switch panes) rely on custom bindings in `~/.tmux.conf`; tmux's defaults are `prefix + L` for the last session, `prefix + n`/`p` for the next/previous window, `prefix + &` to kill a window, and `prefix + arrow` to switch panes.

* Session Management

| **Action**                     | **Command/Shortcut**                    |
|--------------------------------|-----------------------------------------|
| List sessions                  | `tmux list-sessions`                    |
| Attach to session              | `tmux attach-session -t target-session` |
| Switch between sessions        | `Ctrl + A + s`                          |
| Switch to latest session       | `Ctrl + A + l`                          |
| Detach from session            | `Ctrl + A + d`                          |

* Window Management

| **Action**                     | **Shortcut**               |
|--------------------------------|----------------------------|
| List all windows               | `Ctrl + A + w`             |
| Rename current window          | `Ctrl + A + ,`             |
| Switch to next window          | `Ctrl + A + →`             |
| Switch to previous window      | `Ctrl + A + ←`             |
| Create new window              | `Ctrl + A + c`             |
| Kill current window            | `Ctrl + A + q`             |

* Pane Management

| **Action**                     | **Shortcut**               |
|--------------------------------|----------------------------|
| Switch to pane above           | `Shift + ↑`                |
| Switch to pane below           | `Shift + ↓`                |
| Switch to pane left            | `Shift + ←`                |
| Switch to pane right           | `Shift + →`                |
| Kill current pane              | `Ctrl + A + x`             |

* Copy Mode

| **Action**                     | **Shortcut**               |
|--------------------------------|----------------------------|
| Enter copy mode                | `Ctrl + A + [` (or drag the mouse to select text when `set -g mouse on` is enabled) |
| Paste copied text              | `Ctrl + A + ]`             |

---

### Screen

::: {.callout-tip}
### 💡 Screen Aliases
Add these aliases to your shell configuration for easier screen management.
:::

::: {.code-block}
```bash
# Terminal aliases for screen
alias s="screen"      # start a screen session
alias ss="s -S"       # start a named screen session (note: shadows the iproute2 `ss` command)
alias sr="s -r"       # reattach to a screen session
alias sls="s -ls"     # list current running screen sessions
```
:::

Screen key bindings are case-sensitive: for example, `Ctrl + A + c` creates a window, while `Ctrl + A + C` clears the screen.

* Basic Commands

| **Action**                     | **Command**                    |
|--------------------------------|--------------------------------|
| Start screen session           | `screen`                       |
| Start named session            | `screen -S session_name`       |
| Reattach to session            | `screen -r`                    |
| List running sessions          | `screen -ls`                   |

* Window Management

| **Action**                     | **Shortcut**               |
|--------------------------------|----------------------------|
| Create new window              | `Ctrl + A + c`             |
| Kill current window            | `Ctrl + A + k`             |
| List all windows               | `Ctrl + A + w`             |
| Go to window 0-9               | `Ctrl + A + 0-9`           |
| Go to next window              | `Ctrl + A + n`             |
| Toggle between windows         | `Ctrl + A + Ctrl + A`      |
| Rename current window          | `Ctrl + A + A`             |

* Region Management

| **Action**                     | **Shortcut**               |
|--------------------------------|----------------------------|
| Split horizontally             | `Ctrl + A + S`             |
| Split vertically               | `Ctrl + A + \|`            |
| Switch between regions         | `Ctrl + A + Tab`           |
| Close all regions but current  | `Ctrl + A + Q`             |
| Close current region           | `Ctrl + A + X`             |

* Session & Copy Mode

| **Action**                     | **Shortcut**               |
|--------------------------------|----------------------------|
| Detach from session            | `Ctrl + A + d`             |
| Start copy mode                | `Ctrl + A + [`             |
| Paste copied text              | `Ctrl + A + ]`             |
| Show help                      | `Ctrl + A + ?`             |
| Quit screen                    | `Ctrl + A + Ctrl + \`      |

---

::: {.callout-note}
### Additional Resources
- **VSCode**: [Official Keyboard Shortcuts Reference](https://code.visualstudio.com/docs/getstarted/keybindings)
- **Vim**: [Vim Cheat Sheet](https://vim.rtorr.com/)
- **Tmux**: [Tmux Cheat Sheet](https://tmuxcheatsheet.com/)
- **Terminal**: [Bash Reference Manual](https://www.gnu.org/software/bash/manual/)
:::

## Software Environment

### Conda

* Installation

```bash
# Miniforge installer for Linux (x86_64)
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh"
# Miniforge installer for macOS (Apple Silicon)
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh"

# Install it (run the installer downloaded for your platform)
bash Miniforge3-Linux-x86_64.sh     # Linux
bash Miniforge3-MacOSX-arm64.sh     # macOS (Apple Silicon)

# Load conda in the current shell if found
# (Miniforge installs to ~/miniforge3 by default; adjust if you installed elsewhere, e.g. ~/miniconda3)
if [ -f "$HOME/miniforge3/etc/profile.d/conda.sh" ]; then
    . "$HOME/miniforge3/etc/profile.d/conda.sh"
fi
```

* Configuration

```bash
# Initialize conda for all supported shells
conda init --all

# Use the libmamba solver for faster solving (already the default since conda 23.10)
conda update -n base conda
conda install -n base conda-libmamba-solver

# Config
conda config --set solver libmamba
conda config --set always_yes true            # answer "yes" to prompts automatically
conda config --set auto_activate_base false   # don't activate base in new shells

# Set up channels (conda-forge, added last, gets the highest priority)
conda config --add channels bioconda
conda config --add channels conda-forge
#conda config --set channel_priority strict

# Install software from a specific channel
conda install -c conda-forge numpy
```

* Create Environment

```bash
# Create an R environment with a specific R version and packages
conda create -n renv r=4.5 r-languageserver r-tidyverse r-irkernel r-httpgd r-downlit r-xml2 r-markdown r-devtools radian

# Install Bioconductor packages (package names are all lowercase, prefixed with bioconductor-)
conda install -n renv bioconductor-deseq2 bioconductor-edger

# Install CRAN packages (prefixed with r-)
conda install -n renv r-qs r-fs r-tidyverse

# Remove a specific environment
conda remove --name renv --all

# Remove a package from an environment
conda remove --name renv package_name

# Export and save the environment file with all package information
conda env export -n renv > renv.yml

# Export without build strings (more portable across platforms)
conda env export -n renv --no-builds > renv.yml
conda env export -n renv --no-builds --file renv.yml   # same, writing the file directly

# Create a new environment from an environment file
conda env create --file renv.yml

# Rename an environment by cloning it and removing the original
conda create --name new_env_name --clone old_env_name
conda activate new_env_name
conda remove --name old_env_name --all
# (newer conda versions, 4.14+, also provide: conda rename -n old_env_name new_env_name)
```

### Pixi

```bash
## Enable pixi shell completion (add to ~/.bashrc; use --shell zsh in ~/.zshrc)
eval "$(pixi completion --shell bash)"
```

## Git Version Control

```bash
## Export a clean snapshot of the main branch (without the .git history) to share with others
git archive --format=zip --output=my-project-clean.zip main

## Create a new repository and push it to a remote
git init
git add .
git commit -m "first commit"
git branch -M main                         # make sure the branch is named main
git remote add origin <repository-url>
git push -u origin main

## One-time setup
git config --global user.name "<your-name>"     # name associated with each commit
git config --global user.email "<your-email>"   # email associated with each commit
git config --global color.ui auto               # colored command-line output for easier reviewing

## Delete a branch locally and remotely
# 1. Delete the local branch
git branch -d branch-name   # use -D to force delete
# 2. Delete the remote branch
git push origin --delete branch-name
# 3. Verify branches are gone
git branch -a               # list local and remote-tracking branches

## Everyday commands
echo "<text>" > <file>      # create a file with some content
git status                  # check the state of the working directory
git add <file>              # stage a file as it looks now for your next commit
git commit                  # commit your staged content as a new snapshot
## If Git opens Vim for the commit message: press i to type, Esc to stop typing, :wq to save and quit
git log                     # show the commit history of the current branch (q to quit)
touch .gitignore            # create a file listing paths Git should ignore

## Branches
git branch <branch-name>    # create a branch
git branch                  # list branches
git checkout <branch-name>  # switch to another branch
git checkout -b temp        # create a branch "temp" and switch to it
git merge <branch-name>     # merge the given branch's history into the current one
git branch -D <branch-name> # force-delete a branch

## Remotes
git clone <repository-url>  # retrieve an entire repository from a hosted location
git remote -v               # show remote repository information
git push                    # send local branch commits to the remote branch
git fetch                   # fetch all branches from the remote
git pull                    # fetch and merge commits from the tracking remote branch

## Inspect and undo
git diff                    # show changes that are not yet staged
git reset <file>            # unstage a file while keeping its changes in the working directory
git rm -r --cached .        # stop tracking files now listed in .gitignore (then git add . and commit)

## Go back to a specific commit (discards uncommitted changes)
git reset --hard e4098a7

## Discard local changes and match the remote branch
git fetch origin
git reset --hard origin/main   # or origin/<your-branch-name>

## Change remote URL
git remote -v                                  # check current remote configuration
git remote set-url origin <new-repository-url>
## or, equivalently, remove and re-add the remote
git remote remove origin
git remote add origin <new-repository-url>
```

## WSL

* System info

```bash
## Install the neofetch package
sudo apt install neofetch -y

## Show system info
neofetch --memory_unit gib
```

* Change the sudo password

In Windows Terminal (PowerShell or Command Prompt), open WSL as root, then set a new password for your Linux user:

```bash
## On Windows: start WSL as root
wsl -u root

## Inside WSL: enter the new password when prompted
passwd <username>
```

* WSL configuration
    - WSL global config (WSL 2, all distros): `C:\Users\<UserName>\.wslconfig`
    - WSL distro config: `\\wsl.localhost\Ubuntu-24.04\etc\wsl.conf` (that is, `/etc/wsl.conf` inside the distro)
    - Changes take effect after restarting WSL with `wsl --shutdown` from Windows
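Before a script reads or edits these files, it can help to confirm that it is actually running inside WSL. A common heuristic (a sketch, not an official interface) is to look for `microsoft` in the kernel version string, which both WSL 1 (`Microsoft`) and WSL 2 (`microsoft-standard-WSL2`) kernels report; the function name `is_wsl` is hypothetical:

```bash
## Heuristic WSL check (hypothetical helper): WSL kernels include "microsoft" in /proc/version
is_wsl() {
  grep -qi microsoft /proc/version 2>/dev/null
}

if is_wsl; then
  echo "Running inside WSL"
  # Per-distro settings live in /etc/wsl.conf (the file may not exist yet)
  if [ -f /etc/wsl.conf ]; then
    cat /etc/wsl.conf
  else
    echo "/etc/wsl.conf not found"
  fi
else
  echo "Not running inside WSL"
fi
```

Checking the `WSL_DISTRO_NAME` environment variable, which WSL sets in its shells, is an alternative signal.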