HW2 - Introduction to Biocluster and Linux
Topic: Linux Basics
-
Log into your user account on the HPCC cluster, and from there into a compute node with srun.
srun --x11 --partition=short --mem=2gb --cpus-per-task 4 --ntasks 1 --time 1:00:00 --pty bash -l
-
Download code from this page
wget https://cluster.hpcc.ucr.edu/~tgirke/Linux.sh --no-check-certificate
-
Download Halobacterium proteome and inspect it
wget https://ftp.ncbi.nlm.nih.gov/genomes/genbank/archaea/Halobacterium_salinarum/all_assembly_versions/GCA_004799605.1_ASM479960v1/GCA_004799605.1_ASM479960v1_protein.faa.gz
gunzip GCA_004799605.1_ASM479960v1_protein.faa.gz
mv GCA_004799605.1_ASM479960v1_protein.faa halobacterium.faa
less halobacterium.faa # press q to quit
-
How many protein sequences are stored in the downloaded file?
grep '>' halobacterium.faa | wc
grep '^>' halobacterium.faa --count
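Both commands count FASTA header lines, since every record starts with `>`. The following is a minimal sketch on a made-up three-record file; the file name `toy.faa` and its sequences are invented for illustration and are not part of the assignment. It uses `wc -l` so that only the line count is reported.

```shell
# Build a tiny, made-up FASTA file (illustrative only, not real data).
workdir=$(mktemp -d)
cat > "$workdir/toy.faa" <<'EOF'
>prot1 toy protein one
MKTAYIAKQR
>prot2 toy protein two
MWAHGGHLLK
>prot3 toy protein three
MSSWCHPQHH
EOF

# Variant 1: select header lines, then count them with wc -l.
n_wc=$(grep '>' "$workdir/toy.faa" | wc -l | tr -d ' ')

# Variant 2: let grep count lines that start with '>'.
n_count=$(grep -c '^>' "$workdir/toy.faa")

echo "wc -l: $n_wc, grep -c: $n_count"
```

Anchoring the pattern with `^` is the slightly safer choice, because it ignores any `>` character that might appear inside a description line.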
-
How many proteins contain the pattern WxHxxH or WxHxxHH?
egrep 'W.H..H{1,2}' halobacterium.faa --count
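One caveat worth checking: `grep --count` counts matching lines, not proteins. If the sequences in a FASTA file are wrapped over several lines, as is common, a motif that spans a line break is missed. The sketch below shows the difference on a made-up file (`toy.faa` and its sequences are invented for illustration) by first joining each record's sequence lines with `awk`.

```shell
# Made-up FASTA file; prot1 carries the motif across a line break.
workdir=$(mktemp -d)
cat > "$workdir/toy.faa" <<'EOF'
>prot1 toy record, motif split across two lines
MKTAW
AHGGHK
>prot2 toy record, motif on one line
MWCHPQHHLL
>prot3 toy record, no motif
MKKLLAAA
EOF

# Per-line count: the split motif in prot1 is not seen.
per_line=$(grep -Ec 'W.H..H{1,2}' "$workdir/toy.faa")

# Per-protein count: join the sequence lines of each record first,
# drop the header lines, then count matching sequences.
per_protein=$(awk '/^>/ { if (seq != "") print seq; seq = ""; print; next }
                   { seq = seq $0 }
                   END { if (seq != "") print seq }' "$workdir/toy.faa" |
              grep -v '^>' | grep -Ec 'W.H..H{1,2}')

echo "matching lines: $per_line, matching proteins: $per_protein"
```

On real data the two numbers may well agree; the linearized count is simply the one that answers the question as asked.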
-
Use less to find IDs for pattern matches or use awk
awk --posix -v RS='>' '/W.H..(H){1,2}/ { print ">" $0;}' halobacterium.faa | less
awk --posix -v RS='>' '/W.H..(H){1,2}/ { print ">" $0;}' halobacterium.faa | grep '^>' | cut -c 2- | cut -f 1 -d ' ' > myIDs
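The `awk` commands above work by setting the record separator `RS` to `>`, so that each FASTA entry (header plus sequence) becomes one record. A minimal sketch of that idea on made-up data follows; `toy.faa` and its contents are invented for illustration. It uses the simpler pattern `W.H..H`, which selects the same records as `W.H..(H){1,2}` because any sequence containing `WxHxxHH` also contains `WxHxxH`, and it avoids the `--posix` flag so it runs on awk implementations other than gawk.

```shell
# Made-up FASTA file with one sequence line per record.
workdir=$(mktemp -d)
cat > "$workdir/toy.faa" <<'EOF'
>prot1 toy protein one
MKTAYIAKQR
>prot2 toy protein two
MWAHGGHLLK
>prot3 toy protein three
MSSWCHPQHH
EOF

# RS='>' makes every FASTA entry one awk record; matching records are
# printed with the leading '>' restored.
awk -v RS='>' '/W.H..H/ { print ">" $0 }' "$workdir/toy.faa" |
  grep '^>' |        # keep header lines only
  cut -c 2- |        # strip the leading '>'
  cut -f 1 -d ' ' \
  > "$workdir/myIDs" # first space-separated field is the ID

cat "$workdir/myIDs"
```

Because the whole record is searched, the header text is matched as well, and a line break inside a wrapped sequence counts as a character for `.`; both are worth keeping in mind when interpreting the hits.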
-
Create a BLASTable database with makeblastdb (the BLAST+ replacement for the legacy formatdb)
module load ncbi-blast/2.2.31+
makeblastdb -in halobacterium.faa -out halobacterium.faa -dbtype prot -hash_index -parse_seqids
-
Query BLASTable database by IDs stored in a file (e.g. myIDs)
blastdbcmd -db halobacterium.faa -dbtype prot -entry_batch myIDs -get_dups -out myseq.fasta
-
Run BLAST search for sequences stored in myseq.fasta
blastp -query myseq.fasta -db halobacterium.faa -outfmt 0 -evalue 1e-6 -out blastp.out
blastp -query myseq.fasta -db halobacterium.faa -outfmt 6 -evalue 1e-6 -out blastp.tab
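The tabular output written with `-outfmt 6` has twelve tab-separated columns by default: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. As a rough sketch of how such a table can be inspected, the example below builds a made-up three-row file (`toy.tab`, with invented IDs and scores, not real BLAST results) and drops the self hits that appear when query sequences are searched against their own database.

```shell
# Made-up rows in the default 12-column -outfmt 6 layout (not real results):
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
workdir=$(mktemp -d)
printf 'protA\tprotA\t100.00\t250\t0\t0\t1\t250\t1\t250\t1e-180\t510\n' >  "$workdir/toy.tab"
printf 'protA\tprotB\t45.20\t240\t120\t4\t5\t244\t3\t238\t2e-60\t190\n' >> "$workdir/toy.tab"
printf 'protB\tprotB\t100.00\t238\t0\t0\t1\t238\t1\t238\t1e-170\t490\n' >> "$workdir/toy.tab"

# Keep only hits where query and subject differ, and report
# query ID, subject ID, percent identity and e-value.
awk -F'\t' -v OFS='\t' '$1 != $2 { print $1, $2, $3, $11 }' \
  "$workdir/toy.tab" > "$workdir/nonself.tab"

cat "$workdir/nonself.tab"
```

The same `awk` filter should apply unchanged to `blastp.tab`, assuming it was written with the default column set.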
-
Return system time and host name
date
hostname
Additional exercise material in Linux Manual
Homework assignment
Perform the above analysis on the protein sequences from E. coli. A right click on the link will allow you to copy the URL so that it can be used together with wget.
Record the result from the final BLAST command (with -outfmt 6) in a text file named myresult.txt.
Homework submission
Upload the result file (myresult.txt) to your private course GitHub repository under Homework/HW2/HW2.txt.
Due date
Most homeworks will be due one week after they are assigned. This one is due on Thu, April 11th at 6:00 PM.
Homework solution
See here.