How To Write my Bash Script To Automate it?

dellia222 · February 16, 2019, 8:24am

Hello guys, I need some help. I am new to bash and I don't know how to automate the following script.

head -2 out1 > 1.fasta
sed '1,2 d' out1 > out2
rm out1
head -2 out2 > 2.fasta
sed '1,2 d' out2 > out1
rm out2
head -2 out1 > 3.fasta
sed '1,2 d' out1 > out2
rm out1
.......
......

My file has 69800 lines and I want to do that automatically.

thanks a lot
laura

RudiC · February 16, 2019, 9:42am

Welcome to the forum.

If you post a question, it's best to also mention the OS, shell, and tool versions that you use, so a solution can be tailored to your situation. For instance, several sed implementations out there have features that allow for a concise solution that others don't.

Do you have awk at hand? Could be used to do all of it in one go.
Did you consider the split command? Looks like it is made for your problem, even to produce the desired file names.

Don_Cragun · February 16, 2019, 10:36am

Hi RudiC,
The standard split utility creates output filenames by appending sequence "numbers" to a given output filename prefix, and those "numbers" aren't numeric; they are names like fasta.aa, fasta.ab, fasta.ac, ...

Like you said, awk would work well for this.

Hi laura,
To get the requested output, one could use something like:

#!/bin/bash
awk 'NR % 2 {	f = ++cnt".fasta" }
{		print > f }
!(NR % 2){	close(f)}
' out1 && rm out1

but that would produce output names of varying lengths, which would be inconvenient if you wanted to process the files in the same order in which they originally appeared in the file named out1. If you do later want to process them in that order, you would probably be happier with something more like:

#!/bin/bash
awk 'NR % 2 {	f = sprintf("%05d.fasta", ++cnt) }
{		print > f }
!(NR % 2){	close(f)}
' out1 && rm out1

which would produce all 34900 output files with names that start with 5 decimal digits (with leading zeros on the first 9999 files), so normal listings of the produced files are in the order you probably want.
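Before running this on the real 69800-line file, it may be worth trying it on a small throwaway sample. The following is only a sketch: the temporary directory and the three made-up two-line records are assumptions for illustration, and the rm step is left out so the sample input stays in place.

```shell
#!/bin/bash
# Sketch: try the zero-padded awk approach on a small made-up sample.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a 6-line sample: three two-line records (contents are invented).
printf '>seq1\nACGT\n>seq2\nGGCC\n>seq3\nTTAA\n' > out1

# Odd lines start a new output file, every line is printed to the
# current file, and even lines close it again.
awk 'NR % 2 { f = sprintf("%05d.fasta", ++cnt) }
     { print > f }
     !(NR % 2) { close(f) }' out1

# List what was produced.
ls
```

If the listing shows one NNNNN.fasta file per record, each holding two lines, the same script should behave the same way on the full file.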

If you intend to try this on a Solaris/SunOS operating system, change awk in both of the above suggestions to /usr/xpg4/bin/awk or nawk.
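If the same script has to run on more than one system, the awk binary could be chosen at run time instead of editing the script. This is a minimal sketch, assuming that /usr/xpg4/bin/awk is present only on Solaris-like systems; the variable name AWK is made up for this example.

```shell
#!/bin/bash
# Sketch: pick an awk binary at run time (AWK is a hypothetical variable name).
if [ -x /usr/xpg4/bin/awk ]; then
    AWK=/usr/xpg4/bin/awk    # Solaris/SunOS: use the XPG4 awk
else
    AWK=awk                  # elsewhere: assume the default awk is suitable
fi

# Quick check that the chosen awk runs at all.
"$AWK" 'BEGIN { print "awk is available" }'
```

The awk commands above would then be invoked as "$AWK" '...' out1 instead of awk '...' out1.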

As RudiC said, next time you need help, plan to tell us what operating system as well as what shell you're using and show us what you have tried to solve the problem on your own. We want to help you learn how to do stuff like this on your own; not to act as your unpaid programming staff.

RudiC · February 16, 2019, 1:45pm

Hi Don,

I had a comment on the non-fitting file names similar to what you said, but removed it when I tested the split installed on my Linux system (split (GNU coreutils) 8.28):

$ split -dl2 --additional-suffix=.fasta file ""
$ ls -la
-rw-rw-r-- 1 user group  379 Feb 16 19:39 00.fasta
-rw-rw-r-- 1 user group  349 Feb 16 19:39 01.fasta
-rw-rw-r-- 1 user group  175 Feb 16 19:39 02.fasta
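For names with a fixed width of five digits starting at 00001, a variant along these lines might do. It is only a sketch, assuming GNU split from coreutils 8.16 or later (other split implementations may not have these options); the temporary directory and sample contents are made up for illustration.

```shell
#!/bin/bash
# Sketch assuming GNU split; the long options below are not portable.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up 6-line sample: three two-line records.
printf '>seq1\nACGT\n>seq2\nGGCC\n>seq3\nTTAA\n' > file

# -l 2                        two lines per output file
# -a 5                        five-character suffix
# --numeric-suffixes=1        decimal suffixes starting at 1 instead of 0
# --additional-suffix=.fasta  appended after the number
# ""                          empty prefix, so names are number plus .fasta
split -l 2 -a 5 --numeric-suffixes=1 --additional-suffix=.fasta file ""

ls
```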

Had the requestor told us what system s/he has, a more specific answer would have resulted.