Script to delete rows in a file

alok2082 · November 20, 2013, 10:53am

Hi All,

I am new to UNIX . Please help me in writing code to delete all records from the file where all columns after cloumn 5 in file is either 0, #MI or NULL.
Initial 5 columns are string

e.g.

"alsod" "1FEV2" "wjwroe" " wsse" "hd3" 1 2 34 #Mi
"malasl" "wses" "trwwwe" " wsse" "hd3" 1 2 0 #Mi
"alsod" "1FEV2" "asd" " wsse" "hd3" #Mi #Mi
"alsod" "1FEV2" "asd" " wsse" "hd3" 0 0 0
"alsod" "1FEV2" "asd" " wsse" "hd3" 1 2 3 4 5

expected output is

"alsod" "1FEV2" "wjwroe" " wsse" "hd3" 1 2 34 #Mi
"malasl" "wses" "trwwwe" " wsse" "hd3" 1 2 0 #Mi
"alsod" "1FEV2" "asd" " wsse" "hd3" 1 2 3 4 5

The file is around 1-2 GB large.
I have written a code but it is taking 45-50 min to execute the script.

grep -EHv ([1-9]/s) file.txt > file2.txt

can some one please suggest alternate code where we are selectively deleting the records containing 0/#Mi/NULL after column 5

Thanks

blackrageous · November 20, 2013, 11:23am

I don't think that grep you show is correct. For example the -H on the grep command you show prints the filename per match. I don't understand why you have included it. I ran your example and it does not work. Use awk.

Akshay_Hegde · November 20, 2013, 11:23am

alok2082:

Hi All,

I am new to UNIX . Please help me in writing code to delete all records from the file where all columns after cloumn 5 in file is either 0, #MI or NULL.
Initial 5 columns are string

e.g.

"alsod" "1FEV2" "wjwroe" " wsse" "hd3" 1 2 34 #Mi
"malasl" "wses" "trwwwe" " wsse" "hd3" 1 2 0 #Mi
"alsod" "1FEV2" "asd" " wsse" "hd3" #Mi #Mi
"alsod" "1FEV2" "asd" " wsse" "hd3" 0 0 0
"alsod" "1FEV2" "asd" " wsse" "hd3" 1 2 3 4 5

expected output is

"alsod" "1FEV2" "wjwroe" " wsse" "hd3" 1 2 34 #Mi
"malasl" "wses" "trwwwe" " wsse" "hd3" 1 2 0 #Mi
"alsod" "1FEV2" "asd" " wsse" "hd3" 1 2 3 4 5

The file is around 1-2 GB large.
I have written a code but it is taking 45-50 min to execute the script.

grep -EHv ([1-9]/s) file.txt > file2.txt

can some one please suggest alternate code where we are selectively deleting the records containing 0/#Mi/NULL after column 5

Thanks

Try :

$ cat file
"alsod" "1FEV2" "wjwroe" " wsse" "hd3" 1 2 34 #Mi
"malasl" "wses" "trwwwe" " wsse" "hd3" 1 2 0 #Mi
"alsod" "1FEV2" "asd" " wsse" "hd3" #Mi #Mi
"alsod" "1FEV2" "asd" " wsse" "hd3" 0 0 0
"alsod" "1FEV2" "asd" " wsse" "hd3" 1 2 3 4 5

$ awk '$7~/[0-9]/ && $7 !=0' file
"alsod" "1FEV2" "wjwroe" " wsse" "hd3" 1 2 34 #Mi
"malasl" "wses" "trwwwe" " wsse" "hd3" 1 2 0 #Mi
"alsod" "1FEV2" "asd" " wsse" "hd3" 1 2 3 4 5

disedorgue · November 20, 2013, 11:44am

Hi,
With sed:

$ sed '/"[^"]*[1-9][^"]*$/!d' file.txt
"alsod" "1FEV2" "wjwroe" " wsse" "hd3" 1 2 34 #Mi
"malasl" "wses" "trwwwe" " wsse" "hd3" 1 2 0 #Mi
"alsod" "1FEV2" "asd" " wsse" "hd3" 1 2 3 4 5

Regards.

MadeInGermany · November 20, 2013, 12:07pm

Your requirement says after 5th column, but it looks like it is after the last " character.
Further it looks like a non-zero digit should be reason enough to print.
Then with awk it becomes

awk '{for (f=NF; f>=1 && $f!~/"/; f--) if ($f~/[1-9]/) {print; next}}' file

like was done in the previous sed solution.
The advantage of awk is, you have more means to modify your search.

---------- Post updated at 12:07 PM ---------- Previous update was at 12:01 PM ----------

While the previous sed could be abbreviated

sed '/[1-9][^"]*$/!d' file

that is equivalent to

grep '[1-9][^"]*$' file