Hi All,
I am new to UNIX. Please help me write code to delete all records from a file where every column after column 5 is either 0, #Mi or NULL.
The first 5 columns are strings.
e.g.
"alsod" "1FEV2" "wjwroe" " wsse" "hd3" 1 2 34 #Mi
"malasl" "wses" "trwwwe" " wsse" "hd3" 1 2 0 #Mi
"alsod" "1FEV2" "asd" " wsse" "hd3" #Mi #Mi
"alsod" "1FEV2" "asd" " wsse" "hd3" 0 0 0
"alsod" "1FEV2" "asd" " wsse" "hd3" 1 2 3 4 5
expected output is
"alsod" "1FEV2" "wjwroe" " wsse" "hd3" 1 2 34 #Mi
"malasl" "wses" "trwwwe" " wsse" "hd3" 1 2 0 #Mi
"alsod" "1FEV2" "asd" " wsse" "hd3" 1 2 3 4 5
The file is around 1-2 GB.
I have written some code, but the script takes 45-50 minutes to run:
grep -EHv ([1-9]/s) file.txt > file2.txt
Can someone please suggest alternative code that selectively deletes the records containing only 0, #Mi or NULL after column 5?
Thanks
I don't think the grep you show is correct. For example, the -H option prints the file name with each match, and I don't see why you included it. I ran your example and it does not work. Use awk instead.
Try:
$ cat file
"alsod" "1FEV2" "wjwroe" " wsse" "hd3" 1 2 34 #Mi
"malasl" "wses" "trwwwe" " wsse" "hd3" 1 2 0 #Mi
"alsod" "1FEV2" "asd" " wsse" "hd3" #Mi #Mi
"alsod" "1FEV2" "asd" " wsse" "hd3" 0 0 0
"alsod" "1FEV2" "asd" " wsse" "hd3" 1 2 3 4 5
$ awk '$7~/[0-9]/ && $7 !=0' file
"alsod" "1FEV2" "wjwroe" " wsse" "hd3" 1 2 34 #Mi
"malasl" "wses" "trwwwe" " wsse" "hd3" 1 2 0 #Mi
"alsod" "1FEV2" "asd" " wsse" "hd3" 1 2 3 4 5
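A note on why $7 is the first numeric column here: awk splits on whitespace by default, so the quoted string " wsse" (with its leading space) counts as two fields. Also, the test above looks at that one column only, so a row such as 0 0 5 after the strings would be dropped. A quick sketch to see the split on one of the sample lines (the variable names are just for illustration):

```shell
# Show how default whitespace splitting numbers the fields.
# The quoted string " wsse" contains a space, so it occupies $4 and $5,
# which pushes the first numeric column to $7.
line='"alsod" "1FEV2" "asd" " wsse" "hd3" 1 2 3 4 5'
split_info=$(printf '%s\n' "$line" | awk '{print NF ": $4=" $4 " $5=" $5 " $7=" $7}')
echo "$split_info"
```

If the real data has spaces inside other quoted columns, the field numbers will shift again, so a fixed $7 is only safe when the layout matches the sample exactly.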
Hi,
With sed:
$ sed '/"[^"]*[1-9][^"]*$/!d' file.txt
"alsod" "1FEV2" "wjwroe" " wsse" "hd3" 1 2 34 #Mi
"malasl" "wses" "trwwwe" " wsse" "hd3" 1 2 0 #Mi
"alsod" "1FEV2" "asd" " wsse" "hd3" 1 2 3 4 5
Regards.
Your requirement says after the 5th column, but it looks like you mean after the last " character.
Further, it looks like any non-zero digit should be reason enough to print the line.
Then with awk it becomes
awk '{for (f=NF; f>=1 && $f!~/"/; f--) if ($f~/[1-9]/) {print; next}}' file
as was done in the previous sed solution.
The advantage of awk is that you have more ways to refine the search.
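As an example of refining the search, here is a rough sketch that compares whole tokens instead of looking for a non-zero digit. It drops a row only when every field after the last quoted field is exactly 0, #Mi (in any letter case) or NULL. The function name filter_rows is made up for this post, and treating a row with no trailing fields as "all NULL" is my assumption about the requirement:

```shell
# Sketch: drop a row only when every field after the last quoted field
# is exactly 0, #Mi (any case) or NULL. Rows with no trailing fields are
# dropped too, on the assumption that "NULL" can mean an empty column.
filter_rows() {
  awk '{
    keep = 0
    # Walk backwards from the last field until a field containing a quote.
    for (f = NF; f >= 1 && $f !~ /"/; f--) {
      v = toupper($f)
      # Any token outside the "empty" set is a reason to keep the row.
      if (v != "0" && v != "#MI" && v != "NULL") { keep = 1; break }
    }
    if (keep) print
  }' "$@"
}
```

Usage would be filter_rows file.txt > file2.txt. Note that this compares strings, so tokens like 00 or 0.0 are not treated as zero and the row is kept.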
The previous sed could also be abbreviated to
sed '/[1-9][^"]*$/!d' file
which is equivalent to
grep '[1-9][^"]*$' file
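Since the file is 1-2 GB, it may also be worth running the pattern in the C locale. That often makes grep noticeably faster on plain ASCII data, because it can skip multibyte character handling. A sketch, assuming the data really is ASCII (keep_nonzero is a made-up helper name):

```shell
# Assumption: the data is plain ASCII, so byte-wise matching in the C locale is safe.
# keep_nonzero is a hypothetical helper name used only for this sketch.
keep_nonzero() {
  # Keep lines that have a non-zero digit somewhere after the last quote.
  LC_ALL=C grep '[1-9][^"]*$' "$@"
}
```

Usage would be keep_nonzero file.txt > file2.txt. I have not timed this on a file of that size, so treat the speed-up as something to measure rather than a promise.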