Hello
I have a file with contents like this...
Part1 Field2 Field3 Field4 (line1)
Part2 Field2 Field3 Field4 (line2)
Part3 Field2 Field3 Field4 (line3)
Part1 Field2 Field3 Field4 (line4)
Part4 Field2 Field3 Field4 (line5)
Part5 Field2 Field3 Field4 (line6)
Part2 Field2 Field3 Field4 (line7)
Part1 Field2 Field3 Field4 (line8)
...
The lines are added throughout the day, at different times, by various programs, so the listing is in timestamp order. At the end of the day, I want to remove the older entries, since they are superseded. In the example above, I want to get rid of lines 1, 2, and 4, because there are more recent rows for those Parts. The output should also not contain blank lines left behind by the deleted rows. The result should look like this:
Part3 Field2 Field3 Field4 (line3)
Part4 Field2 Field3 Field4 (line5)
Part5 Field2 Field3 Field4 (line6)
Part2 Field2 Field3 Field4 (line7)
Part1 Field2 Field3 Field4 (line8)
Any help will be greatly appreciated.
I assume the (lineN) labels were added for demonstration and are not in the real file?
Then, with awk:
awk '
{s[$0]=NR}   # remember the last line number of each distinct line
END {for (i=1;i<=NR;i++) for (j in s) if (i==s[j]) print j}   # print the lines in that order
' file
For big files, the END section should sort by line number instead, since the nested loop above compares every stored line against every line number. With perl it becomes:
perl -ne '
$s{$_}=++$i;
END {print sort {$s{$a}<=>$s{$b}} keys %s}
' file
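For comparison, the END step can also stay in awk and avoid both the nested loop and the sort: invert the array once (line number -> line text) and then walk the line numbers in order. This is a sketch assuming a POSIX awk; keep_last_lines is a hypothetical wrapper name, not something from the thread.

```shell
# Print the last occurrence of each distinct line, in the order in which
# those last occurrences appear in the input.
# keep_last_lines is a hypothetical helper name used for illustration.
keep_last_lines() {
  awk '
    { last[$0] = NR }                          # latest line number of each distinct line
    END {
      for (key in last) at[last[key]] = key    # invert: line number -> line text
      for (i = 1; i <= NR; i++)                # walk line numbers in input order
        if (i in at) print at[i]               # print only the surviving (latest) lines
    }
  ' "$@"
}
```

Usage: keep_last_lines file (or pipe the data in on stdin). Each line is stored once and each line number is visited once, so the work grows linearly with the file size.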
Yes, the line numbers at the end were added for demonstration purposes.
---------- Post updated at 05:10 PM ---------- Previous update was at 02:52 PM ----------
madeingermany:
I tried it, but it just returned the original values.
It works with this file:
Part1 Field2 Field3 Field4
Part2 Field2 Field3 Field4
Part3 Field2 Field3 Field4
Part1 Field2 Field3 Field4
Part4 Field2 Field3 Field4
Part5 Field2 Field3 Field4
Part2 Field2 Field3 Field4
Part1 Field2 Field3 Field4
OK, I see it only works when the entire line is duplicated.
Is there any way to check only the first column rather than the entire row?
Thank you so much for sharing your experience and expertise.
RudiC
August 22, 2014, 4:42am
Use s[$1] instead of s[$0] in awk.
s[$1] alone only stores the key (column 1), so the rest of the row must be stored as well.
Either store every row, indexed by line number:
awk '
{s[$1]=NR; row[NR]=$0}
END {for (i=1;i<=NR;i++) for (j in s) if (i==s[j]) print row[i]}
' file
Or store only the latest row for each key:
awk '
{s[$1]=NR; row[$1]=$0}
END {for (i=1;i<=NR;i++) for (j in s) if (i==s[j]) print row[j]}
' file
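I wonder which one consumes less memory. The first keeps every input line in row[], while the second keeps only the latest line for each key.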
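If memory is the main concern, one further option (a sketch, not from the thread) is to read the file twice: the first pass stores only each key's last line number, and the second pass prints a line only when it is that last occurrence, so no row text is held in memory at all. It assumes a POSIX awk and a regular input file, since a pipe cannot be read twice. It also skips blank lines (NF == 0) in both passes, so none reach the output. keep_last_by_key_2pass is a hypothetical helper name.

```shell
# Two-pass variant: pass 1 stores only "column 1 -> line number of its last
# occurrence"; pass 2 prints a line only if it is that last occurrence.
# Blank lines are skipped in both passes.
# keep_last_by_key_2pass is a hypothetical helper name used for illustration.
keep_last_by_key_2pass() {
  awk '
    NR == FNR { if (NF) last[$1] = FNR; next }   # pass 1: last line number per key
    NF && last[$1] == FNR                        # pass 2: print the latest row of each key
  ' "$1" "$1"
}
```

Usage: keep_last_by_key_2pass file. The output keeps the original order of the surviving rows, which matches the desired result in the first post.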
Thank you very much. It works perfectly.