Hello folks
I have a question for you sed or grep gurus (maybe awk too, but I would prefer the first two).
I have a file (f1) that contains the following (these are actually md5sums, not numbers, but for simplicity let's use numbers):
1
2
3
4
5
And I have a file (f2) that contains:
1|a
1|b
1|c
2|d
2|e
2|f
2|g
3|h
3|i
4|j
4|k
4|l
5|m
5|n
I would like to keep either:
- only the first line of each group of lines starting with the same number:
1|a
2|d
3|h
4|j
5|m
- or all the other lines of each group (I'll choose whichever is more efficient):
1|b
1|c
2|e
2|f
2|g
3|i
4|k
4|l
5|n
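Because f2 is sorted by its first field, a line is the first of its group exactly when its key differs from the key of the previous line. Here is a minimal sed sketch of both options under that assumption: the previous key is kept in the hold space, appended to the current line with G, and compared with a back-reference. The function names first_per_key and rest_per_key are hypothetical; only standard sed commands (G, h, p, d, s) are used.

```shell
# Hypothetical helpers. Both assume the input file is sorted by the
# first '|'-separated field, and both make a single pass with sed.

# Print only the first line of each run of lines sharing the same key.
first_per_key() {
  sed -n '
    # Append a newline and the previous key (from the hold space)
    G
    # Same key as the previous line: not the first one, skip it
    /^\([^|]*\)|.*\n\1$/d
    # New key: drop the appended key and print the line
    s/\n.*//
    p
    # Keep only the key and store it in the hold space
    s/|.*//
    h
  ' "$1"
}

# Print every line except the first one of each key.
rest_per_key() {
  sed -n '
    G
    # Same key as the previous line: print the line itself
    /^\([^|]*\)|.*\n\1$/{
      s/\n.*//
      p
      d
    }
    # New key: print nothing, just remember the key
    s/\n.*//
    s/|.*//
    h
  ' "$1"
}
```

For example, first_per_key f2 prints 1|a, 2|d, 3|h, 4|j, 5|m, and rest_per_key f2 >> "$TMP1" collects the remaining lines. Both variants do the same amount of work, so neither should be noticeably faster than the other.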
I have already accomplished miracles with sed and grep in previous steps of my script, so I hope someone can come up with something simple for this problem.
Here is what I have in bash (it works, but it is slow). Only f2 is needed in this example:
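md5=""                            # previous key starts out empty
while IFS= read -r l; do
  n="$md5"; md5="${l%%|*}"        # n = previous key, md5 = key of this line
  [ "$n" = "$md5" ] && echo "$l"  # same key as the line before: not the first one
done < "f2" >> "$TMP1"            # open $TMP1 once, not once per line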
In this script, every line after the first one for a given md5 is appended to the $TMP1 file, to be processed later.
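For comparison, the loop's logic maps directly onto awk (listed in the question as a second choice). This sketch compares each line's first '|'-separated field with the previous one inside a single awk process instead of running the shell's read once per line; rest_with_awk is a hypothetical name.

```shell
# Hypothetical helper: same test as the while/read loop. Print a line
# when its first '|'-separated field equals that of the line before.
rest_with_awk() {
  awk -F'|' '
    NR > 1 && $1 == prev { print }   # same key as the line before
    { prev = $1 }                    # remember this key for the next line
  ' "$1"
}
```

Usage would be rest_with_awk f2 >> "$TMP1". Changing the first rule to NR == 1 || $1 != prev { print } would instead give the first line of each key.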
All the data are sorted by the first field.
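Since the data are sorted and every real key is an md5sum of exactly 32 hexadecimal characters, GNU uniq could handle the first option by itself: -w 32 makes it compare only the first 32 characters of each line, so it keeps the first line of each run of equal keys. The second option can then be obtained with grep by removing those lines from the file. This is a sketch that assumes GNU uniq and GNU grep (-w and -f - may not exist elsewhere) and that no line appears twice verbatim; the helper names are hypothetical. With the single-digit keys of the example, -w 1 would be needed instead.

```shell
# Hypothetical helpers. Assume GNU uniq/grep and 32-character md5 keys,
# so comparing the first 32 characters compares the key only.

# First line of each run of equal keys.
first_with_uniq() {
  uniq -w 32 "$1"
}

# All other lines: drop the exact lines uniq kept (-x whole line,
# -F fixed strings, -v invert, -f - patterns read from stdin).
rest_with_grep() {
  uniq -w 32 "$1" | grep -vxF -f - "$1"
}
```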
Thank you in advance.