I need to pull the data based on Column 1,2 and 3, for column 1 having the same date pull only the first line for that hour ignore other lines. If hour changes pull the first line of that hour.
Output should have only lines 1 2 3 4 5 6 10 12 13 14 19 20 21 22
nice of you to create a table with markdown, but it would be also easier to see the actual records without markdown table - just the markdown code tags.
I'm listing it here - modify if the actual data looks different:
A 2020-06-09 7:12:22
A 2020-06-03 18:26:07
A 2020-05-16 21:52:12
A 2020-05-16 22:37:18
B 2020-05-19 3:11:05
C 2020-06-06 21:14:44
C 2020-06-06 21:14:44
C 2020-06-06 21:14:46
C 2020-06-06 21:14:48
C 2020-06-06 22:16:50
C 2020-06-06 22:16:55
C 2020-06-06 0:16:50
C 2020-06-06 1:16:50
C 2020-06-12 22:20:00
C 2020-06-12 22:20:00
C 2020-06-12 22:20:03
C 2020-06-12 22:20:03
C 2020-06-12 22:20:04
D 2020-06-03 23:26:07
D 2020-05-16 21:52:12
D 2020-05-16 22:37:18
D 2020-05-16 23:39:18
Have you tried anything on your end and if so where exactly are you stuck?
There's a slew of similar questions in these forums and on the web dialing with "uniqueness" of one sort or the other. Have you searched anywhere for hints?
Usually it's a one-liner and it's in this case as well...
awk is perfect for this. Form a key from $2 (field 2) and from $3 (the string left from the :)
Then compare the key with the key from the previous line...
I still don't quite understand what the above means, but to ease the OP's path forward:
awk -F'[ :]+' '!a[$1,$2,$3]++' final4
where final4 file is your data sample from post, results in:
A 2020-06-09 7:12:22
A 2020-06-03 18:26:07
A 2020-05-16 21:52:12
A 2020-05-16 22:37:18
B 2020-05-19 3:11:05
C 2020-06-06 21:14:44
C 2020-06-06 22:16:50
C 2020-06-06 0:16:50
C 2020-06-06 1:16:50
C 2020-06-12 22:20:00
D 2020-06-03 23:26:07
D 2020-05-16 21:52:12
D 2020-05-16 22:37:18
D 2020-05-16 23:39:18
The array a[] holds the keys as index plus numbers as value, all in memory.
If your file is long, it makes sense to save some memory by not storing a value.
awk -F'[ :]+' '{ key=($1 SUBSEP $2 SUBSEP $3) } !(key in a) { a[key]; print }' final4
If your file is always sorted then you can even only store the key from the previous line (not all keys in an array)