Filtering unique records

I need to pull data based on Columns 1, 2, and 3: for each value in Column 1 with the same date, pull only the first line for each hour and ignore the other lines in that hour. When the hour changes, pull the first line of the new hour.

The output should contain only lines 1 2 3 4 5 6 10 12 13 14 19 20 21 22:

Column 1 Column 2 Column 3
A 2020-06-09 7:12:22
A 2020-06-03 18:26:07
A 2020-05-16 21:52:12
A 2020-05-16 22:37:18
B 2020-05-19 3:11:05
C 2020-06-06 21:14:44
C 2020-06-06 21:14:44
C 2020-06-06 21:14:46
C 2020-06-06 21:14:48
C 2020-06-06 22:16:50
C 2020-06-06 22:16:55
C 2020-06-06 0:16:50
C 2020-06-06 1:16:50
C 2020-06-12 22:20:00
C 2020-06-12 22:20:00
C 2020-06-12 22:20:03
C 2020-06-12 22:20:03
C 2020-06-12 22:20:04
D 2020-06-03 23:26:07
D 2020-05-16 21:52:12
D 2020-05-16 22:37:18
D 2020-05-16 23:39:18

Nice of you to create a table with markdown, but it would also be easier to see the actual records without the markdown table; just use markdown code tags.
I'm listing the data here; correct it if the actual data looks different:

A 2020-06-09 7:12:22
A 2020-06-03 18:26:07
A 2020-05-16 21:52:12
A 2020-05-16 22:37:18
B 2020-05-19 3:11:05
C 2020-06-06 21:14:44
C 2020-06-06 21:14:44
C 2020-06-06 21:14:46
C 2020-06-06 21:14:48
C 2020-06-06 22:16:50
C 2020-06-06 22:16:55
C 2020-06-06 0:16:50
C 2020-06-06 1:16:50
C 2020-06-12 22:20:00
C 2020-06-12 22:20:00
C 2020-06-12 22:20:03
C 2020-06-12 22:20:03
C 2020-06-12 22:20:04
D 2020-06-03 23:26:07
D 2020-05-16 21:52:12
D 2020-05-16 22:37:18
D 2020-05-16 23:39:18

Have you tried anything on your end, and if so, where exactly are you stuck?
There's a slew of similar questions in these forums and on the web dealing with "uniqueness" of one sort or another. Have you searched anywhere for hints?
Usually it's a one-liner, and it is in this case as well...

The actual data looks the same, except that instead of A B C there are names.

ok, how can we help?

I am not sure how to get the info; I tried using awk and grep but it's not working.

Please share what you've tried so far so we can see your train of thought...
Have you searched for similar questions as previously suggested?


awk is perfect for this. Form a key from $2 (field 2) and from $3 (the string left of the :).
Then compare the key with the key from the previous line...


... or change the FS to also include :

I was still not able to get the required information. I was able to get it for each hour, but I was not able to make the query work per day.

cat final4 | awk '{ print $1$2 }' | cut -d: -f1-2 | sort | uniq

I still don't quite understand what the above means, but to ease the OP's path forward:

awk -F'[ :]+' '!a[$1,$2,$3]++' final4 

where the final4 file is your data sample from the post, results in:

A 2020-06-09 7:12:22
A 2020-06-03 18:26:07
A 2020-05-16 21:52:12
A 2020-05-16 22:37:18
B 2020-05-19 3:11:05
C 2020-06-06 21:14:44
C 2020-06-06 22:16:50
C 2020-06-06 0:16:50
C 2020-06-06 1:16:50
C 2020-06-12 22:20:00
D 2020-06-03 23:26:07
D 2020-05-16 21:52:12
D 2020-05-16 22:37:18
D 2020-05-16 23:39:18
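
For anyone puzzled by the -F'[ :]+' part: it sets the field separator to a run of spaces and/or colons, so the hour lands in $3 by itself while minutes and seconds fall into $4 and $5. A quick sanity check from the shell (any POSIX awk should do):

```shell
# With both spaces and colons as separators, the fields become:
# $1=name  $2=date  $3=hour  $4=minute  $5=second
echo 'C 2020-06-06 21:14:44' | awk -F'[ :]+' '{ print $1, $2, $3 }'
# C 2020-06-06 21
```

The condition !a[$1,$2,$3]++ is then true only the first time a given (name, date, hour) combination appears, and awk's default action prints that line.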

Thanks a lot, it's working!

The array a[] holds the keys as indexes and numbers as values, all in memory.
If your file is long, it makes sense to save some memory by not storing a value:

awk -F'[ :]+' '{ key=($1 SUBSEP $2 SUBSEP $3) } !(key in a) { a[key]; print }' final4

If your file is always sorted, then you can store only the key from the previous line (rather than all keys in an array):

awk -F'[ :]+' '{ key=($1 SUBSEP $2 SUBSEP $3) } key!=prevkey { prevkey=key; print }' final4
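
To see the one-liner in action without touching your real data, here is a self-contained check on a three-line sample (two records sharing the same name, date, and hour, and one in a new hour):

```shell
# Keep only the first line per (name, date, hour) combination.
printf '%s\n' \
  'C 2020-06-06 21:14:44' \
  'C 2020-06-06 21:14:46' \
  'C 2020-06-06 22:16:50' |
awk -F'[ :]+' '!a[$1,$2,$3]++'
# C 2020-06-06 21:14:44
# C 2020-06-06 22:16:50
```

Note that the sorted-input variant above gives the same result only when lines with the same key are adjacent, as they are here; on unsorted input, use the array-based version.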
