Keep only the closet match of timestamped row (include headers) from file1 to precede file2 row/s

aachave1 · June 29, 2017, 1:30pm

This is a question that is related to one I had last August when I was trying to sort/merge two files by millsecond time column (in this case column 6).

The script (below) that helped me last august by RudiC solved the puzzle of sorting/merging two files by time, except it gets lost when the day/time changes back to zero (next day).

awk '
NR == 1         {getline HD1 < F1
                 HD2 = $0
                 next
                }

$6 >= T[6]      {do     {LAST = TMP
                         ST = getline TMP < F1
                         split (TMP, T, FS)
                        }
                 while (($6 >= T[6]) && (ST == 1))
                 if (ST == 0)   {LAST = TMP
                                 T[6] = "ZZZ"
                                }
                 print HD1
                 print LAST
                 print HD2
                 print
                 next
                }
                {print 
                }

' FS="," F1=file1 file2

Recap: The output file needs to have file1 contents on top of file2 contents (headers included) where file2 column 6 is >= to file1 column 6, and file2 column 6 (same value) is < file1 column 6 (next value).

It works well until the day changes and time restarts, then it gets lost and doesn't match data properly.

Original files (snippets from very large files)
Column 6 = Bold

File1 

TIMEFORMATTED	G_CCSDS_2HDR_FLAG	G_CCSDS_APID	G_CCSDS_SEQ_COUNT	G_CCSDS_DOY	G_CCSDS_MSEC
6/19/2017 23:59:58	1	572	11353	21719	86398214
6/19/2017 23:59:59	1	572	11354	21719	86399214
6/20/2017 0:00:00	1	572	11355	21720	214
6/20/2017 0:00:01	1	572	11356	21720	1214

File2

TIMEFORMATTED	CCSDS_2HDR_FLAG	CCSDS_APID	CCSDS_SEQ_COUNT	CCSDS_DOY	CCSDS_MSEC
6/19/2017 23:59:58	1	544	6677	21719	86398318
6/19/2017 23:59:59	1	544	6678	21719	86399318
6/20/2017 0:00:00	1	544	6679	21720	318
6/20/2017 0:00:01	1	544	6680	21720	1318

I need it to do this below, so that when the day/time changes, it still needs to match column 6 properly.

You can see how the times from both files match in column 1, but in column 6 (millisecond) the preceding file1 is always less than file2 column 6.
When the day changes to 6/20, it still needs to sort properly like this.

Desired Output
File1 - green
File2 - red

TIMEFORMATTED	G_CCSDS_2HDR_FLAG	G_CCSDS_APID	G_CCSDS_SEQ_COUNT	G_CCSDS_DOY	G_CCSDS_MSEC
6/19/2017 23:59:58	1	572	11353	21719	86398214
TIMEFORMATTED	CCSDS_2HDR_FLAG	CCSDS_APID	CCSDS_SEQ_COUNT	CCSDS_DOY	CCSDS_MSEC
6/19/2017 23:59:58	1	544	6677	21719	86398318
TIMEFORMATTED	G_CCSDS_2HDR_FLAG	G_CCSDS_APID	G_CCSDS_SEQ_COUNT	G_CCSDS_DOY	G_CCSDS_MSEC
6/19/2017 23:59:59	1	572	11354	21719	86399214
TIMEFORMATTED	CCSDS_2HDR_FLAG	CCSDS_APID	CCSDS_SEQ_COUNT	CCSDS_DOY	CCSDS_MSEC
6/19/2017 23:59:59	1	544	6678	21719	86399318

TIMEFORMATTED	G_CCSDS_2HDR_FLAG	G_CCSDS_APID	G_CCSDS_SEQ_COUNT	G_CCSDS_DOY	G_CCSDS_MSEC
6/20/2017 0:00:00	1	572	11355	21720	214
TIMEFORMATTED	CCSDS_2HDR_FLAG	CCSDS_APID	CCSDS_SEQ_COUNT	CCSDS_DOY	CCSDS_MSEC
6/20/2017 0:00:00	1	544	6679	21720	318
TIMEFORMATTED	G_CCSDS_2HDR_FLAG	G_CCSDS_APID	G_CCSDS_SEQ_COUNT	G_CCSDS_DOY	G_CCSDS_MSEC
6/20/2017 0:00:01	1	572	11356	21720	1214
TIMEFORMATTED	CCSDS_2HDR_FLAG	CCSDS_APID	CCSDS_SEQ_COUNT	CCSDS_DOY	CCSDS_MSEC
6/20/2017 0:00:01	1	544	6680	21720	1318

This is so difficult for me to explain, sorry!

---------- Post updated 06-29-17 at 01:30 PM ---------- Previous update was 06-28-17 at 03:26 PM ----------

I may have found a workaround by adding column 1 in the conditional checks. That way it not only finds >= in the column 1 time, but still keeps the the lower millisecond number (column 6) with file 1 row preceding file 2 row.
There is probably a better way to compensate for the changing of the day/time back to zero, but this seems to work.

awk '
NR == 1         {getline HD1 < F1
                 HD2 = $0
                 next
                }

$6 >= T[6] && $1 >= T[1]   {do     {LAST = TMP
                         ST = getline TMP < F1
                         split (TMP, T, FS)
                        }
                 while (($6 >= T[6]) && ($1 >= T[1]) && (ST == 1))
                 if (ST == 0)   {LAST = TMP
                                 T[6] = "ZZZ"
                                }
                 print HD1
                 print LAST
                 print HD2
                 print
                 next
                }
                {print 
                }

' FS="," F1=file1 file2