My original files look like the samples below, and I tell them apart by the AP_ID (file1 has 572 and file2 has 544). Also, the header fields in file1 have "G_" prepended. NOTE: these are only snippets of very large files, so most of the data is not shown here.
Original File 1:
TIMEFORMATTED,G_CCSDS_VERSION,G_CCSDS_TYPE,G_CCSDS_2HDR_FLAG,G_CCSDS_APID,G_CCSDS_GRP_FLAGS,G_CCSDS_SEQ_COUNT,G_CCSDS_PKT_LEN,G_CCSDS_DOY,G_CCSDS_MSEC
2014/04/07 16:02:55,0,0,1,572,3,0,1917,20550,57775339
2014/04/07 16:03:00,0,0,1,572,3,0,1917,20550,57780339
2014/04/07 16:03:05,0,0,1,572,3,0,1917,20550,57785339
2014/04/07 16:03:10,0,0,1,572,3,0,1917,20550,57790339
2014/04/07 16:03:15,0,0,1,572,3,0,1917,20550,57795339
Original File 2:
TIMEFORMATTED,CCSDS_VERSION,CCSDS_TYPE,CCSDS_2HDR_FLAG,CCSDS_APID,CCSDS_GRP_FLAGS,CCSDS_SEQ_COUNT,CCSDS_PKT_LEN,CCSDS_DOY,CCSDS_MSEC
2014/04/07 16:03:12,0,0,1,544,3,0,985,20550,57788894
2014/04/07 16:03:13,0,0,1,544,3,0,985,20550,57793894
2014/04/07 16:03:14,0,0,1,544,3,0,985,20550,57794894
2014/04/07 16:03:15,0,0,1,544,3,0,985,20550,57795894
2014/04/07 16:03:16,0,0,1,544,3,0,985,20550,57796894
2014/04/07 16:03:17, 0,0,1,544,3,0,985,20550,57797894
I sorted/merged with the code below. Note: the "-k21,21" was only used with my very large "real" files.
#!/bin/bash
function f() { awk 'NR==1{h=$0; next} {print $0 "\t" h}' $1; }; sort -t"," -k21,21 <(f file1) <(f file2) |
awk -F'\t' '$2!=p{print $2; p=$2} {print $1}' > temp5
PROBLEM: For each run of file2 rows, I need only one row from file1: the row whose timestamp is an equal or nearest match to the file2 timestamp(s), and it must precede those file2 row(s) (1 file1 row to 1-to-many file2 rows). As you can see in this "example", there are 3 rows (after the first header) from file1 that are not needed. I only need the file1 row with timestamp "16:03:10". So basically I only need the last row of each file1 block (AP_ID=572) to precede the file2 row(s). The space is only for readability between matched data.
MY OUTPUT:
TIMEFORMATTED,G_CCSDS_VERSION,G_CCSDS_TYPE,G_CCSDS_2HDR_FLAG,G_CCSDS_APID,G_CCSDS_GRP_FLAGS,G_CCSDS_SEQ_COUNT,G_CCSDS_PKT_LEN,G_CCSDS_DOY,G_CCSDS_MSEC
2014/04/07 16:02:55,0,0,1,572,3,0,1917,20550,57775339
2014/04/07 16:03:00,0,0,1,572,3,0,1917,20550,57780339
2014/04/07 16:03:05,0,0,1,572,3,0,1917,20550,57785339
2014/04/07 16:03:10,0,0,1,572,3,0,1917,20550,57790339
TIMEFORMATTED,CCSDS_VERSION,CCSDS_TYPE,CCSDS_2HDR_FLAG,CCSDS_APID,CCSDS_GRP_FLAGS,CCSDS_SEQ_COUNT,CCSDS_PKT_LEN,CCSDS_DOY,CCSDS_MSEC
2014/04/07 16:03:12,0,0,1,544,3,0,985,20550,57788894
2014/04/07 16:03:13,0,0,1,544,3,0,985,20550,57793894
2014/04/07 16:03:14,0,0,1,544,3,0,985,20550,57794894
TIMEFORMATTED,G_CCSDS_VERSION,G_CCSDS_TYPE,G_CCSDS_2HDR_FLAG,G_CCSDS_APID,G_CCSDS_GRP_FLAGS,G_CCSDS_SEQ_COUNT,G_CCSDS_PKT_LEN,G_CCSDS_DOY,G_CCSDS_MSEC
2014/04/07 16:03:15,0,0,1,572,3,0,1917,20550,57795339
TIMEFORMATTED,CCSDS_VERSION,CCSDS_TYPE,CCSDS_2HDR_FLAG,CCSDS_APID,CCSDS_GRP_FLAGS,CCSDS_SEQ_COUNT,CCSDS_PKT_LEN,CCSDS_DOY,CCSDS_MSEC
2014/04/07 16:03:15,0,0,1,544,3,0,985,20550,57795894
2014/04/07 16:03:16,0,0,1,544,3,0,985,20550,57796894
2014/04/07 16:03:17, 0,0,1,544,3,0,985,20550,57797894
I then ran the code below to try to resolve this, but it keeps only the FIRST file1 row of each block, not the LAST one that I want.
QUESTION: How can I modify this code to keep only the last file1 (AP_ID=572) row?
#!/bin/bash
function f() { awk 'NR==1{h=$0; next} {print $0 "\t" h}' $1; }; sort -t"," -k21,21 <(f file1) <(f file2) |
awk -F'\t' '$2!=p{print $2; p=$2; b++; c=1} !(b%2)||c&&c--{print $1}' > temp5
I hope this isn't too long-winded or confusing. Thank you!!
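One possible way to keep the last file1 row instead of the first is to hold each file1 row back and print it only when the header changes. The following is a minimal sketch under several assumptions, not a tested fix for the real files: the names `tag` and `merge_keep_last` are made up for illustration, the sort key is the timestamp in field 1 rather than "-k21,21", file1 blocks are recognized by the "G_" prefix in their header, and `sort -s` (stable sort, available in GNU and BSD sort but not required by POSIX) is used so that a file1 row comes before a file2 row with the same timestamp.

```shell
#!/bin/sh
# Tag every data row with its file's header: "row<TAB>header".
tag() { awk 'NR==1{h=$0; next} {print $0 "\t" h}' "$1"; }

# merge_keep_last FILE1 FILE2   (hypothetical helper name)
# Merges both files by timestamp (field 1, an assumption) and keeps only
# the last FILE1 row ahead of each run of FILE2 rows.
merge_keep_last() {
    # FILE1 is fed first; -s keeps that order when timestamps tie.
    { tag "$1"; tag "$2"; } | LC_ALL=C sort -s -t, -k1,1 |
    awk -F'\t' '
        $2 != p {                        # header changed: a new block starts
            if (held != "") print held   # flush the last row of a file1 block
            held = ""
            print $2; p = $2
            g = ($2 ~ /,G_CCSDS_/)       # 1 while inside a file1 block
        }
        g { held = $1; next }            # file1: remember the row, print later
          { print $1 }                   # file2: print every row
        END { if (held != "") print held }
    '
}
```

It would be called as `merge_keep_last file1 file2 > temp5`. Note that if the merged data ends with file1 rows, the last of them is still printed after its header even though no file2 rows follow; whether that is wanted depends on the real data.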