I know that a "brute-force" script with lots of cat/echo/cut/grep could accomplish this. But because my real file has 800k records and the lookup files have 10-20k records each, that approach is not feasible time-wise or efficient.
I have an input file:
> cat file_in
1234567890123456789012345678901234567890
Joe 123456 30 Main St 1234 F
Jim 101362 1492 Hugh 0101 P
Kerry 040419 6091 Lost St 0101 F
Linda 123456 50 High Way 1235
Matt 242424 48 Speedway Dr4343 F
Kerrin180118 99 Skaters Way2012 P *
(You can ignore the first line; it is just a column ruler to help read the fixed-width records.)
(tail -n +2 file_in skips over this line during testing.)
I begin by reviewing only the records where position 40 is blank, i.e. those that still need to be processed.
I want to flag the records that cannot be processed because (a) the value in columns 7-12 does not exist in the following file:
> cat file_cd1
040419
101362
180118
242424
789012
967539
988012
I know Joe does not match, so ideally I would like to put a "1" in position 39 to tell me that record failed the first test.
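One way to sketch test (a) on its own (assuming the records really are padded to 40 fixed-width characters; the sample record below is hypothetical, padded per the stated layout) is to load file_cd1 into an awk array and rewrite position 39 when the id at columns 7-12 is missing:

```shell
# Abbreviated hypothetical lookup file and one padded 40-char record.
printf '%s\n' '040419' '101362' > file_cd1
printf '%s\n' 'Joe   123456 30 Main St     1234 F      ' > file_in

awk '
NR == FNR { cd1[$1]; next }                    # first file: valid ids
substr($0, 40, 1) == " " && !(substr($0, 7, 6) in cd1) {
    $0 = substr($0, 1, 38) "1" substr($0, 40)  # flag failed test (a)
}
{ print }
' file_cd1 file_in > file_out
```

The NR==FNR trick marks lines from the first file only, so the lookup table is built before any data records are read.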
A second test (b) is to process only the records whose columns 29-32 look up to "abc" in the following file:
> cat file_cd2
0101 abc
1234 abc
1235 ghi
2012 ghi
4343 ghi
9012 abc
Linda & Matt should then have a "2" put in position 39.
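Test (b) is the same pattern with a condition on the second field of the lookup file (again a sketch with a hypothetical padded record):

```shell
# Abbreviated hypothetical code table and one padded 40-char record.
printf '%s\n' '1235 ghi' '0101 abc' > file_cd2
printf '%s\n' 'Linda 101362 50 High Way    1235        ' > file_in

awk '
NR == FNR { if ($2 == "abc") ok[$1]; next }    # keep only codes mapping to abc
substr($0, 40, 1) == " " && !(substr($0, 29, 4) in ok) {
    $0 = substr($0, 1, 38) "2" substr($0, 40)  # flag failed test (b)
}
{ print }
' file_cd2 file_in > file_out
```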
So, my start would be
awk 'substr($0,40,1)==" " {print}' file_in >file_out
which creates an output file containing only the records I want to consider, i.e. those not yet marked as processed. So yes, I intend to start with 6 records and produce a file of 5. I now need to add those two codes at position 39 where appropriate.
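The filter and both lookups could be folded into one awk pass over file_in, avoiding the intermediate file entirely. This is a sketch under the stated layout (name 1-6, id 7-12, code 29-32, fail flag 39, processed marker 40); the sample records are hypothetical, padded to 40 characters, since the paste above appears to have lost the trailing spaces:

```shell
cat > file_cd1 <<'EOF'
040419
101362
180118
242424
789012
967539
988012
EOF

cat > file_cd2 <<'EOF'
0101 abc
1234 abc
1235 ghi
2012 ghi
4343 ghi
9012 abc
EOF

# Four hypothetical records padded to 40 characters (last one already processed).
printf '%s\n' \
  'Joe   123456 30 Main St     1234 F      ' \
  'Matt  242424 48 Speedway Dr 4343 F      ' \
  'Kerry 040419 6091 Lost St   0101 F      ' \
  'Kerrin180118 99 Skaters Way 2012 P     *' \
  > file_in

awk '
FILENAME == "file_cd1" { cd1[$1]; next }            # valid ids
FILENAME == "file_cd2" { if ($2 == "abc") ok[$1]; next }  # codes mapping to abc
substr($0, 40, 1) == " " {                          # unprocessed records only
    id   = substr($0, 7, 6)
    code = substr($0, 29, 4)
    flag = " "
    if (!(id in cd1))       flag = "1"              # failed test (a)
    else if (!(code in ok)) flag = "2"              # failed test (b)
    $0 = substr($0, 1, 38) flag substr($0, 40)      # write flag at position 39
}
{ print }
' file_cd1 file_cd2 file_in > file_out
```

Every record is printed, changed or not, so file_out keeps the original record count; processed records (non-blank position 40) pass through untouched. Testing by FILENAME rather than NR==FNR keeps the dispatch readable with two lookup files.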