Match value in two files and replace values in selected columns

jiam912 · February 20, 2019, 10:44am

The purpose is to check if values for column 3 and 4 in file1 match with column 1 in file2.

If any value match do:

1) Replace values in file2 for column 2 and 3 using the information of file1 columns 5 and 6

2) Replace string ($1,1,5) and string ($1,6,5) in file2 with values of columns 7 and 8 from file1

3) Add character R for lines replaced and O for lines not replaced.

file1

2,100,31431,37131,999991.70,0000000.30,11111,22222,3
3,100,31431,37471,111113.20,1111111.30,22222,33333,4

file2

3143137113 318512.50 2334387.50 100
3143137131 318737.50 2334387.50 100
3143137201 319612.50 2334387.50 100
3143137471 322987.50 2334387.50 100
3143137491 323237.50 2334387.50 100

Desired output:

31431,37113,318512.50,2334387.50,100,O
11111,22222,999991.70,0000000.30,100,R
31431,37201,319612.50,2334387.50,100,O
22222,33333,111113.20,1111111.30,100,R
31431,37491,323237.50,2334387.50,100,O

I tried

awk -F, 'NR==FNR{a[$3 $4]=substr($0,length($3 FS)+1);next} $1 in a{print a[$1],$NF;next} {$1=substr($1,1,5) OFS substr($1,6,5);} 1' OFS=, file1 FS=' ' file2

steveo314 · February 20, 2019, 10:47am

are you required to use awk or could you write a perl script?

jiam912 · February 20, 2019, 10:49am

Hi steveo314.. awk please

vgersh99 · February 20, 2019, 12:06pm

This is as close as I could get given your description and sample files - they don't exactly jive well.....
Lots of guessing on the verbiage....
awk -f jiam.awk OFS=, FS=, file1.txt FS=' ' file2.txt where jiam.awk is:

FNR==NR { f1_56[$3$4]=($5 OFS $6)
          f1_78[$3$4]=($7 OFS $8)
          next
        }
{$(NF+1)="O"}
$1 in f1_56 {
  split(f1_56[$1], a, OFS)
  $2 = a[1]
  $3 = a[2]
  split(f1_78[$1], a, OFS)
  $1=a[1] OFS a[2]
  $NF="R"
}
1

output:

3143137113,318512.50,2334387.50,100,O
11111,22222,999991.70,0000000.30,100,R
3143137201,319612.50,2334387.50,100,O
22222,33333,111113.20,1111111.30,100,R
3143137491,323237.50,2334387.50,100,O

jiam912 · February 20, 2019, 12:39pm

Hi vgersh99
Many thanks for the code..it works but still something that not match with desired output

3143137113,318512.50,2334387.50,100,O

shoulbe

31431,37113,318512.50,2334387.50,100,O

The first column in file 2 need to be separated as 5dig,5dig.. same as example of output file
Please can you help

vgersh99 · February 20, 2019, 1:04pm

how about:

FNR==NR { f1_56[$3$4]=($5 OFS $6)
          f1_78[$3$4]=($7 OFS $8)
          next
        }
{$(NF+1)="O"}
!($1 in f1_56) { $1=substr($1,1,5) OFS substr($1,6)}
$1 in f1_56 {
  split(f1_56[$1], a, OFS)
  $2 = a[1]
  $3 = a[2]
  split(f1_78[$1], a, OFS)
  $1=a[1] OFS a[2]
  $NF="R"
}
1

jiam912 · February 20, 2019, 1:16pm

Hi vgersh99
I remove the OFS here:

  $1=a[1]  a[2]

because double , appears in the results.
Many thanks for your help.. It works perfectly now

31431,37113,318512.50,2334387.50,100,O
11111,22222,999991.70,0000000.30,100,R
99999,88888,111113.20,1111111.30,100,R
22222,33333,111113.20,1111111.30,100,R
31431,37491,323237.50,2334387.50,100,O

vgersh99 · February 20, 2019, 1:18pm

Take the latest code - I had to change something to split the first field.
You don't need to make the change you mentioned above....

jiam912 · February 20, 2019, 1:35pm

Hi vgersh99
Appreciate your help