Replacing column value in file1 if there's a match from file2

pmchang · June 23, 2023, 3:52pm

Hi Forum,

I hope you can help me troubleshoot as to why the following command is not working as expected. I want to replace the first column from org_file.dat with the second column from lookup_file.dat if the first column in both files match:

I have the following 2 csv files:

org_file.dat:
2210280,01/30/2023 17:42:38,01/30/2023 00:00:00,50.0000000000,DR,SWING,SAV,Withdrawal,N/A,557.6000000000,2211,,
5732284,01/30/2023 23:30:44,01/30/2023 00:00:00,45.0000000000,DR,NSFRET,NSF - EFT Withdrawal,N/A,531.7300000000,5536,,
11371406,01/30/2023 15:57:10,01/30/2023 00:00:00,309.3500000000,CR,DDIEMTC,DDA Deposit - E-Mail,N/A,618.3100000000,2024,,
12237880,01/30/2023 22:35:58,01/30/2023 00:00:00,420.0000000000,CR,SDMRDC,SAV Deposit - Mobile Remote Deposit,N/A,69282.7500000000,641,,

lookup_file.dat:
12237880,123
11371406,456
5732284,789
2210280,890

Desired Output:

890,01/30/2023 17:42:38,01/30/2023 00:00:00,50.0000000000,DR,SWING,SAV,Withdrawal,N/A,557.6000000000,2211,,
789,01/30/2023 23:30:44,01/30/2023 00:00:00,45.0000000000,DR,NSFRET,NSF - EFT Withdrawal,N/A,531.7300000000,5536,,
456,01/30/2023 15:57:10,01/30/2023 00:00:00,309.3500000000,CR,DDIEMTC,DDA Deposit - E-Mail,N/A,618.3100000000,2024,,
123,01/30/2023 22:35:58,01/30/2023 00:00:00,420.0000000000,CR,SDMRDC,SAV Deposit - Mobile Remote Deposit,N/A,69282.7500000000,641,,

This is the awk command that I ran but I'm not getting the expected results:

awk -vOFS="," 'NR==FNR{a[$1]=$2; next}{$1=a[$1]; print}' lookup_file.dat org_file.dat

Thank you for your time and feedback.

munkeHoller · June 23, 2023, 4:08pm

try

awk 'NR==FNR{a[$1]=$2; next} { if ( $1 in a) { $1=a[$1]; print $0}}' FS=',' OFS=',' lookup_file.dat org_file.dat

pmchang · June 23, 2023, 5:02pm

Thank you mukeHoller - works like a charm.

MadeInGermany · June 23, 2023, 5:25pm

You missed a FS="," or -F"," (so it defaults to whitespace-separated columns).

You did not say what should happen if there is no column1 match. The suggested solution omits such a row.

pmchang · June 23, 2023, 5:58pm

Thanks MadeInGermany.

If there's no match, then it should keep the original value.

munkeHoller · June 23, 2023, 6:03pm

in that case, try the following

awk 'NR==FNR{a[$1]=$2; next} { if ($1 in a) $1=a[$1]} 1' FS=',' OFS=',' lookup_file.dat org_file.dat

pmchang · June 28, 2023, 7:08pm

Sorry to revive this topic - I have a new set of file requirements and I tried to adopt the previous code (that worked correctly) but I'm not getting the expected results.

I now have 2 pipe delimited files and want to replace col#15 from file1 with col#2 from file2 if col#15 (from file1) matches with col#1 (from file2).

file1:

"20230130"|183798|"14647"|"45017206"|"Y"|"20201029"||"N"|99000006|3500.00|3500.00|1050.00|1050.00|"20201029"|**13417669**|"Active"|19.95|19.95|0.00|0.00|"20221016"|0|0|0|1|0.00||"CAD"|"Pre-approved lients"|"Unsecured"|"Personal"|"1905"|"CAD"|"20220619"|"20240515"||"20221004"||"N"|"N/A"|"RNA"||3500.00|0|0.00|0|0.00|0.00|0|745|0.00|0.00|0.00|0.00|0.00|0.00|0.00|0.00|0.00|0.00|0.00|0|0.00|749|"20230116"|16|2476.15|"20210623"||"20220825"|"20220916"|"20221004"|"20210616"|"20201106"||||849|"20230606"|"Credit Card"

file2:

13417669|TSYS~355320
10435029|TSYS~148824
10435030|TSYS~148845
10435031|TSYS~148865

Desired File:

"20230130"|183798|"14647"|"45017206"|"Y"|"20201029"||"N"|99000006|3500.00|3500.00|1050.00|1050.00|"20201029"|**TSYS~355320**|"Active"|19.95|19.95|0.00|0.00|"20221016"|0|0|0|1|0.00||"CAD"|"Pre-approved clients"|"Unsecured"|"Personal"|"1905"|"CAD"|"20220619"|"20240515"||"20221004"||"N"|"N/A"|"RNA"||3500.00|0|0.00|0|0.00|0.00|0|745|0.00|0.00|0.00|0.00|0.00|0.00|0.00|0.00|0.00|0.00|0.00|0|0.00|749|"20230116"|16|2476.15|"20210623"||"20220825"|"20220916"|"20221004"|"20210616"|"20201106"||||849|"20230606"|"Credit Card"

Here's my code attempt - what am I missing here?

awk 'NR==FNR{a[$1]=$2; next} { if ($15 in a) $15=a[$1]} 1' FS='|' OFS='|' file2 file1

Thank you in advance for your time and feedback.

vgersh99 · June 28, 2023, 7:52pm

@pmchang,
For the future... please start a new thread if your requirements change. The original ask was answered.

Is your file1's 15th field actually has surrounding **-s on both sides of the value OR is it your way of indicating what needs to be replaced?
If it's the former:

$ awk -F'|' 'FNR==NR{f2["**"$1"**"]="**"$2"**";next} $15 in f2 {$15=f2[$15]}1' OFS='|' file2.txt file1.txt
"20230130"|183798|"14647"|"45017206"|"Y"|"20201029"||"N"|99000006|3500.00|3500.00|1050.00|1050.00|"20201029"|**TSYS~355320**|"Active"|19.95|19.95|0.00|0.00|"20221016"|0|0|0|1|0.00||"CAD"|"Pre-approved clients"|"Unsecured"|"Personal"|"1905"|"CAD"|"20220619"|"20240515"||"20221004"||"N"|"N/A"|"RNA"||3500.00|0|0.00|0|0.00|0.00|0|745|0.00|0.00|0.00|0.00|0.00|0.00|0.00|0.00|0.00|0.00|0.00|0|0.00|749|"20230116"|16|2476.15|"20210623"||"20220825"|"20220916"|"20221004"|"20210616"|"20201106"||||849|"20230606"|"Credit Card"

pmchang · June 28, 2023, 8:20pm

Thanks vgersh99 but 13417669 was the value that I wanted to replace.

It added the "**" when I tried to bold the text.

vgersh99 · June 28, 2023, 8:28pm

in the future, don't modify your input/output - post them "as is" and describe (like you did).
BTW, you cannot "bold" the text within the markdown code block for "code".

I don't know why your attempt didn't work. Make sure your input is ONE contiguous line (with no line breaks/wraps - I had to fix your original input data sample to join the wrapped lines to make it 1 contiguous line) - mine is similar to yours:

$ awk -F'|' 'FNR==NR{f2[$1]=$2;next} $15 in f2 {$15=f2[$15]}1' OFS='|' file2.txt file1.txt
"20230130"|183798|"14647"|"45017206"|"Y"|"20201029"||"N"|99000006|3500.00|3500.00|1050.00|1050.00|"20201029"|TSYS~355320|"Active"|19.95|19.95|0.00|0.00|"20221016"|0|0|0|1|0.00||"CAD"|"Pre-approved clients"|"Unsecured"|"Personal"|"1905"|"CAD"|"20220619"|"20240515"||"20221004"||"N"|"N/A"|"RNA"||3500.00|0|0.00|0|0.00|0.00|0|745|0.00|0.00|0.00|0.00|0.00|0.00|0.00|0.00|0.00|0.00|0.00|0|0.00|749|"20230116"|16|2476.15|"20210623"||"20220825"|"20220916"|"20221004"|"20210616"|"20201106"||||849|"20230606"|"Credit Card"

MadeInGermany · June 29, 2023, 10:17am

It must be $15=a[$15] not $15=a[$1]

awk 'NR==FNR{a[$1]=$2; next} ($15 in a){$15=a[$15]} 1' FS='|' OFS='|' file2 file1

If you have embedded newlines in quoted fields then try my sed script from

sed '
# Save input in hold buffer
  h
  :Q1
# Substitute an opening "
  s/"//
# If success jump to :Q2
  tQ2
# Restore input; jump to the end (next cycle)
  x; b
  :Q2
# Substitute a closing "
  s/"//
# If success jump to :Q1
  tQ1
# Restore input; jump to the end if last line (having unbalanced quotes)
  x; $b
# Append the next line with a separating space
  N; s/\n/ /
# Save input in hold buffer; jump to :Q1
  h; bQ1
' file1 | awk 'NR==FNR{a[$1]=$2; next} ($15 in a){$15=a[$15]} 1' FS='|' OFS='|' file2 -

pmchang · June 29, 2023, 12:50pm

Thank you to both of you.

I have noted all of your recommendations.

system · April 24, 2024, 12:50pm

This topic was automatically closed 300 days after the last reply. New replies are no longer allowed.