Count a specific character in each line of a file and delete that character at a specific position

teokon90 · August 8, 2018, 12:07pm

I would appreciate your help with this script in a Solaris environment.

Scenario:

I have 2 files:

1) /tmp/TRANSACTIONS_DAILY_20180730.txt:

201807300000000004 
201807300000000005 
201807300000000006 
201807300000000007 
201807300000000008

2) /opt/TRANSACTIONS_DAILY_20180730.txt

20180730|201807300000000005||50001521111200|0106276-4|5SIJ00|WIRE||EUR|EUR|20180730|20180730|||||||0000000000030 0.00|00000000000300.00|Credit||||||||||SIJ|||500015|506|||||||||||||||||||||||||FI3158410220205399||||FI|SME5
20180730|201807300000000005||50001521111200|0106276-4|5SIJ00|WIRE||EUR|EUR|20180730|20180730|||||||00000000000300.00|00000000000300.00|Credit||||||||||SIJ|||500015|506|||||||||||||||||||||||||FI3158410220205399||||FI|SME5
20180730|201807300000000006||50001521111200|0106276-4|5SIJ00|WIRE||EUR|EUR|20180730|20180730|||||||00000000000050.00|00000000000050.00|Credit||||||||||SIJ|||500015|506|||||||||||||||||||||||||FI3650005020017008||||FI|SME5
20180730|201807300000000007||50001521111200|0106276-4|5SIJ00|WIRE||EUR|EUR|20180730|20180730|||||||00000000000015.00|00000000000015.00|Credit||||||||||SIJ|||500015|506|||||||||||||||||||||||||FI1958410220026068||||FI|SME5
20180730|201807300000000008||50001521111200|0106276-4|5SIJ00|WIRE||EUR|EUR|20180730|20180730|||||||00000000000300.00|00000000000300.00|Credit||||||||||SIJ|||500015|506|||||||||||||||||||||||||FI8358410220212320||||FI|SME5

i) I want to read each line of the first file and, if that string exists in the second file, write the whole matching transaction from the second file to a new file.

ii) In the new file, I want to count the "|" characters in each line and, if a line has more than 64 of them, delete the 61st "|" in that line.

I have managed to do part i) of the script, but I need help with part ii).

My code so far for part i), which works:

#!/bin/bash

while read line
do

grep "$line" /opt/TRANSACTIONS_DAILY_20180730.txt

done < /tmp/TRANSACTIONS_DAILY_20180730.txt > tmp/TRANSACTIONS_DAILY_NEW_20180730.txt

This specific Solaris environment doesn't support the sed -r option, nor grep -f or grep -o.

vgersh99 · August 8, 2018, 12:17pm

Not quite sure about requirement ii). Do you want to delete the whole line? Do you want to truncate the line to contain only 61 fields (|-separated)? Something else?
Could you provide another representative sample with a requirement of, say, 10 fields (instead of 61), together with the desired output?
Thanks

teokon90 · August 8, 2018, 12:28pm

Basically, in the new file that is created, I want to delete the 61st pipe "|" in each line that has more than 64 pipes "|".

I don't want to delete the whole line, nor truncate it.

Is it more understandable now?

RudiC · August 8, 2018, 12:39pm

I have to second vgersh99 in that your sample data should be refined. There should be lines that don't match either of your criteria. Right now, all file2 lines have a match and will be selected, and all have 64 separators. On top of that, with the sample given, removing the 61st separator, or the 60th or 62nd, wouldn't make a difference, as they are all grouped together.
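
A quick way to check how many separators each line of a file really has is to let awk split on "|" and print the field count minus one. This is only a sketch: count_pipes is a hypothetical helper name, and the input lines are made-up test data, not the real transactions.

```shell
#!/bin/sh
# count_pipes (hypothetical helper): print the number of "|" separators
# found on each line read from standard input.
count_pipes() {
  # NF is the number of fields; an empty line has NF == 0, so guard it.
  awk -F'|' '{ print (NF ? NF - 1 : 0) }'
}

# Made-up sample lines: 2 separators, none, and 4.
printf '%s\n' 'a|b|c' 'no separators' '||||' | count_pipes
# prints:
# 2
# 0
# 4
```

Running the same function on the real file (count_pipes < file2) shows at a glance whether any line exceeds 64 separators.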

vgersh99 · August 8, 2018, 1:08pm

try this - a bit rough, but...:

awk -F'|' 'FNR==NR{f1[$1+0];next}($2+0) in f1 {if (NF>64) {$60=$60 $61;$61="";sub("[|]{61}","")}print}' OFS='|' /tmp/TRANSACTIONS_DAILY_20180730.txt /opt/TRANSACTIONS_DAILY_20180730.txt

MadeInGermany · August 8, 2018, 4:30pm

Deleting an input field in awk might not be portable.
Even this rmcol() function is not portable.
The following should work with POSIX-compatible awk and sed. On Solaris this requires the /usr/xpg4/bin/ versions.

#!/bin/bash
PATH=/usr/xpg4/bin:/bin:/usr/bin
awk 'NR==FNR { K[$1]; next } ($2 in K)' file1 FS="|" file2 |
sed '/\([^|]*|\)\{65\}/ s/|//61'

It uses a pipe between awk and sed.
Of course you can have an intermediate file as you stated in post #1

awk 'NR==FNR { K[$1]; next } ($2 in K)' file1 FS="|" file2 > newfile
sed '/\([^|]*|\)\{65\}/ s/|//61' newfile
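
To see what the sed expression does, here is the same pattern scaled down, along the lines of the smaller sample asked for earlier in the thread. The numbers are assumptions chosen for readability: the address matches lines with at least 5 pipes (that is, more than 4), and the substitution deletes the 3rd pipe. drop_third_pipe is a hypothetical name.

```shell
#!/bin/sh
# drop_third_pipe (hypothetical helper): scaled-down version of
#   sed '/\([^|]*|\)\{65\}/ s/|//61'
# The address matches only lines holding 5 or more pipes;
# s/|//3 then deletes the 3rd pipe on such a line.
drop_third_pipe() {
  sed '/\([^|]*|\)\{5\}/ s/|//3'
}

printf '%s\n' 'a|b|c|d|e' 'a|b|c|d|e|f' | drop_third_pipe
# prints:
# a|b|c|d|e      (4 pipes: left alone)
# a|b|cd|e|f     (5 pipes: 3rd pipe removed)
```

The fields on either side of the deleted pipe are joined, so no data is lost, only the delimiter.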

RudiC · August 8, 2018, 6:02pm

Try also

awk -F\| '

function RMFLD(FNO, REP, PAT)   {for (i=1; i<FNO; i++) REP = REP $i FS
                                 PAT = REP $FNO
                                 if (index ($0, PAT) == 1) $0 = REP substr ($0, length (PAT) + 2) 
                                }

FNR == NR       {T[$1]
                 next
                }

                {for (t in T) if ($0 ~ t)       {if (NF > 65) RMFLD(61)
                                                 print
                                                }
                }
' file[12]

MadeInGermany · August 9, 2018, 9:16am

The latter script does not work with all awk versions; some convert the search key to a number that overflows, even after a ~ operator!
Fix: cast to a string ($0 ~ t"") .
But on this occasion I would add a full field match ("|"$0"|" ~ "[|]"t"[|]") .
Also, the given file1 has trailing spaces, so it is advisable to use
awk 'script' file1 FS="|" file2 rather than awk -F\| 'script' file1 file2 , so that file1 is read with the default FS, where $1 strips leading and trailing spaces.
Here is another all-in-awk solution that uses an array. Like the latter script it deletes field #61, not the 61st delimiter.

#!/bin/bash
PATH=/usr/xpg4/bin:/bin:/usr/bin
awk '
function prtARR() {
  out=ARR[1]
  for (a=2; a<=nARR; a++) out=(out FS ARR[a])
  print out
}
function rmARR(num) {
  for (a=num; a<nARR; a++) ARR[a]=ARR[a+1]
  nARR--
}
NR==FNR {
  K[$1]; next
}
($2 in K) {
  nARR=split($0,ARR)
  if (nARR>65) rmARR(61)
  prtARR()
}
' file1 FS="|" file2
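
The effect of placing FS="|" between the two file names can be checked in isolation. The sketch below uses made-up keys and records and hypothetical file names (keys.txt, data.txt) in a scratch directory; the key file has trailing spaces like the file1 sample.

```shell
#!/bin/sh
# Scratch directory with hypothetical sample files.
dir=${TMPDIR:-/tmp}/fs_demo.$$
mkdir -p "$dir"
printf '%s\n' '1001 ' '1003 ' > "$dir/keys.txt"   # note the trailing spaces
printf '%s\n' 'x|1001|a' 'x|1002|b' 'x|1003|c' > "$dir/data.txt"

# keys.txt is read with the default FS, so $1 carries no trailing blank.
# FS="|" is assigned only when awk reaches that operand, i.e. before data.txt.
matched=$(awk 'NR==FNR { K[$1]; next } ($2 in K)' \
          "$dir/keys.txt" FS="|" "$dir/data.txt")
printf '%s\n' "$matched"
# prints:
# x|1001|a
# x|1003|c

rm -rf "$dir"
```

With -F\| applied to both files, the keys would be stored as "1001 " and "1003 " (space included) and nothing would match.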

teokon90 · August 16, 2018, 1:27pm

So far this code works for me:

#!/bin/bash
PATH=/usr/xpg4/bin:/bin:/usr/bin

while read line
do

grep "$line" /tmp/BadTransactions/test_data_for_validation_script.txt

awk 'NR==FNR { K[$1]; next } ($2 in K)' /tmp/BadTransactions/TRANSACTIONS_DAILY_20180730.txt FS="|" /opt/NorkomConfigS2/inbox/TRANSACTIONS_DAILY_20180730.txt > /tmp/BadTransactions/TRANSACTIONS_DAILY_NEW_20180730.txt

sed '/\([^|]*[|]\)\{65\}/ s/|//61' /tmp/BadTransactions/TRANSACTIONS_DAILY_NEW_20180730.txt

done < /tmp/BadTransactions/TRANSACTIONS_DAILY_20180730.txt > /tmp/BadTransactions/TRANSACTIONS_DAILY_NEW_20180730.txt

So far, if a line has more than 64 pipes, it deletes the 61st pipe.

Now I want to keep deleting the 61st pipe in each line that has more than 64 pipes, until the line is down to 64 pipes.

What I mean:

If a line has, for example, 67 pipes, it will delete the 61st pipe, then go over the same line again, see that it still has more than 64 pipes (it now has 66), and delete the 61st pipe again.

This should continue until the line no longer has more than 64 pipes.

Could you please suggest how to loop that?

Thank you

Don_Cragun · August 20, 2018, 8:37am

As has been noted before, the sample files you provided in post #1 in this thread do not test any of your requirements. All of the lines in your second sample file have a field #2 value that matches a value found in your first sample file. And, all lines in your second sample file have exactly 64 pipe symbols (so there is never any need to remove any pipe symbols) to achieve your goal. Using your sample input files, your second sample input file is identical to the output you say you want.

You say that the code you have shown us in post #9 in this thread works until now. That means that something has changed recently and that it no longer does what you want it to do. What has changed? In what way does it fail to produce the output you want?

I note that the awk in your inner loop redirects its standard output to the same file to which the outer loop redirects its standard output. That would usually have the effect of throwing away everything written to that file except for the output produced by the last invocation of awk and the last invocation of sed .

Please give us two small sample input files that actually test the features you want your code to provide and also give us a sample output file that is the exact output you want from those sample input files.

I think I have a fairly simple awk script that does what you want, but with no way to test it, I'm not sure that I have understood your requirements. Also, it assumes that the IDs found in your first file can be found in the second field of your second input file (as shown in your sample input files in post #1 in this thread). Is this a valid assumption, or does the code you want need to look for those IDs in every field in your second input file?

MadeInGermany · August 21, 2018, 10:33am

If you want to stick with your sed script, you can augment it with a loop:

sed -e ':Loop' -e '/\([^|]*[|]\)\{65\}/ s/|//61; tLoop'

The t branches to Loop if there was a successful substitution.
Alternatively you can do an unconditional branch if you put it in a { } block. The / / provides the condition for the whole block.

sed -e ':Loop' -e '/\([^|]*[|]\)\{65\}/{' -e 's/|//61; bLoop' -e '}'
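
A scaled-down sketch of the block form of the loop, so the repeated deletion is visible on a short line. The limits are assumptions for illustration: the line is trimmed until it has no more than 4 pipes, and each pass deletes the 3rd pipe. trim_to_four_pipes is a hypothetical name.

```shell
#!/bin/sh
# trim_to_four_pipes (hypothetical helper): scaled-down version of
#   sed -e ':Loop' -e '/\([^|]*[|]\)\{65\}/{' -e 's/|//61; bLoop' -e '}'
# While the line still holds 5 or more pipes, delete the 3rd pipe
# and branch back to the label to test the line again.
trim_to_four_pipes() {
  sed -e ':Loop' -e '/\([^|]*[|]\)\{5\}/{' -e 's/|//3' -e 'bLoop' -e '}'
}

printf '%s\n' 'a|b|c|d|e' 'a|b|c|d|e|f|g' | trim_to_four_pipes
# prints:
# a|b|c|d|e      (4 pipes: untouched)
# a|b|cde|f|g    (6 pipes: two passes, 4 pipes left)
```

The loop ends because every pass removes one pipe, so the address eventually stops matching.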