suffix a sequence in awk

hi

I have a file of fixed-width records like

...
...
000446448742    00432265               040520100408 21974435      DEWSWATER GARRIER AAG IK4000            N 017500180000000000000000077000000000100
000446448742    00580937               040520100408 32083576      PEWSWATER BARRIER DAG GK4000            N 017500180000000000000000077000000000100
000446448742    00580937               040520100408 32083576      PEWSWATER BARRIER DAG GK4000            N 017509900000000000000000077000000000100
000446448742    00543376               040520100408 43194667      KEWSWATER FARRIER NAG HK4000            N 017500180000000000000000077000000000100
...
...

I am trying to write awk code that reads every line and, for each value of substr($0,17,8), computes SUM as the sum of the corresponding substr($0,114,6) values.

If this SUM exceeds one million, an alphabetical suffix should be added to the substr($0,17,8) value while keeping its overall length at eight characters, so that the above data transforms to

...
...
000446448742    00432265               040520100408 21974435      DEWSWATER GARRIER AAG IK4000            N 017500180000000000000000077000000000100
000446448742    0580937A               040520100408 32083576      PEWSWATER BARRIER DAG GK4000            N 017500180000000000000000077000000000100
000446448742    0580937B               040520100408 32083576      PEWSWATER BARRIER DAG GK4000            N 017509900000000000000000077000000000100
000446448742    00543376               040520100408 43194667      KEWSWATER FARRIER NAG HK4000            N 017500180000000000000000077000000000100
...
...

I have written the following awk code:

awk -v RECREATE_FILE=re.txt 'BEGIN {SUFFIX="A"};
{
SHIP_NUMBER=substr($0,17,8)
QTY_DELIVERED=substr($0,114,6)
TOT_QTY[SHIP_NUMBER]+=QTY_DELIVERED
DATA_VAL[SHIP_NUMBER]=$0"^"DATA_VAL[SHIP_NUMBER]
};
END {
for (SHIPMENT_NUMBER in DATA_VAL)
{
if(TOT_QTY[SHIPMENT_NUMBER]<1000000) {
#print DATA_VAL[SHIPMENT_NUMBER] > CLEAN_FILE
}
else{
i=split(DATA_VAL[SHIPMENT_NUMBER],GT_ONE_MIL,"^");
for (j=1;j<=i;j++)
{
TEMP_PO=int(SHIPMENT_NUMBER)SUFFIX++
gsub(SHIPMENT_NUMBER,"%08s"TEMP_PO,GT_ONE_MIL[j])
print GT_ONE_MIL[j] > RECREATE_FILE
}
}
}
}' <input_file_name>

But the output that I am getting is

cat re.txt
000446448742    %08s8590730               040520100408 32083576      PEWSWATER BARRIER DAG GK4000            N 017509900000000000000000077000000000100
000446448742    %08s8590731               040520100408 32083576      PEWSWATER BARRIER DAG GK4000            N 017500180000000000000000077000000000100

Can you please help me here :(.

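awk -v RECREATE_FILE=re.txt 'BEGIN { STR = "A B C D E F G H I J K L M N O P Q R S T U V W X Y Z"; split(STR, SUFFIX, " "); a = 1 }
# First pass: total the quantity per shipment number.
NR == FNR { TOT_QTY[$2] += substr($NF, 6, 6); next }
# Second pass: suffix the shipment numbers whose total exceeds one million.
# The $2 = $2 branch rebuilds every record with single spaces between fields.
{ if (TOT_QTY[$2] > 1000000) $2 = substr($2, 2, 7) SUFFIX[a++]; else $2 = $2
  print > RECREATE_FILE }' urfile urfile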
$ cat re.txt
000446448742 00432265 040520100408 21974435 DEWSWATER GARRIER AAG IK4000 N 017500180000000000000000077000000000100
000446448742 0580937A 040520100408 32083576 PEWSWATER BARRIER DAG GK4000 N 017500180000000000000000077000000000100
000446448742 0580937B 040520100408 32083576 PEWSWATER BARRIER DAG GK4000 N 017509900000000000000000077000000000100
000446448742 00543376 040520100408 43194667 KEWSWATER FARRIER NAG HK4000 N 017500180000000000000000077000000000100
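
For what it's worth, two things in the original script seem to explain the %08s8590730 output: gsub() inserts its replacement as literal text, so a printf-style format is never expanded, and applying ++ to the string "A" makes awk treat it as the number 0. A minimal sketch of both effects follows; the demo wrapper and the sample string are made up for illustration.

```shell
demo() {
    awk 'BEGIN {
        s = "A"
        s++                              # "A" is not numeric, so it counts as 0 and s becomes 1
        print s

        line = "xx00580937yy"
        gsub(/00580937/, "%08s", line)   # the replacement is inserted as literal text
        print line

        print sprintf("%08d", 580937)    # sprintf() is what actually applies a format
        print substr("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 2, 1)   # letter number 2 of the alphabet
    }'
}
demo
```

So a letter suffix has to come from a lookup (an array or substr() on an alphabet string) rather than from incrementing a string.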

Hi

I can see that the above solution works. I am now trying to find a way for the suffix to continue with AA, then AB, then AC, ... once it has reached Z.

Is this feasible?

Can anybody help with this?

That is not clear.

So you need the output as:

AA, AB, AC, ..., BA, BB, BC, ..., ZZ ?

So the sample above would convert from 00580937 to 580937AA?

hey rdcwayx

The answer to your question is yes. To summarise, the suffix initially needs to run through

A, B, C, ..., Z

so that the 'processed' output (or 'export', as you call it) looks like

0580937A, 0580937B, 0580937C, ..., 0580937Z

and once the suffix has reached Z, the 'processed' output needs to continue as

580937AA, 580937AB, 580937AC, ..., 580937AZ, 580937BA, ..., 580937ZZ

i.e. in brief

0580937A, 0580937B, ..., 0580937Z, 580937AA, 580937AB, ..., 580937ZZ

I hope my query makes sense.

The crude way to reuse my code without big changes is:

  1. First generate the sequence
$ echo {A..Z}{A..Z}
AA AB AC AD AE AF AG AH AI AJ AK AL AM AN AO AP AQ AR AS AT AU AV AW AX AY AZ BA BB BC BD BE BF BG BH BI BJ BK BL BM BN BO BP BQ BR BS BT BU BV BW BX BY BZ CA CB CC CD CE CF CG CH CI CJ CK CL CM CN CO CP CQ CR CS CT CU CV CW CX CY CZ DA DB DC DD DE DF DG DH DI DJ DK DL DM DN DO DP DQ DR DS DT DU DV DW DX DY DZ EA EB EC ED EE EF EG EH EI EJ EK EL EM EN EO EP EQ ER ES ET EU EV EW EX EY EZ FA FB FC FD FE FF FG FH FI FJ FK FL FM FN FO FP FQ FR FS FT FU FV FW FX FY FZ GA GB GC GD GE GF GG GH GI GJ GK GL GM GN GO GP GQ GR GS GT GU GV GW GX GY GZ HA HB HC HD HE HF HG HH HI HJ HK HL HM HN HO HP HQ HR HS HT HU HV HW HX HY HZ IA IB IC ID IE IF IG IH II IJ IK IL IM IN IO IP IQ IR IS IT IU IV IW IX IY IZ JA JB JC JD JE JF JG JH JI JJ JK JL JM JN JO JP JQ JR JS JT JU JV JW JX JY JZ KA KB KC KD KE KF KG KH KI KJ KK KL KM KN KO KP KQ KR KS KT KU KV KW KX KY KZ LA LB LC LD LE LF LG LH LI LJ LK LL LM LN LO LP LQ LR LS LT LU LV LW LX LY LZ MA MB MC MD ME MF MG MH MI MJ MK ML MM MN MO MP MQ MR MS MT MU MV MW MX MY MZ NA NB NC ND NE NF NG NH NI NJ NK NL NM NN NO NP NQ NR NS NT NU NV NW NX NY NZ OA OB OC OD OE OF OG OH OI OJ OK OL OM ON OO OP OQ OR OS OT OU OV OW OX OY OZ PA PB PC PD PE PF PG PH PI PJ PK PL PM PN PO PP PQ PR PS PT PU PV PW PX PY PZ QA QB QC QD QE QF QG QH QI QJ QK QL QM QN QO QP QQ QR QS QT QU QV QW QX QY QZ RA RB RC RD RE RF RG RH RI RJ RK RL RM RN RO RP RQ RR RS RT RU RV RW RX RY RZ SA SB SC SD SE SF SG SH SI SJ SK SL SM SN SO SP SQ SR SS ST SU SV SW SX SY SZ TA TB TC TD TE TF TG TH TI TJ TK TL TM TN TO TP TQ TR TS TT TU TV TW TX TY TZ UA UB UC UD UE UF UG UH UI UJ UK UL UM UN UO UP UQ UR US UT UU UV UW UX UY UZ VA VB VC VD VE VF VG VH VI VJ VK VL VM VN VO VP VQ VR VS VT VU VV VW VX VY VZ WA WB WC WD WE WF WG WH WI WJ WK WL WM WN WO WP WQ WR WS WT WU WV WW WX WY WZ XA XB XC XD XE XF XG XH XI XJ XK XL XM XN XO XP XQ XR XS XT XU XV XW XX XY XZ YA YB YC YD YE YF YG YH YI YJ YK YL YM YN YO YP YQ YR YS YT YU YV YW YX YY YZ ZA ZB ZC ZD ZE ZF ZG ZH ZI ZJ ZK ZL ZM ZN ZO ZP 
ZQ ZR ZS ZT ZU ZV ZW ZX ZY ZZ
  2. Update the code with that sequence (A to Z followed by AA to ZZ), passed in via command substitution:
awk -v RECREATE_FILE=re.txt -v STR="$(echo {A..Z} {A..Z}{A..Z})" 'BEGIN { split(STR, SUFFIX, " "); a = 1 }
# First pass: total the quantity per shipment number.
NR == FNR { TOT_QTY[$2] += substr($NF, 6, 6); next }
# Second pass: one-letter suffixes keep 7 digits, two-letter suffixes keep 6.
# The $2 = $2 branch rebuilds every record with single spaces between fields.
{ if (TOT_QTY[$2] > 1000000) { $2 = substr($2, (a <= 26) ? 2 : 3) SUFFIX[a]; a++ } else $2 = $2
  print > RECREATE_FILE }' urfile urfile

(Code is not tested.)

Otherwise, you have to rewrite the code with your new rules.
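
A rough sketch of such a rewrite, under some assumptions that are mine rather than the thread's: the suffix sequence restarts for each shipment number, records past ZZ are left unchanged, and the columns are taken from the substr() positions in the first post. The names resuffix, suffix, LIMIT and seen are hypothetical. Rebuilding the line with substr() instead of assigning to $2 should keep the original column spacing.

```shell
resuffix() {
    awk -v LIMIT=1000000 '
    # Map 1..26 to A..Z and 27..702 to AA..ZZ; anything else gives "".
    function suffix(n,    letters) {
        letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        if (n < 1 || n > 702) return ""
        if (n <= 26) return substr(letters, n, 1)
        n -= 27                          # 0-based position within AA..ZZ
        return substr(letters, int(n / 26) + 1, 1) substr(letters, n % 26 + 1, 1)
    }
    # Pass 1: total the quantity per shipment number.
    NR == FNR { total[substr($0, 17, 8)] += substr($0, 114, 6); next }
    # Pass 2: rewrite columns 17-24 where the total exceeds LIMIT.
    {
        key = substr($0, 17, 8)
        if (total[key] > LIMIT) {
            s = suffix(++seen[key])      # assumption: numbering restarts per key
            if (s != "")                 # assumption: past ZZ the record is left as is
                $0 = substr($0, 1, 16) substr(key, length(s) + 1) s substr($0, 25)
        }
        print
    }' "$1" "$1"
}
```

Usage would be something like: resuffix urfile > re.txt. The file is read twice, so it has to be a regular file rather than a pipe.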