awk to store in a list the values of certain fileds

Dear Group,

I have following input:

sa;sb;sc;sd;period;ma;mb;mc;md;me
sa1;sb1;sc1;sd1;200001;ma1;mb1;mc1;md1;me1
sa2;sb2;sc2;sd2;200002;ma2;mb2;mc2;md2;me2
sa3;sb3;sc3;sd3;200003;ma3;mb3;mc3;md3;me3

first line contains the headers!

I want to create with one pass the following output:

sa1|sb1|sc1|sd1|200001_ma|ma1|
sa1|sb1|sc1|sd1|200001_mb|mb1|
sa1|sb1|sc1|sd1|200001_mc|mc1|
sa1|sb1|sc1|sd1|200001_md|md1|
sa1|sb1|sc1|sd1|200001_me|me1|
sa2|sb2|sc2|sd2|200002_ma|ma2|
sa2|sb2|sc2|sd2|200002_mb|mb2|
sa2|sb2|sc2|sd2|200002_mc|mc2|
sa2|sb2|sc2|sd2|200002_md|md2|
sa2|sb2|sc2|sd2|200002_me|me2|
sa3|sb3|sc3|sd3|200003_ma|ma3|
sa3|sb3|sc3|sd3|200003_mb|mb3|
sa3|sb3|sc3|sd3|200003_mc|mc3|
sa3|sb3|sc3|sd3|200003_md|md3|
sa3|sb3|sc3|sd3|200003_me|me3|

tried following one liner with partial success:

awk -F\; 'NR==1{for(ctyi=1;ctyi<=NF;ctyi++) if ($ctyi ~ /period/) found=++ctyi;for(m=found;m<=NF;m++) vector[$m];print vector[$m]}NR>1{i=$1"|"$2"|"$3"|"$4"|"$5;for(x=6;x<=NF;x++)print i"_""|"$x"|"}' in.csv > out.txt

I cannot read and add the 'ma mb mc md me'! The second part:

NR>1{i=$1"|"$2"|"$3"|"$4"|"$5;for(x=6;x<=NF;x++)print i"_""|"$x"|"}

works as expected. It does produce the following output:

sa1|sb1|sc1|sd1|200001_|ma1|
sa1|sb1|sc1|sd1|200001_|mb1|
sa1|sb1|sc1|sd1|200001_|mc1|
sa1|sb1|sc1|sd1|200001_|md1|
sa1|sb1|sc1|sd1|200001_|me1|
sa2|sb2|sc2|sd2|200002_|ma2|
sa2|sb2|sc2|sd2|200002_|mb2|
sa2|sb2|sc2|sd2|200002_|mc2|
sa2|sb2|sc2|sd2|200002_|md2|
sa2|sb2|sc2|sd2|200002_|me2|
sa3|sb3|sc3|sd3|200003_|ma3|
sa3|sb3|sc3|sd3|200003_|mb3|
sa3|sb3|sc3|sd3|200003_|mc3|
sa3|sb3|sc3|sd3|200003_|md3|
sa3|sb3|sc3|sd3|200003_|me3|

In the first part:

NR==1{for(ctyi=1;ctyi<=NF;ctyi++) if ($ctyi ~ /period/) found=++ctyi;for(m=found;m<=NF;m++) vector[$m];print vector[$m]}

I locate the position of the field 'period' in order to check which fields have to be considered till the end of the record. The number of fields after 'period' can vary!

Thanks for your help.

You can try this:

awk 'NR==1 {next}
{for (x=6;x<=NF;x++) {
    print $1 OFS $2 OFS $3 OFS $4 OFS $5"_"substr($x,1,match($x,/[0-9]/)-1) OFS $x OFS
    }
}' FS=\; OFS=\| in.csv > out.txt
1 Like

Thank you very much for your prompt reply and this did work well but there is a catch :frowning:

The 'ma mb mc md me' are actually dummy values! Their length can vary! So it can be 'maaaaaaa' or any other value!

Please try the updated code.

is producing the same result!

For example, let's assume you have the following input:

sa;sb;sc;sd;period;ma;mbbbbbbb;mcc;mddddddddddddd;m
sa1;sb1;sc1;sd1;200001;ma1;mb1;mc1;md1;me1
sa2;sb2;sc2;sd2;200002;ma2;mb2;mc2;md2;me2
sa3;sb3;sc3;sd3;200003;ma3;mb3;mc3;md3;me3

and you want to transform to:

sa1|sb1|sc1|sd1|200001_ma|ma1|
sa1|sb1|sc1|sd1|200001_mbbbbbbb|mb1|
sa1|sb1|sc1|sd1|200001_mcc|mc1|
sa1|sb1|sc1|sd1|200001_mddddddddddddd|md1|
sa1|sb1|sc1|sd1|200001_m|me1|
sa2|sb2|sc2|sd2|200002_ma|ma2|
sa2|sb2|sc2|sd2|200002_mbbbbbbb|mb2|
sa2|sb2|sc2|sd2|200002_mcc|mc2|
sa2|sb2|sc2|sd2|200002_mddddddddddddd|md2|
sa2|sb2|sc2|sd2|200002_m|me2|
sa3|sb3|sc3|sd3|200003_ma|ma3|
sa3|sb3|sc3|sd3|200003_mbbbbbbb|mb3|
sa3|sb3|sc3|sd3|200003_mcc|mc3|
sa3|sb3|sc3|sd3|200003_mddddddddddddd|md3|
sa3|sb3|sc3|sd3|200003_m|me3|

Thank you for your time

Sorry I get you now:

awk 'NR==1 {
for (i=6;i<=NF;i++) {
    h=$i
}
next
}
{for (x=6;x<=NF;x++) {
    print $1 OFS $2 OFS $3 OFS $4 OFS $5"_"h[x] OFS $x OFS
    }
}' FS=\; OFS=\|  in.csv > out.txt 
1 Like

That did the job!

Thank you

Try also

awk     'NR==1  {MX=NF; for (i=NF;i>=1;i--) {if ($i=="period") break; T=$i};i+=1;next}
                {for (j=i; j<=NF; j++) print $1, $2, $3, $4, $5"_"T[j], $(6-i+j), _}
        ' FS=";" OFS="|" file
1 Like

worked indeed. Thank you very much for your support