Insert FileName to headers

hi there

I have a lot of files with same number of rows and columns.$2 and $3 are the same in all files .
I run awk script and merge $2,$3,$1 from first file and $1 from another files.

I would like to know how to modify awk script to insert filename as header for merged columns ( Expected Result).

Thanks in advance.

awk 'BEGIN{OFS = "\t"}
  NR == FNR {a[$2 " " $3] = $1; next}
  {a[$2 " " $3] = (a[$2 " " $3] OFS $1)}
  END {for(x in a) print (x OFS a[x])}' *  > Result.csv 

FileName:SG1

  76   3.62945 101.54361  2.000
  76   3.62945 101.54862  2.000
  75   3.62945 101.55362  2.000
  73   3.62945 101.55863  2.000
  76   3.62446 101.54361  2.000
  77   3.62446 101.54862  2.000
  77   3.62446 101.55362  2.000
  76   3.62446 101.55863  2.000
  77   3.61946 101.54361  2.000
  78   3.61946 101.54862  2.000
  78   3.61946 101.55362  2.000
  79   3.61946 101.55863  2.000
  77   3.61447 101.54361  2.000
  78   3.61447 101.54862  2.000
  78   3.61447 101.55362  2.000
  80   3.61447 101.55863  2.000

FileName:SG2

  74   3.62945 101.54361  2.000
  74   3.62945 101.54862  2.000
  74   3.62945 101.55362  2.000
  74   3.62945 101.55863  2.000
  74   3.62446 101.54361  2.000
  74   3.62446 101.54862  2.000
  74   3.62446 101.55362  2.000
  76   3.62446 101.55863  2.000
  75   3.61946 101.54361  2.000
  75   3.61946 101.54862  2.000
  75   3.61946 101.55362  2.000
  78   3.61946 101.55863  2.000
  75   3.61447 101.54361  2.000
  76   3.61447 101.54862  2.000
  76   3.61447 101.55362  2.000
  79   3.61447 101.55863  2.000

FileName:SG3

  71   3.62945 101.54361  2.000
  71   3.62945 101.54862  2.000
  71   3.62945 101.55362  2.000
  74   3.62945 101.55863  2.000
  72   3.62446 101.54361  2.000
  72   3.62446 101.54862  2.000
  72   3.62446 101.55362  2.000
  76   3.62446 101.55863  2.000
  72   3.61946 101.54361  2.000
  73   3.61946 101.54862  2.000
  74   3.61946 101.55362  2.000
  78   3.61946 101.55863  2.000
  74   3.61447 101.54361  2.000
  75   3.61447 101.54862  2.000
  75   3.61447 101.55362  2.000
  79   3.61447 101.55863  2.000

Result AWK

3.62945 101.55362	75	74	71
3.61447 101.55863	80	79	79
3.61447 101.55362	78	76	75
3.61946 101.54862	78	75	73
3.61946 101.54361	77	75	72
3.62945 101.55863	73	74	74
3.61946 101.55863	79	78	78
3.61946 101.55362	78	75	74
3.62446 101.54862	77	74	72
3.62446 101.54361	76	74	72
3.62446 101.55863	76	76	76
3.62446 101.55362	77	74	72
3.62945 101.54862	76	74	71
3.62945 101.54361	76	74	71
3.61447 101.54862	78	76	75
3.61447 101.54361	77	75	74

Expected Result:

                        SG1     SG2     SG3
3.62945 101.55362	75	74	71
3.61447 101.55863	80	79	79
3.61447 101.55362	78	76	75
3.61946 101.54862	78	75	73
3.61946 101.54361	77	75	72
3.62945 101.55863	73	74	74
3.61946 101.55863	79	78	78
3.61946 101.55362	78	75	74
3.62446 101.54862	77	74	72
3.62446 101.54361	76	74	72
3.62446 101.55863	76	76	76
3.62446 101.55362	77	74	72
3.62945 101.54862	76	74	71
3.62945 101.54361	76	74	71
3.61447 101.54862	78	76	75
3.61447 101.54361	77	75	74

Try (untested):

awk 'BEGIN          {OFS = "\t"; HD=OFS OFS}
     FNR == 1       {HD=HD OFS FILENAME}
     FNR == NR      {a[$2 " " $3] = $1; next}
                    {a[$2 " " $3] = (a[$2 " " $3] OFS $1)}
     END            {print HD
                     for(x in a) print (x OFS a[x])}
    ' *  > Result.csv
1 Like

Thank you for your responding. But your code didn`t work well.

the result:

SG1130113000247.CAPNFXS.csv.csv    SG1130113001247.CAPNFXT.csv.csv     SG1130113002247.CAPNFXU.csv.csv
76,3.62945,101.54361,2.000 76,3.62945,101.54862,2.000     80,3.61447,101.55863,2.000
         75,3.62945,101.54361,2.000    75,3.62945,101.54862,2.000     75,3.62945,101.55362,2.000    76,3.62945,101.55863,2.000     76,3.62446,101.54361,2.000    76,3.62446,101.54862,2.000     76,3.62446,101.55362,2.000    78,3.62446,101.55863,2.000     77,3.61946,101.54361,2.000    77,3.61946,101.54862,2.000     77,3.61946,101.55362,2.000    81,3.61946,101.55863,2.000     77,3.61447,101.54361,2.000    78,3.61447,101.54862,2.000     78,3.61447,101.55362,2.000    80,3.61447,101.55863,2.000     76,3.62945,101.54361,2.000    76,3.62945,101.54862,2.000     75,3.62945,101.55362,2.000    73,3.62945,101.55863,2.000     76,3.62446,101.54361,2.000    77,3.62446,101.54862,2.000     77,3.62446,101.55362,2.000    76,3.62446,101.55863,2.000     77,3.61946,101.54361,2.000    78,3.61946,101.54862,2.000     78,3.61946,101.55362,2.000    79,3.61946,101.55863,2.000     77,3.61447,101.54361,2.000    78,3.61447,101.54862,2.000     78,3.61447,101.55362,2.000    80,3.61447,101.55863,2.000     74,3.62945,101.54361,2.000    74,3.62945,101.54862,2.000     74,3.62945,101.55362,2.000    74,3.62945,101.55863,2.000     74,3.62446,101.54361,2.000    74,3.62446,101.54862,2.000     74,3.62446,101.55362,2.000    76,3.62446,101.55863,2.000     75,3.61946,101.54361,2.000    75,3.61946,101.54862,2.000     75,3.61946,101.55362,2.000    78,3.61946,101.55863,2.000     75,3.61447,101.54361,2.000    76,3.61447,101.54862,2.000     76,3.61447,101.55362,2.000    79,3.61447,101.55863,2.000
SG1130113001247.CAPNFXT.csv.csv SG1130113002247.CAPNFXU.csv.csv     SG1130113000247.CAPNFXS.csv.csv

If you are using a Solaris/SunOS System, change awk in the script to /usr/xpg4/bin/awk , /usr/xpg6/bin/awk , or nawk .

No matter what system you're using, change the last line of the script from:

    ' *  > Result.csv

to:

    ' SG[1-3]  > Result.csv

or, if your filenames aren't really SG1 , SG2 , and SG3 as you said, but are instead named SG1130113000247.CAPNFXS.csv.csv , SG1130113001247.CAPNFXT.csv.csv, and SG1130113002247.CAPNFXU.csv.csv , respectively, use something like:

    ' SG*.csv.csv  > Result.csv

If this doesn't fix your problem, let us know what OS (including the version) and shell you're using to give us some hint as to what might be wrong.

1 Like

I'd say my code does work VERY well.

Using your sample code on your sample files, I get

awk 'BEGIN{OFS = "\t"}
  NR == FNR {a[$2 " " $3] = $1; next}
  {a[$2 " " $3] = (a[$2 " " $3] OFS $1)}
  END {for(x in a) print (x OFS a[x])}' SG1 SG2 SG3
3.61447 101.55362    78    76    75
3.61946 101.54361    77    75    72
3.62446 101.54361    76    74    72
3.61946 101.55362    78    75    74
3.62446 101.55362    77    74    72
3.61447 101.54862    78    76    75
3.62945 101.54361    76    74    71
3.61447 101.55863    80    79    79
3.62945 101.55362    75    74    71
3.61946 101.54862    78    75    73
3.62446 101.54862    77    74    72
3.61946 101.55863    79    78    78
3.62446 101.55863    76    76    76
3.62945 101.54862    76    74    71
3.62945 101.55863    73    74    74
3.61447 101.54361    77    75    74

Applying my proposal to your sample files, I get

awk 'BEGIN          {OFS = "\t"; HD=OFS OFS}
     FNR == 1       {HD=HD OFS FILENAME}
     FNR == NR      {a[$2 " " $3] = $1; next}
                    {a[$2 " " $3] = (a[$2 " " $3] OFS $1)}
     END            {print HD
                     for(x in a) print (x OFS a[x])}
    ' SG1 SG2 SG3
                     SG1   SG2   SG3
3.61447 101.55362    78    76    75
3.61946 101.54361    77    75    72
3.62446 101.54361    76    74    72
3.61946 101.55362    78    75    74
3.62446 101.55362    77    74    72
3.61447 101.54862    78    76    75
3.62945 101.54361    76    74    71
3.61447 101.55863    80    79    79
3.62945 101.55362    75    74    71
3.61946 101.54862    78    75    73
3.62446 101.54862    77    74    72
3.61946 101.55863    79    78    78
3.62446 101.55863    76    76    76
3.62945 101.54862    76    74    71
3.62945 101.55863    73    74    74
3.61447 101.54361    77    75    74

Whatever strange results you show above are from applying the code to data not adhering to your specifications.

EDIT: BTW, looks like your data files are comma delimited, which you did not mention in your spec.

1 Like

I added comma in script, and it works fine. Thanks you so much.

awk 'BEGIN          {OFS = "\t"; HD=OFS","OFS}
     FNR == 1       {HD=HD OFS FILENAME}
     FNR == NR      {a[$2 " " $3] = $1; next}
                    {a[$2 " " $3] = (a[$2 " " $3] OFS $1)}
     END            {print HD
                     for(x in a) print (x OFS a[x])}
    ' *  > Result2.csv

That change only affects the 1st line of the output. With your sample input, it changes the header from:

			SG1	SG2	SG3

to:

	,		SG1	SG2	SG3

If your input file is comma delimited rather than whitespace delimited, the way to fix it would be to change the 1st line of the script to something like:

awk 'BEGIN          {FS=","; OFS = "\t"; HD=OFS OFS}
1 Like