Merge files and generate a resume in two files

jiam912 · December 2, 2013, 4:11am

Dear Gents,

Please I need your help... I need small script to do the following.

I have a thousand of files in a folder produced daily.

I need first to merge all files called. txt (0009.txt, 0010.txt, 0011.txt) and and to output a resume of all information on 2 separate files in csv format (I wrote .txt only to be able to upload the files), at the output should be to files RP.csv and SP.csv , as I show in the attached files..

Thanks always for your support.

rbatte1 · December 2, 2013, 6:41am

So, what have you tried so far?
What language or tools are you most comfortable with?
What shell are you using?
What is your operating system and version?
What logic rules do you have for merging the files are generating the reports?

Most importantly, what have you tried so far?

Regards,
Robin

jiam912 · December 2, 2013, 7:52am

Hi Robin

Thanks for your answer.

I using linux red hat 5,, shell csh

I like to use awk,

Just I start to try to di it mergin the files
using :

awk 'FNR==1{print ""}1' *.txt > tmp1

and then

I try to do

cat tmp1 | awk -F, '{print $0}' | sed '1,17d' | sed 's/_/,/g' | sed 's/,/,/g > tmp2

That is all I did.

RudiC · December 2, 2013, 12:05pm

Hmmm, not sure I understand your attempts. Looking at your files, as a starting point try

awk -F, 'FNR>17 {print >$1"P.csv"}' 0*.txt

You may want to add gsub (/_/, ",", $2) before printing.

jiam912 · December 2, 2013, 3:48pm

Hello RudiC

I am using

 awk -F, 'FNR>17  gsub (/_/, ",", $0); {print >$1"P.csv"}' 0*.txt

It create for many files like
AP.csv, CP.csv, HP.csv, YP.csv, ZP.csv and I need in the output only The RP.csv and SP.csv. How we can eliminate at the begining all the rows which start with letters A,C,Y and Z, to avoid to get these output files.

And also in the row 18 for each file there is a value that I need to recovery and to put at the end of each row.. example.

S,1915_7690_1,2013,30,18,58,37,0.000712977,0.000374325,0.93,0.000199737,0.000430314

in the output file only where the first caracter is R, we need to do:

R,1996_7571_1_2_1,2170.12,1.5,5,0.000235452,9.30413e-05,2268,6.9,12.8,20.8,1915,7690

for each line starting with R we need to append these values.. It is a reference becasue all the rows correspond to the same code.

Example

S,1915_7690_1,2013,30,18,58,37,0.000712977,0.000374325,0.93,0.000199737,0.000430314
R,1996_7571_1_2_1,2170.12,1.5,5,0.000235452,9.30413e-05,2268,6.9,12.8,20.8

In the output R file need to be included for each row.

R,1996_7571_1_2_1,2170.12,1.5,5,0.000235452,9.30413e-05,2268,6.9,12.8,20.8,1915,7690
R,1996_7572_1_2_1,2163.76,1.59,5,0.000250454,0.000103684,2260,4.9,12.8,42.5,1915,7690
R,1996_7573_1_2_1,2233.66,0.49,5,0.000250275,8.56004e-05,2252,4.9,13.8,21.7,1915,7690

Thanks a lot for your help:b:

I mean for each file we need to check the value in the row 18 and to append in all rows starting with R these values, For each file will be a diferent value..

RudiC · December 3, 2013, 7:13am

You lost me. Which value from line 18 (FNR==18) do you want to save and append to the respective R- lines of that file?
To eliminate the A, Y, Z line try sth like

$1~/[RS]/

jiam912 · December 3, 2013, 7:47am

Dear RudiC
Sorry to confuse you , What I need is only to get at the end of each R line are:
Example
line 18

S,1915_7690_1,2013,30,18,58,37,0.000712977,0.000374325,0.93,0.000199737,0.000430314

line 19

R,1996_7571_1_2_1,2170.12,1.5,5,0.000235452,9.30413e-05,2268,6.9,12.8,20.8

I need to take these values from line 18

1915_7690

to append all lines with R

R,1996_7571_1_2_1,2170.12,1.5,5,0.000235452,9.30413e-05,2268,6.9,12.8,20.8,1915,7690

For each file there is differente value in line 18, need to be append in lines with R

RudiC · December 3, 2013, 8:03am

Well, then try

awk -F, 'FNR<18 || $1 !~ /[RS]/ {next}
                                {gsub (/_/,",")}
         FNR==18                {A=$2;B=$3}
                                {$(NF+1)=A
                                 $(NF+1)=B
                                 print > $1"P.csv"}
        ' OFS="," file

jiam912 · December 3, 2013, 8:52am

Dear RudiC,

Thanks a lot for your help.

Please only the last thing, how we can append also the name of the file at the last of each line for both files RP.csv and SP.csv.

Example

R,1996_7566_1_2_1,2168.85,1.09,5,0.000323826,0.000241978,692,7.9,12.8,16.8,1915,7685,9

S,1915_7685_1,2013,30,18,54,34,0.000909391,0.000410113,0.93,0.000262673,0.000494078,9

where the last column will be the name of file without ceros and without extention..only the numbers.

Thanks again for your help

RudiC · December 3, 2013, 9:10am

I leave this as an exercise to you. The FILENAME variable has each respective file's name; you could sub/gsub the extension and/or the zeroes, and append the result like the other fields.

jiam912 · December 3, 2013, 10:23am

Thanks I will try..

---------- Post updated at 09:48 AM ---------- Previous update was at 09:25 AM ----------

Dear RudiC,

I have changed

*.txt

instead of

 file

in the script and I got error

No such file or directory?..

---------- Post updated at 10:23 AM ---------- Previous update was at 09:48 AM ----------

Dear RudiC.
there was a problem with my file... apologies it works fine now

RudiC · December 3, 2013, 10:57am

If you're happy with your code, why don't you post it as the solution to your first request?

jiam912 · December 3, 2013, 1:30pm

Dear RudiC

Here the last modification that I did to the script it works fine

#!/bin/bash 

rm -f *.csv*

awk -F, 'FNR<18 || $1!~/[RS]/ {next}
                                {gsub (/_/,",")}
         FNR==18                {A=$2;B=$3}
                                {C=FILENAME}
                                {$(NF+1)=A
                                 $(NF+1)=B
                             $(NF+1)=C
                 {gsub (/.adsta/,"")}
                                 print > $1"P.csv"} 
    ' OFS="," *.adsta

Thanks a lot for your support

RudiC · December 4, 2013, 12:30pm

Thanks for posting your solution. May I propose a few minor modifications?

awk -F, 'FNR<18 || $1!~/[RS]/ {next}
                                {gsub (/_/,",")}
         FNR==18                {A=$2;B=$3}
                                {C=FILENAME}             # Needless to do this for every line
                                {$(NF+1)=A
                                 $(NF+1)=B
                             $(NF+1)=C
                 {gsub (/.adsta/,"")}                    # Needless to do this for every line; curly braces also pointless
                                 print > $1"P.csv"} 
    ' OFS="," *.adsta

So it might look like

awk -F, 'FNR<18 || $1!~/[RS]/   {next}
                                {gsub (/_/,",")}
         FNR==18                {A=$2;B=$3 
                                 C=FILENAME 
                                 gsub (/.adsta/,"",C)}       # up to here done once for line no. 18
                                {$(NF+1)=A
                                 $(NF+1)=B
                                 $(NF+1)=C
                                 print > $1"P.csv"}
    ' OFS="," *.adsta

jiam912 · December 4, 2013, 5:03pm

Dear RudiC

Thanks a lot for your support..