Merging all (48) CSV files from a directory

RissaR · June 20, 2011, 2:54pm

I have 48 csv files in my directory that all have this form:

Storm Speed (mph),43.0410781151
Storm motion direction (degrees),261.580774982
MLCAPE,2450.54098661
MLCIN,-9.85040520279
MLLCL,230
MLLFC,1070.39871
MLEL,207.194689294
MLCT,Not enough data
Sbcape,2203.97617778
Sbcin,-25.2617829901
SbLCL,320
SbLFC,1120.39871
SbEL,214.374492767
SbCT,Not enough data
0-1km SRH,458.349258678
10m-1km SRH,329.149187093
0-3km SRH,463.618101844
10m-3km SRH,334.418030258
0-6km SRH,587.783727309
10m-6km SRH,458.583655724
SFC-1Km EHI,5.04120983549
SFC-3km EHI,5.1219068113
SFC-6Km EHI,7.02361277587
0-1km bulk shear (m/s),25.6914044845
0-3km bulk shear (m/s),29.6908256645
0-6km bulk shear (m/s),21.7298524954
0-9km bulk shear (m/s),28.1866755643

The first entry in every line is the same for every file, but the numbers change for every file. I'd like to merge all 48 csv files in a single directory and I'd also like a line to be created at the top of the file that contains the names of each of the files in the order in which they are read in. So, if I merged the file above with this file:

Storm Speed (mph),44.7614503892
Storm motion direction (degrees),231.668848047
MLCAPE,2803.79127488
MLCIN,-50.712494335
MLLCL,300
MLLFC,1668.39871
MLEL,199.222951335
MLCT,15.46839871
Sbcape,3159.56809086
Sbcin,-45.0430330816
SbLCL,240
SbLFC,1443.39871
SbEL,196.149400393
SbCT,15.84339871
0-1km SRH,471.772033724
10m-1km SRH,295.907072991
0-3km SRH,581.250777417
10m-3km SRH,405.385816683
0-6km SRH,530.668022483
10m-6km SRH,354.803061749
SFC-1Km EHI,5.18538543392
SFC-3km EHI,7.1038575986
SFC-6Km EHI,6.21746080521
0-1km bulk shear (m/s),18.5250715019
0-3km bulk shear (m/s),24.2810028978
0-6km bulk shear (m/s),20.0096587186
0-9km bulk shear (m/s),30.6511667163

I'd like it to create this file:

N02022007values.csv,N03122006values.csv
Storm Speed (mph),43.0410781151,44.7614503892
Storm motion direction (degrees),261.580774982,231.668848047
MLCAPE,2450.54098661,2803.79127488
MLCIN,-9.85040520279,-50.712494335
MLLCL,230,300
MLLFC,1070.39871,1668.39871
MLEL,207.194689294,199.222951335
MLCT,Not enough data,15.46839871
Sbcape,2203.97617778,3159.56809086
Sbcin,-25.2617829901,-45.0430330816
SbLCL,320,240
SbLFC,1120.39871,1443.39871
SbEL,214.374492767,196.149400393
SbCT,Not enough data,15.84339871
0-1km SRH,458.349258678,471.772033724
10m-1km SRH,329.149187093,295.907072991
0-3km SRH,463.618101844,581.250777417
10m-3km SRH,334.418030258,405.385816683
0-6km SRH,587.783727309,530.668022483
10m-6km SRH,458.583655724,354.803061749
SFC-1Km EHI,5.04120983549,5.18538543392
SFC-3km EHI,5.1219068113,7.1038575986
SFC-6Km EHI,7.02361277587,6.21746080521
0-1km bulk shear (m/s),25.6914044845,18.5250715019
0-3km bulk shear (m/s),29.6908256645,24.2810028978
0-6km bulk shear (m/s),21.7298524954,20.0096587186
0-9km bulk shear (m/s),28.1866755643,30.6511667163

Thanks for any help in advance!

bartus11 · June 20, 2011, 3:18pm

awk -F, 'NR==1{n=FILENAME}NR==FNR{a[NR]=$0;next}FNR==1{n=n","FILENAME}{a[FNR]=a[FNR]","$2}END{print n;for (i=1;i<=FNR;i++){print a}}' *.csv > out.csv

Corona688 · June 20, 2011, 3:20pm

Not quite sure what you want with the file headers... Do you want all but the last ignored? All but the first? None of them ignored?

This can be adapted for any of those:

#!/bin/sh
cd /path/to/csvdir

(
        # set $1 = first file, $2 = second, etc.
        set -- *.csv

        # echo $1 $2 $3 ... $N changing space to comma with tr
        echo $* | tr ' ' ','

        # Print the first two lines (title) of first file
        head -n 2 $1

        # Loop through all files in $1, $2, ...
        for FILE
        do
                # Print out every file, ignoring first two lines
                tail "$FILE" -n +3
        done
# Dump the whole mess into the output file.  Don't put it in the
# same directory as the input files!
) > /path/to/output.csv

alister · June 20, 2011, 3:45pm

I believe the OP is interested in merging like lines in files in a manner congruent to a paste(1) or join(1), and not cat(1). My understanding is that none of the lines in any file need to be skipped and that the first comma-delimited field is a key (although since all files use the same order, a solution may not need to inspect it) which needs to be removed from all but the first file's line to avoid redundant fields in each line in the final output.

Regards,
Alister

---------- Post updated at 03:45 PM ---------- Previous update was at 03:38 PM ----------

Perhaps something along these lines:

paste *.csv | sed 's/<tab>[^,]*//g'

Where <tab> represents a literal tab character and I have assumed that there are no tabs in your source files. If there are, you can choose a suitable delimiter by informing paste with the -d option and using it instead of a tab in the sed command.

Regards,
Alister