Create csv from four disparate files

Hey everyone,
I'm a newbie when it comes to scripting and have lurked on these pages for some time as I slowly learn. I'd like to thank everyone for taking the time to share their knowledge, and I'd like to think I've picked up something from these forums.
What I'm trying to do is create a CSV from four disparate files containing data.

Here's my current script. It's meant to handle multiple files from multiple hosts, hence the loop.
I am limited to the tool set on AIX, and cannot use any GNU tools.

#!/bin/ksh
set -A allHosts $(ls -1 *.out | awk -F\. '{print $1}' | sort -u | uniq)
echo "host,accounts,max,pwd,standard"
for h in ${allHosts[@]}; do
        col1=$(cat ./${h}.account.out)
        col2=$(cat ./${h}.max.out)
        col3=$(cat ./${h}.pwd.out)
        col4=$(cat ./${h}.standard.out |  sed -e 's/^[ \t]*//')
        echo "${h}","${col1}","${col2}","${col3}","${col4}"
        unset col1
        unset names
        unset col2
        unset col3
        unset col4
done

Files are set up like:
host1.max.out

9

host1.account.out

root
user1
user2

host1.pwd.out

abc pwd=

host1.standard.out

standard:
        user = true
        expires = 0
        core_path = on
        default_roles =

Results I'm currently receiving:

host,accounts,max,pwd,standard
host1,root
user1
user2,9,abc pwd=,standard:
user = true
expires = 0
core_path = on
default_roles =
 

Desired output:

host,accounts,max,pwd,standard
host1,root,9,abc pwd=,standard:
,user1,,,user = true
,user2,,,expires = 0
,,,,core_path = on
,,,,default_roles =

Thank you for your time, and my apologies if this has been asked before; my searching abilities have failed me.

Try the code below and integrate it into your script.

#!/bin/sh
# By default, array assignment splits on whitespace, but we need whole lines,
# so each space is replaced with _SPACE_ and restored when printing.
F1=( `cat host1.account.out | sed 's/ /_SPACE_/g'  ` )
F2=( `cat host1.max.out | sed  's/ /_SPACE_/g'  ` )
F3=( `cat host1.pwd.out  | sed  's/ /_SPACE_/g'  ` )
F4=( `cat host1.standard.out | sed  's/ /_SPACE_/g'  ` )
h='host1'
i=0
# Iterate up to the line count of the longest file (assumed here to be F4, the standard file)
while [ $i -lt ${#F4[@]} ]
do
        echo "$h",${F1[$i]},${F2[$i]},${F3[$i]},${F4[$i]} |  tr -d '\r' | sed -E 's/(_SPACE_)+/ /g'
        i=`expr $i + 1`
done
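
The loop above stops at the length of F4, which only works while the standard file is the longest of the four. Here is a minimal sketch of finding the longest file first, assuming a POSIX shell with `wc` and `tr`. The `demo` directory, the sample files written into it, and the `max_lines` variable are illustrative names, not something from the thread.

```shell
#!/bin/sh
# Sketch: compute the largest line count among one host's files, so the
# loop bound does not depend on a particular file being the longest.
# Assumption: sample data is written to a throwaway directory for the demo.
demo=${TMPDIR:-/tmp}/maxlines_demo.$$
mkdir "$demo" || exit 1
printf 'root\nuser1\nuser2\n' > "$demo/host1.account.out"
printf '9\n' > "$demo/host1.max.out"
printf 'abc pwd=\n' > "$demo/host1.pwd.out"
printf 'standard:\n\tuser = true\n\texpires = 0\n\tcore_path = on\n\tdefault_roles =\n' > "$demo/host1.standard.out"

max_lines=0
for f in "$demo"/host1.*.out; do
    # Read from stdin so wc prints only the number; tr strips any padding.
    n=$(wc -l < "$f" | tr -d ' ')
    if [ "$n" -gt "$max_lines" ]; then
        max_lines=$n
    fi
done
echo "longest file has $max_lines lines"

rm -rf "$demo"
```

With the sample data from the question this reports 5, the length of the standard file. The result could then replace `${#F4[@]}` as the loop bound.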

How about

IFS=. read H REST <<EOF
$(echo *.out)
EOF
echo "host,accounts,max,pwd,standard"; echo $H | paste -d, - *.out
host,accounts,max,pwd,standard
host1,root,9,abc pwd=,standard:
,user1,,,        user = true
,user2,,,        expires = 0
,,,,        core_path = on
,,,,        default_roles =

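Pipe the result through sed 's/, \+/,/' to get rid of the residual spaces from the *standard*.out file. Note that \+ is a GNU sed extension; with a strictly POSIX sed, 's/,  */,/' (comma, two spaces, asterisk) does the same job.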
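
For anyone unfamiliar with `paste`: it takes one line from each input per output row and joins them with the delimiter, leaving a field empty once a shorter input runs out. The `-` operand stands for standard input, which is how the host name ends up in the first column. A small sketch, assuming a POSIX shell; the `demo` directory is a throwaway location and the file contents are the sample data from the question with the indentation already removed.

```shell
#!/bin/sh
# Sketch: show how paste lines up files of different lengths.
# Assumption: sample files are created in a scratch directory for the demo.
demo=${TMPDIR:-/tmp}/paste_demo.$$
mkdir "$demo" || exit 1
printf 'root\nuser1\nuser2\n' > "$demo/host1.account.out"
printf '9\n' > "$demo/host1.max.out"
printf 'abc pwd=\n' > "$demo/host1.pwd.out"
printf 'standard:\nuser = true\nexpires = 0\ncore_path = on\ndefault_roles =\n' > "$demo/host1.standard.out"

# The glob expands alphabetically: account, max, pwd, standard.
# The cd happens inside the command substitution, so the caller's
# working directory is untouched.
result=$(cd "$demo" && echo host1 | paste -d, - host1.*.out)
echo "$result"

rm -rf "$demo"
```

The first row comes out as `host1,root,9,abc pwd=,standard:` and the last as `,,,,default_roles =`, because by then every input except the standard file has run dry.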


Oh, get max lines and iterate! I like! YES!!!!

I seem to be having a hard time getting this to process multiple files from multiple hosts. If there are only host1.*.out files it's fine, but as soon as host2.*.out files exist it gets all funky.

Example when there are host2.*.out files:

host,accounts,max,pwd,standard
host1,root,9,abc pwd=,standard:,root,9,abc pwd=,standard:
,user1,,,        user = true,user1,,,        user = true
,user2,,,       expires = 0,user2,,,    expires = 0
,,,,    core_path = on,,,,      core_path = on
,,,,    default_roles =,,,,     default_roles =

Thank you both!

Combining your original script with k_manimuthu's solution you might end up with something like this:

#!/bin/ksh
set -A allHosts $(ls -1 *.out | awk -F\. '{print $1}' | sort -u | uniq)
echo "host,accounts,max,pwd,standard"
for h in ${allHosts[@]}; do
    OLDIFS="$IFS"
    IFS="
"   # Actual newline in quotes
    set -A F1 `cat ${h}.account.out`
    set -A F2 `cat ${h}.max.out`
    set -A F3 `cat ${h}.pwd.out`
    set -A F4 `sed -e 's/^ *//' -e 's/  */ /' ${h}.standard.out`
    IFS="$OLDIFS"
    i=0
    # Iterate up to the line count of the longest file (assumed here to be F4, the standard file)
    while [ $i -lt ${#F4[@]} ]
    do
        echo "$h",${F1[$i]},${F2[$i]},${F3[$i]},${F4[$i]}
        let i=i+1
        h="" # Blank host for 2nd and subsequent lines
    done
done
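
The key trick in the script above is setting IFS to a lone newline, so the unquoted command substitution splits into whole lines instead of breaking at every blank. Here is a minimal sketch of that idea on its own, using the positional parameters instead of a ksh array so it runs in any POSIX shell. The sample text and the `default_count`, `line_count` and `first` variables are illustrative.

```shell
#!/bin/sh
# Sketch: compare default word splitting with newline-only splitting.
data='abc pwd=
user = true
expires = 0'

# Default IFS (space, tab, newline): every blank starts a new field.
set -- $data
default_count=$#

# IFS set to a single newline: each line becomes one field.
OLDIFS=$IFS
IFS='
'
set -- $data
IFS=$OLDIFS
line_count=$#
first=$1

echo "default IFS: $default_count fields"
echo "newline IFS: $line_count fields, first is '$first'"
```

The default split yields 8 fields, the newline-only split yields 3, and the first of those is the intact line `abc pwd=`. Restoring IFS straight afterwards, as the script above does, keeps the rest of the script behaving normally.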

OK, multiple hostn files. Try

for FN in *.out
  do    H=${FN%%.*}
        if [ ! "$H" = "$OH" ]
          then  OH=$H
                {
                echo "host,accounts,max,pwd,standard"
                echo "$H" | paste -d, - "$H".*.out | sed 's/,  */,/'
                } > $H.result
        fi
  done
---------- host1.result: ----------

host,accounts,max,pwd,standard
host1,root,9,abc pwd=,standard:
,user1,,,user = true
,user2,,,expires = 0
,,,,core_path = on
,,,,default_roles =

---------- host2.result: ----------

host,accounts,max,pwd,standard
host2,root,9,abc pwd=,standard:
,user1,,,user = true
,user2,,,expires = 0
,,,,core_path = on
,,,,default_roles =

Chubler_XL,
Thank you for taking the time to reply. I think I may have kludged something together like this, but not as neat as your solution. I have way too many temp files floating about at the moment from trying out different trains of thought.

RudiC,
Could you please break it down for me what is going on here:

for FN in *.out
  do    H=${FN%%.*}
        if [ ! "$H" = "$OH" ]
          then  OH=$H

I sorta understand it: assign FN to H and strip off everything until the .out?
And I'm sorry, my mind is failing at the if statement there. Trying to pick up what I can :)
Thank you.

for FN in *.out                                 # cycle through ALL *.out files (you could narrow it down somewhat by using *account.out)
  do    H=${FN%%.*}                             # use "parameter expansion" to remove everything beyond first
                                                #   dot (= beyond "host_n") and assign to H
        if [ ! "$H" = "$OH" ]                   # check if new host_n (different from old H)
          then  OH=$H                           # save new $H into old H, and
                {
                echo "host,accounts,max,pwd,standard"                   # print header
                echo "$H" | paste -d, - "$H".*.out | sed 's/,  */,/'    # create csv fields / lines

                } > $H.result                   # save all output to result file
        fi
  done
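
To see the two pieces in isolation, here is a small sketch, assuming a POSIX shell. It applies the `%%` and `%` expansions to one file name, then runs the same "remember the previous host" test over a fixed list of names made up to match the thread's naming. The `long`, `short` and `hosts` variables are illustrative.

```shell
#!/bin/sh
# Sketch: suffix-stripping parameter expansion.
FN=host1.account.out
long=${FN%%.*}     # %% removes the LONGEST suffix matching ".*"  -> host1
short=${FN%.*}     # %  removes the SHORTEST suffix matching ".*" -> host1.account
echo "$long"
echo "$short"

# Sketch: act only when the host part changes from one name to the next.
# This relies on the names arriving sorted, as a glob like *.out delivers them.
OH=
hosts=
for FN in host1.account.out host1.max.out host2.account.out host2.max.out; do
    H=${FN%%.*}
    if [ ! "$H" = "$OH" ]; then    # true only for the first file of each host
        OH=$H
        hosts="$hosts $H"
    fi
done
echo "hosts seen:$hosts"
```

The last line prints `hosts seen: host1 host2`: four file names go in, but the body of the `if` runs only twice, once per host.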

RudiC,
Thank you.
My head was failing at using a new variable within the if statement.
I've a lot to learn when it comes to this stuff.

Most of the time I smash on something, poke about this and other sites until I come up with something that works.