AWK delimiting

barrro · September 4, 2002, 1:51pm

I have a directory of files DATA1,DATA2,DATA3, etc... each file has 1 column of data

Example:

File DATA1

name
date
time
time requested
approved

I need to change these to CSV files so I can import the data. I believe AWK is what i need but I'm new to AWK and can't seem to get the syntex correct.

awk -F , '{ print $1 }' DATA1 > DATA.TEST

can someone help me ???

Kelam_Magnus · September 4, 2002, 2:15pm

I used to have this problem when transferring client lists to Foxpro or Excel.

You are going to have to run a for loop or some kind of loop. Do all of your data files have the same number of fields? I keep thinking of the "join" function of vi. I know you can join X lines with the command in command mode "7 J". This will join 7 lines together, space delimited which should be fine for your purposes or you can then use AWK to fill in commas.

OR try this.

Forgive me if the logic is wrong this was off the top of my head.
The ; is for a new record. You can then use AWK to break it out 1 record per line from there. I think there is a way to do a CRLF but I don't remember right now.

i=1
for name in `cat file`
do
if [$i -le 8 ]
awk '{ print $1 "," }' >> file.out
i=i+1
else
awk '{ print ";" }' >> file.out
i=1
done

I will have to research this, but you probably can figure it out from here.

////EDIT HERE:

That first attempt doesn' t seem to work. Here is something else I have tried.

#!/bin/ksh

for name in `cat todd`
do
for i in `1 2 3 4 5 6 7 8`
do
awk '{ print $name "," }' >> file.out
if [$i -eq 8]
then
awk '{ print ";" }' >> file.out
fi
done
done

Although I can't get it to work either... Heheh

Optimus_P · September 4, 2002, 2:34pm

If your files only have 1 set of name/date/time/time requested/approved entries then this will work.

If you have multiple lines i would put a counter in teh foreach loop to restart every 6th line.

#! /usr/bin/perl -w

$TESTFILE="test.file";
$OUTPUT="new.file";
open (INPUT, $TESTFILE) || die "NO GO ($!)";
open (OUTPUT, ">${OUTPUT}") || die "NO GO ($!)";

foreach (<INPUT>) { chomp, push(@contents,$_); };

$_ = join(",",@contents);
print OUTPUT $_;;

[hostname]optimus_P$cat new.file
name,date,time,time requested,approved
[hostname]optimus_P$cat test.file
name
date
time
time requested
approved

peter.herlihy · September 5, 2002, 2:04am

Maybe a bit more clarity on what you're trying to do here....I took it differently to the others....

first file

name
date
time
time requested
approved

second file

john smith
20020808
12:00PM
12:01PM
Yes

Third File

comment for line 1
comment for line 2
comment for line 3
comment for line 4
comment for line 5

Then paste would work and you can specify your delimiter.
paste -s -d"," file1 file2 file3

Is this what you are after. Careful with awk...as there are spaces in your file and these will be treated as delimiters.

system · September 6, 2002, 12:13pm

Yes, need more clarity, but I think each file is an independent set of data, rather than needing a paste job, one line from each. On that assumption, the following code will create one comma-delimited line for each 5 lines of input. Each file is assumed to have a multiple of 5 lines. In the event that a file is incorrect and does not contain a multiple of 5 lines, output lines could be shifted for that file, but the FNR check will ensure alignment at start of next file. I assume an entire line represents one data column, such as "john smith" which includes a space.

awk '{\
if (FNR==1 && linecount!=0)
  {print ""
   linecount=0}
linecount++
if (linecount<5)
   printf "%s,",$0
else
  {print $0
   linecount=0}
}' DATA?

spieterman · September 6, 2002, 11:09pm

This worked for me in a similar situation as you described...

It works in two parts...

First the join command to create the delimited file from the two separate files. (The -j1, and the ~/file1 correlate to one another, and the -j2 and the ~/file2 correlate to one another):

join -a1 -t"|" -j1 1 -j2 1 -o 1.1 -o 1.2 /directory/path/here/file1 /directory/path/here/file2 > joinedfile.name

This join compares the first field of each file and joins them together if a match if found...Kind of like a query in a database matching keys. The -o specifies which fields are kept and their order in the final output file. So 1.1 would mean keep field one in file one, and 2.1 would mean keep field one in file two...

You can join multiple files this way by simply adding another line of code and specifying that the output concantenate (>>) to the end of the file you're building.

join -a1 -t"|" -j1 1 -j2 1 -o 1.1 -o 2.1 1.2 /directory/path/here/file3 /directory/path/here/file2 >> joinedfile.name

then the awk command. The delimiter is set using the switch \|:

awk -F"|" '{printf "%s\|%s\|\n",$1,$2}' joinedfile.name > finalfile.name

Let me know if this makes sense or if it even serves your purpose.