Awk: merge the 4th column of multiple lines onto a single line

Vasan · September 30, 2011, 4:27pm

This is related to one of my previous posts. I have a huge file. Currently I use a loop to read the file and check each line to build a single record, and it takes a very long time to parse the records. I thought there should be a way to do this in awk or sed.

I found this code in this forum and I think it's close to my request.

nawk 'BEGIN {FS="|"}END{for(r in _)print r FS _[r]}{idx=$1 FS $2;_[idx]=_[idx]?_[idx] FS $3:$3}' myFile

I changed it to fit my request but I can't get it to work. Can anyone please help me with this? Much appreciated.

Input file:

XXXXXXXXXX1|07/24/2007|1|aaaaaaaaaaabbbbbbbbccccccccccccc
XXXXXXXXXX1|07/24/2007|2|sometxt
XXXXXXXXXX1|07/30/2007|1|some_random_text
XXXXXXXXXX1|07/30/2007|2|new_random.
XXXXXXXXXX1|09/27/2007|1|some_nre_random_test
XXXXXXXXXX1|09/27/2007|2|blabla
XXXXXXXXXX1|09/27/2007|3|fixed_text_random
XXXXXXXXXX1|11/14/2007|1|blabla
XXXXXXXXXX1|11/28/2007|1|junk_text
XXXXXXXXXX2|12/21/2007|1|Notes

I am looking for output something like:

Out:

XXXXXXXXXX1|07/24/2007|aaaaaaaaaaabbbbbbbbccccccccccccc|sometxt
XXXXXXXXXX1|07/30/2007|some_random_text|new_random.
XXXXXXXXXX1|09/27/2007|some_nre_random_test|blabla|fixed_text_random
XXXXXXXXXX1|11/14/2007|blabla
XXXXXXXXXX1|11/28/2007|junk_text
XXXXXXXXXX2|12/21/2007|Notes

ieth0 · September 30, 2011, 5:15pm

cat FILENAME|awk -F"|" '{print $1,$2,$4}' |tr " " "\|"

Vasan · September 30, 2011, 5:39pm

Thanks.

But I am not getting the right output.

Here is what I got:

$ cat FILENAME|awk -F"|" '{print $1,$2,$4}' |tr " " "\|"
XXXXXXXXXX1|07/24/2007|aaaaaaaaaaabbbbbbbbccccccccccccc
XXXXXXXXXX1|07/24/2007|sometxt
XXXXXXXXXX1|07/30/2007|some_random_text
XXXXXXXXXX1|07/30/2007|new_random.
XXXXXXXXXX1|09/27/2007|some_nre_random_test
XXXXXXXXXX1|09/27/2007|blabla
XXXXXXXXXX1|09/27/2007|fixed_text_random
XXXXXXXXXX1|11/14/2007|blabla
XXXXXXXXXX1|11/28/2007|junk_text
XXXXXXXXXX2|12/21/2007|Notes

Am I missing something?

Thanks

ieth0 · September 30, 2011, 6:05pm

No, I just misunderstood.
So you need to merge all records for each day into one line, separated by pipes. Maybe I can figure it out in one line; I need some time.

---------- Post updated at 05:05 PM ---------- Previous update was at 05:02 PM ----------

try this one:

awk 'BEGIN {FS="|"}END{for(r in _)print r FS _[r]}{idx=$1 FS $2;_[idx]=_[idx]?_[idx] FS $4:$4}' FILENAME |sort -k2
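
For anyone adapting this: the order of `for (r in _)` is not specified by awk, which is why the output is piped through sort. A minimal sketch of the same grouping that keeps the input order instead is shown here. The function name merge_by_key and the array names are made up for illustration and are not from the thread.

```shell
# merge_by_key: hypothetical helper name, not from the thread.
# Groups on columns 1 and 2 and joins column 4 with "|", in input order.
merge_by_key() {
  awk -F'|' '
    {
      key = $1 FS $2                 # group on columns 1 and 2
      if (!(key in text)) {          # first time this key is seen
        order[++n] = key             # remember the input order
        text[key] = $4
      } else {
        text[key] = text[key] FS $4  # append column 4 to the group
      }
    }
    END {
      for (i = 1; i <= n; i++)
        print order[i] FS text[order[i]]
    }
  ' "$@"
}

# Usage with three lines from the sample input (reads stdin when no file is given):
printf '%s\n' \
  'XXXXXXXXXX1|07/24/2007|1|aaaaaaaaaaabbbbbbbbccccccccccccc' \
  'XXXXXXXXXX1|07/24/2007|2|sometxt' \
  'XXXXXXXXXX1|07/30/2007|1|some_random_text' |
  merge_by_key
```

Testing with `key in text` rather than the truth value of the stored string also keeps a group intact when its first column-4 value is empty or "0".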

Vasan · September 30, 2011, 7:11pm

Thanks.

I got the results; it looks like they are grouped by date, but I would like the following output.

Sample Input

XXXXXXXXXX1|07/24/2007|1|aaaaaaaaaaabbbbbbbbccccccccccccc
XXXXXXXXXX1|07/26/2007|2|sometxt
XXXXXXXXXX1|07/30/2007|1|some_random_text
XXXXXXXXXX1|08/31/2007|2|new_random.
XXXXXXXXXX1|09/27/2007|3|some_nre_random_test

Required output:

The first record would be:

XXXXXXXXXX1|07/24/2007|aaaaaaaaaaabbbbbbbbccccccccccccc|sometxt

or

XXXXXXXXXX1|07/26/2007|aaaaaaaaaaabbbbbbbbccccccccccccc|sometxt

The second record would be:

XXXXXXXXXX1|07/30/2007|some_random_text|new_random.|some_nre_random_test

or

XXXXXXXXXX1|08/31/2007|some_random_text|new_random.|some_nre_random_test

or

XXXXXXXXXX1|09/27/2007|some_random_text|new_random.|some_nre_random_test

In the output the date can be any of the dates in the group.

E.g., for the second record the possible dates are 07/30, 08/31 and 09/27. We can have any one of them; what I am looking for is grouping by the sequence number (the next field).

Appreciate your help.

Thanks Again

alister · September 30, 2011, 8:47pm

awk -F\| 'd==$2 {printf("%s", FS$4); next} {d=$2; printf("%s", o$1FS$2FS$4)} NR==1 {o=ORS} END {print ""}' file

d = the date in the previous line's second field, $2
o = empty for the first line; for all subsequent lines it is set to the output record separator.

If the current line's date matches the previous line's, just print the field separator, |, followed by the value of the fourth field.

Otherwise the current line's date is different: set d to the new date and print the current line's fields of interest. If it is not the first line printed, print the output record separator before the current line, to terminate the previous record.

When done, print out one last record separator to cap the output.
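
The steps above can be laid out one rule per line. This is only a sketch of the same logic with comments; the function name join_by_date is made up for illustration. It ends with `print ""` because several awk implementations (gawk, for one) keep the last input line in $0 inside END, so a bare `print` there would repeat that line.

```shell
# join_by_date: hypothetical helper name, not from the thread.
join_by_date() {
  awk -F'|' '
    $2 == d {                  # same date as the previous line
      printf "%s", FS $4       # append "|" and column 4 to the open record
      next
    }
    {                          # the date changed, start a new record
      d = $2
      printf "%s", o $1 FS $2 FS $4
    }
    NR == 1 { o = ORS }        # after line 1, new records begin with a newline
    END { print "" }           # terminate the last record
  ' "$@"
}

# Usage: join_by_date file
```

Unlike the array approach, this holds only one record in memory at a time, but it assumes lines with the same date are adjacent in the input.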

Regards,
Alister

Vasan · October 1, 2011, 11:04pm

Thanks Alister

It works great.

Thanks Again.

---------- Post updated at 06:14 PM ---------- Previous update was at 08:09 AM ----------

Hi Alister,

Small clarification

When we append to the previous line, I do not want the field separator "|" to be added; it should appear only in the new record.

In other words, I need the field separator between col1 and col2 but not between the rest of the columns.

Is it possible?

I am looking for something like:

XXXXXXXXXX1|07/24/2007|aaaaaaaaaaabbbbbbbbccccccccccccc sometxt
XXXXXXXXXX1|07/30/2007|some_random_text new_random.some_nre_random_test

Thanks for your help.

---------- Post updated at 10:04 PM ---------- Previous update was at 06:14 PM ----------

I was able to figure it out.

Thanks.
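
The change that was actually made is not shown in the thread. One way it might look, as a sketch only: keep "|" between the first two columns and the first text value, and pass a separate joiner for the appended values. The function name join_by_date_sep and the awk variable sep are assumptions for illustration, and the grouping is still by date as in Alister's command.

```shell
# join_by_date_sep: hypothetical helper name, not from the thread.
# First argument is the joiner for appended column-4 values; any further
# arguments are input files (stdin is read when none are given).
join_by_date_sep() {
  sep=$1
  shift
  awk -F'|' -v sep="$sep" '
    $2 == d { printf "%s", sep $4; next }             # same date: joiner, then text
    { d = $2; printf "%s", o $1 FS $2 FS $4 }         # new date: start a record
    NR == 1 { o = ORS }
    END { print "" }                                  # terminate the last record
  ' "$@"
}

# Usage: join with a single space instead of "|"
printf '%s\n' \
  'XXXXXXXXXX1|07/24/2007|1|aaaaaaaaaaabbbbbbbbccccccccccccc' \
  'XXXXXXXXXX1|07/24/2007|2|sometxt' |
  join_by_date_sep ' '
```

Passing an empty string as the joiner concatenates the text values directly.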

durden_tyler · October 2, 2011, 1:14am

Using Perl:

$ cat f10
XXXXXXXXXX1|07/24/2007|1|aaaaaaaaaaabbbbbbbbccccccccccccc
XXXXXXXXXX1|07/26/2007|2|sometxt
XXXXXXXXXX1|07/30/2007|1|some_random_text
XXXXXXXXXX1|08/31/2007|2|new_random.
XXXXXXXXXX1|09/27/2007|3|some_nre_random_test
$ perl -F"\|" -lane 'if ($F[2] == 1){print $x if defined $x; $x="$F[0]|$F[1]|$F[3]"} else {$x.=$F[3]} END {print $x}' f10
XXXXXXXXXX1|07/24/2007|aaaaaaaaaaabbbbbbbbcccccccccccccsometxt
XXXXXXXXXX1|07/30/2007|some_random_textnew_random.some_nre_random_test
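
For comparison, roughly the same idea in awk, as a sketch: a line whose third field is 1 starts a new record, and every other line has its fourth field appended with no separator. The function name join_by_seq and the variable rec are made up for illustration.

```shell
# join_by_seq: hypothetical helper name, not from the thread.
join_by_seq() {
  awk -F'|' '
    $3 == 1 {                        # sequence restarts, so a new record begins
      if (rec != "") print rec       # flush the previous record, if any
      rec = $1 FS $2 FS $4           # keep the date of the first line
      next
    }
    { rec = rec $4 }                 # continuation line: append column 4 as is
    END { if (rec != "") print rec }
  ' "$@"
}

# Usage: join_by_seq f10
```

Like the Perl version, this keeps the date of the first line in each group and relies on the sequence number restarting at 1.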

tyler_durden