Filtering Issues Using sed and awk

Hi,

I am currently using sed and awk to filter a file that has multiple sets of data in different columns. An example of part of the file I am filtering is as follows:

Sat Oct  2 07:42:45 2010    01:33:46 R1_CAR_12.34
Sun Oct  3 13:09:53 2010    00:02:34 R2_BUS_56.78
Sun Oct  3 21:11:39 2010    00:43:21 R3_TRAIN_COACH_90.12
Mon Oct  4 06:07:10 2010    00:01:50 R4_TRAIN_CARRAIGE_34.56X

When I filter the file, I get the following result:

Sat,Oct,2,2010,01:33:46,CAR,
Sun,Oct,3,2010,00:02:34,BUS,
Sun,Oct,3,2010,00:43:21,TRAIN,
Mon,Oct,4,2010,00:01:50,TRAIN,X

The sed and awk commands I am using are as follows:

sed 's/[^ \t][^ \t]*[ \t]//4;s/[^ \t_]*_//;s/_.*\(.\)$/ \1/;s/[^X]$//' | awk '{print $1","$2","$3","$4","$5","$6","$7}'

I am trying to figure out how to filter the data so that, for example, instead of getting:

Sat,Oct,2,2010,01:33:46,CAR,
Sun,Oct,3,2010,00:02:34,BUS,
Sun,Oct,3,2010,00:43:21,TRAIN,
Mon,Oct,4,2010,00:01:50,TRAIN,X

I would like to get:

Sat,Oct,2,2010,01:33:46,CAR,
Sun,Oct,3,2010,00:02:34,BUS,
Sun,Oct,3,2010,00:43:21,COACH,
Mon,Oct,4,2010,00:01:50,CARRAIGE,X

Could I use the sed command twice, so that I would get:

Sat Oct  2 07:42:45 2010    01:33:46 CAR
Sun Oct  3 13:09:53 2010    00:02:34 BUS
Sun Oct  3 21:11:39 2010    00:43:21 TRAIN_COACH
Mon Oct  4 06:07:10 2010    00:01:50 TRAIN_CARRAIGE X

first, and then use the sed command again to remove the "TRAIN_" part to get:

Sat Oct  2 07:42:45 2010    01:33:46 CAR
Sun Oct  3 13:09:53 2010    00:02:34 BUS
Sun Oct  3 21:11:39 2010    00:43:21 COACH
Mon Oct  4 06:07:10 2010    00:01:50 CARRAIGE X

This is only a suggestion; there is probably a much better method.
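For illustration, the two-pass idea above could be sketched as one sed call with several -e expressions, so the file is only read once. This is a rough sketch, not a tested solution for the whole file: the function name strip_names is made up, and it assumes the last field always starts with "R" plus digits and that the names are upper-case letters only.

```shell
# Hypothetical helper name; reads the log from stdin or from file arguments.
strip_names() {
    # 1) drop the leading "R<digits>_" prefix of the last field
    # 2) turn a "_<version><letter>" tail into " <letter>"
    # 3) drop a "_<version>" tail that has no trailing letter
    # 4) drop one leading word such as "TRAIN_" in front of the name
    sed -e 's/ R[0-9]*_/ /' \
        -e 's/_[0-9.]*\([A-Za-z]\)$/ \1/' \
        -e 's/_[0-9.]*$//' \
        -e 's/ [A-Z]*_\([A-Z]*\)/ \1/' "$@"
}

printf '%s\n' \
    'Sat Oct  2 07:42:45 2010    01:33:46 R1_CAR_12.34' \
    'Mon Oct  4 06:07:10 2010    00:01:50 R4_TRAIN_CARRAIGE_34.56X' | strip_names
```

On these two sample lines the last field should come out as "CAR" and "CARRAIGE X", and the result could then be piped into the existing awk command.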

Unfortunately I am new to Unix, so I am only just getting used to all the commands.

If I have made anything unclear, please let me know and I will try to explain the problem better.

Any help would be greatly appreciated.

Thanks in advance.

nawk '{n=split($NF,a,"[_.]");print $1,$2,$3,$5,$6,a[n-2],(/[A-Za-z]$/)?substr($0,length):""}' OFS=, myFile

Hi vgersh,

The nawk command is working perfectly. Is there any way to add a comma as a delimiter between the different sets of data? At the moment I am getting

SatOct2201000:30:21CAR
SatOct2201000:30:24BUS
SatOct2201000:33:14COACH
SatOct2201000:41:51CARRAIGEX

but I would like to get the following instead:

Sat,Oct,2,2010,00:30:21,CAR,
Sat,Oct,2,2010,00:30:24,BUS,
Sat,Oct,2,2010,00:33:14,COACH,
Sat,Oct,2,2010,00:41:51,CARRAIGE,X

Thanks in advance

Based on the sample file you provided, this is the output I get:

Sat,Oct,2,2010,01:33:46,CAR,
Sun,Oct,3,2010,00:02:34,BUS,
Sun,Oct,3,2010,00:43:21,COACH,
Mon,Oct,4,2010,00:01:50,CARRAIGE,X

Don't forget the OFS=, in the code I've posted!
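To show what OFS does, here is a small sketch using one of the sample lines from earlier in the thread. It assumes a plain awk is available and sets the variable with -v OFS=, instead of the trailing OFS=, before the file name; both are meant to have the same effect on print.

```shell
# One sample line from the data posted above.
line='Sat Oct  2 07:42:45 2010    01:33:46 R1_CAR_12.34'

# With the default OFS, the comma in "print a, b" becomes a single space.
printf '%s\n' "$line" | awk '{print $1, $2, $3}'            # Sat Oct 2

# With OFS set to a comma, the same print joins the fields with commas.
printf '%s\n' "$line" | awk -v OFS=, '{print $1, $2, $3}'   # Sat,Oct,2
```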

Hi,

A problem has cropped up. Whenever the program tries to filter the following line:

Mon Oct 11 15:07:16 2010    00:01:30 R3_TRAIN_COACH_12.1.2X

I get the following output:

Mon,Oct,11,2010,00:01:30,12,X

Is there any way to alter the code so that COACH is output instead of the 12?

The command that I am using is as follows:

nawk '{n=split($NF,a,"[_.]");print $1,$2,$3,$5,$6,a[n-2],(/[A-Za-z]$/)?substr($0,length):""}' OFS=, $FileName

Any help would be greatly appreciated

Thanks in advance
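One possible way around this, offered only as a sketch and not as the fix from the original reply: the command above splits the last field on both "_" and ".", so a[n-2] only lands on the name when the version contains exactly one dot. Splitting on "_" alone and taking the second-to-last piece avoids counting dots at all. The function name filter_log is made up, plain awk is assumed in place of nawk, and the trailing letter is taken from the last field rather than from the whole line.

```shell
# Hypothetical helper; reads the log from stdin or from file arguments.
filter_log() {
    awk -v OFS=, '{
        # R3_TRAIN_COACH_12.1.2X -> R3, TRAIN, COACH, 12.1.2X (n = 4)
        n = split($NF, a, "_")
        # Keep the last character only when it is a letter, e.g. the X.
        flag = ($NF ~ /[A-Za-z]$/) ? substr($NF, length($NF)) : ""
        # a[n-1] is the piece just before the version number.
        print $1, $2, $3, $5, $6, a[n-1], flag
    }' "$@"
}

printf '%s\n' \
    'Sun Oct  3 13:09:53 2010    00:02:34 R2_BUS_56.78' \
    'Mon Oct 11 15:07:16 2010    00:01:30 R3_TRAIN_COACH_12.1.2X' | filter_log
# Sun,Oct,3,2010,00:02:34,BUS,
# Mon,Oct,11,2010,00:01:30,COACH,X
```

This still relies on the name never containing an underscore of its own beyond the leading word, which holds for the sample lines posted here.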