awk script to split a file based on the condition

superprogrammer · June 13, 2005, 8:50am

I have a file with records like this:

4234234 US phone
3244234 US cup
2342342 CA phone
8947234 US phone
2389472 CA cup
2348972 US maps
3894234 CA phone

I want the records with the pair (US, phone) to go into one file, those with (US, cup) into another, those with (CA, cup) into another, and so on.
That is, all records whose last two fields form the same pair should end up in the same file.
Is this possible in awk?

vino · June 13, 2005, 9:01am

How about this?

awk '{ print $0 >> ($2 $3 ".txt") }' input.txt

All US cup records will go into the file UScup.txt, all US phone records into USphone.txt, and so on.

Vino
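
For anyone trying this on the sample records from the question, here is a minimal sketch. The scratch directory split_demo is an assumption made for the demo, not something from the thread. The output name is built in a variable so the concatenation is unambiguous across awk implementations, and close() is called after each write so that an input with many distinct pairs does not run into the open-file limit.

```shell
#!/bin/sh
# Scratch directory for the demo (the name split_demo is an assumption).
rm -rf split_demo && mkdir split_demo

cat > split_demo/input.txt <<'EOF'
4234234 US phone
3244234 US cup
2342342 CA phone
8947234 US phone
2389472 CA cup
2348972 US maps
3894234 CA phone
EOF

# Build the target name from fields 2 and 3, append the record, then
# close the file so that only one output file is open at a time.
(
  cd split_demo &&
  awk '{ out = $2 $3 ".txt"; print $0 >> out; close(out) }' input.txt
)

cat split_demo/USphone.txt
```

The last command should print the two US phone records. Because >> appends, the scratch directory is emptied at the start so that a second run does not duplicate lines.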

superprogrammer · June 14, 2005, 12:56am

Thanks vino, it worked.
I have one more question.
Suppose the file looks like this:
WSRTK10000000000000067839904809787489959595924667889USMNC
WSRTK10000893479900006783990480978748995959592466673CNATT
WSRTK10000893472387462342349899000067839904809787455USAPT
The last 5 characters of each line are my search pattern, and the problem is otherwise the same.
I want lines ending in a pattern like USMNC to go to the file US_MNC, and so on.
Can I extract the last few characters of each line in awk?

vino · June 14, 2005, 1:04am

Ah. The plot thickens!

Is it always the last 5 characters?

Vino

superprogrammer · June 14, 2005, 1:13am

Yes, and each line has the same number of characters: say the pattern always starts after 300 characters in each line and is exactly 5 characters long.

vino · June 14, 2005, 1:24am

How about this?

sed -e 's/\(.*\)\([A-Z][A-Z]\)\([A-Z][A-Z][A-Z]\)/\1 \2 \3/p' list.txt | awk '{ printf $0 >> $2_$3.txt }'

In this case, each line in the output files will be split by spaces into three parts: the long leading stretch of characters, then the two-letter code (US, for example), then the last 3 characters.

Not 100% right, but close. I have to figure out why.

Vino
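
Some possible reasons the pipeline above is only close, offered as a sketch and not a confirmed diagnosis: without -n, the p flag makes sed print every substituted line twice; inside awk, a bare _ is an (empty) variable and not a literal underscore, so the underscore and the .txt extension both need to be quoted strings; and printf $0 writes no trailing newline. A variant with those points addressed might look like this. The scratch directory sed_demo is an assumption for the demo, and the $ anchor in the pattern is an addition.

```shell
#!/bin/sh
# Scratch directory for the demo (the name sed_demo is an assumption).
rm -rf sed_demo && mkdir sed_demo

cat > sed_demo/list.txt <<'EOF'
WSRTK10000000000000067839904809787489959595924667889USMNC
WSRTK10000893479900006783990480978748995959592466673CNATT
WSRTK10000893472387462342349899000067839904809787455USAPT
EOF

# sed splits the trailing 2+3 letters off with spaces (no p flag, so each
# line is printed once); awk then uses fields 2 and 3 for the file name.
(
  cd sed_demo &&
  sed -e 's/\(.*\)\([A-Z][A-Z]\)\([A-Z][A-Z][A-Z]\)$/\1 \2 \3/' list.txt |
  awk '{ out = $2 "_" $3 ".txt"; print $0 >> out; close(out) }'
)

cat sed_demo/US_MNC.txt
```

As described in the post, the lines written to the output files are the space-delimited form, not the original lines.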

vino · June 14, 2005, 1:53am

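Here, this works fine.

#! /bin/sh

# Append each line of list.txt to a file named after its last 5 characters.
while IFS= read -r line
do
    # Turn a trailing pattern such as USAPT into the name US_APT
    name=`printf '%s\n' "$line" | sed -n -e 's/\(.*\)\([A-Z][A-Z]\)\([A-Z][A-Z][A-Z]\)$/\2_\3/p'`
    printf '%s\n' "$line" >> "$name.txt"
done < list.txt

Here $name takes values like US_APT, and list.txt is your input file.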

Vino
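
The same loop can also be sketched without starting sed for every line, using POSIX parameter expansion to peel off the last 5 characters. This is an alternative and not what the post above does. It assumes every line has at least 5 characters, and the scratch directory loop_demo and the variable names are made up for the demo.

```shell
#!/bin/sh
# Scratch directory for the demo (the name loop_demo is an assumption).
rm -rf loop_demo && mkdir loop_demo

cat > loop_demo/list.txt <<'EOF'
WSRTK10000000000000067839904809787489959595924667889USMNC
WSRTK10000893479900006783990480978748995959592466673CNATT
WSRTK10000893472387462342349899000067839904809787455USAPT
EOF

(
  cd loop_demo || exit 1
  while IFS= read -r line
  do
      prefix=${line%?????}        # everything except the last 5 characters
      suffix=${line#"$prefix"}    # the last 5 characters, e.g. USMNC
      country=${suffix%???}       # first 2 of those, e.g. US
      code=${suffix#??}           # last 3 of those, e.g. MNC
      printf '%s\n' "$line" >> "${country}_${code}.txt"
  done < list.txt
)

ls loop_demo
```

Unlike the sed version, this does not check that the last 5 characters are uppercase letters; it simply takes whatever is there.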

Ygor · June 14, 2005, 2:29am

awk '{print > (substr($0,length-4,2) "_" substr($0,length-2) ".txt")}' infile

vino · June 14, 2005, 2:34am

Shouldn't it be
awk '{print $0 >> (substr($0,length-4,2) "_" substr($0,length-2) ".txt")}' infile

Vino

Ygor · June 14, 2005, 2:41am

No it shouldn't.

vino · June 14, 2005, 2:44am

Is it that print outputs $0 by default?

And I don't understand why you don't need to append to the existing files. Otherwise, wouldn't the entries already written be overwritten? What am I missing here?

vino

Ygor · June 14, 2005, 3:06am

See Printing.
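
For readers without the manual at hand: in awk, > truncates an output file only the first time that file is opened during a run. Later prints to the same name go to the already-open file, so lines written earlier in the same run are not overwritten. With >>, the only difference is that whatever the file contained before awk started is kept. A small sketch of that behaviour follows. The three short input lines, the scratch directory redir_demo and the snapshot file name are made up for the demo.

```shell
#!/bin/sh
# Scratch directory and made-up input (both are assumptions for the demo).
rm -rf redir_demo && mkdir redir_demo
printf '%s\n' AAAUSMNC BBBCNATT CCCUSMNC > redir_demo/infile

(
  cd redir_demo || exit 1
  # Two runs with ">": each run starts every output file from scratch,
  # but both USMNC lines of a single run land in US_MNC.txt.
  awk '{ print > (substr($0, length($0)-4, 2) "_" substr($0, length($0)-2) ".txt") }' infile
  awk '{ print > (substr($0, length($0)-4, 2) "_" substr($0, length($0)-2) ".txt") }' infile
  cp US_MNC.txt after_truncate.snapshot
  # One more run with ">>": the previous contents are kept and added to.
  awk '{ print >> (substr($0, length($0)-4, 2) "_" substr($0, length($0)-2) ".txt") }' infile
)

grep -c '' redir_demo/after_truncate.snapshot   # line count after the two ">" runs: 2
grep -c '' redir_demo/US_MNC.txt                # line count after the ">>" run: 4
```

So the choice between the two operators only matters for what is already in the files when the script starts.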

maykap100 · June 14, 2005, 3:59am

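#!/bin/sh
# Route each line of input.txt to a file named after its last 5 characters.
while IFS= read -r data
do
    printf '%s\n' "$data" > temp.txt
    # wc -c also counts the trailing newline, so length is one more than the text length
    length=`wc -c < temp.txt | tr -d ' '`
    # the last 5 characters sit at positions length-5 through length-1
    start=`expr $length - 5`
    end=`expr $length - 1`
    file_name=`cut -c "$start-$end" temp.txt`
    printf '%s\n' "$data" >> "$file_name.txt"
done < input.txt
rm -f temp.txt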