awk parsing file to create a database

Hi Guys,

I have a list of hotels stored in many different text files.

This list is kept in the following format:

20/03
Hotel:
The Bear Hotel
Honey Street 
Woodstock
UK
Tel:+44-xxxxxx
Rate: 100

21/03
Hotel:
The Bush Hotel
Nice Street
Farnham
UK
Tel:+44-xxxxxx
Rate: 90

22/03
Hotel:
The Bear Hotel
Honey Street 
Woodstock
UK
Tel:+44-xxxxxx
Rate: 100

I would like to write a script (using awk) that parses all the files containing this kind of data and produces an output file that lists each hotel only once, sorted by town.

Many thanks for your help and keep up the good work.
Cheers,
Fred

A quick way is to join the lines of each record with a semicolon as the field separator. This only works if there are no semicolons in the text; otherwise use a character that does not occur in the text.
The lines between the input records need to be completely empty, with no spaces. Try:

awk '{$1=$1}1' FS='\n' OFS=\; RS= infile | sort -t\; -k3,3 -u | awk '{$1=$1}1' ORS='\n\n' FS=\; OFS='\n'

This works if your sort can combine -u with the -k option.

--
Otherwise:

awk '{$1=$1}1' FS='\n' OFS=\; RS= infile | sort -t\; -k3,3 | awk '!A[$3=$3]++' ORS='\n\n' FS=\; OFS='\n'
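
To see why the sort keys are fields 3 and 5, it may help to look at what the first awk stage emits on its own. The sketch below wraps that stage in a function; the name flatten_records is just an illustration and not part of the commands above.

```shell
# flatten_records is a hypothetical helper name used only for this sketch.
# RS= (empty) switches awk to paragraph mode, FS='\n' makes every line a field,
# and {$1=$1} forces awk to rebuild the record using OFS (the semicolon).
flatten_records() {
  awk '{$1=$1}1' FS='\n' OFS=';' RS= "$@"
}

# Example, feeding one record on stdin:
#   printf '%s\n' '20/03' 'Hotel:' 'The Bear Hotel' 'Honey Street' \
#       'Woodstock' 'UK' 'Tel:+44-xxxxxx' 'Rate: 100' | flatten_records
# prints:
#   20/03;Hotel:;The Bear Hotel;Honey Street;Woodstock;UK;Tel:+44-xxxxxx;Rate: 100
```

So after flattening, field 3 is the hotel name and field 5 is the town, which is what sort -t\; -k3,3 and -k5,5 refer to.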

Are hotels with the same name in two or more towns possible? If so, try:

sort -t\; -k3,3 -k5,5 -u

Good point, or likewise with the second approach:

.... | sort -t\; -k3,3 -k5,5 | awk '!A[$3=$3, $5]++' ORS='\n\n' FS=\; OFS='\n'

A Perl alternative:

perl -00 -alnF'\n' -e '@{$h{"$F[2]$F[4]"}}=@F; END{for(sort {$h{$a}->[4] cmp $h{$b}->[4]} keys %h){print join "\n", @{$h{$_}}}}' freddie50.hotels
21/03
Hotel:
The Bush Hotel
Nice Street
Farnham
UK
Tel:+44-xxxxxx
Rate: 90

22/03
Hotel:
The Bear Hotel
Honey Street
Woodstock
UK
Tel:+44-xxxxxx
Rate: 100
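
Since the original question asked for many input files and output ordered by town, here is a rough awk-only sketch along the same lines. It is a sketch under assumptions, not a tested solution for the real data: it assumes every record has the hotel name on line 3 and the town on line 5, it keeps the first record seen for each town and name pair, and it strips trailing blanks so that "Honey Street " and "Honey Street" do not differ. The function name dedupe_hotels is made up for the example. It avoids the semicolon restriction because the records are never flattened.

```shell
# dedupe_hotels is a hypothetical name; pass it any number of input files.
dedupe_hotels() {
  awk '
    BEGIN { RS = ""; FS = "\n"; OFS = "\n" }    # paragraph mode, one line per field
    {
      # strip trailing blanks from every line of the record
      for (i = 1; i <= NF; i++) sub(/[ \t]+$/, "", $i)
      key = $5 SUBSEP $3                        # town first, then hotel name
      if (key in rec) next                      # keep only the first occurrence
      rec[key] = $0
      keys[++n] = key
    }
    END {
      # plain insertion sort on the keys (POSIX awk has no built-in sort)
      for (i = 2; i <= n; i++) {
        k = keys[i]
        for (j = i - 1; j >= 1 && keys[j] > k; j--) keys[j + 1] = keys[j]
        keys[j + 1] = k
      }
      # print records separated by one empty line
      for (i = 1; i <= n; i++) printf "%s%s\n", (i > 1 ? "\n" : ""), rec[keys[i]]
    }
  ' "$@"
}

# Usage (file names are placeholders):
#   dedupe_hotels hotels1.txt hotels2.txt > hotels.db
```

The insertion sort is fine for a few hundred hotels; for much larger lists, piping flattened records through sort as shown earlier is likely the better choice.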