awk parsing file to create a database

system · March 22, 2016, 1:45pm

Hi Guys,

I have a list of hotels stored in many different text files.

This list is kept in the following format:

20/03
Hotel:
The Bear Hotel
Honey Street 
Woodstock
UK
Tel:+44-xxxxxx
Rate: 100

21/03
Hotel:
The Bush Hotel
Nice Street
Farnham
UK
Tel:+44-xxxxxx
Rate: 90

22/03
Hotel:
The Bear Hotel
Honey Street 
Woodstock
UK
Tel:+44-xxxxxx
Rate: 100

I would like to make a script (using awk) that parses all files containing this kind of data and produces an output file that lists each hotel only once, sorted by town.

Many thanks for your help and keep up the good work.
Cheers,
Fred

Scrutinizer · March 22, 2016, 3:00pm

A quick way is to use a semicolon as a temporary field separator. This only works if there are no semicolons in the text; otherwise, use a character that does not occur in the text.
The lines between the input records need to be completely empty; they must not contain any spaces. Try:

awk '{$1=$1}1' FS='\n' OFS=\; RS= infile | sort -t\; -k3,3 -u | awk '{$1=$1}1' ORS='\n\n' FS=\; OFS='\n'

This works if your sort can combine -u with the -k option, so that uniqueness is judged on the sort key only.
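On the empty-line requirement: if the separator lines might hold stray spaces, tabs, or carriage returns, one option is to trim trailing whitespace before awk sees the data. This is only a sketch; `clean_records` and `sample` are made-up names for illustration, and the sample data is a stand-in for the real files.

```shell
# Hypothetical helper: trim trailing whitespace (including \r) from every
# line, which also turns whitespace-only lines into truly empty ones.
clean_records() { sed 's/[[:space:]]*$//' "$@"; }

# Made-up sample: two records separated by a line holding a single space.
sample() { printf '%s\n' 'A' 'B' ' ' 'C' 'D'; }

sample | awk 'END{print NR}' RS=                   # prints 1 (records ran together)
sample | clean_records | awk 'END{print NR}' RS=   # prints 2
```

With something like this in place, `clean_records infile` could feed the first awk in the pipeline instead of passing infile to it directly.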

--
Otherwise:

awk '{$1=$1}1' FS='\n' OFS=\; RS= infile | sort -t\; -k3,3 | awk '!A[$3=$3]++' ORS='\n\n' FS=\; OFS='\n'
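In case the `!A[$3=$3]++` part looks cryptic: `A[key]++` yields the count before incrementing, so the negation is true only the first time a key is seen, and the default action prints the record. The `$3=$3` assignment leaves the value alone but makes awk rebuild the record with OFS. A small demonstration on made-up data, with `seen` in place of `A` and `|` as OFS so the rebuild is visible:

```shell
# Sample input is invented: three ';'-joined records, two sharing field 3.
printf '%s\n' 'a;b;Bear;d;Woodstock' 'x;y;Bear;z;Woodstock' 'p;q;Bush;r;Farnham' |
awk '!seen[$3=$3]++' FS=';' OFS='|'
# prints:
# a|b|Bear|d|Woodstock
# p|q|Bush|r|Farnham
```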

RudiC · March 22, 2016, 3:04pm

Are hotels with the same name in two or more towns possible? If so, try

sort -t\; -k3,3 -k5,5 -u

Scrutinizer · March 22, 2016, 3:11pm

Good point, or likewise with the second approach:

.... | sort -t\; -k3,3 -k5,5 | awk '!A[$3=$3, $5]++' ORS='\n\n' FS=\; OFS='\n'
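Note that these sort keys order the output by hotel name first. If the list should be ordered by town, as I read the original request, the keys could be swapped so that field 5 leads. A sketch of the whole pipeline wrapped in a function that takes any number of files; `hotels_by_town` and `seen` are made-up names, and it assumes every record has the eight-line layout shown in the first post:

```shell
# Hypothetical helper: hotels_by_town FILE...
hotels_by_town() {
    # Paragraph mode (RS=): one record per block, fields joined with ';'.
    awk '{$1=$1}1' FS='\n' OFS=';' RS= "$@" |
    # Town (field 5) is the primary key, hotel name (field 3) breaks ties.
    sort -t';' -k5,5 -k3,3 |
    # Keep the first record seen for each (name, town) pair and
    # turn the fields back into lines, with a blank line after each record.
    awk '!seen[$3=$3, $5]++' FS=';' OFS='\n' ORS='\n\n'
}
```

It could then be called as, say, `hotels_by_town *.txt > hotels.db`, where both file names are placeholders. Which of several duplicate entries survives depends on how sort orders lines with equal keys.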

Aia · March 23, 2016, 2:45am

perl -00 -alnF'\n' -e '@{$h{"$F[2]$F[4]"}}=@F; END{for(sort {$h{$a}->[4] cmp $h{$b}->[4]} keys %h){print join "\n", @{$h{$_}}}}' freddie50.hotels

21/03
Hotel:
The Bush Hotel
Nice Street
Farnham
UK
Tel:+44-xxxxxx
Rate: 90

22/03
Hotel:
The Bear Hotel
Honey Street
Woodstock
UK
Tel:+44-xxxxxx
Rate: 100