Hi, I have some data that I took from the internet in the following layout:
name
address
webpage
phone number
open hours
menu url
book url
name
...
Of course, the only mandatory line is the name, which is the one I want to sort by.
I have the following sed and awk script that works, but I would like to know what mistakes I made or whether there is another (better) way:
#!/bin/sh
set -e
rnb=/tmp/res_no_blank
rns=/tmp/res_names_sorted
[ $# -ne 1 ] && echo "need file as argument" && exit 1
[ ! -s "$1" ] && echo "file is empty" && exit 1
# Delete duplicate empty lines
awk '/^$/{ if (! blank++) print; next } { blank=0; print }' "$1" > "$rnb"
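# Collect the line after each blank line (the restaurant name), sorted and de-duplicated
awk '/^$/{getline; if ($0 != "") print}' "$rnb" | sort -u > "$rns"
# Print each restaurant's block in name order, rewriting "Cerrado hoy" with today's weekday
while IFS= read -r rest
do
    sed -n -E '/'"$rest"'/,/^$/p' "$1" | sed "s/Cerrado hoy/$(date +%a) cerrado/"
done < "$rns" > "$1.sorted"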
Sample of the data:
Cardamomo Tablao Flamenco
Calle Echegaray, 15, 28014 Madrid
cardamomo.com
918 05 10 38
reservas: https://cardamomo.com/es/comprar-entradas-flamenco/?utm_source=google%20my%20business&utm_medium=google%2B&utm_campaign=link%20a%20comprar%20entradas

Vermú
Calle de Jesús, 6, 28014 Madrid
914 21 55 65
Cerrado hoy

Rodilla
Calle de Alcalá, nº 67, local Izquierdo, 28014 Madrid
rodilla.es
917 55 53 22
8:00–21:30

El Patio Vertical
Calle de Almadén, 26, 28014 Madrid
elpatiovertical.es
914 20 16 63
8:30–21:00

Restaurante La Tragantua
Calle de la Verónica, 4, 28014 Madrid
latragantua.es
Sorry, I messed up the script when copy-pasting.
I will try your solution (and try to understand it); it looks like it will fit my goals better.
---------- Post updated at 01:32 PM ---------- Previous update was at 01:02 PM ----------
As always, the man pages have the tips:
The input is normally made up of input lines (records) separated by
newlines, or by the value of RS. If RS is null, then any number of blank
lines are used as the record separator, and newlines are used as field
separators (in addition to the value of FS). This is convenient when
working with multi-line records.
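To see what that excerpt means in practice, here is a minimal sketch (not taken from the thread). The function name list_names is made up for illustration. It assumes records are separated by one or more blank lines, and it sets FS to a newline so that each line of a record becomes exactly one field:

```shell
#!/bin/sh
# list_names is a hypothetical helper, not part of the original script.
# RS="" switches awk to paragraph mode: runs of blank lines separate records.
# FS="\n" makes every line of a record its own field, so $1 is the name line.
list_names() {
    awk 'BEGIN { RS = ""; FS = "\n" } { printf "%s (%d lines)\n", $1, NF }' "$@"
}
```

Called as `list_names restaurants.txt` (the file name is an assumption), it prints one line per record: the name followed by the number of lines in that record. Duplicate blank lines need no pre-cleaning, because any run of them counts as a single separator.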
What I don't understand is why '$1=$1' is needed.
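So: $1 is assigned its own value, unchanged, but any assignment to a field makes awk rebuild the record, joining the fields with OFS. The new OFS (<TAB>, \t) therefore replaces the newlines (\n) that separated the fields on input, which results in a new, one-line record ready for sorting.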
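The solution being discussed is not quoted in this thread, so the following is only a sketch of the technique as described, not necessarily the exact code that was posted. It assumes the data contains no tab characters, and sort_records is a hypothetical name. Each multi-line record is flattened to one tab-separated line with '$1=$1', sorted, and then expanded back with the same trick in reverse:

```shell
#!/bin/sh
# sort_records is a hypothetical helper; it assumes no field contains a tab.
sort_records() {
    # Step 1: paragraph mode, one field per line. Assigning $1 to itself
    # forces awk to rebuild $0 with OFS (tab), giving one line per record.
    awk 'BEGIN { RS = ""; FS = "\n"; OFS = "\t" } { $1 = $1; print }' "$@" |
        # Step 2: sort whole records by their leading field (the name), dropping duplicates.
        sort -u |
        # Step 3: split on tabs and rebuild with newlines, then add a blank separator line.
        awk 'BEGIN { FS = "\t"; OFS = "\n" } { $1 = $1; print; print "" }'
}
```

For example, `sort_records restaurants.txt > restaurants.txt.sorted` (file names are assumptions). Without '$1=$1' the first awk would print each record unchanged, still spanning several lines, and sort would then shuffle the individual lines rather than the records. The resulting order depends on the locale that sort runs under.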