Enhance existing script: Extract Multiple variables & Input in an echo string

Hi Experts

I need your help optimizing my script: I have nearly 1M records and it takes close to 40 minutes to run, so I am looking for a faster alternative.

Input: file

{"house":"1024","zip":"2345","city":"asd","country":"zzv"}
{"city":"asd","house":"1024","zip":"2845","country":"zzv"}
{"house":"1028","zip":"2645","city":"asd","country":"zzv"}
{"zip":"2545","house":"1021","city":"asd","country":"zzv"}
{"city":"asd","house":"1020","zip":"2345","country":"zzv"}

Script:

for i in `cat file`
do
	HNO=`echo $i | awk -F"\"house\":" '{print $2}'| cut -d"," -f1 | sed 's/\"//g'`
	ZIP=`echo $i | awk -F"\"zip\":" '{print $2}'| cut -d"," -f1 | sed 's/\"//g'`
	CIT=`echo $i | awk -F"\"city\":" '{print $2}'| cut -d"," -f1 | sed 's/\"//g'`
	echo 'House number is '$HNO', Zip Code is '$ZIP', City is '$CIT' and Country is zzv'
done

Output:

House number is 1024, Zip Code is 2345, City is asd and Country is zzv
House number is 1024, Zip Code is 2845, City is asd and Country is zzv
House number is 1028, Zip Code is 2645, City is asd and Country is zzv
House number is 1021, Zip Code is 2545, City is asd and Country is zzv
House number is 1020, Zip Code is 2345, City is asd and Country is zzv

Thanks for your help in advance
Regards
Nk

Hi, the script is so slow because it fires off 12 different sub shells per line of the input file. For 1M records that is about 12 million sub shells in total, which is quite costly.
Examples of sub shells in the script are:

  • awk -F"\"house\":" '{print $2}'
  • cut -d"," -f1
  • sed 's/\"//g'
  • `echo $i | awk -F"\"house\":" '{print $2}'| cut -d"," -f1 | sed 's/\"//g'`
  • awk -F"\"zip\":" '{print $2}'
  • etcetera
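
To make the cost of those forks concrete, here is a minimal sketch of the same loop written with shell builtins only (parameter expansion instead of awk, cut and sed). The function name format_addresses is made up for illustration. It assumes flat objects whose values are plain double-quoted strings without escaped quotes, as in the sample, and that every key is present on every line.

```shell
# Hypothetical helper (not from the original script): reads JSON lines on
# stdin and prints the formatted sentence using only shell builtins.
format_addresses() {
  local line rest hno zip cit
  while IFS= read -r line; do
    # Strip everything up to and including "key":" and then cut at the next quote.
    rest=${line#*'"house":"'}; hno=${rest%%'"'*}
    rest=${line#*'"zip":"'};   zip=${rest%%'"'*}
    rest=${line#*'"city":"'};  cit=${rest%%'"'*}
    printf 'House number is %s, Zip Code is %s, City is %s and Country is zzv\n' \
      "$hno" "$zip" "$cit"
  done
}
```

It would be called as format_addresses < file. No external command is started per line, but a shell read loop is still likely to be slower than a single awk or jq process on 1M records.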

Try eliminating those sub shells by using a JSON parser:

jq -r '"House number is " + .house + ", Zip Code is " + .zip + ", City is " + .city + " and Country is " + .country' file
House number is 1024, Zip Code is 2345, City is asd and Country is zzv
House number is 1024, Zip Code is 2845, City is asd and Country is zzv
House number is 1028, Zip Code is 2645, City is asd and Country is zzv
House number is 1021, Zip Code is 2545, City is asd and Country is zzv
House number is 1020, Zip Code is 2345, City is asd and Country is zzv
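
The filter above joins the pieces with +, which works because every value in the sample is a JSON string. If some values were numbers (for example "house":1024, which does not occur in the sample), + would fail with a type error. A possible variant uses jq string interpolation, which formats either type. The wrapper name format_with_jq is only illustrative.

```shell
# Hypothetical wrapper: same output as the + version, but \(...) interpolation
# also accepts non-string values such as numbers.
format_with_jq() {
  jq -r '"House number is \(.house), Zip Code is \(.zip), City is \(.city) and Country is \(.country)"' "$@"
}
```

It takes the same arguments as jq itself, for example format_with_jq file, and reads stdin when no file is given.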

--
If you do not have a JSON parser, you could try a single awk script without sub shells, which works for this particular sample:

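awk -F\" '
  {
    split(x,F)                # empty F for each line (x is unset, so nothing is split)
    for(i=2; i<=NF; i+=4)     # keys are in fields 2, 6, 10, ...
      F[$i]=$(i+2)            # the value is two fields after its key
    printf "House number is %s, Zip Code is %s, City is %s and Country is %s\n", F["house"], F["zip"], F["city"], F["country"]
  }
' file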
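
To see why the loop starts at field 2 and steps by 4, it can help to print how awk splits a line when the double quote is the field separator. The helper below is only a debugging sketch, and its name show_fields is made up.

```shell
# Hypothetical debugging helper: prints every field of each input line
# as number=[content], using the double quote as field separator.
show_fields() {
  awk -F\" '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}
```

For the line {"house":"1024","zip":"2345"} this gives 2=[house], 4=[1024], 6=[zip] and 8=[2345], with the braces, colons and commas landing in the odd-numbered fields. This only holds while no key or value contains an escaped quote.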

Thanks for your quick help.

It works absolutely fine and the execution time has improved significantly.

Regards
Nk