Reformat MLS Data - Use AWK?

I am helping my wife set up a real estate site and I am starting to integrate MLS listings. We are using a HostGator level 5 VPS running CentOS and have full root and SSH access to the VPS.

Thus far I have automated the daily FTP download of listings from our MLS server using a little sh script. It is a 90MB text file, pipe-delimited with about 150 fields.

Also, I have settled on using the IProperty component from TheThinkery which is a Joomla extension as the basis for the property search engine. IP uses a comma-delimited text file for input/output and has a different order of the fields.

Basically I just need to

  1. convert pipe-delimited to comma-delimited and add double-quotes for all text fields.
  2. re-order the fields.

Is AWK or Perl the way to go? I have confirmed AWK is installed. I used to have an O-Reilly AWK book fifteen years ago and actually read the whole thing believe it or not, but I haven't written a line of AWK in probably 10 years. :smiley:

Would someone be willing to help me with this for a fee? I assume if you know what you're doing it would take less than an hour. I will do the grunt work re-ordering fields.

By the way, I came across this forum when searching and landed on this thread --
"Convert CSV file (with double quoted strings) to pipe delimited file"

That's sort of like what I want to do except the opposite. (Also I need to re-order fields).

Thanks!
Eric

You can start with this skeleton.

Consider "mx" as text fields - do the same for each one.

#!/usr/bin/ksh
IFS='|'
while read m1 m2 m3 ... mn; do
  mx='"'${mx}'"'
  echo ${m1}','${m2}','${m3}','...${mn}','
done < Inp_File
1 Like

Almost anything will work. awk would probably be simpler for such straightforward translation:

awk -v FS="|" -v OFS="," { for(N=1; N<=NF; N++) $N="\"" $N "\"" } 1' < input > output

As for reordering, depends entirely on what you want to put where.

---------- Post updated at 02:17 PM ---------- Previous update was at 02:11 PM ----------

Here's an improved version which you can just feed the order of fields you want into:

awk 'BEGIN {
    FS="|"
    split("5,4,3,2,1", O, ","); # The order of fields you want, 1 being first
}
{ 
        P=""
        for(N=1; O[N]; N++)
        {
                M=O[N];
                printf("%s\"%s\"", P, $M);
                P=","
        }
        printf("\n");
}' < input > output

That will print field "field 5","field 4","field 3","field 2","field 1". Change "5,4,3,2,1" into whatever you want.

1 Like

Fantastic! How much do I owe you's? Or I can donate to your favorite charities.

Eric

I'm not picky. Wasn't technically charging for the work.