I have a set of free-form phone numbers in inconsistent formats, and I want to reformat them into one standard string. They sit at the end of each line of a colon-separated file built by a large nawk + tr pipeline, like this:
Is there a way to build a template with arrays or something similar? I have tried several things with awk and IFS, but can't seem to break these up into arrays byte by byte. Solving this is step 1.
Step 2 is how to incorporate this solution into a three-stage pipeline so that only one file is created during the script. No temporary files allowed. So it would need to look like this:
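For the "array by each byte" part, here is a minimal sketch, assuming plain POSIX awk (so it should also work with nawk). It avoids split() with an empty separator, which not every awk supports, and walks the string with substr() instead. The function name split_chars and the array name ch are made up for illustration.

```shell
# split_chars: hypothetical helper; prints the characters of each input line
# separated by spaces, to show that ch[1..n] holds one character per element.
split_chars() {
    awk '{
        n = length($0)
        for (i = 1; i <= n; i++)
            ch[i] = substr($0, i, 1)    # one character per array element
        line = ""
        for (i = 1; i <= n; i++) {
            sep = (i > 1) ? " " : ""
            line = line sep ch[i]
        }
        print line
    }'
}

printf '%s\n' '(736)251' | split_chars
```

Once the characters are in an array, each one can be tested or rearranged individually, although for phone numbers stripping non-digits with gsub() is usually less work.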
nawk 'large string of operations to join 2 files' File1 File2 | tr -d ' ' | "this phone number solution" > output.txt
Is this the best way to approach this? I don't want to run nawk + "tr" to remove whitespace and create output.txt, then come back through and run the phone number solution on output.txt to create new_output.txt. It all needs to be done in one pass without the temp file, unless you can rewrite the new phone number into output.txt after it's generated.
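A rough sketch of how the third stage could sit in the pipe, assuming the phone number is always the last colon-separated field and that a full number has exactly 10 digits once punctuation is stripped. The function name fmt_phone and the sample records are invented; the printf only stands in for the real nawk + tr stages. On AIX, awk can be swapped for nawk.

```shell
# fmt_phone: hypothetical filter that reads records on stdin and writes them
# to stdout, so it can be the last stage of a pipe with no temporary file.
fmt_phone() {
    awk -F: 'BEGIN { OFS = ":" }
    {
        digits = $NF
        gsub(/[^0-9]/, "", digits)      # keep only the digits of the last field
        if (length(digits) == 10)       # full number: rewrite as NNN-NNN-NNNN
            $NF = substr(digits, 1, 3) "-" substr(digits, 4, 3) "-" substr(digits, 7, 4)
        print                           # other records pass through unchanged
    }'
}

# Stand-in for: nawk "..." File1 File2 | tr -d " " | fmt_phone > output.txt
printf '%s\n' 'smith:john:(736)251-2345' 'doe:jane:736.251.9999' | fmt_phone
```

Because the filter only reads stdin and writes stdout, redirecting the end of the pipe to output.txt creates that one file and nothing else.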
I think that would work for any of the crap encountered when a person has the full number and area code formatted in 20 different ways. The remaining issue is when they just have their extension there. It needs to be expanded, based on the first digit of the extension, to include the full number + area code. For example, if the extension is 66666, the leading 6 means it expands to XXX-XX6-6666. Likewise, 33333 would become XXX-XX3-3333.
Would this be possible with a larger awk script and conditional statements? Oh, and this is AIX, so no gawk, only nawk and awk. Your gawk solution only uses "substr", though, which should be fine with nawk.
As you can see, the numbers where the area code was provided are easier to figure out. The lines that just have an extension need an additional set of digits based on the first byte of the extension. So the 12345 extension becomes 736-251-2345, because the leading 1 of the extension signifies a certain constant of 5 digits to go in front of the extension. There is no calculation, just a constant value chosen by the first digit of the extension. Something similar to this:
if extension starts with "1", prepend 736-25 to the extension and insert a dash after the leading 1.
if extension starts with "2", prepend 854-32 to the extension and insert a dash after the leading 2.
...
if extension starts with "7", prepend 655-62 to the extension and insert a dash after the leading 7.
...
etc.
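The rules above could be sketched as a lookup table in awk, assuming an extension is exactly 5 digits in the last colon-separated field. Only the three prefixes given above are filled in; the remaining digits would need their own entries, and any extension without an entry is left untouched. The names expand_ext and prefix are made up for illustration.

```shell
# expand_ext: hypothetical filter that expands a 5-digit extension into a
# full number using a constant prefix chosen by its first digit.
expand_ext() {
    awk -F: 'BEGIN {
        OFS = ":"
        # Only the prefixes stated in the post; other digits are not mapped here.
        prefix["1"] = "736-25"
        prefix["2"] = "854-32"
        prefix["7"] = "655-62"
    }
    {
        ext = $NF
        first = substr(ext, 1, 1)
        # Five digits spelled out, since older awks may lack {5} intervals.
        if (ext ~ /^[0-9][0-9][0-9][0-9][0-9]$/ && (first in prefix))
            $NF = prefix[first] first "-" substr(ext, 2)
        print
    }'
}

printf '%s\n' 'smith:john:12345' | expand_ext    # prints smith:john:736-251-2345
```

Keeping the prefixes in an array rather than a chain of if statements means adding a digit is one extra line in the BEGIN block.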
I will try yours out in a bit. I need some time to figure it out.
dr. house,
Anything after the first 10 numeric digits (or 12 characters with dashes) can be thrown away, such as on line 1 of your output. Would you add another conditional (length(), maybe) to your "awk", or can the "sed" be modified to trim this?
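One way the trimming might look in awk, as a sketch only and not dr. house's actual script: strip everything that is not a digit, then keep the first 10 digits with substr() when length() says there are at least that many. It assumes the number is the last colon-separated field and does not start with a country code. The name trim_phone is hypothetical.

```shell
# trim_phone: hypothetical filter that keeps the first 10 digits of the last
# field and discards whatever follows (trailing extension, notes, and so on).
trim_phone() {
    awk -F: 'BEGIN { OFS = ":" }
    {
        digits = $NF
        gsub(/[^0-9]/, "", digits)
        if (length(digits) >= 10) {
            digits = substr(digits, 1, 10)    # drop anything past digit 10
            $NF = substr(digits, 1, 3) "-" substr(digits, 4, 3) "-" substr(digits, 7, 4)
        }
        print
    }'
}

printf '%s\n' 'smith:john:736-251-2345x123' | trim_phone    # prints smith:john:736-251-2345
```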