Hi there,
say I have a line with multiple columns but with different separation formats: spaces, tabs..
Is it possible to have AWK print the separation format between each column?
Only if you supply some other means to identify the columns and tell them apart from each other.
Thank you, but I don't think I got your point.
You didn't supply any way for a program looking at your input file to determine whether one or more spaces or one or more tabs found in a line of that file should be interpreted as a field separator or as data within a field. And since you said the separation formats include spaces, tabs, and so on, we don't know whether any given character in the input file is a field separator or data.
If you can't explicitly explain the difference between data and field separators in your input, there is nothing we can do to help you write an awk script that will determine the format for you.
Ok, thank you. The file looks like this:
ABCD A L BBB J HHH 9495994 4458902 HJFGGJK SDFR 22222
Can awk determine which are the separators between ABCD and A, or between BBB and J?
Based on the assumption your file contains only upper case chars and digits, this would yield all separators used:
awk '{gsub(/[A-Z0-9]/,"");for (i=1; i<=length; i++) C[substr($0,i,1)]++} END {for (c in C) print c}' file
Thank you, RudiC. I tried it, but the output appears to be empty, which I don't think is wrong. The answer is there; it's just that, whether the separators are spaces or tabs, they are not being printed visibly as " " or "t". In other words, for the file
ABCD A L
which is ABCDtabAspacespacespaceL
I would expect awk to print these separators, something like:
awk_command
output:
t " "
The output is there. Pipe it through hexdump or od.
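As a rough illustration of that suggestion, here is a sketch assuming a POSIX shell with printf, awk and od available. The sample line (ABCD, a tab, A, three spaces, L) is built with printf so the tab is known to be real, and show_seps is just a hypothetical helper name:

```shell
# Strip the data characters (upper case letters and digits, as assumed
# earlier in the thread) and dump what is left, so the otherwise
# invisible separators become readable.
show_seps() {
    awk '{ gsub(/[A-Z0-9]/, ""); print }' | od -c
}

# Sample input: ABCD <tab> A <space><space><space> L
printf 'ABCD\tA   L\n' | show_seps
```

In the od -c dump the tab shows up as \t and the newline as \n, while spaces appear as blank columns; the exact column layout differs between od implementations.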
Although there were no <tab> characters in any of your samples, you could try something like:
awk '
{ printf("Input is:\n%s\nFormat is:\n", $0)
gsub(/[^[:space:]]+/, "DATA")
gsub(/\t/, "<tab>")
gsub(/ /, "<space>")
print
}' file
If file contains:
ABCD A L BBB J HHH 9495994 4458902 HJFGGJK SDFR 22222
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10
it produces the output:
Input is:
ABCD A L BBB J HHH 9495994 4458902 HJFGGJK SDFR 22222
Format is:
DATA<space><space><space><space><space><space>DATA<space><space>DATA<space><space><space><space><space>DATA<space>DATA<space>DATA<space><space><space><space><space>DATA<space>DATA<space><space>DATA<space><space>DATA<space>DATA<space><space>
Input is:
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10
Format is:
DATA<tab>DATA<space><space>DATA<tab>DATA<space>DATA<space>DATA<space>DATA<tab>DATA<space>DATA<space><tab><space>DATA
If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk.
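If one line of output per gap between columns is preferred, something along these lines might work. It is only a sketch under the assumption that every run of spaces and/or tabs is a separator and everything else is data; list_seps and the "between column" wording are made up for illustration:

```shell
# Print the separator found between each pair of adjacent columns,
# spelling tabs and spaces out as <tab> and <space>.
list_seps() {
    awk '{
        line = $0
        sub(/^[ \t]+/, "", line)             # leading whitespace is not a separator
        n = 0
        while (match(line, /[ \t]+/)) {
            sep  = substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)
            if (line == "") break            # trailing whitespace is not a separator
            gsub(/\t/, "<tab>", sep)
            gsub(/ /, "<space>", sep)
            printf("between column %d and %d: %s\n", n + 1, n + 2, sep)
            n++
        }
    }'
}

# Sample input: ABCD <tab> A <space><space><space> L
printf 'ABCD\tA   L\n' | list_seps
# between column 1 and 2: <tab>
# between column 2 and 3: <space><space><space>
```

The column counter restarts for every input line, so with a multi-line file the output for one line simply follows the output for the previous one.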
Brilliant, thank you all.
@DonCragun, why do you say there were no tabs in my sample? I made it with a vi editor, and I actually used a tab separator. Just curious..
I say there were no tabs in your sample because the text you included in post #5 and post #7 in this thread did not contain any <tab> characters.
Note that if you copy text from a terminal emulator window in which you are running vi, vi uses spaces instead of <tab>s to display <tab> characters. On most systems, if you exit vi and cat the file to the screen, you can then copy the text from your terminal emulator window and paste it into a post on this forum to have <tab>s appear as <tab>s (as long as you post those samples inside CODE tags).
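One way to check whether a file really contains <tab> characters before posting it, assuming a POSIX sed is available: the l command writes each line in an unambiguous form, showing a tab as \t and marking the end of the line with $. The helper name show_tabs is hypothetical:

```shell
# Show each input line in sed's unambiguous "l" format.
# Reads the named files, or standard input if none are given.
show_tabs() {
    sed -n l "$@"
}

# Sample input: ABCD <tab> A <space><space><space> L
printf 'ABCD\tA   L\n' | show_tabs
# ABCD\tA   L$
```

Long lines are folded by sed in this format, so for wide files the od approach mentioned earlier in the thread may be easier to read.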