How to determine column separation format?

Hi there,

Say I have a line with multiple columns separated in different ways: spaces, tabs, and so on.

Is it possible to have AWK print the separation format between each column?

Yes, if you supply some other means to identify columns and/or tell them apart.

Thank you, but I don't think I got your point.

You didn't supply any way for a program looking at your input file to determine whether one or more spaces or one or more tabs found in a line of that file are supposed to be interpreted as a field separator or as data within a field. And since you said the separation formats include spaces, tabs, and so on, we don't know whether any given character in the input file is a field separator or data.

If you can't explicitly explain the difference between data and field separators in your input, there is nothing we can do to help you write an awk script that will determine the format for you.

Ok, thank you. The file looks like this:

ABCD      A  L     BBB J HHH     9495994 4458902  HJFGGJK  SDFR 22222 

Can awk determine which are the separators between ABCD and A, or between BBB and J?

Based on the assumption your file contains only upper case chars and digits, this would yield all separators used:

awk '{gsub(/[A-Z0-9]/,"");for (i=1; i<=length; i++) C[substr($0,i,1)]++} END {for (c in C) print c}' file

Thank you RudiC. I tried it but the output appears to be nothing, which I don't think is wrong. The answer is there; it's just that, whether the separators are spaces or tabs, they are not being printed visibly as " " or "t". In other words, for the file

ABCD    A   L

which is ABCDtabAspacespacespaceL

I would expect awk to print these separators, something like:

awk_command
output:
t "   "

The output is there. Pipe it through hexdump or od.
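As a minimal sketch of that suggestion: the sample line below is made up with printf (ABCD, one tab, A, three spaces, L) and is not taken from the poster's real file. RudiC's command is run on it unchanged apart from spelling out length($0), and the result goes through od -c, which shows a tab as \t and shows the newline that print adds after each separator as \n. The variable name seps is only for illustration.

```shell
# Hypothetical sample line: ABCD <tab> A <space><space><space> L
# RudiC's command strips the data characters and collects each distinct
# leftover character; od -c then makes the invisible characters readable.
seps=$(printf 'ABCD\tA   L\n' |
    awk '{gsub(/[A-Z0-9]/,""); for (i=1; i<=length($0); i++) C[substr($0,i,1)]++}
         END {for (c in C) print c}' |
    od -c)
printf '%s\n' "$seps"
```

The order in which awk walks the array C is unspecified, so the tab and the space may come out in either order.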

Although there were no <tab> characters in any of your samples, you could try something like:

awk '
{	printf("Input is:\n%s\nFormat is:\n", $0)
	gsub(/[^[:space:]]+/, "DATA")
	gsub(/\t/, "<tab>")
	gsub(/ /, "<space>")
	print
}' file

If file contains:

ABCD      A  L     BBB J HHH     9495994 4458902  HJFGGJK  SDFR 22222  
f1	f2  f3	f4 f5 f6 f7	f8 f9 	 f10

it produces the output:

Input is:
ABCD      A  L     BBB J HHH     9495994 4458902  HJFGGJK  SDFR 22222  
Format is:
DATA<space><space><space><space><space><space>DATA<space><space>DATA<space><space><space><space><space>DATA<space>DATA<space>DATA<space><space><space><space><space>DATA<space>DATA<space><space>DATA<space><space>DATA<space>DATA<space><space>
Input is:
f1	f2  f3	f4 f5 f6 f7	f8 f9 	 f10
Format is:
DATA<tab>DATA<space><space>DATA<tab>DATA<space>DATA<space>DATA<space>DATA<tab>DATA<space>DATA<space><tab><space>DATA

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk.
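If only the separators between columns are wanted, closer to the output format asked for earlier in the thread, a variation along these lines might do. It is a sketch that assumes fields are runs of non-blank characters and that every run of spaces and/or tabs is a separator; the function name show_separators is made up for this example. Leading and trailing blanks are dropped first so that only runs sitting between two fields are reported.

```shell
# show_separators: hypothetical helper that prints, for each input line,
# every run of blanks found between two fields, quoted, with tabs spelled <tab>.
show_separators() {
    awk '{
        line = $0
        sub(/^[ \t]+/, "", line)          # ignore blanks before the first field
        sub(/[ \t]+$/, "", line)          # ignore blanks after the last field
        out = ""
        while (match(line, /[ \t]+/)) {   # next run of spaces and/or tabs
            sep = substr(line, RSTART, RLENGTH)
            gsub(/\t/, "<tab>", sep)      # make tabs visible
            out = out (out == "" ? "" : " ") "\"" sep "\""
            line = substr(line, RSTART + RLENGTH)
        }
        print out
    }' "$@"
}

# Sample from the thread: ABCD <tab> A <space><space><space> L
printf 'ABCD\tA   L\n' | show_separators
```

For that sample line this prints "<tab>" followed by "   ". On a Solaris/SunOS system the same awk substitution as above applies.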

Brilliant, thank you all.

@DonCragun, why do you say there were no tabs in my sample? I made it with a vi editor, and I actually used a tab separator. Just curious..

I say there were no tabs in your sample because the text you included in post #5 and post #7 in this thread did not contain any <tab> characters.

Note that if you copy text from a terminal emulator window in which you are running vi, vi uses spaces instead of <tab>s to display <tab> characters. On most systems, if you exit vi and cat the file to the screen, you can then copy the text from your terminal emulator window and paste it into a post on this forum to have <tab>s appear as <tab>s (as long as you post those samples inside CODE tags).
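To check whether a file really contains <tab> characters before posting it, something like the following may help. It is a sketch that uses a made-up file name, sample.txt, created here with printf so the example is self-contained. The l command of sed writes each line in an unambiguous form (a tab appears as \t and the end of the line as $), and the awk line counts the tabs by replacing each one with itself.

```shell
# Hypothetical sample file containing one real tab between ABCD and A.
printf 'ABCD\tA   L\n' > sample.txt

# Show the line with the tab spelled out as \t and the line end marked by $.
visible=$(sed -n l sample.txt)
printf '%s\n' "$visible"

# Count the tab characters: gsub returns the number of replacements made.
tabs=$(awk '{ n += gsub(/\t/, "&") } END { print n + 0 }' sample.txt)
printf '%s tab character(s)\n' "$tabs"
```

If the count is 0, the separators in the file are spaces, whatever the editor display suggested.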