How to determine column separation format?

la2015 · October 16, 2015, 1:11pm

Hi there,

Say I have a line with multiple columns, but the columns are separated in different ways: spaces, tabs, etc.

Is it possible to have awk print the separator used between each pair of columns?

RudiC · October 16, 2015, 1:17pm

Only if you supply some other means of identifying columns and telling them apart.

la2015 · October 16, 2015, 1:43pm

Thank you, but I don't think I got your point.

Don_Cragun · October 16, 2015, 2:38pm

You didn't supply any way for a program looking at your input file to determine whether one or more spaces or one or more tabs found in a line of that file are supposed to be interpreted as a field separator or as data within a field. And since you said the separation formats include "spaces, tabs, ...", we don't know whether any given character in the input file is a field separator or data.

If you can't explicitly explain the difference between data and field separators in your input, there is nothing we can do to help you write an awk script that will determine the format for you.
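To illustrate the point, here is a minimal sketch showing that the same line splits differently depending on what awk is told a separator is. The helper name nfields and the sample text (a city name containing a space, then a tab, then a number) are hypothetical.

```sh
# Print the number of fields awk sees on each input line.
# With no argument, awk's default splitting is used (runs of blanks);
# otherwise the argument is passed to -F as the field separator.
nfields() {
  if [ $# -eq 0 ]; then
    awk '{ print NF }'
  else
    awk -F "$1" '{ print NF }'
  fi
}

printf 'New York\t10\n' | nfields         # → 3 ("New", "York", "10")
printf 'New York\t10\n' | nfields '\t'    # → 2 ("New York", "10")
```

Neither answer is "right" until you decide whether the space in "New York" is data or a separator; awk cannot make that decision for you.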

la2015 · October 16, 2015, 3:22pm

Ok, thank you. The file looks like this:

ABCD      A  L     BBB J HHH     9495994 4458902  HJFGGJK  SDFR 22222

Can awk determine which are the separators between ABCD and A, or between BBB and J?

RudiC · October 17, 2015, 6:06am

Assuming your file contains only uppercase letters and digits, this would list all the separators used:

awk '{gsub(/[A-Z0-9]/,"");for (i=1; i<=length; i++) C[substr($0,i,1)]++} END {for (c in C) print c}' file
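Because spaces and tabs are invisible on the terminal, the output of the command above can look empty. A possible variant, sketched below under the same assumption (data consists only of A-Z and 0-9), prints each separator character with a visible name and a count. The function name list_separators is hypothetical.

```sh
# Variant of the command above: same assumption that data consists only of
# A-Z and 0-9, but each remaining character is printed with a visible name
# and the number of times it occurs.
list_separators() {
  awk '
  {
    gsub(/[A-Z0-9]/, "")                 # strip the assumed data characters
    for (i = 1; i <= length($0); i++)    # count every character that is left
      count[substr($0, i, 1)]++
  }
  END {
    for (c in count) {
      name = (c == "\t") ? "<tab>" : ((c == " ") ? "<space>" : c)
      print name, count[c]
    }
  }' "$@"
}

printf 'ABCD\tA   L\n' | list_separators
```

The order of the output lines is unspecified, because awk's for (c in count) loop does not guarantee any ordering; pipe through sort if you need a stable order.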

la2015 · October 17, 2015, 6:23am

Thank you, RudiC. I tried it, and the output looks empty, but I don't think that is wrong. The answer is there; it's just that the spaces and tabs are not printed visibly as " " or "\t". In other words, for the file

ABCD    A   L

which is ABCD<tab>A<space><space><space>L

I would expect awk to print these separators, something like:

awk_command
output:
\t "   "

RudiC · October 17, 2015, 6:30am

The output is there. Pipe it through hexdump or od.
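For example, a sketch of that suggestion: RudiC's one-liner, wrapped in a function (the name separators is hypothetical) that reads standard input, piped through od -c. The one-line sample input is also hypothetical and contains one real tab.

```sh
# RudiC's one-liner from above, reading standard input.
# It assumes data fields contain only A-Z and 0-9.
separators() {
  awk '{gsub(/[A-Z0-9]/,""); for (i=1; i<=length; i++) C[substr($0,i,1)]++}
       END {for (c in C) print c}'
}

# od -c dumps every byte: a tab is shown as \t, a newline as \n,
# and a space as an otherwise blank column.
printf 'ABCD\tA   L\n' | separators | od -c
```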

Don_Cragun · October 17, 2015, 7:49am

Although there were no <tab> characters in any of your samples, you could try something like:

awk '
{	printf("Input is:\n%s\nFormat is:\n", $0)
	gsub(/[^[:space:]]+/, "DATA")
	gsub(/\t/, "<tab>")
	gsub(/ /, "<space>")
	print
}' file

If file contains:

ABCD      A  L     BBB J HHH     9495994 4458902  HJFGGJK  SDFR 22222  
f1	f2  f3	f4 f5 f6 f7	f8 f9 	 f10

it produces the output:

Input is:
ABCD      A  L     BBB J HHH     9495994 4458902  HJFGGJK  SDFR 22222  
Format is:
DATA<space><space><space><space><space><space>DATA<space><space>DATA<space><space><space><space><space>DATA<space>DATA<space>DATA<space><space><space><space><space>DATA<space>DATA<space><space>DATA<space><space>DATA<space>DATA<space><space>
Input is:
f1	f2  f3	f4 f5 f6 f7	f8 f9 	 f10
Format is:
DATA<tab>DATA<space><space>DATA<tab>DATA<space>DATA<space>DATA<space>DATA<tab>DATA<space>DATA<space><tab><space>DATA

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk.
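A variation on the script above, sketched here, labels each gap with the fields on either side of it, which answers the "between ABCD and A" question directly. It assumes that any run of spaces and/or tabs is a separator, so fields can never contain blanks; unlike the [:space:] version, it treats only spaces and tabs as separators. The names show_gaps and visible are hypothetical.

```sh
# For each gap between two fields, print "left|right: separator", with
# tabs and spaces in the separator made visible.
show_gaps() {
  awk '
  function visible(s) {
    gsub(/\t/, "<tab>", s)
    gsub(/ /, "<space>", s)
    return s
  }
  {
    line = $0
    sub(/^[ \t]+/, "", line)           # ignore leading blanks
    while (match(line, /[ \t]+/)) {    # next run of spaces/tabs
      left = substr(line, 1, RSTART - 1)
      sep  = substr(line, RSTART, RLENGTH)
      line = substr(line, RSTART + RLENGTH)
      if (line == "") break            # trailing blanks: no field follows
      right = line
      sub(/[ \t].*/, "", right)        # next field = text up to the next blank
      printf "%s|%s: %s\n", left, right, visible(sep)
    }
  }' "$@"
}

printf 'ABCD\tA   L\n' | show_gaps
```

For that sample line this prints one line per gap: ABCD|A: <tab> followed by A|L: <space><space><space>.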

la2015 · October 17, 2015, 7:55am

Brilliant, thank you all.

@DonCragun, why do you say there were no tabs in my sample? I made it in vi, and I actually used a tab separator. Just curious.

Don_Cragun · October 17, 2015, 9:30am

I say there were no tabs in your sample because the text you included in post #5 and post #7 in this thread did not contain any <tab> characters.

Note that if you copy text from a terminal emulator window in which you are running vi, you get spaces rather than <tab>s, because vi uses spaces to display <tab> characters. On most systems, if you exit vi, cat the file to the screen, and then copy the text from your terminal emulator window, you can paste it into a post on this forum and have the <tab>s appear as <tab>s (as long as you post those samples inside CODE tags).
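Before posting a sample, you can also confirm that the file really contains tabs. A minimal sketch, with the hypothetical function name count_tabs, that reports how many tab characters each line holds:

```sh
# Report the number of tab characters on each input line.
# gsub(/\t/, "&") replaces each tab with itself and returns the match count.
count_tabs() {
  awk '{ n = gsub(/\t/, "&"); printf "line %d: %d tab(s)\n", NR, n }' "$@"
}

# First line has one real tab; second line uses only spaces.
printf 'ABCD\tA   L\nABCD    A   L\n' | count_tabs
```

Run it as count_tabs file on the file you edited in vi; a count of 0 on a line you expected to contain tabs means the editor saved spaces instead.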