Split records

Hi

I have a file

$ cat test
a,1;2;3
b,4;5;6;7
c,8;9

I want to split each record into multiple records, one for each semicolon-separated value in the 2nd field, i.e.

a,1
a,2
a,3
b,4
b,5
b,6
b,7
c,8
c,9

Can someone assist?

Try:

awk -F"[,;]" '{for (i=2;i<=NF;i++) {print $1 "," $i}}' test
a,1
a,2
a,3
b,4
b,5
b,6
b,7
c,8
c,9

Thanks, that helps.
Can you explain what the part below does?


awk -F"[,;]"

May I try to explain it?

awk -F"[,;]" -> The -F option tells awk what to use as the Field Separator (FS). awk needs the Field Separator to know how to split each Record (Record = line) into Fields.

In /etc/passwd, for example, the fields are separated by ":", so telling awk to use ":" as the Field Separator would be enough.
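As a small sketch of the single-character case: the line below is a made-up, passwd-style record (not read from a real /etc/passwd), and the variable names are only for illustration.

```shell
# Hypothetical passwd-style record; fields are separated by ":".
sample='root:x:0:0:root:/root:/bin/sh'

# With ":" as FS, $1 is the login name and $7 is the shell.
name_and_shell=$(printf '%s\n' "$sample" | awk -F':' '{print $1, $7}')

echo "$name_and_shell"   # root /bin/sh
```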

In this case there are two different characters, "," and ";", and either one should act as the FS. How do you tell awk that? Fortunately, awk accepts a regex as the value of FS.

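awk treats any FS longer than a single character as a regex. The [ ] is a bracket expression: [,;] matches any one of the characters listed inside it, so every "," and every ";" counts as a separator.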
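To see the difference the separator makes, here is a minimal sketch that counts the fields of one record from the test file, first with "," alone and then with [,;]. The variable names are only for illustration.

```shell
# One record from the thread's test file.
line='b,4;5;6;7'

# Only "," separates fields: "b" and "4;5;6;7".
comma_only=$(printf '%s\n' "$line" | awk -F',' '{print NF}')

# Either "," or ";" separates fields: "b", "4", "5", "6", "7".
comma_or_semi=$(printf '%s\n' "$line" | awk -F'[,;]' '{print NF}')

echo "$comma_only $comma_or_semi"   # 2 5
```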

A good place to check that your regex works is RegExr, although there are many similar sites. Give it a try with /[,;]/g and see that it finds either "," or ";" in the text.

You can set the FS either with the -F option or in a BEGIN block, which awk runs before it starts processing the file.

awk 'BEGIN{FS="[,;]"} {for (i=2;i<=NF;i++) {print $1 "," $i}}' test
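A quick way to convince yourself that the two forms are equivalent is to run them on the same record and compare. The sketch below also includes -v FS=..., a third standard way to assign an awk variable before any input is read, which is not mentioned in the thread. The variable names are only for illustration.

```shell
# One record from the thread's test file.
input='c,8;9'

# Three ways of setting the same field separator.
via_F=$(printf '%s\n' "$input" | awk -F'[,;]' '{print $2}')
via_begin=$(printf '%s\n' "$input" | awk 'BEGIN{FS="[,;]"} {print $2}')
via_v=$(printf '%s\n' "$input" | awk -v FS='[,;]' '{print $2}')

echo "$via_F $via_begin $via_v"   # 8 8 8
```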

The for loop runs from 2 up to NF (the Number of Fields in the current record). Since $i refers to field number i, the loop visits every field after the first one.

for (i=2;i<=NF;i++)

For each of those fields, the action prints the first field, $1, followed by a comma and the current field, $i.

for (i=2;i<=NF;i++) {
  print $1 "," $i
}
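If it helps, the loop can be traced on a single record by printing NF and then the value of i and $i at each step. This is only a debugging sketch; the "field=" label and the variable name are for illustration.

```shell
# Trace the loop on the first record of the test file.
trace=$(printf 'a,1;2;3\n' | awk -F'[,;]' '{
    printf "NF=%d\n", NF                    # 4 fields: a, 1, 2, 3
    for (i = 2; i <= NF; i++)               # skip $1, visit $2..$NF
        printf "i=%d field=%s\n", i, $i
}')

echo "$trace"
# NF=4
# i=2 field=1
# i=3 field=2
# i=4 field=3
```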

Notice that even though you have told awk which FS to use, it will not use that information when printing your output. That is, if you do this:

awk 'BEGIN{FS="[,;]"} {for (i=2;i<=NF;i++) {print $1,$i}}' test

It will output:

a 1
a 2
a 3
b 4
b 5
b 6
b 7
c 8
c 9

because print separates its comma-separated arguments with the OFS (Output Field Separator), which defaults to a single space unless you set it.

So, as you can see, awk distinguishes between FS and OFS. It won't print the output using the FS you defined. The FS variable tells awk how to split the input into fields, but it has nothing to do with the way they are printed.

Another way of getting the comma-separated output is to set OFS as well:

awk 'BEGIN{FS="[,;]";OFS=","} {for (i=2;i<=NF;i++) {print $1,$i}}' test
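To make the role of the comma in print explicit, here is a small side-by-side sketch: a comma between arguments inserts OFS, while writing the arguments next to each other simply concatenates them. The variable names are only for illustration.

```shell
rec='a,1'

# Comma in print, default OFS (a single space).
default_ofs=$(printf '%s\n' "$rec" | awk -F',' '{print $1, $2}')

# Comma in print, OFS set to ",".
custom_ofs=$(printf '%s\n' "$rec" | awk -F',' -v OFS=',' '{print $1, $2}')

# No comma: plain string concatenation, OFS is not used.
concat=$(printf '%s\n' "$rec" | awk -F',' '{print $1 $2}')

echo "$default_ofs | $custom_ofs | $concat"   # a 1 | a,1 | a1
```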