Shell - Read a text file with two words and extract data

Hi, I made this simple script to extract data from a list. I would like to extract terms of two words that are separated by commas, and write the extracted terms to a new text file, each on its own line.

Example that worked for me with a text file that has only 3 fields per line:

#!/bin/bash
IFS=", "
while read f1 f2 f3
do
	echo "$f1"
	echo "$f2"
	echo "$f3"
done < test2.txt > output.txt

and output.txt shows this:

cheek
gingivae
hard
palate
inferior
lip
soft
palate
superior
lip
tongue
uvula

cat test2.txt shows this:

cheek, gingivae, hard,
palate, inferior, lip,
soft, palate, superior
lip, tongue, uvula

But now I have a list with more fields, and some of the terms I need consist of two words.

The new test2.txt shows this:

right, kidney, right ureter,
urethra, urinary, bladder,
left, adrenal gland, abdominal aorta, inferior vena cava, right renal vein, right renal artery, rectum, uterus

How can I put these two-word terms into a list, with each term on its own line?

THANKS!

Hi, try using arrays:

while IFS=', ' read -a F
do
  printf "%s\n" "${F[@]}"
done < newtest2.txt > output.txt

But there is another issue:
Some fields contain spaces, so an IFS value that includes a space will split those fields apart.

So try this instead:

while IFS=, read -a F
do
  printf "%s\n" "${F[@]# }"
done < newtest2.txt > output.txt

Here IFS is set to a single comma, and the spurious leading space is cut off through parameter expansion ( # ).
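
To see the difference between the two IFS settings, here is a minimal sketch that runs both on a single line. The sample line and the array names words and terms are made up for illustration; the brackets only make leading spaces visible.

```shell
#!/bin/bash
# Hypothetical one-line sample, modelled on the input in the question.
line='left, adrenal gland, abdominal aorta'

# IFS=', ' splits on commas AND spaces, so two-word terms break apart.
IFS=', ' read -r -a words <<< "$line"
printf '[%s]\n' "${words[@]}"

# IFS=, splits on commas only; fields after the first keep a leading space.
IFS=, read -r -a terms <<< "$line"
printf '[%s]\n' "${terms[@]}"

# The "# " expansion removes one leading space from every element.
printf '[%s]\n' "${terms[@]# }"
```

The first printf shows five separate words, the second shows three fields of which two start with a space, and the third shows the three clean terms.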

May I ask what read -a does and what F means?

Also, why did you use printf instead of echo now? I don't know much about printf. Is it something like grep or awk?

Then, what does %s mean? I know \n is a newline.

And what does ${F[@]# } mean?

while IFS=, read -a F
do
  printf "%s\n" "${F[@]# }"
done < newtest2.txt > output.txt

Thanks much for prompt reply!

You're welcome,

-a means read into an array (when using bash). F is the name of the array.

printf is the preferred and standardized alternative to echo. The first argument to printf is the "format string". "%s" means "string" and "\n" means "new line". See the printf documentation or the bash man page.

To get all the elements of the array one normally uses: "${F[@]}"
"${F[@]# }" does the same, but in addition it applies parameter expansion: # followed by a space means cut off a leading space if it exists.

Since F is an array it will work on every element of the array (more about this in the bash man page).

With the while read loop, for every line of the input file, the array F gets filled anew.
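
As a rough illustration of the points above, this sketch runs the loop on a small sample and reports the size of F for each line. The sample data, the temporary directory and the added -r option (which keeps backslashes literal) are assumptions for the demo, not part of the original answer.

```shell
#!/bin/bash
# Hypothetical sample input, modelled on the file in the question.
tmp=$(mktemp -d)
printf '%s\n' 'right, kidney, right ureter,' \
              'urethra, urinary, bladder,' > "$tmp/in.txt"

while IFS=, read -r -a F          # -a: split the line into the array F
do
  # F is filled anew for every input line; report its size on stderr.
  echo "fields on this line: ${#F[@]}" >&2
  # printf reuses the format "%s\n" once for each array element.
  printf '%s\n' "${F[@]# }"
done < "$tmp/in.txt" > "$tmp/out.txt"

cat "$tmp/out.txt"
```

Because printf repeats its format for every remaining argument, no inner loop over the array is needed.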

Wouldn't

tr ',' '\n' <file

do the same (except for the leading spaces)?

Hi RudiC, yes, except for leading spaces and extra new lines, and:

awk '{$1=$1}1' RS=, newtest2.txt > output.txt

would be a full solution.

But I assumed the OP wanted a shell solution, since that is what the OP was using and I assumed he was going to do some further processing within the loop.
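
For completeness, the tr idea can be extended to handle the leading spaces and the empty lines by piping through sed. This is only a sketch under assumptions: the sample data and file names are made up, and the sed step is an addition that was not suggested in the thread.

```shell
#!/bin/sh
# Hypothetical sample input, modelled on the file in the question.
tmp=$(mktemp -d)
printf '%s\n' 'right, kidney, right ureter,' \
              'urethra, urinary, bladder,' \
              'left, adrenal gland, rectum, uterus' > "$tmp/in.txt"

# tr turns every comma into a newline;
# sed then strips leading spaces and deletes the resulting empty lines.
tr ',' '\n' < "$tmp/in.txt" | sed -e 's/^ *//' -e '/^$/d' > "$tmp/out_tr.txt"

cat "$tmp/out_tr.txt"
```

Unlike the while read loop, this pipeline leaves no room for per-term processing in the shell, which is the trade-off mentioned above.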