Split file into rows in UNIX

Hi Team,
I have a delimited file with 6GB of data, and it does not have a newline separator.

Ask: I would like to split the file into rows with a fixed number of columns (three per row in the example below).

Example:
File.txt contains the data below:

abc-per-qwe-xyz-triv-amz-try-utr-ytu-uio-pou-cvf-bht-grt-qwe

Expected output:

abc-per-qwe
xyz-triv-amz
try-utr-ytu
uio-pou-cvf
bht-grt-qwe

Please show attempts to complete this task. Otherwise a moderator might have you do it manually with a text editor. Also, search the site for similar or exact solutions to accomplish this task.

Hi
If you use a regexp on a file like this, you can hang the computer.

tr '-' '\n' <file | pr -3tas'-' >new_file

It takes 10 to 20 minutes.
You can watch the progress.
In another window, run

progress -c pr -M

or

pv file | tr '-' '\n' | pr -3tas'-' >new_file
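
To see what this pipeline does before pointing it at the full file, here is a minimal sketch on a tiny sample. The sample string and the variable name out are made up for illustration, and the sketch assumes GNU pr.

```shell
# Hypothetical six-field sample standing in for the real 6GB file.
sample='abc-per-qwe-xyz-triv-amz'

# tr puts every field on its own line; pr -3 arranges them in 3 columns,
# -a fills across the rows, -t omits page headers, -s'-' joins columns with '-'.
out=$(printf '%s\n' "$sample" | tr '-' '\n' | pr -3tas'-')
printf '%s\n' "$out"
```

Neither tr nor pr has to hold the whole file as a single line, which is likely why this approach copes with a file that has no newlines.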

We encourage users to post their own attempts before we help.
In the future, please show your effort, as we are not a scripting service.

Give this a shot:

awk -F"-" ' { for ( i=3;i<=NF;i+=3) print $(i-2) FS $(i-1) FS $i }' input
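
A small sketch of how that loop behaves, using a made-up seven-field sample (the sample string and the variable name out are not from the thread). It steps through the fields three at a time, so a leftover field that does not complete a group of three is not printed.

```shell
# Hypothetical seven-field sample; the seventh field has no complete group.
sample='abc-per-qwe-xyz-triv-amz-try'

# Print fields three at a time, joined by the field separator.
out=$(printf '%s\n' "$sample" |
  awk -F'-' '{ for (i = 3; i <= NF; i += 3) print $(i-2) FS $(i-1) FS $i }')
printf '%s\n' "$out"
```

Note that awk reads the whole line as one record here and splits it into fields, so memory use grows with the length of the line.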

Regards
Peasant.

<file tr - $'\n' | paste -sd'--\n'
abc-per-qwe
xyz-triv-amz
try-utr-ytu
uio-pou-cvf
bht-grt-qwe

It prints the first line seen for each distinct value of column 1 ($1). The post-increment a[$1]++ returns the value held before the increment, which is 0 on the first occurrence of a key because the array element does not exist yet. The ! negates that, so the expression is true on the first occurrence and false on every later one. The default action for a true pattern in awk is to print the line. In other words: print the line only if the value of column 1 has not been counted in array "a" before. The line could have been written as: '{if (! a[$1]++) print $0}'
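
A minimal sketch of that idiom on made-up input (the three sample lines and the variable name dedup are purely illustrative):

```shell
# Hypothetical input: the first and third lines share the same column 1.
dedup=$(printf '%s\n' 'x 1' 'y 2' 'x 3' | awk '!a[$1]++')

# Only the first line for each distinct key in column 1 is kept.
printf '%s\n' "$dedup"
```

The line 'x 3' is dropped because the key x has already been counted once by the time it is read.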

Speed increased 5-10 times!

A 6GB single-line file completely froze the system.


Hello nez,

How about this one then? Sorry, I couldn't test it properly since I am not logged into any server right now, so I tested it in an online editor with 2 lines. I am pretty sure it should be faster.

xargs -d'-' -n 3 <  Input_file | sed -E 's/ /-/g;/^$/d'

Thanks,
R. Singh


Hi
It is even a little slower than my version.


I completely missed the 6GB part, sorry for the wrong suggestion.

Regards
Peasant.

awk 'ORS=(NR % 3) ? RS : "\n"' RS=- File.txt > outfile.txt
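
For anyone wondering why this one-liner works, here is a rough sketch on a made-up sample (the sample string and the variable name out are illustrative). It sets RS with -v instead of a trailing RS=- operand, which should be equivalent here, and feeds the sample without a trailing newline, since a trailing newline would become part of the last record.

```shell
# Hypothetical six-field sample, written without a trailing newline.
sample='abc-per-qwe-xyz-triv-amz'

# RS=- makes every '-'-separated field its own record. The pattern is an
# assignment: ORS becomes '-' after the first two records of each group and
# a newline after every third one. The assigned value is a non-empty string,
# so the pattern is true and awk prints each record followed by ORS.
out=$(printf '%s' "$sample" | awk -v RS=- 'ORS = (NR % 3) ? RS : "\n"')
printf '%s\n' "$out"
```

Because each field is a separate record, awk never has to hold the whole 6GB line at once.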

Hi @rdrtx1
Your version progresses at 2.7MiB/s.
With a small change:

awk '{getline b; getline c; print $0"-"b"-"c}' RS='-'
  • the speed rises a little, to 3.7MiB/s
  • the version from @RavinderSingh13 runs at 4.5MiB/s, and mine at 5.0MiB/s
  • the version from @RudiC runs at 28.0-40.0MiB/s; its processing speed is volatile.

So for now, 'tr' + 'paste' has no competition.
Thanks