awk: concatenate every line of a file into a single line

sdf · January 17, 2012, 4:22pm

I have several hundred tiny files, each of which needs to be concatenated into a single line, with all of those lines collected in a single file. Some files contain several blank lines. I tried this script, but it failed.

awk 'END { print r } r && !/^/ { print FILENAME, r; r = "" }{ r = r ? r $0 : $0 }' *.txt >concatenated_lines.txt

Corona688 · January 17, 2012, 4:29pm

# Concatenate many files, strip out newlines to make one line, write to file
cat *.txt | tr -d '\n' > file
# Add final newline to the newline-less line
echo >> file

If you wanted them to be separated by spaces instead of nothing, you could do this:

cat *.txt | tr -s '\n' ' ' > file
echo >> file
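To see how the two tr variants differ on blank lines, here is a small sketch. The sample text is made up for the demonstration. Note that -s squeezes any run of spaces in the result, including runs that were already in the input.

```shell
# Hypothetical sample: "a", two blank lines, "b"
deleted=$(printf 'a\n\n\nb\n' | tr -d '\n')
squeezed=$(printf 'a\n\n\nb\n' | tr -s '\n' ' ')

echo "[$deleted]"    # newlines removed, nothing put in their place
echo "[$squeezed]"   # each run of newlines becomes a single space
```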

sdf · January 17, 2012, 4:32pm

Would you mind posting the code in awk? I would also need the FILENAME ahead of the concatenated line.

Corona688 · January 17, 2012, 4:38pm

It would've been nice to know if those did what you wanted. If they don't, neither will the same thing written in awk! :wall:

ORS is the output record separator, which is a newline by default. Setting it to the empty string tells awk to print nothing between records, so everything gets squeezed together. Then, at the very end, print one newline, since most tools expect a trailing newline before they treat the text as a complete line.

awk -v ORS="" '{$1=$1} 1; END { printf("\n"); }' *.txt > output
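To see the effect of an empty ORS on concrete data, here is a small sketch. The scratch directory and the file names one.txt and two.txt are made up for the demonstration. It uses printf in END because a plain print there would also be followed by the empty ORS and so would not produce a newline.

```shell
# Hypothetical sample files in a scratch directory
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\n\nd\n' > "$dir/two.txt"

# ORS="" makes print emit nothing after each record;
# printf in END supplies the single closing newline
joined=$(awk -v ORS="" '1; END { printf "\n" }' "$dir/one.txt" "$dir/two.txt")
echo "$joined"

rm -rf "$dir"
```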

sdf · January 17, 2012, 4:55pm

Sorry, I can't test cat and tr; I am using Windows. :wall:

I have tried your code, but I am still having trouble getting the FILENAME in front of each line. I also changed {$1=$1} to {$0=$0} and wonder whether this makes it faster?

awk -v ORS=" " 'BEGIN{ print FILENAME} {$0=$0} 1 ;END { printf("\n"); }' *.txt >output.txt

Corona688 · January 17, 2012, 5:01pm

You realize these are the UNIX forums, yes? :wall:

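Neither assignment is needed for the translation: print always follows each record with ORS, whether or not the record was touched. What differs is the side effect. $1=$1 makes awk rebuild $0 from its fields joined by OFS, so runs of whitespace collapse to single spaces, while $0=$0 only re-splits the fields and leaves the text as it was. I'd expect both to take almost no time...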
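A small experiment shows what the two assignments do to a line with extra whitespace, assuming a POSIX awk with the default FS and OFS. The sample line is made up, and the brackets are only there to make the spaces visible.

```shell
# Hypothetical input line with leading, repeated and trailing spaces
line='  foo    bar  '

# $1 = $1 rebuilds $0 from its fields, joined by OFS (one space)
rebuilt=$(printf '%s\n' "$line" | awk '{ $1 = $1; print "[" $0 "]" }')

# $0 = $0 re-splits the fields but leaves the record text alone
resplit=$(printf '%s\n' "$line" | awk '{ $0 = $0; print "[" $0 "]" }')

# With no assignment at all, print still ends the record with ORS
plain=$(printf '%s\n' "$line" | awk -v ORS='|' '1')

echo "$rebuilt"
echo "$resplit"
echo "[$plain]"
```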

It's always better to show what you want than to post code for a routine which doesn't do what you want; broken code can't be used to show what you do want. Now that I understand your needs a little better:

awk -v ORS=" " '$1 { print FILENAME, $0; } END { printf("\n"); }' *.txt

With an explicit print we output exactly what we want, here FILENAME followed by the line, instead of manipulating $0 into shape first. Sometimes that's easier, and sometimes it's not...

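The $1 before the block skips blank lines, since an empty first field counts as false. (It also skips any line whose first field is 0; NF is the stricter test for a non-blank line.)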

FILENAME is not set in the BEGIN block. BEGIN runs once at the start of the program, not at the start of each file or line, and since no data has been read yet, there's no FILENAME available inside it...
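A quick way to check this, and to see one common alternative: FNR restarts at 1 for every input file, so a rule on FNR == 1 fires once per file with FILENAME already set. The scratch directory and the names a.txt and b.txt are made up for the demonstration. What BEGIN prints between the brackets depends on the awk in use; POSIX leaves the value undefined there.

```shell
# Hypothetical sample files in a scratch directory
dir=$(mktemp -d)
printf 'x\ny\n' > "$dir/a.txt"
printf 'z\n' > "$dir/b.txt"

out=$(cd "$dir" && awk '
    BEGIN    { print "BEGIN sees [" FILENAME "]" }   # no file opened yet
    FNR == 1 { print "first line of " FILENAME }     # once per input file
' a.txt b.txt)
echo "$out"

rm -rf "$dir"
```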

sdf · January 17, 2012, 5:15pm

awk -v ORS=" " '$1 { print FILENAME, $0; } END { printf("\n"); }' *.txt

This code prints the FILENAME in front of every line of each input file (at least in gawk).

Thanks for the explanation!

Corona688 · January 17, 2012, 6:02pm

What other lines? I thought each file was one line...

If not, how about this?

awk -v ORS=" " 'LF != FILENAME { print FILENAME; LF=FILENAME }; {$1=$1} 1; END { printf("\n"); }' *.txt
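If what is wanted is instead one output line per input file, with the file name in front, a variant along these lines might do. This is only a sketch under that assumption: the sample files are made up, it uses printf throughout rather than ORS, and an empty file produces no output line at all because it never has a first record.

```shell
# Hypothetical sample files in a scratch directory
dir=$(mktemp -d)
printf 'a\n\nb\n' > "$dir/one.txt"
printf 'c\nd\n' > "$dir/two.txt"

result=$(cd "$dir" && awk '
    FNR == 1 {                      # first record of each file
        if (NR > 1) printf "\n"     # finish the line of the previous file
        printf "%s", FILENAME       # start a new line with the file name
    }
    NF { printf " %s", $0 }         # append non-blank lines, space separated
    END { if (NR > 0) printf "\n" } # final newline
' one.txt two.txt)
echo "$result"

rm -rf "$dir"
```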

sdf · January 17, 2012, 6:07pm

awk -v ORS=" " 'LF != FILENAME { print FILENAME; LF=FILENAME }; {$1=$1} 1; END { printf("\n"); }' *.txt

Yes, this works, with no duplicate entries. Thanks a lot!