Issue in Concatenation/Joining of lines in a dynamically generated file

Hi,

I have a file containing many records delimited by pipe (|).
Each record should contain 17 columns/fields. Some records have fewer than 17 fields, so I am extracting those records to a file using the command below:

awk 'BEGIN {FS="|"} NF !=17 {print}' feedfile.txt >feedfilebadrecs.txt

Then I am concatenating all lines in feedfilebadrecs.txt into one single line:

tr -d '\n' < feedfilebadrecs.txt

But it's not working; the concatenation is not happening. I have tried concatenating with other commands such as paste, sed, and awk, but they don't work either (they give some random line from the input file as output).

I applied the same concatenation commands to some other manually created files, and there they work.

I do not understand why the concatenation is not happening on a file created dynamically by awk.

Please help me figure this out.

Thank You

awk 'BEGIN {FS="|"} NF !=17 {printf "%s", $0} END {print ""} ' feedfile.txt >feedfilebadrecs.txt

and forget about tr .
If this doesn't work, post the output of: cat -vet feedfile.txt using the code tags.

What operating system are you using?
How many lines are you concatenating?
After concatenating those lines, how long is your single output line supposed to be?
How have you determined that the concatenation is not working?
What output do you get if you run the commands:

awk 'BEGIN {FS="|"} NF !=17' feedfile.txt >feedfilebadrecs.txt
tr -d '\n' < feedfilebadrecs.txt > longline
wc feedfile.txt feedfilebadrecs.txt longline

P.S. If you are trying this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk .

Hi Don Cragun,

1) I am using HP-UX.
2) It depends on the requirement; right now I am concatenating only a few lines, as shown in the screenshot.
3) The single output line can be as long as 300 characters.
4) It gives only a single line from a group of lines in the input file, as shown in the screenshot.
5) Please see the screenshot.

Hi vgersh99,

Awk is working fine. The problem is that I am not able to concatenate the lines in the file created by awk.

cat -vet feedfile.txt

status|Date|Owner|BigDealID|OPGcode|Customer|Product|GandalfDiscoDatedd-mm-yy|Endforecastdatedd-mm-yy|Comments|ExclusioninGandalf?|Completiondate|Comments|Forecastcoomunicatedtoplanning?|Comments|Lastrefreshmentdate|Forecaststillvalid?^M$
Ok|22-10-2008|Alina^M$
Bobirca|73696227||ROYALDUTCHSHELLPLC|SD128AA||01-12-2014||Yes||||||^M$
Ok|22-10-2008|AlinaBobirca|73696227||ROYALDU^M$
TCHSHELLPLC|SD129AA||01-12-2014||Yes||||||^M$
Ok|22-10-2008|AlinaBobirca|73696227||ROYALDUTCHSHELLPLC|SD130AA||01-12-2014||Yes||||||^M$
Ok|22-10-2008|AlinaBobirca|73696227||ROYALDUTCHSHELLPLC|SD13 ^M$
1AA||01-12-2014||Yes||||||^M$
Ok|22-10-2008|AlinaBobirca|73696227||ROYALDUTCHSHELLPLC|SD132AA||01-12-2014||Yes||||||^M$
Ok|22-10-2008|AlinaBobirca|73696227||ROYALDUTCHSHELLPLC|SD142AA||01-12-2014||Yes||||||^M$
Ok|22-10-2008|AlinaBobirca|73696227||ROYALDUTCHSHELLPLC|SD143AA||01-12-2014||Yes||||||^M$
Ok|22-10-2008|AlinaBobirca|73696227||ROYALDUTCHSHELLPLC|SD144AA||01-12-2014||Yes||||||^M$
Ok|22-10-2008|AlinaBobirca|73696227||ROYALDUTCHSHELLPLC|SD145AA||01-12-2014||Yes||||||^M$

Everything is working just fine.

Your problems are that you have a DOS file that you're working on with UNIX utilities (get rid of the carriage return characters and you would get something closer to what you were expecting), and you want to join pairs of truncated lines from feedfile.txt ; not all truncated lines.
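
A small sketch of the carriage return effect, using a made-up two-line sample rather than your real feedfile.txt (the make_sample function and the variable names are only for illustration). Deleting just the newlines leaves the carriage returns behind, so a terminal keeps jumping back to column 1 and overwrites what it already printed, which can look like a single random line. Deleting both characters gives the joined line:

```shell
# Hypothetical sample: two DOS-style (CRLF) lines, not the real feedfile.txt.
make_sample() {
    printf 'Ok|22-10-2008|Alina\r\nBobirca|73696227\r\n'
}

# Remove only the newlines, then count the carriage returns that survive.
cr_count=$(make_sample | tr -d '\n' | tr -cd '\r' | wc -c | tr -d ' ')
echo "carriage returns left after tr -d '\\n': $cr_count"

# Remove carriage returns and newlines together to get one clean line.
joined=$(make_sample | tr -d '\r\n')
echo "$joined"
```

Piping the output through od -c instead of looking at it on the terminal is another way to see the leftover \r characters.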

Try (untested since you haven't given us a usable copy of feedfile.txt ):

awk '
BEGIN {		FS = OFS = "|" }
{		sub(/\r/, "") }
NF < 17 {	$1 = save $1; save = $0 }
NF >= 17 {	print; save = "" }
' feedfile.txt > fixed_feedfile.txt

Hi Don,

please find the screenshot.
I executed your command, and now the awk output contains only records that have 17 fields. My aim is to extract all records that have fewer than 17 fields into a new file and then concatenate all lines in that new file.
Also please find the attached feed file.

Thanks

Hi Don,

I understood the idea with this code,

NF < 17 {	$1 = save $1; save = $0 }

but how do I print the concatenated line?

Thanks

I was missing a step to get it to recalculate NF. Fixing the line you quoted:

NF < 17 {       $0 = save $0; save = $0; $1 = $1 }

combines adjacent partial lines and recalculates the number of fields on the newly combined line, and then the next line in the script:

NF >= 17 {      print; save = "" }

prints the reconstructed partial lines (as well as printing any lines that were complete to start with).
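
As a rough illustration of the field-splitting behaviour involved (a sketch on a made-up one-line input, not your feed file): in POSIX awk, assigning to a field such as $1 rebuilds $0 but does not change NF, while assigning to $0 makes awk split the record again. The variable name nf_values is only for this example:

```shell
# Hypothetical input: one record with two pipe-delimited fields.
nf_values=$(printf 'b|c\n' | awk 'BEGIN { FS = OFS = "|" }
{
    $1 = "a|" $1        # $0 becomes "a|b|c", but NF is not recalculated
    before = NF
    $0 = $0             # assigning to $0 makes awk split the record again
    print before " " NF
}')
echo "$nf_values"       # NF before and after the record is re-split
```

The first number is the stale field count and the second is the count after the record has been split again.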

This updated script:

awk '
BEGIN {         FS = OFS = "|" }
{               sub(/\r/, "") }
NF < 17 {       $0 = save $0; save = $0; $1 = $1 }
NF >= 17 {      print; save = "" }
' feedfile.txt > fixed_feedfile.txt

when given the feedfile.txt that you uploaded, saves the following in fixed_feedfile.txt :

status|Date|Owner|BigDealID|OPGcode|Customer|Product|GandalfDiscoDatedd-mm-yy|Endforecastdatedd-mm-yy|Comments|ExclusioninGandalf?|Completiondate|Comments|Forecastcoomunicatedtoplanning?|Comments|Lastrefreshmentdate|Forecaststillvalid?
Ok|22-10-2008|AlinaBobirca|73696227||ROYALDUTCHSHELLPLC|SD128AA||01-12-2014||Yes||||||
Ok|22-10-2008|AlinaBobirca|73696227||ROYALDUTCHSHELLPLC|SD129AA||01-12-2014||Yes||||||
Ok|22-10-2008|AlinaBobirca|73696227||ROYALDUTCHSHELLPLC|SD130AA||01-12-2014||Yes||||||
Ok|22-10-2008|AlinaBobirca|73696227||ROYALDUTCHSHELLPLC|SD13 1AA||01-12-2014||Yes||||||
Ok|22-10-2008|AlinaBobirca|73696227||ROYALDUTCHSHELLPLC|SD132AA||01-12-2014||Yes||||||
Ok|22-10-2008|AlinaBobirca|73696227||ROYALDUTCHSHELLPLC|SD142AA||01-12-2014||Yes||||||
Ok|22-10-2008|AlinaBobirca|73696227||ROYALDUTCHSHELLPLC|SD143AA||01-12-2014||Yes||||||
Ok|22-10-2008|AlinaBobirca|73696227||ROYALDUTCHSHELLPLC|SD144AA||01-12-2014||Yes||||||
Ok|22-10-2008|AlinaBobirca|73696227||ROYALDUTCHSHELLPLC|SD145AA||01-12-2014||Yes||||||

which I would think would be more useful than stripping out the 3 lines that were reconstructed from the partial lines in your input file and producing a single, concatenated, partial line containing 49 (not 51) fields (with no trailing <newline> character).

The space in the middle of SD13 1AA in the fifth output line above is there because there is a space before the carriage return character in the 7th line of feedfile.txt, which can be seen in the output from the cat -vet feedfile.txt you showed us in post #5 in this thread.
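
If you really do want only the partial lines joined into a single line, the following is a minimal sketch of that approach and of the field count mentioned above. It assumes the six partial lines copied from your cat -vet output as sample data; make_partial_lines, one_line, and field_count are names made up for this example:

```shell
# Sample data: the six partial lines from the cat -vet listing, with CRLF endings.
make_partial_lines() {
    printf '%s\r\n' \
        'Ok|22-10-2008|Alina' \
        'Bobirca|73696227||ROYALDUTCHSHELLPLC|SD128AA||01-12-2014||Yes||||||' \
        'Ok|22-10-2008|AlinaBobirca|73696227||ROYALDU' \
        'TCHSHELLPLC|SD129AA||01-12-2014||Yes||||||' \
        'Ok|22-10-2008|AlinaBobirca|73696227||ROYALDUTCHSHELLPLC|SD13 ' \
        '1AA||01-12-2014||Yes||||||'
}

# Strip the trailing carriage return, keep lines without exactly 17 fields,
# and print them back to back with one newline at the very end.
one_line=$(make_partial_lines | awk 'BEGIN { FS = "|" }
{ sub(/\r$/, "") }
NF != 17 { printf "%s", $0 }
END { print "" }')

# Count the pipe-delimited fields in the joined line.
field_count=$(printf '%s\n' "$one_line" | awk -F'|' '{ print NF }')
echo "$field_count"
```

The joined line has 48 pipe characters (16 from each of the three rebuilt records), so the last field of one record runs into the first field of the next and the count is 49 rather than 51.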