New Member - First Question

Here is my situation...

System - HP UNIX (HP-UX hq5 B.11.00 U 9000/800 (td))

I have an HL7 (Health Level Seven) pipe-delimited file that does not have any carriage returns/line feeds. I need to insert a line feed before each segment type (MSH, PID, PV1, OBX, etc.) so that my PROGRESS program can read each line and determine the segment type for processing.

Input file looks like this:

MSH|data|data|data|data|data|data|data|PID|data|data|data|data|PV1|data|data|OBX|data|data|...

I need a command (I've tried sed) to convert the file to look like this:

MSH|data|data|data|data|data|data|data|
PID|data|data|data|data|
PV1|data|data|
OBX|data|data|...

If anyone can give me a command to do this, I will be most grateful.

sed, awk, and other such line-based tools probably aren't going to be much help, because they can't handle lines longer than a relatively small maximum size.

If your shell's read command supports the -d option:

while read -d '|' DATA
do
        if [ "$DATA" == "MSH" ] || [ "$DATA" == "PID" ] ||
                [ "$DATA" == "PV1" ] || [ "$DATA" == "OBX" ]
        then
                echo -en "\n${DATA}"
        else
                echo -n "|${DATA}"
        fi
done < datafile

echo

Dear Corona,

Thanks for the help but my system does not support the -d option on the read command.

I think you are correct as to why the 'sed' command doesn't work. My HL7 file is almost 1.5 MB.

Any other suggestions?
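
One option that does not need read -d: let tr turn every pipe into a newline first, so that read only ever sees one short field at a time. The following is a minimal sketch, assuming the shell's read accepts -r and that printf is available on this HP-UX release (not verified on 11.00). The name split_segments is made up for illustration, and the segment list contains only the four IDs from the sample.

```shell
#!/bin/sh
# split_segments (hypothetical name): reads a single-line, pipe-delimited
# HL7 stream on stdin and writes one segment per line on stdout.
split_segments() {
    tr '|' '\n' | {
        first=1
        # "|| [ -n ... ]" keeps the last field when the input has no final newline.
        while IFS= read -r DATA || [ -n "$DATA" ]
        do
            if [ "$first" -eq 1 ]
            then
                # Very first field: nothing to put in front of it.
                printf '%s' "$DATA"
                first=0
            else
                case "$DATA" in
                MSH|PID|PV1|OBX)
                    # Segment ID: close the previous segment, start a new line.
                    printf '|\n%s' "$DATA" ;;
                *)
                    # Ordinary data field: put the separator back.
                    printf '|%s' "$DATA" ;;
                esac
            fi
        done
        echo
    }
}
```

It would be called as split_segments < datafile. A pipe at the very end of the input is dropped, and a data field whose entire content is MSH, PID, PV1 or OBX would be treated as the start of a segment.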

Could you explain how we can distinguish between a segment type and data?
Is it by case (upper/lower)?

I don't think sed has any such limit. It's a stream editor, not a line editor.

sed 's/|\([MSH|PID|PV1]\)/\|\
\1/g' file1

MSH|data|data|data|data|data|data|data|
PID|data|data|data|data|
PV1|data|data|OBX|data|data|
....

This is implementation-dependent. GNU sed has no such limit. The sed FAQ details which versions have which limitations, but it's not clear which version HP-UX ships.

Don't you need -r for brackets and backreferences?

HPUX 11.00

what /usr/bin/sed

/usr/bin/sed:

         $Revision: 80.5 $

:smiley:

I don't think my original sed worked that great!

sed -E 's/(PID|PV1|...|...)/\
&/g' file1

But I just ran a test on a line with 7 million characters with no problems, so I imagine the limitations to which you refer are for old versions of sed (so it probably doesn't work on Solaris!!)

Edit: Completely overlooked (i.e. didn't see) the first line of the question: "System - HP UNIX (HP-UX hq5 B.11.00 U 9000/800 (td))". :o Sorry.

It should be straightforward with Perl.

/usr/bin/sed:
$Revision: 80.5.1.1 $
PATCH_11_00: sed0.o sed1.o hpux_rel.o 00/11/27

If you don't mind an amateurish attempt, how about something like:

awk '{gsub("MSH","\nMSH",$0) ; gsub("PID","\nPID",$0) ; gsub("PV1","\nPV1",$0) ; gsub("OBX","\nOBX",$0) ; print}' filename

Or, a little easier to expand as needed, perhaps:

awk 'BEGIN {str[1] = "MSH" ; str[2] = "PID" ; str[3] = "PV1" ; str[4] = "OBX" ; n = 4}
{for (i = 1 ; i <= n ; i++) {gsub(str[i],"\n"str[i],$0)} ; print}' filename

Of course, that'll get very tedious depending on how many possible str values there are...

The first sed posted by scottn works on HP-UX 11.00 for the sample file, albeit with a trailing pipe character on all but the last record.
The later version with "-E" gives a syntax error.

Now we need the O/P to try it on a large file.
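
For anyone without the real file who wants to test, a large single-line input can be faked by repeating a short message. This is only a sketch: make_big_sample is a made-up name, the message is a shortened form of the sample from the first post, and the shell is assumed to support $(( )) arithmetic.

```shell
#!/bin/sh
# make_big_sample (hypothetical name): writes $1 copies of a short sample
# message to stdout with no newline anywhere in the output.
make_big_sample() {
    i=0
    while [ "$i" -lt "$1" ]
    do
        # Each copy is 41 bytes long.
        printf '%s' 'MSH|data|data|PID|data|PV1|data|OBX|data|'
        i=$((i + 1))
    done
}
```

Something like make_big_sample 36000 > /tmp/hl7_test gives a file of roughly 1.5 MB (the output path is just an example), which can then be fed to any of the commands in this thread.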

Knowing the Progress version would help. There are other approaches using a recent Progress 4GL.

If I understood the problem correctly, this is something that should work in PHP:

<?php
$data=file_get_contents('./your_data.txt');
$stuff=preg_replace("/\|([A-Z0-9]{3})/","|\n$1",$data);
print $stuff;
?>

Output on sample file:

MSH|data|data|data|data|data|data|data|
PID|data|data|data|data|
PV1|data|data|
OBX|data|data|...

Trying treesloth's approach using:

awk '{gsub("MSH","\nMSH",$0) ; gsub("PID","\nPID",$0) ; gsub("PV1","\nPV1",$0) ; gsub("OBX","\nOBX",$0) ; gsub("ORC",\nORC",$0) ; gsub("OBR","\nOBR",$0) ;
gsub("NTE","\nNTE",$0) ; print}' /care/misc/rad_report

I get the error message:

syntax error The source line is 1.
The error context is
{gsub("MSH","\nMSH",$0) ; gsub("PID","\nPID",$0) ; gsub("PV1","\
nPV1",$0) ; gsub("OBX","\nOBX",$0) ; >>> gsub("ORC",\ <<< nORC",$0) ; gsub("OBR
","\nOBR",$0) ; gsub("NTE","\nNTE",$0) ; print}
awk: The statement cannot be correctly parsed.
The source line is 1.

My version of PROGRESS is 8.3B

Any other suggestions?


I found the error in my previous code (missing ").

Now the error reads:

awk: Input line MSH|^~\&|HNAM|HNA|CL cannot be longer than 3,000 bytes.

Any other suggestions?
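
The 3,000-byte limit applies to each input line awk reads, so a possible workaround is to let tr turn every pipe into a newline first. awk then sees one short field per line and can rebuild the segments itself. This is a sketch only, not tested on HP-UX 11.00. The name split_segments_awk is made up, and the segment list is just the seven IDs mentioned in the posts above.

```shell
#!/bin/sh
# split_segments_awk (hypothetical name): single-line HL7 on stdin,
# one segment per line on stdout.
split_segments_awk() {
    tr '|' '\n' | awk '
        # First field of the file: print it with nothing in front.
        NR == 1 { printf "%s", $0; next }
        # A field that is exactly a known segment ID starts a new line.
        /^(MSH|PID|PV1|OBX|ORC|OBR|NTE)$/ { printf "|\n%s", $0; next }
        # Any other field: put the pipe separator back.
        { printf "|%s", $0 }
        # Finish the last segment with a newline.
        END { if (NR > 0) print "" }
    '
}
```

Usage would be split_segments_awk < /care/misc/rad_report. Because only whole fields are compared, a word such as RAPID inside the data is left alone, unlike with the gsub version. A single field longer than 3,000 bytes would presumably still hit the same limit.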

Use PHP... did you see my script? I think it will work for you. If your file is too large (over 128 MB) and you get a PHP memory error, I can show you how to fix that as well.

Well,
you didn't answer my question, but I'll give you an idea anyway:

perl -ple's/(?<!\A)(MSH|P(ID|V1)|OBX)/\n$1/g' infile

Or you can use Python: Parsing HL7 with Python
Or Ruby: ruby-hl7