New Member - First Question

varefump · February 18, 2010, 2:15pm

Here is my situation...

System - HP UNIX (HP-UX hq5 B.11.00 U 9000/800 (td))

I have an HL7 (Health Level Seven) pipe-delimited file that does not have any carriage returns/line feeds. I need to insert a line feed before each segment type (MSH, PID, PV1, OBX, etc.) so that my PROGRESS program can read each line and determine the segment type for processing.

Input file looks like this:

MSH|data|data|data|data|data|data|data|PID|data|data|data|data|PV1|data|data|OBX|data|data|...

I need a command (I've tried sed) to convert the file to look like this:

MSH|data|data|data|data|data|data|data|
PID|data|data|data|data|
PV1|data|data|
OBX|data|data|...

If anyone can give me a command to do this, I will be most grateful.

Corona688 · February 18, 2010, 2:23pm

sed, awk, and other such line-based tools probably aren't going to be much help: many implementations can't handle lines longer than a relatively small maximum size.

If your shell's read command supports the -d option:

while IFS= read -r -d '|' DATA || [ -n "$DATA" ]
do
        case "$DATA" in
        MSH|PID|PV1|OBX)
                printf '\n%s' "$DATA" ;;    # segment ID: start a new line
        *)
                printf '|%s' "$DATA" ;;     # data field: stay on the current line
        esac
done < datafile

echo
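For shells whose read builtin has no -d option, a minimal sketch in plain POSIX sh can let tr do the splitting instead: every | becomes a newline, so an ordinary read -r only ever sees one field. The function name split_hl7 and the list of segment IDs are assumptions for illustration; add whatever segment types your feed uses.

```shell
#!/bin/sh
# Sketch (hypothetical helper): split_hl7 reads a pipe-delimited HL7 stream on
# stdin and starts a new output line at each segment ID. tr turns every "|"
# into a newline first, so read never needs -d and never sees a long line.
# The segment ID list below is an assumption; extend it as needed.
split_hl7() {
    tr '|' '\012' | {
        first=1
        # "|| [ -n ... ]" keeps a last field that has no trailing newline
        while IFS= read -r field || [ -n "$field" ]; do
            case $field in
            MSH|PID|PV1|OBX|ORC|OBR|NTE)
                # segment ID: end the previous line (but not before the first one)
                [ "$first" -eq 1 ] || printf '\n'
                first=0
                printf '%s' "$field" ;;
            *)
                # ordinary field (possibly empty): restore the pipe tr removed
                printf '|%s' "$field" ;;
            esac
        done
        printf '\n'
    }
}
```

Usage would be split_hl7 < datafile > datafile.split. A shell loop is slow on large files, but it has no line-length limit because no line it reads is longer than a single field; read -r also keeps backslashes such as the ^~\& encoding characters intact.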

varefump · February 18, 2010, 2:38pm

Dear Corona,

Thanks for the help, but my system does not support the -d option on the read command.

I think you are correct as to why the 'sed' command doesn't work. My HL7 file is almost 1.5 MB.

Any other suggestions?

radoulov · February 18, 2010, 2:42pm

Could you explain how we can distinguish between segment types and data?
Is it the case (upper/lower)?

Scott · February 18, 2010, 2:43pm

I don't think sed has any such limit. It's a stream editor, not a line editor.

sed 's/|\([MSH|PID|PV1]\)/\|\
\1/g' file1

MSH|data|data|data|data|data|data|data|
PID|data|data|data|data|
PV1|data|data|OBX|data|data|
....
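The bracket expression [MSH|PID|PV1] above matches a single character from the set M, S, H, |, P, I, D, V, 1, not one of the three words, which is why OBX stays on the PV1 line. POSIX basic regular expressions have no alternation operator, so a portable sketch needs one substitution per segment ID. The function name split_hl7_sed and the ID list are assumptions for illustration; matching the trailing | as well keeps a data field such as PIDX from being split.

```shell
#!/bin/sh
# Sketch (hypothetical helper): put a newline after each "|" that precedes a
# known segment ID. One s command per ID, because POSIX BREs have no "a|b"
# alternation. A backslash followed by a real newline inside the replacement
# is the portable way to make sed emit a newline. Add more -e lines as needed.
split_hl7_sed() {
    sed -e 's/|MSH|/|\
MSH|/g' \
        -e 's/|PID|/|\
PID|/g' \
        -e 's/|PV1|/|\
PV1|/g' \
        -e 's/|OBX|/|\
OBX|/g'
}
```

Run it as split_hl7_sed < file1. This keeps the trailing pipe on each segment, as in the requested output, but it does not change the line-length question: sed still reads the whole file as one input line.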

Corona688 · February 18, 2010, 2:53pm

This is implementation-dependent. GNU sed has no such limit. The sed FAQ details which versions have which limitations, but it's not clear which version HP-UX has.

Don't you need -r for brackets and backreferences?

methyl · February 18, 2010, 3:16pm

HPUX 11.00

what /usr/bin/sed

/usr/bin/sed:

         $Revision: 80.5 $

Scott · February 18, 2010, 3:22pm

I don't think my original sed worked that great!

sed -E 's/(PID|PV1|...|...)/\
&/g' file1

But I just ran a test on a line with 7 million characters with no problems, so I imagine the limitations to which you refer are for old versions of sed (so it probably doesn't work on Solaris!!)

Edit: Completely overlooked (i.e. didn't see) the first line of the question: "System - HP UNIX (HP-UX hq5 B.11.00 U 9000/800 (td))". :o Sorry.

radoulov · February 18, 2010, 3:31pm

It should be straightforward with Perl.

varefump · February 18, 2010, 3:45pm

/usr/bin/sed:
$Revision: 80.5.1.1 $
PATCH_11_00: sed0.o sed1.o hpux_rel.o 00/11/27

treesloth · February 18, 2010, 3:56pm

If you don't mind an amateurish attempt, how about something like:

awk '{gsub("MSH","\nMSH",$0) ; gsub("PID","\nPID",$0) ; gsub("PV1","\nPV1",$0) ; gsub("OBX","\nOBX",$0) ; print}' filename

Or, a little easier to expand as needed, perhaps:

awk 'BEGIN {str[1] = "MSH" ; str[2] = "PID" ; str[3] = "PV1" ; str[4] = "OBX" ; n = 4}
{for (i = 1 ; i <= n ; i++) gsub(str[i], "\n" str[i]) ; print}' filename

Of course, that'll get very tedious depending on how many possible str values there are...

methyl · February 18, 2010, 4:00pm

The first sed posted by scottn works on HPUX 11.00 for the sample file, albeit with the trailing pipe character on all but the last record.
The later version with "-E" gives a syntax error.

Now we need the OP to try it on a large file.

Knowing the Progress version would help. There are other approaches using a recent Progress 4GL.

Neo · February 18, 2010, 4:21pm

If I understood the problem correctly, this is something that should work in PHP:

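<?php
// Read the whole file into memory (no line-length limit applies).
$data = file_get_contents('./your_data.txt');
// Insert a newline after every "|" that is followed by three capital letters or digits.
$stuff = preg_replace("/\|([A-Z0-9]{3})/", "|\n$1", $data);
print $stuff;
?>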

Output on sample file:

MSH|data|data|data|data|data|data|data|
PID|data|data|data|data|
PV1|data|data|
OBX|data|data|...

varefump · February 19, 2010, 2:06pm

Trying treesloth's approach using:

awk '{gsub("MSH","\nMSH",$0) ; gsub("PID","\nPID",$0) ; gsub("PV1","\nPV1",$0) ; gsub("OBX","\nOBX",$0) ; gsub("ORC",\nORC",$0) ; gsub("OBR","\nOBR",$0) ;
gsub("NTE","\nNTE",$0) ; print}' /care/misc/rad_report

I get the error message:

syntax error The source line is 1.
The error context is
{gsub("MSH","\nMSH",$0) ; gsub("PID","\nPID",$0) ; gsub("PV1","\
nPV1",$0) ; gsub("OBX","\nOBX",$0) ; >>> gsub("ORC",\ <<< nORC",$0) ; gsub("OBR
","\nOBR",$0) ; gsub("NTE","\nNTE",$0) ; print}
awk: The statement cannot be correctly parsed.
The source line is 1.

My version of PROGRESS is 8.3B

Any other suggestions?

---------- Post updated at 02:06 PM ---------- Previous update was at 01:49 PM ----------

I found the error in my previous code (missing ").

Now the error reads:

awk: Input line MSH|^~\&|HNAM|HNA|CL cannot be longer than 3,000 bytes.

Any other suggestions?
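That error is the awk input-record limit (3,000 bytes here). One way around it, sketched below and not tested on HP-UX 11.00, is to let tr turn every pipe into a newline so awk only ever reads one short field per record, then have awk put the pipes back and start a new output line at each segment ID. The segment IDs are the ones from the gsub list above; the function name split_hl7_awk is hypothetical.

```shell
#!/bin/sh
# Sketch (hypothetical helper): avoid awk's input-line limit by feeding it one
# field per record (tr turns each "|" into a newline), then rebuild segments.
# Segment IDs are taken from the thread; extend the string in BEGIN as needed.
split_hl7_awk() {
    tr '|' '\012' | awk '
    BEGIN {
        n = split("MSH PID PV1 OBX ORC OBR NTE", ids, " ")
        for (i = 1; i <= n; i++) isseg[ids[i]] = 1
    }
    {
        if ($0 in isseg) {
            # segment ID: finish the previous line (none before the first record)
            if (NR > 1) printf "\n"
            printf "%s", $0
        } else {
            # ordinary field (possibly empty): put back the pipe tr removed
            printf "|%s", $0
        }
    }
    END { printf "\n" }'
}
```

Usage would be split_hl7_awk < /care/misc/rad_report. Because each record awk reads is a single field, the 3,000-byte limit only matters if one field is that long; the rebuilt segments are written with printf, one field at a time.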

Neo · February 19, 2010, 2:13pm

Use PHP... did you see my script? I think it will work for you. If your file is too large (over 128 MB) and you get a PHP memory error, I can show you how to fix that as well.

radoulov · February 19, 2010, 2:15pm

Well, you didn't answer my question, but I'll give you an idea anyway:

perl -ple's/(?<!\A)(MSH|P(ID|V1)|OBX)/\n$1/g' infile

fpmurphy · February 19, 2010, 4:46pm

Or you can use Python: Parsing HL7 with Python
Or Ruby: ruby-hl7