Grabbing data between 2 points in text file

Mikey · August 3, 2013, 7:54pm

I have a text file that shows the output of my solar inverters, and I want to split it into sections: overview, device 1, device 2, device 3. Each device has a different number of lines, but every section has a unique starting point: the overview starts with six #'s, each device starts with four #'s, and each device's data starts with two #'s. I know the answer is in front of me. How do I separate them so I can import them to a box at home that is running Apache?

/var/www/tmp $ cat sunnywebbox-summary.txt
###### Overview:
         GriPwr (         GriPwr): 192 W
      GriEgyTdy (      GriEgyTdy): 17.55920136064 kWh
      GriEgyTot (      GriEgyTot): 27014.58717195136 kWh
          OpStt (          OpStt):
            Msg (            Msg):

  #### Device WR18UW8E:1573704651 (WR18UW8E:1573704651):

    ## Process data:
                Error (          Error):
              E-Total (        E-Total):  kWh
                  Fac (            Fac):  Hz
              h-Total (        h-Total):  h
                  Iac (            Iac):  mA
                  Ipv (            Ipv):  mA
                 Mode (           Mode):
                  Pac (            Pac):  W
             Power On (       Power On):
        Serial Number (  Serial Number):
          Temperature (    Temperature):  grdC
                  Vac (            Vac):  V
                  Vpv (            Vpv):  V

  #### Device WR25UW8E:1374125280 (WR25UW8E:1374125280):

    ## Process data:
                Error (          Error): -------
              E-Total (        E-Total): 20193.09417195136 kWh
                  Fac (            Fac): 60.000 Hz
              h-Total (        h-Total): 24283.4258315628 h
                  Iac (            Iac): 218 mA
                  Ipv (            Ipv): 363 mA
                 Mode (           Mode): Mpp-Search
                  Pac (            Pac): 53 W
             Power On (       Power On): 3179
        Serial Number (  Serial Number): 1374125280
          Temperature (    Temperature): 43.4 grdC
                  Vac (            Vac): 245 V
                  Vpv (            Vpv): 333 V

  #### Device WRHU0U5A:2120048843 (WRHU0U5A:2120048843):

    ## Process data:
             A.Ms.Amp (       A.Ms.Amp): 0.414 A
             A.Ms.Vol (       A.Ms.Vol): 359.200 V
            A.Ms.Watt (      A.Ms.Watt): 148 W
                Error (          Error): -------
              E-Total (        E-Total): 6821.493 kWh
            GridMs.Hz (      GridMs.Hz): 60.010 Hz
      GridMs.PhV.phsA (GridMs.PhV.phsA): 247.890 V
      GridMs.PhV.phsB (GridMs.PhV.phsB): 0.000 V
      GridMs.PhV.phsC (GridMs.PhV.phsC): 0.000 V
        Inv.TmpLimStt (  Inv.TmpLimStt): NoneDrt
            MainModel (      MainModel): Solar-WR
                 Mode (           Mode): MPP
          Mt.TotOpTmh (    Mt.TotOpTmh): 5546.575025102 h
            Mt.TotTmh (      Mt.TotTmh): 5806.537352761 h
         Op.EvtCntUsr (   Op.EvtCntUsr): 54
             Op.EvtNo (       Op.EvtNo): 0
          Op.GriSwStt (    Op.GriSwStt): Cls
            Op.TmsRmg (      Op.TmsRmg): 0.000 s
                  Pac (            Pac): 139 W
        Serial Number (  Serial Number): 2120048843

>>>>> UPDATE <<<<<<
I have been working with sed all evening. I just can't grab one data set from the file without grabbing part of another. See below for what I have been trying.

sed -n '/Overview/,/\           /p' sunnywebbox-summary.txt

sed -n '/#### Device WR25/,/\                     /p' sunnywebbox-summary.txt

 sed -n '/#### Device WRH/,/\                     /p' sunnywebbox-summary.txt

/var/www/tmp $ sed -n '/#### Device WR25/,/\####/p' sunnywebbox-summary.txt


  #### Device WR25UW8E:1374125280 (WR25UW8E:1374125280):

    ## Process data:
                Error (          Error): -------
              E-Total (        E-Total): 20193.09417195136 kWh
                  Fac (            Fac): 60.000 Hz
              h-Total (        h-Total): 24283.4258315628 h
                  Iac (            Iac): 218 mA
                  Ipv (            Ipv): 363 mA
                 Mode (           Mode): Mpp-Search
                  Pac (            Pac): 53 W
             Power On (       Power On): 3179
        Serial Number (  Serial Number): 1374125280
          Temperature (    Temperature): 43.4 grdC
                  Vac (            Vac): 245 V
                  Vpv (            Vpv): 333 V

  #### Device WRHU0U5A:2120048843 (WRHU0U5A:2120048843):

Just_Ice · August 3, 2013, 11:00pm

try awk instead using the device names as reference points ...

awk "/WR25/,/WRHU/" sunnywebbox-summary.txt

Mikey · August 3, 2013, 11:57pm

That gives me what I have been looking at all evening:

#### Device WR25UW8E:1374125280 (WR25UW8E:1374125280):

## Process data:
            Error (          Error):
          E-Total (        E-Total):  kWh
              Fac (            Fac):  Hz
          h-Total (        h-Total):  h
              Iac (            Iac):  mA
              Ipv (            Ipv):  mA
             Mode (           Mode):
              Pac (            Pac):  W
         Power On (       Power On):
    Serial Number (  Serial Number):
      Temperature (    Temperature):  grdC
              Vac (            Vac):  V
              Vpv (            Vpv):  V

#### Device WRHU0U5A:2120048843 (WRHU0U5A:2120048843):

Just_Ice · August 4, 2013, 3:43am

The problem here is that your input does not have cleanly delimited paragraphs that sed or awk can key on. It would have been quite easy to use

awk "/WR25/,/^$/" sunnywebbox-summary.txt

but the empty line immediately after the Device line kills that, because the range ends right there. The original awk command I suggested would work if you removed the second Device line, as in

awk "/WR25/,/WRHU/" sunnywebbox-summary.txt | grep -v WRHU

but that only works if you know which device follows in the report and the device you want is not the last one in the list.

Anyway, the script below should work with or without an argument. It really only uses sed to produce the output; all the other lines are there to work out the start and end line numbers that sed needs. Somebody here could probably script this better in perl or awk, but at least you have something to start with.

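#! /bin/ksh
# Print one device section of the report; prompts for a device if none is given.
PATH=/usr/bin:/bin:/usr/local/bin:/usr/sbin:/sbin
device=$1
testfile=testfile1

# Index of the Device header lines, each prefixed with its line number in the report
grep -n -i device $testfile > /tmp/$$
if [ -z "$device" ]
then
    print "\n--- Devices ---"
    awk '{print $4}' /tmp/$$ | awk -F: '{print $1}'
    print "\nWhich device to report? \c"
    read device
fi

linecnt1=$(wc -l < $testfile)    # lines in the report
linecnt2=$(wc -l < /tmp/$$)      # number of Device headers
# Position of the requested device within the index
startcnt=$(grep -n "$device" /tmp/$$ | awk -F":" '{print $1}')
if [ $startcnt -eq $linecnt2 ]
then
    # Last device: its section runs to the end of the report
    endline=$linecnt1
else
    # Otherwise the section ends on the line before the next Device header
    nextcnt=$(expr $startcnt + 1)
    tempcnt=$(sed -n "${nextcnt}p" /tmp/$$ | awk -F":" '{print $1}')
    endline=$(expr $tempcnt - 1)
fi
startline=$(grep -n "$device" $testfile | awk -F":" '{print $1}')

echo
sed -n "${startline},${endline}p" $testfile

rm -f /tmp/$$ 2> /dev/null

exit 0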
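
For comparison, the range problem discussed in this thread (the end pattern drags in the next Device header) can probably be handled in sed alone by filtering inside the range. This is only a minimal sketch: print_section is a made-up helper name, the sample is a shortened copy of the report, and it assumes every section header contains #### and that the device string has no regex metacharacters.

```shell
#!/bin/sh
# Sketch only: print_section is a hypothetical helper, not something from the thread.
# Usage: print_section DEVICE_PREFIX FILE
print_section() {
    # Range: from the wanted Device header to the next line containing ####.
    # Inside the range, print the wanted header itself and every line
    # that is not a #### header, so the closing header is dropped.
    sed -n "/#### Device $1/,/####/{
        /#### Device $1/p
        /####/!p
    }" "$2"
}

# Shortened sample in the same layout as the report (assumed, values trimmed).
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
###### Overview:
         GriPwr (         GriPwr): 192 W

  #### Device WR25UW8E:1374125280 (WR25UW8E:1374125280):

    ## Process data:
                  Pac (            Pac): 53 W

  #### Device WRHU0U5A:2120048843 (WRHU0U5A:2120048843):

    ## Process data:
                  Pac (            Pac): 139 W
EOF

print_section WR25 "$sample"
```

Because the range simply runs to the end of the file when no further #### line exists, the same call should also work for the last device in the report.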

MadeInGermany · August 4, 2013, 4:12am

This one stops printing before it reaches the next #### header:

awk '($1=="####" && p) {p=0} $0~s {p=1} p' s=WR25 sunnywebbox-summary.txt
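
Spelled out, the one-liner above is a print flag that the wanted header switches on and any later #### header switches off. The sketch below shows the same logic with one rule per line; extract_device and the shortened sample file are made up for illustration, and the search string is passed with -v instead of as a trailing assignment.

```shell
#!/bin/sh
# Sketch only: extract_device is a hypothetical wrapper around the same flag logic.
# Usage: extract_device SEARCH_STRING FILE
extract_device() {
    awk -v s="$1" '
        $1 == "####" && p { p = 0 }   # a new device header ends the current section
        $0 ~ s            { p = 1 }   # the wanted header switches printing on
        p                             # print the line while the flag is set
    ' "$2"
}

# Shortened sample in the same layout as the report (assumed, values trimmed).
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
###### Overview:
         GriPwr (         GriPwr): 192 W

  #### Device WR25UW8E:1374125280 (WR25UW8E:1374125280):

    ## Process data:
                  Pac (            Pac): 53 W

  #### Device WRHU0U5A:2120048843 (WRHU0U5A:2120048843):

    ## Process data:
                  Pac (            Pac): 139 W
EOF

extract_device WR25 "$sample"
```

The order of the rules matters: the switch-off test runs first, so on the wanted header the flag is still 0, the first rule does nothing, and the second rule turns printing on for that same line.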

Mikey · August 5, 2013, 6:58am

Thanks, everyone. I devised my own fix: I converted the txt file to XML, and that made extraction much easier.

Mike

RudiC · August 5, 2013, 7:41am

You could try this to separate the overview and the devices into their own .txt files:

awk '/^..#### / {fn = substr ($NF, 1, 9)".txt"; gsub (/[)(:]/, "", fn)} {print >fn}' sunnywebbox-summary.txt
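
To see what that split produces without touching real files, it can be tried in a scratch directory. The following is only a sketch: the directory, the file name summary.txt and the shortened sample are assumptions for illustration, and it relies on the first line of the input being the Overview header so that a file name is set before anything is printed.

```shell
#!/bin/sh
# Sketch only: run the split in a throwaway directory.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Shortened sample in the same layout as the report (assumed, values trimmed).
cat > summary.txt <<'EOF'
###### Overview:
         GriPwr (         GriPwr): 192 W

  #### Device WR25UW8E:1374125280 (WR25UW8E:1374125280):

    ## Process data:
                  Pac (            Pac): 53 W

  #### Device WRHU0U5A:2120048843 (WRHU0U5A:2120048843):

    ## Process data:
                  Pac (            Pac): 139 W
EOF

awk '
    /^..#### / {
        # Header line: build the output name from the first 9 characters
        # of the last field, then strip parentheses and colons.
        fn = substr($NF, 1, 9) ".txt"
        gsub(/[)(:]/, "", fn)
    }
    { print > fn }   # every line goes to the file of the current section
' summary.txt

ls
```

The pattern /^..#### / matches both the six-# Overview line and the indented four-# Device lines, which is why the overview gets a file of its own. The fixed length of 9 assumes device names of 8 characters, as in the report shown here.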