How to extract every 7th character in a line?

Hi, I have a file consisting of 25000 lines. Each line has 2 columns: the first with 6 numeric characters and the second with 45000 numeric characters (not delimited). I want to take every 7th character from the 2nd column, keeping the first column. I did it in several steps, but it ran for 7 hours. Does anyone have a faster solution? Thanks!

example
File dd0.txt 

100001 1100000000011000011000012222111111111100000000000001122222112221111110012222122222220100000000 ... up to 45000
100002 1100000000011000011000012222111111111100000000000001122222112221111110012222122222220100000000 ... up to 45000

125000 1100000000011000011000012222111111111100000000000001122222112221111110012222122222220100000000 ... up to 45000


result
100001 10001100220120 ... up to 6429
100002 10001100220120 ... up to 6429

125000 10001100220120 ... up to 6429

Hmm. That doesn't sound too complicated. Would you like to show us your solution?

Your desired result doesn't seem to match your specification. Shouldn't it be more like

100001 10001100210200... 
100002 10001100210200...
125000 10001100210200...

?
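
To make the two possible readings of "every 7th character" concrete, here is a minimal awk sketch, assuming the layout from post #1 (a 6-digit ID, one space, then the undelimited digits). The function name `every_nth` and its `step`/`start` arguments are made up for illustration: `start=1` picks characters 1, 8, 15, ... and `start=7` picks characters 7, 14, 21, ...

```shell
# Hypothetical helper: reads lines on stdin, prints column 1 followed by
# every step-th character of column 2, beginning at 1-based position start.
every_nth() {
    # $1 = step, $2 = start position
    awk -v step="$1" -v start="$2" '{
        n = length($2)
        out = ""
        for (i = start; i <= n; i += step)
            out = out substr($2, i, 1)
        print $1, out
    }'
}
```

Something like `every_nth 7 1 <dd0.txt >result.txt` would then process the whole file. This is untested on the full-size input, so no timing is claimed.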

... I'm curious how fast the awk solutions will be.

Here is the test data generator (about 1 GB of data):

#!/bin/bash

# Emit 25000 lines: a 6-digit ID (100001..125000), a space, then 45000 random digits.
c=0
tr -dc '0-9' </dev/urandom | fold -w45000 | while read -r line; do
        ((c+=1))
        [[ $c -gt 25000 ]] && break
        echo "$((c + 100000)) $line"
done
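
For quick experiments, a smaller, parameterised variant of such a generator may be handy. This is a minimal sketch, assuming `tr`, `fold`, `head` and `awk` are available; the function name `gen_test_data` and its two arguments are hypothetical. It lets `head` end the pipeline and `awk` number the lines, instead of using a shell `while read` loop.

```shell
# Hypothetical helper: print $1 rows, each a 6-digit ID (100001, 100002, ...)
# followed by a space and $2 random digits.
gen_test_data() {
    LC_ALL=C tr -dc '0-9' </dev/urandom |
        fold -w "$2" |
        head -n "$1" |
        awk '{ print 100000 + NR, $0 }'
}
```

For example, `gen_test_data 100 45000 >small.txt` should give a 100-line sample in the same format.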

Hi,
Maybe with the sed command below, which gives the same result as RudiC's:

sed -e 's/.\{7\}\|.\{1,6\}$/_&/g;s/^_//;s/_\(.\)[0-9]*/\1/g' file

Regards.
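
A short walk-through of how the one-liner above appears to work, run on a made-up line. This assumes GNU sed (for `\|` in a basic regular expression) and that column 1 plus its space is exactly 7 characters wide; the wrapper name `sed_every7` is only for illustration.

```shell
# Hypothetical wrapper around the one-liner, reading from stdin.
sed_every7() {
    # 1) put "_" in front of each 7-character group (and the shorter tail);
    #    the first group is the 6-digit ID plus its space
    # 2) drop the "_" in front of the ID
    # 3) for each remaining group, keep the character right after "_"
    #    and discard the digits that follow it
    sed -e 's/.\{7\}\|.\{1,6\}$/_&/g' \
        -e 's/^_//' \
        -e 's/_\(.\)[0-9]*/\1/g'
}
```

With this, `printf '100001 123456789\n' | sed_every7` prints `100001 18`, i.e. characters 1 and 8 of the second column.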

Hi, thanks for the reply. It is much faster; I tested it even on much bigger files and it works fine.

------ Post updated at 03:24 AM ------

It seems that it is not working properly.

Not much that we can work with. In WHAT way is the above "not working properly"? And please answer post #2!

Just to dump my lines here before I clean it from my disk ....

This is in bash:

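#!/bin/bash

# Keep every 7th character of $1, starting with the first; result goes to _res.
compact() {
        local compacted=""
        local length=${#1}
        local char
        for ((char = 0; char < length; char += 7)); do
                compacted="$compacted${1:$char:1}"
        done
        _res=$compacted
}

while IFS= read -r line; do
        compact "${line:7}"     # skip the 6-digit ID and the separating space
        printf "%6s %s\n" "${line:0:6}" "$_res"
done <"$1"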

3 seconds per processed line (~21 Hours total)

same in lua:

#!/usr/bin/env lua

local data = assert(io.open(arg[1], "r"))

-- Keep every 7th character of s, starting with the first.
local function compact(s)
        local compacted = ""
        for char = 1, #s, 7 do
                compacted = compacted .. s:sub(char, char)
        end
        return compacted
end

for line in data:lines() do
        -- line:sub(1, 7) is the 6-digit ID plus the separating space
        print(line:sub(1, 7) .. compact(line:sub(8)))
end

data:close()

2 minutes runtime total