rmv
October 13, 2009, 1:51am
1
Hello,
I need to extract data from specific byte positions of a file.
I tried the command below
awk '{ printf "%s", substr($0, 642363, 642369) }' filename
to extract the data between byte positions
642363 and 642369.
However, I did not get the expected result.
I am new to awk and sed; please help me get the result I want.
I am not sure whether there is a Unix command that would extract this data.
while(<>) {
$txt .= $_;
last if ( scalar split //, $txt ) >= ( 642369 );
}
$txt =~ s/(.{642363})(.{6})/\2/s;
print $txt;
This might be process-intensive. It is not tested, but it is expected to work.
rmv
October 13, 2009, 3:27am
3
Well, I am new to sed and awk.
Please let me know how to run this script.
Yes, sorry for not mentioning it. It is actually a Perl script.
Store it in a file (any name will do), for example script1.pl:
$ cat script1.pl
while(<>) {
$txt .= $_;
last if ( scalar split //, $txt ) >= ( 642369 );
}
$txt =~ s/(.{642363})(.{6})/\2/s;
print $txt;
Execute it as:
$ perl script1.pl input-file
It will extract the requested data and print it.
rmv
October 13, 2009, 5:38am
5
Could you please make the byte positions variables as well,
so that they can be passed along with the file name?
I need to do this several times over many files.
Thanks in advance.
---------- Post updated at 03:08 PM ---------- Previous update was at 02:56 PM ----------
Hello,
I ran the code below:
#!/usr/bin/perl
while(<>) {
$txt .= $_;
last if ( scalar split //, $txt ) >= ( 72551 );
}
$txt =~ s/(.{72545})(.{6})/\2/s;
print $txt;
and got the message below when I ran it like this:
perl byte_pos_script.pl file_name
Quantifier in {,} bigger than 32766 before HERE mark in regex m/(.{ << HERE 72545})(.{6})/
Please help.
aigles
October 13, 2009, 6:03am
6
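Another way, using dd (skip is the start position minus one, count is the number of bytes from 642363 to 642369 inclusive):
dd if=filename bs=1 skip=642362 count=7
An example with the following input file (which contains a single record):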
[001][002][003][004][005][006][007][008][009][010][011][012][013][014][015][016]
[017][018][019][020][021][022][023][024][025][026][027][028][029][030][031][032]
[033][034][035][036][037][038][039][040][041][042][043][044][045][046][047][048]
[049][050][051][052][053][054][055][056][057][058][059][060][061][062][063][064]
[065][066][067][068][069][070][071][072][073][074][075][076][077][078][079][080]
[081][082][083][084][085][086][087][088][089][090][091][092][093][094][095][096]
[097][098][099][100][101][102][103][104][105][106][107][108][109][110][111][112]
[113][114][115][116][117][118][119][120][121][122][123][124][125][126][127][128]
[129][130][131][132][133][134][135][136][137][138][139][140][141][142][143][144]
[145][146][147][148][149][150][151][152][153][154][155][156][157][158][159][160]
[161][162][163][164][165][166][167][168][169][170][171][172][173][174][175][176]
[177][178][179][180][181][182][183][184][185][186][187][188][189][190][191][192]
[193][194][195][196][197][198][199]
To extract bytes 501 to 510:
> dd if=inputfile bs=1 skip=500 count=10 2>/dev/null
[101][102]>
As you can see, there is no line terminator in the output, so the shell prompt follows it directly.
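Since the byte positions were requested as parameters, here is a minimal sketch of a wrapper around the same dd call. The function name extract_bytes and its argument order are only an illustration, not an existing tool; it assumes 1-based, inclusive positions, as in the example above.

```shell
# extract_bytes FILE START END   (hypothetical helper, not a standard command)
# Prints bytes START..END of FILE, counting from 1 and including both ends.
extract_bytes() {
    file=$1
    start=$2
    end=$3
    # dd skips whole blocks counted from zero, so with bs=1 we skip START-1
    # bytes and then copy END-START+1 bytes.
    dd if="$file" bs=1 skip=$((start - 1)) count=$((end - start + 1)) 2>/dev/null
}

# Example call (same range as above):
#   extract_bytes inputfile 501 510
```

With bs=1, dd reads one byte at a time, so this may be slow for very large offsets, but it is simple to loop over many files.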
Jean-Pierre.
Bad luck; that is the maximum quantifier value on your platform, as described in perlre (perldoc.perl.org).
Does this help?
$ tr '\n' ' ' < input-file | cut -c 72546-72551
tr translates each newline to a space, so the whole file becomes one line and the offsets are unchanged.
cut then selects the required characters.
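If the newlines should be kept as they are, or if cut -c might count characters rather than bytes in a multibyte locale, a byte-oriented alternative is tail piped into head. This is only a sketch: byte_range is a made-up name, and it assumes that head supports the -c option (GNU and BSD versions do, but it is not required by POSIX).

```shell
# byte_range FILE START END   (hypothetical helper name)
# Prints bytes START..END of FILE, 1-based and inclusive, without altering them.
byte_range() {
    # tail -c +N starts its output at byte N of the file;
    # head -c M then keeps only the first M bytes of that output.
    tail -c "+$2" "$1" | head -c "$(( $3 - $2 + 1 ))"
}

# Example call for the range discussed above:
#   byte_range input-file 72546 72551
```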