Date wise calculations?

POST_DATE	CHECK_NUMBER	TYPE	LOGIN_NAME
2015.09.09	XXXXXXXXXX	mark XXXXXXXXXX
2015.09.09	XXXXXXXXXX	fsadf	XXXXXXXXXX
2015.10.05	XXXXXXXXXX	defaa	XXXXXXXXXX
2015.10.05	XXXXXXXXXX	dewe	XXXXXXXXXX
2015.10.06	XXXXXXXXXX	dqwe	XXXXXXXXXX
2015.09.14	XXXXXXXXXX	dt4e	XXXXXXXXXX
2015.10.26	XXXXXXXXXX	dpwp	XXXXXXXXXX
2015.10.19	XXXXXXXXXX	dfsef	XXXXXXXXXX
2015.11.18	XXXXXXXXXX	deew	XXXXXXXXXX


[/CODE]

I want to find the dates for any 2months, for example if i ask for sept and October it should list me as below,

POST_DATE	CHECK_NUMBER	TYPE	LOGIN_NAME
2015.09.09	XXXXXXXXXX	mark XXXXXXXXXX
2015.09.09	XXXXXXXXXX	fsadf	XXXXXXXXXX
2015.10.05	XXXXXXXXXX	defaa	XXXXXXXXXX
2015.10.05	XXXXXXXXXX	dewe	XXXXXXXXXX
2015.10.06	XXXXXXXXXX	dqwe	XXXXXXXXXX
2015.09.14	XXXXXXXXXX	dt4e	XXXXXXXXXX
2015.10.26	XXXXXXXXXX	dpwp	XXXXXXXXXX
2015.10.19	XXXXXXXXXX	dfsef	XXXXXXXXXX

[/CODE]

my parameters would be explicit.

Yes... And what have you tried so far?

awk -F "|" '{print $1}' file name | head  | sort -u  | egrep "2015.09.??|2015.10.??"

[/CODE]

The above is giving me only the date part, but for the content i tried using other methodologies but no luck hence turned up to this forum :slight_smile:

One more thing, those months i'll be passing as parameter.

Why all that piping? Did you know that (e)grep by default prints the entire line if either pattern is matched?

actually few of the pipes is for head, my file is vast with million records, i just gave the dummy content.. also it has some 40 columns with many having values as date. :frowning:

Hello Nikhil,

As a simple and starting point you could try with following code and let me know if this helps you, if you have more requirements then please do show us expected output with full requirement details on same.

awk 'NR==1{print;next}{split($1, A,".");if(A[2]==09 || A[2]==10){print}}'  Input_file

Output will be as follows.

POST_DATE       CHECK_NUMBER    TYPE    LOGIN_NAME
2015.09.09      XXXXXXXXXX      mark XXXXXXXXXX
2015.09.09      XXXXXXXXXX      fsadf   XXXXXXXXXX
2015.10.05      XXXXXXXXXX      defaa   XXXXXXXXXX
2015.10.05      XXXXXXXXXX      dewe    XXXXXXXXXX
2015.10.06      XXXXXXXXXX      dqwe    XXXXXXXXXX
2015.09.14      XXXXXXXXXX      dt4e    XXXXXXXXXX
2015.10.26      XXXXXXXXXX      dpwp    XXXXXXXXXX
2015.10.19      XXXXXXXXXX      dfsef   XXXXXXXXXX
 

Thanks,
R. Singh

You might also want to try something like:

#!/bin/ksh
IAm=${0##*/}
if [ $# -ne 3 ]
then	printf 'Usage: %s year.month1 year.month2 filename\n' "$IAm" >&2
	exit 1
fi
awk -F'.' -v ym1="$1" -v ym2="$2" 'NR==1 || $1 FS $2==ym1 || $1 FS $2==ym2' "$3"

(I usually use the Korn shell, but you can use any shell that recognizes POSIX parameter expansions (e.g., ash , bash , dash , and many others in addition to ksh .)
If the sample data shown in post #1 in this thread is stored in a file named data and the above script is in an executable file named tester , then the command:

./tester 2015.09 2015.10 data

produces the output you requested and the command:

./tester 2015.09 2015.11 data

produces the output:

POST_DATE	CHECK_NUMBER	TYPE	LOGIN_NAME
2015.09.09	XXXXXXXXXX	mark XXXXXXXXXX
2015.09.09	XXXXXXXXXX	fsadf	XXXXXXXXXX
2015.09.14	XXXXXXXXXX	dt4e	XXXXXXXXXX
2015.11.18	XXXXXXXXXX	drew	XXXXXXXXXX

If you want to try this on a Solaris/SunOS system, change awk in the script to /usr/xpg4/bin/awk or nawk .

For exactly your sample and request, this might do:

egrep "2015.(09|10)|POST" file
POST_DATE    CHECK_NUMBER    TYPE    LOGIN_NAME
2015.09.09    XXXXXXXXXX    mark XXXXXXXXXX
2015.09.09    XXXXXXXXXX    fsadf    XXXXXXXXXX
2015.10.05    XXXXXXXXXX    defaa    XXXXXXXXXX
2015.10.05    XXXXXXXXXX    dewe    XXXXXXXXXX
2015.10.06    XXXXXXXXXX    dqwe    XXXXXXXXXX
2015.09.14    XXXXXXXXXX    dt4e    XXXXXXXXXX
2015.10.26    XXXXXXXXXX    dpwp    XXXXXXXXXX
2015.10.19    XXXXXXXXXX    dfsef    XXXXXXXXXX

hey don,

thanks for the solution, how does it interprets the POST_DATE column, can u plz guide me?

Hey Ravinder

Thanks for the help, but it is not working as expected.

---------- Post updated at 11:35 AM ---------- Previous update was at 11:15 AM ----------

Rudi,

It is not working, It is even filtering the 11 and 12 month rows as well.

For 09 & 10 it works as expected, but as i change the parameter to 11 & 12 it does not

egrep -w "2015.(11|12)|POST" EXTRACT.txt | awk -F "|" '{print $12}' | head
2015.09.09
2015.11.02
2015.10.05
2015.10.05
2015.10.06
2015.09.14
2015.10.26
2015.10.19
2015.11.18
2015.09.15

[/CODE]

rudi,

On further analysis i found out that it is actually grepping some 03 available in that particular line, I wanted only those available in POST_DATE Column.
the month wise calculations, i'll be doing it there.

Some creativity with the regex might yield the correct result for column 1:

egrep "^2015.(11|12)|POST"  file

my POST_DATE is not the column 1, It is column 12, Even this is not helping.

egrep "^2015.(03|04)|POST" EXTRACT.txt | head | awk -F "|" '{print $12}'
POST_DATE
POST_DATE
2015.12.02
 
2015.12.10
2015.12.28
2015.12.03
2015.12.21
2015.12.09

[/CODE]

Where did you specify that we're dealing with field 12?

sorry my bad :frowning: I forgot to.

PLEASE consider giving a clear, crisp, and detailed specification in the very first post!
Try

egrep "^([^ ]+ +){11}2015.(11|12)|POST"  file

I'm really sorry for that rudi, but still no luck

egrep "([^ ]+ +){11}2015.(03|04)|POST"  EXTRACT.txt | awk -F "|" {'print $12'}
POST_DATE
POST_DATE
2015.12.02
 
2015.12.10
2015.12.28
2015.12.03
2015.12.21
2015.12.09
2015.12.09
2015.11.25
2015.12.01
2015.12.24
 
2015.12.09

[/CODE]

Without a decent real life sample I can't help further.

Rudi,

PF the data below, I have masked lot of things, we have some more columns, dint included him here thinking it may not be required.

1008	116823	33	A	118	1431240091	15	ABCD 6510		2015.09.09	XXXXXXXXXX	d	XXXXXXXXXX	2015.12.01	1	33
1008	116823	33	A	118	-1142149389	23.05	ABCD 6510		2015.09.09	XXXXXXXXXX	d	XXXXXXXXXX	debit	XXXXXXXXXX	2015.12.01
1008	116823	33	A	118	-392122611	18.9	ABCD 6510		2015.10.05	XXXXXXXXXX	d	XXXXXXXXXX	2015.12.01	1	39
1008	116823	33	A	118	-54753935	84.34	ABCD 6510		2015.10.05	XXXXXXXXXX	d	XXXXXXXXXX	2015.12.01	1	39
1008	116823	33	A	118	1966358768	2.91	ABCD 6510		2015.10.06	XXXXXXXXXX	d	XXXXXXXXXX	2015.12.01	1	22
1008	116823	33	A	118	469200831	13.47	ABCD 6510		2015.09.14	XXXXXXXXXX	d	XXXXXXXXXX	2015.12.01	1	23
1008	116823	33	A	118	1284265141	2.76	ABCD 6510		2015.10.26	XXXXXXXXXX	d	XXXXXXXXXX	2015.12.01	1	1
1008	116823	33	A	118	-1911676106	4	ABCD 6510		2015.10.19	XXXXXXXXXX	d	XXXXXXXXXX	2015.12.01	1	1
1008	116823	33	A	118	-2064109421	40	ABCD 6510		2015.11.18	XXXXXXXXXX	d	XXXXXXXXXX	2015.12.01	1	11
1008	116823	33	A	118	2079801662	25	ABCD 6510		2015.09.15	XXXXXXXXXX	d	XXXXXXXXXX	2015.12.01	1	28
1008	116823	33	A	118	1954930603	7	ABCD 6510		2015.10.19	XXXXXXXXXX	d	XXXXXXXXXX	2015.12.01	1	1
M	Mqq	BAN	DISPLAY_NAME	IT_A_I	TR_SEC_I	ANT	DESCRIPTION	TRANSACTION_DATE	POST_DATE	CHECK_NUMBER	Methid	LNAME	ASOF	Tstatus	Tcat

[/CODE]

Expecting this will help ... Have included even the column name below

Listen, this is getting lengthy. There's no | delimiter as indicated in post#15. Using the default field separators, field 12 is

awk '{print $12}' file
d
d
d
d
d
d
d
d
d
d
d

Unless you take a decent step back, start over and present a meaningful and useful specification, I won't be in a position to help.

rudi,

Really sorry, I had just copied it from excel just noticed that some of the column i had to wrap it to mask.

Plz chk below if it helps

1008	116823	33	A	118	1431240091	282483235	1431240091	15	ABCD 6510		2015.09.09	XXXXXXXXXX	d	XXXXXXXXXX
1008	116823	33	A	118	-1142149389	282483125	-1142149389	23.05	ABCD 6510		2015.09.09	XXXXXXXXXX	d	XXXXXXXXXX
1008	116823	33	A	118	-392122611	282483183	-392122611	18.9	ABCD 6510		2015.10.05	XXXXXXXXXX	d	XXXXXXXXXX
1008	116823	33	A	118	-54753935	282483182	-54753935	84.34	ABCD 6510		2015.10.05	XXXXXXXXXX	d	XXXXXXXXXX
1008	116823	33	A	118	1966358768	282483181	1966358768	2.91	ABCD 6510		2015.10.06	XXXXXXXXXX	d	XXXXXXXXXX
1008	116823	33	A	118	469200831	282483225	469200831	13.47	ABCD 6510		2015.09.14	XXXXXXXXXX	d	XXXXXXXXXX
1008	116823	33	A	118	1284265141	282483140	1284265141	2.76	ABCD 6510		2015.10.26	XXXXXXXXXX	d	XXXXXXXXXX
1008	116823	33	A	118	-1911676106	282483161	-1911676106	4	ABCD 6510		2015.10.19	XXXXXXXXXX	d	XXXXXXXXXX
1008	116823	33	A	118	-2064109421	282483043	-2064109421	40	ABCD 6510		2015.11.18	XXXXXXXXXX	d	XXXXXXXXXX
1008	116823	33	A	118	2079801662	282483253	2079801662	25	ABCD 6510		2015.09.15	XXXXXXXXXX	d	XXXXXXXXXX
1008	116823	33	A	118	1954930603	282483160	1954930603	7	ABCD 6510		2015.10.19	XXXXXXXXXX	d	XXXXXXXXXX
M	Mqq	BAN	DISPLAY_NAME	IT_A_Id	Aitem_id	Btransid	TR_SEC_I	ANT	DESCRIPTION	TRANSACTION_DATE	POST_DATE	CHECK_NUMBER	Methid	LNAME

[/CODE]

Here is a commented version of the script I suggested:

#!/bin/ksh
# Set IAm to the final component of the pathname used to invoke this script.
IAm=${0##*/}

# If the number of operands given to this script is not 3, print a usage
# message to the standard error output and exit.
if [ $# -ne 3 ]
then	printf 'Usage: %s year.month1 year.month2 filename\n' "$IAm" >&2
	exit 1
fi
# Use awk to print any line in the file whose pathname is given by the 3rd
# operand to this script ("$3") where the field separator for lines in the file
# is set to a <period> character (-F'.') and:
# 1.  the input line number is 1 (NR==1): to print the header, or (||)
# 2.  the 1st field followed by the field separator followed by the 2nd field
#	is the string specified by the 1st operand to this script (-v ym1="$1")
#	and ($1 FS $2==ym1): to print lines matching the 1st given year
#	and month, or (||)
# 3.  the 1st field followed by the field separator followed by the 2nd field
#	is the string specified by the 2nd operand to this script (-v ym2="$2")
#	and ($1 FS $2==ym2): to print lines matching the 2nd given year
#	and month.
awk -F'.' -v ym1="$1" -v ym2="$2" 'NR==1 || $1 FS $2==ym1 || $1 FS $2==ym2' "$3"

Is this enough explanation, or is there something that still isn't clear?

If the date you want to process is in the 12th field and the field separator is a <tab> character and there is no header line (or the header line does not appear as the 1st line and doesn't need to be copied to the output) as in your latest sample input, you could try the following script instead:

#!/bin/ksh
# Set IAm to the final component of the pathname used to invoke this script.
IAm=${0##*/}

# If the number of operands given to this script is not 3, print a usage
# message and exit.
if [ $# -ne 3 ]
then	printf 'Usage: %s year.month1 year.month2 filename\n' "$IAm" >&2
	exit 1
fi

awk -F'\t' -v ym1="$1" -v ym2="$2" '
#NR == 1 {
# The following code to print the header has been removed because the sample
# data does not have a header line.
#	# Print the header line and skip further processing for this line.
#	print
#	next
#}
{	# Split the 12th tab separated field into the ymd[] array using
	# <period> as the field # separator within this field.
	split($12, ymd, /[.]/)
	#printf("$12=%s, ymd[1]=%s, ymd[2]=%s\n", $12, ymd[1], ymd[2])

	# Print this line if ym1 or ym2 is the same string as ymd[1] "." ymd[2].
	if(ym1 == ymd[1] "." ymd[2] || ym2 == ymd[1] "." ymd[2])
		print
}' "$3"

If someone wants to try either of the above scripts on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk .

As has been said before, giving us data that is not at all representative of the data you want to process, not telling us what the field separator is in the data you want to process, and continually changing your requirements is a huge waste of everyone's time.

I hope the above helps you, but if this still doesn't meet your requirements, we have little incentive to try to help if your requirements change again.

1 Like