April 7, 2011 at 12:15 pm
I have dataset that looks similar to the one below with 3 fields. There are additional fields, but I've stripped out those that don't matter.
STUDENT_ID MONTH DURATION
829384 201006 NULL
829284 201007 NULL
829284 201008 NULL
829284 201010 NULL
845290 201006 NULL
938402 201007 NULL
938402 201010 NULL
938402 201011 NULL
The dataset is relatively large - about 10 million rows. I need to write a query that will set the duration on the first date for a student to 0, then 1, then 2, etc. for each subsequent month - unless there's a gap of 3 months. At that point it starts over at 0. Also, when it gets to the next student, it starts over at 0. So the dataset above should look like this:
STUDENT_ID MONTH DURATION
829384 201006 0
829284 201007 1
829284 201008 2
829284 201010 3
845290 201006 0
897372 201007 0
897372 201009 1
897372 201010 2
938402 201007 0
938402 201010 0
938402 201011 1
Right now, this is accomplished using a cursor and takes a very long time to run. In the hope of speeding it up, I'm trying to find a way to do it without the cursor, using queries with self joins and maybe temp tables, but I'm having trouble achieving it. If anyone has any suggestions, I'd really appreciate it.
Thanks in advance,
Hari
April 7, 2011 at 12:26 pm
Try this bit of "spackle" by Jeff Moden ... I believe it might be what you need.
April 7, 2011 at 3:13 pm
Thanks for the tip. I was able to get it done.
Hari
April 7, 2011 at 4:43 pm
Thanks for your reply. Now in the spirit of being a community, to help others, would you mind posting your code, so that others may learn how to solve their own but similar problems.
Again thanks for your reply
Viewing 4 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply