I am using a job design that involves a CASS stage. The maximum throughput I get is 1,500 rows/sec. Is there a way to make it run better?
DS ---> Xfmr ---> CASS ---> DS
What I tried to improve the performance:
1. Played with bumping up the config file from 2 to 4 to 6 nodes. Going from 2 to 4 made a small difference; after that, none.
2. Partitioned the data and included multiple CASS stages, as below, with no difference:
DS ---> Xfmr -------------> CASS ----> FUNNEL ---> DS
        split the data |--> CASS ---->
        split the data |--> CASS ---->
        split the data |--> CASS ---->
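For context on the node counts mentioned above: the degree of parallelism comes from the parallel configuration file pointed to by $APT_CONFIG_FILE. A minimal two-node sketch is below; the hostname and paths are placeholders, not anyone's actual environment, and nodes 3 and 4 would repeat the same pattern with their own disk and scratch paths.

```
{
	node "node1"
	{
		fastname "etl-host"
		pools ""
		resource disk "/data/ds/d1" {pools ""}
		resource scratchdisk "/data/ds/scratch1" {pools ""}
	}
	node "node2"
	{
		fastname "etl-host"
		pools ""
		resource disk "/data/ds/d2" {pools ""}
		resource scratchdisk "/data/ds/scratch2" {pools ""}
	}
}
```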
Does the installation location of the CASS database make any difference? I see it is installed on the same mount where IIS is installed.
Can you share your experience? I really appreciate any input on this.
Thanks.
CASS stage performance
- Participant
- Posts: 152
- Joined: Tue Jan 13, 2009 8:59 am
Re: CASS stage performance
Update: I tried the below also.
1. Installed the CASS database on multiple mounts and used multiple CASS stages within the same job; still the same.
2. Tried the same in another grid environment, forcing the job to run only on the conductor node; still no use.
At this point we raised a PMR with IBM; we'll see what they have to say.
The challenge is that we don't have a general ballpark of how efficiently this stage runs. I really appreciate your experiences using the CASS stage.
CASS stage performance
Coming back here to update:
We had a call with IBM, and they mentioned that CASS scales with the number of CPUs and is very CPU intensive. So we went back and tried an 8-node config file, and there was no difference compared with 4 nodes. The reason is that with the 4-node config file (our hardware has 4 CPUs in this case), NMON shows the CPUs already clocking close to 95-97%, so CASS wasn't scaling beyond the 4-node config file.
To verify the scalability, we ran CASS in an environment with 24 CPUs, with 4-, 8-, and up to 16-node config files, and we saw CASS scale with the number of nodes until the CPUs maxed out. We saw up to 7K rows/sec with the 16-node config.
I hope this helps. BTW, IBM mentioned that in their lab testing, 200 rows/sec per CPU is the average benchmark; it may vary with each individual environment.
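The pattern described above (throughput growing with the node count until the physical CPUs saturate) can be sketched as a toy model. Everything here is illustrative: the 200 rows/sec per-CPU figure is IBM's quoted average, and capping throughput at min(nodes, CPUs) is a deliberate simplification.

```python
def estimated_throughput(nodes: int, cpus: int, per_cpu_rate: float = 200.0) -> float:
    """Toy model: CASS throughput grows with the node count in the
    config file, but only until every physical CPU is saturated."""
    return min(nodes, cpus) * per_cpu_rate

# On a 4-CPU box, an 8-node config buys nothing over a 4-node config:
assert estimated_throughput(8, cpus=4) == estimated_throughput(4, cpus=4)

# On a 24-CPU box, a 16-node config keeps scaling:
print(estimated_throughput(16, cpus=24))  # 3200.0
```

Note that the observed 7K rows/sec on 16 nodes beat this 3,200 estimate by more than 2x, which is one more reason to treat the per-CPU benchmark as only a rough guide.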
@rjdickson, what difference does it make if we hash-partition and sort on ZIP? I am curious. It hasn't made any difference, though.
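On the hash-and-sort question: hash partitioning on ZIP only guarantees that all rows sharing a ZIP value land on the same node. A rough Python sketch of the idea (the field name and CRC-based hash are made up for illustration, not how the DataStage engine hashes internally):

```python
from collections import defaultdict
import zlib

def hash_partition(rows, key, num_nodes):
    """Route each row to a node by a stable hash of its key field,
    so all rows sharing a key value land on the same node."""
    nodes = defaultdict(list)
    for row in rows:
        node = zlib.crc32(str(row[key]).encode()) % num_nodes
        nodes[node].append(row)
    return nodes

rows = [{"zip": "10001"}, {"zip": "60601"}, {"zip": "10001"}, {"zip": "94105"}]
parts = hash_partition(rows, key="zip", num_nodes=4)

# Rows with the same ZIP always share a node:
same_zip_nodes = {n for n, rs in parts.items() if any(r["zip"] == "10001" for r in rs)}
assert len(same_zip_nodes) == 1
```

Since CASS standardizes each address independently, co-locating same-ZIP rows shouldn't change the per-row work, which would be consistent with seeing no difference; key-based partitioning matters more for stages that aggregate, join, or remove duplicates by key.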
Thanks for the update! Please take the 200 rows/sec average with a grain of salt. For comparison:
On 4 CPUs (IBM Power7 hardware, likely dating from 2011 or 2012) with a 4-node config file, we get nearly 300 rows/sec on a bad day and 2,600 rows/sec on a good day (over 10x the "average"). This uses the same input file to compare against a baseline. We repeat this same test at least every time the reference files get updated, month after month, year after year.
Our performance variation is consistently inconsistent, meaning it is either fast or slow, with nothing ever in between. I have not yet spent any time diving in to examine the job or the server. My burning question is: why the extreme variation?
I would of course prefer to get the faster performance every time. I can imagine SAN disk caching making the difference, but it would take some time and effort to prove that one way or another.
Choose a job you love, and you will never have to work a day in your life. - Confucius