Hyperthreading Optimization

April 25, 2004 12:03:46 PM

My system:
ABIT IC7-G 533/800 FSB
P4 3.2G HT 512K 800FSB
2GB DDR400 PC3200
1-80GB Maxtor SATA 8MB
1-160GB Maxtor SATA 8MB
Win 2K Pro SP4

I'm trying to coax some more performance out of my system, and was wondering if anyone here had any suggestions about how to improve the HT, or actually just the performance in general of my system. I think I've done most of the obvious things to optimize HT, but I've got a job that takes over 30 hours to run on my system, and I'm trying to speed it up. Basically, the job analyzes an input file of over 70 million records and calculates statistics to find correlations between the records. It gets CPU bound a lot. Any thoughts? Do you think changing to XP would make much difference? Thanks.......
April 25, 2004 2:05:51 PM

Run two jobs. That's what Hyper-Threading is for.
April 25, 2004 3:24:04 PM

There's one application program processing one input file, and I want it to run quicker. Unless you're joking, or I'm misunderstanding how HT works, I don't see any way to run it as two jobs. It is odd, though, that when it is crunching numbers the Task Manager CPU utilization is usually right at 50%.
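
A likely explanation for the 50% reading: with HT enabled the OS sees two logical CPUs, and Task Manager reports the average across them, so a single compute-bound thread that saturates one logical CPU shows up as about half. A minimal sketch of that arithmetic, assuming the job runs on one thread (the function name is made up for illustration):

```python
def overall_utilization(busy_threads: int, logical_cpus: int) -> float:
    """Percent of total CPU capacity reported when `busy_threads`
    compute-bound threads each saturate one logical CPU.

    Assumption: the reading is a plain average over logical CPUs.
    """
    saturated = min(busy_threads, logical_cpus)
    return 100.0 * saturated / logical_cpus


# One busy thread on an HT-enabled P4 (two logical CPUs).
print(overall_utilization(1, 2))  # 50.0
# Two busy threads (e.g. two jobs at once) would fill both logical CPUs.
print(overall_utilization(2, 2))  # 100.0
```

So a steady 50% is consistent with the job being CPU bound on a single thread, not with the CPU sitting half idle.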
April 25, 2004 3:37:16 PM

Unless there's something I'm missing, you have NO support for Hyper-Threading in Windows 2000. From the Intel website:

The following desktop operating systems are not recommended for use with Hyper-Threading Technology. If you are using one of the following desktop operating systems, it is advised that you should disable Hyper-Threading Technology in the system BIOS Setup program:

Microsoft Windows 2000 (all versions)
Microsoft Windows NT* 4.0
Microsoft Windows Me
Microsoft Windows 98
Microsoft Windows 98 SE


"I am become death, the destroyer of worlds. Now, let's eat!
April 25, 2004 4:05:48 PM

Basically only Win XP offers HT support.

Win 2K will treat the CPU as two physical CPUs and that may reduce performance.

But is the analyzing program optimized for HT? If not then just disable it.
April 25, 2004 6:05:13 PM

Also, it wouldn't hurt to be sure that that program was compiled with an Intel compiler. If you have the source, you probably know that... Just a thought...

You never change the existing reality by fighting it. Instead, create a new model that makes the old one obsolete. - Buckminster Fuller
April 26, 2004 1:37:28 AM

cogito, is this vendor-supplied software or something developed in-house? If it's vendor-supplied - what company/program are we talking about?

If it's developed in-house, I could imagine spinning off multiple threads to process "sections" of the file. But then I/O might become an issue if the entire file can't be loaded into memory.
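
For in-house code, the "sections" idea can be sketched roughly as below. This is only an illustration under assumptions: the records are plain numbers, the statistic (mean and variance) can be merged from per-section partial sums, and all function names are invented here. Note that in CPython, threads will not speed up pure-Python number crunching because of the GIL; swapping in `ProcessPoolExecutor` would be the usual way to get real parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def split_sections(records, n_sections):
    """Split a list into n_sections contiguous, nearly equal slices."""
    size, extra = divmod(len(records), n_sections)
    sections, start = [], 0
    for i in range(n_sections):
        end = start + size + (1 if i < extra else 0)
        sections.append(records[start:end])
        start = end
    return sections


def partial_stats(section):
    """Per-section partial result: (count, sum, sum of squares)."""
    return (len(section), sum(section), sum(x * x for x in section))


def merged_mean_variance(records, n_workers=2):
    """Process sections on worker threads, then merge the partial sums.

    Assumes `records` is non-empty.
    """
    sections = split_sections(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(partial_stats, sections))
    n = sum(p[0] for p in parts)
    total = sum(p[1] for p in parts)
    total_sq = sum(p[2] for p in parts)
    mean = total / n
    variance = total_sq / n - mean * mean  # population variance
    return mean, variance


print(merged_mean_variance([1.0, 2.0, 3.0, 4.0]))
```

This only works when the per-section results can be merged afterwards; statistics that need all records at once would not split this cleanly.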

And can you try it on a machine with XP? Win2k doesn't properly support HT, because it didn't exist when 2k was written.
April 26, 2004 8:15:50 AM

I don't want to shatter any illusions, but 2000 does support HT. Here is a comparison test of HT running on 2000 and XP.
http://babelfish.altavista.com/babelfish/urltrurl?lp=de...

The application is a program called the apr-drg grouper, and it's from 3M. I don't have the source, so that's not an option. And splitting up the input file for parallel processing and merging is not possible.

I was thinking about upgrading to XP Pro, basically because it might help and shouldn't hurt. The other idea was getting another 160GB SATA drive, putting the two 160s into a RAID0, and leaving Windows on the 80GB HDD. But I've never done RAID before. My mobo is supposed to support it, but I don't know if it would work with 3 HDDs set up the way I described. I don't know if this is a good idea, a bad idea, or not even possible. I don't want to end up creating more problems than I solve! :) Any thoughts on this?
April 26, 2004 10:01:30 AM

If you don't have the source, I don't think there is a whole lot you can do if indeed the app is cpu bound. But by reading your description, I really have a hard time believing it could be that cpu bound, unless either those calculations are a LOT more complex than what I'd think, or more likely, the program is poorly written.

My wild guess is that you are more I/O bound than anything else. Faster disks might help, more memory might help, but it's not impossible or even unlikely that you are really hitting the 32-bit VM addressing wall, and that is what is holding you back. How much memory does Windows allocate while running that job? It might also be interesting to see how your setup performs with far fewer records: if possible, test it with 10k records, 20k, 100k, and see how it scales. I would guess it scales *terribly* with record count, meaning the CPU is not really the bottleneck.

= The views stated herein are my personal views, and not necessarily the views of my wife. =
April 26, 2004 12:53:00 PM

We're talking about a complex analysis of a lot of attributes for records of hospital stays. The program runs complex algorithms analyzing diagnoses, procedures, patient demographics etc., and quantifying patient expectations estimating likelihood of death during stay and for periods after stay, as well as estimating how long they should need to stay in hospital for each visit. It does thousands of regressions on the data with 50+ variables in each regression.

I can understand your reasons for suspecting disk I/O bottlenecks, but I don't think that's it. I can tell by watching Task Manager and the disk light, and from what I know about what the app does. There were about a trillion bytes of I/O during the run, but basically all of it went to the system cache in memory. Page faulting wasn’t happening. BTW, memory usage peaks around 500MB.
April 26, 2004 1:42:57 PM

You could possibly overclock your CPU, but I suspect the greatest speed increase would come from the original code being rewritten/tidied up and recompiled. Obviously you can't do this, though.

Have you tried contacting the software supplier and asking if there's anything they can do, or any configuration stuff you can do which would help?

You'd be amazed how inefficient some programs can be. What you describe sounds like a fair amount of data, but I don't think it should take 30+ hours on a 3.2GHz P4 system.
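
To put rough numbers on that, here is a back-of-the-envelope calculation using only the figures from the first post (about 70 million records, about 30 hours, a 3.2 GHz CPU). It treats the clock rate as a cycle budget per record, which is an upper bound on the work available, not a measurement of what the program actually executes.

```python
RECORDS = 70_000_000   # "over 70 million records" (first post)
HOURS = 30             # "over 30 hours" (first post)
CLOCK_HZ = 3.2e9       # 3.2 GHz P4


def records_per_second(records, hours):
    """Average throughput over the whole run."""
    return records / (hours * 3600)


def cycles_per_record(clock_hz, records, hours):
    """Clock cycles that elapse per record at the given throughput."""
    return clock_hz / records_per_second(records, hours)


print(round(records_per_second(RECORDS, HOURS)))           # roughly 648 records/s
print(round(cycles_per_record(CLOCK_HZ, RECORDS, HOURS)))  # roughly 4.9 million cycles
```

Roughly 650 records per second, or close to five million clock cycles per record, is the budget the job is currently using. Whether that is reasonable depends on how much crunching each record really needs.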

---
Epox 8RDA+ rev1.1 w/ Custom NB HS
XP1700+ @205x11 (~2.26Ghz), 1.575Vcore
2x256Mb Corsair PC3200LL 2-2-2-4
Sapphire 9800Pro 420/744
April 26, 2004 1:56:52 PM

Task Manager is a very poor indicator of what is bottlenecking you; even if the CPU is stalled waiting for data from the disk or memory, Task Manager may well indicate 100% CPU usage, even if it's not really doing much, if anything. With PerfMon (you have to enable the service to make good use of it), it's already a bit easier to analyse.

Also, what you describe sounds complex, but without knowing what sort of algorithms are used, it's still perfectly possible that the CPU really only has to do some basic math and build some huge arrays, and that the number of actual CPU instructions executed per record is rather low. How big is the dataset, and how much memory does the OS allocate while the app is running? Like I said, try running it with a small subset and evaluate the scaling. If the CPU is the bottleneck, you should get a similar throughput (processed records/second) even with a much smaller dataset. If that is the case, you're in trouble :)
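
The scaling test described above could be scripted along these lines. This is a sketch, not something specific to the 3M tool: `process` stands in for whatever runs the job on a list of records, and all names here are hypothetical.

```python
import time


def measure_throughput(process, records):
    """Run `process` on `records` and return records per second."""
    start = time.perf_counter()
    process(records)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    return len(records) / elapsed


def scaling_report(process, records, sizes):
    """Measure throughput on growing prefixes of the dataset."""
    report = []
    for n in sizes:
        subset = records[:n]
        report.append((len(subset), measure_throughput(process, subset)))
    return report


if __name__ == "__main__":
    # Stand-in workload: sum of squares over made-up numeric records.
    data = list(range(100_000))
    for n, rate in scaling_report(lambda recs: sum(x * x for x in recs),
                                  data, [10_000, 20_000, 100_000]):
        print(f"{n:>7} records: {rate:,.0f} records/s")
```

If records/second stays roughly flat as the subset grows, the per-record cost is constant and the CPU is the likelier limit; if it drops sharply with size, something else (memory, I/O, or a poorly scaling algorithm) is the likelier culprit.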

Anyway, to come back to your original question: there is precious little you can do. It's obvious the app isn't multithreaded, so HT is not going to do much good (besides perhaps allowing you to run other apps simultaneously without a big performance hit). If upgrading the hardware is an option (i.e. you have a budget), you would really have to determine the real bottlenecks first. I still doubt it's raw (I/O-independent) processing power. But if it is, you will not have a lot of options. If the app can't make use of HT, it won't make use of SMP either.

One thing that could help (especially if virtual memory address space is bottlenecking you) is splitting the app over two machines: one running the DB, the other doing the processing. If that is not possible, it could be worth trying it on a K8. If random memory access latency is killing you (like loading the records from memory), the K8's on-die memory controller (ODMC) could give a huge performance benefit. If it's memory<->CPU bandwidth, a K8 won't bring any benefit in a single-CPU machine, and you probably already have one of the fastest platforms for your application.

A last semi-random thought: why not ask your ISV? After all, they wrote the app; they should know what is bottlenecking it most under such workloads. While you're asking, ask them whether they support a two-tiered setup (DB and app server), SMP, or multithreading.

= The views stated herein are my personal views, and not necessarily the views of my wife. =
April 26, 2004 2:07:16 PM

>You could possibly overclock your CPU,

Overclocking? You're nuts, really, suggesting this for such an application. What's next, overclocking your bank's mainframe to process your money transfers 5% quicker? As it is, I already hope they are using a real server, and not a glorified desktop with a cheap MB and without even registered ECC memory. 30 hours' worth of processing, 70 million records: you really, REALLY don't want bit flips in there. How are you gonna catch them?

Nurse: "I'm sorry, ma'am, but you'll have to leave your room for another patient; the computer states you officially died yesterday. We have reserved a table for you in the mortuary."

or

"We're sorry, mister, we cannot admit you for the operation on your broken leg, as the computer has calculated you have a 99.9% chance of not surviving the operation."

Seriously, only THGC members would even think of suggesting overclocking for this sort of app. Anyone else would consider running the batch twice to validate the correctness of the results :)
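
For what it's worth, "run it twice and compare" is easy to automate. A minimal sketch, assuming each run writes its results to a file and that the output is deterministic (no timestamps or run IDs embedded); the function names and file paths are placeholders:

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so huge outputs fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def runs_match(first_output, second_output):
    """True if two result files are byte-for-byte identical."""
    return file_digest(first_output) == file_digest(second_output)


# Hypothetical usage after two full runs:
# print(runs_match("results_run1.dat", "results_run2.dat"))
```

A mismatch would not say which run is wrong, only that at least one of them is.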

= The views stated herein are my personal views, and not necessarily the views of my wife. =
April 26, 2004 2:35:59 PM

Sounds more to me like it's simply compiling statistics - nothing that will actually affect a patient directly.

Of course I wouldn't recommend overclocking where there was any danger for anyone in it :)

I wasn't 100% serious anyway - it's just that he's already got the fastest P4 CPU (within reason - I wouldn't go recommending an EE to anyone), on an excellent, fast motherboard with plenty of RAM, and he said it's processor intensive, so what more could be done?

70 million records isn't really *that* much. 30 hours seems a very long time, although obviously we don't know exactly how much crunching each record requires. That's why I said contact the guys who wrote it and complain :)

---
Epox 8RDA+ rev1.1 w/ Custom NB HS
XP1700+ @205x11 (~2.26Ghz), 1.575Vcore
2x256Mb Corsair PC3200LL 2-2-2-4
Sapphire 9800Pro 420/744
April 26, 2004 3:01:14 PM

>70 million records isn't really that much. 30 hours seems a
>very long time, although obviously we don't know exactly
>how much crunching each record requires. That's why I said
>contact the guys who wrote it and complain

I completely agree here. That is also why I said I doubt it's the CPU itself that is holding him back. I'm fairly certain a slower (or faster) CPU (equal bandwidth, identical hard disk and memory setup) would not meaningfully impact processing time.

I suspect it's a lousy coding job, maybe something never created to handle such large datasets, so it may for instance waste tons of VM space, which would not hurt with a couple of hundred thousand records but is killing his performance now. Intelligent coding can speed up such apps by an order of magnitude or more (literally 10 to 100x); compared to that, upgrading your hardware seems like such a waste.

= The views stated herein are my personal views, and not necessarily the views of my wife. =
April 26, 2004 3:17:21 PM

Quote:
I suspect it's a lousy coding job, maybe something never created to handle such large datasets

My thoughts exactly. I've had to deal with precisely this kind of problem myself a number of times. People write stuff, test it with 500 records, and it seems fine, so they don't bother trying to make it as fast as possible. It's been the bane of my life for years :( Although it is extremely satisfying to watch *your* code do the task in 1/10th the time the old stuff did :)
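
A made-up illustration of the kind of code that looks fine at 500 records and falls over at millions. It has nothing to do with what the 3M program actually does; it just counts pairs of records that share a key, first with nested loops (work grows with the square of the record count), then by grouping (work grows linearly). Both function names are invented for this example.

```python
from collections import Counter


def matching_pairs_quadratic(keys):
    """Compare every record with every later record: O(n^2)."""
    count = 0
    for i in range(len(keys)):
        for j in range(i + 1, len(keys)):
            if keys[i] == keys[j]:
                count += 1
    return count


def matching_pairs_linear(keys):
    """Group by key first; a group of n records contains n*(n-1)/2 pairs: O(n)."""
    groups = Counter(keys)
    return sum(n * (n - 1) // 2 for n in groups.values())


sample = ["a", "b", "a", "c", "a", "b"]
print(matching_pairs_quadratic(sample))  # 4
print(matching_pairs_linear(sample))     # 4
```

At 500 records the nested loops do about 125 thousand comparisons and nobody notices; at 70 million records the same loops would need on the order of 10^15, while the grouped version still makes a single pass.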

---
Epox 8RDA+ rev1.1 w/ Custom NB HS
XP1700+ @205x11 (~2.26Ghz), 1.575Vcore
2x256Mb Corsair PC3200LL 2-2-2-4
Sapphire 9800Pro 420/744