Archived from groups: alt.comp.periphs.videocards.ati,alt.comp.periphs.videocards.nvidia,alt.lang.asm,comp.soft-sys.matlab,hr.comp.hardver
"Herman Dullink" <hd7@hetnet.nl> wrote in message
news:d2s7bm$og1$1@reader13.wxs.nl...
>>> That's a lot of data.
>> It is exactly 1GB = (2^(9+9+11+1)) bytes. It is very affordable today
> Yes, for system memory. But I haven't seen many consumer-class graphics
> adapters yet with at least this much memory. A graphics adapter also
> needs some on-screen memory for the GUI, and some off-screen buffers for
> the result view(s).
>
>>> Are you sure you want to do this on a video card?
>> No, I'm not sure, and this is the reason I'm looking for the
>> opinion of people who have expertise in programming video cards to
>> do a custom job.
> I have some expertise, but not in 3D (yet).
> So I can't really help you with the implementation details of modern 3D
> GPUs, but I know a bit about busses, the data channels in a system. The
> main problem with most architectures is that they perform best when you
> 'push' the data through a channel (e.g. from CPU to graphics adapter).
> Pulling data is very bad for performance: the CPU has to wait many cycles
> for one fetch to complete. A cache helps (and only helps) if certain data
> is fetched multiple times, and as long as no more data is used than the
> cache size (i.e. it's very effective with 'looping' algorithms).
> DMA techniques are used for better performance; a device is then
> programmed to push the data through a channel without further CPU
> intervention.
> If you use a large sequential stream of data, prefetching can be used.
> You probably know about that; MMX/SSE/3DNow! have some prefetch
> instructions.
>
> Maybe somebody else can give you some info about 3D specifics. You might
> even contact the manufacturers of GPUs. Theoretically, the newest
> generation of 3D GPUs should be able to address more than a GB of data.
> These GPUs are programmable, so it should be possible to implement the
> whole algorithm in GPU code.
> All that's needed is (somebody with) the right (programming)
> information...
> Because of competition between these manufacturers, it'll be very hard
> to get detailed info. Maybe someone working there can see the challenge.
>
> Another approach is to look at the implementation of your algorithm.
> Rewrite (parts of) it so that memory cycles and caches are used
> optimally.
> You might e.g. split up the volume into smaller subvolumes, and/or use
> tile-based rendering, i.e. split the screen up into smaller rectangular
> (or square) parts, so that the chances of data still being in the cache
> are higher.
There is one big problem with most GPUs, and that's CRC, or rather, the lack
of it. AGP doesn't have hardware CRC, so in order to keep the data safe the
drivers have huge tables (that's a big part of the 20+ MB you have to
download every time a new driver is out) that check the response to every
command given to the GPU. That's why it's very hard to make programs that
would use a GPU's huge power. I tried adding two numbers using the ATi SDK
and I can tell you it's hard work. However, PCI Express (I'm 99% positive of
this) has hardware CRC, so it may be easier to write a program for a PCI
Express GPU.
The problem Herman mentioned (getting the info from the GPU's memory) is not
as big with PCI Express, since upload and download run at pretty much the
same speed. Then there are TurboCache models that use system memory and only
have 16 MB of memory onboard. Those models are not as powerful as the top
models, but they show that there might be a way to use system memory for GPU
data.
Hope this information helps you with your project.
Greetz