It all depends on the context. In the context of capacity, the first number is the size of each chip (typically measured in megabits or megabytes) and the second number is the number of chips in the array. In the context of transmission, the first number is the width of the channel in bits; x86 processors have a native memory channel width of 64 bits. The second number is the number of channels, though what that count is relative to can vary: it could be the number of channels per controller, the total number of channels per processor, etc.
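Whichever interpretation applies, the three labels describe the same total width or capacity; only the grouping differs. A quick illustrative check (plain arithmetic, no real hardware assumed):

```python
# Each pair is (size per unit, number of units); the product is the total.
configs = [(64, 4), (128, 2), (256, 1)]
totals = [size * count for size, count in configs]
print(totals)  # [256, 256, 256] -- identical totals, different grouping
```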
All in all, 64x4 vs 128x2 vs 256x1 is entirely meaningless without context. In the context of transmission, 64x4 is preferable because data can be interleaved so that individual bytes arrive sooner, while the total transmission capacity remains unchanged.

Let's say we have a controller which transfers 1 bit per clock cycle at a clock rate of 1 MHz. We take 8 of these controllers and put them in an array so that we have a channel width of 8 bits, operating at 1 million transfers per second on each channel bit. The total capacity is now 8 megabits per second, or 1 megabyte per second. Now we can examine two different methods of transferring data. One option is to send each byte serially on its own channel bit: we read the first bit of byte one from channel bit one on clock cycle one, the first bit of byte two from channel bit two on clock cycle one, and so on, until we get bit one of byte eight from channel bit eight on clock cycle one. We then repeat the process until we have all eight bits of all eight bytes. In total this takes eight clock cycles, and at an operating frequency of 1 MHz that means we had to wait eight microseconds before the transfer completed, at which point all eight bytes arrived simultaneously.
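The serial scheme above can be sketched as a toy model in Python (illustrative only, not real hardware; the constants match the 1 MHz, 8-lane example):

```python
CLOCK_HZ = 1_000_000            # 1 MHz -> 1 microsecond per clock cycle
LANES = 8                       # channel width in bits
data = [0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88]  # eight example bytes

# Serial scheme: byte i travels on lane i, one bit per cycle (LSB first).
received = [0] * LANES
for cycle in range(8):                    # 8 cycles to move 8 bits per lane
    for lane, byte in enumerate(data):
        bit = (byte >> cycle) & 1         # lane carries bit `cycle` of its byte
        received[lane] |= bit << cycle

us_per_cycle = 1_000_000 // CLOCK_HZ
wait_us = 8 * us_per_cycle                # no byte is complete until cycle 8
print(received == data)   # True: all eight bytes arrive, but only together
print(wait_us)            # 8 microseconds before the first usable byte
```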
This is functionally similar to how a lineup works. If a family of 8 all wait in the same lineup, they have to wait for the first person to be served, and then everyone who has already been served is stuck waiting until the last person is finished as well.
Another way of transmitting data involves interleaving. Interleaving is the process of transposing our data so that, instead of sending each byte serially down a single link as we did before, we spread each byte across all the individual links. So rather than getting bit one of byte one from link one and bit one of byte two from link two, we get bit one of byte one from link one and bit two of byte one from link two, and so on. This means we get our first byte after only one microsecond, and another byte each microsecond after that. Rather than having our device sit idle for 8 microseconds in order to get a whole batch of data at once, it waits only one microsecond and can start working immediately.
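The interleaved version of the same toy model looks like this (again illustrative, using the same 8 example bytes as before): each byte is spread across all 8 lanes, so one complete byte arrives per cycle.

```python
LANES = 8
data = [0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88]

arrivals = []                     # (cycle, byte) pairs as bytes complete
for cycle, byte in enumerate(data, start=1):
    bits = [(byte >> lane) & 1 for lane in range(LANES)]  # one bit per lane
    reassembled = sum(bit << lane for lane, bit in enumerate(bits))
    arrivals.append((cycle, reassembled))

# The first byte is complete after cycle 1 (1 microsecond at 1 MHz),
# instead of waiting 8 cycles for the whole batch.
print(arrivals[0])
```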
This is the method most commonly used now, and when you see PCI-E x1/x4/x8/x16, that number refers to the interleaved channel width. Each PCI-E lane is a serial connection, but data can be interleaved across multiple lanes to form what is functionally a parallel connection. The same thing happens with memory channels. Each memory channel is 64 bits wide and reads from eight 8-bit devices, so each channel returns 8 bytes at a time. When a computer has multiple memory channels, the memory controller will do its best to distribute memory across all the available channels. So rather than waiting 2 cycles to obtain 128 bits of data from a single channel, it can obtain 128 bits from two 64-bit channels (64x2) in a single cycle.
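That last comparison is simple ceiling arithmetic, sketched here with a hypothetical helper (`cycles_needed` is not a real API, just the division spelled out):

```python
CHANNEL_BITS = 64   # width of one memory channel

def cycles_needed(request_bits: int, channels: int) -> int:
    """Cycles to satisfy a request when `channels` channels run in parallel."""
    per_cycle = CHANNEL_BITS * channels
    return -(-request_bits // per_cycle)   # ceiling division

print(cycles_needed(128, 1))   # 2 cycles on a single 64-bit channel
print(cycles_needed(128, 2))   # 1 cycle across two interleaved channels
```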
Hope this helped a bit.