The question of cluster size on solid-state disks, memory cards, pendrives, etc. is not as simple as just using the smallest size allowed.
A lot of SSDs, pendrives and memory cards have a fixed physical block size for writes, so the best choice is to figure out what that value is (as far as I know, there is no way to query it).
The effect is the following:
-If the operating system writes a 512-byte block and the pendrive really writes 4096-byte blocks, then the 8 blocks of 512 bytes each written by the operating system will cause the same 4096 bytes to be overwritten 8 times, because the pendrive hardware rewrites the full 4096 bytes on each write operation.
That is because solid-state memories do their writes in blocks, but that block size is not the same as the one the operating system uses when formatting a partition, etc.
The best option is to make the OS use the same block size for formatting as the real physical block size the hardware will use.
That is not always possible... I found some pendrives where each write is 1MB (yes, one megabyte)... if you write just 512 bytes, the drive reads 1 megabyte into a small internal RAM, overwrites those 512 bytes there, then writes the whole megabyte back to the solid-state memory... so every time the operating system writes a 512-byte block, the pendrive overwrites a whole megabyte. In that case the best thing for performance would be to format with a 1-megabyte cluster, but as far as I know no actual FAT32 / NTFS can do that... you would need the UDF-HDD type, and that is not standard for Windows nor for Linux... I am still searching for Linux drivers that can mount UDF-HDD partitions (I have one set-top-box media player that records live TDT onto such a partition type).
So the best thing to do is apply a trick like this (it takes a lot of time, but only once):
-Format it as NTFS (no compression) with a 64KB cluster size and run speed tests
-Format it as NTFS (no compression) with a 32KB cluster size and run speed tests
-Format it as NTFS (no compression) with a 16KB cluster size and run speed tests
-etc
The speed tests for each cluster size must be the following (run all of them on each cluster size):
-Write in blocks of 1KB, 2KB, 4KB, ... 64KB, 128KB, ... 1MB
Now you will have a table with two axes: one is the cluster size you formatted with, the other is the block size used for the write operations... each cell holds the measured speed.
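A raw write-speed test like the one above can be sketched in Python (a hedged sketch only: `/mnt/pendrive` is a hypothetical mount point, adjust it for your system; `os.fsync` is used so the OS write cache does not mask the device's real speed):

```python
import os
import time

def write_speed(path, block_size, total_bytes):
    """Write total_bytes to path in chunks of block_size; return MB/s."""
    buf = os.urandom(block_size)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(total_bytes // block_size):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())   # force the data onto the device, past the OS cache
    elapsed = time.perf_counter() - start
    os.remove(path)
    return total_bytes / (1024 * 1024) / elapsed

# Example run against a freshly formatted drive (hypothetical mount point):
if __name__ == "__main__" and os.path.ismount("/mnt/pendrive"):
    for kb in (1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024):
        mbps = write_speed("/mnt/pendrive/test.bin", kb * 1024, 64 * 2**20)
        print(f"{kb:>5} KB writes: {mbps:6.2f} MB/s")
```

Rerun the loop after each reformat and you get one row of the table per cluster size.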
Look for the best speed... and you are lucky if you find it where the written block size equals the cluster size (I have not yet found any pendrive that nice).
Normally you will see that a 16KB cluster with 1MB write blocks gives the best speed for pendrives >4GB and <16GB (I do not have any of 32GB or bigger to test), but for drives smaller than 4GB (from what I have found myself) the best performance is normally with a cluster size of 2KB or 8KB.
Surprise: the 4KB cluster size that Windows proposes by default normally gives the worst speed results.
Please keep this in mind:
-Solid-state memory devices (pendrives, SSDs, etc.) do not write only the bytes the operating system asks for... they always write in blocks, and the size of those blocks is hard-coded by the manufacturer; it has nothing to do with the cluster size.
So the best trick is to make the cluster size equal to that hard-coded block size, and if that is not possible, make it as large as possible, but always so that the hard-coded block size is a multiple of the cluster size.
I hope that is clear.
Normally a cluster size of 512 bytes is best for non-solid-state (magnetic) hard disks; for optical media 2048 is most commonly used as the best... but for solid-state media (pendrives, memory cards, etc.) there is no direct answer...
Some pendrives have a hard-coded value of 16384, on others I found 8192... as I said, the value cannot be read, but hitting the right value can more than double the speed...
I mean:
-Pendrive A, cluster size 4096 (hard-coded 8192): each 4KB write causes 8KB to be overwritten, so two contiguous clusters (4KB each) overwrite the same 8KB twice
-Pendrive A, cluster size 8192 (hard-coded 8192): each 8KB write causes 8KB to be overwritten, so one cluster (8KB) overwrites the same 8KB only once
-If you use a 4KB cluster on it, the speed will be (more or less) half of what you get with 8KB clusters
Beware that in this example, if you use a 16KB cluster, each written cluster will cause two blocks to be overwritten... so for small files it will be slower than using an 8KB cluster.
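The arithmetic in those examples can be sketched as a small helper (an illustration only: `bytes_overwritten` is a hypothetical name, and it assumes every cluster write forces the whole containing physical block to be rewritten):

```python
def bytes_overwritten(cluster_size, physical_block, clusters_written):
    """Bytes the device physically rewrites when the OS writes
    clusters_written contiguous clusters, assuming every cluster write
    forces the full containing physical block(s) to be rewritten."""
    if cluster_size >= physical_block:
        # the cluster spans one or more whole physical blocks
        return clusters_written * cluster_size
    # each cluster write rewrites the whole physical block around it
    return clusters_written * physical_block
```

For pendrive A (hard-coded 8192): two 4096-byte clusters cost 16384 rewritten bytes for 8192 bytes of payload, while one 8192-byte cluster costs exactly 8192 — the factor-of-two speed difference described above.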
Think of it like this and you may hit the best performance on the first attempt:
-Format it with a 32KB cluster and test the speed by copying one big file
-Format it with a 32KB cluster and test the speed by copying a lot of small files
You will see that those two tests give very different speeds... that is not related to the cluster size nor to the hard-coded block size... when writing a lot of files, the table with the list of files (FAT, MFT, etc.) is written once per file, so the speed is much, much lower... no write cache is being used.
That is precisely why I said to test by writing blocks, not files... to ensure the file table (directory listing, etc.) plays no part in the process.
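The big-file vs. many-small-files comparison can be sketched like this (a rough sketch: `/mnt/pendrive` is a hypothetical mount point, and the real numbers depend heavily on the device):

```python
import os
import time

def timed_write(paths, size_each):
    """Write size_each bytes to every path in paths; return elapsed seconds.
    Each separate file costs one update of the filesystem's allocation
    tables (FAT, MFT, ...) on top of the data itself."""
    buf = os.urandom(size_each)
    start = time.perf_counter()
    for p in paths:
        with open(p, "wb") as f:
            f.write(buf)
            f.flush()
            os.fsync(f.fileno())
    return time.perf_counter() - start

# Same 32 MB payload as one big file vs. 1024 small files
# (hypothetical mount point -- adjust for your drive):
if __name__ == "__main__" and os.path.ismount("/mnt/pendrive"):
    total = 32 * 2**20
    t_big = timed_write(["/mnt/pendrive/big.bin"], total)
    t_small = timed_write([f"/mnt/pendrive/f{i:04}.bin" for i in range(1024)],
                          total // 1024)
    print(f"one big file   : {total / t_big / 2**20:6.2f} MB/s")
    print(f"1024 small ones: {total / t_small / 2**20:6.2f} MB/s")
```

The gap between the two numbers is the per-file metadata cost, which is why the cluster-size tests should use raw block writes instead of file copies.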
Once you have found the best cluster size for that pendrive, write it down on the pendrive itself (with care not to damage it)... so you have it the next time!!!
Doing this test I made a pendrive that came NTFS-formatted with default values much faster... with the normal 4KB cluster it wrote a big file at 1.5MB/s... after reformatting with a 32KB cluster it reached 5.8MB/s... with a 16KB cluster 5.9MB/s... and with an 8KB cluster size 3MB/s. That gave me the clue: the hard-coded block value is 16K (8K was half as fast and 32K was about the same speed)...
The test can normally be done with a divide-and-conquer algorithm...
Start by testing the lowest cluster size allowed, then the highest, then the value in the middle... and keep narrowing down wherever the speeds differ.
The idea is to make the cluster as small as possible but never smaller than the hard-coded value; if that is impossible because the hard-coded value is bigger than the largest allowed cluster size, just try values and find the best one.
To reduce the tests to a minimum: first try 64KB, then 2KB... if they give the same speed, test 32KB... if 32 and 64 are about the same, just use 32KB... if 2KB gives a very different speed than 64KB, test other values until you find the smallest one that still gives the highest speed.
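The narrowing procedure above could be sketched like this (a sketch only: `measure` is a hypothetical callback that reformats the drive with a given cluster size and returns the measured MB/s, and the code assumes speed rises with cluster size and then plateaus, as described):

```python
def find_best_cluster(measure, sizes):
    """Narrow down the best cluster size in few reformat+test rounds.
    measure(size) reformats the drive with that cluster size and returns
    the measured MB/s (expensive, so it is called as rarely as possible).
    sizes is the ascending list of cluster sizes the filesystem allows."""
    cache = {}
    def speed(s):
        if s not in cache:
            cache[s] = measure(s)
        return cache[s]

    lo, hi = 0, len(sizes) - 1
    # if both extremes perform alike, the whole range is one plateau:
    # prefer the smallest cluster (less slack space per file)
    if speed(sizes[lo]) >= 0.95 * speed(sizes[hi]):
        return sizes[lo]
    # bisect toward the start of the fast plateau
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if speed(sizes[mid]) >= 0.95 * speed(sizes[hi]):
            hi = mid   # mid is already on the plateau
        else:
            lo = mid   # the speed jump is above mid
    return sizes[hi]
```

With eight candidate sizes this needs about five reformat+test rounds instead of eight, which matters because every round means a full reformat.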
Yes, it really is that complicated to find the top speed of a pendrive.
If the write implementation inside the hardware were linear (the same bytes written as were asked for), then the smallest cluster size would give the best performance... but since the internal hardware performs all write operations at the same hard-coded size, it is very difficult to get the most performance out of it.
With one microSDHC card I found something very interesting... with a 512-byte cluster (NTFS) it wrote at 0.3MB/s; with a 32KB or 64KB cluster it wrote at 20MB/s (64 times faster)... that is a huge difference!!!
If you want the top performance, the tests have to be done.
The best answer: test every possible cluster size, run the performance tests, then select the best one!!
Ah! If you want the fastest memory cards, etc., look at the speed class (the class number is the guaranteed minimum sustained write speed in MB/s)... for example, a Class 4 will write at no less than 4MB/s, Class 6 -> 6MB/s, etc. So far I have seen Class 8 in shops and Class 12 on the Internet (I have never had a Class 12 or 16 in my hands, but some of my microSDHC cards are Class 8, and with the cluster-fitting test I can write to some of them at 20MB/s)... so do the tests!!!
I hope this helps others.