Most of the time, the values for ‘Size’ and ‘Size on disk’ will be very close to matching when checking a folder or file’s size, but what if there is a huge discrepancy between the two? Today’s SuperUser Q&A post looks at the answer to this confusing problem.
Today’s Question & Answer session comes to us courtesy of SuperUser—a subdivision of Stack Exchange, a community-driven grouping of Q&A web sites.
The Question
SuperUser reader thelastblack wants to know why there is such a huge difference between ‘Size’ and ‘Size on disk’ for a folder on his phone’s SD card:
Looking at the screenshot, there is definitely a huge discrepancy between ‘Size’ and ‘Size on disk’, so what has happened here to cause this?
I know that ‘Size on disk’ should be a little more than ‘Size’ because of allocation units in Windows, but why is there that much difference? Could it be because of the large number of files?
BTW, this folder is on my Android phone’s SD card. Inside this, my maps app stores its cached maps, and the app gets its maps from Google Maps.
The Answer
SuperUser contributor Bob has the answer for us:
If you have a lot of small files, this is certainly possible. Consider this:
50,000 files
32 KB cluster size (allocation units), which is the max for FAT32
OK, now the minimum space taken is 50,000 * 32,000 bytes = 1.6 GB (using SI prefixes, not binary, to simplify the maths). The space each file takes on the disk is always a multiple of the allocation unit size – and here we’re assuming each file is actually small enough to fit within a single unit, with some (wasted) space left over.
If each file averaged 2 KB, you’d get about 100 MB total – but you’re also wasting 15x that (30 KB per file) on average due to the allocation unit size.
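The arithmetic above can be checked with a few lines of Python. This is only a sketch of the example's own figures: the constant names are made up for illustration, and SI units (1 KB = 1,000 bytes) are used to match the simplification in the answer.

```python
# Figures from the example above, in SI units (1 KB = 1000 bytes).
FILE_COUNT = 50_000
CLUSTER_BYTES = 32_000     # one allocation unit
AVG_FILE_BYTES = 2_000     # assumed average file size

total_size = FILE_COUNT * AVG_FILE_BYTES       # what 'Size' would report
total_on_disk = FILE_COUNT * CLUSTER_BYTES     # one allocation unit per file
total_wasted = total_on_disk - total_size

print(f"Size:         {total_size / 1e6:.0f} MB")
print(f"Size on disk: {total_on_disk / 1e9:.1f} GB")
print(f"Wasted:       {total_wasted / 1e9:.1f} GB "
      f"({total_wasted // total_size}x the actual data)")
```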
In-Depth Explanation
Why does this happen? Well, the FAT32 file system needs to keep track of where each file is stored. If it were to keep a list of every single byte, the table (like an address book) would grow at the same speed as the data – and waste a lot of space. So instead, the file system uses “allocation units”, whose size is also known as the “cluster size”. The volume is divided into these allocation units, and as far as the file system is concerned, they cannot be subdivided – those are the smallest blocks it can address. It is much like having a house number: your postman doesn’t care how many bedrooms you have or who lives in them.
So what happens if you have a very small file? Well, the file system doesn’t care if the file is 0 KB, 2 KB, or even 15 KB, it’ll give it the least space it can – in the example above, that’s 32 KB. Your file is only using a small amount of this space, and the rest is basically wasted, but still belongs to the file – much like a bedroom you leave unoccupied.
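The rounding rule described here can be sketched as a small function. The name `allocated_bytes` is hypothetical, the sizes are binary (1 KiB = 1,024 bytes), and the sketch only models non-empty files.

```python
def allocated_bytes(file_bytes: int, cluster_bytes: int = 32 * 1024) -> int:
    """Round a non-empty file's size up to whole allocation units."""
    clusters = -(-file_bytes // cluster_bytes)   # ceiling division
    return clusters * cluster_bytes

# A 2 KiB and a 15 KiB file both occupy one full 32 KiB unit;
# a 33 KiB file spills over into a second unit.
for kib in (2, 15, 33):
    used = allocated_bytes(kib * 1024)
    print(f"{kib:>2} KiB file -> {used // 1024} KiB on disk")
```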
Why are there different allocation unit sizes? Well, it becomes a trade-off between having a bigger table (address book, e.g. saying John owns a house at 123 Fake Street, 124 Fake Street, 666 Satan Lane, etc.), or more wasted space in each unit (house). If you have larger files, it makes more sense to use larger allocation units – because a file doesn’t get a new unit (house) until all others are filled up. If you have lots of small files, well, you’re going to have a big table (address book) anyway, so may as well give them small units (houses).
Large allocation units, as a general rule, will waste a lot of space if you have lots of small files. There usually isn’t a good reason to go above 4 KB for general use.
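To make the trade-off concrete, here is a rough sketch that compares two allocation unit sizes on two made-up workloads. The function `layout_cost` and both file lists are hypothetical, and the number of units tracked is used only as a stand-in for how much bookkeeping the table has to do.

```python
def layout_cost(file_sizes, cluster_bytes):
    """Return (units_tracked, wasted_bytes) for the given file sizes."""
    units = sum(-(-size // cluster_bytes) for size in file_sizes)
    wasted = units * cluster_bytes - sum(file_sizes)
    return units, wasted

# Hypothetical workloads, sizes in bytes.
small_files = [2 * 1024] * 50_000          # 50,000 files of 2 KiB each
large_files = [700 * 1024 * 1024] * 4      # four files of 700 MiB each

for label, files in (("small", small_files), ("large", large_files)):
    for kib in (4, 32):
        units, wasted = layout_cost(files, kib * 1024)
        print(f"{label} files, {kib:>2} KiB units: "
              f"{units:>7,} units tracked, {wasted / 2**20:7.1f} MiB wasted")
```

For the small files, both unit sizes track the same 50,000 units, but the 32 KiB units waste 15 times as much space as the 4 KiB units. For the large files, neither size wastes anything in this example, and the 32 KiB units need an eighth as many entries.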
Fragmentation?
As for fragmentation, it shouldn’t waste space in this manner. Large files may be fragmented, i.e. split across non-contiguous allocation units, but each unit should be filled before the next one is started. Defragging might save a little space in the allocation tables, but this isn’t your specific issue.
Possible Solutions
As gladiator2345 suggested, your only real options at this point are to live with it or reformat with smaller allocation units.
Your card might be formatted in FAT16, which has a smaller limit on table size and therefore requires much larger allocation units in order to address a larger volume (with an upper limit of 2 GB with 32 KB allocation units). Source courtesy of Braiam. If that is the case, you should be able to safely format as FAT32 anyway.
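The 2 GB figure follows from how few allocation units FAT16 can address. The sketch below assumes a round 2^16 addressable units; the real limit is slightly lower because some values are reserved, so treat the results as approximate. The constant and function names are made up for illustration.

```python
# Approximation: a 16-bit table entry can number about 2**16 units.
FAT16_MAX_UNITS = 2 ** 16

def max_volume_bytes(cluster_bytes: int, max_units: int = FAT16_MAX_UNITS) -> int:
    """Largest volume addressable with the given allocation unit size."""
    return cluster_bytes * max_units

for kib in (4, 16, 32):
    gib = max_volume_bytes(kib * 1024) / 2**30
    print(f"{kib:>2} KiB units -> about {gib:g} GiB maximum volume")
```

Under this assumption, only 32 KiB units reach the 2 GB mark, which is why a larger FAT16 volume ends up with such large allocation units.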