Thursday, May 4, 2023

File Systems and Archives: An Anecdotal Usability Note

In my day job as a database administrator, I work with objects like tables, which are stored in extents (sets of contiguous blocks), typically grouped with those of related objects in tablespace datafiles. When an object needs to grow to accommodate more records, it first looks for available space in its already allocated extents; failing that, it looks for a large enough free space in the existing preallocated datafiles. Traditionally, we had to resize or add datafiles to accommodate database growth. Oracle made things more user-friendly for DBAs by allowing existing datafiles to autoextend on their storage devices, subject to available capacity/quota constraints and certain design constraints, or by adding new datafiles across storage devices. By design constraints, I'm referring to classic limits like a maximum of 4 million or so blocks (the lowest unit of storage) per datafile, with a default 8K block size. This led to a de facto maximum datafile size of 32 GB, even if there was additional storage capacity on the relevant devices. So, typically, to avoid a showstopper failure where an object could not grow, I might define new autoextending datafiles within or across accessible storage devices. I still needed to monitor storage utilization and free-space fragmentation in tablespace datafiles, but there was less "busy work" of manually maintaining datafiles than in earlier times, not to mention reorganizing datafiles across devices.
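The 32 GB figure is just the block-count limit multiplied by the block size. Here is a minimal sketch of that arithmetic, assuming "4 million or so" is rounded to 2**22 blocks for illustration (the constant and function names are my own, not Oracle's):

```python
# Assumption: "4 million or so" blocks per datafile is rounded to 2**22 here.
ASSUMED_MAX_BLOCKS = 2 ** 22
KIB = 1024
GIB = 1024 ** 3


def max_datafile_bytes(block_size_bytes, max_blocks=ASSUMED_MAX_BLOCKS):
    """Largest datafile size when the block count is the limiting factor."""
    return block_size_bytes * max_blocks


if __name__ == "__main__":
    # Illustrative block sizes; 8K is the default mentioned above.
    for block_kib in (2, 4, 8, 16, 32):
        size = max_datafile_bytes(block_kib * KIB)
        print(f"{block_kib:>2}K blocks -> about {size / GIB:.0f} GiB per datafile")
```

With the default 8K block size this works out to 32 GiB, no matter how much space the underlying device still has.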

There are multiple approaches to securing sensitive files; for example, you can password-protect Word documents and PDF files. There are system volume tools like BitLocker. And then there are virtual encrypted volume tools like VeraCrypt and SafeHouse Explorer. Now there are a couple of analogous concepts at play: the volume, roughly comparable to a tablespace datafile, and file system support for very large files, comparable to the 32 GB datafile limit. With a volume, you create a logical disk of a specified size, e.g., 10 GB, which you can mount on a drive letter by supplying a predefined password. With very large files, you may need a file system like NTFS if you're trying to store very big (>4 GB) files on the volume. Some media, e.g., flash drives, may by default use a legacy file system that does NOT support very large files. Usually, I try to keep redundant backups and/or volumes. I noticed that VeraCrypt explicitly asked during volume creation whether I needed very large file support. SafeHouse Explorer didn't.
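A quick pre-flight check can catch this before a copy fails. The sketch below assumes the legacy file system is FAT32 (which is what the conversion output later in this post reports), whose per-file limit is 4 GiB minus one byte; the function names are hypothetical:

```python
from pathlib import Path

# FAT32 stores file sizes in 32 bits, so one file tops out at 4 GiB - 1 byte.
FAT32_MAX_FILE_BYTES = 2 ** 32 - 1


def fits_on_fat32(size_bytes):
    """True if a single file of this size can be stored on FAT32."""
    return size_bytes <= FAT32_MAX_FILE_BYTES


def oversized_files(folder):
    """List (path, size) for every file under folder that FAT32 would reject."""
    results = []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            if not fits_on_fat32(size):
                results.append((path, size))
    return results
```

Running `oversized_files` on the folder you are about to copy would flag files like a 6 GB MBOX, regardless of how much free space the target volume has.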

So, here's the setup for my usability incident. I use a very nice freeware product, MailStore Home, as my email archive. My principal email client is Thunderbird, which uses the MBOX format for email folders. There are Thunderbird add-ons which enable importing and exporting MBOX files. MailStore stores its archive in a proprietary format, but it can pick up and archive newer emails. Usually, I'll do a daily archival into MailStore and manually consolidate the emails into categories monthly. MailStore also allows me to export/restore email folders back to email clients (including Outlook and its similar PST file construct).

So, I recently exported some of my consolidated folders for deduplication and trimming, saving the results in MBOX format. It turned out, though I didn't notice at the time, that 2 of those files were over 6 GB.

I created a 40 GB VeraCrypt volume, mounted it, and copied my relevant MBOX and PST files to it without incident. I then created a slightly larger SafeHouse volume, but Windows complained that my finance file was "too big" for the virtual SafeHouse volume. I was puzzled, since I had more than enough free space on the volume. It turned out the file was "too big" for the file system, not the volume: when I checked the file systems under the two volumes, the SafeHouse volume was indeed using a legacy Microsoft file system (FAT32) without very large file support.

There are some easy workarounds. For example, I could use the Thunderbird export/import add-on to dump a folder's emails as .eml files, run a simple bash script that assigns even- and odd-numbered emails to separate directories, then import the two directories and export each as a separate, smaller MBOX file.
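For illustration, here is a minimal sketch of that even/odd split, written in Python rather than bash. It assumes the add-on has dumped one .eml file per message into a single flat directory; the function name and directory arguments are hypothetical, and files are copied rather than moved so the original dump stays intact:

```python
import shutil
from pathlib import Path


def split_even_odd(source_dir, even_dir, odd_dir):
    """Copy alternating .eml files from source_dir into two directories.

    Files are taken in sorted name order; positions 0, 2, 4, ... go to
    even_dir and positions 1, 3, 5, ... go to odd_dir.
    Returns (even_count, odd_count).
    """
    targets = (Path(even_dir), Path(odd_dir))
    for target in targets:
        target.mkdir(parents=True, exist_ok=True)

    eml_files = sorted(
        p for p in Path(source_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".eml"
    )

    counts = [0, 0]
    for index, path in enumerate(eml_files):
        bucket = index % 2  # 0 = even position, 1 = odd position
        shutil.copy2(path, targets[bucket] / path.name)
        counts[bucket] += 1
    return tuple(counts)
```

Alternating messages gives two halves with similar message counts, though not necessarily similar byte sizes if a few emails carry large attachments.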

But is there a way to nondestructively convert the volume to a different, more accommodating file system? Yup.

C:\Windows\system32>convert J: /fs:ntfs
The type of the file system is FAT32.
Enter current volume label for drive J: MBOX23A
Volume MBOX23A created 5/3/2023 6:23 PM
Volume Serial Number is 2CCC-EAD4
Windows is verifying files and folders...
File and folder verification is complete.
Windows has scanned the file system and found no problems.
No further action is required.
   52,415,984 KB total disk space.
   29,836,240 KB in 66 files.
   22,579,728 KB are available.
       16,384 bytes in each allocation unit.
    3,275,999 total allocation units on disk.
    1,411,233 allocation units available on disk.
Determining disk space required for file system conversion...
Total disk space:              52428800 KB
Free space on volume:          22579728 KB
Space required for conversion:   131576 KB
Converting file system
Conversion complete

I was then able to copy over my two 6 GB MBOX files without issue.