By now, you've probably all heard about my universal file format (UFF) project, which encodes binary data into ASCII symbols. The project is maturing from simply encoding messages to encoding entire files (multiple files, in fact) into a single encoded archive. The clearest advantage of this format is that the data can be transferred to paper. However, I'm starting to get a new idea for it. Here on Tom's Hardware, we can't post pics within the forum (we can only paste links). On MSN, AIM and ICQ, it's sometimes impossible to send a file to, or receive one from, someone behind a router. Also, some people reject e-mail with attachments. What if you could simply post these symbols, or send them in an IM or e-mail, and then copy and paste them into my archive utility, which would reconstitute the data and let you manipulate and extract it as you would a zip archive? Best of all, my format is far more reliable than Zip, and just as secure if not more so.
<b>Note:</b> Unlike Zip, UFF's main objective is versatility not maximum compression.
Intelligence is not merely the wealth of knowledge but the sum of perception, wisdom, and knowledge.
Also, unlike attachments, this format is compatible with pretty much any PC online. The only disadvantage is that you'd need a reader for every OS, and I don't have much experience programming for Linux or Mac. But I can always tell others the structure of the file format and have them write the multiplatform readers. I think this might make a good open source project....hmmm....
Isn't that how newsgroup binaries work?
The yEnc reader I got says to cut and paste the message into it, and out pops a pic.
Yeah, I suppose so. It's a similar idea, but the file is structured as an archive, like RAR and ZIP files. However, the actual file is neither RAR nor ZIP; it's a proprietary format that I made that includes ECC, encryption and compression.
cool...can't wait...not all of us have our own web page or ftp sites.
Yup, that was my thinking exactly. It should allow for the transfer of files over the forum in a much easier fashion. You will need the .NET Framework from Windows Update if you don't have it yet though.
Well, I've just finished support for augmenting the file, that is, adding more files to the archive without touching the data currently in it. This may sound like an easy task, but it was less than completely straightforward. A good archive utility should be able to augment an archive with very little disk access. Believe it or not, when an archive is augmented the file is not erased and rewritten. It is simply extended with the extra data, and the information for the new files is then added at the end of the file (a file trailer, if you will). So some file information may be stored in the file header, while information for augmented files is stored at the end of the file. This also lets the program know when you don't have an "original" archive, that is, when the archive has been modified.
I still have to finish file removal from the archives. Again, it'll probably involve very little disk activity: simply resize the archive and overwrite the file's info in the header or trailer with all 0s (actually deleting anything from the file header would cause a lot of disk activity).
Well, work on WinUFF (that's what I decided to call it) is well under way. All the basic functionality is complete. You can create archives, modify archives by adding and deleting files within them, load archives to view the files within them, and extract files from within the archives. The program associates itself with .UFF files.
The current limitations of the file format are as follows:
Maximum file header + trailer size = 1MB
Maximum file size (that can be added to the archive) = 2GB
Maximum archive size = 2^23 GB
Maximum number of files within archive = over 2 billion (2^31 to be exact) files
As you can see, the biggest limitation here is the size of the file header + trailer, so that limit will be reached long before the others. It exists because I'm using two unsigned 16-bit integers, one for the length of the header and one for the length of the trailer. Because each byte in the file represents only a single bit, a 16-bit int allows for a header of 2^16 * 8 bytes (512KB). The same applies to the trailer, which gives the 1MB total. However, I can easily raise that limit to 2^32 * 8 bytes each.
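To make the header/trailer arithmetic concrete, here is a minimal Python sketch. It assumes each length field counts data bytes and that every data byte occupies 8 bytes on disk (one byte per bit), as described above. The names are made up for illustration and are not part of WinUFF, and the sketch follows the post's rounding of 2^16 rather than the strict 16-bit maximum of 65535.

```python
# Hypothetical names; only the arithmetic comes from the post.
DISK_BYTES_PER_DATA_BYTE = 8  # one on-disk byte carries a single bit


def max_region_bytes(length_field_bits: int) -> int:
    """Largest on-disk size of a header or trailer for a given length field."""
    return (2 ** length_field_bits) * DISK_BYTES_PER_DATA_BYTE


header_16 = max_region_bytes(16)       # 524288 bytes = 512KB
total_16 = 2 * header_16               # header + trailer = 1MB
header_32 = max_region_bytes(32)       # 2^32 * 8 bytes per region

print(header_16, total_16, header_32)
```

With 32-bit length fields the header and trailer limits would sit far above the 2GB per-file limit, so they would stop being the first limit reached.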
The following features I'm in the process of finishing:
Redo/Undo action
Test/Repair Archive
Adding Comments
Renaming files within archive
Adding complete folders to archive
Password protection
Spanning and reconstitution
Finally, the user interface is nearly identical to Winzip.
EDIT: I forgot to mention that WinUFF automatically creates a backup of the archive if you delete files from the archive.
Let us know if you need any beta testers or mirrors. I will gladly beta test for you.
As for mirroring, well my upload bandwidth is only 128K (600K down), so it's hardly fit for the job. But I'm sure we could find some spare bandwidth somewhere.
When you release the final version of this program, do you propose to make it freeware, shareware, adware or trialware?
Try to get rid of the need to install .NET if possible. I don't like the thought of downloading an approx. 20MB Windows Update if I can help it.
If you are honest because honesty is the best policy, then surely your honesty is corrupt
(Edited by basmic on 07/20/03 00:57 AM.)
Err, I can't. I've written the entire thing on the Visual Studio .NET platform. I think installing the .NET Framework will soon be a must once all the developers move over to .NET. I recommend downloading it.
OK, there are now two forms of UFF: Hexadecimal UFF and Binary UFF. Hexadecimal UFFs are smaller (exactly 1/4 the size of Binary UFF), but binary is better for print. Performance-wise, both are about the same.
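The 1/4 ratio follows from the encodings themselves: a binary text form needs 8 symbols per byte, while a hexadecimal form needs 2. Here is a rough Python sketch of that idea. It is not the actual UFF format: the symbols '0' and '1' are an assumption standing in for whatever two characters Binary UFF really prints, and the ECC, encryption and compression layers are left out entirely.

```python
def encode_binary(data: bytes) -> str:
    """Write each byte as 8 two-state symbols (here '0' and '1')."""
    return "".join(format(b, "08b") for b in data)


def decode_binary(text: str) -> bytes:
    """Rebuild bytes from consecutive groups of 8 symbols."""
    return bytes(int(text[i:i + 8], 2) for i in range(0, len(text), 8))


def encode_hex(data: bytes) -> str:
    """Write each byte as 2 hexadecimal digits."""
    return data.hex().upper()


def decode_hex(text: str) -> bytes:
    return bytes.fromhex(text)


sample = b"UFF"
as_binary = encode_binary(sample)   # 24 symbols for 3 bytes
as_hex = encode_hex(sample)         # 6 symbols for 3 bytes
print(len(as_hex) / len(as_binary))
```

The two-symbol alphabet is what makes the binary form easier to read back from paper, since a recognizer only has to tell two shapes apart instead of sixteen.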
Speaking of which, performance optimization is a huge issue here. I have to find the right balance between performance and memory usage, which is very difficult. I can get it to encode a 4MB file in 3 or 4 sec and extract that same file in 40 sec. In my opinion that's not acceptable, so I'll continue optimizing until I can cut those times in half.
Well, I've just finished writing a semi-successful OCR algorithm for the Binary UFF file format, so you can save UFF archives to paper, scan them in again and convert them back to working files. It still needs lots of refining though. My goal is to get 100% accurate OCR from an inkjet printout. Considering it only has to recognize one of two characters, it's not as impossible as it may seem, but it still needs a bit of work. For those AI fanatics out there: I'm not using an artificial neural net, just plain simple pattern recognition.
EDIT: I don't think it's possible to get an accurate reading with under 300dpi
Well, I can scan in UFF encoded archives and OCR them but dust is a nightmare. My stupid OCR algorithm misdetects a small bit of dust on the glass bed of the scanner as a symbol. It's ANNOYING! I have no idea how to write a subroutine to remove dust pixels. The best I can do is only accept the very dark black colours. That should hopefully remove 99% of dust particles in scans but I can't be sure.
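The "only accept the very dark black colours" idea amounts to a fixed brightness cutoff. A minimal Python sketch, assuming the scan is available as rows of 8-bit grayscale values (0 = black, 255 = white); the function name and the cutoff of 64 are hypothetical and would need tuning against real scans:

```python
def keep_dark_pixels(gray_rows, cutoff=64):
    """Return a bitmap with 1 where a pixel is at least as dark as cutoff."""
    return [[1 if value <= cutoff else 0 for value in row] for row in gray_rows]


# A faint gray dust speck (120) is dropped, solid ink (10 and 0) is kept.
scan = [
    [10, 200],
    [120, 0],
]
print(keep_dark_pixels(scan))
```

A cutoff like this only helps with dust that scans lighter than the ink. Dark specks pass straight through, which is why it cannot be the whole answer.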
Well, I'm 99% done with the OCR subroutines. It was a challenge, but it can now automatically remove suspected dust particles in the scan and fill in small gaps within a single character, if there are any. It now also has an EXTREMELY accurate OCR engine no matter how small or how big the characters are, provided there isn't too much distortion or dust. I'm confident it can handle well over 900 bytes per square inch at 600dpi. At 1200dpi, it can handle 4X that amount. Given reasonable page borders, that's roughly up to 400KB of data on a single 8.5" * 11" page! You can store a small report or a relatively long essay on a single sheet of paper!
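For a rough feel of the page capacity, here is a small Python sketch. The 0.5 inch margin on every side is an assumption, as is the function name; the densities are the ones quoted above. The 4X factor at 1200dpi comes from doubling the resolution in both directions.

```python
def page_capacity_bytes(bytes_per_sq_inch, width_in=8.5, height_in=11.0,
                        margin_in=0.5):
    """Bytes that fit on one page after removing a margin on every side."""
    usable_area = (width_in - 2 * margin_in) * (height_in - 2 * margin_in)
    return bytes_per_sq_inch * usable_area


density_600dpi = 900                    # quoted lower bound
density_1200dpi = density_600dpi * 4    # twice the dots in each direction

print(page_capacity_bytes(density_600dpi))
print(page_capacity_bytes(density_1200dpi))
```

At exactly 900 bytes per square inch this comes to 270,000 bytes per page at 1200dpi. Reaching 400KB on the same usable area would take roughly 1,300 to 1,400 bytes per square inch at 600dpi, which fits the "well over 900" wording.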
Well, I've just had an inspiration in developing a new dust removal filter that should improve accuracy SIGNIFICANTLY for higher density information storage on paper.
This was my idea:
#1: The assumption is that when you print a paper archive produced by WinUFF, the horizontal spacing between characters and the vertical spacing between lines MUST exist.
#2: Under that assumption, the filter takes the scanned image and divides it into a grid based on the currently apparent spacing.
#3: For each square region in the grid, it performs a horizontal and a vertical pass.
#4: The vertical pass counts the number of pixels in each line of the region and finds the average number of pixels per line, minus one (as a safety precaution). This average becomes a threshold, which is used in the next steps.
#5: Chances are the number of pixels per line is not constant, but it shouldn't be off by more than a couple of pixels. However, chances are also that the tops and bottoms of the characters aren't perfectly aligned, or that dust particles link two lines together. In those lines, the pixel count will be well under the threshold.
#6: Lines that fall under the threshold are cleared (converted to white pixels).
#7: The same idea applies to the horizontal pass. It builds a threshold average and removes the anomalous vertical lines between characters.
Basically, this filter will fix problems like \ / looking like a V.
This new filter would probably take a few seconds to complete for a 600dpi scanned page, but it will render two older filters I had written obsolete, so that will save some time.
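Steps #3 to #7 can be sketched in Python for a single grid region. This is one reading of the description, not the WinUFF code: it assumes the region is a bitmap of 1s (dark) and 0s (white), that the vertical pass works row by row and the horizontal pass column by column, and that the average is taken over every line in the region. The grid splitting of step #2 is left out, and all names are hypothetical.

```python
def clear_sparse_lines(grid):
    """Blank every row whose dark-pixel count is below (average - 1)."""
    if not grid:
        return []
    counts = [sum(row) for row in grid]
    threshold = sum(counts) / len(counts) - 1   # step #4: average minus one
    return [list(row) if count >= threshold else [0] * len(row)
            for row, count in zip(grid, counts)]


def dust_filter(region):
    """Run the row pass, then the same pass on columns via a transpose."""
    rows_done = clear_sparse_lines(region)
    columns = [list(col) for col in zip(*rows_done)]
    cols_done = clear_sparse_lines(columns)
    return [list(row) for row in zip(*cols_done)]


# A 4x4 block of ink with one dust pixel in the top-left corner.
region = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
cleaned = dust_filter(region)
for row in cleaned:
    print(row)
```

In this example the top row holds a single dark pixel against an average of about 2.8 per row, so it falls under the threshold and is cleared, while the four solid rows survive. Counting blank lines in the average lowers the threshold, so a real implementation might prefer to average only over lines that contain ink.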