If anyone is actually intending to install dehexer, here is the man page:

.\" Dehexer
.TH dehexer 1 "20 April 2008" "1.0" "dehexer"
.B dehexer 
is a simple program that converts a file of ASCII hex characters into
the equivalent binary file. Any character outside the ranges 0-9,
A-F, and a-f is simply ignored.
dehexer < 
dehexer is used to convert a human-readable description of
the bytes into an actual binary file. Thus "0a" is
transformed into the byte 00001010 on disk (of course, what is
stored is the bits themselves, not the characters 0 and 1).
None, just do normal I/O redirection.
.I /usr/share/man/man1/dehexer.1.gz
.BR hexer(1)
No known bugs at this time. 
Chris Wilson (christopher.j.wilson@gmail.com)
This program is dedicated to the public domain.
2008 - Written as a complement to Hal Canary's hexer program.

Gzip the above text and store it in

(at least on my system).



I was looking through Hal's blog when I came across his hexer program. I thought that I should write the complementary program: given an ASCII file of hex characters (e.g. "01CAFEBABE"), it writes the actual bytes to stdout.

chris@papaya:c$ cc helloworld.c
chris@papaya:c$ ./a.out 
Hello World!
chris@papaya:c$ cat a.out | ./hexer | ./dehexer > helloworld
chris@papaya:c$ chmod a+x helloworld
chris@papaya:c$ ./helloworld 
Hello World!

It seems to work (I changed the name to 'helloworld' because doing

cat a.out | ./hexer | ./dehexer > a.out
seems to clobber the file in a bad way: the shell truncates a.out for output before cat gets a chance to read it).

Without further ado:
/* dehexer - Convert an ASCII file of hex characters into the
   corresponding binary file. Any non-hex characters are silently
   skipped (newlines, tabs, etc.)
   Copyright 2008 Christopher Wilson, based in part on hexer by Hal
   Canary (also DTPD)
   Dedicated to the Public Domain */
/* cc -o dehexer dehexer.c */
#include <stdio.h>

int handle_byte(int x, int *char_out, int low);

int main (int argc, char *argv[])
{
  char x;
  int char_out = 0;
  int low = 0; // 1 when the next digit holds the LSBs (low nibble)
  (void)argc; // no command-line options; use I/O redirection
  (void)argv;
  while (fread(&x, sizeof(x), 1, stdin) == 1) {
    if (x > 47 && x < 58) {
      // digit is 0-9
      x = x - 48;
      handle_byte(x, &char_out, low);
    } else if (x > 64 && x < 71) {
      // digit is A-F
      x = x - 55;
      handle_byte(x, &char_out, low);
    } else if (x > 96 && x < 103) {
      // digit is a-f
      x = x - 87;
      handle_byte(x, &char_out, low);
    } else {
      // skip anything that isn't 0-9, A-F, or a-f
      continue;
    }
    low = (low + 1) % 2; // flip the high/low marker after each hex digit
  }
  return 0;
}

/* handle_byte - if low is true, add x to char_out and print the full
     byte. If low is false, set char_out to 16 times x. */
int handle_byte(int x, int *char_out, int low)
{
  if (low) {
    *char_out += x;
    putc(*char_out, stdout);
  } else {
    *char_out = x * 16;
  }
  return 0;
}
/* EOF */
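
For comparison, here is a minimal Python sketch of the same idea. It is not part of dehexer; the function name dehex is made up for illustration, and I am assuming that a trailing unpaired hex digit should be dropped, since the C program only writes a byte once it has seen both halves.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")

def dehex(text):
    """Return the bytes described by the hex digits in text.

    Non-hex characters are skipped. A trailing unpaired digit is
    dropped (an assumption that mirrors the C program's intent).
    """
    digits = "".join(c for c in text if c in HEX_DIGITS)
    if len(digits) % 2:
        digits = digits[:-1]
    # bytes.fromhex turns each pair of hex digits into one byte
    return bytes.fromhex(digits)
```

To use it as a filter you would write dehex(sys.stdin.read()) to sys.stdout.buffer, so that the bytes are not re-encoded as text on the way out.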



So I've had this idea bouncing around in my head and I'm not sure why. Well, that's not true; if I were that unsure I don't think that I'd post this. Maybe it comes from a project called StupidFilter, which is pretty much what its name would suggest. Its aim is to create a software filter that would remove stupid blog comments (or any other content that you'd like to filter).

Note that by my use of 'stupid' above, I'm not really talking about stupidity per se (because that's very likely a hard problem). No, what I'm talking about may be more akin to a symptom of stupidity. Or maybe just gross violations of proper English, which don't really speak to a person's intelligence...

I had been thinking about what would be the simplest, most brain-dead method for identifying stupidity in text. The first thing that sprang to mind was entropy. Entropy is a measure of how much uncertainty there is in something (at least as it applies to information theory). A coin toss has 1 bit of entropy; it is calculated like so:

H = -(p_1 * log(p_1) + p_2 * log(p_2) + ... + p_n * log(p_n)), with the logs taken base 2

So we get -1/2 * log(1/2) + -1/2 * log(1/2) = 1.

I wrote the following code: here (which you can actually run, thanks codepad!). You'll notice that the text from a YouTube comment has a higher entropy than some text that I typed in. I'm actually just going by the letters; no punctuation is included. My hypothesis here is that badly mangled text will have a higher entropy than normal English. Perfectly random text (i.e. text with all 26 letters equally likely) has an entropy of about 4.7 bits per letter (log base 2 of 26). This makes sense, since if you wanted to encode all the letters of the alphabet in fixed-width binary, you'd need at least 5 bits (2^5 = 32, the first power of two greater than 26). I think if I include punctuation and all that I may get a better "reading" because an exclamation point, being rare in normal text at least, would have a longer Huffman code (I haven't really thought about this, could be wrong) and thus lend more to the entropy.

My next idea was taken from my cryptography class. English has a certain frequency distribution for the letters (and numbers, punctuation, etc.) that we can exploit. A string of 25 'z's in a row doesn't "look" like any standard sentence. We expect, roughly, that as the length of an English text increases, the frequencies of the letters should approach 13% 'e', 9% 't', 8% 'a', and so on (List here). I wrote a very simple program here. I took the sum of the squared differences from the "normal" distribution as a "distance" measurement. We would expect a very long English text to get very close to zero (i.e. the frequency distributions will tend to match).

Both of these are really simple and would need a lot of work (multiplying together, weighting?) to be in any way practical. But both constitute a very simple test of English-ness: basically, does the target text resemble English (at least statistically)?
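
Since the codepad links may not be around forever, here is a rough Python sketch of both measurements. It is not the code I linked to: the function names are made up for illustration, and the frequency table is a set of commonly cited approximate values rather than the list linked above.

```python
import math
from collections import Counter

# Approximate English letter frequencies in percent. These are commonly
# cited values and an assumption here, not the list linked in the post.
ENGLISH_PERCENT = {
    'e': 12.70, 't': 9.06, 'a': 8.17, 'o': 7.51, 'i': 6.97, 'n': 6.75,
    's': 6.33, 'h': 6.09, 'r': 5.99, 'd': 4.25, 'l': 4.03, 'c': 2.78,
    'u': 2.76, 'm': 2.41, 'w': 2.36, 'f': 2.23, 'g': 2.02, 'y': 1.97,
    'p': 1.93, 'b': 1.49, 'v': 0.98, 'k': 0.77, 'j': 0.15, 'x': 0.15,
    'q': 0.10, 'z': 0.07,
}

def letter_frequencies(text):
    """Relative frequency of each letter a-z; everything else is ignored."""
    letters = [c for c in text.lower() if 'a' <= c <= 'z']
    if not letters:
        return {}
    counts = Counter(letters)
    return {c: n / len(letters) for c, n in counts.items()}

def letter_entropy(text):
    """Entropy in bits per letter of the text's own letter distribution."""
    freqs = letter_frequencies(text)
    return -sum(p * math.log2(p) for p in freqs.values())

def english_distance(text):
    """Sum of squared differences from the English letter frequencies."""
    freqs = letter_frequencies(text)
    return sum((freqs.get(c, 0.0) - pct / 100.0) ** 2
               for c, pct in ENGLISH_PERCENT.items())
```

For example, letter_entropy("abab") is 1.0 (two equally likely letters, just like the coin toss), and english_distance("z" * 25) comes out far larger than the distance for an ordinary English sentence.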


Catching up!

Here are some posts that I've done recently. I'm moving them here because of the simplicity of using Blogger.

Fri, 04 Apr 2008

Stories of the Earth's Demise...

It may be the case that the Earth isn't doomed if the LHC produces a tiny black hole as the product of a high-energy collision. Read about the lawsuit that sparked the rebuttal.

Posted 2008-Apr-04 18:27

Wed, 02 Apr 2008

Learning a new language can be hard

I decided that I needed to try and pick up a new "language." Language deserves the scare quotes here because I'm referring to Vim. I know that emacs will probably remain my editor of choice, but I didn't want the whole vi side of the earth to remain an editor of last resort. I should be able to get around in vimopolis even if I can't converse fluently with the locals.

That said, I do find some of its features pretty appealing. It just seems to get out of your way in a fashion that emacs doesn't. And I like the idea that if you know a movement command, say 'w' for moving over a word, and 'c' for changing something, then you can put them together (as 'cw') to have vim delete the word that you're sitting on and drop the cursor right in place to type a new one.

So it is nice to see how the other half lives.

Posted 2008-Apr-02 13:42

Fri, 28 Mar 2008

I've been googled!

I just saw that my house is now (sort of) visible with Google street view. You can check it out here.

The driveway in the foreground leads up to my place. This will be nice for hosting parties, or anytime someone needs to know what the area around my house looks like. I have one of those places that is tucked away off the main road and so is usually hard to find.

Posted 2008-Mar-28 15:41

Tue, 25 Mar 2008

Entropy function

I find myself re-typing this into lisp all the time, so here it is, chiseled into digital stone:

(defun entropy (probs)
  (* -1 (apply #'+ (mapcar #'(lambda (p) (* p (log p 2))) probs))))

CL-USER> (entropy '(0.5 0.5))
1.0

Just don't expect it to make sure the probabilities sum to 1!
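
If you do want that check, here is a small Python sketch of the same function with the validation added. The tolerance parameter tol is my own assumption about how close to 1 is close enough; the Lisp version above does none of this.

```python
import math

def entropy(probs, tol=1e-9):
    """Shannon entropy in bits of a discrete probability distribution."""
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if abs(sum(probs) - 1.0) > tol:
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 are skipped: p * log(p) tends to 0 as p goes to 0,
    # whereas log(0) itself would be an error.
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

As with the Lisp version, entropy([0.5, 0.5]) gives 1.0, while something like entropy([0.5, 0.4]) now raises an error instead of quietly returning a meaningless number.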

Posted 2008-Mar-25 23:16

Mon, 24 Mar 2008

Affine cipher

I wrote a little program here to do simple affine encryption. I saw something very much like it elsewhere (can't think of where right now). Be careful, though: it will accept any a and b for the function y = ax + b (mod 26), so you can very easily pick an a with no multiplicative inverse (any a that shares a factor with 26), in which case the message can't be decrypted. Caveat emptor.

If you're interested, and really how could you not be, here is the source code:

#!/usr/bin/env python
from sys import argv, exit

letters = "abcdefghijklmnopqrstuvwxyz"

def usage():
    print("""affine
 Do affine encryption (y = ax + b (mod 26)). Case insensitive.""")

def to_num(c):
    """Map a letter to 0-25, or return -1 for anything else.

    >>> to_num('a')
    0
    >>> to_num('z')
    25
    """
    c = c.lower()
    if c in letters:
        return letters.find(c)
    else:
        return -1

def to_letter(num):
    """Map 0-25 to a letter, or return None if out of range.

    >>> to_letter(0)
    'a'
    >>> to_letter(25)
    'z'
    >>> to_letter(34)
    """
    if num >= 0 and num <= 25:
        return str(letters[num])
    else:
        return None

def e(a,b,msg):
    """Encrypt msg with y = ax + b (mod 26); non-letters pass through.

    >>> e(1,1,"cat")
    'dbu'
    """
    out = ""
    for c in msg:
        x = to_num(c)
        if x == -1:
            out = out + c
        else:
            out = out + to_letter( (x * a + b) % 26 )
    return out

def main():
    if len(argv) != 4:
        usage()
        exit(1)
    a = int(argv[1])
    b = int(argv[2])
    message = str(argv[3])
    print(e(a,b,message))

def _test():
    # run the doctests above
    import doctest
    doctest.testmod()

if __name__ == "__main__":
    main()
Posted 2008-Mar-24 19:28

Thu, 20 Mar 2008

Emacs autosave

Emacs, the best text editor in the world, has this distressing habit of making backup files all over the place. If I'm writing something about the Square Root of Christmas, say sqrtxmas.txt, then I'll get a little file like sqrtxmas.txt~ in the same directory. It is nice if I lose the file for some reason, but otherwise it can be something of a nuisance.

I found this website with the remedy. Props to you dude for making the best text editor in the world universe even betterer. You can visit the website for more details, but just to reproduce this little gem in one more place, here it is:

;; Put autosave files (i.e. #foo#) in one place, *not*
;; scattered all over the file system!
(defvar autosave-dir
  (concat "/tmp/emacs_autosaves/" (user-login-name) "/"))

(make-directory autosave-dir t)

(defun auto-save-file-name-p (filename)
  (string-match "^#.*#$" (file-name-nondirectory filename)))

(defun make-auto-save-file-name ()
  (concat autosave-dir
          (if buffer-file-name
              (concat "#" (file-name-nondirectory buffer-file-name) "#")
            (concat "#%" (buffer-name) "#"))))

;; Put backup files (i.e. foo~) in one place too. (The backup-directory-alist
;; list contains regexp=>directory mappings; filenames matching a regexp are
;; backed up in the corresponding directory. Emacs will mkdir it if necessary.)
(defvar backup-dir (concat "/tmp/emacs_backups/" (user-login-name) "/"))
(setq backup-directory-alist (list (cons "." backup-dir)))

Posted 2008-Mar-20 10:56

Voting machines

I have a bit of a problem with voting machines. Not the paper system, because that seems to be a problem that has already been solved. What concerns me is the seeming lack of transparency of electronic voting machines. There is a recent story about New Jersey voting officials being told that they may not seek independent security audits of their voting machines. Over on his blog, Freedom to Tinker, Ed Felten has posted the e-mail that the voting company sent him. He has previously demonstrated that some voting machines can be hacked.

Posted 2008-Mar-20 10:41

Wed, 19 Mar 2008


Yet another "chris blog"? What's up?

I guess I've decided that I write enough stuff on my Facebook, Myspace and even Orkut (remember that?) to justify keeping a semi-regular blog. Besides, my cs account allows CGI scripts and easy shell access, which lets me write in my favorite CMS, Blosxom.

I'll see how long I keep at it, but if my past record is anything to go by, I've done okay.

Posted 2008-Mar-19 19:47


About Me

A sciency type, but trying to branch out into other areas. After several years out in the science jungle, I'm headed back to school to see what I can make of the other side of the brain.