Chris Blog

2009-03-03

March 5 is the Square Root of Christmas

I just wanted to make the announcement, if anyone happened to be unaware. The reason for the holiday, if you haven't guessed already, is because sqrt(1225) = 35. Translating those numbers into calendar dates gives us the square root of Christmas 12/25 falling on 3/5. It just so happens that there is another big nerd holiday in March, Pi Day (3/14). Why not connect these two dates with a week(ish) long nerd celebration? I hereby declare that:

March 5th shall be known as the Square Root of Christmas and
The 10-day interval between the Square Root of Christmas and Pi Day shall be known as Nerdigras

Update (2010-11-14): ~~The official page for the~~ Square Root of Christmas ~~and~~ Nerdigras ~~has moved to~~ http://sencjw.com/the-square-root-of-christmas/" (These links have been changed to point to the official location given below -- chris)

Update (2012-02-05): The Square Root of Christmas finally has a domain name to call its own! Use either: http://squarerootofchristmas.com/ or (for shorter links) http://sqrtxmas.com/ if you mention it on twitter, use #sqrtxmas

Tweet #sqrtxmas

2009-01-26

Lisp is full of teh WIN

I know that the title is lame, verging on cringe-inducing, but I had to say it; I'm not sorry. There has been a few threads of discussion over on Hacker News about what's wrong with Lisp. The argument seems to hinge on the lack of a literal syntax for "hashes" in lisp the way that there is in python or, in a more appropriate comparison, clojure. Clojure is great, by the way, and it is certainly deserving of its own post sometime soon. Suffice it to say that the next time I have to write something in Java I think I'm going to write it in clojure instead. Clojure has great interoperability with Java. Amazingly, there are fewer parenthesis in the Lisp version of that little GUI program than in the Java version! But on to the topic of the day. In many languages you can do something like this:

>>> d = {"a":1, "b":2, "c":3}
>>> d
{'a': 1, 'c': 3, 'b': 2}
>>> d['b']
2

You can see that we can explicitly give a representation of that hash that python knows how to parse into the corresponding data structure. Clojure has a very similar literal syntax:

user=> (get {:a 1 :b 2 :c 3} :b)
2

So the complaint is that Common Lisp doesn't have anything like that. Using hashes in CL consists of a bunch of function calls that manage the hash:

CL-USER> (let ((hash (make-hash-table)))                                       
          (setf (gethash :a hash) 1)                                          
          (setf (gethash :b hash) 2)                                          
          (setf (gethash :c hash) 3)                                          
          (gethash :b hash))
2
T

There is never any literal representation of the hash, in fact, if we evaluate the name of the hash we'll just get something like this "#<HASH-TABLE :TEST EQL :COUNT 3 {AD11A99}>" Which basically tells the lisp reader to signal an error if the form is read. But this doesn't have to be so verbose. We can write a really simple reader macro (with a little helper function) that gives us something like what we have in python or clojure above:

(defun ins-hash (h vals)
  (if (oddp (length vals))
      nil
      (if (not (null vals))
          (progn
            (setf (gethash (first vals) h) (second vals))
            (ins-hash h (cddr vals)))
          h)))

(set-macro-character #\} (get-macro-character #\)))

(set-dispatch-macro-character #\# #\{
  #'(lambda (stream char1 char2)
      (let ((items (read-delimited-list #\} stream t)))
        (ins-hash (make-hash-table) items))))

With this reader macro I can now write this:

CL-USER> (gethash :b #{:a 1 :b 2 :c 3})
2
T

Lisp now interprets everything between '#{' and '}' as key, value pairs and using the helper function inserts them into a new hash. Cool!

2008-11-07

haskell and scrambled text

I've recently been playing around with Haskell, and one of the toy applications that I like to write is a text "munger." It takes text and outputs the same text with the words labeled as to their order and then it ASCIIbetizes them. This article (up to the end of this sentence) would look like this:

"munger."[22] (up[46] ASCIIbetizes[42] Haskell,[7] I've[1] I[15] It[23] This[44] a[20] and[26] and[39] and[8] applications[13] around[5] article[45] as[35] been[3] end[49] is[19] it[41] labeled[34] like[16] like[55] look[54] of[10] of[50] one[9] order[38] outputs[27] playing[4] recently[2] same[29] sentence)[52] takes[24] text[21] text[25] text[30] that[14] the[11] the[28] the[32] the[48] their[37] them.[43] then[40] this:[56] this[51] to[17] to[36] to[47] toy[12] with[31] with[6] words[33] would[53] write[18]

So you could reconstruct that with a little work, but it is nicer to do it by feeding into the "demunge" function. Enjoy:

munge.hs -- Copyright 2008 Chris Wilson
-- Code is licensed under the GNU GPL
--   http://www.gnu.org/licenses/gpl.html

import Data.List

-- Label each string in the order that it is encountered
--   [("Word", 1),..("Lastword", n)]
label :: (Num a, Enum a) => [String] -> [(String, a)]
label [] = []
label sl = zip sl [1..]

-- Add these sequential numbers to the end of words
--   [("Word",1)] -> ["Word[1]"]
attachLabel :: (Num t) => [(String, t)] -> [String]
attachLabel [] = []
attachLabel (s:ss) = (fst s ++ "[" ++ show (snd s) ++ "]") : attachLabel ss

-- Compose all these functions together
munge :: String -> String
munge "" = ""
munge s =  intercalate " " . sort . attachLabel . label $ words s

-- Extract the oder from a tagged word
--   "is[5]" -> 5
getOrder :: String -> Int
getOrder "" = 0
getOrder s = 0 + (read . reverse . tail . fst . span (/='[') . reverse $ s)

-- Extract the word-part of a tagged word
--   "is[5]" -> "is"
getWord :: String -> String
getWord "" = ""
getWord s  = reverse . tail . snd . span (/='[') . reverse $ s

-- Put the words into a list of tuples: (order, word)
reconstruct :: String -> [(Int, String)]
reconstruct "" = []
reconstruct s = [ (getOrder item, getWord item) | item <- words s ]

-- Put all the extraction functions together
demunge :: String -> String
demunge [] = ""
demunge xs = intercalate " " [ snd item | item <- sort (reconstruct xs) ]

2008-08-30

A plan for CAPTCHAs

I had the idea (I'm uncertain if it is original [pdf]) of mixing two distinct elements to make a better CAPTCHA. The problem is that there are essentially two ways to solve a CAPTCHA. The first is by designing some kind of image recognition system, this is an OCR problem and there are indications that nearly every major CAPTCHA has been broken. The other main avenue of attack seems to come down to paying people to crack them. This is a much tougher nut to crack because CAPTCHAs are not designed to prevent humans from cracking them; in some sense, this attack isn't a break of the system at all, but a design goal of CAPTCHAs. But much like the mathematician in that old joke, this definition of acceptable wouldn't sit well with some. The solution is to pose a question in such a way that is very hard for a computer to perceive but at the same time hard for a person to solve by herself. As an example, imagine a cryptographically hard problem posed in a way that a computer couldn't readily interpret, and a person would find difficult (or impossible) to solve by hand. This could be the source code for a simple discrete logarithm solver. The code would be obscured in the normal way that a CAPTCHA is (wavy text, visual noise, etc.) The user is instructed to type the code into a text box (where a built-in solver, client-side, can operate on it), and then use the result for entry into the protected resource. It would be easy to tune the toughness of the discrete log problem (randomly chosen of course) to generate any penalty you want. This penalty would cost an attacker any amount of CPU time that the challenger desires. By mixing a character-recognition task (that a human is good at) with a number-crunching task (which a computer is good at, but can be made arbitrarily CPU-intensive and thus slow) you protect from both types of attacks:

If an OCR program can read your text, they still must compute a computationally expensive value.
If humans are being used to circumvent the OCR task, a computer must be used to compute the expensive task as well, still incurring most of the penalty (by tweaking the exact type and hardness of the one-way function, this part can come to dominate the time needed for a successful break).

Some ideas on how to make the computation time more palatable for a legitimate user:

Make this be the first question on a form that the user has to fill out.
Grant the user provisional access to the site while the computation proceeds in the background.

2008-08-15

Hex Clock

I was reading Hal's blog the other day when I thought that I'd try and implement a clock that uses the hexadecimal time that he talks about. I have seen a bunch of these so-called binary clocks that display the time is some nifty format (blinkenlights). The thing is, what they display is usually binary coded decimal (i.e. 1100:111011 = 12:59). Sometimes they even code each digit: 0001 0010:0101 1001! Hal defined the hexadecimal second to be 1/2^16 of a day, making 65,536 "hexeconds" per day. Since there are 86,400 seconds in a day, there are about 1.318 sec/hxs. Naturally, there are FF hexeconds in a "hexinute." A hexinute is 1/2^8 of a day, making each 337.5 seconds long, or 5.625 minutes (5 min, 37.5 sec). This leads to a nice property (besides being easily convertible to binary), A day then is recursive, there are 256 hxm per day and there are 256 hxs per hxm. Much better than 24 hours, with 60 minutes with 60 seconds! Who could remember that? You just have to get used to your clock displaying a time like: BE|EF (~64,440 seconds into your day or 5:54 PM, clearly not a vegetarian dinner time). Another thing that I thought about was the "stability" of a given digit. Each digit "stays put" for 16 times as long as the digit to it's right. The fastest-changing digit advances each ~1.318 second, to its left the 16's place advances each 16*1.318 = ~21 seconds. In the one's place in the hexinutes each advances 337.5 seconds (or 5.625 minutes), lastly, the most stable digit advances each every 90 minutes. That's great for measuring the orbital period of the space shuttle, each orbit is 10|00 long! That's my proposal for writing the time, by the way, for the programmers out there it will be evocative of a bitwise OR; if you think of the time as something like 0xAB00 | 0xCD. Without further ado:

hexclock.py #!/usr/bin/env python
#
# Based on discussion here: http://halcanary.org/vv/2007/10/31/706/
#
# Copyright 2008 Chris Wilson.
#
# This program is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation, either version 3 of the License, or (at your
# option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General
# Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program.  If not, see <http://www.gnu.org/licenses/>.

from Tkinter import *
from random import random
from time import sleep
from hs import hexarray, hexdisplay
import sys
import math
import getopt

master = Tk()

def usage():

    print """\nDislpay a Tk-based animated binary clock. The fastest pixel
blinks about once every 1.318 seconds, there are 2^16 such transitions
per day.

%s [options]

\t-h  --help This message
\t-s --size SIZE. make the display SIZE pixels on a side
\t-d --digits display digits in addition to the graphical binary
\t            representation of the time.
\t-f --foreground COLOR. the "lit up" pixels are this color.
\t-b --background COLOR. the background pixels are this
\t            color.
\t-t --textcolor COLOR. the text will appear this color. Obviously,
\t            this option only makes sense with -d""" % sys.argv[0]
    sys.exit(1)

digits = False
HEIGHT = WIDTH = 400
on = "blue"
off = "black"
textcolor = "white"
try:
    opts, args = getopt.getopt(sys.argv[1:],
                               "hs:df:b:t:", ["help", "size=", "digits",
                                              "foreground","background",
                                              "textcolor"])

except getopt.GetoptError, err:
    print str(err)
    usage()
for o, a in opts:
    if o in ("-d", "--digits"):
        digits = True
    elif o in ("-s", "--size"):
        try:
            HEIGHT = WIDTH = int(a)
        except:
            usage()
    elif o in ("-f", "--foreground"):
        on = a
    elif o in ("-b", "--background"):
        off = a
    elif o in ("-h", "--help"):
        usage()
    elif o in ("-t", "--textcolor"):
        textcolor = a
    else:
        usage()

fs = int(math.sqrt(HEIGHT))

w = Canvas(master, width=WIDTH, height=HEIGHT)

def create_grid(canvas, grid, height, width):
    ulx = 0
    uly = 0
    xdimen = width/4
    ydimen = height/4
    item_array = list()
    for square in grid:
            if square == 1:
                i = canvas.create_rectangle(ulx, uly, ulx+xdimen, uly+ydimen,
                                    fill=on)
            else:
                i = canvas.create_rectangle(ulx, uly, ulx+xdimen, uly+ydimen,
                                                    fill=off)
            item_array.append(i)
            if ulx == (width-xdimen):
                uly = uly + ydimen
                ulx = 0
            else:
                ulx = ulx + xdimen
    return item_array

def update(canvas, items, grid):
    for i in range(len(items)):
        if grid[i] == 1:
            canvas.itemconfig(items[i], fill=on)
        else:
            canvas.itemconfig(items[i], fill=off)

if digits:
    def create_timestamp(canvas,h,w):
        return canvas.create_text(h/2,w/2,text=hexdisplay(),fill=textcolor,
                                  justify=CENTER, font="Courier %d bold" % fs)

    def update_time(canvas,text_handle):
        canvas.itemconfig(text_handle, text=hexdisplay())
else:
    def create_timestamp(canvas,h,w): pass
    def update_time(canvas,text_handle): pass


items = create_grid(w, hexarray(), HEIGHT, WIDTH)
text = create_timestamp(w, HEIGHT, WIDTH)
w.pack()
while True:
    update(w, items, hexarray())
    update_time(w,text)
    w.update_idletasks()    # redraw
    w.update()              # process events
    sleep(1)
mainloop()

hs.py #!/usr/bin/env python
#
# Based on blog post on: http://halcanary.org/vv/2007/10/31/706/
#
# Copyright 2008 Chris Wilson
#
# This program is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation, either version 3 of the License, or (at your
# option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General
# Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program.  If not, see <http://www.gnu.org/licenses/>.

import math, time

# hex timekeeping
hs = 86400.0/(2**16)    # there are 65,536 hexeconds in a day ~1.318 sec.
hm = 86400.0/(2**8) # there are 256 hexeconds in hexinute  ~337.5 sec

# I'm not defining an hour since there wouldn't be many of them (and they'd
# be redundant, with a hexinute being ~5.6 mins, these are pretty stable.
# Plus, hackers are more precise anyway ;) ).

def day_secs():
    h, m, s = time.localtime()[3:6]
    return h*3600 + m * 60 + s

def hextime():
    s = day_secs()
    hexmin = int(math.floor(s/hm))
    s = s - (hexmin*hm)
    hexsec = int(math.floor(s/hs))
    return hexmin,hexsec

def hexarray():
    hexmin, hexsec = hextime()
    total = (hexmin << 8) + hexsec
    out = list()
    for x in range(16):
        out.append(total & 1)
        total = total>>1
    out.reverse()
    return out

def hexdisplay():
    return "%02X:%02X" % hextime()

def bindisplay():
    hm, hs = hextime()
    n1, n2 = nybbles(hm)
    n3, n4 = nybbles(hs)
    format_args = map(to_bin, (n1, n2, n3, n4))
    return "%s\n%s\n%s\n%s" %  tuple(format_args)

def nybbles(byte):
    lower = byte&15
    upper = (byte>>4)&15
    return upper, lower

def to_bin(nybble):
    out = list()
    for x in range(4):
        out.append(str(1 & nybble))
        nybble = nybble>>1
    out.reverse()
    return " ".join(out)

def _test():
    print "16", nybbles(16)
    u, l = nybbles(16)
    print "bin:",u,l, to_bin(u), to_bin(l)

def run_clock():
    import os
    while True:
        os.system("clear")
        print bindisplay()
        print "("+hexdisplay()+")"
        time.sleep(1)

if __name__ == "__main__":
    #_test()
    run_clock()

2008-06-27

Everything cool has already been done

I guess I shouldn't be surprised by anything that turns up within a million-line codebase, but I was. I had been looking around for a way to keep contacts, you know, collections of vital stats: name, e-mail address and if I have it postal address. I've had such a file hanging around for years now my nickname for it is snailmail. I know that the term has traditionally meant more or less: "things one can send through the postal service." But seeing as how e-mail is dead (TIC), it grows increasingly apropos. But anyway, I digress. I have this file of info and I wanted to keep it up to date, generate lists of names, generate those mailto: links and etc. In keeping with what I feel is the the *NIX way, I thought I should create a bunch of shell scripts that operate on my flat file in various ways. So I wrote a few scripts that would dig through the file and spit out whatever it was that I wanted. Okay, I can't resist, I wanted to include one of the ones that I wrote in scheme (guile):

snails.scm

#!/usr/bin/guile -s
!#
(use-modules (ice-9 rdelim))

(define (unpack-snailmail line)
"Currently, this is just ::e-mail"
(let ((line-elts (string-split line #\:)))
(format #t "\"~a ~a\" <~a>~%" (cadr line-elts) (car line-elts)
   (caddr line-elts))))

(define (read-lines port f)
"Read all the lines of a file found at port , applying f"
(let ((current-line (read-line port)))
(if (not (eof-object? current-line))
(begin
 (f current-line)
 (read-lines port f)))))

; rudimentary error-checking on the command line
(if (< (length (command-line)) 2)
    (display "Usage: snail snailmail.db\n")
    (read-lines (open-file (cadr (command-line)) "r") 
                unpack-snailmail))

So it is cake to read out the lines of this file. What I wondered more about was how to update and tend this file? Emacs to the rescue! I found out about forms-mode this does what you would expect, that is to create and fill out forms. There are two components to the form, the first is your data file. In my case it looks like this

.snails

Lastname0:Firstname0:E-mail_address0
Lastname1:Firstname1:E-mail_address1

And so on. Next we need an emacs lisp file to tell Emacs how to parse that file and what the form should look like:

sform.el

;; -*- mode: forms -*-
;; This is the 'Control file' (see 'info forms') for .snails
(setq forms-file "~/.snails")
(setq forms-format-list
 (list
  ",------------------,\n"
  "| Snail Mail Ver 1 |\n"
  "'------------------'\n\n"
  "First:   " 2 "\n"
  "Last:    " 1 "\n"
  "E-mail:  " 3 "\n"))
(setq forms-number-of-fields 3)
(setq forms-field-sep ":")

When you launch emacs and point it at that file emacs -nw sform.el, emacs will ask if you want parse it (answer 'yes') And then there are helpful commands at the bottom of the screen. The only basic one omitted is the one to create new entries, C-c C-o does the trick.

2008-05-16

webindent.c

Sometimes I want to post code to a website, but they have utterly broken filters (I'm looking at you Yahoo! Answers). So that when I paste nicely indented code it goes from this:

def double(x):
return x+x

to this:

def double(x):
return x+x

It is annoying in C or Lisp, but it is wrong in python where the syntax relies on proper indentation. When websites do this it is broken.

So I wrote a little program to rectify the situation. It processes a source file so that it looks the same or similar on the web as in your text editor.

usage: webindent < source.c > page.html

#include <stdio.h>
#define true 1
#define false 0

/* main(): get characters from stdin, replace
* leading spaces with ' '
* and replace leading tabs with 4 ' 's
* This code is donated to the public domain */
int main()
{
    int start = true; /* we're at the start of a line */
    char *space = " ";
    char *fourspaces = "    ";
    char c; /* the current char */

    while ((c = getchar()) != EOF) {
        if (c == ' ') {
            if (start)
                printf(space);
            else
                putchar(c);
        } else if (c == '\t') {
            if (start)
                printf(fourspaces);
            else
                putchar(c);
        } else if (c == '<') {
            printf("<");
        } else if (c == '>') {
            printf(">");
        } else if (c == '\n') {
            printf("<br />");
            putchar(c);
            start = true;
        } else if (c == '&') {
            printf("&");
        } else {
            if (start)
                start = false;
            putchar(c);
        }
    }
    return 0;
}

It does a bit more than is stated in that comment, because it has to remove '&', '<' and '>' for it to be a well-formed HTML fragment. Note that I used this program to process itself.