2008-11-07

haskell and scrambled text

I've recently been playing around with Haskell, and one of the toy applications that I like to write is a text "munger." It takes text and outputs the same text with the words labeled as to their order and then it ASCIIbetizes them. This article (up to the end of this sentence) would look like this:

"munger."[22] (up[46] ASCIIbetizes[42] Haskell,[7] I've[1] I[15] It[23] This[44] a[20] and[26] and[39] and[8] applications[13] around[5] article[45] as[35] been[3] end[49] is[19] it[41] labeled[34] like[16] like[55] look[54] of[10] of[50] one[9] order[38] outputs[27] playing[4] recently[2] same[29] sentence)[52] takes[24] text[21] text[25] text[30] that[14] the[11] the[28] the[32] the[48] their[37] them.[43] then[40] this:[56] this[51] to[17] to[36] to[47] toy[12] with[31] with[6] words[33] would[53] write[18]

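The ordering in that output is just Haskell's sort on String. Strings are compared character by character, and characters compare by code point. The sketch below uses a handful of entries copied from the output above, shuffled. It shows why "I've[1]" lands before "I[15]" and "It[23]", and why "and[8]" comes after "and[39]": the numeric label is compared as text, not as a number.

```haskell
import Data.Char (ord)
import Data.List (sort)

-- A few labeled words taken from the munged output above, deliberately shuffled.
samples :: [String]
samples = ["and[8]", "It[23]", "I[15]", "and[39]", "I've[1]", "this[51]", "this:[56]"]

-- sort on String is lexicographic over Char, and Char compares by code point,
-- so '\'' (39) < ':' (58) < uppercase (65-90) < '[' (91) < lowercase (97-122).
main :: IO ()
main = do
  print (sort samples)
  -- Code points of the characters that decide the tricky cases above.
  print (map ord "'[:t")
```

Running it prints ["I've[1]","I[15]","It[23]","and[39]","and[8]","this:[56]","this[51]"] and then [39,91,58,116].
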
So you could reconstruct the original from that with a little work, but it is nicer to let the "demunge" function do it. Enjoy:

munge.hs

```haskell
-- Copyright 2008 Chris Wilson
-- Code is licensed under the GNU GPL
--   http://www.gnu.org/licenses/gpl.html

import Data.List (intercalate, sort)

-- Label each string in the order that it is encountered
--   [("Word", 1),..("Lastword", n)]
label :: (Num a, Enum a) => [String] -> [(String, a)]
label [] = []
label sl = zip sl [1..]

-- Add these sequential numbers to the end of words
--   [("Word",1)] -> ["Word[1]"]
-- (The constraint is Show, because that is what "show" needs; current
--  GHC versions no longer treat Num as implying Show.)
attachLabel :: (Show t) => [(String, t)] -> [String]
attachLabel [] = []
attachLabel (s:ss) = (fst s ++ "[" ++ show (snd s) ++ "]") : attachLabel ss

-- Compose all these functions together
munge :: String -> String
munge "" = ""
munge s = intercalate " " . sort . attachLabel . label $ words s

-- Extract the order from a tagged word
--   "is[5]" -> 5
-- The word is reversed first so the split happens at the last '[',
-- which keeps words containing a '[' of their own intact.
getOrder :: String -> Int
getOrder "" = 0
getOrder s = read . reverse . tail . fst . span (/= '[') . reverse $ s

-- Extract the word-part of a tagged word
--   "is[5]" -> "is"
getWord :: String -> String
getWord "" = ""
getWord s = reverse . tail . snd . span (/= '[') . reverse $ s

-- Put the words into a list of tuples: (order, word)
reconstruct :: String -> [(Int, String)]
reconstruct "" = []
reconstruct s = [ (getOrder item, getWord item) | item <- words s ]

-- Put all the extraction functions together; sorting the tuples
-- puts them back in order of their numeric label
demunge :: String -> String
demunge [] = ""
demunge xs = intercalate " " [ snd item | item <- sort (reconstruct xs) ]
```
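Here is a quick way to convince yourself that demunge really undoes munge. This is a self-contained sketch, not the listing above: it restates both functions in condensed form (zipWith, unwords and a single break on the reversed word stand in for label, attachLabel, getOrder and getWord; the helper name untag is made up for this sketch) and checks a few sample strings. The round trip gives back unwords (words s) rather than s exactly, because words throws away runs of spaces and line breaks. The third sample checks that a word with its own '[' survives, since the split is made at the last '['.

```haskell
import Data.List (sort)

-- Condensed restatement of munge: tag each word with its 1-based
-- position, then sort the tagged words as plain strings.
munge :: String -> String
munge = unwords . sort . zipWith tag [1 ..] . words
  where
    tag :: Int -> String -> String
    tag n w = w ++ "[" ++ show n ++ "]"

-- Hypothetical helper: split a tagged word at its last '[' into (position, word).
-- Working on the reversed string makes break stop at the last '[' first.
untag :: String -> (Int, String)
untag t = (read (reverse (drop 1 numR)), reverse (drop 1 wordR))
  where
    (numR, wordR) = break (== '[') (reverse t)

-- Condensed restatement of demunge: untag every word, sort by position.
demunge :: String -> String
demunge = unwords . map snd . sort . map untag . words

main :: IO ()
main = do
  let samples =
        [ "I've recently been playing around with Haskell"
        , "spaces   and\nnewlines are collapsed"
        , "a[b keeps its bracket"
        , ""
        ]
  -- The round trip normalizes whitespace, so compare against unwords . words.
  mapM_ (\s -> print (demunge (munge s) == unwords (words s))) samples
```

Running main should print True four times.
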

twopoint718

About Me

A sciency type, but trying to branch out into other areas. After several years out in the science jungle, I'm headed back to school to see what I can make of the other side of the brain.