2008-11-07

haskell and scrambled text

I've recently been playing around with Haskell, and one of the toy applications that I like to write is a text "munger." It takes text and outputs the same text with the words labeled as to their order and then it ASCIIbetizes them. This article (up to the end of this sentence) would look like this:

"munger."[22] (up[46] ASCIIbetizes[42] Haskell,[7] I've[1] I[15] It[23] This[44] a[20] and[26] and[39] and[8] applications[13] around[5] article[45] as[35] been[3] end[49] is[19] it[41] labeled[34] like[16] like[55] look[54] of[10] of[50] one[9] order[38] outputs[27] playing[4] recently[2] same[29] sentence)[52] takes[24] text[21] text[25] text[30] that[14] the[11] the[28] the[32] the[48] their[37] them.[43] then[40] this:[56] this[51] to[17] to[36] to[47] toy[12] with[31] with[6] words[33] would[53] write[18]

So you could reconstruct that with a little work, but it is nicer to do it by feeding into the "demunge" function. Enjoy:

munge.hs -- Copyright 2008 Chris Wilson
-- Code is licensed under the GNU GPL
--   http://www.gnu.org/licenses/gpl.html

import Data.List

-- Label each string in the order that it is encountered
--   [("Word", 1),..("Lastword", n)]
label :: (Num a, Enum a) => [String] -> [(String, a)]
label [] = []
label sl = zip sl [1..]

-- Add these sequential numbers to the end of words
--   [("Word",1)] -> ["Word[1]"]
attachLabel :: (Num t) => [(String, t)] -> [String]
attachLabel [] = []
attachLabel (s:ss) = (fst s ++ "[" ++ show (snd s) ++ "]": attachLabel ss

-- Compose all these functions together
munge :: String -> String
munge "" = ""
munge s =  intercalate " " . sort . attachLabel . label $ words s

-- Extract the oder from a tagged word
--   "is[5]" -> 5
getOrder :: String -> Int
getOrder "" = 0
getOrder s = 0 + (read . reverse . tail . fst . span (/='['. reverse $ s)

-- Extract the word-part of a tagged word
--   "is[5]" -> "is"
getWord :: String -> String
getWord "" = ""
getWord s  = reverse . tail . snd . span (/='['. reverse $ s

-- Put the words into a list of tuples: (order, word)
reconstruct :: String -> [(Int, String)]
reconstruct "" = []
reconstruct s = [ (getOrder item, getWord item) | item <- words s ]

-- Put all the extraction functions together
demunge :: String -> String
demunge [] = ""
demunge xs = intercalate " " [ snd item | item <- sort (reconstruct xs) ]

3 comments:

rscredits said...

Haha, sunny day again! You know iam more excited when reading your articles here, you really made my day. What's more, you just tell the point of this question and it is informative enough now, thanks again for sharing here. runescape gold for sale , runescape gold

Unknown said...

I've recently been playing around with Haskell, and one of the toy applications that I like to write is a text "munger." It takes text and outputs the same text with the words labeled as to their order and then it ASCIIbetizes them. This article (up to the end of this sentence) would look like this:guild wars 2 gold
buy guild wars 2 gold
cheap guild wars 2 gold
cheapest guild wars 2 gold
guild wars 2 gold for sale

emily said...

whoah this blog is fantastic i love reading your articles.You can learn more: China tour packages | China travel packages | China Travel Agency

twopoint718

About Me

My photo
A sciency type, but trying to branch out into other areas. After several years out in the science jungle, I'm headed back to school to see what I can make of the other side of the brain.