Skip to content

Learning from Golf

March 26, 2011

I admit that I’m addicted to code-golf. Sometimes I feel like a little coding for fun, but I’m not in the mood to get into any of my more involved projects. At these times, a short code-golf puzzle can be just the ticket. Most code-golf entries are often very convoluted and hard to read. But in Haskell, I often find that even distilled to minimal character count, the code still expresses the meaning clearly. Sometimes it even leads me to a deeper appreciation of the code.

Recently, I tackled a code-golf problem to write a function that does a form of glob matching. Here’s the code, un-golf’d:

-- | A simple form of matching expression, given as a string
type GlobPattern = String

-- | Match a pattern element
pattern :: Char -> String -> [String]
pattern '?' (a:z) = [z]
pattern '?'   []  = []
pattern '*'    s  = tails s
pattern '+' (a:z) = tails z
pattern '+'   []  = []
pattern  c     s  = literal c s

When a pattern element is matched to some input, it can match zero, one or more chracters of the input. The result of this pattern match will be a list of possible remainders of the input.

The first five clauses of pattern handle the special matching characters:
— 1) a '?' matches a single character; the rest of the input is the remainder
— 2) if there is no input, then '?' doesn’t match, and there is no remainder
— 3) a '*' matches zero or more characters, so all possible tails of the input
are the remainder
— 4) a '+' matches one or more characters, so all possible tails of the rest
of the input are the remainder
— 5) If there is no input, then '+' doesn’t match, there is no remainder

The final clause covers the catch-all glob rule:
— 6) Any other character is matched literally

Notice that when there is no match, an empty list of possible remainders is the result.

-- | Match a literal pattern character exactly
literal :: Char -> String -> [String]
literal c (a:z) | a == c = [z]
literal _    _           = []

If a literal character matches the front of the input, the rest is the remainder.

Notice that the second clause handles two possible reasons for lack of a match: Either the next character doesn’t match, or the input is empty.

-- | Match a pattern to an input in all possible ways.
-- Return if it matched exactly the whole input for each way.
match :: GlobPattern -> String -> [Bool]
match ('\\':a:z) s = literal a s >>= match z
match (     a:z) s = pattern a s >>= match z
match        []  s = [null s]

This function works down the pattern, element by element. On each element the possible remainders are “fed” into matching on the rest of the pattern.

The three clauses can be read:
— 1) a backslash escapes the next character, which is matched literally
— 2) other characters are pattern matched
— 3) if the pattern is exhausted, return True iff the input is exhausted

The code uses >>= to do the feeding, which will apply the match z operation to each posisble remainder, and concatenate the results. When operating on lists, as the code is here, >>= is just concatMap.

-- | Test if a "glob" pattern matches the whole input
glob :: GlobPattern -> String -> Bool
glob p = or . match p

Finally, a glob pattern matches if there is at least one way to match the whole pattern with the whole input.


It is instructive to cast this code in terms of the type class Alternative, which gives us the operations empty (no alternatives), pure (a single alternative), and <|> which combines alternatives:

patternAlt :: Char -> String -> [String]
patternAlt '?' (a:z) = pure z
patternAlt '?'   []  = empty
patternAlt '*'    s  = pure s <|> patternAlt '+' s
patternAlt '+' (a:z) = patternAlt '*' z
patternAlt '+'   []  = empty
patternAlt  c     s  = literalAlt c s

Even though this code still uses a list as the return type, by using Alternative, it becomes more clear how it is encoding possible ways the match can proceed.

Notice that rather than use tails, '*' and '+' are expressed in terms of each other: '*' is like matching nothing, or like '+'. '+' is like matching '*' on the input after the first character.

Both ways of expressing these operations tell us something about their nature: That both are expressable via tails tells us that they are structurally similar. That they can be expressed in terms of each other tells us that they are each, in a sense, one “step” from each other. Further, each expresses a fundamental quality: '*' is about choice via the <|> operator, '+' is about discrimination via the (a:z) pattern match.

literalAlt :: Char -> String -> [String]
literalAlt c (a:z) | a == c = pure z
literalAlt _    _           = empty

matchAlt :: GlobPattern -> String -> [()]
matchAlt ('\\':a:z) s = literalAlt a s >>= matchAlt z
matchAlt (     a:z) s = patternAlt a s >>= matchAlt z
matchAlt        []  s = if null s then pure () else empty

Rather than produce a Bool, this code produces a list of () alternatives for each match.

globAlt :: GlobPattern -> String -> Bool
globAlt p = not . null . matchAlt p

The code golf’d version now follows from both of the above versions.

'?'%(a:z)=[z]
'*'%s=s:'+'%s
'+'%(a:z)='*'%z
c%s=c&s
c&(a:z)|a==c=[z]
_&_=[]
m('\\':a:z)s=a&s>>=m z
m(a:z)s=a%s>>=m z
m[]s=[null s]
g=(or.).m

There are some basic code-golf tricks at play here:
— no extra whitepsace
— using operators for pattern (%) and literal (&)
— point-free style for glob (g)

Once you can read beyond the simple golf-tricks, I love that even these mere
141 characters read directly as a specification of glob matching!

The only real trick is that two of the clauses were left out of pattern (%)
because the “fall through” to literal (&) handles them.

I hope you enjoyed this small code romp as much as I did.
The code from this post is available as a runnable file: Glob.hs.
My code-golf answer was the smallest of the non-golf language entries..

About these ads

From → Haskell, Software

Comments are closed.

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: