Learning from Golf
I admit that I’m addicted to code-golf. Sometimes I feel like a little coding for fun, but I’m not in the mood to get into any of my more involved projects. At these times, a short code-golf puzzle can be just the ticket. Most code-golf entries are often very convoluted and hard to read. But in Haskell, I often find that even distilled to minimal character count, the code still expresses the meaning clearly. Sometimes it even leads me to a deeper appreciation of the code.
-- | A simple form of matching expression, given as a string type GlobPattern = String -- | Match a pattern element pattern :: Char -> String -> [String] pattern '?' (a:z) = [z] pattern '?'  =  pattern '*' s = tails s pattern '+' (a:z) = tails z pattern '+'  =  pattern c s = literal c s
When a pattern element is matched to some input, it can match zero, one or more chracters of the input. The result of this pattern match will be a list of possible remainders of the input.
The first five clauses of
pattern handle the special matching characters:
— 1) a
'?' matches a single character; the rest of the input is the remainder
— 2) if there is no input, then
'?' doesn’t match, and there is no remainder
— 3) a
'*' matches zero or more characters, so all possible tails of the input
are the remainder
— 4) a
'+' matches one or more characters, so all possible tails of the rest
of the input are the remainder
— 5) If there is no input, then
'+' doesn’t match, there is no remainder
The final clause covers the catch-all glob rule:
— 6) Any other character is matched literally
Notice that when there is no match, an empty list of possible remainders is the result.
-- | Match a literal pattern character exactly literal :: Char -> String -> [String] literal c (a:z) | a == c = [z] literal _ _ = 
If a literal character matches the front of the input, the rest is the remainder.
Notice that the second clause handles two possible reasons for lack of a match: Either the next character doesn’t match, or the input is empty.
-- | Match a pattern to an input in all possible ways. -- Return if it matched exactly the whole input for each way. match :: GlobPattern -> String -> [Bool] match ('\\':a:z) s = literal a s >>= match z match ( a:z) s = pattern a s >>= match z match  s = [null s]
This function works down the pattern, element by element. On each element the possible remainders are “fed” into matching on the rest of the pattern.
The three clauses can be read:
— 1) a backslash escapes the next character, which is matched literally
— 2) other characters are pattern matched
— 3) if the pattern is exhausted, return
True iff the input is exhausted
The code uses
>>= to do the feeding, which will apply the
match z operation to each posisble remainder, and concatenate the results. When operating on lists, as the code is here,
>>= is just
-- | Test if a "glob" pattern matches the whole input glob :: GlobPattern -> String -> Bool glob p = or . match p
Finally, a glob pattern matches if there is at least one way to match the whole pattern with the whole input.
It is instructive to cast this code in terms of the type class
Alternative, which gives us the operations
empty (no alternatives),
pure (a single alternative), and
<|> which combines alternatives:
patternAlt :: Char -> String -> [String] patternAlt '?' (a:z) = pure z patternAlt '?'  = empty patternAlt '*' s = pure s <|> patternAlt '+' s patternAlt '+' (a:z) = patternAlt '*' z patternAlt '+'  = empty patternAlt c s = literalAlt c s
Even though this code still uses a list as the return type, by using
Alternative, it becomes more clear how it is encoding possible ways the match can proceed.
Notice that rather than use
'+' are expressed in terms of each other:
'*' is like matching nothing, or like
'+' is like matching
'*' on the input after the first character.
Both ways of expressing these operations tell us something about their nature: That both are expressable via
tails tells us that they are structurally similar. That they can be expressed in terms of each other tells us that they are each, in a sense, one “step” from each other. Further, each expresses a fundamental quality:
'*' is about choice via the
'+' is about discrimination via the
(a:z) pattern match.
literalAlt :: Char -> String -> [String] literalAlt c (a:z) | a == c = pure z literalAlt _ _ = empty matchAlt :: GlobPattern -> String -> [()] matchAlt ('\\':a:z) s = literalAlt a s >>= matchAlt z matchAlt ( a:z) s = patternAlt a s >>= matchAlt z matchAlt  s = if null s then pure () else empty
Rather than produce a
Bool, this code produces a list of
() alternatives for each match.
globAlt :: GlobPattern -> String -> Bool globAlt p = not . null . matchAlt p
The code golf’d version now follows from both of the above versions.
'?'%(a:z)=[z] '*'%s=s:'+'%s '+'%(a:z)='*'%z c%s=c&s c&(a:z)|a==c=[z] _&_= m('\\':a:z)s=a&s>>=m z m(a:z)s=a%s>>=m z ms=[null s] g=(or.).m
There are some basic code-golf tricks at play here:
— no extra whitepsace
— using operators for pattern (
%) and literal (
— point-free style for glob (
Once you can read beyond the simple golf-tricks, I love that even these mere
141 characters read directly as a specification of glob matching!
The only real trick is that two of the clauses were left out of pattern (
because the “fall through” to literal (
&) handles them.