torstai 24. marraskuuta 2011

The Bash Monad

You've probably used some Unix command-line tools and chained them together using the pipe (|). This is very much similar to how monadic Haskell programs work: you chain a bunch of IO actions together using the >>= symbol.

When writing this kind of chains, you are usually interested in these aspects of each of the tools in the chain:
  • Input
  • Output
  • Side-effects
Usually only the last tool in the chain will actually perform any side-effects; the others just read files and process input text, producing some output text.
Let's now try to define the "type signature" of these commands.
So, suppose you have a command-line like
cat lolcats.txt | xargs touch
I should say that
"cat lolcats.txt" :: IO [String]
"xargs touch"     :: [String] -> IO ()
I'm using the Haskell type signature notation here. There [String] means a list of strings and IO means that the function is not pure: it performs some IO, and hence is not necessarily deterministic and may have side-effects. The xargs touch part has a signature IO () (pronounces IO unit) that indicates that it consumes a list of strings and performs an IO action that doesn't return any value.
If you wrote this as a Haskell program, you could have functions like this:
readLolCats :: IO [String]
writeCatFiles :: [String] -> IO ()
Let's just assume that these functions were already implemented and see how we can chain them together. Well, that would be using the >>= operator.
So the whole thing would go like
readLolCats >>= writeCatFiles
I'd like to say that the >>= operator is the equivalent of the shell pipe for Haskell IO!
The functions above could be implemented like
grepCats = return . filter (isInfixOf "cat")
writeCatFiles =  mapM_ (flip writeFile $ "")
Say what? The first function uses . to compose two functions. What are their types? Let's ask GHCI:
*CommandPrompt> :t return
return :: Monad m => a -> m a
*CommandPrompt> :t filter (isInfixOf "cat")
filter (isInfixOf "cat") :: [[Char]] -> [[Char]]
So, return seems to be related to something called Monads. Buy hey, IO is a Monad, so practically return takes any value a and returns IO a. Thus is allows you to inject a value "into IO". Or any monad. Let's stick to IO for now, though.
The latter part says that it filters a list of Strings, keeping only those that include "cat". The type signature [[Char]] -> [[Char]] is equivalent to[String] -> [String] (Strings are just lists of Chars).
When we compose these functions using ., we get a function having a type signature like
*CommandPrompt> :t grepCats
grepCats :: [String] -> IO [String]
We knew that though, right.
How does writeCatFiles work, then? Well, it uses mapM_, which is Haskell's equivalent to forEach in some languages: it does something with each of the elements in the given list.
*CommandPrompt> :t mapM_
mapM_ :: Monad m => (a -> m b) -> [a] -> m ()
As you can see, it takes a function that is applicable to the elements in the given list of values of type a. We are using it in the IO monad, so practially mapM_ applies the given function a -> IO b to all elements. The return type b doesn't matter; mapM_ discards the possible return value. Hence, your function may very well be of type a -> IO ().
In our writeCatFiles function, we provide mapM_ with (flip writeFile $ "") which has the type FilePath -> IO(). Now FilePath is actually a typeclass and there's a String-instance for it, so String can be used instead of FilePath. So why flip? That's because we need to be able to feed file names to this function. It just happens to be that writeFile has file name as its first parameter and file contents as the second. With flip we can reverse the ordering of the parameters, and by currying the flipped function with "", we get a function that accepts a filename and writes the empty string into that file.
Currying? Well, suppose you've got a function with two params, like
*CommandPrompt> :t flip writeFile
flip writeFile :: String -> FilePath -> IO ()
Here the function (flip writeFile) accepts first file contents and then the file name. Now if we apply one parameter, we get
*CommandPrompt> :t (flip writeFile) $ ""
(flip writeFile) "" :: FilePath -> IO ()
.. which is acceptable for our little "for-loop" : we can use this function as the first argument for mapM_ and get the empty string written to all files given in the input list.
So, currying is just applying a parameter to a function, and getting another function with one less param.

Now what exactly is a Monad?

One way to put it is that its a type for actions that can be chained using the >>= operator. Also, a Monad needs to have the return function defined, for injecting values into the monad.
So in terms of bash, you can say that the pipe `|` is the equivalent of >>=.
Now, what would be the bash-equivalent of return? A free T-shirt for the first correct answer!

7 kommenttia:

  1. Vesa, thanks for your comment! I'm afraid stdout is not the answer I'm looking for, though. First, it's not a command, but a "stream" instead. In Haskell, you'd use functions like "putStrLn" to write to stdout.

    I'm looking for a command that will inject a given string into the "pipe".

  2. Kirjoittaja on poistanut tämän kommentin.

  3. Yep I know, but basically by pushing stuff into stdout it can be piped.

    My first instinct would be something like 'echo `command`', which does have its flaws though.

  4. I think you've nailed it. I was thinking echo, too. It's signature would be roughly

    echo :: String -> IO String

    Email me your address, and you'll get a cool t-shirt.

  5. Oleg's take on this: