sunnuntai 4. joulukuuta 2011

Memory blues

I recently started playing around with a Linode instance. It's a really nice service and I wanted to kick the tires with couple of Haskell projects. The smallest Linode instance comes with 512MB of memory, and this should be reasonably enough for everybody. Except for ghc and ld, it seems :)

The problem

I installed Yesod web framework from scratch using cabal
cabal install yesod
I quickly noticed that installing software with cabal caused a huge load spike to host. I whipped out my trusty htop and noticed that process ld (the linker). was sucking up 500MB of memory and compilation was (understandably) stuck since the host was swapping all the time.
The relevant ticket for GHC contained some information. I also came across a brilliant posting outlining some optimizations for limited memory hosts.

The fix

I tested giving the options to ghc, but unfortunately some packages flag out refused to compile with those. At least for some packages the linker parameters given with -optl were passed to gcc. After some twiddling I decided to give up.
Then I came across a Stack Overflow discussion about gold linker, which is experimental optimized linker. There's a Ubuntu package for that,binutils-gold which conveniently replaces the system ld with the experimental optimized linker.
So, it just installed it with command
sudo apt-get install binutils-gold
and rerun the
cabal install yesod
The compilation seemed to go through smoothly, and ld didn't cause any memory spikes. As such, using the binutils-gold package seemed to solve this particular memory blues.

The disclaimer

The gold linker is experimental software, and is known not to compile some software correctly (such kernel modules). As such, using it is a tradeoff. In my case the standard linker just didn't work. Fortunately, the experimental linker can be quite easily removed by uninstalling the binutils-gold package - system will revert back to stock ld.
Anyway, if you happen to run into this particular problem, try the binutils-gold. Rumour is that the optimized ld might even help with boost-heavy C++-projects.

torstai 24. marraskuuta 2011

The Bash Monad

You've probably used some Unix command-line tools and chained them together using the pipe (|). This is very much similar to how monadic Haskell programs work: you chain a bunch of IO actions together using the >>= symbol.

When writing this kind of chains, you are usually interested in these aspects of each of the tools in the chain:
  • Input
  • Output
  • Side-effects
Usually only the last tool in the chain will actually perform any side-effects; the others just read files and process input text, producing some output text.
Let's now try to define the "type signature" of these commands.
So, suppose you have a command-line like
cat lolcats.txt | xargs touch
I should say that
"cat lolcats.txt" :: IO [String]
"xargs touch"     :: [String] -> IO ()
I'm using the Haskell type signature notation here. There [String] means a list of strings and IO means that the function is not pure: it performs some IO, and hence is not necessarily deterministic and may have side-effects. The xargs touch part has a signature IO () (pronounces IO unit) that indicates that it consumes a list of strings and performs an IO action that doesn't return any value.
If you wrote this as a Haskell program, you could have functions like this:
readLolCats :: IO [String]
writeCatFiles :: [String] -> IO ()
Let's just assume that these functions were already implemented and see how we can chain them together. Well, that would be using the >>= operator.
So the whole thing would go like
readLolCats >>= writeCatFiles
I'd like to say that the >>= operator is the equivalent of the shell pipe for Haskell IO!
The functions above could be implemented like
grepCats = return . filter (isInfixOf "cat")
writeCatFiles =  mapM_ (flip writeFile $ "")
Say what? The first function uses . to compose two functions. What are their types? Let's ask GHCI:
*CommandPrompt> :t return
return :: Monad m => a -> m a
*CommandPrompt> :t filter (isInfixOf "cat")
filter (isInfixOf "cat") :: [[Char]] -> [[Char]]
So, return seems to be related to something called Monads. Buy hey, IO is a Monad, so practically return takes any value a and returns IO a. Thus is allows you to inject a value "into IO". Or any monad. Let's stick to IO for now, though.
The latter part says that it filters a list of Strings, keeping only those that include "cat". The type signature [[Char]] -> [[Char]] is equivalent to[String] -> [String] (Strings are just lists of Chars).
When we compose these functions using ., we get a function having a type signature like
*CommandPrompt> :t grepCats
grepCats :: [String] -> IO [String]
We knew that though, right.
How does writeCatFiles work, then? Well, it uses mapM_, which is Haskell's equivalent to forEach in some languages: it does something with each of the elements in the given list.
*CommandPrompt> :t mapM_
mapM_ :: Monad m => (a -> m b) -> [a] -> m ()
As you can see, it takes a function that is applicable to the elements in the given list of values of type a. We are using it in the IO monad, so practially mapM_ applies the given function a -> IO b to all elements. The return type b doesn't matter; mapM_ discards the possible return value. Hence, your function may very well be of type a -> IO ().
In our writeCatFiles function, we provide mapM_ with (flip writeFile $ "") which has the type FilePath -> IO(). Now FilePath is actually a typeclass and there's a String-instance for it, so String can be used instead of FilePath. So why flip? That's because we need to be able to feed file names to this function. It just happens to be that writeFile has file name as its first parameter and file contents as the second. With flip we can reverse the ordering of the parameters, and by currying the flipped function with "", we get a function that accepts a filename and writes the empty string into that file.
Currying? Well, suppose you've got a function with two params, like
*CommandPrompt> :t flip writeFile
flip writeFile :: String -> FilePath -> IO ()
Here the function (flip writeFile) accepts first file contents and then the file name. Now if we apply one parameter, we get
*CommandPrompt> :t (flip writeFile) $ ""
(flip writeFile) "" :: FilePath -> IO ()
.. which is acceptable for our little "for-loop" : we can use this function as the first argument for mapM_ and get the empty string written to all files given in the input list.
So, currying is just applying a parameter to a function, and getting another function with one less param.

Now what exactly is a Monad?

One way to put it is that its a type for actions that can be chained using the >>= operator. Also, a Monad needs to have the return function defined, for injecting values into the monad.
So in terms of bash, you can say that the pipe `|` is the equivalent of >>=.
Now, what would be the bash-equivalent of return? A free T-shirt for the first correct answer!