I recently came up with a pretty nice idea to use type-level literals that arrived in GHC 7.10. I hadn’t directly used type-level literals before so this also served as a pretty gentle introduction to them.
I don’t intend to completely cover type-level literals or type-level naturals here. The Haskell wiki has a decent jumping off point for reference. Here’s how I like to think about it:
Haskell’s type system is powerful and only getting more powerful over time. Beyond just letting you define types and checking them as you construct a program, it is also starting to make it easier for the developer to say “Hey GHC, here’s some type info. Please remember it because I’m going to need you to reference it later.” I do this all the time with phantom types: type variables that have no bearing on the data but can be used for type-level assertions later:
data S3Ref a = S3Ref ObjectKey
storeOnS3 :: (Serialize a) => S3Ref a -> a -> m ()
getFromS3 :: (Deserialize a) => S3Ref a -> m (Either DeserializeError a)
The above pseudocode gives us a type variable to keep information just for the type system: even though an S3 object key is essentially a piece of text, we let the developer associate it with a type. The compiler will remember that type, assert you don’t mix it up with another type and recover it in the implementation to produce the expected value. Pretty neat!
Type-level naturals are a baby step forward: they let us put natural numbers (positive integers) into the type system and recover them at runtime. You can do a lot of fancy stuff with this ability, but it got me thinking: what common functionality do I deal with that have the concept of a positive integer and how could it benefit from lifting that into the type system?
I work on web apps in Haskell quite a bit. In pretty much any API or traditional web app I write, I always have some sort of “listing” endpoint: users, products, etc. In most of those cases I have to implement pagination: the ability to get the data 1 page at a time, because there ends up being too much data to reasonably return it all in one request. Simple pagination tends to have 2 components: the page number (1 and onward) and the number of records per-page to return. We’ll be discussing per-page in this article because it has some interesting properties:
I’m a big fan of smart-constructors. In short, its a technique in Haskell where you do not export the default constructor for a type from the module where you define it. You only export the type and a way to construct it and retrieve its internals. Once you test the constructor thoroughly for correctness, you can be certain that any time outside of the module where you see the value that it maintains the invariants you set forth.
So for per-page we want to make it impossible to set a value too low (0 or lower) or too high. Previously I did this:
module Pagination
PerPage
(
, perPage
, mkPerPagePerPageError(..)
, where
)
newtype PerPage = PerPage
perPage :: Int
{deriving (Show, Eq, Ord)
}
data PerPageError = TooSmall
| TooBig
deriving (Show, Eq)
mkPerPage :: Int -> Either PerPageError PerPage
mkPerPage n| n <= 0 = Left TooSmall
| n >= 20 = Left TooBig
| otherwise = Right (PerPage n)
This is pretty good. It meets our criteria for restricting values but it is inflexible. We cannot specialize it to each use case. In order to do that we’d probably have to:
UserPerPage
, a ProductPerPage
, etc. Also, to get that module separation, we’d have to dump all these pagination constructors into a separate module from where we’d use it.mkPerPage
constructor. Indeed, you may even end up doing this in option 1 to DRY things up. The type would look like mkPerPage :: Int ->
Int -> Either PerPageError PerPage
. That doesn’t seem great since it would be easy to screw up the arguments and pass the user-supplied pagination as the limit!What if we could have just one type to deal with pagination and leave the maximum up to the use site? After all, the only time we reference the maximum is once in the constructor. Here’s what I came up with:
module Pagination
PerPage
(
, perPage
, mkPerPagePerPageError(..)
, where
)
import qualified Data.Proxy as P
import qualified GHC.TypeLits as TL
newtype PerPage (max :: TL.Nat) = PerPage
perPage :: Int
{deriving (Show, Eq, Ord)
}
data PerPageError = TooSmall
| TooBig
deriving (Show, Eq)
mkPerPage :: (TL.KnownNat max) => Int -> Either PerPageError (PerPage max)
mkPerPage n| n <= 0 = Left TooSmall
| n >= (fromInteger (TL.natVal (P.Proxy :: P.Proxy max))) = Left TooBig
| otherwise = Right (PerPage n)
We add a type variable, max
to our PerPage type. Its like a phantom type because it is not used in the actual data structure. We’re asking the compiler to remember max
because we will use it later. We don’t take just any max though, we take a max of type Nat
which is short for Natural. We say that max can be one of many (in fact, infinite) types that fall under the umbrella of natural numbers.
In our constructor, we add the constraint KnownNat
, which is always present for natural numbers. The constraint means that the compiler has remembered which natural number is inhabiting max
and can retrieve it for us whenever we want.
Lastly, we have (fromInteger (TL.natVal (P.Proxy :: P.Proxy
max)))
. Proxy
is basically a general purpose type with 1 phantom type variable:
data Proxy a = Proxy
It is a handy way to refer to things at the type level when you don’t have anything at the value level. natVal
takes a Proxy
referring to a known natural number and gives you at the value level that number as an Integer
. Lastly, we use fromInteger
to convert it from Integer -> Int
.
So how do we use this? Let’s say we’re writing users code and our app decides that 20 is the maximum number of users we can return per page. It would look like this:
type UserPerPage = PerPage 20
That’s it! We can now specialize any of our user-facing code to safely limit the pagination. If we want a function with unrestricted pagination, say for internal scripts, we can do that too!
-- Anything at the web layer that we don't trust must be limited to 20 per page.
untrustedGetUsers :: PageNum -> UserPerPage -> m [User]
= internalGetUsers -- internalGetUsers is just a more generalized version of this
untrustedGetUsers
-- We can use this in any code where we're not concerned about fetching too much per page.
internalGetUsers :: (TL.KnownNat maxpp) => PageNum -> PerPage maxpp -> m [User]
= error "todo" internalGetUsers
Recently I’ve gotten some feedback from a non-Haskelling colleague that from their perspective, the Haskell community does not discuss design patterns very often. On one hand, I do feel that functional programming languages are not as fertile of a breeding ground for patterns as OOP languages. I suspect that languages that tend towards heavy usage of “nouns” also tend towards heirarchical thinking and classification. Who knows?
But this is a bit of a cop out. We do use patterns in Haskell. About a year and a half ago I went from being a hobbyist Haskeller to a full-time Haskeller. This was my first opportunity to see “real” Haskell code in the wild. I remember having the same worries before using Haskell in my day job that I would immediately run into problems that would require a pattern I didn’t know. While I think these worries were definitely overblown, I’d like to discuss some of the techniques I’ve picked up in the hopes that other Haskellers looking to use Haskell in “enterprise” software can at least have a jumping off point.
This article will be discussing what I’ve called the “Lensed Reader” pattern.
First and foremost, not long after having the idea to write about this pattern, I came across a wonderful talk that covers many of the points. If you prefer learning through videos, I highly recommend Next Level MTL by George Wilson.
Most applications I’ve worked on need a big piece of read-only state. Things you’ll commonly find in this state object are:
A ReaderT
-based monad transformer is usually perfect for this. For things like your logging environment you may want to add namespaces or pause logging from time to time, but thankfully, MonadReader
implementations provide a local
combinator which temporarily modifies the reader context and restores it automatically, so you don’t need to necessarily resort to MonadState
. I usually end up defining a newtype transformer stack at the heart of my application. Its important to use a newtype wrapper to define instances for your stack without resorting to orphaned instances. It’ll end up looking something like this:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Control.Monad.Reader
data AppState = AppState -- ..
newtype AppT m a = AppT { unAppT :: ReaderT AppState m a}
deriving ( Functor
Applicative
, Monad
, MonadIO
, MonadReader AppState)
,
runAppT :: AppState -> AppT m a -> m a
= runReaderT (unAppT m) s runAppT s m
A couple of quick notes if any of this looks unfamiliar:
GeneralizedNewtypeDeriving
lets us piggyback on ReaderT
’s instances. By and large, if ReaderT r m a
has an instance, our stack can get it without any boilerplate. If we need a customized instance, we’re free to write it ourselves.runAppT
can be read in the following order:
unAppT
unwraps your AppT m a
to a plain ReaderT AppState m a
.runReaderT
further unwraps it to AppState -> m a
AppState
and get an m a
.Now we’re going to set up a real rinky-dink, useless app to demonstrate the technique. All our app can do is log. And we obviously want to be able to license this groundbreaking tech to any outfit willing to pay, so we’ll be able to configure the app to have a configurable name.
data Config = Config {
companyName :: String
}
data AppState = AppState {
asConfig :: Config
asLogger :: String -> IO ()
, }
Cool! Now we can define some helper functions we’ll need in our app:
logMsg :: String -> AppT IO ()
= do
logMsg msg <- asks asLogger
logger
logger msg
getCompanyName :: AppT IO String
= asks (companyName . asConfig) getCompanyName
All this looks great, but there’s a problem. These functions are very specific about the monad they run in. Sure, you can log a message and get the company name in AppT IO
, but you can make due with a lot less. Its also a code smell that getCompanyName
has IO in its type because it isn’t even doing any IO. There’s virtue in generic functions in Haskell because they communicate the capabilities they require and thus shrink the solution space. No cards hiding in the sleeve as it were. Put another way, you can hide a hell of a lot in IO
. If IO
in a function is any m
that implements Monad
, then even if it resolves to IO
in the end, we can be sure that this particular function doesn’t avail itself of the evils of IO
.
Also, when you’re specific about your monad stack, you have to throw in lots of lifts
when you try to use those functions from deeper in a stack. Its like having a home appliance that only works on the 2nd floor. For instance, say we were using EitherT
to encapsulate some operation that could fail and mix it in with our app’s operations.
import Control.Monad.Trans.Either
-- | Try to download an update for the software
tryUpdate :: IO (Either String ())
= return (Left "Psych! Thats the wrong number!")
tryUpdate
update :: EitherT String (AppT IO) ()
= do
update EitherT tryUpdate -- will abort if there's an error, which there will be
"Update complete") -- never gonna happen lift (logMsg
Yuck. Lifting. What if we are a few more layers deep in a monad transformer stack? What if we refactored some of this code somewhere else in the stack? We constantly have to keep track of how many lifts we’ll need to do. Wouldn’t it be nicer is to say that logging and company name can be accessed wherever you have access to AppState?
import Control.Monad.IO.Class
logMsg :: (MonadIO m, MonadReader AppState m) => String -> m ()
= do
logMsg msg <- asks asLogger
logger
liftIO (logger msg)
getCompanyName :: (MonadReader AppState m) => m String
= asks (companyName . asConfig)
getCompanyName
update :: (MonadIO m, MonadReader AppState m) => EitherT String m ()
= do
update EitherT (liftIO tryUpdate) -- will abort if there's an error, which there will be
"Update complete" logMsg
Great! Here’s what we got:
MonadReader AppState m
says in this monad, we could call ask
and get an AppState
. asks
lets us refine that a bit with a selector function to just grab a piece of the state.logMsg
will run in any monad that has access to AppState and can run IO. These constraints act like capabilities and we only ask for what we need. We could easily create an alternative transformer stack in test that satisfied these constraints.getCompanyName
no longer needs IO, which is great because it has no business doing IO.In one of my real world applications, I wrote a utility for some analysts. It used the large AppState
like record to generate a report. Much to my dismay, I found the analysts were avoiding using it because they didn’t have the databases (like PostgreSQL and Redis) the normal app needed, so when the app loaded up that AppState
, the connection pools failed to establish and the whole thing crashed.
The analysts were being reasonable. The actual task this tool was performing didn’t really need databases. It just needed the config. If all my code was using MonadReader AppState m
, then everything would require the whole AppState
, even if it wasn’t going to use the whole thing. The solution I arrived at was to break down AppState
into just what I needed. So I used classy lenses.
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Control.Monad.Reader
data Config = Config {
_companyName :: String
}
data AppState = AppState {
_asConfig :: Config
_asLogger :: String -> IO ()
,
}
'AppState
makeLenses ''Config
makeClassy '
instance HasConfig AppState where
= asConfig config
That makeClassy
gives us something like this:
class HasConfig a where
config :: Lens' a Config
companyName :: Lens' a String
instance HasConfig Config where -- ...
In other words, we now have a way to specify data types that contain Config
. Note that companyName
has a default implementation that pulls it off of Config
. I’ve heard this type of abstraction refered to as a “seam”. It is a line in the fabric of the code that can be easily opened up and modified if need be.
The final piece of the puzzle is view
from lens, which is just like asks
from MonadReader
but it takes a lens.
Now we can have:
getCompanyName :: (MonadReader r m, HasConfig r) => m String
= view (config . companyName) getCompanyName
Take care to note that lenses compose in the opposite direction of functions, so we access config
first, then companyName
from there. Now, in a reporting function, we can be specific about what context each function needs and hook it up to a lighter context or even a totally different transformer stack.
heavyReport :: (MonadReader AppState m) => m String
= do
heavyReport <- getCompanyName
cn return (cn ++ " is the best company!")
lightReport :: (MonadReader r m, HasConfig r) => m String
= do
lightReport <- getCompanyName
cn return (cn ++ " is the best company!")
runReport :: Config -> String
= runReader lightReport runReport
Check that out! We didn’t need AppT
or IO
. lightReport
is just as happy being used in a minimal Reader
as it is in our official AppT
monad.
main
where you actually run the thing.Config
, it carries (MonadReader r m, HasConfig r)
. If it doesn’t have that, GHC will give you a type error and tell you exactly what constraints you’re missing!-Wall
-Werror
(and I strongly recommend that you do), GHC 8.0 will warn you about unnecessary constraints, so as your code evolves, if constraints stop being necessary, GHC will remind you to drop them!Over the last few years I’ve been transitioning most of the software development work I do from dynamic programming languages that offer very little in the way of type safety (Ruby, JavaScript) to ones that offer more (TypeScript, Haskell, PureScript), with plenty of trips back and forth between the two camps. Because the grass is always greener on the other side, it is easy early on to attribute safety to a language and to have an overly simplified, overly generous definition of what safety is. With more experience under my belt, I’ve come to realize that type safety is not a checkbox, but more of a dial, and while the language you use may control how far in either direction that dial goes, it is for better or worse up to the user to turn the knob. While type-safety is supposed to be a tool where a machine assists you in building more reliable software, its effectiveness is still largely controlled by the user’s knowledge and desire to protect themselves and their user. In the immortal words of The Wu:
Aint a damn thing changed boy, protect ya neck.
I may make this a series. Future posts will likely be less rambly and will stick to providing a real world case where the language didn’t automatically protect me, but rather where I recognized I was implementing something risky and chose to protect myself with the tools available.
At work we use an instrumentation tool. It is conceptually pretty simple: it provides you with some simple tools to instrument your codes with a few different types of measurements, namely counters and timers. It provides a generic backend interface to allow you to ship the collected metrics off to external services for analysis. It is not unlike the ekg
package.
The interface reminds me a lot of redis. It offers some dead simple types where you specify an arbitrary string key and it stores the data there for later processing. Some pseudo-typed examples are:
-- | Increment a counter stored at the key by 1
incrementI :: String -> m ()
-- | Time a computation and store it as a sampling to key
timeI :: String -> m a -> m a
This is a usable, easy to understand interface for this library to have. You can’t necessarily expect 3rd party libraries to go much beyond this. As a user though, this is completely unsafe. It would be very easy to mix up the keys and accidentally increment a timer or time a counter. Furthermore, if I had to work with the same timer or counter in multiple places in my code, I could easily mistype the key or change it in one place but not somewhere else and screw up my data.
I don’t indend to explain GADTs from first principles in this post. Plenty of other sources do that better. What I will explain is why I chose them. GADTs offer some nice properties for solving this problem:
Here’s the code I ended up using
data Counter
data Timer
data Metric a where
RequestTime :: Host -> Path -> Metric Timer
ErrorCount :: Metric Counter
toName :: Metric a -> String
RequestTime h p) = "request-time-" <> h <> ":" <> p
toName (ErrorCount = "error-count"
toName
incrementI :: Metric Counter -> m ()
= I.incrementI (toName c)
incrementI c
timeI :: Metric Timer -> m ()
= I.timeI (toName c) timeI c
This requires the GADTs
and EmptyDataDecls
extensions. Take note of a few things here:
I
and reexport its functions with enriched types. Everywhere in my app I would import this module and not the underlying library.Another great thing is you can set up this barrier on the other side of the library as well. Say the underlying library provides these functions:
getCounter :: String -> m CounterValue
getTimer :: String -> m TimerValue
We can wrap those up again with our metric type and be sure that we can’t be looking up a timer with a counter’s key or vice versa:
getCounter :: Metric Counter -> m CounterValue
getTimer :: Metric Timer -> m TimerValue
This is just a quick snippet I’ve been meaning to post for a few weeks. A few weeks ago I needed to add a Postgres UUID column to one of my tables using Persistent. I dug around and all I found were vague, closed tickets, and old irrelevant blog posts on the Yesod site that mentioned UUIDs but didn’t give any good examples. The solution ended up being simple but I hope it helps someone else who is having this problem, or more likely, future me when I forget how this is done.
Let’s say you’ve got some schema TH that looks like:
"migrateAll"] [persistLowerCase|
share [mkPersist sqlSettings, mkMigrate
Person
name String
age Int Maybe
deriving Show |]
We want to:
Our schema TH now looks like:
"migrateAll"] [persistLowerCase|
share [mkPersist sqlSettings, mkMigrate
Person
name String
age Int Maybe
uuid PersonUUID default=uuid_generate_v4()
UniquePersonUUID uuid
deriving Show |]
In a module accessible from your schema’s you’ll also add something like:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE OverloadedStrings #-}
module MyApp.Schema where
import Control.Error
import Control.Lens
import Data.ByteString (ByteString)
import Data.UUID
import Data.Text (Text)
import qualified Data.Text as T
import System.Random
newtype PersonUUID = PersonUUID {
_personUuid :: UUID
deriving (Show, Eq, Ord, Random)
}
'PersonUUID
makeLenses '
instance PersistFieldSql PersonUUID where
= const $ SqlOther "uuid"
sqlType
instance PersistField PersonUUID where
= toPersistValueUUID personUuid
toPersistValue = fromPersistValueUUID personUuid
fromPersistValue
_ASCIIBytes :: Prism' ByteString UUID
= prism toASCIIBytes $ \bs -> note bs $ fromASCIIBytes bs
_ASCIIBytes
toPersistValueUUID :: Iso' a UUID -> a -> PersistValue
= PersistDbSpecific $ a ^. i . re _ASCIIBytes
toPersistValueUUID i a
fromPersistValueUUID :: Iso' a UUID -> PersistValue -> Either Text a
PersistDbSpecific bs) =
fromPersistValueUUID i ("Could not parse UUID" $ bs ^? _ASCIIBytes . from i
note = Left $ "Invalid value for UUID: " <> showT x
fromPersistValueUUID _ x
showT :: Show a => a -> Text
= T.pack . show showT
Let’s break this down a bit. First, we create a newtype around UUID
to distinguish the type and then derive an Iso
that can get us to and from the UUID
via makeLenses
. We also create a Prism
between ByteString and UUID. I like to read prisms as the left type variable (ByteString
) is the “wider” type and the right type variable (UUID
) is the “narrow” one. That is to say, you know you can always go from the narrow type to the wide one but not necessarily the other way. Conveniently, PersistDbSpecific
expects a ByteString
so this is exactly what we need to serialize our type to the database.
note
is a great little function from the errors
package of type e -> Maybe a -> Either e a
that upgrades a Maybe
into an Either
.
Lastly, you’ll want to make sure that the uuid-ossp
extension is enabled in your database. You can issue the command CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
to do so.
The other day I was looking through some haskell code and found a curious little combinator: <$. At first I thought it was <$>
, the infix alias for fmap
. I’m so used to seeing <$>
because I see and write it many times a day. I decided to look it up on the official haddocks to see what was going on. The description is:
Replace all locations in the input with the same value. The default definition is fmap . const, but this may be overridden with a more efficient version.
I know I’ve read that before and didn’t understood it. Locations? What’s that supposed to mean? Why would I want to use this? “Whatever, I’ll probably never need it,” I thought. And so I never ended up using <$
until now, and that’s a real shame. Let’s take a closer look at the documentation and the type and see if we can figure out what it means.
The type is
(<$) :: Functor f => a -> f b -> f a
So we only require a Functor
constraint to use it. As Bartosz Milewski notes, Functors are Containers. Remembering this analogy of “Containers” made the “locations” phrasing click for me. A list is a functor, and intuitively, locations in a list are the elements. So let’s see what it looks like to replace all locations in the input with the “same value”, or the first argument:
Prelude Control.Applicative> 3 <$ [7,8,9]
3,3,3] [
Ah ha! This makes sense too, looking at the type.
The first argument is of type a
, the second f b
, a container of b
. And that’s the last time we see b
. The only thing Functor
provides us is
fmap :: Functor f => (a -> b) -> f a -> f b
Given we know nothing about a
and b
, this only lets us apply a function to every “location” in a functor/container. There’s no breaking out early, nor adding new elements. <$
doesn’t even take a function, just a single value of a
. It knows nothing about a
and nothing about f b
except how to map over it. It couldn’t even return an “empty” f b
because functor doesn’t give it the tools to do that. The only thing <$
can do is replace each location in the functor with the given a
and indeed the only implementation it can have is the one it has, fmap . const
. Pretty cool!
So back to the list example. I guess this could be helpful. You could imagine a list or Map
of items to validate and some condition causing you to replace all values with some invalidated value. Or maybe a list of tests and we want to replace them all with failures. This is a bit of a stretch though. There is a much more useful case…
Haskell parsers tend to be Applicative Functors and are really nice to use so you end up using them a lot. You may be writing an Applicative parser for command line options using optparse-applicative, a parser using attoparsec, or a FromJSON
instance in aeson. In any of these scenarios, if you find yourself parsing some fixed token, <$
can help!
Let’s say we’ve got the following type:
data Status = Staged | Running | Finished
I’ve been writing parsers like this for some time:
string :: Text -> Parser Text
parseStatus :: Parser Status
= parseStaged <|> parseRunning <|> parseFinished
parseStatus where
= string "Staged" *> pure Staged
parseStaged = string "Running" *> pure Running
parseRunning = string "Finished" *> pure Finished parseFinished
Parser
is a Functor
, Applicative
, and Alternative
(which gives us <|>
). This is where its hard to apply the description <$
of replacing all “locations” in a functor. What are locations in a parser? In this case the type is more illuminating. The f b
in the type is Parser Text
. f
is Parser
, and b
is Text
. We want to throw out the Text
once this parser succeeds and replace it with our token. So we instead can write:
parseStatus :: Parser Status
= parseStaged <|> parseRunning <|> parseFinished
parseStatus where
= Staged <$ string "Staged"
parseStaged = Running <$ string "Running"
parseRunning = Finished <$ string "Finished" parseFinished
This feels nicer. We no longer have to “lift” the value into the parser with pure
. You can read this as “the result is Staged if I’m given a the string ‘Staged’”.
This little thought exercise has helped underscore the importance of reasoning with the tools that typeclasses give us. The simpler typeclasses have this great property of being extremely polymorphic, which at once makes them very powerful in their use and very constrained in their implementations.