Saturday, August 31, 2013

4.2.1: A Trivial Iterator - Take one

Iterators are objects that allow us to iterate over a stream of data, be that from a file, a linked list, an array or any other data collection. The iterator encapsulates both location in the data and movement across it.

At the very minimum, the iterator must support three operations:
  1. Move across the data.
  2. Return the data that the iterator currently points to.
  3. Allow the user to check whether the end of the data has been reached.
The Upto iterator represents iteration over the interval [m,n), which is the list of numbers from m to n-1. One of the benefits of using an iterator in this example is that we don't have to generate the list of numbers beforehand; we can simply evaluate it as needed.

A very simple way to create an iterator, and one very similar to the approach used in HOP, is to use a closure to capture the state and return a function that acts as the iterator. The returned function must satisfy all three criteria in a single function call.


type (
 Iter func() (int, bool)
)

func upto(m, n int) Iter {
 return func() (int, bool) {
  if m < n {
   ret := m
   m++
   return ret, true
  }
  return n, false
 }
}

The function accepts two numbers and returns a function that acts as an iterator. Upon invocation it advances the current position and returns the previous number, along with a boolean that is false once we have moved past the end of the stream.


func main() {
 i := upto(3, 5)
 for v, ok := i(); ok; v, ok = i() {
  fmt.Println(v)
 }
}

Looping is simple, but a bit clumsy. We start by constructing the iterator. Then, in the initialization section of the for construct, we retrieve the first value and the EOS (End Of Stream) indicator. The last section of the for construct gets the next value and updates the EOS indicator.

It's all very trivial and brief; there is very little code here that doesn't directly touch on either generating the interval or looping through it.

But I don't think it is a good fit for iterators in Go. For one, the invocation performs way too many functions in one call; there is, for example, no way to check for EOS without advancing the iterator. Second, it is very limiting in the kinds of iterators we can create; random access is off the table. Third, it is very hard to write algorithms for this kind of iterator, since there is no way to declare what kind of iterator an algorithm needs. Fourth, I don't think this style of iterator scales very well across different problems.

Go is a static language; formalism is our friend and path to freedom.
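One direction that formalism could take, sketched here purely as my own assumption rather than as the approach the book or later posts necessarily settle on, is an interface that separates the three operations, so EOS can be checked without advancing:

```go
package main

import "fmt"

// Iterator is a hypothetical interface splitting the three
// responsibilities into separate methods.
type Iterator interface {
 Value() int // the element the iterator currently points to
 Next()      // advance to the next element
 Done() bool // true once the stream is exhausted
}

// upto iterates over the interval [m,n), mirroring the closure
// version above.
type upto struct{ cur, end int }

func (u *upto) Value() int { return u.cur }
func (u *upto) Next()      { u.cur++ }
func (u *upto) Done() bool { return u.cur >= u.end }

func main() {
 for it := (&upto{cur: 3, end: 5}); !it.Done(); it.Next() {
  fmt.Println(it.Value()) // prints 3, then 4
 }
}
```

With the operations separated, an algorithm can declare the kind of iterator it needs simply by accepting the interface type.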

Get the source at GitHub.

Saturday, August 24, 2013

Interlude

Chapter three is finished and with it the first 113 pages of the book. Now the first really long chapter begins, weighing in at about 80 pages. It is about iterators, which for me is one of the core concepts of the book.
My strategy for this chapter is to read through the whole chapter and do the examples before posting. I want to figure out a good way to do iterators in Go that scales well across the examples in the book. I'm nearly finished, and the first example should be posted soon.

Tuesday, August 6, 2013

3.8.1: Memoization of Object Methods

Our final example from chapter three is the memoization of object methods. Again we can use the same memoizer function of type Memfunc as before; no changes are needed.
type (
 Keyfunc  func(...interface{}) string
 Memfunc  func(...interface{}) interface{}
 Algofunc func(int) int
)

func memoize(f Memfunc, kf Keyfunc) Memfunc {
 if kf == nil {
  kf = func(v ...interface{}) string {
   // Very naive keygen func, simply use string representation
   // of every param
   var buffer bytes.Buffer
   for i := 0; i < len(v); i++ {
    buffer.WriteString(fmt.Sprint(v[i]))
    buffer.WriteString(",")
   }
   return buffer.String()
  }
 }

 cache := make(map[string]interface{})
 return func(v ...interface{}) interface{} {
  cachekey := kf(v)
  if r, ok := cache[cachekey]; ok {
   // Hit, return previous result
   return r
  }
  // Miss, call calculator
  r := f(v...)
  cache[cachekey] = r
  return r
 }
}

Our object is a trivial one; its only purpose is to demonstrate the speedup from the memoizing transformation. The method accepts an int argument, waits for one second and then returns the argument.
type Object struct {
}

func (o *Object) Say(p int) int {
 time.Sleep(1 * time.Second)
 return p
}

The main function starts by executing the raw method and timing it:
func main() {
 var obj Object
 duration, res := timeThis(obj.Say, 1)
 fmt.Printf("Object says: %d, takes %s to do it.\n", res, duration)

Our typecasting code structure is simpler this time around because the method is not recursive; only the typecasting itself is needed.
 msay := func(f Algofunc) Algofunc {
  memsay := memoize(func(values ...interface{}) interface{} {
   return f(toInt(values[0]))
  }, nil)

  // Return a function that will do the necessary typecasting
  return func(val int) int {
   return toInt(memsay(val))
  }
 }(obj.Say)

 duration, res = timeThis(msay, 1)
 fmt.Printf("Memoized object says: %d, takes %s to do it.\n", res, duration)
 duration, res = timeThis(msay, 1)
 fmt.Printf("Memoized object says: %d, takes %s to do it.\n", res, duration)
 duration, res = timeThis(msay, 2)
 fmt.Printf("Memoized object says: %d, takes %s to do it.\n", res, duration)
 duration, res = timeThis(msay, 2)
 fmt.Printf("Memoized object says: %d, takes %s to do it.\n", res, duration)
}

The results are predictable: the method takes one second to execute, while the memoized version drops to around 15µs once a result for the argument has been cached:
Object says: 1, takes 1.001060856s to do it.
Memoized object says: 1, takes 1.001117812s to do it.
Memoized object says: 1, takes 14.71us to do it.
Memoized object says: 2, takes 1.001044387s to do it.
Memoized object says: 2, takes 14.893us to do it.

Get the source at GitHub.

Monday, August 5, 2013

3.7.4: Partitioning

The point of making a generic memoize is to be able to use it with different algorithms. The partition algorithm from 1.8: Partitioning has a different signature than the fibonacci algorithm yet we can use memoizer unchanged. First for the type definitions, the only difference being the signature of the Algofunc:
type (
 Keyfunc  func(...interface{}) string
 Memfunc  func(...interface{}) interface{}
 Algofunc func(int, []int) []int
)

memoize is exactly the same as before:
func memoize(f Memfunc, kf Keyfunc) Memfunc {
 if kf == nil {
  kf = func(v ...interface{}) string {
   // Very naive keygen func, simply use string representation
   // of every param
   var buffer bytes.Buffer
   for i := 0; i < len(v); i++ {
    buffer.WriteString(fmt.Sprint(v[i]))
    buffer.WriteString(",")
   }
   return buffer.String()
  }
 }

 cache := make(map[string]interface{})
 return func(v ...interface{}) interface{} {
  cachekey := kf(v)
  if r, ok := cache[cachekey]; ok {
   // Hit, return previous result
   return r
  }
  // Miss, call calculator
  r := f(v...)
  cache[cachekey] = r
  return r
 }
}

The only change we need to make to findshare is to turn it into a constructor that accepts a Memfunc and returns the actual algorithm. It is the same pattern as we used for fib:
func findshare(mf Memfunc) Algofunc {

 f := func(target int, treasures []int) []int {
  if target == 0 {
   return make([]int, 0)
  }
  if target < 0 || len(treasures) == 0 {
   return nil
  }
  first, rest := treasures[0:1], treasures[1:]
  solution := toIntSlice(mf(target-first[0], rest))
  if solution != nil {
   return append(first, solution...)
  }
  return toIntSlice(mf(target, rest))
 }

 if mf == nil {
  mf = func(v ...interface{}) interface{} {
   return f(toInt(v[0]), toIntSlice(v[1]))
  }
 }
 return f
}

In the main function we start with doing unmemoized calculations:
func main() {
 treasure := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}
 target := 209

 rawfindshare := findshare(nil)
 duration, res := timeThis(rawfindshare, target, treasure)
 fmt.Printf("Raw partition %v into %d: %d at %s\n", treasure, target, res, duration)

Then we proceed to do the memoized calculations on the same data, constructing the same typecasting code structure as before:
 mfindshare := func() Algofunc {
  // Forward declaration of calculator
  var calcshare Algofunc

  // Memoizer
  memfindshare := memoize(func(values ...interface{}) interface{} {
   return calcshare(toInt(values[0]), toIntSlice(values[1]))
  }, nil)

  // Construct the calculator, use the memoizer defined above
  calcshare = findshare(memfindshare)

  // Return a function that will do the necessary typecasting
  return func(target int, treasures []int) []int {
   return toIntSlice(memfindshare(target, treasures))
  }
 }()

 duration, res = timeThis(mfindshare, target, treasure)
 fmt.Printf("Memoized partition %v into %d: %d at %s\n", treasure, target, res, duration)

}

The speedup we gain from memoization is about 96% for this example:
Raw partition [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20] into 209: [2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20] at 558.018056ms
Memoized partition [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20] into 209: [2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20] at 18.783834ms

Get the source at GitHub.

Sunday, August 4, 2013

3.5: The Memoize Module

As mentioned before, Go does not at this time (version 1.1) have generics. Go does not have any kind of macro language that allows us to reason about code while compiling. And as far as I know, Go has no way to change the function symbol table while compiling. There is a package called debug/gosym, but it seems to be more about inspection than meddling; feel free to correct me.

The memoize function in HOP uses some Perl features that are hard to mimic in Go. For one, Perl is very liberal with function signatures, as in there are none. Every function is simply a name that accepts any number and type of arguments through the @_ symbol. 

In Go that means the variadic function f(...interface{}) and a whole lot of typecasting. Second, and far harder, is Perl's ability to dynamically change the symbol table. The idea behind memoize is to speed up recursive functions by caching computations. Caching the first computation is easy because we can simply call the memoized version rather than the raw version. But that alone doesn't help: the raw version then calls itself recursively rather than the memoized version, so the expensive inner computations are never cached.
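To see why this matters, here is a small sketch of my own (not code from the book) of the naive approach failing: wrapping the finished function in a caching closure only caches the top-level call, because the recursive calls inside the function body are still bound to the raw version.

```go
package main

import "fmt"

var rawCalls int // counts executions of the raw fib body

// fib is a plain recursive fibonacci (fib(0) = fib(1) = 1).
// The recursive calls below are bound to fib itself at compile
// time; no outer wrapper can intercept them.
func fib(n int) int {
 rawCalls++
 if n < 2 {
  return 1
 }
 return fib(n-1) + fib(n-2)
}

// memoized wraps f with a cache, but the cache only helps for
// top-level calls; the recursion inside f never sees it.
func memoized(f func(int) int) func(int) int {
 cache := map[int]int{}
 return func(n int) int {
  if r, ok := cache[n]; ok {
   return r
  }
  r := f(n)
  cache[n] = r
  return r
 }
}

func main() {
 mfib := memoized(fib)
 fmt.Println(mfib(20), rawCalls) // 10946 21891: still an exponential number of raw calls
}
```

This is exactly the problem the co-operative constructor below works around: the recursion itself has to be routed through the memoizer.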

But hope is here. Closures, anonymous functions and functions as first class objects ride in to save the day. Sort of.

First we start by defining a few function types. Keyfunc is a function to calculate the cache key for the memoizer. Memfunc is the memoizer. And Algofunc is the type of the algorithm we want to speed up. In this example we are using the fibonacci computation.


type (
 Keyfunc  func(...interface{}) string
 Memfunc  func(...interface{}) interface{}
 Algofunc func(int) int
)

The memoizer accepts two functions. First a Memfunc which is the algorithm wrapped in a typecasting function; and a Keyfunc. It returns the actual memoizer that is a closure that contains a reference to the cache and the keyfunc.


If no Keyfunc is supplied we construct a naive key function that simply runs through a slice of interface values and constructs a string representation of its value with Sprint. Then the cache is created and a memoizer is constructed and returned. It simply constructs the cache key, checks the cache for already computed result, returns it if it exists, otherwise it calls the algorithm supplied, stores the result and returns it.



func memoize(f Memfunc, kf Keyfunc) Memfunc {
 if kf == nil {
  kf = func(v ...interface{}) string {
   // Very naive keygen func, simply use string representation
   // of every param
   var buffer bytes.Buffer
   for i := 0; i < len(v); i++ {
    buffer.WriteString(fmt.Sprint(v[i]))
    buffer.WriteString(",")
   }
   return buffer.String()
  }
 }

 cache := make(map[string]interface{})
 return func(v ...interface{}) interface{} {
  cachekey := kf(v)
  if r, ok := cache[cachekey]; ok {
   // Hit, return previous result
   return r
  }
  // Miss, call calculator
  r := f(v...)
  cache[cachekey] = r
  return r
 }
}

Next we create our fibonacci calculator. Ideally this code would be in no way aware of possible memoization, but that is sadly not possible since we need to be able to intercept the recursive calls. We have to take a more co-operative route.


This function accepts a possible memoizer, or nil, and returns an algorithm. The function that we return is the calculator, which calls the supplied memoizer and returns typecast results from it.


Next we check whether a memoizer was supplied; if not, we construct a small function that simply calls the fibonacci algorithm and assign that to the memoizing function, thus making the algorithm recursive.


func fib(mf Memfunc) Algofunc {

 // Fib calculator that calls either a memoized function or itself
 f := func(month int) int {
  if month < 2 {
   return 1
  }
  return toInt(mf(month-1)) + toInt(mf(month-2))
 }

 // If no memoize was supplied, create a dummy that simply calls
 // the fib calculator
 if mf == nil {
  mf = func(v ...interface{}) interface{} {
   return f(toInt(v[0]))
  }
 }

 return f
}

In the main function we start by creating a simple fibonacci calculator (by passing nil into the constructor) and calculate a single fibonacci number, timing the execution. 


func main() {
 fibnum := 30
 if len(os.Args) > 1 {
  fibnum, _ = strconv.Atoi(os.Args[1])
 }

 // Regular fib calculation
 rfib := fib(nil)
 duration, res := timeThis(rfib, fibnum)
 fmt.Printf("Raw fib %d: %d at %s\n", fibnum, res, duration)


Then we proceed to finally create the memoized version of the calculator. The constructor starts by declaring an Algofunc variable without actually defining the function. Then the memoize constructor is called, passing in an anonymous function that calls the previously declared function, typecasting the interface parameter to an int. Then the calculator is created, passing in the memoizer to handle the recursion. Finally, an anonymous function is returned that simply typecasts from Memfunc to Algofunc.

 mfib := func() Algofunc {
  // Forward declaration of fib calculator
  var calcfib Algofunc

  // Memoizer, uses the yet to be defined but already declared fib calculator
  memfib := memoize(func(values ...interface{}) interface{} {
   return calcfib(toInt(values[0]))
  }, nil)

  // Construct the calculator, use the memoizer defined above
  calcfib = fib(memfib)

  // Return a function that will do the necessary typecasting
  return func(month int) int {
   return toInt(memfib(month))
  }
 }()

 // Memoized fib calculation
 duration, res = timeThis(mfib, fibnum)
 fmt.Printf("Memoized fib %d: %d at %s\n", fibnum, res, duration)

What follows is an artist's impression of this monstrosity:
Chart of Memoize

I heard you liked closures in your closures in your closures in your closures.


Amazingly, this Frankensteinian construction works. A sample run for a few choice calculations results in:


Raw fib 1: 1 at 260ns
Memoized fib 1: 1 at 57.066us
Raw fib 10: 89 at 20.838us
Memoized fib 10: 89 at 118.31us
Raw fib 12: 233 at 69.558us
Memoized fib 12: 233 at 142.454us
Raw fib 13: 377 at 117.464us
Memoized fib 13: 377 at 137.451us
Raw fib 14: 610 at 180.078us
Memoized fib 14: 610 at 145.62us
Raw fib 20: 10946 at 3.305775ms
Memoized fib 20: 10946 at 145.757us
Raw fib 35: 14930352 at 5.566529702s
Memoized fib 35: 14930352 at 252.713us
Raw fib 40: 165580141 at 53.862903338s
Memoized fib 40: 165580141 at 364.974us

The raw calculation is faster at lower numbers, with the memoized version becoming quicker around fib(14). At fib(40) the raw function takes around 54 seconds, while the memoized version still returns in well under a millisecond (365µs).

Possibly the most specialized "general" code I've ever written.

Get the source at GitHub.

Saturday, August 3, 2013

Interlude

Chapter two is finished and we are 61 pages into the book. Chapter three (Caching and Memoization) revolves around memoization as a strategy to speed up recursive calculations.

By the way, Mark Jason Dominus approves.

2.2: Calculator

As in 2.1, we are aiming to separate the branching logic from the algorithm, making the logic pluggable and the algorithm reusable. And in this example we actually do code two separate branching tables, using the same algorithm to two different effects.

This is a simple Reverse Polish Notation calculator, accepting expressions of the form "2 3 + 1 -" and evaluating a result depending on the branching logic supplied.

We start with a type declaration very similar to the one in the previous post: an action function and a dispatch table. The Stack is used to record the state of the evaluation.

type (
 ActionFunc  func(string, *Stack)
 ActionTable map[string]ActionFunc
)

The evaluate function accepts the expression, the branching logic and a stack for the state. It loops through the expression and decides which action to take based on each token. If the token is a number, the NUMBER action is selected; otherwise an action is selected by the token itself, falling back to a default action if the lookup fails. The selected action is then executed.

The function ends with popping the top value of the stack and returning it to the caller.

func evaluate(expression []string, actions ActionTable, stack *Stack) interface{} {
 for _, t := range expression {
  var action ActionFunc
  if _, err := strconv.ParseFloat(t, 64); err == nil {
   action = actions["NUMBER"]
  } else {
   var ok bool
   if action, ok = actions[t]; !ok {
    action = actions["__DEFAULT__"]
   }
  }
  action(t, stack)
 }
 return stack.Pop()
}

The main function starts by declaring branching logic that supports calculating an expression in RPN format using the +, -, *, / and sqrt operators. Then the evaluate algorithm is called and the result is printed.

func main() {

 calcActions := ActionTable{
  "+": func(token string, stack *Stack) {
   stack.Push(stack.PopFloat() + stack.PopFloat())
  },
  "-": func(token string, stack *Stack) {
   v := stack.PopFloat()
   stack.Push(stack.PopFloat() - v)
  },
  "*": func(token string, stack *Stack) {
   stack.Push(stack.PopFloat() * stack.PopFloat())
  },
  "/": func(token string, stack *Stack) {
   v := stack.PopFloat()
   stack.Push(stack.PopFloat() / v)
  },
  "sqrt": func(token string, stack *Stack) {
   stack.Push(math.Sqrt(stack.PopFloat()))
  },
  "NUMBER": func(token string, stack *Stack) {
   v, _ := strconv.ParseFloat(token, 64)
   stack.Push(v)
  },
  "__DEFAULT__": func(token string, stack *Stack) {
   panic(fmt.Sprintf("Unknown token %q", token))
  },
 }

 v := evaluate(os.Args[1:], calcActions, new(Stack))
 fmt.Printf("Result: %f\n", toFloat(v))

Next we create branching logic that supports building an Abstract Syntax Tree from an RPN expression. The __DEFAULT__ function is selected for all tokens except numbers, building a tree of slices on the stack. The result is then printed twice: once in the raw format of the Go data structure, and once in the form of an infix string built by the astToString function.

 astActions := ActionTable{
  "NUMBER": func(token string, stack *Stack) {
   stack.Push(token)
  },
  "__DEFAULT__": func(token string, stack *Stack) {
   t := stack.Pop()
   if stack.Len() > 0 {
    stack.Push([]interface{}{token, stack.Pop(), t})
   } else {
    stack.Push([]interface{}{token, t})
   }

  },
 }

 v = evaluate(os.Args[1:], astActions, new(Stack))
 fmt.Printf("AST tree: %v\n", v)
 fmt.Printf("AST to string: %q\n", astToString(toInterfaces(v)))

}

func astToString(tree []interface{}) string {
 if len(tree) == 1 {
  return toString(tree[0])
 }
 if len(tree) == 2 {
  op, val := toString(tree[0]), toInterfaces(tree[1])
  s := astToString(val)
  return "( " + op + " " + s + " )"
 }
 op, l, r := toString(tree[0]), toInterfaces(tree[1]), toInterfaces(tree[2])
 s1 := astToString(l)
 s2 := astToString(r)
 return "( " + s1 + " " + op + " " + s2 + " )"
}


At the end of the file, not shown here, are some auxiliary functions: a stack data structure along with three typecasting functions.

Get the full code at GitHub.