Wednesday, July 17, 2013

1.5: Applications and Variations of Directory Walking

dirwalk is a generic function (yes, Go lacks generics; we will get to that) that walks a directory tree and calls a user-supplied file function on each regular file and a user-supplied directory function on each directory.

To start, some types need to be declared. ComputeType is our generic type, and FileFunc and DirFunc are the two user-supplied functions. All the code in this post belongs to a single program in package main that imports fmt and os.


type (
 ComputeType interface{}
 FileFunc    func(f *os.File) ComputeType
 DirFunc     func(d *os.File, results []ComputeType) ComputeType
)

Next comes a small utility function, needed only so that dirwalk can be called with a nil FileFunc or a nil DirFunc when one of them is not required. It simply returns an empty value.


func empty() ComputeType {
 var e ComputeType
 return e
}
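It helps to be clear about what "empty" means here. The following is a minimal sketch that repeats the ComputeType and empty declarations from above and checks the value that comes back: the zero value of an interface type is nil, so it holds no int64 (or anything else).

```go
package main

import "fmt"

// ComputeType and empty are repeated from the post so the sketch stands alone.
type ComputeType interface{}

func empty() ComputeType {
	var e ComputeType
	return e
}

func main() {
	v := empty()
	fmt.Println(v == nil) // true: the zero value of an interface type is nil

	// A type assertion on a nil interface fails rather than yielding 0.
	_, ok := v.(int64)
	fmt.Println(ok) // false
}
```

In practice this means that code consuming dirwalk results has to be prepared for nil entries whenever one of the two functions was left out.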

dirwalk opens a file entry and checks if it is a regular file or a directory. If it is a regular file it calls the FileFunc on it and returns the result. If it is a directory it loops through all entries of that directory calling itself recursively on each entry, collecting the results in a slice. It then calls the DirFunc on the result collection and returns the result of the DirFunc.


func dirwalk(name string, f FileFunc, d DirFunc) ComputeType {
 file, err := os.Open(name)
 if err != nil {
  panic(err.Error())
 }
 defer file.Close()

 stat, err := file.Stat()
 if err != nil {
  panic(err.Error())
 }
 if stat.IsDir() {
  // A non-positive argument makes Readdirnames return all entries at once.
  files, err := file.Readdirnames(0)
  if err != nil {
   panic(err.Error())
  }
  name = name + string(os.PathSeparator)
  results := make([]ComputeType, 0)
  // Recurse into every entry and collect the results in directory order.
  for _, subfile := range files {
   results = append(results, dirwalk(name+subfile, f, d))
  }
  // Without a DirFunc there is nothing to compute for this directory.
  if d == nil {
   return empty()
  }
  return d(file, results)
 }
 // Regular file: without a FileFunc there is nothing to compute.
 if f == nil {
  return empty()
 }
 return f(file)
}
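To see the shape of the recursion without touching the file system, here is a sketch that runs the same control flow over a small in-memory tree. The node type and the names walk, sizeOf, sumSizes and sampleTree are hypothetical and exist only for this illustration; they are not part of the program above.

```go
package main

import "fmt"

// node is a hypothetical stand-in for a directory entry.
// A nil children slice marks a regular file.
type node struct {
	name     string
	size     int64
	children []*node
}

// walk mirrors dirwalk's control flow: apply f to a file, or recurse into
// every child of a directory and hand the collected results to d.
func walk(n *node, f func(*node) interface{}, d func(*node, []interface{}) interface{}) interface{} {
	if n.children == nil {
		if f == nil {
			return nil
		}
		return f(n)
	}
	results := make([]interface{}, 0, len(n.children))
	for _, c := range n.children {
		results = append(results, walk(c, f, d))
	}
	if d == nil {
		return nil
	}
	return d(n, results)
}

// sizeOf plays the role of a FileFunc.
func sizeOf(n *node) interface{} { return n.size }

// sumSizes plays the role of a DirFunc.
func sumSizes(_ *node, results []interface{}) interface{} {
	var total int64
	for _, r := range results {
		total += r.(int64)
	}
	return total
}

// sampleTree builds root/{a, sub/{b}} with made-up sizes.
func sampleTree() *node {
	return &node{name: "root", children: []*node{
		{name: "a", size: 5},
		{name: "sub", children: []*node{{name: "b", size: 6}}},
	}}
}

func main() {
	fmt.Println(walk(sampleTree(), sizeOf, sumSizes)) // 11
}
```

The directory function never sees the files themselves, only the values already computed for them, which is what lets the same walker compute sizes, counts or anything else.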

Now the lack of proper generics starts to complicate things. So far dirwalk has only used the interface{} type, but the point of all this is to compute actual values, so a type assertion is needed to get them back out. To keep the code short the following utility function is used. It attempts the type assertion and panics on failure. Another way to write this function is to accept a default value and return that on failure.


func int64value(val interface{}) int64 {
 if v, ok := val.(int64); ok {
  return v
 }
 // %T prints the dynamic type of val, which is what the message promises.
 panic(fmt.Sprintf("Unexpected type: %T", val))
}
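The default-value alternative mentioned above could look roughly like this. It is only a sketch, and the name int64valueOr is made up for the example.

```go
package main

import "fmt"

// int64valueOr is a hypothetical variant of int64value: instead of
// panicking, it returns def when val does not hold an int64.
func int64valueOr(val interface{}, def int64) int64 {
	if v, ok := val.(int64); ok {
		return v
	}
	return def
}

func main() {
	fmt.Println(int64valueOr(int64(42), 0)) // 42
	fmt.Println(int64valueOr("oops", -1))   // -1: a string is not an int64
	fmt.Println(int64valueOr(nil, 0))       // 0: a nil interface holds nothing
	fmt.Println(int64valueOr(42, -1))       // -1: an untyped 42 is stored as int, not int64
}
```

The trade-off is that a default can hide a real mistake, such as a FileFunc that returns an int instead of an int64, whereas the panicking version reports it immediately.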

The FileFunc used here simply returns the size of the file, which is an int64 value.

func filefunc(f *os.File) ComputeType {
 stat, err := f.Stat()
 if err != nil {
  panic(err.Error())
 }
 return ComputeType(stat.Size())
}

The DirFunc goes through the results slice, extracts the int64 value of each entry and returns the total.


func dirfunc(d *os.File, results []ComputeType) ComputeType {
 var total int64
 for _, v := range results {
  total += int64value(v)
 }
 return total
}

The main function simply calls dirwalk with the size calculation functions and then extracts the int64 value.


func main() {
 if len(os.Args) != 2 {
  fmt.Printf("Usage: %s NAME\n", os.Args[0])
  // Exit with a non-zero status because the program was invoked incorrectly.
  os.Exit(1)
 }
 fmt.Println(int64value(dirwalk(os.Args[1], filefunc, dirfunc)))
}
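Go has since gained type parameters (in version 1.18), which would remove the need for interface{} and the type assertion helper. The following is a sketch of how dirwalk might look with them, not the code from this post: the type parameter T, the names fileSize, dirTotal and sampleTreeSize, and the use of filepath.Join are all assumptions made for illustration. To keep the result predictable, the example builds a small temporary tree instead of reading a path from the command line.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// dirwalk is a type-parameterized variant of the walker (requires Go 1.18+).
// T takes the place of ComputeType, so callers need no type assertions.
func dirwalk[T any](name string, f func(*os.File) T, d func(*os.File, []T) T) T {
	var zero T // returned when the relevant callback is nil

	file, err := os.Open(name)
	if err != nil {
		panic(err)
	}
	defer file.Close()

	stat, err := file.Stat()
	if err != nil {
		panic(err)
	}
	if !stat.IsDir() {
		if f == nil {
			return zero
		}
		return f(file)
	}

	names, err := file.Readdirnames(0)
	if err != nil {
		panic(err)
	}
	results := make([]T, 0, len(names))
	for _, sub := range names {
		results = append(results, dirwalk(filepath.Join(name, sub), f, d))
	}
	if d == nil {
		return zero
	}
	return d(file, results)
}

// fileSize returns the size of a regular file in bytes.
func fileSize(f *os.File) int64 {
	stat, err := f.Stat()
	if err != nil {
		panic(err)
	}
	return stat.Size()
}

// dirTotal sums the sizes already computed for a directory's entries.
func dirTotal(_ *os.File, sizes []int64) int64 {
	var total int64
	for _, s := range sizes {
		total += s
	}
	return total
}

// sampleTreeSize builds a throwaway tree with made-up contents
// (a.txt: 5 bytes, sub/b.txt: 6 bytes) and returns its total size.
func sampleTreeSize() int64 {
	root, err := os.MkdirTemp("", "dirwalk")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	if err := os.Mkdir(filepath.Join(root, "sub"), 0o755); err != nil {
		panic(err)
	}
	if err := os.WriteFile(filepath.Join(root, "a.txt"), []byte("hello"), 0o644); err != nil {
		panic(err)
	}
	if err := os.WriteFile(filepath.Join(root, "sub", "b.txt"), []byte("world!"), 0o644); err != nil {
		panic(err)
	}
	return dirwalk(root, fileSize, dirTotal)
}

func main() {
	fmt.Println(sampleTreeSize()) // 11
}
```

With a type parameter the compiler checks that the file function and the directory function agree on the result type, so a mismatch that int64value would only catch at run time becomes a compile error.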

Get the source at GitHub.