Notes and Solutions from Structure and Interpretation of Computer Programs

2.3 Symbolic Data

Exercise 2.53

(list 'a 'b 'c) ; => (a b c)
(list (list 'george)) ; => ((george))
(cdr '((x1 x2) (y1 y2))) ; => ((y1 y2))
(cadr '((x1 x2) (y1 y2))) ; => (y1 y2)
(pair? (car '(a short list))) ; => false
(memq 'red '((red shoes) (blue socks))) ; => false
(memq 'red '(red shoes blue socks)) ; => (red shoes blue socks)

Exercise 2.54

(define (my-equal? l1 l2)
  (cond ((and (null? l1) (null? l2)) true)
        ((or (null? l1) (null? l2)) false)
        ((and (pair? (car l1)) (pair? (car l2)))
         (and (my-equal? (car l1) (car l2))
              (my-equal? (cdr l1) (cdr l2))))
        ((eq? (car l1) (car l2))
         (my-equal? (cdr l1) (cdr l2)))
        (else false)))

Exercise 2.55

Using the special form quote, the expression we are evaluating is

(car (quote (quote abracadabra)))

The outer quote produces the two-element list (quote abracadabra), whose elements are the symbols quote and abracadabra. car then returns the first element of that list, so the interpreter prints quote.
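The same thing, traced at the REPL (the reader rewrites each leading ' into a quote form before evaluation):

```scheme
''abracadabra        ; reader sees (quote (quote abracadabra));
                     ; evaluates to the list (quote abracadabra)
(car ''abracadabra)  ; => quote
```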

Exercise 2.56

Revisit. Almost got it but was hampered by my memory of derivatives.

Exercise 2.57

Exercise 2.58

I'd like to do these ones but this example has created some kind of mental block in my head these last couple days where I don't want to work on the book, so I'm going to just move on for now in order to try to gain back my momentum.

Exercise 2.59

(define (union-set set1 set2)
  (if (null? set1) 
    set2 
    (union-set (cdr set1) 
               (adjoin-set (car set1) 
                            set2))))

Exercise 2.60

The above procedures will still operate on the new representation with no changes. The performance win is that adjoin-set no longer needs the element-of-set? check, dropping from O(n) to O(1), and union-set can simply append, dropping from O(n^2) to O(n). In exchange, element-of-set? and intersection-set become slower in practice because the duplicates make the lists longer.
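A sketch of the two procedures that get cheaper once duplicates are permitted:

```scheme
;; With duplicates allowed there is no membership check to do.
(define (adjoin-set x set)
  (cons x set))                    ; O(1) instead of O(n)

(define (union-set set1 set2)
  (append set1 set2))              ; O(n) instead of O(n^2)
```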

Exercise 2.61
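2.61 asks for adjoin-set on the ordered-list representation. A sketch (untested here): because the list is ordered, we can stop as soon as we find x's position, so on average only about half the list is examined.

```scheme
;; adjoin-set for sets as ordered lists of numbers.
(define (adjoin-set x set)
  (cond ((null? set) (list x))
        ((= x (car set)) set)              ; already present
        ((< x (car set)) (cons x set))     ; found its slot, stop early
        (else (cons (car set)
                    (adjoin-set x (cdr set))))))

(adjoin-set 3 '(1 2 4 5)) ; => (1 2 3 4 5)
```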

Exercise 2.62
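2.62 asks for an O(n) union-set on the ordered-list representation. A sketch: walk both lists in step, always emitting the smaller head. Each step discards at least one element from one of the lists, so the total work is linear in the combined length.

```scheme
;; union-set for sets as ordered lists of numbers.
(define (union-set set1 set2)
  (cond ((null? set1) set2)
        ((null? set2) set1)
        ((= (car set1) (car set2))
         (cons (car set1) (union-set (cdr set1) (cdr set2))))
        ((< (car set1) (car set2))
         (cons (car set1) (union-set (cdr set1) set2)))
        (else
         (cons (car set2) (union-set set1 (cdr set2))))))

(union-set '(1 3 5) '(2 3 6)) ; => (1 2 3 5 6)
```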

Sets as binary trees

Representing sets as binary trees allows O(log n) search and insertion.

(define (entry tree) (car tree))
(define (left-branch tree) (cadr tree))
(define (right-branch tree) (caddr tree))
(define (make-tree entry left right)
  (list entry left right))

(define (element-of-set? x set)
  (cond ((null? set) false)
        ((= x (entry set)) true)
        ((< x (entry set))
         (element-of-set? x (left-branch set)))
        ((> x (entry set))
         (element-of-set? x (right-branch set)))))

(define (adjoin-set x set)
  (cond ((null? set) (make-tree x '() '()))
        ((= x (entry set)) set)
        ((< x (entry set))
         (make-tree (entry set)
                    (adjoin-set x (left-branch set))
                    (right-branch set)))
        ((> x (entry set))
         (make-tree (entry set)
                    (left-branch set)
                    (adjoin-set x (right-branch set))))))

These are only O(log n) if the tree is balanced.
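For instance, inserting elements in ascending order with the adjoin-set above produces a degenerate tree in which every left branch is empty, so search falls back to O(n):

```scheme
(adjoin-set 3 (adjoin-set 2 (adjoin-set 1 '())))
; => (1 () (2 () (3 () ())))   -- effectively a linked list
```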

Exercise 2.63

(define (tree->list-1 tree)
  (if (null? tree)
      '()
      (append 
        (tree->list-1 
          (left-branch tree))
        (cons (entry tree)
              (tree->list-1
                (right-branch tree))))))
(define (tree->list-2 tree)
  (define (copy-to-list tree result-list)
    (if (null? tree)
        result-list
        (copy-to-list 
          (left-branch tree)
          (cons (entry tree)
                (copy-to-list
                  (right-branch tree)
                  result-list)))))
  (copy-to-list tree '()))

1. Do the two procedures produce the same results for every tree? If not, how do the results differ? What lists do the two procedures produce for the trees in Figure 2.16?

Yes: both perform an in-order traversal, so they produce the same sorted list for every tree. All three trees in Figure 2.16 yield (1 3 5 7 9 11).

2. Do the two procedures have the same order of growth in the number of steps required to convert a balanced tree with n elements to a list? If not, which one grows more slowly?

The second grows more slowly. tree->list-1 calls append at every node, and append is linear in its first argument, so converting a balanced tree takes O(n log n) steps. tree->list-2 does one cons per entry, so it is O(n).

Exercise 2.64

(define (list->tree elements)
  (car (partial-tree 
        elements (length elements))))

(define (partial-tree elts n)
  (if (= n 0)
      (cons '() elts)
      (let ((left-size
              (quotient (- n 1) 2)))
        (let ((left-result
                (partial-tree 
                  elts left-size)))
          (let ((left-tree 
                  (car left-result))
                (non-left-elts 
                  (cdr left-result))
                (right-size
                  (- n (+ left-size 1))))
            (let ((this-entry
                    (car non-left-elts))
                  (right-result 
                    (partial-tree
                      (cdr non-left-elts)
                      right-size)))
              (let ((right-tree
                      (car right-result))
                    (remaining-elts
                      (cdr right-result)))
                (cons (make-tree this-entry
                                 left-tree
                                 right-tree)
                      remaining-elts))))))))

1. Write a short paragraph explaining as clearly as you can how partial-tree works. Draw the tree produced by list->tree for the list (1 3 5 7 9 11).

Note that quotient truncates the result when both args are integers. The lets are nested so that each group of bindings can refer to the ones established before it.

partial-tree takes the floor of (n - 1) / 2 as the size of the left subtree, reserving one element for the root. The left subtree is built recursively from the first left-size elements of the list; the right subtree gets the remaining n - left-size - 1 elements. this-entry, the root, is the first element not consumed by the left subtree, and the right subtree is then built recursively from the elements after it. make-tree assembles the three parts, and each call also returns the elements it did not consume so its caller knows where to continue.

       5
     /   \
    1     9
     \   / \
      3 7   11
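The same tree as the list structure list->tree actually returns:

```scheme
(list->tree '(1 3 5 7 9 11))
; => (5 (1 () (3 () ())) (9 (7 () ()) (11 () ())))
```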

2. What is the order of growth in the number of steps required by list->tree to convert a list of n elements?

O(n): aside from its two recursive calls, partial-tree does a constant amount of work per call, and each element of the list becomes the entry of exactly one node, so there are O(n) calls in total. (I had to look this up.)

Exercise 2.65

Give O(n) implementations of union-set and intersection-set for sets represented as (balanced) binary trees.
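A sketch of union-set (untested here), assuming the tree->list-2 and list->tree procedures from this section: flatten both trees into ordered lists in O(n), merge the ordered lists in O(n), and rebuild a balanced tree in O(n). intersection-set is analogous, with a merge step that keeps only elements appearing in both lists.

```scheme
(define (union-set tree1 tree2)
  (list->tree
    (merge-ordered (tree->list-2 tree1)
                   (tree->list-2 tree2))))

;; Linear merge of two ordered lists, dropping duplicates.
(define (merge-ordered l1 l2)
  (cond ((null? l1) l2)
        ((null? l2) l1)
        ((= (car l1) (car l2))
         (cons (car l1) (merge-ordered (cdr l1) (cdr l2))))
        ((< (car l1) (car l2))
         (cons (car l1) (merge-ordered (cdr l1) l2)))
        (else
         (cons (car l2) (merge-ordered l1 (cdr l2))))))
```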

Exercise 2.66

Implement the lookup procedure for the case where the set of records is structured as a binary tree, ordered by the numerical values of the keys.

(define (lookup given-key set-of-records)
  (cond ((null? set-of-records) false)
        ((= given-key (entry set-of-records)) 
          (entry set-of-records))
        ((< given-key (entry set-of-records))
         (lookup 
          given-key
          (left-branch set-of-records)))
        ((> given-key (entry set-of-records))
         (lookup 
          given-key 
          (right-branch set-of-records)))))

2.3.4 Example: Huffman Encoding Trees

ASCII encodes each character as 7 bits. 2^7 = 128 possible characters. Generally, to distinguish n symbols requires log_2(n) bits per symbol. These codes are fixed-length: each symbol is represented with the same number of bits. Can also create variable-length codes (e.g. Morse). If symbols appear with different frequency, a variable-length code can be more efficient.

A difficulty of variable-length codes is knowing when you've reached the end of a symbol. Some codes (Morse) use a separator character (space). Can also design the code such that no complete code for any symbol is the prefix of the code for another symbol. Called a prefix code. Can achieve significant savings this way.

A particular way is Huffman encoding. A Huffman code can be represented as a binary tree whose leaves are the encoded symbols. Each non-leaf node has a set containing all the symbols below it. Each symbol at a leaf is also assigned a weight -- its relative frequency. Each non-leaf node contains a weight that is the sum of all weights of leaves below it. The weights are used to construct the tree, not in encoding or decoding.

Given a Huffman tree, the encoding of any symbol is found by starting from the root and moving down until the symbol's leaf is reached, adding a 0 each time the left branch is taken and a 1 each time the right branch is taken.

To decode, begin at the root and use the message sequence to move down the branches left and right. Once a leaf is reached, record it and start again.

Generating Huffman Trees

Huffman gave an algorithm for constructing the best tree given an alphabet and relative frequencies. The algorithm is simple: symbols with the lowest frequency should appear farthest from the root. Find the two leaves with the lowest weights and merge them into a node having them as its left and right branches and their combined weight. Remove the two leaves from the original set and replace them with the new node. Repeat until one node remains: the root. The algorithm doesn't always specify a unique tree, since weights may be equal and the left-right choice is arbitrary.

Representing Huffman Trees

Can represent leaves as follows:

(define (make-leaf symbol weight)
  (list 'leaf symbol weight))

(define (leaf? object)
  (eq? (car object) 'leaf))

(define (symbol-leaf x) (cadr x))
(define (weight-leaf x) (caddr x))

Merge two nodes:

(define (make-code-tree left right)
  (list left
        right
        (append (symbols left)
                (symbols right))
        (+ (weight left) (weight right))))

Will also need the following selectors:

(define (left-branch tree) (car tree))
(define (right-branch tree) (cadr tree))

(define (symbols tree)
  (if (leaf? tree)
      (list (symbol-leaf tree))
      (caddr tree)))

(define (weight tree)
  (if (leaf? tree)
      (weight-leaf tree)
      (cadddr tree)))

The procedures symbols and weight are examples of generic procedures: they do slightly different things depending on the kind of data they are called with.
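A small check with the constructors above, showing both procedures dispatching on leaves and on merged nodes:

```scheme
(define t (make-code-tree (make-leaf 'A 4) (make-leaf 'B 2)))

(symbols t)               ; => (A B)   caddr of the merged node
(weight t)                ; => 6       cadddr of the merged node
(symbols (left-branch t)) ; => (A)     leaf case: wraps the symbol in a list
(weight (right-branch t)) ; => 2       leaf case: weight-leaf
```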

The decoding procedure

(define (decode bits tree)
  (define (decode-1 bits current-branch)
    (if (null? bits)
        '()
        (let ((next-branch
                (choose-branch
                  (car bits)
                 current-branch)))
          (if (leaf? next-branch)
              (cons
                (symbol-leaf next-branch)
                (decode-1 (cdr bits) tree))
              (decode-1 (cdr bits)
                        next-branch)))))
  (decode-1 bits tree))

(define (choose-branch bit branch)
  (cond ((= bit 0) (left-branch branch))
        ((= bit 1) (right-branch branch))
        (else (error "bad bit: CHOOSE-BRANCH" bit))))

Sets of weighted elements

Above we represented the sets of symbols in non-leaf nodes as simple lists. We will also need sets of leaves and trees, from which we successively extract the smallest element, so it makes sense to use an ordered representation.

(define (adjoin-set x set)
  (cond ((null? set) (list x))
        ((< (weight x) (weight (car set)))
         (cons x set))
        (else
          (cons (car set)
                (adjoin-set x (cdr set))))))
(define (make-leaf-set pairs)
  (if (null? pairs)
      '()
      (let ((pair (car pairs)))
        (adjoin-set
          (make-leaf (car pair)     ; symbol
                     (cadr pair))   ; frequency
          (make-leaf-set (cdr pairs))))))
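For example, make-leaf-set turns symbol-frequency pairs into leaves ordered by increasing weight:

```scheme
(make-leaf-set '((A 4) (B 2) (C 1) (D 1)))
; => ((leaf D 1) (leaf C 1) (leaf B 2) (leaf A 4))
```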

Exercise 2.67

Define an encoding tree and a sample message:

(define sample-tree
  (make-code-tree
    (make-leaf 'A 4)
    (make-code-tree
      (make-leaf 'B 2)
      (make-code-tree
        (make-leaf 'D 1)
        (make-leaf 'C 1)))))

(define sample-message 
  '(0 1 1 0 0 1 0 1 0 1 1 1 0))

Use the decode procedure to decode the message, and give the result.

Solution

ADABBCA
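As a check in code (the tree gives codes A = 0, B = 10, D = 110, C = 111, so the bits parse as 0 110 0 10 10 111 0):

```scheme
(decode sample-message sample-tree)
; => (A D A B B C A)
```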

Exercise 2.68

The encode procedure takes a message and a tree and encodes the message.

(define (encode message tree)
  (if (null? message)
     '()
     (append 
      (encode-symbol (car message)
                     tree)
      (encode (cdr message) tree))))

Write encode-symbol, a procedure to return the list of bits that encodes a symbol according to a given tree. It should signal an error if a symbol is not in the tree at all. Test it with the message in 2.67.

Solution

(define (encode-symbol symbol tree)
  (if (memq symbol (symbols tree))
      (let ((left (left-branch tree))
            (right (right-branch tree)))
        (if (memq symbol (symbols left))
            (if (leaf? left)
                (list 0)
                (cons 0 (encode-symbol symbol left)))
            (if (leaf? right)
                (list 1)
                (cons 1 (encode-symbol symbol right)))))
      (error "bad tree, does not contain symbol" symbol)))

Untested in the interpreter (getting fed up with repl.it) but when I trace through it with a pencil it seems to work.
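If encode-symbol is correct, round-tripping the decoded message from 2.67 should reproduce the original bits:

```scheme
(encode '(A D A B B C A) sample-tree)
; expected: (0 1 1 0 0 1 0 1 0 1 1 1 0), i.e. sample-message
```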

Exercise 2.69

The following takes a list of symbol-frequency pairs (no symbol appears in more than one pair) and generates a Huffman encoding tree according to the algorithm.

(define (generate-huffman-tree pairs)
  (successive-merge (make-leaf-set pairs)))

make-leaf-set is given above and transforms the list of pairs into an ordered set of leaves. successive-merge uses make-code-tree to successively merge the smallest-weight elements of the set until only one is left, which is the desired tree. Write successive-merge.

Solution

(define (successive-merge trees)
  (if (null? (cdr trees))
      (car trees)   ; final element, we are done
      (let ((merged (make-code-tree (car trees)
                                    (cadr trees))))
        (successive-merge
          (adjoin-set merged (cddr trees))))))

I'm currently no longer testing these because repl.it is having problems, but I'm fairly certain the above is correct.

Exercise 2.70

The following eight-symbol alphabet with associated relative frequencies was designed to efficiently encode the lyrics of 1950s rock songs.

A    2    GET 2    SHA 3    WAH 1
BOOM 1    JOB 2    NA 16    YIP 9

Use generate-huffman-tree to generate a corresponding Huffman tree, and use encode to encode the following message:

Get a job
Sha na na na na na na na na
Get a job
Sha na na na na na na na na
Wah yip yip yip yip yip yip yip yip yip
Sha boom

How many bits are required for the encoding? What is the smallest number of bits that would be needed to encode this song if we used a fixed-length code for the eight-symbol alphabet?

Solution

Tracing this out, not using the interpreter. To build the tree, the call looks like this,

(generate-huffman-tree 
  (list (list 'A    2)
        (list 'BOOM 1)
        (list 'GET  2)
        (list 'JOB  2)
        (list 'SHA  3)
        (list 'NA  16)
        (list 'WAH  1)
        (list 'YIP  9)))

And inside generate-huffman-tree, the call to make-leaf-set yields the list,

BOOM  1
WAH   1
JOB   2
GET   2
A     2
SHA   3
YIP   9
NA   16

And the generated tree,

          NAYIPABOOMWAHSHAJOBGET 36
           /                   \
       NA 16        YIPABOOMWAHSHAJOBGET 20
                     /               \
                 YIP 9       ABOOMWAHSHAJOBGET 11
                              /             \
                      ABOOMWAH 4        SHAJOBGET 7
                       /      \          /       \
                    A 2   BOOMWAH 2   SHA 3   JOBGET 4
                           /     \             /    \
                       BOOM 1   WAH 1      JOB 2  GET 2

And the encoded message:

11111110
01111011
10000000
00111111
00111101
11000000
00011011
10101010
10101010
10111011
010

So 83 bits are required for the message. Fixed length codes require log_2(n) bits per symbol and there are 8 symbols so 3 bits per symbol would be required. There are 36 symbols in the message so 108 total bits would be required.

Exercise 2.71

Suppose we have a Huffman tree for an alphabet of n symbols, and that the relative frequencies of the symbols are 1, 2, 4, ... , 2^n−1. Sketch the tree for n = 5; for n = 10. In such a tree (for general n) how many bits are required to encode the most frequent symbol? The least frequent symbol?

Solution

For n=5,

s1s2s3s4s5 31
   /       \
s5 16    s1s2s3s4 15
         /     \
      s4 8    s1s2s3 7
              /     \
           s3 4    s1s2 3
                   /    \ 
                s2 2    s1 1

For n=10, the tree would follow the same pattern so I won't draw it. In general, 1 bit is required to encode the most common symbol, and n-1 bits for the least common.

Exercise 2.72

Consider the encoding procedure that you designed in Exercise 2.68. What is the order of growth in the number of steps needed to encode a symbol? Be sure to include the number of steps needed to search the symbol list at each node encountered. To answer this question in general is difficult. Consider the special case where the relative frequencies of the n symbols are as described in Exercise 2.71, and give the order of growth (as a function of n) of the number of steps needed to encode the most frequent and least frequent symbols in the alphabet.
