(list 'a 'b 'c) ; => (a b c)
(list (list 'george)) ; => ((george))
(cdr '((x1 x2) (y1 y2))) ; => ((y1 y2))
(cadr '((x1 x2) (y1 y2))) ; => (y1 y2)
(pair? (car '(a short list))) ; => false
(memq 'red '((red shoes) (blue socks))) ; => false
(memq 'red '(red shoes blue socks)) ; => (red shoes blue socks)
(define (my-equal? l1 l2)
  (cond ((and (null? l1) (null? l2)) true)
        ((or (null? l1) (null? l2)) false)
        ((eq? (car l1) (car l2))
         (my-equal? (cdr l1) (cdr l2)))
        (else false)))
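A quick check, assuming the definition above is loaded (note that eq? only compares symbols here, so nested lists will not compare as equal):

```scheme
(my-equal? '(this is a list) '(this is a list))   ; => true
(my-equal? '(this is a list) '(this (is a) list)) ; => false
```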
Using quote as a special form, we are evaluating
(car (quote (quote abracadabra)))
The outer quote gives us the two-element list (quote abracadabra), and car takes the first item in that list: the symbol quote.
Revisit. Almost got it but was hampered by my memory of derivatives.
I'd like to do these ones but this example has created some kind of mental block in my head these last couple days where I don't want to work on the book, so I'm going to just move on for now in order to try to gain back my momentum.
(define (union-set set1 set2)
(if (null? set1)
set2
(union-set (cdr set1)
(adjoin-set (car set1)
set2))))
The above procedures will still operate on the new representation with no changes. The performance improvement: adjoin-set no longer has to call element-of-set? before adding an element. Checking for set membership and intersection will become slower as the lists get longer because of the duplicates.
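With duplicates permitted, adjoin-set no longer needs the membership check at all. A sketch of the simplified version:

```scheme
;; Adjoining is now a single cons -- O(1) instead of the
;; O(n) scan required by the no-duplicates representation.
(define (adjoin-set x set)
  (cons x set))
```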
Representing sets as binary trees allows O(log n)
search and insertion.
(define (entry tree) (car tree))
(define (left-branch tree) (cadr tree))
(define (right-branch tree) (caddr tree))
(define (make-tree entry left right)
(list entry left right))
(define (element-of-set? x set)
  (cond ((null? set) false)
        ((= x (entry set)) true)
        ((< x (entry set))
         (element-of-set?
          x
          (left-branch set)))
        ((> x (entry set))
         (element-of-set?
          x
          (right-branch set)))))
(define (adjoin-set x set)
  (cond ((null? set) (make-tree x '() '()))
        ((= x (entry set)) set)
        ((< x (entry set))
         (make-tree
          (entry set)
          (adjoin-set x (left-branch set))
          (right-branch set)))
        ((> x (entry set))
         (make-tree
          (entry set)
          (left-branch set)
          (adjoin-set x (right-branch set))))))
These are only O(log n)
if the tree is balanced.
(define (tree->list-1 tree)
  (if (null? tree)
      '()
      (append
       (tree->list-1
        (left-branch tree))
       (cons (entry tree)
             (tree->list-1
              (right-branch tree))))))
(define (tree->list-2 tree)
  (define (copy-to-list tree result-list)
    (if (null? tree)
        result-list
        (copy-to-list
         (left-branch tree)
         (cons (entry tree)
               (copy-to-list
                (right-branch tree)
                result-list)))))
  (copy-to-list tree '()))
1. Do the two procedures produce the same results for every tree? If not, how do the results differ? What lists do the two procedures produce for the trees in Figure 2.16?
Yes: both perform an in-order traversal, so they produce the same (sorted) list for every tree. For each of the trees in Figure 2.16 the result is (1 3 5 7 9 11).
2. Do the two procedures have the same order of growth in the number of steps required to convert a balanced tree with n elements to a list? If not, which one grows more slowly?
The second grows more slowly. tree->list-1 calls append at every node, which is linear in the size of the left subtree, giving O(n log n) steps for a balanced tree. tree->list-2 only uses cons, so it is O(n).
(define (list->tree elements)
  (car (partial-tree
        elements (length elements))))

(define (partial-tree elts n)
  (if (= n 0)
      (cons '() elts)
      (let ((left-size
             (quotient (- n 1) 2)))
        (let ((left-result
               (partial-tree
                elts left-size)))
          (let ((left-tree
                 (car left-result))
                (non-left-elts
                 (cdr left-result))
                (right-size
                 (- n (+ left-size 1))))
            (let ((this-entry
                   (car non-left-elts))
                  (right-result
                   (partial-tree
                    (cdr non-left-elts)
                    right-size)))
              (let ((right-tree
                     (car right-result))
                    (remaining-elts
                     (cdr right-result)))
                (cons (make-tree this-entry
                                 left-tree
                                 right-tree)
                      remaining-elts))))))))
1. Write a short paragraph explaining as clearly as you can how partial-tree works. Draw the tree produced by list->tree for the list (1 3 5 7 9 11).
Note that quotient will truncate the result when both args are integers. The lets are nested so that each binding can refer to the ones established before it.
The procedure determines the size of the left tree by taking the floor of half of the total size, minus one for the root. The left tree is built recursively from the first left-size elements of the list. The size of the right tree is the total size minus left-size minus 1 for the root. this-entry, the root, is the next element after those used to build the left-tree; the right-tree is then built recursively from the remaining elements. From there the parts are assembled with make-tree.
5
/ \
1 9
/ \ / \
'() 3 7 11
2. What is the order of growth in the number of steps required
by list->tree
to convert a list of n
elements?
It is O(n): partial-tree does a constant amount of work for each element and visits each element exactly once. (I had to look it up.)
Give O(n)
implementations of union-set
and intersection-set
for sets represented as (balanced) binary trees.
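I haven't worked this one in the interpreter, but the standard approach combines the pieces from the earlier exercises: tree->list-2 is O(n), merging two ordered lists is O(n), and list->tree is O(n). A sketch:

```scheme
;; O(n) union for balanced-binary-tree sets: flatten both trees to
;; ordered lists, merge the lists, and rebuild a balanced tree.
(define (union-set set1 set2)
  (define (merge l1 l2)
    (cond ((null? l1) l2)
          ((null? l2) l1)
          ((< (car l1) (car l2))
           (cons (car l1) (merge (cdr l1) l2)))
          ((> (car l1) (car l2))
           (cons (car l2) (merge l1 (cdr l2))))
          (else (cons (car l1) (merge (cdr l1) (cdr l2))))))
  (list->tree (merge (tree->list-2 set1)
                     (tree->list-2 set2))))
```

intersection-set is analogous: keep only the elements that appear in both ordered lists, then rebuild with list->tree.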
Implement the lookup
procedure for the case where the set of
records is structured as a binary tree, ordered by the numerical
values of the keys.
(define (lookup given-key set-of-records)
(cond ((null? set-of-records) false)
((= given-key (entry set-of-records))
(entry set-of-records))
((< given-key (entry set-of-records))
(lookup
given-key
(left-branch set-of-records)))
((> given-key (entry set-of-records))
(lookup
given-key
(right-branch set-of-records)))))
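A sketch of how it might be used, assuming the records are just numeric keys and the tree procedures above are loaded:

```scheme
(define records (list->tree '(1 3 5 7 9 11)))
(lookup 7 records) ; => 7
(lookup 8 records) ; => false
```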
ASCII encodes each character as 7 bits. 2^7 = 128 possible characters.
Generally, to distinguish n symbols requires log_2(n)
bits per symbol.
These codes are fixed-length: each symbol is represented with the same number of bits. We can also create variable-length codes (e.g. Morse). If symbols appear with different frequencies, a variable-length code can be more efficient.
A difficulty of variable-length codes is knowing when you've reached the end of a symbol. Some codes (Morse) use a separator character (space). Can also design the code such that no complete code for any symbol is the prefix of the code for another symbol. Called a prefix code. Can achieve significant savings this way.
A particular way is Huffman encoding. A Huffman code can be represented as a binary tree whose leaves are the encoded symbols. Each non-leaf node has a set containing all the symbols below it. Each symbol at a leaf is also assigned a weight -- its relative frequency. Each non-leaf node contains a weight that is the sum of all weights of leaves below it. The weights are used to construct the tree, not in encoding or decoding.
Given a Huffman tree, the encoding of any symbol can be found by starting from the root and moving down until the symbol's leaf is reached. Each time the left branch is chosen, append a 0; each time the right branch is chosen, append a 1.
To decode, begin at the root and use the message sequence to move down the branches left and right. Once a leaf is reached, record it and start again.
Huffman gave an algorithm for constructing the best tree given an alphabet and relative frequencies. The algorithm is simple: symbols with the lowest frequency should appear further away from the root. Find the two leaves with the lowest weights and merge them to produce a node with them as its left and right branch and their combined weight. Remove the two leaves from the original set and replace by the new node. Repeat until there is one node left, the root. The algorithm doesn't always specify a unique tree: weights may be the same and the left-right choice is arbitrary.
Can represent leaves as follows:
(define (make-leaf symbol weight)
(list 'leaf symbol weight))
(define (leaf? object)
  (eq? (car object) 'leaf))
(define (symbol-leaf x) (cadr x))
(define (weight-leaf x) (caddr x))
Merge two nodes:
(define (make-code-tree left right)
(list left
right
(append (symbols left)
(symbols right))
(+ (weight left) (weight right))))
Will also need the following selectors:
(define (left-branch tree) (car tree))
(define (right-branch tree) (cadr tree))
(define (symbols tree)
(if (leaf? tree)
(list (symbol-leaf tree))
(caddr tree)))
(define (weight tree)
(if (leaf? tree)
(weight-leaf tree)
(cadddr tree)))
The procedures symbols and weight are examples of generic procedures: they do slightly different things depending on the kind of data they are called with.
(define (decode bits tree)
(define (decode-1 bits current-branch)
(if (null? bits)
'()
(let ((next-branch
(choose-branch
(car bits)
current-branch)))
(if (leaf? next-branch)
(cons
(symbol-leaf next-branch)
(decode-1 (cdr bits) tree))
(decode-1 (cdr bits)
next-branch)))))
(decode-1 bits tree))
(define (choose-branch bit branch)
(cond ((= bit 0) (left-branch branch))
((= bit 1) (right-branch branch))
(else (error "bad bit: CHOOSE-BRANCH" bit))))
Above we represent the sets of symbols in non-leaf nodes as simple lists. We also need sets of leaves and trees, and to repeatedly find the smallest element, so it makes sense to use an ordered representation.
(define (adjoin-set x set)
(cond ((null? set) (list x))
((< (weight x) (weight (car set)))
(cons x set))
(else
(cons (car set)
(adjoin-set x (cdr set))))))
(define (make-leaf-set pairs)
(if (null? pairs)
'()
(let ((pair (car pairs)))
(adjoin-set
(make-leaf (car pair) ; symbol
(cadr pair)) ; frequency
(make-leaf-set (cdr pairs))))))
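Tracing make-leaf-set by hand on a small frequency list (untested in the interpreter, like the rest of this section):

```scheme
(make-leaf-set '((A 4) (B 2) (D 1) (C 1)))
; => ((leaf C 1) (leaf D 1) (leaf B 2) (leaf A 4))
```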
Define an encoding tree and a sample message:
(define sample-tree
(make-code-tree
(make-leaf 'A 4)
(make-code-tree
(make-leaf 'B 2)
(make-code-tree
(make-leaf 'D 1)
(make-leaf 'C 1)))))
(define sample-message
'(0 1 1 0 0 1 0 1 0 1 1 1 0))
Use the decode
procedure to decode the message, and give
the result.
ADABBCA
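That is, with the definitions above loaded:

```scheme
(decode sample-message sample-tree)
; => (A D A B B C A)
```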
The encode
procedure takes a message and a tree and encodes the message.
(define (encode message tree)
(if (null? message)
'()
(append
(encode-symbol (car message)
tree)
(encode (cdr message) tree))))
Write encode-symbol
, a procedure to return the list of bits that encodes
a symbol according to a given tree. It should signal an error if a symbol
is not in the tree at all. Test it with the message in 2.67.
(define (encode-symbol symbol tree)
  (if (memq symbol (symbols tree))
      (let ((left (left-branch tree))
            (right (right-branch tree)))
        (if (memq symbol (symbols left))
            (if (leaf? left)
                (list 0)
                (cons 0 (encode-symbol symbol left)))
            (if (leaf? right)
                (list 1)
                (cons 1 (encode-symbol symbol right)))))
      (error "bad tree, does not contain symbol" symbol)))
Untested in the interpreter (getting fed up with repl.it) but when I trace through it with a pencil it seems to work.
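Checking it against the message from 2.67 (on paper, same caveat as above):

```scheme
(encode '(A D A B B C A) sample-tree)
; => (0 1 1 0 0 1 0 1 0 1 1 1 0)
```

This matches sample-message, as it should.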
The following takes a list of symbol-frequency pairs (no symbol appears in more than one pair) and generates a Huffman encoding tree according to the algorithm.
(define (generate-huffman-tree pairs)
(successive-merge (make-leaf-set pairs)))
make-leaf-set is given above and transforms the list of pairs into an ordered set of leaves. successive-merge uses make-code-tree to successively merge the smallest-weight elements of the set until one is left, which is the desired tree. Write successive-merge.
(define (successive-merge trees)
  (if (null? (cdr trees))
      (car trees) ; final element, we are done
      (let ((smallest (car trees))
            (next-smallest (cadr trees))
            (the-rest (cddr trees)))
        (let ((merged (make-code-tree smallest next-smallest)))
          (successive-merge (adjoin-set merged the-rest))))))
I'm currently no longer testing these because repl.it is having problems, but I'm fairly certain the above is correct.
The following eight-symbol alphabet with associated relative frequencies was designed to efficiently encode the lyrics of 1950s rock songs.
A 2 GET 2 SHA 3 WAH 1
BOOM 1 JOB 2 NA 16 YIP 9
Use generate-huffman-tree
to generate a corresponding Huffman
tree, and use encode
to encode the following message:
Get a job
Sha na na na na na na na na
Get a job
Sha na na na na na na na na
Wah yip yip yip yip yip yip yip yip yip
Sha boom
How many bits are required for the encoding? What is the smallest number of bits that would be needed to encode this song if we used a fixed-length code for the eight-symbol alphabet?
Tracing this out, not using the interpreter. To build the tree, the call looks like this,
(generate-huffman-tree
(list (list 'A 2)
(list 'BOOM 1)
(list 'GET 2)
(list 'JOB 2)
(list 'SHA 3)
(list 'NA 16)
(list 'WAH 1)
(list 'YIP 9)))
And inside generate-huffman-tree
, the call to make-leaf-set
yields the list,
BOOM 1
WAH 1
JOB 2
GET 2
A 2
SHA 3
YIP 9
NA 16
And the generated tree,
NAYIPABOOMWAHSHAJOBGET 36
/ \
NA 16 YIPABOOMWAHSHAJOBGET 20
/ \
YIP 9 ABOOMWAHSHAJOBGET 11
/ \
ABOOMWAH 4 SHAJOBGET 7
/ \ / \
A2 BOOMWAH 2 SHA 3 JOBGET 4
/ \ / \
BOOM 1 WAH 1 JOB 2 GET 2
And the encoded message:
11111110
01111011
10000000
00111111
10011110
11100000
00001101
11010101
01010101
01011101
1010
So 84 bits are required for the message. Fixed-length codes require log_2(n)
bits per symbol and there are 8 symbols, so 3 bits per symbol would be required.
There are 36 symbols in the message so 108 total bits would be required.
Suppose we have a Huffman tree for an alphabet of n symbols, and that the relative frequencies of the symbols are 1, 2, 4, ... , 2^n−1
. Sketch the tree for n = 5
; for n = 10
. In such a tree (for general n
) how many bits are required to encode the most frequent symbol? The least frequent symbol?
For n=5
,
s1s2s3s4s5 31
/ \
s5 16 s1s2s3s4 15
/ \
s4 8 s1s2s3 7
/ \
s3 4 s1s2 3
/ \
s2 2 s1 1
For n=10
, the tree would follow the same pattern so I won't draw it. In general, 1 bit is required to encode the most common symbol, and n-1
bits for the least common.
Consider the encoding procedure that you designed in Exercise 2.68. What is the order of growth in the number of steps needed to encode a symbol? Be sure to include the number of steps needed to search the symbol list at each node encountered. To answer this question in general is difficult. Consider the special case where the relative frequencies of the n symbols are as described in Exercise 2.71, and give the order of growth (as a function of n) of the number of steps needed to encode the most frequent and least frequent symbols in the alphabet.