Data abstraction, a methodology for structuring systems such that a program can be specified independently of the way data objects are implemented.
Do so by errecting an abstraction barrier.
So far have used selectors and constructors to do so (e.g. for rational numbers).
It might not always make sense to speak of "underlying representation". There might be more than one useful representation. e.g. a complex number can be represented in rectangular form and polar form.
Also programing systems are often designed by many people over time with requirements that change over time. Not possible for everyone to agree on data representations in advance.
So we also need abstraction barriers to isolate design choices from each and permit different choices to coexist. Also need conventions that permit programmers to incorporate modules into larger systems additively.
Generic procedures -- procedures that can operate on data that may be represented in more than one way. Main technique for building generic procedures will be to work in terms of data objects that have type tags. Also discuss data-directed programming.
Like rational numbers, complex numbers can be represented as ordered pairs. Set can be thought of as a 2D space with two axes, the "real" and "imaginary" axis.
From this POV, the complex number z = x + iy (where i^2=-1) can be thought of as the point in the plane whose real coordinate is x and whose imaginary coordinate is y.
In this representation, addition reduces to addition of coordinates.
Mulitplication is more natural to think of in polar form, as a magnitude and an angle (r and A). The product of two complex numbers is the vector obtained by stretching one by the length of the other and then rotating it through the angle of the other.
To design a system that can work with both representations, assume operations on complex numbers are implemented in terms of four selectors: real-part
, imag-part
, magnitude
, and angle
. Also two constructors: make-from-real-imag
and make-from-mag-ang
.
Then we can perform operations with the most convenient representation:
(define (add-complex z1 z2)
(make-from-real-imag (+ (real-part z1) (real-part z2))
(+ (imag-part z1) (imag-part z2))))
(define (sub-complex z1 z2)
(make-from-real-imag (- (real-part z1) (real-part z2))
(- (imag-part z1) (imag-part z2))))
(define (mul-complex z1 z2)
(make-from-mag-ang (* (magnitude z1) (magnitude z2))
(+ (angle z1) (angle z2))))
(define (div-complex z1 z2)
(make-from-mag-ang (/ (magnitude z1) (magnitude z2))
(- (angle z1) (angle z2))))
To complete the package, need to implement the constructors and selectors in terms of primitive numbers and lists. Should we use rectangular form or polar form?
If using rectangular form, real and imaginary operations are easy but polar coordinates require trigonometric functions:
(define (real-part z) (car z))
(define (imag-part z) (cdr z))
(define (magnitude z)
(sqrt (+ (square (real-part z))
(square (image-part z)))))
(define (angle z)
(atan (image-part z) (real-part z)))
(define (make-from-real-imag x y) (cons x y))
(define (make-from-mag-ang r a)
(cons (* r (cos a)) (* r (sin a))))
If using polar form, magnitude and angle are straightforward, but rectangular coordinates require trignomonetric functions:
(define (real-part z) (* (magnitude z) (cos (angle z))))
(define (imag-part z) (* (magnitude z) (sin (angle z))))
(define (magnitude z) (car z))
(define (angle z) (cdr z))
(define (make-from-real-imag x y)
(cons (sqrt (+ (square x) (square y)))
(atan y x)))
(define (make-from-mag-ang r a) (cons r a))
Regardless, data abstration ensures that the same implementation of add-complex
, sub-complex
, mul-complex
, and div-complex
work with either.
Data abstraction is an application of "principle of least commitment". But it can be carried even further. We can continue to use both representations above instead of choosing one. But we'll need a way to distinguish between them. A straightforward way is to use a type tag.
Assume we have procedures type-tag
contents
, and attach-tag
:
(define (attach-tag type-tag contents)
(cons type-tag contents))
(define (type-tag datum)
(if (pair? datum)
(car datum)
(error "Bad tagged datum: TYPE-TAG" datum)))
(define (contents datum)
(if (pair? datum)
(cdr datum)
(error "Bad tagged datum: CONTENTS" datum)))
Using these, we can define predicates rectangular?
and polar?
:
(define (rectangular? z)
(eq? (type-tag z) 'rectangular))
(define (polar? z) (eq? (type-tag z) 'polar))
Then we must modify the two representations to tag the numbers they create, and rename them to distinguish between them:
(define (real-part-rectangular z) (car z))
(define (imag-part-rectangular z) (cdr z))
(define (magnitude-rectangular z)
(sqrt (+ (square (real-part-rectangular z))
(square (imag-part-rectangular z)))))
(define (angle-rectangular z)
(atan (imag-part-rectangular z)
(real-part-rectangular z)))
(define (make-from-real-imag-rectangular x y)
(attach-tag 'rectangular (cons x y)))
(define (make-from-mag-ang-rectangular r a)
(attach-tag 'rectangular
(cons (* r (cos a)) (* r (sin a)))))
(define (real-part-polar z)
(* (magnitude-polar z) (cos (angle-polar z))))
(define (imag-part-polar z)
(* (magnitude-polar z) (sin (angle-polar z))))
(define (magnitude-polar z) (car z))
(define (angle-polar z) (cdr z))
(define (make-from-real-imag-polar x y)
(attach-tag 'polar
(cons (sqrt (+ (square x) (square y)))
(atan y x))))
(define (make-from-mag-ang-polar r a)
(attach-tag 'polar (cons r a)))
Then we implement generic selectors by checking the tag of the argument and calling the appropriate procedure.
(define (real-part z)
(cond ((rectangular? z)
(real-part-rectangular (contents z)))
((polar? z)
(real-part-polar (contents z)))
(else (error "Unknown type: REAL-PART" z))))
(define (magnitude z)
(cond ((rectangular? z)
(magnitude-rectangular (contents z)))
((polar? z)
(magnitude-polar (contents z)))
(else (error "Unknown type: MAGNITUDE" z))))
(define (angle z)
(cond ((rectangular? z)
(angle-rectangular (contents z)))
((polar? z)
(angle-polar (contents z)))
(else (error "Unknown type: ANGLE" z))))
And we can still use the same procedures add-complex
, sub-complex
, mul-complex
, and div-complex
. Finally we must choose which to use when constructing complex numbers. A reasonable choice is to use whichever we have the arguments for.
(define (make-from-real-imag x y)
(make-from-real-imag-rectangular x y))
(define (make-from-mag-ang r a)
(make-from-mag-ang-polar r a))
The system is now decomposed into three independent parts: the complex number arithmetic operations, the polar implementation, and the rectangular implementation. Note that in this general mechanism, some discipline is required for implementers to add and strip off type tags.
The general strategy of checking the type of a datum and calling an appropriate procedure is called dispatching on a type. A powerful strategy but with two significant weaknesses.
First, the generic interface procedures (real-part
, imag-part
, magnitude
, angle
) must know about all the different representations. e.g. if we add a new representation we need to add a clause to each to check for the new type.
Second, must guarantee that no two procedures in the system has the same name.
Underlying issue of both weaknesses is that the technique is not additive. We need a means for modularizing the system design even further. This is provided by data-directed programming.
Consider that when we deal with a set of generic operations that are common to a set of types we are dealing with a 2D table that contains possible operations on one axis and possible types on the other. The entries in the table are the procedures that implement each operation for each type.
Data-directed programming is designing programs to work with such a table directly. Assume we have two procedures, put
, and get
, for manipulating the operation-and-type table:
(put <op> <type> <item>)
installs the <item>
in the table, indexed by the <op>
and <type>
.
(get <op> <type>)
looks up the <op>
, <type>
entry in the table and returns the item found there. If no item is found, get
returns false.
For now, assume these are included in our language. In Chapter 3 (3.3.3) we will see how to implement these and other operations for manipulating tables.
Here is how data-directed programming can be used in the complex number system. A collection of procedures, called a package is defined and these are interfaced to the rest of the system.
(define (install-rectangular-package)
;; internal procedures
(define (real-part z) (car z))
(define (imag-part z) (cdr z))
(define (make-from-real-imag x y) (cons x y))
(define (magnitude z)
(sqrt (+ (square (real-part z))
(square (imag-part z)))))
(define (angle z)
(atan (imag-part z) (real-part z)))
(define (make-from-mag-ang r a)
(cons (* r (cos a)) (* r (sin a))))
;; interface to the rest of the system
(define (tag x) (attach-tag 'rectangular x))
(put 'real-part '(rectangular) real-part)
(put 'imag-part '(rectangular) imag-part)
(put 'magnitude '(rectangular) magnitude)
(put 'angle '(rectangular) angle)
(put 'make-from-real-imag 'rectangular
(lambda (x y) (tag (make-from-real-imag x y))))
(put 'make-from-mag-ang 'rectangular
(lambda (r a) (tag (make-from-mag-ang r a))))
'done)
And the polar package,
(define (install-polar-package)
;; internal procedures
(define (magnitude z) (car z))
(define (angle z) (cdr z))
(define (make-from-mag-ang r a) (cons r a))
(define (real-part z) (* (magnitude z) (cos (angle z))))
(define (imag-part z) (* (magnitude z) (sin (angle z))))
(define (make-from-real-imag x y)
(cons (sqrt (+ (square x) (square y)))
(atan y x)))
;; interface to the rest of the system
(define (tag x) (attach-tag 'polar x))
(put 'real-part '(polar) real-part)
(put 'imag-part '(polar) imag-part)
(put 'magnitude '(polar) magnitude)
(put 'angle '(polar) angle)
(put 'make-from-real-imag 'polar
(lambda (x y) (tag (make-from-real-imag x y))))
(put 'make-from-mag-ang 'polar
(lambda (r a) (tag (make-from-mag-ang r a))))
'done)
Both representations can use the same names for the procedures bcause they are internal to different procedures. The complex-arithmetic selectors access the table by means of a general "operation" procedure called apply-generic
, which applies a generic operation to some arguments. apply-generic
looks in the table under the same of the operation and the types of the arguments and applies the resulting procedure:
(define (apply-generic op . args)
(let ((type-tags (map type-tag args)))
(let ((proc (get op type-tags)))
(if proc
(apply proc (map contents args))
(error
"No method for these types: APPLY-GENERIC"
(list op type-tags))))))
Using apply-generic
, can define the generic selectors:
(define (real-part z) (apply-generic 'real-part z))
(define (imag-part z) (apply-generic 'imag-part z))
(define (magnitude z) (apply-generic 'magnitude z))
(define (angle z) (apply-generic 'angle z))
Observe these don't change if a new representation is added.
Can also extract from the table the constructors to be used by programs external to the packages in making complex numbers from real and imaginary parts and from magnitudes and angles.
(define (make-from-real-imag x y)
((get 'make-from-real-imag 'rectangular) x y))
(define (make-from-mag-ang r a)
((get 'make-from-mag-ang 'polar) r a))
Section 2.3.2 described a program that performs symbolic differentiation:
(define (deriv exp var)
(cond ((number? exp) 0)
((variable? exp)
(if (same-variable? exp var) 1 0))
((sum? exp)
(make-sum (deriv (addend exp) var)
(deriv (augend exp) var)))
((product? exp)
(make-sum (make-product
(multiplier exp)
(deriv (multiplicand exp) var))
(make-product
(deriv (multiplier exp) var)
(multiplicand exp))))
;; more rules can be added here
(else (error "unknown expression type: DERIV" exp))))
We can regard this program as performing a dispatch on the type of expression to be differentiated. In this situation the "type tag" of the datum is the algebraic operator symbol (such as +
) and the operation being performed is deriv
. We can transform this program into data-directed style by rewriting the basic derivative procedure as
(define (deriv exp var)
(cond ((number? exp) 0)
((variable? exp) (if (same-variable? exp var) 1 0))
(else ((get 'deriv (operator exp))
(operands exp) var))))
(define (operator exp) (car exp))
(define (operands exp) (cdr exp))
a. Explain what was done above. Why can't we assimilate the predicates number?
and variable?
into the data-directed dispatch?
b. Write the procedures for derivatives of sums and products, and the auxiliary code required to install them in the table used by the product above.
c. Choose any additional differentiation rule that you like, such as the one for exponents (Exercise 2.56), and install it in this data-directed system.
d. In this simple algebraic manipulator the type of an expression is the algebraic operator that binds it together. Suppose, however, we indexed the procedures in the opposite way, so that the dispatch line in deriv
looked like
((get (operator exp) 'deriv) (operands exp) var)
What corresponding changes to the derivative system are required?
I'll come back and actually test these out sometime. I was having a bit of a hard time understanding what I was doing until refering to the really detailed solution here.
a. The derivative operations for sum and product are now stored in the table. We can't assimilate the predicate number?
or variable
because they don't have an operator
to inspect.
b.
(define (install-sum-derivative-package)
;; internal procedures
(define (sum-deriv operands var)
(make-sum (deriv (addend operands) var)
(deriv (augend operands) var)))
;; interface to the rest of the system
(put 'deriv '+ sum-deriv))
(define (install-product-derivative-package)
;; internal procedures
(define (prod-deriv operands var)
(make-sum (make-product
(multiplier operands)
(deriv (multiplicand operands exp) var))
(make-product
(deriv (multiplier exp) var)
(multiplicand exp))))
;; interface to the rest of the system
(put 'deriv '* prod-deriv)
c. Not completed.
d. We need to also swap the first two arguments to put
.
Insatiable Enterprises, Inc., is a highly decentralized conglomerate company consisting of a large number of independent divisions located all over the world. The company’s computer facilities have just been interconnected by means of a clever network-interfacing scheme that makes the entire network appear to any user to be a single computer. Insatiable’s president, in her first attempt to exploit the ability of the network to extract administrative information from division files, is dismayed to discover that, although all the division files have been implemented as datastructures in Scheme, the particular data structure used varies from division to division. A meeting of division managers is hastily called to search for a strategy to integrate the files that will satisfy headquarters’ needs while preserving the existing autonomy of the divisions.
Show how such a strategy can be implemented with data-directed programming. As an example, suppose that each division’s personnel records consist of a single file, which contains a set of records keyed on employees’ names. The structure of the set varies from division to division. Furthermore, each employee’s record is itself a set (structured differently from division to division) that contains information keyed under identifiers such as address
and salary
. In particular:
a. Implement for headquarters a get-record
procedure that retrieves a specified employee’s record from a specified personnel file. The procedure should be applicable to any division’s file. Explain how the individual divisions’ files should be structured. In particular, what type information must be supplied?
b. Implement for headquarters a get-salary
procedure that returns the salary information from a given employee’s record from any division’s personnel file. How should the record be structured in order to make this operation work?
c. Implement for headquarters a find-employee-record
procedure. This should search all the divisions’ files for the record of a given employee and return the record. Assume that this procedure takes as arguments an employee’s name and a list of all the divisions’ files.
d. When Insatiable takes over a new company, what changes must be made in order to incorporate the new personnel information into the central system?
a.
(define (get-record employeeId division)
((get 'get-record division) employeeId))
Not completed.
Key idea of data-directed programming is to handle generic operations by dealing explicitly with operation-and-type tables. In 2.4.2, we organized the required dispatching on type by having each operation take care of its own dispatching. This decomposes the operation-and-type table into rows, with each generic operation procedure representing a row in the table.
Alternative strategy: decompose the table into columns, arrange things so a data object, e.g. rectangular number, is represented as a procedure that takes as input the required operation name and performs the operation indicated. e.g.
(define (make-from-real-imag x y)
(define (dispatch op)
(cond ((eq? op 'real-part) x)
((eq? op 'imag-part) y)
((eq? op 'magnitude (sqrt (+ (square x) (square y)))))
((eq? op 'angle) (atan y x))
(else (error "Unknown op: MAKE-FROM-REAL-IMAG" op))))
dispatch)
(define (apply-generic op arg) (arg op))
Called message passing. Data object is an entity that recieves the requested operation name as a "message". A useful technique for organizing systems with generic operations.
Implement the constructor make-from-mag-ang
in message-passing style. This procedure should be analogous to the make-from-real-imag
procedure given above.
(define (make-from-mag-ang r a)
(define (dispatch op)
(cond ((eq? op 'magnitude) r)
((eq? op 'angle) a)
((eq? op 'real-part) (* r (cos a)))
((eq? op 'imag-part) (* r (sin a)))
(else (error "Unknown op: MAKE-FROM-MAG-ANG" op)))))
As a large system with generic operations evolves, new types of data objects or new operations maybe needed. For each of the three strategies—generic operations with explicit dispatch, data-directed style, and message-passing-style—describe the changes that must be made to a system in order to add new types or new operations. Which organization would be most appropriate for a system in which new types must often be added? Which would be most appropriate for a system in which new operations must often be added?