
Introduction

Design Patterns

Legend has it that design patterns as a coherent concept were first introduced by a building architect - Christopher Alexander. In our field of expertise a design pattern can be defined as a description of a generic solution to a software design problem that repeats itself over and over regardless of the area of application. A design pattern is not code itself. It is a concept that results in, or materializes as, code. It has its fair share of similarity with a recipe.

Design patterns are programming language and operating system independent. Even in this day and age they must be coded afresh each time. If a particular pattern gains enough popularity then programming languages may absorb it into their core, though programmers will still have to write a certain amount of code to fully exercise it.

Design patterns are traditionally divided into three logical categories - creational, structural and behavioral. Creational patterns deal with how variables are created. Structural patterns deal with how data types and variables are combined to form new types and structures. Behavioral patterns deal with who does what in a group of data types and how their variables communicate with each other.

In addition to these categories we will look at an extra one - encapsulation patterns. This category is introduced first because it will be used in patterns throughout the rest of the text.

Here we interpret encapsulation as two related concepts. As the first concept encapsulation means hiding information. As the second concept encapsulation means binding data and functions together while hiding information.

Both concepts are the pivoting point of the pattern-based implementations of a custom data type in C. The emphasis of these implementations is on the mechanics of constructing a custom data type with C primitives only - ints, chars and the like.
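As a minimal sketch of both concepts, consider a hypothetical point type (the type and function names are invented for this illustration; in a real package the two sections below would live in point.h and point.c respectively):

```c
#include <stdlib.h>

/* point.h - the world sees only an incomplete type and function prototypes */
typedef struct point point_t;

point_t* point_create( int x, int y );
int      point_x( const point_t* p );
void     point_destroy( point_t* p );

/* point.c - data and functions bound together, details hidden from the world */
struct point
{
    int x;
    int y;
};

point_t* point_create( int x, int y )
{
    point_t* p = malloc( sizeof( *p ) );

    if ( p != NULL )
    {
        p->x = x;
        p->y = y;
    }
    return p;
}

int point_x( const point_t* p )
{
    return p->x;
}

void point_destroy( point_t* p )
{
    free( p );
}
```

The world can create, query and destroy a point but cannot touch its members directly, nor even learn what they are.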

In the chapters dedicated to composition and aggregation we will learn how to create custom encapsulated data types that themselves contain custom encapsulated data types which contain custom encapsulated data types which contain ...


Terminology

Author

is an individual who implements a certain source code arrangement including the one that can be qualified as a design pattern.


Users, End Users

are individuals who put to work a product created by authors. Quite often an author and a user are the same individual.


Library, Component, Package

is a logically and physically atomic unit of work produced by authors (and users). Usually, but not necessarily, it consists of one header file and one shared library per entity.


World, Wild

is everything and anything else that is not a library, component or package.

Per distinct unit of software (project), component and world are the complements of each other. In set theory notation, if S is a (universal) set of all software, W is a world set and L is a library set, then:

$$L^{c} = W = S \setminus L$$


Public (adj.)

is an entity that is visible to the world and, by virtue of that, to the component (library).


Private

is an entity visible only within the boundaries of its component (library) and completely invisible to the world.


Interface

is a collection of C functions that reveals to the world only what can be done with the underlying data type and nothing else.

Interfaces are abstract. They are minimal and complete. They could be either public or private.


Implementation

is everything that makes an entity usable and operational, excluding its interface. It is how the "what" broadcast by an interface is honored materially.

If interfaces unite, eliminate and simplify then implementations do the opposite. It takes both to create a usable software entity.

Calling interfaces abstract may seem a little strange. After all, from a raw computer science perspective and according to the definition above, an interface is a collection of C functions, each of which has a very concrete and very specific prototype, and the total number of these functions is also very concrete - both properties must be set at keyboard time so that both are known at compile time. What gives?

When we call interfaces abstract we apply the concept of abstraction to the absence of the underlying implementation details. Take the various I/O mechanism types - disk files, sockets, message queues and pipes, to name just a few.

To open an I/O channel, for example, message queue-based implementations may use msgget(), disk file-based implementations may use either the section two open() or its close cousin fopen(), socket-based implementations may use either accept() or connect(), and so on.

However, if an I/O interface offers just one common public function - open(), for example - then its users are shielded from and not exposed to all of these details, irrelevant in the larger scope, and in that context we call interfaces "abstract".

Borrowing the "general" or "closed form" concept from mathematics we can say that open() is a closed form solution for the "open various types of I/O channels" problem.
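Here is one way such a closed form could be sketched. io_kind_t, io_open() and the dispatch are all assumptions of this illustration, with only the disk file branch actually wired in:

```c
#include <stdio.h>
#include <stddef.h>

typedef enum { IO_DISK_FILE, IO_SOCKET } io_kind_t;

/* the one closed form function the world sees */
void* io_open( io_kind_t kind, const char* address )
{
    switch ( kind )
    {
        case IO_DISK_FILE:
            return fopen( address, "r" );  /* could also be open()         */
        case IO_SOCKET:
            return NULL;                   /* would use connect()/accept() */
    }
    return NULL;
}
```

A fuller implementation would hide the dispatch behind a function pointer table, but the effect on the users is the same: one open(), many mechanisms.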


Whole

is a custom C data type used to solve a particular problem at hand. Usually, but not necessarily, public. A whole consists of or has parts.


Part

is a (custom) C data type that belongs to a whole as its member - as a C member variable of a structure or a union. Usually, but not necessarily, a part is private (no pun intended).

In the definitions above the words consists of, has, belongs to, etc. are interpreted very narrowly as in members of a C structure (union).

In this text we will distinguish three types of membership. If a part has to be permanently embedded in a whole - created in the same memory block - then we will call it an integral part. By definition, the lifetime of an integral part equals that of its whole.

If a part does not have to be permanently embedded in a whole but its lifetime nonetheless has to equal that of its whole, then we will call it a non-integral owned part. "Owned" means that the whole still has to manage this part's life cycle.

If a part does not have to be permanently embedded in a whole and its lifetime is of no consequence to the whole, then we will call it a non-integral rented part. "Rented" means that the whole completely ignores the fact that a pointer to some variable is embedded in its definition. See Composition and the related chapters for an in-depth discussion.

A whole can and usually does employ any combination of membership types. A part, on the other hand, should participate in one type of membership only: it should be either integral, non-integral owned or non-integral rented. Requirements determine which type of ownership is appropriate.

For example, an airplane's fuselage, wings and tail should be non-integral rented parts in aircraft design software because its users would add and remove these parts at will, over and over - until they are happy with the final product. In flight simulation software, however, the same parts should be either integral or non-integral owned, as its users would want to fly a fully assembled plane without modifying it.
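The three membership types can be sketched in one whole as follows (the airplane and part types are, of course, invented for this illustration):

```c
/* hypothetical part types, trimmed to one field each */
typedef struct { double span; } wing_t;
typedef struct { double area; } tail_t;
typedef struct { double len;  } fuselage_t;

/* a whole with one part of each membership type */
typedef struct airplane
{
    wing_t      wing;      /* integral: lives in the same memory block      */
    tail_t*     tail;      /* non-integral owned: the whole allocates/frees */
    fuselage_t* fuselage;  /* non-integral rented: lifetime ignored         */
} airplane_t;
```

Note that the last two members look identical to the compiler - both are pointers; the ownership distinction lives entirely in the whole's code, not in its definition.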


Ancestor

is an existing C data type used as a base for an extension of its functionality.


Descendant

is a new (custom) C data type that extends its ancestor's functionality. A descendant may also become an ancestor if its functionality is extended.


Compile, Recompile

means convert .[hc] files into .o files with a C compiler.


Link, Relink

for applications: convert .o files into an executable image with a C link editor, record any explicit dependencies on required libraries, if any.

for libraries: convert .o files into a shared (dynamic) library with a C link editor, recording any implicit dependencies on other (shared) libraries, if any (we will use only shared libraries in this text).


Build, Rebuild

means first compile then link.


Build Lines

is a collective term for what in practice makes up the bulk of make files.


The Principle of Least Sum

The most common activity in long-term software project maintenance is keeping up with the never-ending stream of requirements changes. We can reduce that activity to three distinct operations: modification and removal of existing features and addition of new ones. In an abbreviated form we will collectively call these activities ARM, for Add, Remove and Modify. ARMing software's features then translates directly into ARMing the Implementation (encapsulation), Creation, Structure and Behavior - ICSB for short - of the relevant data types. It, in turn, means that:

  • some combination of data type's ICSB must be modified
  • data type's build lines may have to be modified
  • the corresponding packaging (library) has to be rebuilt

Further, since external software depends (explicitly or not) on a data type's ICSB, then ARMing that ICSB may affect this external software in some way because its:

  • source code may have to be modified
  • build lines may have to be modified
  • binaries may have to be rebuilt

The items in the lists above that have to be acted upon are not interesting because they must be carried out and cannot be avoided. Consequently, the items that may be acted upon become interesting, and if, for brevity's sake, we agree to call ARMing ICSB refactoring, then a question of interest is:

how does a component's refactoring affect its external dependencies?

In the branch of theoretical physics called classical mechanics there is the Principle of Least Action. Its Hamiltonian formulation, the most used one, states that the first variation of the integral of a system's Lagrangian over time vanishes. Using the mathematical apparatus of the calculus of variations the principle may be stated as follows:

$$\delta \int_{t_1}^{t_2}L(q(t), \dot q(t)) dt = 0$$

Here \(L\) is the system's Lagrangian - the difference between the kinetic and the potential energies; \(q(t)\) is a complete set of generalized independent coordinates whose number is determined by the number of the system's degrees of freedom - the number of ways an independent motion in the system can occur; and \(\dot q(t) = \frac {dq}{dt}\) is the set of the corresponding velocities - the first derivatives of the coordinates with respect to time. The integral itself is called the "action".

In a more plain and somewhat simplified English the above principle means that out of all the possible trajectories that a system can take during motion between times \(t_1\) and \(t_2\) the actual or the true ones will be those for which the action is least or stationary.

C programming may not be as glorious as theoretical physics but here, similarly, I propose the Principle of Least Sum:

out of all the possible source code arrangements the one should be chosen that minimizes the sum total of all the files that must be changed in order to ARM ICSB:

$$\sum_i f_i^{\Delta} = \min$$

Here \(\Delta\) represents a change, \(f_i^{\Delta}\) represents a file that must be changed in order to accommodate the requirement(s) change and the sum is called a refactoring sum.

The motivation for the above principle is the following idea: any arrangement of source code (and other objects) can be looked at as a dependency tree - a specialized graph. For each node of such a graph we define the traditional operations of addition and removal plus an extra marking operation - that of updating a node. All three operations essentially lead to the same outcome - a ripple effect of a trip toward the top of the dependency tree, during which we count each and every encountered node (that must be refactored) toward the overall cost of the ARMing operation. Intuitively (and vaguely) we see here an association with the problem of finding the shortest path through a graph.

Further, within a single human-readable source code file the principle is applied to the language-specific constructs - functions, for example. Within functions the principle is applied to statements, within statements - to keywords and expressions, within expressions - to terms and operators, within terms - to variables and so on.

A parallel similar to, but distinct from, the one with the Principle of Least Action may be drawn between the Principle of Least Sum and the concept of efficiency of a computer science (CS) algorithm, which is captured in the \(O(g(n)), n \in N, n \to + \infty \) notation, where \(n\) is the number of items that a given CS algorithm must process.

Between two CS algorithms performing the same task, the one with the slowest growing \(g(n)\) as \(n \to +\infty \) is classified as more efficient.

Regardless of how a given source code arrangement is qualified, patterned or non-patterned, it will react to refactoring in a certain way. To introduce a numeric characteristic to gauge or quantify the consequences of refactoring I suggest that the above sum is used.

Let us call the numeric value of a refactoring sum the refactoring cost of a given source code arrangement or just the cost. Let us call cost analysis the process of computation of the above cost. In the early chapters we will do this analysis together while in the later chapters you are encouraged to do it yourself even if an explicit request to do so is absent.

Refactoring is a sweeping concept that absorbs everything from changing the value of a single bit in a mask to changing the behavior of a data type. To experience refactoring in its rudimentary manifestation, observe the following code (in cat.c):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

extern int main( int argc, char* argv[] )
{
    ssize_t n;
    char buf[ 1024 ];
    int fd = open( "/path/to/a/file", O_RDONLY );

    while ( ( n = read( fd, buf, 1024 ) ) > 0 )
    {
        write( fileno( stdout ), buf, n );
    }
    close( fd );
    return 0;
}

which works perfectly fine and does exactly what it is asked to do but carries the cost of \(3\) against the "cat a different disk file" requirement.

Proof: to copy a different disk file the following files must be changed: 1) cat.c, 2) cat.o, 3) cat.

In contrast, the following implementation of the same task in cat'.c:

...
int fd = open( argv[ 1 ], O_RDONLY );
...

carries the cost of \(0\) against the same requirement. As such, cat'.c is classified as having a lower refactoring cost than cat.c against the given requirement.

However, cat'.c carries the cost of \(3\) against the "copy output to an arbitrary disk file" requirement (prove it). Assume that cat''.c addresses that issue and carries the cost of \(0\) against that requirement. Still, cat''.c carries a non-zero cost against the "use a different byte I/O mechanism" requirement as it cannot be used to move bytes between hosts, for example - and the list of requirements that can be imposed on a given source code arrangement goes on.
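As a sketch of the direction cat''.c could take, the byte-moving loop can be factored over arbitrary descriptors supplied by the caller (the function name is invented for this illustration; error handling is omitted as elsewhere in this text):

```c
#include <unistd.h>

/* move bytes from one descriptor to another until end of input */
static void cat_copy( int in, int out )
{
    ssize_t n;
    char buf[ 1024 ];

    while ( ( n = read( in, buf, 1024 ) ) > 0 )
    {
        write( out, buf, n );
    }
}
```

Whichever open() produced the two descriptors - disk file, pipe or otherwise - the loop no longer cares, which is exactly what drives the refactoring cost down.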

The patterns presented in this text attempt to minimize the refactoring costs of source code arrangements against specific requirements.


Prerequisites

Not much is needed to work through the code in this text. Modern IDEs and make files are not required but can be used, of course, if you wish. However, a bare minimum will do:

  • a computing device
  • an operating system
  • a text editor
  • an ANSI C Compiler/Linker
  • a debugger, optional
  • a command line prompt

My choices were:

  • a laptop
  • a 64-bit processor
  • 64-bit OpenSUSE Linux distribution
  • vi
  • gcc
  • gdb
  • xterm


Code

Each design pattern implementation is accompanied by:

  • full source code in its entirety (or a description of how to get it)
  • compile command(s)
  • link command(s)
  • execution command(s)
  • output, if any
  • design cost analysis

For obvious reasons the code will be minimal with virtually no error handling except where absolutely needed.

While putting a pattern together we will start with a bare minimum also - just one function. It will quickly become evident that initially this is all that is needed. Later on, after the pattern's tiny framework is in place and functioning properly, you are encouraged to add more functions to it as you see fit.

Lastly, I have no intention of keeping up with the quickly changing version numbers of compilers and operating systems. The only thing I can guarantee is that all the code presented in this book compiled without warnings at publishing time.


Final Observations

You can read the chapters in any order you wish. However, to derive the most benefit, read the encapsulation and creational chapters first, in order, since the ADT pattern, for example, is used throughout the rest of the text.

Before you start typing consult your local documentation and find out how to:

  • build an application that uses at least one library
  • build a shared library, see below
  • load a shared library at run time
  • find an address of a symbol at run time

We will use only shared or dynamic libraries in this book and the following commands to build them:

gcc -g -c -fPIC -I . libfoo.c
gcc -g -shared -o libfoo.so libfoo.o

The first line above compiles position-independent code into an object file, libfoo.o. The second line takes that object file and packages it into a shared library, libfoo.so.

Shared libraries on Apple traditionally carry the .dylib extension. On Windows it is .dll. Find the specifics for your system.

To build an application that uses a library we will use the following commands:

gcc -g -c -I . app.c
gcc -g -L . -o app app.o -lfoo

The first line above only compiles a source file into an object file. The second line links that object file into an application and records the fact that it has to load a certain library at start up. On UNIX/Linux you may have to set and export the LD_LIBRARY_PATH environment variable. Again, your compiler/linker flags may differ. Find the equivalents.

To load a library at run time UNIX/Linux uses dlopen(). On Windows it is LoadLibrary().

To find a symbol's address at run time UNIX/Linux uses dlsym(). On Windows it is GetProcAddress().
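As a small run-time loading sketch, the following loads the standard math library and finds cos() in it. The demo() wrapper is invented for this illustration; libm.so.6 is the usual soname on Linux, and on glibc versions older than 2.34 the link line will also need -ldl:

```c
#include <dlfcn.h>
#include <stdio.h>

/* load the math library at run time and call cos() through a pointer */
int demo( void )
{
    double (*fp)( double );
    void* lib = dlopen( "libm.so.6", RTLD_NOW );

    if ( lib == NULL )
    {
        return -1;
    }

    fp = (double (*)( double )) dlsym( lib, "cos" );
    if ( fp == NULL )
    {
        dlclose( lib );
        return -1;
    }

    printf( "cos( 0.0 ) = %f\n", fp( 0.0 ) );
    dlclose( lib );
    return 0;
}
```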


For UNIX/Linux Users

The dlopen()/dlsym() functions are first used in the Factory chapter. Do a little bit of leg work beforehand. Read the man page. Vendors and distributions vary on minutiae. For example, OpenSUSE insisted on the presence of the _GNU_SOURCE symbol in order to recognize RTLD_DEFAULT. It also insisted on the presence of -ldl:

gcc -g -D_GNU_SOURCE -c -fPIC -I . libfoo.c
gcc -g -shared -o libfoo.so libfoo.o -ldl

A popular UNIX derivative required neither and the following worked just fine:

gcc -g -c -fPIC -I . libfoo.c
gcc -g -shared -o libfoo.so libfoo.o

Find out your system's particulars.

\(\blacksquare\)
