Software and Data
- Hardware, Data, and Software
'Software' is an essential element of a
computer system: the hardware doesn't do anything without it. From the
'40s thru the '70s, EDP (Electronic Data Processing) systems needed only three more elements to be complete:
Personnel, Procedures, and Data. Since then, Networking has become essential for business systems as well.
Software is 'special data' that controls the CPU as it chugs thru I-P-O cycles (Input - Process - Output)
that delight our senses, help satisfy our customers, and inform our managers and executives about the
state of their enterprise.
Mixing software and data in the same RAM is one of the defining characteristics of our von Neumann machines.
Most software is loaded into RAM, and when 'executed' or 'run' it sets to processing data in the same RAM.
Some software is packaged as Firmware, which is burned or flashed into a ROM, EPROM, or other form of
non-volatile memory. The BIOS on a PC or other 'smart' device is an example. When the power
is turned on, the circuitry is designed to load the computer's startup program from a chip on the mainboard,
load the RAM with the OS and UI from a secondary storage device, then pass control over to the OS and the User.
Some computers, like an 'IP Camera' or other appliance, don't involve the User much, they just
boot up and run...
Hardware is physical stuff we can touch and see, and most hardware components
store or handle data in units of bits, bytes, nibbles, and words.
Data and software are logical entities, somewhat analogous
to 'the story' of a book except that they are read by and control the
computing hardware and not our imaginations.
Data and programs also share a residence on hard disks:
some files hold data, and others hold programs.
The operating system keeps track of files' locations in directories that
tie the software (files) to the hardware (their physical locations on disk).
Data are formatted depending on their type: text or character data is
perhaps encoded as ASCII, EBCDIC, Unicode, or UTF-8; numbers can be:
binary integers for counting, floating-point for engineering calculations,
decimal for accounting, or some other format
appropriate for the calculations at hand.
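The encodings and number formats above can be sketched in Python (the specific strings and values are just illustrations):

```python
from decimal import Decimal
import struct

# Character data: the same kind of text in different encodings
ascii_bytes = "Maple".encode("ascii")    # one byte per character
utf8_bytes = "café".encode("utf-8")      # 4 characters, but 5 bytes ('é' takes 2)

# Numeric data: the format is chosen to suit the calculation at hand
count = 142                              # binary integer, for counting
measurement = 2.5e-3                     # floating point, for engineering
balance = Decimal("19.99")               # decimal, exact, for accounting

# struct shows the raw bytes a 4-byte big-endian binary integer occupies
raw = struct.pack(">i", count)

print(len(ascii_bytes), len(utf8_bytes), raw.hex())   # 5 5 0000008e
```

Note that the Decimal value stays exact (19.99 + 0.01 is exactly 20.00), which is why decimal formats suit accounting where floating point would accumulate rounding error.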
Programs, the basic unit of software,
are special data containing instructions for the computer;
in some environments they are called 'scripts'. Operating systems are collections of programs;
application systems are also collections of programs, many of them called 'classes'.
Today's 'traditional' Programs (3GL) exist in two states, one for the programmer
and the other for the machine: 'Source code' is edited by programmers into
simple 'text files' encoded as ASCII or EBCDIC using an editor like vi or emacs, or
an IDE-Integrated Development Environment like WebSphere or Visual Studio.
'Object Code' is binary code produced by a 'Compiler' that reads the program/script and
makes Object Code (op codes and operands) appropriate for the CPU at hand.
(These are similar to the code in the Little Man Computer except with many more bits for each word,
and hundreds of instructions instead of a dozen.)
An alternative to the 'traditional' programming environment, available for several
decades now, is a 'virtual machine' or 'runtime engine' such as the Java Runtime Environment
(or Java Virtual Machine),
or Microsoft's .NET Framework. In these environments, the programmer keys in Source Code but
the compiler doesn't produce object code for a particular CPU, it generates code
for 'middleware' to interpret. Instead of binary code for the CPU, a JAR or DLL
file is output by the compiler.
This middleware approach lets developers build applications that will easily run
on any platform where the middleware has been installed.
Where 'traditionally developed' software will only run on one platform (CPU/OS combo),
software written for middleware will run on any platform that accepts the middleware.
If we want to use advanced 'Rich Internet Applications' we need to add Java to our browsers,
and then we can run games or applications delivered as 'Java Applets'.
Sun's (lately Oracle's) Java Virtual Machine has been adapted to practically
every CPU and OS known to mankind. A Java-developed application will run on
a Mac, a Windows PC, or a Linux geek's notebook computer. There is a considerable 'performance hit'
using middleware, but today's computers are so fast it doesn't matter much.
IBM's z-Series mainframes and mid-range AS/400s and i5s use middleware.
IBM's mainframes run VM (Virtual Machine) and are compatible with every
mainframe IBM has built since the '70s. The SLIC (System Licensed Internal Code) used
on IBM's proprietary mid-range platforms allows their
software engineers to work without much regard to which CPU is at the heart of the machine.
The .NET Framework, installed with every version of Windows since XP and Server 2000,
is an excellent middleware implementation that already allows
Visual Studio-developed applications to run on Linux servers (the Mono Project).
Today's Open Source languages depend on an 'interpreter' which works
similar to middleware except human-readable source code is passed to it instead of
binary object code or the 'byte code' that is passed to a runtime engine. With
Open Source, the customer can _see_ the source code, whereas in traditional environments
an application can be distributed with only binary or compiled programs and the customer
cannot see the code.
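Python itself is a handy illustration of the byte-code idea described above: its compiler produces instructions for a virtual machine, not op codes for the hardware CPU, so the same program runs anywhere an interpreter is installed. A minimal sketch using standard-library tools:

```python
# Source text is compiled to 'byte code' for a virtual machine,
# not to op codes for the hardware CPU.
import dis

source = "total = price * qty"
code = compile(source, "<example>", "exec")   # source -> byte-code object

# dis shows the virtual-machine instructions, analogous to op codes
dis.dis(code)

# The byte code runs the same way on any platform with an interpreter
namespace = {"price": 4, "qty": 3}
exec(code, namespace)
print(namespace["total"])   # 12
```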
- Software & programming languages
The text covers most of these terms and topics well; make sure you've
got them clearly understood:
- operation code
- In machine language, even simple operations
require several instructions. Machine languages are referred
to as 1st generation languages.
- If programmers had to write in machine language
there would be very few programmers.
- Hardware uses absolute addressing only (remember
the LMC!), nothing is relative, there are no variable names.
- Most programming languages, even assembler, allow
programmers to forget about absolute addressing, let them use
variable names and 'relative addressing' instead.
- A base address and displacement make a relative
address -- dynamic address translation is the process of resolving
these to an absolute address.
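Dynamic address translation is simple arithmetic; a minimal sketch, with an assumed base address:

```python
# The hardware adds a base address (where the OS loaded the program)
# to each displacement to resolve a relative address to an absolute one.
BASE = 0x4000                  # illustrative load address

def translate(displacement):
    """Resolve a relative address to an absolute address."""
    return BASE + displacement

print(hex(translate(0x0010)))  # 0x4010
```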
- Binary instructions and absolute addresses are
the only thing the CPU 'understands' -- this code is often called
'executable', or 'object code'.
- Assembler languages substitute mnemonics,
abbreviations, for machine instructions and let programmers who
intimately understand the CPU's instruction set write their
'source code' using relative addressing in their assembler language.
- An 'assembler', or assembler program, reads the
programmer's source program (or programs), resolves all the
relative addressing into absolute addresses, and produces object
code that can be loaded into memory and executed. Assembler
is a 2nd generation language.
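The assembler's job (resolving labels into absolute addresses, substituting op codes for mnemonics) can be sketched as a toy two-pass assembler. The op codes follow Little Man Computer conventions; everything else here is invented for illustration:

```python
# LMC-style op codes: ADD=1xx, STA=3xx, LDA=5xx, HLT=000
MNEMONICS = {"ADD": 100, "STA": 300, "LDA": 500, "HLT": 0}

def assemble(lines):
    symbols = {}
    # Pass 1: record each label's absolute address (its position in the program)
    for addr, line in enumerate(lines):
        parts = line.split()
        if parts[0] not in MNEMONICS and parts[0] != "DAT":
            symbols[parts[0]] = addr
    # Pass 2: replace mnemonics with op codes and labels with absolute addresses
    code = []
    for line in lines:
        parts = line.split()
        if parts[0] in symbols:              # strip a leading label
            parts = parts[1:]
        if parts[0] == "DAT":                # a data word, not an instruction
            code.append(int(parts[1]) if len(parts) > 1 else 0)
        else:
            op = MNEMONICS[parts[0]]
            operand = symbols[parts[1]] if len(parts) > 1 else 0
            code.append(op + operand)
    return code

program = [
    "LDA X",     # 00: load X into the accumulator
    "ADD Y",     # 01: add Y
    "STA Z",     # 02: store the result in Z
    "HLT",       # 03: stop
    "X DAT 7",   # 04: data words
    "Y DAT 5",   # 05
    "Z DAT 0",   # 06
]
print(assemble(program))   # [504, 105, 306, 0, 7, 5, 0]
```

Pass 1 discovers that X, Y, and Z live at addresses 4, 5, and 6; pass 2 turns "LDA X" into 504, exactly the kind of relative-to-absolute resolution described above.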
- Higher level application programming languages,
3GL, let the programmer work with descriptive variable names,
algebraic notation, and 'control structures' for alternative
selections and loops -- so they can forget about machine
instructions & addressing. (Visual Basic, C++, Java are
- Compilers (or interpreters) read the source code
written in a 3GL and convert it to object code that can be
loaded into memory by the OS and executed.
- Compilers read the whole program at once and
write an executable that can be run pretty much on its own -- once
compiled, the object code executes relatively quickly and can be
distributed without the source.
- Interpreters execute the program 'on the fly',
interpreting one line of the source program at a time into
executable code -- this is good for a programmer, but results in
relatively slow operation. Many languages provide the opportunity
to develop in 'interpreted mode' and then compile the scripts
into object/executable code when distributing the application.
- Machine language, assembly languages, and 3GL
languages require a programmer to know or invent 'procedures' for
the processor to follow -- they are called 'procedural languages'
and 'scripting' and 'debugging' are ordinary activities with them.
- 4GLs (Fourth Generation Language)
are 'database aware' and include
predefined procedures for most common tasks of user and database
interface, and are sometimes called 'nonprocedural languages' or
'declarative' languages. For example, a 4GL programmer is
able to input a screen's design using a graphical interface and
the 4GL 'writes the program' to make the screen work all the way
from the user interface thru to the database update. 4GLs
eliminate 80% or more of time spent coding routine
processes, and allow programmers to focus on algorithms.
As the GUI is used to design a form, the 4GL writes 3GL code
'in the background'. Visual Studio, when forms are bound to the database, becomes
a powerful 4GL. Sybase's PowerBuilder, IBM's System Builder,
Oracle's Developer, and NetBeans are other examples of 4GLs.
- Structured software
The textbook provides a somewhat lofty definition that is entirely
true for this important
concept. But another definition might be more helpful for
teaching how to apply the 'rules of structure' to software. I
like definitions of Structure that reference the three 'logical
structures' available for analysts and programmers in a structured
approach to systems and software design: sequence, alternative selections,
or loops. Something like 'all statements are related only by
sequence, alternative selections, or loops' is a good phrase.
With this in mind, it's only necessary to discover how a modern,
structured language supports these three logical constructs and start
using them. Since most students these days learn programming in
a structured environment, many learn the rules 'by osmosis', and may
not be aware that they tend to use structured techniques.
'Sequence' is easy, and it is the way all our CPUs work: they execute
one instruction after another _unless_ one of the other logical
structures provides a branching instruction.
'Alternative Selections' are implemented in computer languages as
'if/then', 'if/then/else', 'elseif', 'case', and 'switch' statements.
These allow the program to take alternative sequences depending on the
data input into the program. They are the equivalent of the
'branch if register...' instruction in the Little Man Computer.
'Loops' are special alternatives that result in a branch back to an
earlier instruction that represents the 'top of the loop.' Most
languages now provide syntax for several structured loops: while,
until, for/next, & for each.
These will be presented in class. Here
are web pages about structure that will also be referenced in
class. They look at using structured techniques to help make
sure that the software is 'correct', that it does the right things at
the right times and produces accurate results.
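All three logical structures can be seen in any modern language; here is a small Python sketch (the passing-score rules are invented for illustration):

```python
def classify(score):
    # alternative selection: if / elif / else
    if score >= 90:
        return "A"
    elif score >= 60:
        return "pass"
    else:
        return "fail"

def count_passing(scores):
    passed = 0
    # loop: one structured 'for each' with a single entry and a single exit
    for score in scores:
        if classify(score) != "fail":   # a selection nested inside the loop
            passed += 1
    return passed

# sequence: these statements simply follow one another in order
result = count_passing([55, 75, 90])
print(result)   # 2
```

Every statement here is related to its neighbors only by sequence, selection, or loop, which is the definition of structure given above.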
One reason I harp on structure is because the ability to demonstrate
its mastery, along with Object Orientation, in a portfolio of your system design documentation, or in
a behavioral interview, is one of the ways to convince an interviewer
that you have the 'deep technical skills' they are looking for in many
technical positions.
- Object-Oriented Software
Object Orientation has been very important for software development
and deployment since the 90s. OO was conceived earlier, but was
rather loosely applied in the several languages that implemented
it. Through the 90s it became more and more 'standardized' in
the 'Unified Process' and the various notations of the 'Unified Modeling Language' (UML).
A dozen or so of the leading
OO thinkers got together and agreed on OO constructs and all of us in IT benefit as a result.
Since then, OO concepts have been implemented more or less
consistently in the OO languages: C++, Java, C#.NET, VisualBasic.NET,
and several others support OO constructs like: properties, methods,
events, inheritance, instantiation, encapsulation, & reusability.
.NET is mentioned here because it was the result of Microsoft's effort
to become more thoroughly compliant with this modern Unified Process
and to address basic issues of database access, system security, system monitoring,
and web services, along with aligning all their application development
languages to work with the .NET Framework that ships with XP and
later revisions of Windows Server.
The text covers OO briefly, at a very high level. Our students
experience it on the ground level in all programming courses, since
all languages we use in courses are both Structured _and_ Object-Oriented.
Diagram 3.6 shows 'an object' as containing both a data
structure and the methods for processing the data. 'Methods'
are like 'constructor' (aka 'new'), 'get', 'set', 'update database', 'report', 'delete',
and scads of others that describe what the object can do on its own.
Packaging data structures and methods this way simplifies their deployment and reuse,
since OO programmers can easily examine objects and determine what needs to be in the message
sent to an object to make it do what they want.
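The packaging in diagram 3.6 can be sketched in Python with an invented Account class (the names and methods are illustrative, not from the text):

```python
class Account:
    """An object: a data structure plus the methods that process it."""

    def __init__(self, owner, balance):   # the 'constructor' (aka 'new')
        self._owner = owner               # data encapsulated inside the object
        self._balance = balance

    def get_balance(self):                # a 'get' method
        return self._balance

    def deposit(self, amount):            # an 'update' method
        self._balance += amount

acct = Account("Pat", 100)                # instantiation
acct.deposit(25)                          # a 'message' sent to the object
print(acct.get_balance())                 # 125
```

The data lives inside the object and is reached only through its methods, which is the encapsulation the OO languages listed above all support.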
Sun's tutorial about Object-Oriented
Programming Concepts is a good place to look for more about OO.
In some regards, Sun 'wrote the book' about applying OO to a modern programming language.
For OO Programmers, the big three features of Objects are their Properties,
Methods, and Events. These are what is mostly manipulated in OO programming code like
Visual Basic, C#, or PHP. In non-procedural environments like LabVIEW or Microsoft's SharePoint
these OO concepts also apply.
SOAP (Simple Object Access Protocol) allows computer systems of
any kind to interface directly: mainframe-to-mainframe, PC-to-mainframe,
customer-to-supplier. Prior to OO it was difficult and expensive to connect
different manufacturers' machines or operating systems. Since OO,
these connections have become far simpler and cheaper.
- Libraries, object modules, linkage editors, load
modules, external subroutines, reentrant code
These terms are back from the good old days when putting a COBOL program into production
on a mainframe required intimate knowledge of the concepts they imply.
The first four terms describe a process of getting high-level language translated into
binary machine code, resolving relative memory locations to fixed,
and getting that code running on the mainframe for the users.
Today's development environments are often much simpler, and use different terms.
'External Subroutines' are programs that perform tasks commonly required of
other programs in the library. Rather than each programmer having to invent
or copy sections of script for totaling the lines on an order, for example,
an external subroutine might be written to be 'called from' any of the
dozens of programs that have to display an order.
This also adds value since changes that are inevitably required in common
routines don't have to be made in each program that uses it --
only the subroutine needs to be edited and recompiled.
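The order-total example might look like this as a shared subroutine (the names and values are invented):

```python
def order_total(lines):
    """Total the (quantity, unit_price) lines on an order.

    One shared subroutine, 'called from' every program that
    displays an order, instead of each program re-inventing it.
    """
    return sum(qty * price for qty, price in lines)

# Any caller gets the same, single implementation; a fix to the
# totaling rules happens here, in one place only.
print(order_total([(2, 3.50), (1, 10.00)]))   # 17.0
```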
'Reentrant Code' is another concept that spans this half century of computing.
This is where a program is loaded in memory for several users, devices, or processes to use simultaneously.
Each user's _data_ is kept in a separate place, and the OS (or program) keeps track of the
state of each user's trip through the program and their data.
In the 'old days' programmers had to exercise these reentrant and 'multi-entrant' techniques in each program they wrote
so that they could support dozens or hundreds of users on a computer that might have an 8-bit CPU and 64K of RAM.
Today these are still a very efficient way of making application resources available
for a large number of users. Some operating systems & application environments
traditionally found on minicomputers do this sharing of object code 'automagically' without the programmers
being burdened with these most arcane tasks.
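The reentrant idea can be sketched by keeping each user's state in a separate data area while one copy of the code serves everyone (the step counter is an invented stand-in for real per-user state):

```python
def next_step(state):
    """One shared routine; 'state' is the caller's private data area.

    The code itself holds no per-user data, so any number of users
    can be 'inside' it at once without interfering with each other.
    """
    state["count"] += 1
    return state["count"]

# Each user's data is kept in a separate place
users = {"alice": {"count": 0}, "bob": {"count": 0}}
next_step(users["alice"])
next_step(users["alice"])
next_step(users["bob"])
print(users["alice"]["count"], users["bob"]["count"])   # 2 1
```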
- Data elements & data structures
'Data structures' are used to get past a limitation of storing records
on a computer's disk or RAM: there is only _one_ order for the
records, and that is where they are actually written on the disk.
The 'relative record concept' introduced in the text demonstrates
this: files contain related records, which contain fields. In
many systems records are stored as shown, where all the records are
the same length and each field is the same length.
Without any other 'mechanism' for finding records, a 'sequential
search' is used to find a record, or records, that match some
criteria. In figure 3.19, a search for an address of 'Northside
Mall' would take 'N times longer' than a search for '142 Maple
Street'. If there are zillions of records involved, there would
be a lot of unhappy Users if a sequential search was all that is
available. Although the average number of records to be searched
ends up being half the records in a file, the Users would get pretty
jumpy wondering if this was going to be a long search or a short one.
Data structures come to the rescue here. Most computers allow
for 'direct access' to a record stored on disk. _If_ the program
'knows' the record's relative position in a file, the mechanism
will get to it pretty directly, without the sequential search.
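The difference between a sequential search and direct access by relative record number can be sketched like this (the records are invented):

```python
records = ["142 Maple Street", "9 Oak Lane", "Northside Mall", "77 Elm Court"]

def sequential_search(target):
    """Read records in order until one matches; count the reads."""
    reads = 0
    for position, record in enumerate(records):
        reads += 1
        if record == target:
            return position, reads
    return None, reads

# Sequential search: cost grows with the record's position in the file...
print(sequential_search("Northside Mall"))   # (2, 3)

# ...but with the relative record number in hand, access is direct:
print(records[2])                            # Northside Mall
```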
Perhaps the most common data structures today are 'indexes' that
provide very quick access, by any number of 'key fields', to any
record in a database: records' relative position can be determined
almost instantly with very little sequential searching.
Figure 3.22 gives an example of how a very small index ordered by last name
might look. If indexes (indices) are properly constructed, Users
can find records instantly without having to 'look up' the record's
position in the file.
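An index like figure 3.22's can be sketched as a small mapping from a key field to a relative record number (the names are invented):

```python
records = [
    ("Baker", "142 Maple Street"),
    ("Abrams", "Northside Mall"),
    ("Chu", "9 Oak Lane"),
]

# Build an index on last name -> relative record number
index = {name: pos for pos, (name, _) in enumerate(records)}

# Lookup is one step, however many records the file holds
pos = index["Abrams"]
print(records[pos])   # ('Abrams', 'Northside Mall')
```

A real DBMS keeps such indexes on disk and updates them as records change, but the principle is the same: trade a little storage for near-instant access by key.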
It's common for applications to let their users search by name,
address, zip code, phone number, email address, social security
number, account number, stock location, or any other fields that might
apply in a particular record-keeping situation. Even in a huge
database like the IRS or a credit card company keeps, it doesn't take
longer for the system to find the last record on the disk than it does
to find the first.
The cost of the quick access is usually more disk space. A table
that has lots of indexed fields (lots of 'inversions') may require more storage space for the
indices than for the data that are indexed. Today, disk storage is relatively
cheap, clerical time is relatively expensive, and no system should make its users
spend much time searching for records, no matter how large the database.
We'll look at a few data structures in class: Linked Lists, Stacks,
Queues, and Hashes.
- Database Management Systems
DBMSs are software that use various data
structures to store and retrieve records so that programmers don't have
to know, or invent, the tricks for storing and retrieving them.
The alternative to using a DBMS to manage data is putting the data
into 'flat files' and letting programmers manage the best they can
when they need to create, read, update, or delete (CRUD)
records. Without very strong coding standards and architecture,
flat files may lead to a 'Tower of Babel' effect: the more
programmers who have worked on a system, the more tricks have to be
learned by those who maintain the code.
Today, the value of DBMSs is recognized and very few systems are
written without one.
A DBMS keeps all the data in its database and provides methods for CRUD as well as
for securing records from those who might only be allowed to R, and not to C, U, or D.
With a DBMS, programmers don't have to worry about the complexity of the data
structures used to locate a record on the disk; they just have to know the 'database language' and how to
use it in their programs.
'Relational Databases' are most common today. They are all based on 'tables'
with a familiar 'columns and rows' organization for records. RDBMSs allow records to
relate to one another by storing 'foreign keys' that 'point to' records in other tables.
'Structured Query Language' (SQL) is the more-or-less
standard database language used in relational databases.
Although the Q implies this language might do Query only,
SQL has 'insert' and 'update' verbs used to write records on the database.
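A minimal CRUD sketch using standard SQL verbs through Python's built-in sqlite3 module (the table and column names are invented):

```python
import sqlite3

db = sqlite3.connect(":memory:")   # a throwaway in-memory database
db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")

db.execute("INSERT INTO customer (id, name) VALUES (1, 'Baker')")      # Create
db.execute("UPDATE customer SET name = 'Baker, J.' WHERE id = 1")      # Update
row = db.execute("SELECT name FROM customer WHERE id = 1").fetchone()  # Read
print(row[0])   # Baker, J.
db.execute("DELETE FROM customer WHERE id = 1")                        # Delete
```

The program never says *how* to find the row with id = 1; the DBMS chooses the data structure (here, the primary-key index) behind the scenes.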
Although the relational model is perhaps the most adaptable and simplest to understand,
it is largely based on 'relational algebra' developed to manage records on punched cards,
which actually _do_ have fixed width fields and records,
and it is not always the most efficient model for handling large volumes of
transactions or records, or fields that may vary wildly in length.
So, other database models persist, most of which are valuable because they perform better
than the relational model in some situation or another. 'Heavy
transaction processing' in a mainframe environment, for example, might
stress a relational DBMS, and another model, perhaps 'hierarchical' or 'network', will
outperform it.
Common DBMSs are: IBM's DB2, IMS, and their recently acquired U2 products
UniVerse and UniData; Sperry/Univac's DMS is common in government
systems; MySQL and PostgreSQL are Open Source offerings that run on Windows and *ix platforms; Sybase is on
a lot of minicomputers; Microsoft's SQLServer is maturing nicely and
will take more market share in the medium-to-large server market over the next years;
and there are many others too numerous to mention.
We'll look at a couple of DBMS examples in class: Access & MySQL...