VCU INFO300 - Computer Hardware and Software

Software and Data

  • Hardware, Data, and Software

    'Software' is an essential element of a computer system: the Hardware doesn't do anything without the Software. From the '40s thru the '70s, EDP (Electronic Data Processing) systems needed only three more elements to be complete: Personnel, Procedures, and Data. Since then, Networking has become essential for business systems as well.

    Software is 'special data' that controls the CPU as it chugs thru I-P-O cycles (Input - Process - Output) that delight our senses, help satisfy our customers, and inform our managers and executives about the state of their enterprise.

    Mixing software and data in the same RAM is one of the defining characteristics of our von Neumann machines. Most software is loaded into RAM, and when 'executed' or 'run' it sets to processing data in the same RAM.

    Some software is packaged as Firmware, which is burned or flashed into a ROM, EPROM, or other form of non-volatile memory. The BIOS on a PC or other 'smart' device is an example. When the power is turned on, the circuitry is designed to load the computer's startup program from a chip on the mainboard, load the RAM with the OS and UI from a Secondary storage device, then pass control over to the OS and the User. Some computers, like an 'IP Camera' or other appliance, don't involve the User much; they just boot up and run...


    Hardware is physical stuff we can touch and see, and most hardware components store or handle data in units of bits, bytes, nibbles, and words.  

    Data and software are logical entities, somewhat analogous to 'the story' of a book except that they are read by and control the computing hardware and not our imaginations.

    Data and Programs also have in common their residence on hard disks: some files hold data, and others hold programs. The operating system keeps track of files' locations in directories that reference both kinds of files.

    Data are formatted depending on their type: text or character data is perhaps encoded as ASCII, EBCDIC, Unicode, or UTF-8; numbers can be binary integers for counting, floating-point for engineering calculations, decimal for accounting, or some other format appropriate for the calculations at hand.
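
    Here is a minimal sketch in Java (the class and variable names are invented for illustration) of these formats at work: the same text encoded two ways, and three numeric formats suited to different calculations.

      import java.math.BigDecimal;
      import java.nio.charset.StandardCharsets;
      import java.util.Arrays;

      // A sketch of data formats: character encodings and numeric types.
      public class DataFormats {
          public static void main(String[] args) {
              // Character data: the same text, encoded two ways.
              String text = "Résumé";
              byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);      // 8 bytes: each é takes 2
              byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);  // é can't be encoded, becomes '?'
              System.out.println(Arrays.toString(utf8));
              System.out.println(new String(ascii, StandardCharsets.US_ASCII));  // R?sum?

              // Numeric data: different formats for different calculations.
              int count = 42;                               // binary integer, for counting
              double stress = 2.0e5 / 3.14159;              // floating-point, for engineering
              BigDecimal balance = new BigDecimal("19.99"); // decimal, for accounting
              System.out.println(count + " " + stress + " " + balance);
          }
      }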

    Programs, the basic units of Software, are special data containing instructions for the computer; in some environments they're called 'scripts'. Operating systems are collections of programs; application systems are collections of programs, too, many of them called 'classes'.

    Today's 'traditional' Programs (3GL) exist in two states, one for the programmer and the other for the machine: 'Source code' is edited by programmers into simple 'text files' encoded as ASCII or EBCDIC using an editor like vi or emacs, or an IDE-Integrated Development Environment like WebSphere or Visual Studio. 'Object Code' is binary code produced by a 'Compiler' that reads the program/script and makes Object Code (op codes and operands) appropriate for the CPU at hand. (These are similar to the code in the Little Man Computer except with many more bits for each word, and hundreds of instructions instead of a dozen.)  

    For several decades an alternative to the 'traditional' programming environment has been to have a 'virtual machine' or 'runtime engine' such as the Java Runtime Environment (or Java Virtual Machine), or Microsoft's .NET Framework. In these environments, the programmer keys in Source Code but the compiler doesn't produce object code for a particular CPU; it generates code for 'middleware' to interpret. Instead of binary code for the CPU, a JAR or DLL file is output by the compiler.
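
    A tiny Java sketch of the idea: javac compiles the source below into Hello.class, a file of JVM 'byte code' rather than CPU-specific object code, and that same file runs wherever a JVM is installed.

      // Hello.java -- compile once, run anywhere a JVM is installed:
      //   javac Hello.java    produces Hello.class (byte code for the JVM)
      //   java Hello          the JVM interprets that byte code for the CPU at hand
      public class Hello {
          public static void main(String[] args) {
              System.out.println("The same .class file runs on Windows, Mac, or Linux");
          }
      }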

    This Middleware approach allows developers to write applications that will easily run on a number of platforms where the middleware has been installed. Where 'traditionally developed' software will only run on one platform (CPU/OS combo), software written for middleware will run on any platform that accepts the middleware. If we want to use the advanced 'Rich Internet Applications' we need to add Java to our browsers, and then we can run games or applications delivered as 'Java Applets'. Sun's (lately Oracle's) Java Virtual Machine has been adapted to practically every CPU and OS known to mankind. A Java-developed application will run on a Mac, a Windows PC, or a Linux Geek's Notebook computer. There is a considerable 'performance hit' using middleware, but today's computers are so fast it doesn't matter much.

    IBM's z-Series Mainframes and mid-range AS400s and i5s use middleware. IBM's mainframes run VM-Virtual Machine and are compatible with every mainframe IBM has built since the '70s. The SLIC-System Licensed Internal Code used on their proprietary mid-range platforms allows their software engineers to work without much regard to which CPU is at the heart of the machine. Microsoft's .NET Framework, installed with every version of Windows since XP and Server 2000, is an excellent middleware implementation that is already allowing Visual Studio-developed applications to run on Linux servers (the Mono Project).

    Today's Open Source languages depend on an 'interpreter', which works similarly to middleware except that human-readable source code is passed to it instead of the binary object code or 'byte code' that is passed to a runtime engine. With Open Source, the customer can _see_ the source code, where in traditional environments an application can be distributed with only binary or compiled programs and the customer cannot see the code.
  • Software & programming languages

    The text covers most of these terms and topics well; make sure you've got them clearly understood:
    • program 
    • instruction
    • operation code
    • operands
    • In machine language, even simple operations require several instructions.  Machine languages are referred to as 1st generation languages.
    • If programmers had to write in machine language there would be very few programmers.
    • Hardware uses absolute addressing only (remember the LMC!), nothing is relative, there are no variable names.
    • Most programming languages, even assembler, allow programmers to forget about absolute addressing, let them use variable names and 'relative addressing' instead. 
    • A base address and displacement make a relative address -- 'dynamic address translation' is the process of resolving these to an absolute address (a tiny sketch follows this list).
    • Binary instructions and absolute addresses are the only thing the CPU 'understands' -- this code is often called 'executable', or 'object code'.
    • Assembler languages substitute mnemonics, abbreviations, for machine instructions and let programmers who intimately understand the CPU's instruction set write their 'source code' using relative addressing in their assembler programs.
    • An 'assembler', or assembler program, reads the programmer's source program (or programs), resolves all the relative addressing into absolute addresses, and produces object code that can be loaded into memory and executed.  Assembler is a 2nd generation language.
    • Higher level application programming languages, 3GL, let the programmer work with descriptive variable names, algebraic notation, and 'control structures' for alternative selections and loops -- so they can forget about machine instructions & addressing.  (Visual Basic, C++, Java are examples)
    • Compilers (or interpreters) read the source code written in a 3GL and convert it to object code that can be loaded into memory by the OS and executed.
    • Compilers read the whole program at once and write an executable that can be run pretty much on its own -- once compiled the object code executes relatively quickly and can be used indefinitely.
    • Interpreters execute the program 'on the fly', interpreting one line of the source program at a time into executable code -- this is good for a programmer, but results in relatively slow operation. Many languages provide the opportunity to develop in 'interpreted mode' and then compile the scripts into object/executable code when distributing the application.
    • Machine language, assembly languages, and 3GL languages require a programmer to know or invent 'procedures' for the processor to follow -- they are called 'procedural languages' and 'scripting' and 'debugging' are ordinary activities with them.
    • 4GLs (Fourth Generation Language) are 'database aware' and include predefined procedures for most common tasks of user and database interface, and are sometimes called 'nonprocedural languages' or 'declarative' languages.  For example, a 4GL programmer is able to input a screen's design using a graphical interface and the 4GL 'writes the program' to make the screen work all the way from the user interface thru to the database update.  4GLs eliminate 80% or more of time spent coding routine processes, and allow programmers to focus on algorithms.  As the GUI is used to design a form, the 4GL writes 3GL code 'in the background'. Visual Studio, when forms are bound to the database, becomes a powerful 4GL. Sybase's PowerBuilder, IBM's System Builder, Oracle's Developer, and NetBeans are other examples of 4GLs.
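
    Here's a minimal Java sketch of the base-plus-displacement arithmetic from the list above (the numbers are made up; real CPUs do this translation in hardware):

      // A toy model of dynamic address translation (all numbers invented).
      public class AddressTranslation {
          public static void main(String[] args) {
              int base = 5000;        // base register: where the OS loaded the program
              int displacement = 42;  // relative address written into the object code
              int absolute = base + displacement;  // resolved at run time
              System.out.println("Absolute address: " + absolute);  // 5042
          }
      }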

  • Structured software

    The textbook provides a somewhat lofty definition that is entirely true for this important concept.  But another definition might be more helpful for teaching how to apply the 'rules of structure' to software.  I like definitions of Structure that reference the three 'logical structures' available to analysts and programmers in a structured approach to systems and software design: sequence, alternative selections, and loops.  Something like 'all statements are related only by sequence, alternative selections, or loops' is a good phrase.

    With this in mind, it's only necessary to discover how a modern, structured language supports these three logical constructs and start using them.  Since most students these days learn programming in a structured environment, many learn the rules 'by osmosis', and may not be aware that they tend to use structured techniques.

    'Sequence' is easy, and it is the way all our CPUs work: they execute one instruction after another _unless_ one of the other logical structures provides a branching instruction. 

    'Alternative Selections' are implemented in computer languages as 'if/then', 'if/then/else', 'elseif', 'case', and 'switch' statements.  These allow the program to take alternative sequences depending on the data input into the program.  They are the equivalent of the 'branch if register...' instruction in the Little Man Computer.  

    'Loops' are special alternatives that result in a branch back to an earlier instruction that represents the 'top of the loop.'  Most languages now provide syntax for several structured loops: while, until, for/next, & for each.
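
    Here's a short Java sketch (the order-totaling scenario is invented) showing all three structures at work:

      // Sequence, alternative selection, and a loop -- the three logical structures.
      public class Structures {
          public static void main(String[] args) {
              int[] orderLines = {3, 12, 7};
              int total = 0;                  // sequence: one statement after another
              for (int qty : orderLines) {    // loop: 'for each' order line
                  total += qty;
              }
              if (total > 20) {               // alternative selection: if/then/else
                  System.out.println("Bulk order: " + total);
              } else {
                  System.out.println("Regular order: " + total);
              }
          }
      }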

    These will be presented in class.  Here are web pages about structure that will also be referenced in class.  They look at using structured techniques to help make sure that the software is 'correct', that it does the right things at the right times and produces accurate results.

    One reason I harp on structure is that the ability to demonstrate its mastery, along with Object Orientation, in a portfolio of your system design documentation, or in a behavioral interview, is one of the ways to convince an interviewer that you have the 'deep technical skills' they are looking for in many IT positions.

  • Object-Oriented Software

    Object Orientation has been very important for software development and deployment since the '90s.  OO was conceived earlier, but was rather loosely applied in the several languages that implemented it.  Through the '90s it became more and more 'standardized' in the 'Unified Process' and the various notations of the 'Unified Modeling Language' (UML). A dozen or so of the leading OO thinkers got together and agreed on OO constructs, and all of us in IT benefit as a result.

    Since then, OO concepts have been implemented more or less consistently in the OO languages: C++, Java, C#.NET, VisualBasic.NET, and several others support OO constructs like: properties, methods, events, inheritance, instantiation, encapsulation, & reusability.
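
    Here's a minimal Java sketch (the account classes are invented) touching most of these constructs:

      // Encapsulation, properties, methods, inheritance, instantiation, and reuse.
      class Account {
          private double balance;                         // encapsulated property
          Account(double opening) { balance = opening; }  // constructor, aka 'new'
          double getBalance() { return balance; }         // 'get' method
          void deposit(double amount) { balance += amount; }
      }

      class SavingsAccount extends Account {              // inheritance
          private final double rate;
          SavingsAccount(double opening, double rate) {
              super(opening);
              this.rate = rate;
          }
          void addInterest() { deposit(getBalance() * rate); }  // reuses inherited methods
      }

      public class OODemo {
          public static void main(String[] args) {
              SavingsAccount acct = new SavingsAccount(100.0, 0.05);  // instantiation
              acct.addInterest();
              System.out.println(acct.getBalance());  // 105.0
          }
      }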

    .NET is mentioned here because it was the result of Microsoft's effort to become more thoroughly compliant with this modern Unified Process and to address basic issues of database access, system security, system monitoring, and web services, along with aligning all their application development languages to work with the .NET Framework that ships with XP and later versions of Windows and Windows Server.

    The text covers OO briefly, at a very high level. Our students experience it on the ground level in all programming courses, since all languages we use in courses are both Structured _and_ Object-Oriented.

    Diagram 3.6 shows 'an object' as containing both a data structure and the methods for processing the data.  'Methods' are things like 'constructor' (aka 'new'), 'get', 'set', 'update database', 'report', 'delete', and scads of others that describe what the object can do on its own. Packaging data structures and methods this way simplifies their deployment and reuse in the future, since OO programmers can easily examine objects and determine what needs to be in the message sent to an object to make it do what they want.

    Sun's tutorial about Object-Oriented Programming Concepts is a good place to look for more about OO.  In some regards, Sun 'wrote the book' about applying OO to a programming language...

    For OO Programmers, the big three features of Objects are their Properties, Methods, and Events. These are what OO programming code, in languages like Visual Basic, C#, or PHP, mostly manipulates. In non-procedural environments like LabVIEW or Microsoft's SharePoint these OO concepts also apply.
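
    Properties and methods appear in the sketch above; events are typically handled by registering a 'listener' with an object, as in this hedged sketch using Java's Swing library (the window and button are invented):

      import javax.swing.JButton;
      import javax.swing.JFrame;

      // An event: the button fires, and the listener we registered responds.
      public class EventDemo {
          public static void main(String[] args) {
              JFrame frame = new JFrame("Event demo");
              JButton button = new JButton("Click me");
              button.addActionListener(e -> System.out.println("Button clicked"));
              frame.add(button);
              frame.setSize(200, 100);
              frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
              frame.setVisible(true);
          }
      }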

    SOAP-Simple Object Access Protocol allows computer systems of any kind to interface directly: Mainframe-to-Mainframe, PC-to-Mainframe, Customer-to-Supplier. Prior to OO it was difficult and expensive to connect with different manufacturers' machines or operating systems. Since OO and standards like SOAP, these connections have become much easier and cheaper to build.
  • Libraries, object modules, linkage editors, load modules, external subroutines, reentrant code

    These terms are back from the good old days when putting a COBOL program into production on a mainframe required intimate knowledge of the concepts they imply. The first four terms describe a process of getting high-level language translated into binary machine code, resolving relative memory locations to absolute ones, and getting that code running on the mainframe for the users. Today's development environments are often much simpler, and use different terms.

    'External Subroutines' are programs that perform tasks commonly required by other programs in the library. Rather than each programmer having to invent or copy sections of script for totaling the lines on an order, for example, an external subroutine might be written to be 'called from' any of the dozens of programs that have to display an order. This also adds value since changes that are inevitably required in common routines don't have to be made in each program that uses them -- only the subroutine needs to be edited and recompiled.
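
    In a modern language the same idea survives as a shared method. A sketch (the class and numbers are invented):

      // OrderMath.java -- a 'subroutine' shared by every program that shows an order.
      public class OrderMath {
          // Called from dozens of programs; a change here is made only once.
          public static double orderTotal(double[] lineAmounts) {
              double total = 0;
              for (double amount : lineAmounts) {
                  total += amount;
              }
              return total;
          }

          public static void main(String[] args) {
              System.out.println(OrderMath.orderTotal(new double[] {19.99, 5.25, 42.00}));
          }
      }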

    'Reentrant Code' is another concept that spans this half century of computing. This is where a program is loaded in memory for several users, devices, or processes to use simultaneously. Each user's _data_ is kept in a separate place, and the OS (or program) keeps track of the state of each user's trip through the program and their data. In the 'old days' programmers had to exercise these reentrant and 'multi-entrant' techniques in each program they wrote so that they could support dozens or hundreds of users on a computer that might have an 8-bit CPU and 64K of RAM. Today these are still a very efficient way of making application resources available for a large number of users. Some operating systems & application environments traditionally found on minicomputers do this sharing of object code 'automagically' without the programmers being burdened with these most arcane tasks.
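
    A hedged Java sketch of the reentrant idea (the scenario is invented): one copy of the code serves any number of users because each user's data is kept separately and the code itself holds no shared, changeable state.

      // Reentrant code: one copy of the method, separate data per user.
      public class SessionDemo {
          // The method touches only its parameter -- no shared mutable state --
          // so many users' threads can safely run it at the same time.
          static int nextStep(int currentStep) {
              return currentStep + 1;
          }

          public static void main(String[] args) {
              int aliceStep = 3;  // each user's state lives in its own variable
              int bobStep = 7;
              System.out.println("Alice: " + nextStep(aliceStep));  // 4
              System.out.println("Bob:   " + nextStep(bobStep));    // 8
          }
      }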

  • Data elements & data structures

    'Data structures' are used to get past a limitation of storing records on a computer's disk or RAM:  there is only _one_ order for the records, and that is where they are actually written on the media.  

    The 'relative record concept' introduced in the text demonstrates this: files contain related records, which contain fields.  In many systems records are stored as shown, where all the records are the same length and each field is the same length.

    Without any other 'mechanism' for finding records, a 'sequential search' is used to find a record, or records, that match some criteria.  In figure 3.19, a search for an address of 'Northside Mall' would take 'N times longer' than a search for '142 Maple Street'.  If there are zillions of records involved, there would be a lot of unhappy Users if a sequential search were all that was available.  Although the average number of records to be searched ends up being half the records in a file, the Users would get pretty jumpy wondering whether this was going to be a long search or a short one.

    Data structures come to the rescue here.  Most computers allow for 'direct access' to a record stored on disk.  _If_ a program 'knows' the record's relative position in a file, the mechanism will get to it pretty directly, without the sequential search.
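
    Fixed-length records make the arithmetic easy: record N starts at byte N times the record length. A sketch using Java's RandomAccessFile (the file name, record length, and record number are invented):

      import java.io.IOException;
      import java.io.RandomAccessFile;

      // Direct access: seek straight to a record's position, no sequential search.
      public class DirectAccess {
          static final int RECORD_LENGTH = 80;  // every record the same length

          public static void main(String[] args) throws IOException {
              try (RandomAccessFile file = new RandomAccessFile("customers.dat", "r")) {
                  int relativeRecord = 1042;  // the record we 'know' we want
                  file.seek((long) relativeRecord * RECORD_LENGTH);  // jump straight there
                  byte[] record = new byte[RECORD_LENGTH];
                  file.readFully(record);
                  System.out.println(new String(record));
              }
          }
      }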

    Perhaps the most common data structures today are 'indexes' that provide very quick access, by any number of 'key fields', to any record in a database: records' relative position can be determined almost instantly with very little sequential searching.  

    Figure 3.22 gives an example of how a very small index ordered by last name might look.  If indexes (indices) are properly constructed, Users can find records instantly without having to 'look up' the record's position in the file.  
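
    A tiny index sketch in Java (the names and positions are invented): the map answers 'which relative record?' instantly, and the direct-access mechanism sketched above does the rest.

      import java.util.TreeMap;

      // An index by last name: key field -> relative record number.
      public class NameIndex {
          public static void main(String[] args) {
              TreeMap<String, Integer> index = new TreeMap<>();
              index.put("Baker", 1042);  // Baker's data is in relative record 1042
              index.put("Isaacs", 7);
              index.put("Vincenti", 389);

              // No sequential search: the position comes back immediately.
              System.out.println("Isaacs is at relative record " + index.get("Isaacs"));
          }
      }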

    It's common for applications to let their users search by name, address, zip code, phone number, email address, social security number, account number, stock location, or any other fields that might apply in a particular record-keeping situation.  Even in a huge database like the IRS or a credit card company keeps, it doesn't take longer for the system to find the last record on the disk than it does to find the first.

    The cost of the quick access is usually more disk space. A table that has lots of indexed fields (lots of 'inversions') may require more storage space for the indices than for the data that are indexed. Today, disk storage is relatively cheap, clerical time is relatively expensive, and no system should make its users spend much time searching for records, no matter how large the database.

    We'll look at a few data structures in class: Linked Lists, Stacks, Queues, and Hashes.
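
    As a small preview (my example, not the text's), Java's ArrayDeque can play the part of either a Stack or a Queue:

      import java.util.ArrayDeque;

      // A stack (last in, first out) and a queue (first in, first out).
      public class StackAndQueue {
          public static void main(String[] args) {
              ArrayDeque<String> stack = new ArrayDeque<>();
              stack.push("first");
              stack.push("second");
              System.out.println(stack.pop());    // second -- LIFO

              ArrayDeque<String> queue = new ArrayDeque<>();
              queue.add("first");
              queue.add("second");
              System.out.println(queue.remove()); // first -- FIFO
          }
      }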

  • Database Management Systems

    DBMSs are software that use various data structures to store and retrieve records so that programmers don't have to know, or invent, the tricks for storing and retrieving them.  

    The alternative to using a DBMS to manage data is putting the data into 'flat files' and letting programmers manage the best they can when they need to create, read, update, or delete (CRUD) records.  Without very strong coding standards and architecture, flat files may lead to a 'Tower of Babel' effect: the more programmers who have worked on the code, the more tricks must be learned by those who work on it next.

    Today, the value of DBMSs is recognized and very few systems are written without one. 

    A DBMS keeps all the data in its database and provides methods for CRUD as well as for securing records from those who might only be allowed to R, and not to C, U, or D. With a DBMS, programmers don't have to worry about the complexity of data structures used to locate a record's location on the disk, they just have to know the 'database language' and how to use it in their programs.  

    'Relational Databases' are most common today.  They are all based on 'tables' with a familiar 'columns and rows' organization for records. RDBMSs allow records to relate to one another by storing 'foreign keys' that 'point to' other records. 'Structured Query Language' (SQL) is the more-or-less standard database language used in relational databases. Although the Q implies this language might do Query only, SQL has 'insert' and 'update' verbs used to write records on the database.
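
    A sketch using Java's JDBC API (the connection string, table, and column names are invented) showing SQL writing records as well as querying them:

      import java.sql.Connection;
      import java.sql.DriverManager;
      import java.sql.PreparedStatement;
      import java.sql.ResultSet;

      // SQL through JDBC: insert a record, then query it back.
      public class SqlDemo {
          public static void main(String[] args) throws Exception {
              // The URL, user, and password below are placeholders for a real database.
              try (Connection conn = DriverManager.getConnection(
                      "jdbc:mysql://localhost/shop", "user", "password")) {
                  PreparedStatement insert = conn.prepareStatement(
                      "INSERT INTO customers (name, city) VALUES (?, ?)");
                  insert.setString(1, "Baker");
                  insert.setString(2, "Richmond");
                  insert.executeUpdate();  // SQL writes records, too

                  PreparedStatement query = conn.prepareStatement(
                      "SELECT name, city FROM customers WHERE city = ?");
                  query.setString(1, "Richmond");
                  ResultSet rows = query.executeQuery();
                  while (rows.next()) {
                      System.out.println(rows.getString("name") + ", " + rows.getString("city"));
                  }
              }
          }
      }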

    Although the relational model is perhaps the most adaptable and simplest to understand, it is largely based on 'relational algebra', and its tables of fixed-width fields and records echo the punched cards that actually _did_ have fixed-width fields and records. It is not always the most efficient model for handling large volumes of transactions or records, or fields that may vary wildly in length.

    So, other database models persist, most of which are valuable because they perform better than the relational model in some situation or another.  'Heavy transaction processing' in a mainframe environment, for example, might stress a relational DBMS, and another model, perhaps 'hierarchical' or 'network', will outperform it.

    Common DBMSs are: IBM's DB2, IMS, and their recently acquired U2 products UniVerse and UniData; Sperry/Univac's DMS is common in government systems; MySQL and PostgreSQL are Open Source offerings that run on Windows and *ix platforms; Sybase is on a lot of minicomputers; Microsoft's SQLServer is maturing nicely and will take more market share in the medium-to-large server market over the next years; and there are many others too numerous to mention.

    We'll look at a couple of DBMS examples in class: Access & MySQL...