'Storage' refers to 'Secondary Storage', usually magnetic disk, where data may be kept permanently or indefinitely if it's managed properly. RAM and 'memory' refer to 'Primary Storage', which is used to keep data close to the CPUs while they're working with the data. Primary storage is 'volatile': it will lose its data if it loses power. A well-managed disk storage system can be nearly 100% reliable and available.
HDD-Hard Disk Drives have been the ordinary secondary storage medium for data accessed on-line since the 1960s, when they supplemented, then replaced, magnetic tape for data storage. Tape continues in use for backup of disks and transaction logging. HDDs are 'direct access media' where a read/write assembly can quickly find any cylinder, head, and sector. HDDs can also handle sequential files, and can operate hundreds of times faster when disk files are processed sequentially. Since the '90s SSD-Solid State Drives have become more and more affordable and will soon eclipse HDD technology, which is also becoming less expensive.
In the '10s Automatic Tiered Storage has emerged to get the best price and performance from storage systems. In a system with automated tiered storage, data that needs the most and quickest access is kept on the very fast but more expensive solid-state SSD or flash storage that works more than ten times faster than HDD. As data becomes less-needed it is moved out to the slower HDD storage tier for 'pretty quick' random access, and may eventually be archived on optical media or tape for infrequent reference. Tiered storage works at enterprise scale, all the way down to PCs, where many 'hybrid drives' with both SSD and HDD capacity keep a copy of recently and frequently used data on SSD and remove the copies from the SSD as they become less-frequently accessed.
Even though disks allow random access, it can take a long time to search for a record on a large disk. A sequential search for one of a billion records requires an average of half a billion reads, which would take a long time.
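As a rough back-of-envelope sketch of that claim (the read rate below is an assumed figure for illustration, not any particular drive's spec):

```python
# Expected cost of a sequential search for one of a billion records.
# READS_PER_SECOND is an assumed figure, not a measured one.

RECORDS = 1_000_000_000        # one billion records
READS_PER_SECOND = 10_000      # assumed sustained read rate

expected_reads = RECORDS / 2   # on average, half the file is scanned
seconds = expected_reads / READS_PER_SECOND

print(f"expected reads: {expected_reads:,.0f}")
print(f"at {READS_PER_SECOND:,} reads/sec: about {seconds / 3600:.1f} hours")
```

At these assumed figures a single lookup would take nearly 14 hours, which is why the indexes discussed next matter so much.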
Records and directories on HDD may be 'indexed' so they may be accessed nearly-instantly by any number of 'keys'. Data structures on huge scales, billions and billions of records, can retrieve or update any record with a maximum of several hits on an index scattered across disks. Indexed Sequential, Hashed, Balanced Tree, and other data structures provide very quick access to the records of commerce. Indexes consume space on storage devices, but the tradeoff is very quick access to data.
A simple example of indexing would be the directory for a company with thousands of employees. A more complex example is 'search engine indexing' which allows Google and other web search engines to provide sub-second response time with useful results for practically any query.
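A minimal sketch of the employee-directory idea, using invented records and Python dictionaries as stand-ins for on-disk index structures:

```python
# Toy 'index' sketch: look up employees instantly by more than one key.
# The records, names, and fields here are invented for illustration.

employees = [
    {"id": 1001, "name": "Ada",   "dept": "Accounting"},
    {"id": 1002, "name": "Grace", "dept": "Engineering"},
    {"id": 1003, "name": "Edgar", "dept": "Engineering"},
]

# Build one index per key: a dict lookup avoids scanning the whole file.
by_id = {e["id"]: e for e in employees}
by_dept = {}
for e in employees:
    by_dept.setdefault(e["dept"], []).append(e)

print(by_id[1002]["name"])                         # direct hit by primary key
print([e["name"] for e in by_dept["Engineering"]]) # all records for one dept
```

The extra dictionaries consume memory, just as real indexes consume disk space, but each lookup is nearly instant instead of a full scan.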
Thumbdrives & other memory sticks, CD, and DVD are other non-volatile, secondary storage devices that can transfer data to and from a machine's RAM, primary storage, for processing.
Secondary storage includes 'off-line' tapes or disks, and 'near-line' technologies like this Tape Storage Robot. Off-line storage means that somebody has to retrieve the tape or disk and mount it on a drive to read it -- this might take minutes or hours depending on where the tape is kept.
With near-line storage, tape robots and jukeboxes can keep huge amounts of data on magnetic tape cartridges where the robot can find and spool a tape to a magnetic disk within seconds, where its data may be accessed directly. Near-line storage isn't exactly new -- IBM and other legacy manufacturers have been providing robot-tended mass storage since the 1950s.
Punched cards were the ordinary data storage medium from the 1890s well into the 1970s. Punched cards were used for EDP-Electronic Data Processing long before computers arrived in the data processing departments.
The first use of binary data on cards was by the Frenchman Jacquard in the mid-1800s. His looms used data punched on cards to weave tapestry, brocades, and other complex patterns into fabric. Charles Babbage, Herman Hollerith, and others saw how punched cards could be used to store and process data, and by the late 1800s Hollerith's and others' machines were kept busy processing business and government data.
Paperwork from business operations, the census, or the courthouse was batched and the data punched onto cards using Keypunch Machines by data entry clerks at a rate something like 10,000 keystrokes per hour. A second clerk at a 'verifying punch' keyed the batch again while reading the stack of cards to check for and correct errors. Batch and 'hash' totals helped keep errors to a minimum, but there were always errors in batched data.
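The idea behind batch and hash totals can be sketched in a few lines; the invoice numbers and amounts below are invented:

```python
# Batch controls: if a re-keyed batch produces the same totals as the
# original, keying errors are unlikely; if not, the batch is checked by hand.

batch = [
    {"invoice": 4411, "amount": 125.00},
    {"invoice": 4412, "amount":  75.50},
    {"invoice": 4413, "amount": 210.25},
]

record_count = len(batch)                       # control count of records
batch_total = sum(r["amount"] for r in batch)   # meaningful total (dollars)
hash_total = sum(r["invoice"] for r in batch)   # meaningless sum of invoice
                                                # numbers, kept only as a check
print(record_count, batch_total, hash_total)
```

A dropped card changes the record count; a mis-keyed amount changes the batch total; a mis-keyed invoice number changes the hash total.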
Computers arrived in Data Processing of ordinary businesses in the 1960s, adding to other heavy machines that had been processing data punched on cards since the early-1900s. Keypunches, verifying punches, card readers, card sorters, gang punches, tabulators, printers, bursters, and decollators made for a very noisy environment. The EDP department was busy after close of business and through the night getting the day's business punched onto cards, balanced, processed, and reported to management the next morning.
[[Check Google's images for 'hollerith tabulator, print, list', card sorter, unit record equipment, ibm or univac line printer, decollator, burster, and punched cards for an assortment of technology in data processing operations as computers arrived in the 1960s.]]
Punched cards provided a durable record that could be read dozens or hundreds of times by sorters, tabulators, gang punches and other 'unit record' devices. After data was punched into the cards, the rest of the processing was automated. Punched cards were also 'human readable', with the contents printed on the top edge of the cards. This allowed them to be processed manually if needed, perhaps to correct errors found in invoices or reports. It was easy to duplicate cards holding critical data and move them off-site.
These 'unit record' machines handled all sorts of business, with some tabulators able to print a couple or several listings in one pass through huge stacks of cards. Large tabulators included card punches so summary data could be output to cards used as input for later processing of weekly, monthly, or annual reports.
Hollerith and IBM tabulators were the ultimate in the line of clockwork calculators stretching from the Renaissance through the industrial revolution. These electromechanical devices had limited logical capabilities optimized for sequential processing of records punched on cards.
Tabulators were programmed using plugboards (google 'tabulator plugboard') that were latched into the machine to set it up for running batches of punched cards. Using several counters and accumulators, tabulators could do simple math and print shipping lists, invoices, sales commissions, statements of account, past-due notices, financial statements, inventory and management reports, and other business documents and reports.
Magnetic tape blended into EDP through the '50s and '60s, where a tabulator or other machine could read tape as well as cards. (Google 'early data processing'.) It was very easy to copy a batch of data from a stack of punched cards onto tape for further processing. Or, a 'key to tape' desk wrote the records directly to tape.
Punched Tape was also common from the early 1900s into the '70s. Teletypes could transmit documents from punched tape and punch a tape as a telegram or telex was received. Many offices had typewriters that could punch tape. Early, small computers used by statisticians and engineers used paper tape. There were several in the Hibbs building that we used to learn programming in the '60s.
Magnetic Drum and Magnetic Disk were introduced in the late 1950s, but were relatively expensive through the '60s compared to the legacy of tape and card-oriented data processing equipment. Magnetic drum and disk storage were used only for tasks that required a 'real time answer', like supporting telephone inquiry or repeating radar screens and providing quick replay in a defense or air traffic control facility.
From the '60s into the '80s the cost of magnetic disk storage was roughly $1,000 per Megabyte, prohibitively expensive for many businesses.
Where it could be afforded, it was easy to point out the value of instant access to records as magnetic disks and 'data terminals' became more feasible for data processing through the '70s. The savings in punched cards and paper for reports could offset the cost of a data terminal in a year or two. [[Google green bar paper]] With many management and audit reports only valid for one day, there was a lot of paper to trash, or recycle. The EDP department for a medium-sized business could use a pallet or dumpster load of punched cards and 11X17 paper every few days.
As the cost of on-line terminals dropped and paper rose, it was easy to justify terminals on desks. The improvements in customer service, inventory control, and other business processes made the cost of new technology easier to bear.
OLTP - Online Transaction Processing, POS - Point of Sale, and all kinds of point of transaction data acquisition quickly eclipsed punched cards for those companies that could afford a new computer system with disks, terminals, and primitive networks. OLTP with batch controls greatly reduced the error rate since the people involved in the transactions were keying the data and could catch errors on the spot. Terminals removed the cost of data entry and verification by clerks who were punching cards long after the order was handwritten or typed and were in no position to catch errors.
As small computers came on the scene in the '80s HDDs-Hard Disk Drives evolved to become very economical for storage of all kinds of data for almost any application. As the market for HDD for personal computers and servers warmed up into the '90s magnetic disks became less and less expensive as the capacity doubled every couple of years.
HDD-Hard Disk Drives were supplemented with 'floppy disks' in the '70s, making an economical way to store and transport data and software as small computers came on the scene for word processing and all kinds of business. 'Sneaker Net' was common before LAN tech came on the scene, where you'd write files to share on a floppy and walk it to a colleague or put it in the mail.
Early magnetic disk drives were simply called 'disks' or 'disk drives' for a couple decades into the '70s. The term 'hard disk' or 'HDD-Hard Disk Drive' came along in the '70s to differentiate hard disks from several forms of Floppy Disks that were widely used in word-processing and early desktop computers. Bernoulli Disks and Zip Disks emerged and became very common in business, schools, and homes.
Magnetic Disk technology has seen dramatic changes in capacity and speed of access over the past decades with very little change in the basic geometry or physics of disk devices since they became feasible for business use in the '60s. Early disks were as large as 39" and shrunk to 24" by the time they were widely adopted in computer rooms. They spun at a leisurely 1,200 rpm.
Today's ordinary HDD is much smaller and spins at 5,400 rpm, with state-of-the-art disks spinning at 10,000 or 15,000 rpm. It's expensive to manufacture a disk that won't distort or fly apart at higher rpms, so we may have reached the plateau for HDD's speed of rotation.
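One reason spindle speed matters: on average the sector you want is half a revolution away, so average rotational latency falls directly out of the rpm figure. A quick sketch of the arithmetic:

```python
# Average rotational latency is half a revolution:
#   latency_ms = (60,000 ms per minute / rpm) / 2
# The rpm values are nominal spindle speeds, for illustration.

def avg_rotational_latency_ms(rpm: float) -> float:
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2   # the target sector is half a turn away, on average

for rpm in (1_200, 5_400, 15_000):
    print(f"{rpm:>6} rpm -> {avg_rotational_latency_ms(rpm):.2f} ms")
```

The 1,200 rpm drives of the '60s waited about 25 ms per access just for rotation; a 15,000 rpm enterprise drive waits about 2 ms.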
Read/write heads don't touch the disk surface, but fly so close to it dust or smoke particles can be trapped and ruin the magnetic surface. There were lots of 'head crashes' on early disks where dust, or somebody bumping into the chassis of a disk drive, caused the read/write head to scratch or gouge the magnetic surface. The first line of defense was filters on CRAC-Computer Room Air Conditioners that made computer rooms relatively dust-free.
Today's HDDs are usually packaged as sealed units with simple pressure equalizers that keep dust out but let the sealed unit 'breathe' in response to changes in atmospheric pressure. These sealed units, called 'Winchester Disks', were introduced in the mid-70s by IBM as 30 MegaByte cartridges containing magnetic platters and a read/write mechanism that could be slipped in and out of a disk drive the size of a washing machine. Two spindles in the IBM 3340 gave 60 MegaBytes of on-line storage -- '30/30', like a Winchester 30-30 Repeating Rifle, hence the name. Winchester technology helped make disk storage more reliable by eliminating dust from disk drives.
As the first small disk drives arrived on the market for PCs in the '80s, most of them were Winchester technology, with the whole mechanism in a sealed unit. The 'form factor' for the new generation of hard-drives arriving in the '80s was 8", quickly followed by 5 1/4" for several years, then 3 1/2" and 2 1/2" drives as small PCs and notebooks came along. 1.3" and 1" MicroDrives were used in small computers and cameras, with early units of 20 MegaBytes capacity, growing to 8 GigaBytes as they were eclipsed by flash memory.
Today's USB drives, aka Thumb Drives, and 'the cloud' have eclipsed most of the portable, magnetic media. This leaves us with highly reliable, sealed disk units in most desktop and laptop computer systems and solid-state drives or flash memory in our portable computers and phones.
Early disk drives for computers in the '60s and '70s were manufactured with proprietary interfaces that would only attach to the manufacturer's computers. Most of today's disks have standard interfaces and 'form factors' so any manufacturer's drive will work on any manufacturer's computer. MFM, RLL, SCSI, IDE/PATA, SATA, and SAS are, or have been, standard interfaces adopted since PCs arrived in the marketplace in the '80s.
Starting in the '80s with the advent of PCs, interfaces for smaller computers were standardized, with MFM and RLL disk controllers dominating the market for personal computers, word-processing systems, and small servers. MFM and RLL drives had no electronics on them, only the motors that spun the disks and positioned the read/write heads. All the electronics were on the computer's main board or 'hard disk controllers' plugged into the bus. These drives required periodic maintenance to reformat and refresh the data stored on them. But, they were much less expensive than the proprietary HDDs used on mainframe and midrange computers in the prior decades.
From the '80s through about 2010 workstation/server and mid-range computers in business and enterprise servers were likely to use SCSI-Small Computer System Interface, pronounced 'scuzzy'. SCSI was a common interface for scanners, printers, and tape drives as well as magnetic disks. SCSI drives were desirable because one SCSI controller can handle up to 15 devices. SCSI drives were more expensive, but were usually faster and better engineered, with MTBF-Mean Time Between Failures of ten or fifteen years. SCSI has had a very long run for enterprise computing, from the '80s through the '00s.
In the late '80s IDE-Integrated Drive Electronics came to the PC and server market, using PATA-Parallel ATA interfaces, usually called 'IDE Drives'. IDE was a leap in technology that put 'drive electronics', a 'little computer', on each disk drive to manage its operations. This made disks more intelligent and reduced the complexity of the disk controllers on the computer.
IDE remains a key feature of modern HDDs and SSDs, regardless of the style of interface. IDE supports ZDR-Zone Density Recording and SMART-Self Monitoring Analysis and Reporting Technology which is essential for the management of disk storage. IDE drives also refresh formatting data on the lowest platter of our drives so that 'low level formatting' is no longer required, where it was an annual event on earlier drives.
In the early 2000s SATA-Serial Advanced Technology Attachment replaced PATA as the ordinary disk interface and eclipsed both SCSI and PATA in speed. In 2016, SATA continues to be the ordinary HDD technology for personal computers and notebook computers, and for many servers. Although the consumer-grade SATA controller can control only two SATA disks, SATA can be configured to handle large numbers of drives, can handle 'external drives' with eSATA, and SATA drives can be 'hot swappable', allowing them to be swapped in or out without powering down the server.
An ordinary SATA drive is cheaply constructed and would be beaten to an early death if deployed in an enterprise server that is busy 24 hours a day. SATA is better suited to occasional use through the day by a single user, where a drive is likely to run its expected service life of five years.
SAS-Serially Attached SCSI emerged in about 2005 as the new 'enterprise disk drive', valuable for enterprise but of little benefit for a home computer or small server. SAS drives cost two or three times more than a SATA drive with similar capacity, but they are engineered to higher standards, are full-duplex (can read and write at the same time), can rotate up to 15,000 RPM compared to an ordinary drive's 5,400, and seek times are twice as fast. SAS provides many options for network managers to tune and optimize performance. Where an old parallel SCSI controller could address 15 devices, SAS can reference more than 65,000, ideally suited for managing huge storage arrays for cloud computing and enterprise applications.
SAS drives can be very quick, with transfer speeds of up to 12 Gbps where SATA tops out at 6 Gbps and most are slower.
Lots of prior-generation SCSI drives remain in data centers and network rooms today because they were well-built and have extremely long service lives. They are being obsoleted and new units are likely to be SAS HDD or SSD configured as SAS.
As of 2017, SSD - Solid State Drives are about twice the price of HDD with similar capacity, are about a dozen to twenty times faster, and have other advantages over HDD. HDD will likely become obsolete rather quickly as some remaining limitations are overcome for SSD technologies. HDD 'storage density' has been doubling every couple of years since disks were introduced in the '60s. We wonder how long this doubling can keep up.
The disk and solid state drives in well-managed systems are seldom deployed as single units. They are usually deployed in arrays of two or more drives, called a RAID-Redundant Array of Independent Disks. RAIDs of three or more disks are used to extend the storage space beyond what a single drive can provide. For example, a RAID5 made of four 1 TByte drives provides about 3 TBytes of storage space.
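Usable capacity for the common levels can be sketched with a small function. This is a simplification: it assumes equal-sized drives and ignores formatting overhead:

```python
# Usable capacity of a RAID, given the level, number of drives (n),
# and the capacity of each drive. A sketch of the standard levels only.

def usable_capacity(level: int, n: int, size: float) -> float:
    if level == 0:              # striping: no redundancy, all space usable
        return n * size
    if level == 1:              # mirror: one drive's worth survives
        return size
    if level in (3, 4, 5):      # one drive's worth of parity
        return (n - 1) * size
    if level == 6:              # two drives' worth of parity
        return (n - 2) * size
    raise ValueError("level not modeled in this sketch")

print(usable_capacity(5, 4, 1.0))   # four 1 TByte drives in RAID5
```

The four-drive RAID5 example from the text comes out to 3.0 TBytes, matching the "about 3 TBytes" figure above.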
RAID1 and higher provide redundancy to continue operations and recover gracefully in the wake of the inevitable failure of one of the array's drives. RAID1 and higher can continue operation when a drive fails -- most can survive the failure of one drive in the array. The operating system sees the RAID as a single unit capable of block transfer and DMA - Direct Memory Access just like a single drive.
A MTBF-Mean Time Between Failures of 5 years is a good guess for inexpensive disk drives, up to 15 if you pay a lot more. This makes about 1/5 or 1/10 a conservative probability that a disk will fail in a year. In practice, drives are more likely to fail very early in their life so network managers like to 'burn in' their disks to rule out manufacturing defects before putting them into service. There are lots of disk drives showing zero errors years past their service life, but we can't trust valuable availability to luck. Any disk may fail, even those that sit on a rack with conditioned, continuous power. Higher level RAID schemes manage the redundancy in disk arrays and greatly increase the availability of data relative to single disks.
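A hedged sketch of that arithmetic: treating failures as exponentially distributed (a common simplifying assumption, not a claim about any real drive), the chance a drive fails within a year works out close to the 1/5 or 1/10 figure quoted above:

```python
import math

# With MTBF of m years and an exponential failure model, the probability
# a drive fails within one year is 1 - e^(-1/m). The quoted 1/m figure
# is the first-order approximation of this for large m.

def p_fail_in_year(mtbf_years: float) -> float:
    return 1 - math.exp(-1 / mtbf_years)

for mtbf in (5, 15):
    print(f"MTBF {mtbf:>2} yr: ~{p_fail_in_year(mtbf):.1%} chance of failing this year")
```

For a 5-year MTBF the exact figure is about 18%, close to the conservative 1/5 rule of thumb.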
Here is the Wikipedia Article on RAID.
There are several Standard RAID Levels from 0 through 6. The level doesn't indicate how many drives are in the RAID, it indicates what techniques are used to make the RAID. A RAID0 may have 2 or more drives, a RAID5 may be made up of 3 or more drives, and a RAID3 might have 5 drives. Some manufacturers or storage units use proprietary schemes for RAID other than these standard levels.
'Hybrid' or 'nested' RAIDs are often used these days. For example, a RAID 10 is made up of two RAID 1 mirrors.
Here is a RAID Level Tutorial with good pictures from The Geek Stuff.
RAID0 Provides Zero Redundancy. RAID0 is used to combine independent disks to make more space available than a single disk can provide. For example, a BLOB-Binary Large Object of 3.5 TeraBytes is too large to fit on a single 1 TByte drive, but it can fit on a RAID0 with 4 1-TeraByte drives. The RAID0 appears to the operating system as a single 4 TByte volume. The BLOB can be loaded and analyzed there, but a RAID0 shouldn't be regarded as non-volatile storage.
One RAID0 the instructor provided some years back was for a video editor who used a big Mac that worked fine for the 30 or 60 second commercials produced by his agency. When they got a contract to make documentaries there wasn't enough disk space to hold the larger scenes. A RAID0 of four disks in a 'Clarion' storage unit came to the rescue. Because he immediately spooled his work to tapes for backup the un-reliability of RAID0 wasn't an issue.
A RAID0 is much more likely to fail! It is _more likely_ to fail than a single large drive, since the loss of any one drive in a RAID0 will cause the whole array to fail.
In a RAID0 the probabilities of a drive's failure are added together rather than multiplied. So, a RAID0 of 2 disks is roughly twice as likely to fail and a RAID0 of 4 disks is roughly 4X more likely to fail. Nearly every RAID0 the instructor has operated for more than a year over the decades has failed -- way back on mid-range computers and more recently on two TiVos with external SATA drives.
RAID levels 1 and higher greatly enhance the reliability of disk storage systems by deploying two or more disks in an array rigged so that any _one_ disk in the array may fail without losing data or availability. Although there is some chance one drive in a RAID will fail, there is a much smaller probability that two independent drives in a cleanly-powered array will fail _at the same time_. The probabilities of failure are _multiplied_ in this case, resulting in a very small likelihood of disk failure.
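The 'added' versus 'multiplied' contrast can be sketched with an assumed per-drive annual failure probability (the 10% figure below is illustrative, per the 1/10 rule of thumb earlier):

```python
# p is an assumed probability that one drive fails within a year.
p = 0.10

# A RAID0 of n drives fails if ANY drive fails:
def p_raid0_fails(n: int, p: float) -> float:
    return 1 - (1 - p) ** n     # ~ n*p for small p, hence "added together"

# A RAID1 mirror pair loses data only if BOTH drives fail (independently,
# before either is replaced):
def p_raid1_fails(p: float) -> float:
    return p * p                # probabilities multiplied

print(f"RAID0, 4 drives: {p_raid0_fails(4, p):.3f}")   # much worse than one drive
print(f"RAID1 mirror:    {p_raid1_fails(p):.3f}")      # much better than one drive
```

With these assumed numbers a 4-drive RAID0 has about a 34% chance of failing in a year, while a RAID1 mirror is down around 1%.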
RAIDs also increase the size of the 'volume' available to the operating system. It's always simpler to maintain one volume than many, and a RAID5 with 5 disks provides 4X more storage than a single drive. There are also performance advantages with more drives in a RAID because more cylinders are available for read/write where a single disk only has one set of arms moving. Modern 64-bit CPUs and LBA-Logical Block Addressing make larger RAIDs possible, and they're very fast.
As you research RAIDs, consider that the level of the RAID does not indicate how many drives are in it. It describes the technology used to make the RAID work to provide redundancy and enhance performance. Wiki and google are your friends with more details...
When a disk fails a RAID1+ continues to operate with only a small impact on performance. The network manager can replace the failed disk or the entire RAID at a time that will not inconvenience the users. Mid-range computers, mainframes, clouds, storage systems, and hyperconverged systems may automatically manage the redundancy and assist network managers in the replacement of failed resources with no downtime.
Windows and Linux can manage RAIDs, or they may be operated as separate storage units in a NAS or SAN. RAID1 is easy to set up with two identical disks, and the tools to monitor performance and identify a failed drive are straightforward. These are valuable skills that can be developed with free software and a spare PC!
In some RAID chassis, a failed drive will be clearly identified and some can be 'hot swapped' -- replaced with a new drive without taking the unit off-line. The processor in the RAID chassis will quickly re-mirror the data and add the new drive to the existing array. If the RAID has any age on it, it might make more sense to replace the whole array; it's a good idea to keep a spare RAID that's been burned in.
RAID1 provides redundancy by completely 'mirroring' a drive with another drive. The drives normally operate 'in parallel' so every time one is updated the other is updated exactly the same. If one drive fails, the mirror is there to take over with no loss of data or performance. It's simple, uses 50% of disk capacity for redundancy, and provides little or no performance increase.
Please google for details of higher-level RAID. What follows is a quick rule of thumb:
RAID3 and up use different techniques to provide redundancy, like 'data striping' and parity. The major principle for higher-level RAIDs is: if we spread data across an array of drives we don't need to mirror _all_ the bits to keep operating if one drive in the array fails. These RAIDs write just enough 'parity data', Hamming code, or other code on the other disks to reconstruct the data missing if a disk drive fails or develops a bad cylinder.
The 'cost' in storage space to achieve greater reliability is approximately the capacity of one drive in the array. For example, in a RAID3 with 5 disks of 1 TeraByte each the total storage available to the operating system would be approximately 4 TeraBytes, only 20% of the drives' capacity is used to provide redundancy. Compare this with a RAID1, with a mirrored pair of disks, which uses 50% of the drives' capacity for redundancy.
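The parity trick itself is just XOR. A toy sketch with three tiny 'drives' (the byte values are invented) shows how any single lost block can be rebuilt from the survivors plus the parity block:

```python
from functools import reduce

# Three data blocks standing in for three drives in a parity RAID.
data_blocks = [b"\x12\x34", b"\xab\xcd", b"\x0f\xf0"]

def xor_blocks(blocks):
    """XOR corresponding bytes across all blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

parity = xor_blocks(data_blocks)            # written to the "parity drive"

# Simulate losing drive 1, then rebuild it from the survivors + parity:
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data_blocks[1])            # the lost block is recovered
```

XOR works here because each byte of parity cancels against the surviving data bytes, leaving exactly the missing byte. That's why one drive's worth of parity is enough to survive one failure.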
Some higher-level RAIDs with more disks offer significant performance benefits relative to a single drive or RAID1 because there are more read/write heads active so more than one cylinder may be updated at a time.
RAID10 is widely used, where data is striped across two RAID1 mirrors. This is a simple scheme to implement and operate even though 50% of the drive space is given to redundancy. HDD/SDD are cheap today, every increment of reliability is valuable in a mission-critical system, and simpler solutions are less likely to gang awry. RAID10 also enhances performance in some applications because they can read/write to more than one cylinder at a time.
Look at Wikipedia and other references about RAID if you're interested in the details of exactly how the several RAID levels differ.
RAID does not replace the need for offsite transaction logging and backup! RAID does not replace remote warm or hot sites and parallel operations. RAID1 and higher can make disk storage much more reliable than stand-alone drives but it does not replace backup or transaction logging. There are several ways a RAID may fail or lose data, from a failed RAID controller through a clumsy or malicious employee who destroys data using the File Explorer or command line.
The #1 rule for transaction logging and backup is that the log and backup media must be off-site, preferably some distance from the system they back up. In the event of a building or regional disaster or theft a local RAID is as likely as the rest of the system to be lost or stolen. RAID levels 1 and higher are to improve reliability, performance, and mitigate risk of loss of availability, NOT to provide backup.
It's not safe to consider any one machine or system 100% safe or reliable. A RAID may suffer the loss of a 2nd drive before an inattentive network manager is aware of the problem. A RAID controller can fail. Or, a careless, clumsy, or malicious employee or cracker can destroy or corrupt the data on the RAID.
If a disk drive or RAID loses power while in operation it may be 'corrupted' and require lengthy fsck or other recovery before it's available for use. This risk is _mitigated_ but _not eliminated_ by using PCU-Power Conditioning Units, UPS-Uninterruptible Power Supplies and backup generators on all computers, storage, and networks.
In 2015 Hard Disks remain the most used technology to keep data 'on-line', with SSD-Solid State Drives gaining market share as they become less expensive.
SSDs are a dozen or more times faster than HDDs in 2017. SSDs do not suffer 'rotation' and 'seek' delays since there are no moving or spinning parts. And their semiconductor circuits are much less fragile than HDDs, which makes them ideal for portable computers. SSDs are about a dozen times as expensive as magnetic disks, but were 50X to 100X more expensive per megabyte several years back.
The speed advantage with SSD can be dramatic with 'disk bound' operations. Booting up a PC or operating a DBMS is much faster with SSD technology. Database servers thrive on speed of access. SSD is cheap relative to slow access for customers and employees.
'SSD' is flash memory optimized for the application and packaged in a unit that attaches to a disk controller using the same cables as an ordinary SATA, SCSI, or SAS disk drive. The SSD behaves just like the disk technology it replaces, including SMART. Other schemes for packaging the flash memory don't maintain compatibility with the legacy disk controllers and may not do block-mode transfer or DMA.
It's likely that SSD and other flash storage technologies will eclipse HDD for practically every application within a few or several years. Really, why would somebody want to keep data on a bunch of spinning disks accessed by flying & seeking heads when some solid-state structures can do better?
In recent years we need to add SSD-Solid State Drives to any discussion of storage. SSD technology has become less and less expensive and more and more desirable for both personal and server platforms and is replacing the older HDD technology in many applications.
The price 'per megabyte' of hard disk storage has dropped from $1000 as the technology matured in the '70s to a few pennies in 2015. SSD in the '90s was $50 per megabyte, is now around $1. Since SSD is a superior technology it will likely make HDD obsolete within several years as SSD prices come to par with HDD.
'Solid State Disk', as some say, is a misnomer since there are no disks involved. The term has stuck because a SSD _behaves_ the same as the Hard Disk Drive except they are dramatically faster because there are no moving parts. The ordinary SSD can plug into the same disk controller as its HDD counterpart: SATA, SCSI, and ancient IDE controllers can all support both HDD and SSD.
The speed for SSD access is much faster, five to twelve+ times, than for HDD-Hard Disk Drives. But, SSD access is still thousands of times slower than RAM access, so it can't replace RAM.
A solid state drive is semiconductor memory without any moving parts so the SSD is not nearly as fragile as a hard disk drive. The SSD makes for much more reliable notebook computers, especially the 'ultrabooks'. Notebooks' HDDs are notoriously un-reliable because they're jiggled while in use and get hard knocks as they travel with us in backpacks and bags or on the floor of a car.
SSDs are also not 'magnetic' and hold their data in non-volatile memory, in the form of 'floating gates' implemented with VLSI circuits. SSDs use 'flash memory' similar to the non-volatile memory in SD cards and memory sticks that we attach via USB or other slot on the bus. The SSD version of flash memory is inside a chassis that behaves like a HDD and attaches via a disk controller like SATA or SCSI.
Google 'SSD Array' to see how these devices can be packaged for 'big data'.
SSDs are 'greener', requiring less power to operate and cool. While they're idle they consume very little power, as low as .1 watt, and only about a watt under full load. There is negligible time for them to 'spin up', so they can be switched off and consume no power until needed. This compares to 6-8 watts for a HDD which takes valuable seconds to spin up. Starting and stopping HDDs shortens their service life, so most HDDs in server applications are powered-up and spinning for their entire service life.
Earlier SSD technology was limited in how many times data could be 'written over' so the best uses were for WORM-Write Once Read Many application such as archiving data that will never be updated. This restriction has been overcome in recent years so SSD can work for OS and other files that are frequently updated.
'Expect Exponential Growth' is a mantra at conferences for IBM mid-range and mainframers. The phrase recognizes the many businesses that have gone bust trying to accommodate 100+% growth when their non-scalable, server-based systems couldn't handle the business. Organizations are expected to grow 'incrementally', maybe a few percent or more a year. But sometimes they double or quadruple, growing exponentially, maybe because they 'catch on' or buy a much larger competitor. Without a strategy to handle exponential growth, it can cripple the organization.
One reason mid-range and mainframe computers can scale up to handle such large loads is because all the components are in one or more chassis, with multiple busses that extend across the chassis and 'channels' that interconnect the busses. RAM, CPUs, HDD, SSD, and high-speed optical WAN and LAN networks are all in there, attached to the same busses and channels that can move data much faster than a network can. And, they are fault-tolerant and can be expanded or repaired without taking the system down -- most run their entire service life without any un-planned downtime, and some are never down. Mainframes hold the records for sheer speed of absorbing transactions and handling databases. Mid-range and mainframe computers are well-understood technologies that are well-supported by hardware and software engineers and are easy to run 'in parallel' to provide geographically separated redundant systems. They can 'scale up' their storage into DASD chassis that attach to the main busses.
Other approaches to scaling involve large numbers of smaller machines in Clusters or Server Farms that are controlled by software to provide a dead-reliable system using generic server hardware. Cloud operating systems can do this: Windows, RedHat, Oracle, IBM, and several other vendors' software will manage server farms scattered across the globe to provide availability as close to 100% as is possible with mid-range and mainframe computers. Instead of buying a big, expensive machine that is fault-tolerant, the server farm approach is to buy lots of inexpensive computers and storage systems and use software to make the system fault-tolerant. Software for virtualization, clustering, clouds, grids, and SDN-Software Defined Networking all works together to make a very reliable, scalable system that can be spread out globally to provide nearly 100% availability.
'Hyperconverged System' is a term of the '10s for a way to implement the 'Web-Scale IT' that makes Google and FaceBook work so well. These systems combine storage, computing, and networking resources in large numbers of small machines, resembling rack-mounted servers, joined by high-speed optical networks to provide a highly automated, greatly simplified, fault-tolerant environment that is easier to administer than today's legacy of server farms. With web-scale IT, if more resources are needed, or a server unit fails, new units are added and the system incorporates them more-or-less automatically.
Nutanix is a leader in commercial hyperconverged systems. They combine all these devices for storage, computing, and networking and connect them with high-speed optical networks, making a fault-tolerant platform that will scale easily. The proprietary Nutanix OS is built to host lots of virtualized operating systems on Intel CPUs. Scaling a Nutanix system is a simple matter of adding another block of hardware with the desired amount of storage, RAM, and CPU capacity. The Nutanix 'hypervisor' automatically configures the new resources, mirrors, and deploys them. Nutanix allows virtualization of large numbers of any OS that runs on Intel/x86, so it can hang lots of Linux and Windows platforms under its hyperconverged armpits and dance.
VMWare, Citrix XenServer, RedHat, Microsoft, IBM, HP, Oracle and others provide more-or-less complex solutions for converging and virtualizing computing, storage, and networking capabilities. Many of them require the expertise of a team to integrate and manage separate systems for storage, computing, and networking. A hyperconverged system like Nutanix greatly simplifies the infrastructure.
Mainframes are the least expensive, simplest, best-supported way to support a _huge_ application environment. But with an entry fee upwards of a million dollars, few startup organizations want to pay to play this way. Mid-range, server farm, and hyperconverged solutions allow a startup to get running for a minimal outlay and scale incrementally or exponentially as needed. The important thing is to choose a VAR or team with demonstrated success building highly scalable systems.
Although it sounds very retro, magnetic tape continues to be the medium of choice for backup of data and maintaining the 'transaction logs' that are essential for maintaining the integrity of business systems. Google 'tape storage robot' and 'tape jukebox' to get images showing the state of the art for this relatively ancient technology.
Tape is a 'sequential access medium' ideally suited for transaction logging, recovering to the point of failure, and auditing all types of system data. HDD allows direct access to data, making it easy to change a record undetected unless transaction logs are maintained. Records on sequential tape can't be changed without difficulty or collusion, especially if the tape is kept off-site in a robot or jukebox that logs all access to the tapes, held out of the reach of network technicians and managers. Transaction tapes help keep network managers and technicians honest since they know any pilferage is likely to be noticed.
Backup sets are retained for years and regularly reviewed to ensure the integrity of system data and recover lost records. There is no less expensive way to keep multiple copies of backups and transaction logs off-site than to put them on tape and get them to another place. If an unscrupulous employee or cracker discovers that they can delete or change records of the past they _will_ find a way to exploit it and steal. Theft or corruption of records may not be discovered until some 'end of period' process reveals it -- if you don't keep backups you can't get it back.
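The tamper-evidence of a sequential log can be demonstrated in software, too. Here is a minimal sketch (an illustration, not any particular product's method) of an append-only log in which each entry carries a hash chained to the previous entry, so quietly 'fixing' an old record breaks every hash after it:

```python
import hashlib

def append_entry(log, record):
    """Append a record, chaining it to the previous entry's hash."""
    prev = log[-1]["hash"] if log else "0" * 64
    h = hashlib.sha256((prev + record).encode()).hexdigest()
    log.append({"record": record, "hash": h})

def verify(log):
    """Recompute the chain; any altered record breaks all later hashes."""
    prev = "0" * 64
    for entry in log:
        if hashlib.sha256((prev + entry["record"]).encode()).hexdigest() != entry["hash"]:
            return False
        prev = entry["hash"]
    return True

log = []
for rec in ("debit 100", "credit 100", "debit 250"):
    append_entry(log, rec)
print(verify(log))              # True -- the chain is intact

log[1]["record"] = "credit 1"   # quietly 'fix' an old transaction...
print(verify(log))              # False -- the tampering is detected
```

A tape kept off-site plays the same role physically: the past is written once, in order, and can be replayed against the current records to reveal changes.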
Automated robots and jukeboxes can make data on tapes available 'near-line', not quite 'on-line' but still pretty quick. A robot or jukebox can find the desired tape and spool its contents onto a disk unit within several seconds.
Today, records are seldom processed directly on tape as they were through the '70s. But, tape remains an important part of enterprise infrastructure.
Tape storage systems are relatively expensive and benefit from the economies of scale. With faster and faster networks, it's common for an enterprise to out-source transaction logging and system backup to a company that has large tape storage robots. This reduces expense and also gets transaction and backup data off-site to protect against data loss in a network room, local, or regional disaster.
CD-Compact Disks and DVD-Digital Versatile Disks or Digital Video Disks can store large amounts of data more or less permanently. The recordable and rewritable CD-R/RW and DVD-R/RW media for home use are _not_ archival quality and should not be considered permanent. Data on these disks is written by using relatively weak lasers to change the color of a dye layer that is somewhat fragile.
CDs and DVDs that are manufactured to hold music, video, or software and those made for enterprise purposes have their data written by powerful lasers that actually form a pattern of 'pits and lands' into the surface. If a large number of copies is needed, as for the release of a movie, they may be made using mechanical techniques to 'press' the pattern into the optical surface. These may be considered 'archival' quality and are not 'writable' or 're-writable' by a computer.
A CD can store 700 MegaBytes, a DVD can store 4.7 GigaBytes of data, and the proprietary Blu-Ray, read using short-wave blue light, about 25 Gigs. DVDs may be 'error protected' by scattering error-recovery codes across the disk, similar to the schemes for higher-level RAIDs, so they can survive scratches or other blemishes on their surface. If not burdened with DRM-Digital Rights Management, DVDs are easy to copy so archival data may be kept in multiple locations and generations.
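A little arithmetic with the capacities above shows what an archive costs in discs. The sketch below estimates how many discs it takes to archive 1 TB of records with two copies of each disc (the archive size and copy count are made-up examples):

```python
import math

# Media capacities from the text, in decimal GB as marketed:
CD_GB, DVD_GB, BLURAY_GB = 0.7, 4.7, 25.0

def discs_needed(archive_gb, disc_gb, copies=2):
    """Discs to archive archive_gb, keeping the given number of copies."""
    return math.ceil(archive_gb / disc_gb) * copies

# Archiving 1 TB (1,000 GB) of municipal records, two copies of each disc:
for name, cap in (("CD", CD_GB), ("DVD", DVD_GB), ("Blu-Ray", BLURAY_GB)):
    print(f"{name:8s}: {discs_needed(1000, cap)} discs")
# CD: 2858, DVD: 426, Blu-Ray: 80
```

The jump in per-disc capacity is why archives migrated from CD to DVD, and why really large archives stay on tape.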
Industrial-strength CD and DVD-ROMs are economical media for keeping large quantities of records 'archived' on-line for quick access. Starting in the '70s the records of many/most municipalities were scanned and placed on CD, and later on DVDs. Consumer-grade CD and DVD are good options for backing up important records and memories -- if they're 'really important' it would be wise to make multiple copies and copy them periodically to make sure they haven't deteriorated.
Microforms have been used for a long time as a better alternative than paper for archiving records.
Microfilm emerged for record-keeping in the early 1900's and was widely used by the '40s. Reels of film containing records were produced and copied photographically for distribution, keeping multiple copies of records for convenience, archival, and backup purposes. Microfilm readers in libraries and newspaper offices were busy through the '90s and many collections are still active today. Records on microfilm have been easy to scan digitally for decades, and can be copied to digital media and kept on-line or near-line. As character recognition technology has advanced, archivists are able to scan original or filmed records into high-resolution digital media and translate them to text amenable to 'big data' techniques.
Microfiche appeared later, in the mid-'60s. They are also photographic film, but flat, rectangular sheets that hold hundreds of frames, making for easier, more direct access to the desired records than scrolling through reels of microfilm. Several schemes were developed for mixing microfiche with computer-readable media such as punched cards or magnetic stripes so that records could be retrieved or processed using EDP techniques. One example of these systems, used for years by the VA State Police, was able to quickly find records in a central unit and send them electronically to remote readers. If a record needed to be updated, system components could punch out the frame with the old record and replace it with a frame holding the revised record.
Old microform technology has been easy to interface with modern digital technology by reading the film records one last time, maybe as they crumbled away, scanning them into digital form where they can be kept as long as they are managed. Newspapers, books, and records of all types from early civilizations through more modern times have been moved through microforms to digital storage and made available on The Internet.
Attachment is a key concept for managing HDD/SSD storage units. Small computers and servers that don't need more than a couple or few Terabytes of storage use DAS-Directly Attached Storage, where the HDD or SSD is attached to a disk controller that is attached to the same bus as the CPU, RAM, and other components. The large chassis and extended busses of midrange and mainframe computers allow for direct attachment of huge arrays of HDDs or SSDs.
In clusters or farms where more than one server requires access to storage, directly attached storage isn't an option. Network attached storage or storage area networks are used to separate storage from the computers so they may share the storage devices.
Here is a picture that shows DAS, NAS, and SAN.
HDDs and SSDs both play in DAS-Directly Attached Storage, NAS-Network Attached Storage, and SAN-Storage Area Network scenarios. SSD technology is quickly gaining share in the number of units deployed, a gain accelerated in 2015 as prices for SSD continue to fall and capacities continue to rise. For most purposes, SSD is a 'drop in replacement' for HDD. It runs a dozen or more times faster, uses less power, generates less heat...
DAS-Directly Attached Storage is the quickest and simplest to deploy. Mid-range and Mainframe computers may address many TeraBytes or some PetaBytes of redundant, highly-available DAS. Server-class computers may address several TeraBytes of DAS. If more storage is needed than the platform can address, or there is a requirement to share storage among clustered or farmed computers, NASs and SANs are options. In clusters or farms of server-class computers, computers with relatively small DAS have quick access via NFS-Network File Systems and Web Services to NAS-Network Attached Storage devices or SAN-Storage Area Networks.
HDD or SSD may be DAS-Directly Attached Storage. Controllers for HDD or SSD deployed as DAS are on the same bus as the computer's CPUs, RAM, and other components. DAS is the simplest and most economical option and provides the quickest access. Access to DAS is via very high-speed DMA-Direct Memory Access channels where data moves quickly between RAM and DAS at the direction of the CPUs without burdening them.
Workstation/Server and Mid-range/Mainframe-class machines differ in DAS capacity. A server-class machine may have easy connections for a dozen or more SCSI or SATA disks configured as RAID1 or higher-level RAIDs with several TeraBytes of redundant disk. As systems scale larger, mid-range or mainframe computers with two or three large chassis can put several dozen or hundreds of HDDs and/or SSDs onto the same network of busses and channels that handle dozens or hundreds of CPUs, Terabytes of RAM, and Terabits of I/O on all kinds of WAN and LAN connections.
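The usable capacity of that dozen-disk server depends heavily on the RAID level chosen. A simplified sketch of the standard capacity arithmetic (mirroring halves the raw space; RAID5 and RAID6 give up one and two disks' worth to parity):

```python
def usable_tb(n_disks, disk_tb, raid_level):
    """Usable capacity for a few common RAID levels (simplified)."""
    if raid_level == 0:            # striping only, no redundancy
        return n_disks * disk_tb
    if raid_level == 1:            # mirroring: half the raw capacity
        return n_disks * disk_tb / 2
    if raid_level == 5:            # one disk's worth of parity
        return (n_disks - 1) * disk_tb
    if raid_level == 6:            # two disks' worth of parity
        return (n_disks - 2) * disk_tb
    raise ValueError("unsupported RAID level")

# A server-class machine with a dozen 2 TB SATA disks:
for level in (0, 1, 5, 6):
    print(f"RAID{level}: {usable_tb(12, 2, level):.0f} TB usable")
# RAID0: 24, RAID1: 12, RAID5: 22, RAID6: 20
```

The trade is capacity for survivability: RAID0 gives all 24 TB but loses everything when one disk fails, while RAID6 keeps running through two simultaneous failures.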
DASD-Direct Access Storage Devices can be attached to a mid-range or mainframe bus and serve as DAS.
In server clusters or farms the problem is different: all the servers need quick access to the same operational data. One approach is to dedicate a server, or servers, in the farm to 'storage' and let all the other servers use it.
Another approach, usually better, is to use devices built specially for storage. Where DAS won't meet the requirements, NAS-Network Attached Storage or SAN-Storage Area Networks are deployed.
NAS is a chassis with several or dozens of disks, usually with a dedicated processor optimized for storage, that attaches to the same LAN used by the servers and clients. In a small office, or home, this is a relatively inexpensive solution that works well. The drawback with the NAS approach is that it is attached to the same network as the servers and clients, so the network-intense traffic to read from and write to storage consumes bandwidth, and the clients may not have enough left to give satisfactory response time for their users.
As the number of clients increases they compete for bandwidth with the relatively large amounts of data moving among the servers and the NAS, and performance suffers.
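The contention can be sketched with rough arithmetic. The numbers below are made up for illustration (a shared gigabit LAN, with heavy server-to-NAS traffic), but they show why moving storage traffic onto its own network helps:

```python
# Assumed for illustration: a shared gigabit LAN.
LAN_MBPS = 1000  # total LAN capacity, megabits/second

def per_client_mbps(n_clients, storage_mbps):
    """Bandwidth left per client after server<->storage traffic takes its share."""
    remaining = max(LAN_MBPS - storage_mbps, 0)
    return remaining / n_clients if n_clients else remaining

# Servers moving 600 Mb/s to and from a NAS on the same LAN leave 50 clients:
print(f"NAS on shared LAN: {per_client_mbps(50, 600):.0f} Mb/s per client")  # 8
# With a SAN, storage traffic rides its own network and clients keep:
print(f"SAN, separate net: {per_client_mbps(50, 0):.0f} Mb/s per client")    # 20
```

Real traffic is bursty rather than constant, so the effect in practice shows up as erratic response times instead of a steady slowdown, but the underlying competition for bandwidth is the same.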
A SAN solves this problem by using a super-fast, often optical, network in the 'storage area' to isolate servers and storage devices on the SAN and deliver data quickly, sometimes outperforming DAS. The clients have their own network, usually an ordinary, copper-wired LAN, that doesn't have to share the bandwidth with the servers and storage units.
Both NAS and SAN offer advantages for backup of data and can be connected to remote units with high-speed WAN connections to geographically separated locations to protect against a regional, building, or network room disaster that destroys a NAS or SAN. NAS devices provide an easy way for users in a home or small office to save and access data. Drobo, Netgear, Seagate, and other manufacturers provide a range of NAS devices. Hitachi, IBM, HP, Dell, Veritas, Brocade, and other manufacturers sell built-for-purpose SAN systems. Some network managers design their own SANs using commodity-priced components.