Our accreditors are pleased to see that we cover Data Centers in this course. At my last chat with one of them, we discussed the shift toward 'The Cloud' and how important AWS and other cloud providers are today.
This chapter about Data Centers is a great argument for moving applications to The Cloud. Really, all The Cloud is, is equipment for computing, networking, and storage located in a data center somewhere other than an organization's offices.
In the not-so-recent past it was ordinary for a business or enterprise to stake its presence on the web in a network room or data center on its own premises. In more recent years, since the 2010s, it has become more and more attractive to de-commission expensive, small data centers, server farms, and network rooms and move the applications they support to The Cloud.
Some organizations are reluctant to give up physical control of their data centers and have good reason to support their valuable legacy of applications and data on equipment on their own premises. If an organization has facilities that can rank as a Tier 1, 2, or 3 data center then it might make sense to keep their own data center operating.
Managed services at AWS, IBM, RackSpace, or other Cloud providers are not cheap and must be evaluated carefully as decisions are made about moving IT Infrastructure to The Cloud. Customers who somehow exceed the bandwidth allowance in their contract can be surprised by their RackSpace bill!
As an example, a VDOT IT Manager I spoke with recently said there have been recent changes in policy for our Great Commonwealth of Virginia that drop many of the past regulations that limited use of The Cloud, so we may see an exodus to The Cloud instead of computer rooms tended by State Employees?
Expedia shut down all their data centers and moved everything to AWS the day before one of our grads interviewed there and got his job, managing Expedia's products in The Cloud.
It makes good sense for many application environments to move to The Cloud. Most Unix and Windows server-based application environments are very easy to move from their existing on-site platforms to The Cloud. Apps and data that I moved from local RedHat servers went directly to CentOS servers at RackSpace Cloud and have run without a hitch, with cheap bandwidth, for years. Where their owners once paid good money for a T3 at 44.7 Mbps to support their web apps, they now get more bandwidth than ever before included in their fee for IaaS - Infrastructure as a Service at RackSpace, for way less than the cost of the T3.
Many or most desktop applications for Windows or Mac are also easy to move to The Cloud. An employee's or student's desktop and storage are 'virtual' in The Cloud, and all they need is a browser or VPN client to access them from anywhere there's good bandwidth. Our 2nd floor lab is an example, and you may have opinions about running a virtual desktop on a thin client? The benefit for the school is that it saves lots of money; the cost for students is that they would rather run on faster computers. Lots of our students have left with their new degrees to work for companies that provide 'virtual desktops' on a large scale for large organizations.
For me, it's been liberating moving to The Cloud. Since the '80s I'd spent decades being liable to drive across town after Close of Business or at the crack of dawn to do stuff with tapes that showed up in the mail, start some server, reset its router, or handle some other task that required being on-site at 'the console' of a server or at an employee's desk. Now my customers' apps are running in The Cloud and we can get to their servers from anywhere we have Internet access, which is everywhere we might be.
However attractive The Cloud may be, IS Managers need to know what it takes to provision a Data Center, large or small, so they can make the best arguments for maintaining a private network room or data center vs. moving it to The Cloud...
A data center is a facility used to house computer systems and associated components, such as telecommunications and storage systems. It generally includes redundant backup power supplies, redundant internet and telephone connections, and environmental controls such as air conditioning and fire suppression and various security devices. Large data centers are industrial scale operations using as much electricity as a small town. ( Wikipedia: Data Centers)
Data Centers have been growing larger through and since the 'dot-com bust' around Y2K. In years past, a data center was likely to be private and only house equipment for its owner. In more recent years data centers are more and more likely to house the computers for many customers who 'co-locate' their own equipment or pay for 'dedicated' or 'virtual servers' where there is access to high-speed internet backbone circuits. The expenses of air-conditioning, power-conditioning, backup power, fire-suppression, and security are shared among a large number of customers, and economies of scale prevail.
Today's ever faster, less expensive, and more reliable computers and networks favor 'centralization' of database, computers, and networking resources in data centers with access to high-speed networks close to the Tier One networks. The economies of scale suit data centers that cover tens or hundreds of thousands of square feet of real estate and have a good business model.
The #1 Rule in striving for 100% Availability of systems is to have two or more redundant, geographically-separated systems operating in parallel or a grid that can survive the failure of one or more sites and 'seamlessly recover' without disrupting business on The Internet. Geographically-separated means that servers are in different places, some distance apart, so that a local or regional disaster in one data center won't affect the others. Parallel means that both, or all, servers are updated every time one of them is, so a server can fail or be taken down without interrupting business. It takes an 'autonomous system' running on The Internet to provide for this seamless rollover if a server in one location fails.
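The parallel-update idea above can be sketched in a few lines of Python. This is an illustration only, assuming in-memory stand-ins for geographically-separated servers; the replica names and methods are made up, not any provider's API:

```python
# Minimal sketch of 'parallel servers': every update goes to all
# replicas, so any single replica can fail without interrupting reads.
# The Replica class and region names here are illustrative only.

class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True

    def write(self, key, value):
        if self.alive:
            self.data[key] = value

replicas = [Replica("us-east"), Replica("eu-west"), Replica("ap-south")]

def replicated_write(key, value):
    """Apply the update to every reachable replica, in parallel fashion."""
    for r in replicas:
        r.write(key, value)

def read(key):
    """Serve the read from the first replica that is still up."""
    for r in replicas:
        if r.alive:
            return r.data.get(key)
    raise RuntimeError("all replicas down")

replicated_write("order-1001", "shipped")
replicas[0].alive = False   # a 'regional disaster' takes out us-east
print(read("order-1001"))   # business continues: eu-west answers 'shipped'
```

Real systems do this with replicated databases and BGP/anycast routing rather than a loop, but the principle is the same: no single site holds the only copy.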
Cloud providers' data centers are interconnected with fast optical circuits they control that make it easy for any organization to operate their networks and handle transaction logging, backup, and parallel operations on a global scale, at very reasonable rates. They may also have IP ASNs - Autonomous System Numbers that allow great flexibility and seamless recovery or rollover with servers on a global scale.
Data centers are certified on a tiering scale where Tier 1 is engineered to provide 99.671% availability, through Tier 4 with redundant, fault-tolerant components to satisfy an expectation of 99.995% availability. The roughly three-tenths-of-a-percent spread between Tiers 1 and 4 represents about a day's downtime per year vs. less than half an hour. The cost of moving up those decimal points is considerable, but an organization can distribute its redundancy on a national or global scale in The Cloud, whoever provides it, and survive a data center disaster gracefully, losing only transactions in flight at the time of whatever disaster.
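The downtime arithmetic behind those percentages is easy to check. The figures below are the Uptime Institute's commonly published availability targets for each tier:

```python
# Convert a tier's availability percentage into allowable downtime per
# year. Availability figures are the Uptime Institute's published targets.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def downtime_minutes(availability_pct):
    """Minutes of allowable downtime per year at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for tier, pct in [("Tier 1", 99.671), ("Tier 2", 99.741),
                  ("Tier 3", 99.982), ("Tier 4", 99.995)]:
    mins = downtime_minutes(pct)
    print(f"{tier} ({pct}%): {mins:7.1f} min/yr (~{mins/60:.1f} hours)")
```

Run it and Tier 1 works out to roughly 1,729 minutes (about 29 hours, a bit over a day) per year, while Tier 4 allows only about 26 minutes.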
The tiers for data centers are arranged 'backwards' from the tiering scheme of ISPs where the fastest Tier 1 Networks make up the backbone and higher tiers are more remote from it.
A higher tier data center is likely to be co-located or in the same neighborhood with Tier 1 or Tier 2 internet providers.
Many or most organizations of any size from the 1930s onward had their own 'electronic data processing department' at the main office and branches. Handwritten orders and vouchers of all types were brought to 'data processing' to be keyed onto punched cards for input to batch processes that output more punched cards, tapes, and lots of paper reports. Reports lagged a day or more behind and were seldom 100% accurate.
The term 'data center' appeared in the late 1970s as mid-range and mainframe computers supported networks with 'remote job entry' and local and remote 'data entry terminals'. LANs were likely to be a star configuration where all the terminals and printers connected with serial lines directly to 'ports' on the computer. WAN connections were typically leased lines, 19.2 Kbps through 64K, which was more than adequate for the thin stream of data from keyboards or card readers and to green screens and printers. The leased lines cost roughly $1 per mile per month and the proprietary data communications equipment was complex, but the benefits of OLTP-On Line Transaction Processing outweighed the cost of networking.
Reliability and availability of computer systems were often compromised by inadequate backup power and lack of redundancy in networks, computers, and storage. Some 'down time' was expected. Without power-conditioning, uninterruptible power supplies, and backup generators any interruption or spike in municipal power could bring the systems down, and some crashed and rebooted a few times a month.
There was a brief period of 'decentralization of computing' in the '80s when wide-area networks were slow and expensive and small personal or 'departmental' mid-range computers became more & more powerful. 'Distributed Data Processing' became a hot topic and many companies tried to implement it. The lack of cheap, fast wide-area networks isolated corporate mid-range and mainframe computers and many predicted their demise, but they never really died.
The cost of integrating the results of distributed computing resources could be crushing. And the result was often expensive, inaccurate information that always lagged behind 'real time' and led to poor management decisions and expensive supply-chain management. Keeping several or dozens of systems 'in synch' with each other is fraught with errors and requires constant monitoring and management.
In the mid-to-late 80's PCs, especially the IBM PCs running PC-DOS and clones running MS-DOS, were becoming better-established as valuable tools on managers' desks. LAN and 'client/server' technology became affordable in the mid-90s when NT Server and Windows 3.11 for Workgroups came on the scene. Many managers were enthralled by the ease of producing good-looking reports from their spreadsheets and looked at the old 'green screens' that connected them to corporate networks with disdain.
LANs were becoming affordable with a standardization on Ethernet. But WANs were costing about $1/mile/month for the leased circuits, 64K digital or 19.2K analog, prevalent at the time.
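At that rule-of-thumb rate, WAN costs added up quickly as branches multiplied. A quick sketch, using made-up branch distances for illustration:

```python
# Back-of-envelope WAN budget at the era's rule-of-thumb rate of
# roughly $1 per mile per month for a leased circuit (19.2K or 64K).
# The branch cities and mileages below are hypothetical examples.

COST_PER_MILE_MONTH = 1.00

branches_miles = {"Richmond": 95, "Norfolk": 190, "Roanoke": 210}

monthly = {b: d * COST_PER_MILE_MONTH for b, d in branches_miles.items()}
annual_total = 12 * sum(monthly.values())

for branch, cost in monthly.items():
    print(f"{branch}: ${cost:,.2f}/month")
print(f"Annual spend for three leased circuits: ${annual_total:,.2f}")
```

Even three modest circuits ran to thousands of dollars a year for a few kilobits per second, which explains the pressure managers felt to process data locally.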
Many managers argued for a 'de-centralized system' where a LAN at their branch would gather transaction data in the popular 'GUI' environment, summarize it, and periodically update the home-office system.
It sounded good, _looked_ good at the branches, but the home office found it increasingly expensive and inaccurate to integrate the branches' operational and financial reporting at the corporate level.
Where the data processing department had good backup and recovery procedures and seldom, if ever, lost any data, personal computers on employees' desks were seldom backed up or secure, so expensive data loss and theft were common. The instructor has seen records in such a mess that the only figures that could be trusted for months or years were the checks and deposits at the bank, with no records at all to explain the details. No manager can be effective in a situation like this.
In old distributed processing environments, if an executive at the home office needed up-to-the-minute data about a customer, supplier, or product it had to be requested from managers in the field, could be expensive to provide, and the request could be viewed as 'trouble' by the manager who had to respond. In today's centralized data processing operation, good operational data and information is available to managers and executives without the burden of manually preparing reports or the cost of integrating them. Daily workflow management in a well-integrated business or enterprise application keeps everybody informed without being intrusive.
Bill Gates addressed this issue in his enlightening 'Business @ the Speed of Thought', where his experience as a predatory monopolist adds value to his insights that support better integration of PCs and servers with 'recentralized' computing resources.
Today's data centers based around server farms, mid-range, mainframe computers and fast internet access reflect extreme 'recentralization' through the '00s. With LANs and WANs cheaper and faster than ever and a quick ramp-up of smartphones and cellular data plans to support mobile-friendly websites and apps, transactions are gathered in 'real time' as and wherever they occur. POS-Point of Sale or Point of Transaction data is captured directly by centralized databases and computing resources. In this environment, information at the home office is always current and available from anywhere at the click of a mouse. The value of work-flow management, EDI, supply chain management, on-line order entry and fulfilment, and many other business activities is greatly enhanced when accurate, up to the minute, data is available for decision making!
Today, with WAN costs greatly reduced by open standards of The Internet, it's cheaper than ever to provide secure access to centralized servers. An enterprise's legacy software and databases may have been 'web-faced' to provide secure access wherever it's needed -- an example of 'extending the value of the legacy' rather than replacing it.
SaaS- Software as a Service for everything from the sales counter to the bottom line is best served up from a centralized system. Startup companies often have the option of PaaS and SaaS for a few or several dollars per month per employee vs. tens or hundreds of thousands of dollars to set up a data center and purchase hardware and software.
The 'web interface' has become so standard that we see BYOD-Bring Your Own Device standards in some businesses, where it's cheaper to subsidize an employee's IT than it is to put a PC on their desk!
Providing as close to 100% availability as possible isn't cheap and must be weighed against the cost of a system being unavailable.
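That weighing can be made concrete with back-of-envelope math: compare the premium for a higher-availability facility against the expected annual cost of outages. All dollar figures below are hypothetical placeholders, not real quotes:

```python
# Weigh the cost of availability against the cost of being unavailable.
# Availability percentages are the published Tier 1 and Tier 4 targets;
# the dollar figures are hypothetical placeholders for illustration.

HOURS_PER_YEAR = 365 * 24  # 8,760

def expected_downtime_cost(availability_pct, cost_per_down_hour):
    """Expected annual cost of outages at a given availability level."""
    down_hours = HOURS_PER_YEAR * (1 - availability_pct / 100)
    return down_hours * cost_per_down_hour

cost_per_down_hour = 5_000                    # hypothetical loss per outage hour
premiums = {"Tier 1": 0, "Tier 4": 200_000}   # hypothetical annual facility premiums
availability = {"Tier 1": 99.671, "Tier 4": 99.995}

for tier in premiums:
    outage = expected_downtime_cost(availability[tier], cost_per_down_hour)
    print(f"{tier}: premium ${premiums[tier]:,} + expected outage cost "
          f"${outage:,.0f} = ${premiums[tier] + outage:,.0f}")
```

With these made-up numbers the Tier 4 premium isn't justified; raise the cost of a down hour and the answer flips. The point is that the decision is arithmetic, not fashion.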
It requires lots of equipment to support the computers and networks in a data center. Avoiding a 'data disaster' and interruption of business require two or more systems separated geographically and running 'in parallel' so that enterprise data and applications are continuously available even if a local or regional power failure or disaster strikes and takes one of them down.
Certification is important for data centers and those who decide to locate their systems in them. Uptime Institute and others issue certificates for data center design, security, and operations. This article from TechTarget.com discusses Data Center Certification vs. Less Calculable IT Skills.
Success of 'cloud computing', 'virtual servers', and 'managed services' by RackSpace.com, DigitalOcean.com, Amazon Web Services, and dozens of other competitors has many companies considering and buying into IaaS-Infrastructure as a Service instead of purchasing their own computing, storage, and networking equipment. Where the ordinary data center in the 1990s was privately owned and operated, today's data centers are likely to contain the systems of dozens, hundreds, or thousands of companies.
For several hundred dollars a month a small organization can set up their systems in a Tier 4 data center close to a Tier 1 network and have an easily scalable environment that can handle incremental or exponential growth. In more and more cases, this beats a decision to invest hundreds of thousands of dollars 'up front' into private data centers that may provide less availability and scalability in the future. (Notice: ISPs are rated with Tier 1 being the fastest and Data Centers are rated with Tier 4 being the most reliable and secure.)
Anything more than a 'rack full' of computers and networking equipment requires other equipment to support the computers and air-conditioning and ensure safe, continuous operations. Depending on local codes, a small network with a few computers on an ordinary 20-amp circuit may be legal to operate in an office or well-ventilated closet, using the ordinary air-conditioning that keeps the office comfortable for people.
Concentrating several, several dozen, or several hundred, servers and their associated networking equipment in a room stresses ordinary power and air-conditioning systems, increases risk of an electrical fire if some component in a power supply or air-conditioner fails, and greatly increases the risk that expensive computing or networking equipment will be 'fried' by an electrical malfunction.
Even a small 'network closet' or 'server room' needs systems to supply power and mitigate the risk of over-heating, 'dirty power', and failure of municipal power. It needs backup power from generators to remain available in the event of a local or regional power failure.
Typically, more space in a data center is required for the support equipment and personnel than is taken up by the computers. Here are several components of a data center:
Computers and storage devices occupy a relatively small percentage of the site of a data center. Air- and power-conditioning equipment, battery-powered UPS, backup generators, their fuel tanks, wire-ways, office and parking space for support staff... These take up lots more space than the computers in a data center.
The space and bandwidth to connect one or more geographically separated 'hot sites' or 'parallel sites' must be considered, too. It's never safe to keep all the eggs in one basket...
'Internet Exchanges' are specialized data centers where high-speed fiber circuits of The Internet Backbone come together so that Wide Area Networks may be 'internetworked' inside the exchange. Internet exchanges are heavily stocked with high-speed, industrial-strength routers that handle the internet's backbone traffic. Inside the exchange the routers are meshed with very high-speed fiber-optic jumpers to other routers, servers, and storage inside the IX. Much of the white space in an IX is used by companies that 'co-locate' their servers in these nexuses (nexii?) of The Internet where they can be connected more-or-less directly to backbone fiber to get max bandwidth for their customers without suffering the expense of provisioning high-speed circuits for the 'last mile' to companies' private data centers.
The kind of support needed for 'exchanges' for telephone and telegraph services dates back more than a century. Many old telephone exchanges have been refit to handle Internet services.
These two large IX-Internet Exchanges in New York are examples of buildings that have been in service continuously since they were built as telegraph and telephone exchanges in the early 1900s. Older copper-wire & telco tech has been augmented with modern fiber-optic circuits and IP routers: Western Union and AT&T Long Lines at 60 Hudson St and 32 Avenue of the Americas. The Western Union building started out shuffling telegrams around the world to be delivered at a walking pace, and now shuffles IP packets at the speed of light. AT&T's building was built to move telephone calls and data around the world in copper circuits, and is now a nexus of fiber. As people moved out and modern data communications moved in, older pneumatics and elevators have become conduits and chases for fiber. The Secretive Old Building, a repurposed high-rise block in Manhattan, houses lots of co-located network rooms including many popular websites, NSA, and other security agencies. The New AT&T Long Lines is even more secretive: a windowless, bomb-proof, high-rise fortress where much of the world's bandwidth is routed.
There are IXs in most cities, thousands of them, plus dozens of very large IXs around the world. 'Cloud Servers' located in these exchanges provide relatively inexpensive, scalable networking, internet, computing, storage, security, and backup for power and data. These services are marketed as 'IaaS', Infrastructure as a Service, at rates that are very competitive with the costs of building and maintaining a computer room. DataCenterMap.com shows many of them. Click the 'cloud servers' tab to see hundreds of 'cloud providers'. With the 'economies of scale' and 'the internet' it's harder to justify a private data center these days...
Here's a very good reference from CIO Mag about Data Centers. Here's a NY Times article about Power, Pollution, and The Internet, showing 'the cloud' is not benign & fluffy, is often clouds of diesel fumes. This video about The Cloud is a tour of a large data center... Here's an unprecedented look into Google's Data Centers -- highly secretive since their startup, Google released these photos mid-October 2012 and also provides 'street view' tours of their data centers. (Google Earth, on the other hand, obfuscates their larger data centers, showing whatever covered the acreage before they moved in!) Use google maps to find 'equinix, ashburn, va' to see a satellite view of the most densely and well connected data center on the East Coast.
The new NSA Data Center near Bluffdale, Utah is the largest data center, a million square feet, built to gather and keep Exabytes, or maybe even Yottabytes as technology emerges. NSA co-locates at IXs to soak up most of the world's communications and pour it into this new facility and others in their legacy.
Another pitch for mid-range and mainframes: The requirements for 'support infrastructure' are greatly reduced in enterprises or organizations that use mainframe or mid-range machines. Server-farm based applications can grow to thousands and thousands of cubic feet of 'white space', and there must be more than one farm to promise 100% availability. A company like IBM or Oracle/Sun or HP can engineer a mainframe or mid-range 'parallel sysplex' where a couple or a few machines in other locations are harnessed together, 'in parallel', via high-speed optical networks and any one of them can handle the load. These machines are as close to 100% reliable and available as any can be.
Scalability in a mid-range or mainframe generally means adding another 'processor book', CPU card, more RAM, or maybe another chassis -- not more rows of racked equipment in a server farm.
Take a Virtual Tour of an IBM System zEC12 -- links in the Demo section. Here's a Wired Article about zSeries with a brief history of mainframes and how they are used today. IBM's customers are zealous advocates of these 'big iron' systems that are engineered to reduce complexity and provide true 'fault tolerance' and 'continuous availability'. A pair of mainframe chassis holds the equivalent computing power of a couple or few thousand server class machines, and each has access to huge RAM, huge disk, fast busses and channels, and the fastest optical networks.
IBM was an early leader in 'virtualization', and has since the '60s provided a VM-Virtual Machine operating system so that any later hardware and OS can run applications developed for any earlier OS.
Many IBM customers with a legacy including applications developed for IBM mid-range, mainframe, or smaller unix machines have found they can extend the value of their IT legacy by 'web-facing' big machines.
These projects have turned mainframes and mid-range computers into 'Super Servers' that connect to The Internet and move data like a firehose. Where required, IBM's x Series servers and blades make it easy to integrate Windows applications servers in the same chassis as the mid-range or mainframe CPUs, with shared access to HUGE RAM (Dozens or hundreds of TeraBytes!), fast disk storage, and fast network interface.