Showing posts with label design. Show all posts

Thursday, June 16, 2016

Role and responsibility of IT Infrastructure Technical Architect

In this article, I would like to describe the role of the IT infrastructure technical architect and his responsibilities.

Any architect generally leads the design process with the goal of building a product. The product can be anything the investor would like to build and use. The architect is responsible for gathering all of the investor's goals, requirements and constraints, and for understanding all use cases of the final product.

The product of the IT infrastructure technical architect is an IT infrastructure system, also known as a computer system, running IT applications that support business services. That's a very important statement. A designed IT infrastructure system is usually not built for the sake of the infrastructure itself but to support business services.

There is no doubt that the technical architect must be a subject matter expert in several technical areas, including computing, storage, networking, operating systems and applications, but that's just the technical foundation required to fulfill all technical requirements. However, systems are impacted not only by technology but also by external non-technical factors like business requirements, operational requirements and human factors. The architect's main responsibility is to fulfill all these requirements of the final product, in this particular case an IT infrastructure system. The last factor mentioned, the human factor, usually has the biggest impact on any system design, because we usually build systems for human usage, and these systems also have to be maintained and operated by other humans.

Now that we know what the IT Infrastructure Technical Architect does, let's describe his typical tasks and activities.

The architect has to communicate with the investor's stakeholders to gather all design factors, including requirements, constraints and use cases. Unfortunately, there are usually also some design factors for which nobody has a specific requirement. These factors have to be documented as assumptions. When all relevant design factors are collected and validated with the requestors and investor authorities, the architect starts the design analysis and prepares the conceptual design. The conceptual design is a high level design which helps everyone understand the overall concept of the proposed product. The conceptual design has to be reviewed by all design stakeholders, and when everybody feels comfortable with the concept, the architect can start the low level design.

The low level design is usually prepared as a decomposition of the conceptual design. It should be decomposed into several design areas, because it is almost always beneficial to divide a complex system into sub-systems until these become simple enough to be solved directly. This decomposition approach is also known as the "divide and conquer" method. The main purpose of the low level design is to document all details important for successful implementation and operation of the product. Therefore, each design area must be reviewed and validated by the relevant subject matter experts: other architects, operators and implementers.

The low level design is usually divided into logical and physical design. Logical design is a detailed technical design, but it uses general logical components without a particular supplier's physical product models, materials, configuration details or other physical specifications. The purpose of logical design is to document the general principles of the overall design or of a particular decomposed, thus simplified, design area. Logical design is also used for proper product sizing and capacity planning. Physical design, on the other hand, is a detailed technical design with specific products, materials and implementation details. Physical design is primarily intended for product builders and implementers, because the product is built or implemented based on the physical design.

It is good to mention that there is no product or system without risk. That's another responsibility of the architect: he should identify and document all risks and design limitations associated with the proposed product. The biggest threats are not risks in general but unknown risks. Therefore, documenting potential risks and risk mitigation options is a very important responsibility of the architect. A risk mitigation plan, or at least a contingency plan, should be part of the product design.

At the end of the day, the design has to be implemented, so the implementation plan is another activity and document the architect must prepare to make the product real, even though the implementation itself is usually out of the architect's scope.

It is worth mentioning that there is no proven design without design tests. Therefore, the architect should also prepare and perform the test plan. The test plan has to include a validation part and a verification part. The validation part validates the design requirements after the product is built or implemented. Only after validation can the architect honestly prove that the product really fulfills all requirements holistically. The verification part verifies that everything was implemented as designed and that the operational personnel know how to operate and maintain the system.

There is no perfect design or product, therefore the architect should continually improve even an already built product by communicating with end users, operators and other investor stakeholders and taking their feedback into account for future improvements. After some period of time, the architect should initiate a design review and incorporate all gathered feedback into the next design version.

Now that we know what the architect is responsible for, let's summarize which skills are important for any good architect. The architect must have decent skills and expertise in the following areas:

  • communication skills
  • presentation skills
  • consulting skills
  • cross-check validation skills
  • documentation skills
  • systematic, analytical, logical and critical thinking
  • technical expertise
  • ability to think and work at different levels of detail
  • ability to see the big picture while also paying attention to detail, because the devil is in the details
Even if you have read this article to this point, you may still ask what the architect's main responsibility is. That's a fair question. Here is a short answer.

The architect's main responsibility is the happiness of all users and actors using the designed product during its whole lifecycle.




Tuesday, January 12, 2016

Datacenter Infrastructure Architectural Rules

Reality is always more complex, but in general the following rules apply to any datacenter infrastructure architecture transforming to cloud principles ...

Compute Rule
Compute performance is relatively cheap, but CPU context switching is pricey.
In other words, vCPU/pCPU ratio drives your consolidation.
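A minimal sketch of the compute rule, assuming identical VMs and a target vCPU:pCPU (vCPU per physical core) ratio chosen by the designer; the helper name and the numbers are hypothetical:

```python
def vms_per_host(physical_cores: int, vcpu_per_pcpu: float, vcpus_per_vm: int) -> int:
    """Number of identical VMs that fit on one host under a target vCPU:pCPU ratio."""
    vcpu_budget = physical_cores * vcpu_per_pcpu  # total vCPUs the host may carry
    return int(vcpu_budget // vcpus_per_vm)


if __name__ == "__main__":
    # Hypothetical host: 2 sockets x 8 cores, 4:1 target ratio, 2-vCPU VMs.
    print(vms_per_host(physical_cores=16, vcpu_per_pcpu=4.0, vcpus_per_vm=2))  # 32
```

With these assumed numbers the host takes 32 VMs; lowering the ratio to 2:1 halves that, regardless of how cheap the extra CPU performance is.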

Storage Rule
Storage capacity is relatively cheap, but I/O performance and response time are pricey.
In other words, storage performance drives your consolidation.
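Under similar simplifying assumptions (a hypothetical array, a uniform per-VM I/O profile, and IOPS as the only performance metric, ignoring response time), the storage rule reduces to taking the smaller of the capacity-based and the performance-based VM counts:

```python
def storage_vm_limit(array_gb: float, array_iops: float,
                     vm_gb: float, vm_iops: float) -> tuple[int, int, int]:
    """Return (VMs by capacity, VMs by IOPS, effective limit)."""
    by_capacity = int(array_gb // vm_gb)
    by_performance = int(array_iops // vm_iops)
    return by_capacity, by_performance, min(by_capacity, by_performance)


if __name__ == "__main__":
    # Hypothetical array: 50 TB usable, 20,000 IOPS; VMs: 100 GB and 100 IOPS each.
    print(storage_vm_limit(50_000, 20_000, 100, 100))  # (500, 200, 200)
```

Here the array has room for 500 VMs but IOPS for only 200, so performance, not capacity, drives the consolidation.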

Network Rule
Network bandwidth is relatively cheap, but latency is pricey.
In other words, network latency drives your datacenter consolidation, geo clustering, DR architecture and hybrid cloud considerations.
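The latency rule has a hard physical floor. A rough sketch, assuming light in optical fiber covers about 200 km per millisecond (roughly 5 µs per km one way) and ignoring switching, queuing and non-straight fiber routes, all of which only add latency; the function names and the 5 ms budget are hypothetical:

```python
FIBER_KM_PER_MS = 200.0  # approximate one-way propagation distance per ms in fiber


def fiber_rtt_ms(distance_km: float) -> float:
    """Propagation-only round-trip time over the given fiber distance."""
    return 2 * distance_km / FIBER_KM_PER_MS


def max_distance_km(rtt_budget_ms: float) -> float:
    """Longest fiber distance whose propagation RTT fits within the budget."""
    return rtt_budget_ms * FIBER_KM_PER_MS / 2


if __name__ == "__main__":
    print(fiber_rtt_ms(100))     # 1.0 ms RTT per 100 km
    print(max_distance_km(5.0))  # 500.0 km for a hypothetical 5 ms RTT budget
```

No amount of bandwidth removes this delay, which is why distance, not bandwidth, constrains geo-clustering and DR designs.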

Operational Rule
The infrastructure resources (compute/storage/network) are relatively cheap, but human resources are pricey.
In other words, automate as much as possible and keep it as simple as possible.

Tuesday, July 28, 2015

How do you understand documenting Conceptual, Logical, Physical?

I have just read the following question on Google+ in "VCDX Study Group 2015":
As a fellow writer (we architects are not readers, but writers! :) ) wanted to ask you how you understand documenting Conceptual, Logic, Physical.
Can you add all these in a single Architecture design document with all 3 parts as 3 sections or you are better off creating 3 separate documents for each type of design?
I very often hear similar questions about how to write good design documentation. So my answer was the following ...

As a writer you have to decide what is the best for your readers :-)

When I'm engaged to write an architecture document, I use different approaches for different design engagements. It really depends on project size, scope, audience, architecture team, etc. For example, right now I'm working on a project where 6 architects are working on a single High Level Design covering the big picture, each of them also preparing a Low Level Design. In the end there is a single HLD document and five separate LLD documents covering
  • Compute, 
  • Storage, 
  • Networking,
  • vSphere and
  • Backup.
I have had other projects where the whole architecture was in a single document, with each section targeted at a different audience. That was the case with my VCDX design documentation.

Generally, I believe the High Level Design (HLD) is for a broader technical audience but also for business owners. Therefore, physical design is not required at this stage, and only the conceptual design and a brief logical design for each area should be in the HLD. The Low Level Design (LLD) is for technical implementers and technical operational personnel, therefore less writing creativity and more deep technical language for the specific area should be used there, with references to the HLD. I recommend reading Greg Ferro's "Eleven Rules of Design Documentation", which IMHO apply very well to LLD.

HLD Conceptual Design should include business and technical requirements, constraints, assumptions, key design decisions, the overall high level concept and risk analysis.

HLD Logical Design should include basic logical constructs for the different design areas together with capacity planning.

LLD should include Conceptual, Logical and Physical design for the specific area(s) or designed system/subsystem. The LLD conceptual design should contain a subset of the HLD technical requirements, constraints and assumptions, and maybe some other specific requirements irrelevant to the HLD. These can even be discovered after HLD and LLD design reviews and additional technical workshops. The logical design can be the same as in the HLD, or you can go to a deeper level while still staying in the logical layer, without product physical specifications, cabling port IDs, VLAN IDs, IP addressing, etc. These physical details belong in the physical design and, if needed, can be referenced in attachments, Excel workbooks, or similar implementation/configuration management documents.

The LLD Physical design is usually leveraged by the implementer to prepare As-Built documentation.

That's just my $0.02 and your mileage may vary.

In the end, I have to repeat ... you, as a writer (Architect), have to decide the appropriate documentation format for your target audience.

Don't hesitate to share your thoughts in comments.

Sunday, September 07, 2014

vSphere HA Cluster Redundancy

All vSphere administrators and implementers know how easily a vSphere HA Cluster can be configured. However, a quick and simple configuration sometimes doesn't do exactly what is expected. You can, and typically you should, enable Admission Control in the vSphere HA Cluster configuration settings. VMware vSphere HA Admission Control is a control mechanism that checks whether another VM can be powered on in an HA-enabled cluster while still satisfying the redundancy requirement. So far so good; however, the complexity starts here, because you have several options for which algorithm to use to fulfill your spare capacity redundancy requirement. So what options do you have?

Admission Control can be configured with one of the following three algorithms:
  1. Define fail-over capacity by static number of hosts
  2. Define fail-over capacity by reserving a percentage of cluster resources
  3. Use dedicated fail-over hosts
Let's deep dive into each option ...

Algorithm 1 : static number of hosts (generally N+X host redundancy)
When N+X redundancy is required, most vSphere designers go with this option because it looks like the most suitable choice. However, it is important to know that this particular algorithm works with the HA Slot Size. The HA Slot Size is calculated based on the reservations defined on powered-on VMs. If you don't use CPU/MEM reservations per VM, then default reservation values (32 MHz, memory virtualization overhead) are used for the HA Slot Size calculation. By the way, VMware recommends setting reservations per resource pool and not per VM, so there is a relatively high probability that you don't have VM reservations. You will then have a very small HA Slot Size, which means that Admission Control will allow you to power on a lot of VMs, which introduces high resource over-allocation, and your N+1 redundancy can significantly suffer. On the other hand, if you have just one VM with huge CPU/MEM reservations, it can significantly skew the HA Slot Size, with a negative impact on your VM consolidation ratio.
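The slot mechanics described above can be sketched as a simplified model (not HA's actual code), assuming that the slot is the largest CPU reservation (32 MHz when none is set) and the largest memory reservation plus overhead among powered-on VMs, that a host holds as many slots as fit into both its CPU and its memory, that each VM uses one slot, and that the largest hosts are the ones that fail. Class and function names, host sizes and overhead values are hypothetical:

```python
from dataclasses import dataclass

DEFAULT_CPU_MHZ = 32  # CPU value used for a VM without a CPU reservation


@dataclass
class VM:
    cpu_res_mhz: int  # CPU reservation (0 = none)
    mem_res_mb: int   # memory reservation (0 = none)
    overhead_mb: int  # memory virtualization overhead (assumed value)


@dataclass
class Host:
    cpu_mhz: int  # CPU capacity available for VMs
    mem_mb: int   # memory capacity available for VMs


def slot_size(vms: list[VM]) -> tuple[int, int]:
    """Largest CPU and memory(+overhead) reservation among the VMs."""
    cpu = max(vm.cpu_res_mhz or DEFAULT_CPU_MHZ for vm in vms)
    mem = max(vm.mem_res_mb + vm.overhead_mb for vm in vms)
    return cpu, mem


def host_slots(host: Host, slot: tuple[int, int]) -> int:
    """Slots that fit into both the host's CPU and memory capacity."""
    cpu_slot, mem_slot = slot
    return min(host.cpu_mhz // cpu_slot, host.mem_mb // mem_slot)


def can_power_on(hosts: list[Host], running: list[VM], candidate: VM,
                 host_failures: int) -> bool:
    """N+X check: all VMs (one slot each) must still fit after the
    `host_failures` largest hosts are lost."""
    vms = running + [candidate]
    slot = slot_size(vms)
    per_host = sorted((host_slots(h, slot) for h in hosts), reverse=True)
    surviving_slots = sum(per_host[host_failures:])
    return len(vms) <= surviving_slots


if __name__ == "__main__":
    hosts = [Host(cpu_mhz=20_000, mem_mb=65_536) for _ in range(4)]
    small = [VM(0, 0, 100) for _ in range(6)]
    big = VM(8_000, 16_384, 100)
    print(slot_size(small))                          # (32, 100): tiny slot
    print(can_power_on(hosts, small[:5], big, 1))    # True
    print(can_power_on(hosts, small, big, 1))        # False
```

With no reservations each host holds 625 tiny slots, so almost anything is admitted; a single VM reserving 8,000 MHz and 16 GB shrinks that to 2 slots per host, so an N+1 cluster of four such hosts admits only six VMs.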

How can we solve this problem? One solution is to use the HA Cluster Advanced Options described below.

The maximum HA Slot Size can be limited by the following two advanced options.
  • das.slotcpuinmhz - Defines the maximum bound on the CPU slot size. If this option is used, the slot size is the smaller of this value or the maximum CPU reservation of any powered-on virtual machine in the cluster.
  • das.slotmeminmb - Defines the maximum bound on the memory slot size. If this option is used, the slot size is the smaller of this value or the maximum memory reservation plus memory overhead of any powered-on virtual machine in the cluster.
This helps when you have one VM with high CPU or RAM reservations. Such a VM will not increase the HA Slot Size; instead, it consumes multiple smaller HA Slots.
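The effect of the two caps can be sketched as follows. This is a simplified model, not HA's actual code: it assumes the slot is the uncapped maximum limited by das.slotcpuinmhz / das.slotmeminmb, and that an oversized VM occupies enough slots to cover both its CPU and its memory reservation. Function names and numbers are hypothetical:

```python
import math

DEFAULT_CPU_MHZ = 32  # CPU value used for a VM without a CPU reservation


def capped_slot_size(vms, slot_cpu_cap=None, slot_mem_cap=None):
    """vms: list of (cpu_reservation_mhz, mem_reservation_plus_overhead_mb)."""
    cpu = max(c or DEFAULT_CPU_MHZ for c, _ in vms)
    mem = max(m for _, m in vms)
    if slot_cpu_cap is not None:  # models das.slotcpuinmhz
        cpu = min(cpu, slot_cpu_cap)
    if slot_mem_cap is not None:  # models das.slotmeminmb
        mem = min(mem, slot_mem_cap)
    return cpu, mem


def slots_needed(vm, slot):
    """Slots a VM occupies: enough to cover both its CPU and memory reservation."""
    cpu, mem = vm
    cpu_slot, mem_slot = slot
    return max(math.ceil((cpu or DEFAULT_CPU_MHZ) / cpu_slot),
               math.ceil(mem / mem_slot))


if __name__ == "__main__":
    vms = [(0, 100)] * 10 + [(8_000, 16_484)]
    print(capped_slot_size(vms))  # (8000, 16484): the big VM sets the slot
    slot = capped_slot_size(vms, slot_cpu_cap=1_000, slot_mem_cap=2_048)
    print(slot, slots_needed((8_000, 16_484), slot), slots_needed((0, 100), slot))
    # (1000, 2048) 9 1
```

With the caps in place, the large VM no longer inflates the slot; it occupies nine 1,000 MHz / 2,048 MB slots while every small VM still occupies one.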

Default VM reservation values for the HA Slot calculation can be defined by two other advanced options.
  • das.vmcpuminmhz - Defines the default CPU resource value assigned to a virtual machine if its CPU reservation is not specified or zero. This is used for the Host Failures Cluster Tolerates admission control policy. If no value is specified, the default is 32MHz.
  • das.vmmemoryminmb - Defines the default memory resource value assigned to a virtual machine if its memory reservation is not specified or zero. This is used for the Host Failures Cluster Tolerates admission control policy. If no value is specified, the default is 0 MB.
Default VM reservation values can help you define the HA Slot Size you want, but the slot size doesn't automatically correspond to the required overbooking and planned spare fail-over capacity, because the HA Slot Size is not proportional to the VM sizes in a particular cluster. If you really want one real spare host of fail-over capacity, you have to go with option 3 (Use dedicated fail-over hosts).
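A small sketch of why default reservations do not track VM size, assuming hypothetical das.vmcpuminmhz / das.vmmemoryminmb values of 500 MHz and 1,024 MB, VMs without reservations, and a fixed, assumed 100 MB overhead per VM:

```python
from dataclasses import dataclass


@dataclass
class VM:
    vcpus: int              # shown only for contrast; slot sizing ignores it
    mem_mb: int             # configured size, not a reservation; also ignored
    cpu_res_mhz: int = 0
    mem_res_mb: int = 0
    overhead_mb: int = 100  # assumed fixed virtualization overhead


def slot_size(vms, vm_cpu_min_mhz=32, vm_mem_min_mb=0):
    """Slot size when unset (zero) reservations are replaced by the defaults
    modelled on das.vmcpuminmhz and das.vmmemoryminmb."""
    cpu = max(vm.cpu_res_mhz or vm_cpu_min_mhz for vm in vms)
    mem = max((vm.mem_res_mb or vm_mem_min_mb) + vm.overhead_mb for vm in vms)
    return cpu, mem


if __name__ == "__main__":
    small = VM(vcpus=1, mem_mb=2_048)
    large = VM(vcpus=8, mem_mb=65_536)
    print(slot_size([small, large]))                                          # (32, 100)
    print(slot_size([small, large], vm_cpu_min_mhz=500, vm_mem_min_mb=1_024))  # (500, 1124)
```

Either way, the 1-vCPU/2 GB VM and the 8-vCPU/64 GB VM each count as exactly one slot, so the slot count says little about how much the cluster is really overbooked.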

Algorithm 2 : percentage cluster spare capacity
This algorithm doesn't use the HA Slot Size; it simply calculates the total cluster CPU/MEM resources and decreases them by the spare capacity defined as a percentage. The remaining available cluster resources are further decreased by the reservations of powered-on VMs, and new VMs can be powered on only when some cluster resources are still available. Quite clear and simple, right? However, it also requires VM reservations, otherwise you will end up with an over-allocated cluster and your overbooking ratio will be too high, which can introduce performance issues. So once again, if you really want one real spare host of fail-over capacity without dealing with VM reservations, the best way is to go with option 3 (Use dedicated fail-over hosts).
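The percentage-based check can be sketched as below, assuming VMware's documented definition of current failover capacity, (total resources - total reservations) / total resources, evaluated for both CPU and memory, with 32 MHz counted for a VM without a CPU reservation. For brevity the sketch uses one percentage for both resources; all names and numbers are hypothetical:

```python
DEFAULT_CPU_MHZ = 32  # CPU value counted for a VM without a CPU reservation


def failover_capacity(total: float, reserved: float) -> float:
    """Fraction of a cluster resource not consumed by reservations."""
    return (total - reserved) / total


def admits(total_cpu_mhz, total_mem_mb, reserved_cpu_mhz, reserved_mem_mb,
           vm_cpu_res_mhz, vm_mem_res_plus_overhead_mb, failover_pct):
    """True if powering on the VM keeps both CPU and memory failover
    capacity at or above the configured percentage."""
    cpu_after = reserved_cpu_mhz + (vm_cpu_res_mhz or DEFAULT_CPU_MHZ)
    mem_after = reserved_mem_mb + vm_mem_res_plus_overhead_mb
    threshold = failover_pct / 100
    return (failover_capacity(total_cpu_mhz, cpu_after) >= threshold
            and failover_capacity(total_mem_mb, mem_after) >= threshold)


if __name__ == "__main__":
    total_cpu, total_mem = 4 * 20_000, 4 * 65_536  # hypothetical 4-host cluster
    # 100 running VMs without reservations: 32 MHz and ~100 MB overhead each.
    print(admits(total_cpu, total_mem, 100 * 32, 100 * 100, 0, 100, 25))  # True
    # 58,000 MHz already reserved and a new VM reserving 2,100 MHz.
    print(admits(total_cpu, total_mem, 58_000, 10_000, 2_100, 100, 25))   # False
```

Without reservations, a hundred running VMs count as only 3,200 MHz (100 x 32 MHz), so the cluster still reports about 96% CPU failover capacity however heavily it is actually overbooked.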
Note that Algorithm 2 doesn't use the HA Cluster Advanced Options related to the HA Slot mentioned above. However, das.vmCpuMinMHz and das.vmMemoryMinMB can be used to set default reservations. For more details read this.

Algorithm 3 : dedicated fail-over hosts
This algorithm simply dedicates the specified hosts to stay unused under normal conditions and to be used only in case of an ESXi host failure. Multiple dedicated fail-over hosts are supported since vSphere 5.0. This algorithm keeps your capacity and performance absolutely predictable and independent of VM reservations. You'll get exactly what you configure.

UPDATE 2018-01-09: for some additional details about dedicated fail-over hosts read the blog post Admission Control - Dedicated fail-over hosts.

CONCLUSION
So which option should you use? The correct answer is, as usual ... it depends :-)

However, if VM reservations are not used and absolutely predictable N+X redundancy is required I currently recommend Option 3.

If you have a mental problem with not using some ESXi hosts during the non-degraded cluster state (isn't that exactly what is required?), I recommend Option 1, but VM reservations must be used to get a realistic HA Slot Size. With this option, an artificial HA Slot Size can be designed by leveraging the advanced options.

If you don't want to deal with HA Slots and want to use all ESXi hosts in the cluster, you can use Option 2, but VM reservations must be used for some capacity guarantee to avoid a high overbooking ratio.

FEATURE REQUEST
It would be great if VMware vSphere had some kind of Cluster Reservation policy for VMs. For example, if you wanted to guarantee a 2:1 overbooking of cluster resources, you would set up 50% CPU and 50% RAM reservations for each VM running in the HA Cluster. This policy should be dynamic, so if someone changes the VM size from a CPU or RAM perspective, the reservations would be recalculated automatically.

Let's break down the example above. We assume the following HA CLUSTER RESERVATION POLICY assigned to our HA Cluster: CPU 50%, RAM 50%. Let's power on a VM with 2x vCPUs and 6GB RAM. The dynamic reservation calculation is quite easy from the RAM perspective, because the memory reservation would be 3GB (50% of 6GB). It is a little more complicated from the CPU reservation perspective. The dynamic CPU reservation has to be calculated based on the physical CPU where the VM is running. So let's assume we have an Intel Xeon E5-2450 @ 2.1GHz. 50% of 2.1GHz is 1.05GHz, but we have 2 vCPUs, so we have to multiply it by 2. Therefore, the dynamic CPU reservation for our VM is 2.1GHz. I believe that with such a dynamic reservation policy we would be able to guarantee the overbooking ratio and define cluster redundancy more predictably from an overbooking and performance degradation point of view.
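The arithmetic of this proposed policy can be sketched as follows. The policy itself is hypothetical (it is the feature request above, not an existing vSphere feature), and, as in the example, the CPU part is based on the nominal clock of the physical CPU:

```python
def dynamic_reservation(vcpus: int, mem_mb: int, pcpu_mhz: int,
                        cpu_pct: int, mem_pct: int) -> tuple[float, float]:
    """Return (CPU reservation in MHz, memory reservation in MB) for one VM
    under a hypothetical percentage-based cluster reservation policy."""
    cpu_mhz = vcpus * pcpu_mhz * cpu_pct / 100  # per-vCPU share times vCPU count
    mem = mem_mb * mem_pct / 100
    return cpu_mhz, mem


if __name__ == "__main__":
    # 2 vCPUs, 6 GB RAM, Xeon E5-2450 @ 2.1 GHz, policy CPU 50% / RAM 50%.
    print(dynamic_reservation(2, 6 * 1024, 2100, 50, 50))  # (2100.0, 3072.0)
```

Because the reservation is derived from the VM's size, resizing the VM (say, to 4 vCPUs) would recompute it automatically, which is exactly the dynamic behavior the feature request asks for.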

FEEDBACK 
I would like to know what your preferred HA Cluster Admission Control setting is. So, don't hesitate to leave a comment and share your thoughts with the community. Any feedback is very welcome and highly appreciated.