Principles of System Architecture

Time: 2024-09-26 Column: Backend & Servers

This article discusses several principles of system architecture that generally apply to relatively complex business systems. If you work on simple, low-traffic applications, you may well reach the opposite conclusions.

Principle 1: Focus on Real Benefits, Not Technology Itself

For software architecture, the primary concern should be the benefits it delivers. Pursuing technology for its own sake is meaningless. The following benefits are crucial:

Lowering Technical Barriers to Accelerate Development
The architecture should enable parallel development, deployment, and operations without creating bottlenecks for any team. Even if the organizational structure hinders this, the system architecture can still be designed for parallel work.
Enhancing System Stability
To improve the system's SLA and stability, appropriate solutions for planned and unplanned downtime are necessary. (Refer to "Architectures for High Availability.")
Reducing Costs through Simplification and Automation
The main cost to optimize is labor, which brings not only delays and high expenses but also frequent human error. An architecture that increases headcount instead of reducing it is a failure. Time and capital costs must also be considered.

If an architecture does not deliver on these three aspects, it has little value.

Principle 2: Perspective of Application Services and APIs, Not Resources and Technologies

Many companies segment roles into operations and development, further dividing operations into infrastructure and application maintenance, and development into core and business development. This division leads to different perspectives. For instance, infrastructure operations focus on resource utilization and performance, while application operations and business development emphasize applications and services.

As distributed architectures have evolved, the line between the infrastructure layer and the application layer has blurred in many systems. Service governance, for example, needs both underlying technology and cooperation from the business side. In Kubernetes, networking is infrastructure, yet health checks and configuration still need input from the application teams.

This is also why DevOps emerged: many technologies and components cannot be cleanly assigned to either development or operations, so the two had to merge. Organizational and architectural optimization cannot rely solely on tweaking individual roles or components; overall improvement requires top-down, holistic planning and design.

To achieve this, all individuals must share a unified perspective and goal—viewing issues from the perspective of services and external APIs, rather than from a technical or foundational standpoint.
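To make the service-and-API perspective concrete, here is a minimal Python sketch that measures availability per external API from request records, instead of from host-level resource metrics such as CPU or memory. The `Request` record, the route names, and the rule of counting only 5xx responses as failures are illustrative assumptions, not a prescribed SLA definition.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    api: str      # external API route (hypothetical names below)
    status: int   # HTTP status code returned to the caller


def availability_by_api(requests):
    """Return {api: fraction of requests that did not fail with a 5xx}."""
    total = defaultdict(int)
    ok = defaultdict(int)
    for r in requests:
        total[r.api] += 1
        if r.status < 500:  # assumption: 4xx are caller errors, so the API still served
            ok[r.api] += 1
    return {api: ok[api] / total[api] for api in total}


reqs = [
    Request("GET /orders", 200),
    Request("GET /orders", 503),
    Request("POST /payments", 201),
    Request("POST /payments", 400),
]
print(availability_by_api(reqs))  # {'GET /orders': 0.5, 'POST /payments': 1.0}
```

Everyone, from infrastructure operations to business development, can then discuss the same number: how often each API served its callers.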

Principle 3: Choose the Most Mainstream and Mature Technologies

Technology selection is critical: a wrong choice can force major architectural changes later, and such changes are complex and costly. Over the years, I've seen many systems migrate from PHP, Python, .NET, or Node.js to Java + Go as their complexity grew. That painful migration is hard to avoid: once a system becomes large and complex, it cannot rest on immature technology.

Opt for Mature, Industrial-Grade Technology Stacks
This means using the technology stacks that prevail at large companies that invest heavily in technology, across sectors such as internet, finance, and telecom.
Choose Globally Popular Technologies Over Local Trends
Technology should be global, not localized. Avoid being swayed by "special cases" from particular companies; universal applicability is key for longevity.
Utilize Established Technologies with Significant Benefits
Avoid reinventing the wheel, and avoid heavily modifying open-source software. Many companies have tried to customize open-source projects like Mesos, only to end up building something akin to Kubernetes.

In most cases, choosing Java will not lead you astray. It offers excellent productivity for business development and has a robust community with extensive resources available. Running on the JVM provides numerous advantages that reduce architectural risk and cost.

Principle 4: Completeness Over Performance

I’ve noticed that some architects prioritize performance under high load over system completeness and scalability. There are plenty of cases where a non-relational database such as MongoDB was chosen at the start, and later, when relational queries were needed, its limitations forced redundant copies of data and led to inconsistency.

Architecture Guidelines:

Prioritize Scientific, Rigorous Models
Always use a complete, ACID-compliant relational database as the foundation, with NoSQL as a supplement. This follows a "strict first, relax later" approach.
Performance Can Always Be Addressed
In my experience, performance problems can always be solved; focus on architectural completeness and scalability instead.
Sacrificing Completeness for Performance is Detrimental
The trade-off is not worth it.
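As a minimal illustration of what an ACID foundation buys, the sketch below uses Python's built-in `sqlite3` module as a stand-in for a production relational database. The `accounts` table and the `transfer` function are hypothetical; the point is that a failure halfway through a transfer rolls back the earlier debit, so the data never ends up half-updated.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; raise on insufficient funds."""
    with conn:  # commits on success, rolls back if an exception escapes the block
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")  # the debit above is rolled back
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])

transfer(conn, "alice", "bob", 30)      # succeeds
try:
    transfer(conn, "alice", "bob", 500)  # fails midway
except ValueError:
    pass

print(dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id")))
# {'alice': 70, 'bob': 30}
```

Reproducing the same guarantee on top of a store without multi-record transactions usually means hand-written compensation logic, which is exactly the kind of completeness gap this principle warns about.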

Principle 5: Establish and Adhere to Standards, Norms, and Best Practices

This principle is key to architectural scalability. I frequently encounter systems that follow neither industry standards nor any internal consistency, like a rabble with no discipline.

Standardize API Return Codes
For example, HTTP defines 2xx codes as success, 4xx as client errors, and 5xx as server errors. Returning 200 for every response, with the error buried in the body, leaves monitoring systems blind to failures.
Ensure Consistent User ID Design
Organizations should use one unified user ID scheme rather than relying on external identifiers, which also raises privacy concerns.
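The sketch below, assuming hypothetical `NotFound` and `InvalidInput` domain errors, maps failures onto standard HTTP status codes and compares what a monitor that counts 4xx/5xx responses would see with and without that mapping. It illustrates the principle only; it is not any particular framework's error-handling API.

```python
from http import HTTPStatus


class NotFound(Exception): ...       # hypothetical domain error
class InvalidInput(Exception): ...   # hypothetical domain error


ERROR_STATUS = {
    NotFound: HTTPStatus.NOT_FOUND,        # 404
    InvalidInput: HTTPStatus.BAD_REQUEST,  # 400
}


def to_status(exc):
    """HTTP status for a handler outcome; None means the handler succeeded."""
    if exc is None:
        return HTTPStatus.OK
    return ERROR_STATUS.get(type(exc), HTTPStatus.INTERNAL_SERVER_ERROR)  # unknown -> 500


def error_rate(statuses):
    """What a status-code-based monitor sees: the share of 4xx/5xx responses."""
    return sum(1 for s in statuses if s >= 400) / len(statuses)


outcomes = [None, NotFound(), None, RuntimeError("db down")]
standard = [to_status(e) for e in outcomes]   # statuses 200, 404, 200, 500
always_200 = [HTTPStatus.OK for _ in outcomes]  # errors hidden in the response body
print(error_rate(standard), error_rate(always_200))  # 0.5 0.0
```

With the "always 200" convention the monitor reports a 0% error rate even though half the requests failed.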

Key Standards and Norms to Observe:

Service Call Protocol Standards
This includes RESTful API paths, HTTP methods, status codes, headers, and JSON Schema.
Naming Standards
Consistent naming conventions for user IDs, service names, status codes, etc.
Logging and Monitoring Standards
Defined formats for logs, monitoring data, and alerting protocols.
Configuration Standards
Standardization of OS, middleware, and software configurations.
Middleware Usage Standards
Guidelines for using databases, caches, and message queues.
Unified Software and Development Library Versions
Aim to upgrade software versions annually across the organization.
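As one possible shape for a logging standard, here is a sketch using Python's standard `logging` module that renders every record as a JSON object with a fixed set of fields. The field names (`ts`, `level`, `service`, `trace_id`, `msg`) and the service name are assumptions chosen for illustration; each organization would define its own.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render every log record with the same fixed fields (hypothetical standard)."""

    def format(self, record):
        entry = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "service": getattr(record, "service", "unknown"),
            "trace_id": getattr(record, "trace_id", None),  # links logs to call chains
            "msg": record.getMessage(),
        }
        return json.dumps(entry, ensure_ascii=False)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("order-service")
log.addHandler(handler)
log.setLevel(logging.INFO)

# `extra` attaches the standard fields to the record.
log.info("order created", extra={"service": "order-service", "trace_id": "4bf92f35"})
```

Because every service emits the same keys, the log pipeline can parse, index, and correlate entries without per-service rules.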

Additional Notes:

RESTful API Standards
Follow successful API guidelines, such as those published by PayPal and Microsoft, so that monitoring and control systems can work effectively.
Service Call Chain Tracing
Implement call chain tracing based on the model described in Google's Dapper paper; Zipkin is a popular implementation.
Software Upgrades
Regular software version reviews are vital to simplifying system architecture complexity.
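The core idea of Dapper-style tracing is that every hop forwards the trace ID and records which span called it. Below is a simplified sketch of that propagation using Zipkin's B3 header names (`X-B3-TraceId`, `X-B3-SpanId`, `X-B3-ParentSpanId`). Sampling, span timing, and reporting to a collector are omitted, the service names are hypothetical, and the plain dict stands in for HTTP headers, which real code must look up case-insensitively.

```python
import secrets


def new_id(nbytes: int = 8) -> str:
    """Random lowercase-hex ID; 8 bytes gives a 16-hex-char (64-bit) span ID."""
    return secrets.token_hex(nbytes)


def outgoing_headers(incoming: dict) -> dict:
    """Build B3-style headers for a downstream call from the headers we received.

    The trace ID is kept (or created at the edge), the span that reached us
    becomes the parent, and a fresh span ID identifies the new downstream call.
    """
    trace_id = incoming.get("X-B3-TraceId") or new_id(16)  # 128-bit trace ID at the edge
    headers = {"X-B3-TraceId": trace_id, "X-B3-SpanId": new_id()}
    parent = incoming.get("X-B3-SpanId")
    if parent:
        headers["X-B3-ParentSpanId"] = parent
    return headers


# A gateway starts the trace; order-service then calls payment-service.
gateway_to_orders = outgoing_headers({})
orders_to_payments = outgoing_headers(gateway_to_orders)
print(gateway_to_orders)
print(orders_to_payments)
```

Since every hop shares one trace ID and points to its parent span, a tracing backend can reassemble the full call tree of a request.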

Principle 6: Emphasize Scalability and Operability of Architecture

In many architectures I have seen, engineers consider only the present and neglect the system's future scalability and operability. It is like a child born with severe congenital defects: every later stage of development becomes difficult. Architecture and software are never finished once written; they require constant modification and maintenance, and about 80% of software cost goes to maintenance. It is therefore crucial to design the architecture for scalability and ease of operation. Scalability means new features can be added and additional systems integrated easily; operability means any change can be made to a live system. Scalability requires a standardized, loosely coupled business architecture, while operability requires control capabilities provided by a set of control systems.

You can reduce coupling between services with a service orchestration architecture: dedicated services for business processes, or middleware such as Workflow, Event Driven Architecture, Broker, Gateway, and Service Discovery, all minimize direct dependencies between services. Service discovery and service gateways in particular reduce the operational complexity that service dependencies create, making services easy to deploy and scale. It's also essential to follow software design principles such as SOLID and IoC/DIP, architectural styles such as SOA, best practices from frameworks like Spring Cloud, and proven practices of distributed system architecture.
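To show how an event-driven style removes direct dependencies, here is a minimal in-process publish/subscribe sketch. The `EventBus` class is a hypothetical stand-in for a real message broker, and the topic and service names are illustrative; the point is that `create_order` publishes an event without knowing which services consume it.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Minimal in-process publish/subscribe bus (stand-in for a real broker)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self._handlers[topic]:
            handler(event)


bus = EventBus()
shipped, emailed = [], []
# Downstream "services" register themselves; the order service never imports them.
bus.subscribe("order.created", lambda e: shipped.append(e["order_id"]))
bus.subscribe("order.created", lambda e: emailed.append(e["order_id"]))


def create_order(order_id: str) -> None:
    # ... persisting the order is omitted in this sketch ...
    bus.publish("order.created", {"order_id": order_id})


create_order("A-1001")
print(shipped, emailed)  # ['A-1001'] ['A-1001']
```

Adding a new consumer is one more `subscribe` call; the publishing service does not change or redeploy.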

Principle 7: Comprehensive Consolidation of Control Logic

All programs contain two types of logic: business logic, which accomplishes business tasks, and control logic, which handles concerns such as multithreading, distribution, choosing between databases and files, configuration, deployment, operation, monitoring, transaction control, service discovery, elastic scaling, gray (canary) releases, and high concurrency. Control logic generally demands deeper technical knowledge than business logic, so it is best developed centrally by specialized engineers rather than scattered across business code. This includes:

Traffic Consolidation: scheduling both north-south and east-west traffic, mainly through traffic gateways, development framework SDKs, or Service Mesh technologies.
Service Governance Consolidation: service discovery, health checks, configuration management, transactions, events, retries, circuit breaking, and rate limiting, typically handled by development frameworks like Spring Cloud or by Service Mesh technologies.
Monitoring Data Consolidation: logs, metrics, and call chains, collected mainly with mainstream probes (preferably non-intrusive ones) and supplemented by backend data cleansing and storage. Monitoring data must be correlated in one place before it yields useful information.
Resource Scheduling Consolidation for Application Deployment: compute, networking, and storage, mainly through containerization solutions like Kubernetes.
Middleware Consolidation: databases, messaging, caching, service discovery, and gateways, which generally call for a unified, internally shared cloud middleware resource pool.
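Circuit breaking is one of the governance concerns listed above that belongs in shared control logic rather than in business code. The sketch below is a simplified, single-threaded circuit breaker; its thresholds, the injectable `clock`, and its half-open behavior are assumptions for illustration, not the behavior of Spring Cloud or any particular Service Mesh.

```python
import time


class CircuitBreaker:
    """Simplified circuit breaker (not thread-safe).

    Opens after `max_failures` consecutive failures, rejects calls while open,
    and lets one trial call through once `reset_after` seconds have passed.
    """

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            # Half-open: fall through and let this call test the dependency.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        self.failures = 0
        self.opened_at = None  # a success closes the circuit
        return result
```

Because the breaker accepts any callable, business code only routes calls through `call`, for example `breaker.call(fetch_stock, sku)` with a hypothetical `fetch_stock` function, while failure counting and fail-fast decisions stay in one place.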

The principles here are:

Choose technologies that facilitate the separation of business and control logic. For instance, Java's JVM combined with bytecode injection and AOP-style Spring frameworks offers significant advantages.
Prefer technologies with a "first mover advantage," meaning large communities and broad compatibility, such as Java, Docker, Ansible, HTTP, Telegraf, and Collectd.
Utilize middleware that supports HA clustering and multi-tenancy, as most mainstream middleware options do.
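Python has no JVM bytecode injection, but its decorators give a rough analogue of the AOP-style separation described above. In this sketch, the hypothetical `with_retry` decorator holds the control logic (retry with a fixed delay on transient errors), while the hypothetical `get_price` function contains only business logic plus a simulated transient failure.

```python
import functools
import time


def with_retry(attempts=3, delay=0.1, retry_on=(ConnectionError, TimeoutError)):
    """Control logic (retry with a fixed delay) kept outside the business function."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except retry_on:
                    if attempt == attempts:
                        raise  # give up: surface the last error to the caller
                    time.sleep(delay)
        return wrapper
    return decorate


calls = {"n": 0}


@with_retry(attempts=3, delay=0)
def get_price(sku: str) -> int:
    """Business logic only; it knows nothing about retries."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")  # simulated flakiness
    return 42


print(get_price("sku-1"), calls["n"])  # 42 3
```

The retry policy can be changed or replaced in one place without touching any business function, which is the same benefit Spring's AOP provides on the JVM.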

Principle 8: Do Not Accommodate Technical Debt of Old Systems

I have observed that many companies bear significant technical debt, manifested in several ways:

Use of outdated technologies, such as HTTP 1.0, Java 1.6, WebSphere, ESB, socket-based communication protocols, and obsolete models.
Unreasonable designs, such as embedding excessive business logic in gateways, monolithic architectures, tight coupling of data and business logic, and incorrect system architectures (e.g., using cache as a database or message queues for data synchronization).
Lack of supporting infrastructure, such as automated testing, quality documentation, high-quality code, and adherence to standards.

Those who seek my technical help usually have a range of problems, and I always tell them the same thing: “If you are looking for case-by-case fixes, I'm not very interested, because you cannot expect to easily turn a cheap car into a luxury car or straighten a crooked building. All the accumulated technical debt must be repaid, the foundations rebuilt, and the necessary infrastructure put in place. Without proper infrastructure, a good system is unattainable, and I cannot solve your problems one by one.” At first they assure me they are willing to repay the debt, but once they see how much of it there is, they back off.

They then look for justifications for their "technical debt," citing various historical reasons and constraints. I get the impression that they want improvement without changing anything or paying any cost, even if that means bending new technologies to accommodate old debt and misusing them in the process. One company I encountered had made poor architecture and technology choices, which caused severe performance problems with only tens of millions of records. Rather than repaying the debt or building the right infrastructure, they wanted to build yet more systems, believing the existing system was fine and blaming the performance problems on the lack of a big data platform.

I have seen many companies, including large ones such as BAT (Baidu, Alibaba, Tencent), keep building on top of their original technical debt, which only grows until it becomes an unmanageable burden. I once described this as the WatchDog architecture model: instead of fixing a broken system, you build a new one to watch the old one. I find this logic hard to understand; perhaps the point is to create more jobs.

Here are a few principles and methods I strongly advocate:

Rather than exerting significant effort to accommodate technical debt, it's better to repay it directly; long-term pain is worse than short-term pain.
Build "new districts" free of technical debt, and keep technical debt from leaking into them by placing an anti-corruption layer between the new and old systems.
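An anti-corruption layer is a translation boundary: only it knows the legacy system's data format, and the new system sees only its own clean model. The legacy record layout, the status codes, and the `LegacyOrderTranslator` class below are hypothetical, chosen only to illustrate the pattern.

```python
from dataclasses import dataclass

# Hypothetical legacy record: cryptic keys, numbers as strings, one-letter status codes.
LEGACY_ORDER = {"ORD_NO": "000123", "AMT_CENTS": "1999", "STS": "P"}


@dataclass(frozen=True)
class Order:
    """Clean domain model used only inside the new system."""
    order_id: int
    amount_cents: int
    status: str


class LegacyOrderTranslator:
    """Anti-corruption layer: the only code that knows the legacy format."""

    _STATUS = {"P": "pending", "S": "shipped", "C": "cancelled"}

    def to_domain(self, raw: dict) -> Order:
        return Order(
            order_id=int(raw["ORD_NO"]),
            amount_cents=int(raw["AMT_CENTS"]),
            status=self._STATUS[raw["STS"]],
        )


print(LegacyOrderTranslator().to_domain(LEGACY_ORDER))
# Order(order_id=123, amount_cents=1999, status='pending')
```

When the legacy system is eventually retired or changed, only the translator needs to be rewritten; the new district's code is untouched.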

Principle 9: Rely on Data and Learning, Not Just Experience

Many people bring me their technical problems, hoping I can provide a solution. I always say that I first need to understand the current state of their system, which takes a diagnostic approach. Only with that data can I find the real cause of the problem and offer a suitable technical solution. I believe this is the responsible approach, because there are many technical methods, each suited to different scenarios and each with its own trade-offs. Decisions should be made only after thorough investigation, just as a doctor relies on diagnostic data rather than experience alone to identify an illness. In the face of scientific evidence, experience is unreliable.
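As a small example of why diagnostic data matters, the sketch below compares the mean with nearest-rank percentiles on a hypothetical set of latency samples. The mean alone suggests a uniformly sluggish service, while the percentiles show that most requests are fast and a small tail is very slow, which points toward a different root cause.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical latency samples in milliseconds: mostly fast, a few very slow.
latencies_ms = [20] * 95 + [2000] * 5
mean = sum(latencies_ms) / len(latencies_ms)
print(mean, percentile(latencies_ms, 50), percentile(latencies_ms, 99))  # 119.0 20 2000
```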

Furthermore, if one day you start making technical decisions based on past experiences, you are unlikely to grow. Human progress does not come from repeatedly relying on the past; it comes from learning about the unknown. Therefore, never rely solely on your experience when making decisions. Before deciding anything, take some time to search online for relevant information—technical blogs, articles, papers, etc. Look at how various companies or open-source projects approach similar challenges. Then, compare the pros and cons of multiple solutions to arrive at a well-informed decision. This process will lead to better outcomes.

Principle 10: Be Cautious of X-Y Problems and Clarify Original Requirements

The X-Y problem arises when a user wants to solve problem X and believes that using solution Y will suffice, ultimately asking how to implement Y. However, it often turns out that the real solution to problem X is actually Z. I have encountered many instances of this X-Y issue. Therefore, every time a user approaches me, I consistently inquire about what the actual X problem is.

For example, some users come to me asking for big data streaming solutions. When I probe into the specific problem, it turns out that their services hold a large amount of state, which forces every request from a given user to be processed by the same service instance. Because of this design flaw, one slow function drags down the entire application. In the end, performance tuning resolves the issue, and no big data streaming solution is needed at all.

I like asking "why," because it makes clients rethink their situation. For instance, a client asked me to assess an architecture decision that seemed theoretically suitable for their scenario, though both the scenario and the architecture were unfamiliar to me. As I kept asking why the scenario was the way it was, the client began to notice inconsistencies in it themselves. This led to in-depth discussions, and in the end the client revised the scenario, and the architecture turned into a common, mature model.

Principle 11: Be Bold Rather Than Conservative; Innovation and Practicality Are Not Mutually Exclusive

I take a fairly radical approach to technology. Being radical does not mean acting recklessly or chasing every new trend; it means actively embracing innovative technologies that will change the future, such as Docker and Go. I engage with these quickly, but I am less enthusiastic about blockchain or Rust because they do not fit the technology trends I consider important. That does not mean I ignore them: I still study blockchain and Rust and understand their advantages, but I won't adopt them at scale. I respect conservative decisions; there is no absolute right or wrong here. Still, I believe a radical attitude toward technology brings more benefits than a conservative one. New technologies often carry significant competitive advantages, and I have seen many successful companies actively embrace them, while conservative approaches tend to yield diminishing returns.

Some people tell me, "We are pragmatists; we don't need innovation as long as we can solve current problems. We don't require new technologies; existing technologies are sufficient." Companies with this mindset start with a technical debt from day one. While they may solve immediate issues, new problems quickly arise, leading to a cycle of fatigue in addressing various challenges. Eventually, they too will have to adopt new technologies.

The logic here is simple: progress always comes from exploration. Exploration has costs, but the rewards are far greater. For me, the greatest risk is not daring to take risks, and the biggest mistake is being afraid to make mistakes. Fear of loss often leads to greater losses...
