I. Background
As the airline business continues to grow, read performance of the order database has become a bottleneck, prompting us to introduce read-write separation. The main goal is to improve the database's concurrency and scalability. Write operations remain efficient, but read requests are slow: the order table holds tens of millions of rows, and queries must join dozens of related sub-tables, some containing hundreds of millions of rows, so query time can exceed our slow-SQL threshold of 3 seconds. Read-write separation is built on top of master-slave replication; its specific benefits are as follows:
Performance Improvement:
Load Distribution: By separating read and write operations, write operations can focus on the master database, while read operations are distributed across multiple slave databases. This significantly reduces the master database’s load and improves the overall system’s response speed.
Concurrent Processing Capability: Read-write separation can use multiple slave databases to handle concurrent read requests, thereby enhancing the system’s concurrent processing capacity.
Scalability:
Horizontal Scaling: By increasing the number of slave databases, it’s easy to expand the system’s read capacity without complex scaling of the master database.
Load Balancing: Load balancing techniques can distribute read requests to different slave databases, leading to better resource utilization and performance optimization.
Data Security and Fault Tolerance:
Data Backup: Slave databases can serve as part of a data backup, providing redundancy and fault tolerance. If the master database fails, the slave database can quickly take over read operations and, if necessary, be promoted to the new master.
Disaster Recovery: In disaster recovery scenarios, slave databases can act as backups for the master, ensuring data security and recoverability.
Cache and Index Optimization: The slave database can be optimized for specific read operations, such as creating specific indexes or caching strategies, without impacting the write performance of the master database.
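The round-robin distribution of reads across slaves mentioned above can be sketched in a few lines. This is an illustrative stand-in (the slave names are placeholders); in practice the middleware performs this routing.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch of round-robin load balancing across read replicas.
public class RoundRobinSelector {
    private final List<String> readSources;
    private final AtomicLong counter = new AtomicLong();

    public RoundRobinSelector(List<String> readSources) {
        this.readSources = readSources;
    }

    // Each call returns the next slave in turn, wrapping around.
    public String next() {
        long i = counter.getAndIncrement();
        return readSources.get((int) (i % readSources.size()));
    }
}
```

Because the counter is atomic, concurrent read requests still rotate evenly over the slaves.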
Extended: Configuring the Slow Query Threshold: In MySQL's configuration file (typically my.cnf or my.ini), set long_query_time by adding or modifying the following lines in the [mysqld] section:

```ini
[mysqld]
slow_query_log = 1
slow_query_log_file = /path/to/your/slow_query.log
long_query_time = 2
```
II. Master-Slave Read-Write Separation
After completing master-slave replication, we also need to implement read-write separation: the master handles writes, while two slaves handle reads. Due to the large data volume, our project uses Sharding-JDBC for table sharding. Below is a brief overview of implementing read-write separation with Sharding-JDBC.
Dependency Introduction: Add Sharding-JDBC dependencies in the project.
```xml
<dependency>
    <groupId>org.apache.shardingsphere</groupId>
    <artifactId>shardingsphere-jdbc-core</artifactId>
    <version>5.x.x</version> <!-- Use the latest version -->
</dependency>
```
Configuring Data Sources: Configure multiple data sources, typically including one master and one or more slaves. Here’s a simple Spring Boot configuration example:
```java
import java.io.File;

import javax.sql.DataSource;

import org.apache.shardingsphere.driver.api.yaml.YamlShardingSphereDataSourceFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class DataSourceConfig {

    @Bean
    public DataSource dataSource() throws Exception {
        // Build the ShardingSphere data source from the YAML routing configuration.
        File yamlFile = new File("path/to/sharding-jdbc-config.yaml");
        return YamlShardingSphereDataSourceFactory.createDataSource(yamlFile);
    }
}
```
YAML File Configuration: Configure read-write separation information in the
sharding-jdbc-config.yaml
file. Here’s an example configuration:
```yaml
dataSources:
  master:
    type: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.cj.jdbc.Driver
    jdbcUrl: jdbc:mysql://localhost:3306/master_db
    username: root
    password: root
  slave0:
    type: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.cj.jdbc.Driver
    jdbcUrl: jdbc:mysql://localhost:3306/slave_db0
    username: root
    password: root
  slave1:
    type: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.cj.jdbc.Driver
    jdbcUrl: jdbc:mysql://localhost:3306/slave_db1
    username: root
    password: root

rules:
- !READWRITE_SPLITTING
  dataSources:
    pr_ds:
      writeDataSourceName: master
      readDataSourceNames:
        - slave0
        - slave1
      loadBalancerName: round_robin
  loadBalancers:
    round_robin:
      type: ROUND_ROBIN
```
Using Sharding-JDBC Data Source: Use the configured Sharding-JDBC data source. Here’s a simple example:
```java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;

@Service
public class MyService {

    @Autowired
    private JdbcTemplate jdbcTemplate;

    public void performDatabaseOperations() {
        // Write operation, routed to the master
        jdbcTemplate.update("INSERT INTO my_table (name) VALUES (?)", "John Doe");

        // Read operation, routed to a slave
        String name = jdbcTemplate.queryForObject(
                "SELECT name FROM my_table WHERE id = ?", String.class, 1);
        System.out.println("Name: " + name);
    }
}
```
Consideration 1: How do we handle replication lag in MySQL synchronization? We use identical hardware configurations for master and slaves and run MySQL 5.7, which supports WriteSet-based parallel replication. For reads with strict freshness requirements, the master also writes the data to Redis with an expiration time long enough to cover the master-slave replication delay, so those reads are served from the cache rather than a possibly stale slave.
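The cache-with-TTL idea above can be sketched as follows. A ConcurrentHashMap stands in for Redis here, and the TTL value is an assumed placeholder; the point is that reads during the replication-lag window hit fresh cached data instead of a stale slave.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: a write-through cache whose TTL covers the worst replication lag.
public class ReplicationLagCache {
    private record Entry(String value, long expiresAtMillis) {}

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public ReplicationLagCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Called on the write path, right after the master commit.
    public void putAfterWrite(String key, String value) {
        cache.put(key, new Entry(value, System.currentTimeMillis() + ttlMillis));
    }

    // Read path: a hit within the TTL never observes replication lag;
    // a miss (or an expired entry) falls through to a slave (null here).
    public String read(String key) {
        Entry e = cache.get(key);
        if (e == null || e.expiresAtMillis < System.currentTimeMillis()) {
            cache.remove(key);
            return null; // caller queries a slave instead
        }
        return e.value;
    }
}
```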
Consideration 2: When a write and a subsequent read occur in the same thread and database connection, always read from the master so the thread is guaranteed to see its own writes.
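One way to realize Consideration 2 is a per-thread flag that pins reads to the master after a write. This is an illustrative sketch with invented names, not ShardingSphere's actual API (the middleware exposes equivalent hint-based forced routing).

```java
// Sketch: after a write, the current thread reads from the master until cleared.
public class MasterRouteFlag {
    private static final ThreadLocal<Boolean> FORCE_MASTER =
            ThreadLocal.withInitial(() -> Boolean.FALSE);

    // Call after any write on this thread.
    public static void markWrite() {
        FORCE_MASTER.set(Boolean.TRUE);
    }

    // Call when the unit of work ends (e.g. transaction commit).
    public static void clear() {
        FORCE_MASTER.remove();
    }

    // Routing decision: master after a write in this thread, otherwise a slave.
    public static String route(String master, String slave) {
        return FORCE_MASTER.get() ? master : slave;
    }
}
```

Because the flag is thread-local, one request's forced-master reads do not affect routing for other concurrent requests.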
III. Non-Relational Database Selection
Synchronous Mode: Suited to systems with strict consistency and real-time requirements, but it lengthens the write path and reduces QPS.
Asynchronous Mode: After the write completes, the query-side data is built asynchronously via Kafka, keeping the business flow untouched, though data consistency must then be verified separately.
Binlog Mode: Used by our in-house DRC service, this approach listens to the database binlog to build the query-side data without touching the main flow; the code stays non-intrusive, but data consistency checks are still required.
IV. Project Challenges
Ensuring Message Idempotency: Achieve idempotency with updateOrInsert: if a record exists, update it; if not, insert it. In MongoDB, use updateOne with the upsert option:
```javascript
db.collection.updateOne(
    { _id: documentId },               // match on the document id
    { $set: { fieldName: newValue } },
    { upsert: true }                   // insert if no document matches
);
```
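The idempotency property this buys can be sketched in Java with an in-memory map standing in for the collection: replaying the same message leaves the store unchanged, so redelivered Kafka messages are harmless.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of updateOrInsert semantics: applying a message twice yields the
// same state as applying it once (no duplicate records).
public class UpsertStore {
    private final Map<String, String> store = new ConcurrentHashMap<>();

    // Update the record if it exists, insert it otherwise (upsert).
    public void updateOrInsert(String id, String value) {
        store.put(id, value); // put covers both branches atomically per key
    }

    public String get(String id) {
        return store.get(id);
    }

    public int size() {
        return store.size();
    }
}
```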
Ensuring Message Order: We subscribe to messages via Kafka and partition them by order number, so all messages for a given order land in the same partition and are consumed in sequence.
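Kafka only guarantees ordering within a partition, so keying by order number is what makes per-order ordering work. A sketch of that key-to-partition mapping (real Kafka hashes the key bytes with murmur2; hashCode stands in here):

```java
// Sketch: every event of one order maps to the same partition, so Kafka's
// per-partition ordering guarantee becomes a per-order ordering guarantee.
public class OrderPartitioner {
    public static int partitionFor(String orderNo, int numPartitions) {
        // Mask off the sign bit so the result is a valid partition index.
        return (orderNo.hashCode() & 0x7fffffff) % numPartitions;
    }
}
```

A consumer of a given partition therefore sees each order's create, pay, and refund events in the order they were produced.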
Ensuring Message Consistency: Set up a data synchronization system (DRC) for periodic consistency checks.
Why Choose MongoDB for Data Storage? For high-volume data with slow queries, NoSQL options such as MongoDB are preferred. In-memory stores like Redis offer high performance but are costly at this data volume. The main candidates were MongoDB, HBase, and Elasticsearch:
MongoDB: A document-oriented NoSQL database that stores JSON-like documents (BSON), with support for complex queries, horizontal sharding, eventual consistency, and flexible consistency configurations.