MySQL Pagination Upgrade: Token-Based Approach

Time: Column:Databases views:218

When performing pagination in database queries, it's essential to return a token that can be used directly to locate the first data item of the next page. It is not recommended to simply use the LIMIT and OFFSET as a page token, as this method can lead to inefficient queries with large datasets.

Overview

The core idea is to improve the efficiency of pagination queries on large datasets. When dealing with vast amounts of data, using the LIMIT and OFFSET clauses can cause performance issues. This is because the database must skip the number of rows specified by the OFFSET to retrieve the required data. As the OFFSET value increases, the query performance can degrade.

To solve this problem, a cursor-based or "Seek Method" pagination approach can be used. Instead of skipping rows, this method tracks the unique key of the last record from the previous page (usually a primary key or unique index) and queries for the next set of sorted records.

Here is a simple example assuming we have a users table with an auto-incremented id field as the primary key.

First, we need an API that doesn't accept a page number but instead accepts a last_seen_id (the ID of the last user from the previous page) as a parameter to fetch the next page of data.

Code Example

Below is an example using Flask to create a pagination API. For simplicity, database connection details are omitted:

from flask import Flask, request, jsonify

app = Flask(__name__)

# Assuming we have a database query function
def query_db(query, args=(), one=False):
    # This should be the code to execute the query using libraries like pymysql or sqlalchemy
    # This is just for illustration, actual implementation is omitted
    pass

# Pagination API
@app.route('/users', methods=['GET'])
def get_users():
    # Default page size and maximum limit
    default_page_size = 10
    max_page_size = 100
    
    # Get last_seen_id and page_size from the request
    last_seen_id = request.args.get('last_seen_id', None)
    page_size = int(request.args.get('page_size', default_page_size))
    
    # Ensure page_size doesn't exceed the maximum limit
    if page_size > max_page_size:
        page_size = max_page_size
    
    # Create query using last_seen_id to locate the next page of data
    if last_seen_id:
        query = "SELECT * FROM users WHERE id > %s ORDER BY id ASC LIMIT %s"
        args = (last_seen_id, page_size)
    else:
        # If last_seen_id is not provided, return the first page
        query = "SELECT * FROM users ORDER BY id ASC LIMIT %s"
        args = (page_size,)
    
    # Execute query
    users = query_db(query, args)
    
    # Convert query result to list of dictionaries
    # Assuming each user is a tuple (id, name)
    users_list = [{'id': user[0], 'name': user[1]} for user in users]
    last_seen_id = users[-1][0] if users else None
    
    # Return JSON response
    return jsonify({
        'users': users_list,
        'last_seen_id': last_seen_id
    })

# Start the Flask app
if __name__ == '__main__':
    app.run(debug=True)

With this API, the client can specify the last_seen_id parameter to get the next page of data. For example, /users?last_seen_id=100&page_size=20 will fetch the next 20 user records with IDs greater than 100.

Benefits of This Method

The advantage of this approach is that, regardless of whether you are fetching the first page or the one-millionth page, the query efficiency remains the same. This is because the database always starts querying from the last ID of the previous page, without needing to skip any rows. This significantly improves query performance, especially when handling large datasets.