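QuickSort and Its HLS Implementation Explained

1. Introduction to QuickSort

QuickSort is a sorting algorithm based on divide and conquer. Its average time complexity is O(n log n) and its worst case is O(n²), but it performs well on typical inputs. The basic idea is to choose a pivot element, partition the array so that smaller elements come before the pivot and larger elements come after it, and then sort the two parts in the same way.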

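This implementation uses the Lomuto partition scheme: the last element of the range is the pivot, the range is scanned once from left to right, and the function returns the final position of the pivot. Elements are swapped during the scan so that every element smaller than the pivot ends up on its left and every element greater than or equal to the pivot ends up on its right.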
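The two complexity figures quoted above can be derived from the usual recurrences for the number of comparisons. The following is a sketch, assuming that partitioning a range of n elements costs n - 1 comparisons (one per scanned element) and that T(0) = T(1) = 0:

```latex
\begin{aligned}
\text{Worst case (pivot is always the largest or smallest element):}\quad
  & T(n) = T(n-1) + (n-1)
    \;\Rightarrow\; T(n) = \frac{n(n-1)}{2} = O(n^2) \\
\text{Balanced splits:}\quad
  & T(n) = 2\,T(n/2) + O(n)
    \;\Rightarrow\; T(n) = O(n \log n)
\end{aligned}
```

With the last element as the pivot, an already sorted input triggers the worst case. For n = 1000 that is 1000 · 999 / 2 = 499,500 comparisons.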

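Consider the following input array:

arr = [10, 7, 8, 9, 1, 6]

The last element, 6, is used as the pivot, and the array is partitioned.

Original array: [10, 7, 8, 9, 1, 6]

Pivot: 6

During the scan only 1 is smaller than the pivot, so it is swapped with the first element, giving [1, 7, 8, 9, 10, 6]. Finally, the pivot 6 is swapped into its correct position (index 1):

[1, 6, 8, 9, 10, 7]

With the pivot 6 at index 1, the subarrays [1] and [8, 9, 10, 7] are sorted recursively.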

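Subarray: [8, 9, 10, 7]

Pivot: 7

No element is smaller than the pivot, so the pivot 7 is swapped with the first element of the subarray:

[7, 9, 10, 8]

The pivot 7 is at index 0 of the subarray, so sorting continues on [9, 10, 8].

Subarray: [9, 10, 8]

Pivot: 8

Again no element is smaller than the pivot, so 8 is swapped to the front, giving [8, 10, 9]. The remaining subarray [10, 9] is partitioned around the pivot 9, giving [9, 10], and the sort is complete.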

Final result:

[1, 6, 7, 8, 9, 10]
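The walkthrough above can be reproduced in plain C++. The sketch below is not the HLS code: it uses std::vector<int> and recursion for brevity, and the names lomuto_partition, traced_quicksort and example_trace are hypothetical helpers introduced only for this illustration. It records the whole array after every partition call so the intermediate states can be inspected.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Lomuto partition of a[low..high]; the last element is the pivot.
// Returns the final index of the pivot.
int lomuto_partition(std::vector<int>& a, int low, int high) {
    int pivot = a[high];
    int i = low - 1;  // last index of the "less than pivot" region
    for (int j = low; j < high; ++j) {
        if (a[j] < pivot) {
            ++i;
            std::swap(a[i], a[j]);
        }
    }
    std::swap(a[i + 1], a[high]);
    return i + 1;
}

// Recursive quicksort that appends a snapshot of the array to `trace`
// after every partition call.
void traced_quicksort(std::vector<int>& a, int low, int high,
                      std::vector<std::vector<int>>& trace) {
    if (low >= high) return;  // ranges of size 0 or 1 are already sorted
    int p = lomuto_partition(a, low, high);
    trace.push_back(a);
    traced_quicksort(a, low, p - 1, trace);
    traced_quicksort(a, p + 1, high, trace);
}

// Runs the example input from the text and returns the recorded snapshots.
std::vector<std::vector<int>> example_trace() {
    std::vector<int> a = {10, 7, 8, 9, 1, 6};
    std::vector<std::vector<int>> trace;
    traced_quicksort(a, 0, static_cast<int>(a.size()) - 1, trace);
    assert(std::is_sorted(a.begin(), a.end()));
    return trace;
}
```

Calling example_trace() from a main function and printing each snapshot shows the array after the first partition, [1, 6, 8, 9, 10, 7], followed by the later partition steps and the sorted result.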

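The following code implements quicksort for HLS. The main optimizations are:

swap is inlined with #pragma HLS INLINE, which removes function-call overhead.
partition requests a pipelined scan loop with #pragma HLS PIPELINE II=1 to raise throughput.
quicksort is non-recursive and uses a stack array to emulate the recursion, because recursive functions are generally not synthesizable in HLS.
#pragma HLS ARRAY_PARTITION variable=stack cyclic factor=2 dim=1 partially partitions the stack array into two banks, so that two neighbouring entries can be accessed in the same cycle.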

#include <ap_int.h>

#define data_number 1000

// Swap two 64-bit values; inlined into the caller.
inline void swap(ap_int<64> &a, ap_int<64> &b) {
    #pragma HLS INLINE
    ap_int<64> temp = a;
    a = b;
    b = temp;
}

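// Lomuto partition of arr[low..high] with arr[high] as the pivot.
// Returns the final index of the pivot.
int partition(ap_int<64> arr[], int low, int high) {
    ap_int<64> pivot = arr[high];
    int i = low - 1;  // last index of the "less than pivot" region

    Partition_Loop:
    for (int j = low; j < high; j++) {
        #pragma HLS PIPELINE II=1
        if (arr[j] < pivot) {
            i++;
            swap(arr[i], arr[j]);
        }
    }

    // Place the pivot between the two regions.
    swap(arr[i + 1], arr[high]);
    return (i + 1);
}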

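// Non-recursive quicksort: pending [low, high] ranges are kept on an explicit stack.
void quicksort(ap_int<64> arr[], int low, int high) {
    int stack[data_number];
    #pragma HLS ARRAY_PARTITION variable=stack cyclic factor=2 dim=1
    int top = -1;

    // Push the initial range.
    stack[++top] = low;
    stack[++top] = high;

    // This loop is not pipelined: each iteration runs the
    // variable-length Partition_Loop, which cannot be unrolled.
    QuickSort_Loop:
    while (top >= 0) {
        high = stack[top--];
        low = stack[top--];
        int pivot = partition(arr, low, high);

        // Push the left part if it has at least two elements.
        if (pivot - 1 > low) {
            stack[++top] = low;
            stack[++top] = pivot - 1;
        }
        // Push the right part if it has at least two elements.
        if (pivot + 1 < high) {
            stack[++top] = pivot + 1;
            stack[++top] = high;
        }
    }
}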

// Top-level function: sorts data_number 64-bit values in place.
void hls_quicksort(ap_int<64> arr[data_number]) {
    #pragma HLS INTERFACE mode=ap_memory port=arr
    #pragma HLS BIND_STORAGE variable=arr type=RAM_1P impl=BRAM
    quicksort(arr, 0, data_number - 1);
}
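Before running synthesis, the explicit-stack logic can be checked in ordinary C++. The sketch below is a software model under stated assumptions, not the HLS source: std::int64_t stands in for ap_int<64>, the pragmas are dropped, and partition_model, quicksort_model and model_matches_std_sort are hypothetical names. The model also reports the peak number of stack entries, which gives a way to check that a stack of data_number entries is large enough.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Assumption: a plain signed 64-bit integer stands in for ap_int<64>.
using data_t = std::int64_t;

// Same Lomuto partition as the HLS partition(), without pragmas.
int partition_model(data_t arr[], int low, int high) {
    data_t pivot = arr[high];
    int i = low - 1;
    for (int j = low; j < high; ++j) {
        if (arr[j] < pivot) {
            ++i;
            std::swap(arr[i], arr[j]);
        }
    }
    std::swap(arr[i + 1], arr[high]);
    return i + 1;
}

// Explicit-stack quicksort mirroring the HLS quicksort().
// Returns the peak number of stack entries that were in use.
int quicksort_model(data_t arr[], int n) {
    if (n < 2) return 0;        // nothing to sort
    std::vector<int> stack(n);  // same capacity as stack[data_number]
    int top = -1;
    stack[++top] = 0;
    stack[++top] = n - 1;
    int peak = top + 1;
    while (top >= 0) {
        int high = stack[top--];
        int low = stack[top--];
        int p = partition_model(arr, low, high);
        if (p - 1 > low) {      // left part has at least two elements
            stack[++top] = low;
            stack[++top] = p - 1;
        }
        if (p + 1 < high) {     // right part has at least two elements
            stack[++top] = p + 1;
            stack[++top] = high;
        }
        assert(top + 1 <= n);   // the stack must never overflow
        peak = std::max(peak, top + 1);
    }
    return peak;
}

// Sorts n pseudo-random values with the model and compares against std::sort.
bool model_matches_std_sort(int n, std::uint64_t seed) {
    std::vector<data_t> a(n);
    std::uint64_t x = seed;
    for (int k = 0; k < n; ++k) {
        // Simple linear congruential generator, used only for repeatable test data.
        x = x * 6364136223846793005ULL + 1442695040888963407ULL;
        a[k] = static_cast<data_t>(x >> 33) - (1LL << 30);
    }
    std::vector<data_t> expected = a;
    std::sort(expected.begin(), expected.end());
    int peak = quicksort_model(a.data(), n);
    return a == expected && peak <= n;
}
```

A testbench can call model_matches_std_sort(1000, seed) for a few seeds. Every range pushed onto the stack holds at least two elements and the pending ranges never overlap, so at most n / 2 ranges (n integers) are on the stack at once, which is why a stack of data_number entries suffices.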

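Pipelining: #pragma HLS PIPELINE II=1 is applied to the scan loop to speed up partitioning. Each iteration may read and write arr, so the initiation interval that is actually achieved depends on the number of available memory ports.
Storage: arr is exposed through an ap_memory interface and bound to single-port block RAM (#pragma HLS BIND_STORAGE variable=arr type=RAM_1P impl=BRAM), so array accesses go to on-chip memory.
Array partitioning: #pragma HLS ARRAY_PARTITION splits the stack array into two banks, which allows two stack entries to be accessed in parallel.

This design targets high-speed data processing on FPGAs, in fields such as financial analysis and image processing.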