mirror of
https://git.ffmpeg.org/ffmpeg.git
synced 2026-01-15 12:54:45 +00:00
Implement NEON optimization for compute_weights_line.

Also update the function signature to use ptrdiff_t for the stack
arguments (max_meaningful_diff, startx, endx). This unifies the stack
layout between Apple platforms (which pack 32-bit stack arguments
tightly) and the generic AAPCS64 ABI (which requires 8-byte stack slots
for 32-bit arguments). Using ptrdiff_t ensures 8-byte slots are used on
all AArch64 platforms, avoiding ABI mismatches with the assembly
implementation.

The x86 AVX2 prototype is updated to match the new signature.

Performance benchmark (AArch64) on macOS (Apple M4):

    ./tests/checkasm/checkasm --test=vf_nlmeans --bench

    compute_weights_line_c:       151.1 ( 1.00x)
    compute_weights_line_neon:     62.6 ( 2.42x)

Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
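For illustration only, the stack layout difference described above can be modelled with a small helper. This is a sketch, not FFmpeg code: stack_offsets, offsets_aapcs64 and offsets_apple are hypothetical names, and the model assumes only the two rules stated in the message (8-byte slots under generic AAPCS64, tight packing at natural alignment on Apple platforms) and a 64-bit ptrdiff_t.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not FFmpeg code: compute the byte offset of each
 * stack-passed argument from its size and from the minimum slot size
 * that the calling convention imposes. */
void stack_offsets(const size_t *sizes, size_t n,
                   size_t min_slot, size_t *offsets)
{
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        /* Only 32-bit (int) and 64-bit (ptrdiff_t) arguments are modelled. */
        assert(sizes[i] == 4 || sizes[i] == 8);
        /* Each argument occupies at least min_slot bytes... */
        size_t slot = sizes[i] > min_slot ? sizes[i] : min_slot;
        /* ...and starts at the next multiple of its slot size. */
        pos = (pos + slot - 1) / slot * slot;
        offsets[i] = pos;
        pos += slot;
    }
}

/* Generic AAPCS64: every stack argument gets an 8-byte slot. */
void offsets_aapcs64(const size_t *sizes, size_t n, size_t *out)
{
    stack_offsets(sizes, n, 8, out);
}

/* Apple arm64: stack arguments are packed at their natural alignment. */
void offsets_apple(const size_t *sizes, size_t n, size_t *out)
{
    stack_offsets(sizes, n, 1, out);
}
```

With three int-sized arguments (4 bytes each) the model yields offsets 0, 8, 16 for generic AAPCS64 but 0, 4, 8 for Apple, so a single assembly routine cannot load them from fixed offsets. With three ptrdiff_t-sized arguments (8 bytes each) both conventions yield 0, 8, 16, which is the property the signature change relies on.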