Chiplet-based CPUs, which combine multiple independent dies on a single package, allow hardware to scale to higher core counts at the cost of greater memory heterogeneity and performance variability. This poses challenges when existing query engines are deployed on chiplet-based CPUs, because current designs assume uniform memory access, cache locality, and consistent core performance; violating these assumptions can lead, for example, to ineffective CPU utilization. In this paper, we analyse the performance impact of query engines ignoring chiplet-specific properties. We demonstrate that a naïve deployment can significantly degrade query processing efficiency, exhibiting non-linear scaling even within a single CPU socket domain. Based on comprehensive experiments, we explore approaches for deploying query engines on chiplet-based CPUs with improved performance: we show that distributing processing tasks according to a chiplet-aware strategy achieves higher resource utilization and scalability, yielding a speedup of up to 7× over hardware-oblivious approaches.