Browse Source

drm/amdkfd: Clamp EOP queue size correctly on Gfx8

Gfx8 HW incorrectly clamps CP_HQD_EOP_CONTROL.EOP_SIZE, which can
lead to scheduling deadlock due to SE EOP done counter overflow.

Enforce a EOP queue size limit which prevents the CP from sending
more than 0xFF events at a time.

Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Jay Cornwall 8 years ago
parent
commit
af68d87cac
1 changed files with 9 additions and 2 deletions
  1. 9 2
      drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c

+ 9 - 2
drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c

@@ -135,8 +135,15 @@ static int __update_mqd(struct mqd_manager *mm, void *mqd,
 			3 << CP_HQD_IB_CONTROL__MIN_IB_AVAIL_SIZE__SHIFT |
 			3 << CP_HQD_IB_CONTROL__MIN_IB_AVAIL_SIZE__SHIFT |
 			mtype << CP_HQD_IB_CONTROL__MTYPE__SHIFT;
 			mtype << CP_HQD_IB_CONTROL__MTYPE__SHIFT;
 
 
-	m->cp_hqd_eop_control |=
-		ffs(q->eop_ring_buffer_size / sizeof(unsigned int)) - 1 - 1;
+	/*
+	 * HW does not clamp this field correctly. Maximum EOP queue size
+	 * is constrained by per-SE EOP done signal count, which is 8-bit.
+	 * Limit is 0xFF EOP entries (= 0x7F8 dwords). CP will not submit
+	 * more than (EOP entry count - 1) so a queue size of 0x800 dwords
+	 * is safe, giving a maximum field value of 0xA.
+	 */
+	m->cp_hqd_eop_control |= min(0xA,
+		ffs(q->eop_ring_buffer_size / sizeof(unsigned int)) - 1 - 1);
 	m->cp_hqd_eop_base_addr_lo =
 	m->cp_hqd_eop_base_addr_lo =
 			lower_32_bits(q->eop_ring_buffer_address >> 8);
 			lower_32_bits(q->eop_ring_buffer_address >> 8);
 	m->cp_hqd_eop_base_addr_hi =
 	m->cp_hqd_eop_base_addr_hi =