Important: In practice, this means that the last convolutional layer before your Yolo2OutputLayer must have an output depth of B(5+C), where B is the number of bounding boxes (priors) per grid cell and C is the number of object classes. Consequently, if you change the number of bounding boxes or the number of object classes, you must also change the number of output channels (nOut) of that last convolutional layer.
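As a quick check of the B(5+C) rule, the sketch below computes the nOut that the final convolutional layer would need. It is plain Java with no DL4J dependency; the class and method names are hypothetical. The "5" follows the YOLOv2 design, where each box contributes four box values plus one confidence score, and the C class scores are added on top.

```java
/** Hypothetical helper: output depth required by the layer feeding Yolo2OutputLayer. */
public class YoloDepth {
    // numBoxes = B (bounding boxes / priors per grid cell), numClasses = C.
    // Each box needs 5 values (4 box values + 1 confidence) plus C class scores.
    public static int requiredNOut(int numBoxes, int numClasses) {
        return numBoxes * (5 + numClasses);
    }

    public static void main(String[] args) {
        System.out.println(requiredNOut(5, 20)); // 125: 5 boxes, 20 classes
        System.out.println(requiredNOut(5, 80)); // 425: 5 boxes, 80 classes
    }
}
```

Adding one class with 5 boxes therefore adds 5 channels, not 1, which is why nOut must be recomputed whenever either B or C changes.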
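Label format: [minibatch, 4+C, H, W], where C is the number of classes and H, W are the grid height and width
Order along the label depth (channel) dimension: [x1, y1, x2, y2, (class labels)]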
x1 = x coordinate of the box's top-left corner
y1 = y coordinate of the box's top-left corner
x2 = x coordinate of the box's bottom-right corner
y2 = y coordinate of the box's bottom-right corner
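To make the channel layout concrete, here is a minimal plain-Java sketch that writes one object into a [minibatch, 4+C, H, W] label array: channels 0 to 3 hold x1, y1, x2, y2 and channels 4 to 4+C-1 hold the one-hot class. The names are hypothetical, and the choice of grid cell is an assumption: the sketch assigns the object to the cell containing the box center, which should be checked against whatever label pipeline you actually use. It also does not handle two objects landing in the same cell.

```java
/** Hypothetical sketch of filling a [minibatch, 4+C, H, W] YOLO label array. */
public class YoloLabels {
    /**
     * Writes one box (coordinates in grid units) into labels[example].
     * ASSUMPTION: the object is stored in the grid cell containing its center.
     */
    public static void putBox(float[][][][] labels, int example,
                              double x1, double y1, double x2, double y2, int classIdx) {
        int h = labels[example][0].length;    // grid height (H)
        int w = labels[example][0][0].length; // grid width (W)
        // Grid cell containing the box center, clamped to [0, W-1] and [0, H-1].
        int cx = Math.max(0, Math.min(w - 1, (int) Math.floor((x1 + x2) / 2.0)));
        int cy = Math.max(0, Math.min(h - 1, (int) Math.floor((y1 + y2) / 2.0)));
        labels[example][0][cy][cx] = (float) x1; // channel 0: x1
        labels[example][1][cy][cx] = (float) y1; // channel 1: y1
        labels[example][2][cy][cx] = (float) x2; // channel 2: x2
        labels[example][3][cy][cx] = (float) y2; // channel 3: y2
        labels[example][4 + classIdx][cy][cx] = 1.0f; // one-hot class label
    }

    public static void main(String[] args) {
        int minibatch = 1, numClasses = 3, gridH = 13, gridW = 13;
        float[][][][] labels = new float[minibatch][4 + numClasses][gridH][gridW];
        // Box from (2.0, 3.0) to (4.0, 5.0) in grid units, class 1; center is (3, 4).
        putBox(labels, 0, 2.0, 3.0, 4.0, 5.0, 1);
        System.out.println(labels[0][5][4][3]); // 1.0: class 1 -> channel 5, row 4, column 3
    }
}
```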
Note: box coordinates in the labels are expressed in grid-cell units, not pixels. For a 13x13 grid, (0,0) is the top-left corner of the image and (13,13) is the bottom-right corner.
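Since annotations usually come in pixels, they must be rescaled to grid units first. The sketch below assumes a 416x416 input image and a 13x13 grid (so 32 pixels per cell); these sizes and the method name are illustrative only. Multiplying before dividing keeps these example values exact in floating point.

```java
/** Hypothetical helper: converts pixel coordinates to grid units. */
public class GridUnits {
    // Maps 0 -> 0 and imageSize -> gridSize, e.g. pixel 416 -> 13.0 for a 13-cell grid.
    public static double toGrid(double pixel, int imageSize, int gridSize) {
        return pixel * gridSize / imageSize;
    }

    public static void main(String[] args) {
        // Assumed example: 416x416 image, 13x13 grid (32 pixels per cell).
        System.out.println(toGrid(0, 416, 13));   // 0.0: top-left corner
        System.out.println(toGrid(64, 416, 13));  // 2.0: two cells in
        System.out.println(toGrid(200, 416, 13)); // 6.25
        System.out.println(toGrid(416, 416, 13)); // 13.0: bottom-right corner
    }
}
```

Apply the same function to each of x1, y1, x2, y2, using the image width and grid width for x and the image height and grid height for y.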
Note also that mask arrays are not required: this implementation infers whether each grid cell contains an object from the class labels, which should be one-hot if an object is present and all zeros otherwise.
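The inference described above can be expressed as: a cell contains an object exactly when some class channel at that cell is non-zero. The following plain-Java sketch illustrates that rule on a single example's [4+C, H, W] label slice. It is one way to express the idea and is not the library's internal code; the class and method names are hypothetical.

```java
/** Hypothetical sketch: infer per-cell object presence from one-hot class labels. */
public class ObjectPresence {
    /**
     * labels is one example's [4+C][H][W] slice; channels 4..4+C-1 are the classes.
     * Returns an [H][W] mask that is true where any class channel is non-zero.
     */
    public static boolean[][] objectMask(float[][][] labels) {
        int h = labels[0].length, w = labels[0][0].length;
        boolean[][] mask = new boolean[h][w];
        for (int c = 4; c < labels.length; c++) {   // skip the 4 box channels
            for (int y = 0; y < h; y++) {
                for (int x = 0; x < w; x++) {
                    if (labels[c][y][x] != 0f) mask[y][x] = true;
                }
            }
        }
        return mask;
    }

    public static void main(String[] args) {
        float[][][] labels = new float[4 + 2][13][13]; // C = 2 classes, 13x13 grid
        labels[4 + 1][4][3] = 1f;                       // class 1 present at row 4, column 3
        boolean[][] mask = objectMask(labels);
        System.out.println(mask[4][3]); // true
        System.out.println(mask[0][0]); // false: all class channels are 0
    }
}
```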