The vast majority of the running time of the Shor's algorithm simulation is spent in the discrete Fourier transform step. In the discrete Fourier transform we iterate from 0 to q, and for each value in that range we iterate over the entire register and perform some arithmetic on each entry. This work is trivial to divide among multiple processing elements: each processing element iterates from 0 to q, and for each value in that range iterates over only its own prescribed subrange of the register.
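The decomposition described above can be sketched as follows. This is a minimal illustration, not the simulator's actual code. The function names, the 1/sqrt(q) normalization, and the sign of the exponent are assumptions, and the "processing elements" are run one after another in a plain loop. Each call to partial_dft touches only its own subrange of the register, so the calls are independent and could run on separate processing elements, with the partial sums added together at the end.

```python
import cmath
import math


def split_ranges(q, workers):
    """Split range(q) into `workers` contiguous, near-equal subranges."""
    base, extra = divmod(q, workers)
    ranges, start = [], 0
    for w in range(workers):
        stop = start + base + (1 if w < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def partial_dft(amplitudes, start, stop):
    """Contribution of register entries [start, stop) to every output value.

    Assumed convention: out[c] = (1/sqrt(q)) * sum_a amp[a] * exp(2*pi*i*a*c/q).
    """
    q = len(amplitudes)
    out = [0j] * q
    for c in range(q):                      # every worker iterates over all q outputs
        total = 0j
        for a in range(start, stop):        # but only over its own slice of the register
            total += amplitudes[a] * cmath.exp(2j * math.pi * a * c / q)
        out[c] = total / math.sqrt(q)
    return out


def dft_by_subranges(amplitudes, workers):
    """Sum the partial transforms; each partial_dft call is independent."""
    q = len(amplitudes)
    result = [0j] * q
    for start, stop in split_ranges(q, workers):
        part = partial_dft(amplitudes, start, stop)
        for c in range(q):
            result[c] += part[c]
    return result
```

Because the partial sums are simply added, the result does not depend on how many subranges the register is split into, up to floating-point rounding.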
In general, the simulation of Shor's algorithm seems a good candidate for parallelization. The simulation can roughly be divided into three phases: preprocessing, simulation of the quantum register, and postprocessing. During the simulation of the quantum register, all the work takes the form of applying the same operation to an entire array, where each array location represents one of the base states of the quantum register. This agrees with our conception of how a quantum register would function: in a quantum computer we are not free to perform an operation on only certain portions of the superposed state of the register; we must perform the operation on all portions.
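The array view of the register can be illustrated with a short sketch. It assumes the register is stored as a list of complex amplitudes, one per base state. The helper names are hypothetical, and the per-state callback shown here covers only operations that act on each amplitude independently.

```python
import math


def uniform_superposition(n_states):
    """Register with equal amplitude on every base state."""
    amp = 1 / math.sqrt(n_states)
    return [complex(amp, 0)] * n_states


def apply_to_all(register, op):
    """Apply the same operation to every array location (base state)."""
    return [op(index, amp) for index, amp in enumerate(register)]


def normalize(register):
    """Rescale so the squared magnitudes of the amplitudes sum to 1."""
    norm = math.sqrt(sum(abs(a) ** 2 for a in register))
    return [a / norm for a in register]
```

For example, apply_to_all(reg, lambda i, a: -a if i % 2 else a) flips the sign of every odd-indexed base state in a single pass over the array. Each location is updated independently, which is what makes this phase easy to split across processing elements.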