Rsyslog has both "main" message queues and action queues. [Actually, "main message queues" are queues created for a ruleset, "main message" is an old-time term that was preserved even though it is no longer accurate.]
By default, both queues are set to one worker maximum. The reason is that this is sufficient for many systems and it can not lead to message reordering. If multiple workers are concurrently active, messages will obviously be reordered, as the order now, among others, depends on thread scheduling order.
So for now let's assume that you want to utilize a multi-core machine. Then you most probably want to increase the maximum number of main message queue workers. The reason is that main queue workers process all filters for all rules inside the rule set, as well as full action processing for all actions that are not run on an asynchronous (action) queue. In typical setups, this offers ample of opportunity for concurrency. Intense tests on the v5 engine have shown near linear scalability to up to 8 cores, with still good improvements for higher number of cores (but increasing overhead). Later engines do most probably even better, but have not been rigorously performance tested (doing it right is a big effort in itself).
Action queues have a much limited concurrency potential because they do only a subset of the work (no filtering, for example, just generating the strings and executing the actual plugin action). The output module interface traditionally permits only one thread at a time to be present within the actual doAction() call of the plugin. This is to simply writing output plugins, but would be needed in any case as most can not properly handle real concurrent operations (think for example about writing to sequential files or a TCP stream...). For most plugins, the doAction() part is what takes most processing time. HOWEVER, multiple threads on action queues can build string concurrently, which can be a time consuming operation (especially when regexpes are involved). But even then it is hard to envision that more than two action queue worker threads can make much sense.
So the bottom line is that you need to increase the main queue worker threads to get much better performance. If you need to go further, you can fine-tune action queue worker threads, but that's just that: fine-tuning.
Note that putting "fast" actions (like omfile) on an async action queue just to be able to specify action threads is the wrong thing to do in almost all situations: there is some inherent overhead with scheduling the action queue, and that overhead will most probably eat up any additional performance gain you get from the action queue (even more so, I'd expect that usually it will slow things down).
Action queues are meant to be used with slow (database, network) and/or unreliable outputs.
For all other types of actions, even long-running, increasing the main queue worker thread makes much more sense, because this is where most concurrency is possible. So for "fast" action, use direct action queues (aka "no queue") and increase the main thread workers.
Finally a special note on low-concurrency rulesets. Such rulesets have limited inherent concurrency. A typical example is a ruleset that consists of a single action. For obvious reasons, the number of things that can be done concurrently is very limited. If it is a fast action, and there is little effort involved in producing the strings (most importantly no regex), it is very hard to gain extra concurreny, especially as a high overhead is involved with such fine-grained concurrency. In some cases, the output plugin may come to help. For example, omfile can do background writes, which will definitely help in such situations.
You are in a somewhat better shape if the string builder is CPU intense, e.g. because it contains many and complex regexes. Remember that strings can be build in parallel to executing an action (if run on multiple threads). So in this case, it makes sense to increase the max number of threads. It makes even more sense to increase the default batch size. That is because strings for the whole batch are build, and then the action plugin is called. So the larger the batch, the large these two partitions of work are. For a busy system, a batch size of 10,000 messages does not sound unreasonable in such cases. When it comes to which worker threads to increase, again increase the main queue workers, not the action ones. It must be iterated that this gives the rsyslog core more concurrency to work with (larger chunks of CPU bound activity) and avoids the extra overhead (though relatively small) of an async action queue.
I hope this clarifies worker thread settings a bit more.