I tried splitting input fetching and rendering into two threads. But this seems not as simple as I thought. There are still some questions:
How could I share some variables?
How could I stop a thread from another?
I have tried to make a small example program, but there I get a bunch of cryptic error messages. So I would be glad if someone would show me what's wrong and how to fix it.
Also, I feel that my example is badly designed, so I would appreciate suggestions on how it could be better made.
First, the game loop is not really parallelizable. It consists of three steps:
1. Input/event handling.
2. Game state update.
3. Render.
Although the steps may be individually parallelized to some extent, the process as a whole cannot. Step 2 can't begin before step 1 is fully completed, and step 3 can't begin before step 2 is fully completed.
Second, most environments won't let you query input events from any thread other than the main thread.
Third, most graphics libraries won't let you call graphics functions from any thread other than the main thread.
Because in some genres of games a non-negligible amount of running time goes into computation besides rendering. E.g. the A.I of a realtime strategy game. Here the render function shouldn't wait until the A.I calculation has finished. You're right, in this example program there's no need to run tasks in parallel. It is merely meant as a framework or design study. Also, it's a matter of taste for me. Parallelizable parts should be runnable as such, so we don't need to worry about parts that might run longer than the frame time.
> E.g. the A.I of a realtime strategy game. Here the render function shouldn't wait until the A.I calculation has finished.
That's easy enough to do. Just put those specific functions that are allowed to run longer than one frame in their own threads, and when they complete they can notify their result to the game loop by queuing an event.
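A minimal sketch of that idea, using only the standard library. The names `GameEvent`, `EventQueue`, `plan_ai_move` and `run_game_loop` are made up for illustration, and the "A.I." is just a stand-in loop; a real game would plug in its own event type and its own loop.

```cpp
#include <chrono>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical event type; a real game would carry richer data.
struct GameEvent {
    std::string name;
    int payload;
};

// Minimal thread-safe queue: worker threads push, the game loop pops.
class EventQueue {
public:
    void push(GameEvent e) {
        std::lock_guard<std::mutex> lock(mutex_);
        events_.push(std::move(e));
    }
    // Non-blocking: returns std::nullopt when nothing is pending.
    std::optional<GameEvent> try_pop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (events_.empty()) return std::nullopt;
        GameEvent e = std::move(events_.front());
        events_.pop();
        return e;
    }
private:
    std::mutex mutex_;
    std::queue<GameEvent> events_;
};

// Stand-in for an expensive A.I. computation that may span several frames
// (assumes a small `difficulty` so the arithmetic stays within int range).
int plan_ai_move(int difficulty) {
    int best = 0;
    for (int i = 0; i < 1000000; ++i) best = (best + i * difficulty) % 1009;
    return best;
}

// Keeps running frames until the A.I. result arrives as an event.
// Returns the number of frames processed; the result is stored in `result`.
int run_game_loop(int difficulty, int& result) {
    EventQueue queue;
    std::thread ai([&queue, difficulty] {
        queue.push({"ai_done", plan_ai_move(difficulty)});
    });

    int frames = 0;
    bool done = false;
    while (!done) {
        // Drain pending events without ever blocking the frame.
        while (auto e = queue.try_pop()) {
            if (e->name == "ai_done") {
                result = e->payload;
                done = true;
            }
        }
        ++frames;  // update() and render() would go here
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    ai.join();
    return frames;
}
```

The game loop never waits on the worker: it only checks the queue once per frame, so a slow computation costs extra frames of delay for that one result, not a stalled frame.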
It's a very different problem from the one you asked about in the OP. Input handling is normally very fast, and if rendering is so slow that it takes longer than one frame, well you may as well start dropping frames. What other alternative is there?
A great deal has to do with the game engine you're using.
As others pointed out, most engines don't support threading, but there are some that do.
Some designs allow for physics to run on multiple threads, or on the GPU (or both). It is similar to what @helios pointed out relative to A.I., yet that also brings up the question of where the A.I. comes from.
A.I. may be from the game engine, so its support becomes the limiting (or enabling) factor.
A.I. from other sources may work like physics engines, in that they attach controllers to renderable objects in some way.
The GPU pipeline is already threaded in a way, because the GPU is a highly parallel engine itself. The rendering phase, however, is rather specifically associated with the visual output and often tied to frame rates of the display system (to avoid image tearing/flickering), which means the game engine usually presents the rendering phase (after CPU prep) as a "black box" of sorts that the application merely waits on (for it to finish a frame).
There's no reason one can't process some types of information while waiting for a frame flip. This is frequently done (physics is a classic in this regard, as it all but runs independently of the game's frame rate from a certain viewpoint - this is to say 'real time' continues no matter what the display is doing).
All of this requires coordination. One of the basic types of coordination is a thread join. If you're not familiar with it, you're still at the beginning of the study of threads.
Yet, thread joins may not be a good choice for an animation cycle. When threads join it means they've exited, and generally an animation completes one step, then immediately loops onto the next step (usually the next frame).
So, you're likely to need other tools, like mutexes, autoreset events, condition variables - all tools used to synchronize threads or protect single resources from corruption by multiple threads competing for control.
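As a rough illustration of the difference from a join, here is a sketch where one long-lived worker thread performs one step per frame and is coordinated with a mutex and a condition variable. The function name `run_frames` and the counters are invented for this example; the "step" is just a counter increment.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Runs `frames` animation steps on one long-lived worker thread and returns
// how many steps the worker completed. The worker is joined only once, at
// shutdown, not once per frame.
int run_frames(int frames) {
    std::mutex m;
    std::condition_variable cv;
    int requested = 0;   // steps the main thread has asked for
    int completed = 0;   // steps the worker has finished
    bool quit = false;

    std::thread worker([&] {
        std::unique_lock<std::mutex> lock(m);
        for (;;) {
            // Sleep until there is a step to do or we are told to quit.
            cv.wait(lock, [&] { return quit || requested > completed; });
            if (requested > completed) {
                // The per-frame work would go here. A real worker would
                // unlock while working; this sketch keeps it simple.
                ++completed;
                cv.notify_all();
            } else {
                break;  // quit was requested and no work is pending
            }
        }
    });

    for (int i = 0; i < frames; ++i) {
        std::unique_lock<std::mutex> lock(m);
        ++requested;                 // ask for one more step
        cv.notify_all();
        // Wait for that step to finish before starting the next frame.
        cv.wait(lock, [&] { return completed == requested; });
    }

    {
        std::lock_guard<std::mutex> lock(m);
        quit = true;
    }
    cv.notify_all();
    worker.join();
    return completed;
}
```

Calling `run_frames(5)` should report 5 completed steps, all done by the same thread.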
You were correct to attempt an example for study. I would suggest, however, that you dial back a notch and try threaded designs on simpler notions before attempting to figure out how to handle the complex needs of a game engine (which, to this point, could be any of them - we haven't heard you say).
A classic starter example is a serial number issued to multiple threads. This one is basic, fairly simple and clear. A serial number is simply a count, usually starting at an arbitrary index (could be 0, could be 1000). One imagines that the serial number is the ID of resources to be allocated, and the rule is no duplicates can be permitted. There are multiple ways of doing this.
First, make one that gets it wrong. I find this particularly informative. Since this is a simple study experiment, perhaps you'll permit global variables.
Form a global integer, initialize it to some starting value, then launch as many threads as you have cores (or cores * hyperthreading). Have each thread create a std::vector to hold serial numbers. Then, have them loop some number of times (maybe 10,000) and use the global integer to obtain serial numbers. Each thread would read the serial number, increment it, and append the number it read to its own vector.
This will produce duplicates. You should finish by having the main thread test all of the vectors collected by the threads to find and report the duplicates.
Now, if any of that is beyond you, you're not ready for threading. You should be able to figure out how to create vectors the main thread could later test, for example.
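For reference, one possible shape of the "wrong" version. One deliberate deviation from the description above: with a plain `int`, the unsynchronized access is formally a data race (undefined behavior in C++), so this sketch uses a `std::atomic<int>` but splits the read and the write into two separate operations. That keeps the program well-defined while preserving the logical bug. All names here are invented for the example.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Global "next serial number", starting at an arbitrary value.
std::atomic<int> next_serial{1000};

// Deliberately broken: read and write are two separate steps, so another
// thread can read the same value in between and receive a duplicate.
int take_serial_broken() {
    int value = next_serial.load();   // step 1: read
    next_serial.store(value + 1);     // step 2: write back
    return value;
}

// Launches `thread_count` threads; each collects `per_thread` serials into
// its own vector (no sharing of the vectors themselves).
std::vector<std::vector<int>> collect_serials(unsigned thread_count,
                                              int per_thread) {
    assert(thread_count > 0);
    std::vector<std::vector<int>> results(thread_count);
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < thread_count; ++t) {
        threads.emplace_back([&results, t, per_thread] {
            for (int i = 0; i < per_thread; ++i)
                results[t].push_back(take_serial_broken());
        });
    }
    for (auto& th : threads) th.join();
    return results;
}

// Main-thread check: merge everything, sort, count equal neighbours.
std::size_t count_duplicates(const std::vector<std::vector<int>>& results) {
    std::vector<int> all;
    for (const auto& v : results) all.insert(all.end(), v.begin(), v.end());
    std::sort(all.begin(), all.end());
    std::size_t duplicates = 0;
    for (std::size_t i = 1; i < all.size(); ++i)
        if (all[i] == all[i - 1]) ++duplicates;
    return duplicates;
}
```

Running `count_duplicates(collect_serials(std::thread::hardware_concurrency(), 10000))` will usually report a non-zero count on a multi-core machine, but the exact number varies from run to run, which is part of the lesson.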
Now, to eliminate the duplicates try one of the two strategies below:
1) Use a mutex to protect the global integer which provides the "next" serial number. Each thread must lock on that mutex before reading and incrementing the integer (then release the lock). If you get it right, there will be no duplicates.
2) Instead of a mutex, implement the global integer which provides the "next" serial number using std::atomic< int > (otherwise known as std::atomic_int via a typedef). By nature, the atomic integer is a "lock free" approach to forcing an instruction to be synchronized among threads. The increment operator of the atomic integer is automatically "atomic" - that is, only one thread can do it at a time. If you get it right, there will be no duplicates.
Practicing with these two will teach you the basics of thread synchronization.
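A sketch of both strategies side by side, assuming the same setup as described (a global counter starting at 1000, several threads each collecting serials into their own vector). The helper `distinct_serials` and the other names are made up for this example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

// Strategy 1: a plain integer protected by a mutex.
int serial_plain = 1000;
std::mutex serial_mutex;

int take_serial_mutex() {
    std::lock_guard<std::mutex> lock(serial_mutex);  // released on return
    return serial_plain++;
}

// Strategy 2: an atomic integer; fetch_add reads and increments in one
// indivisible operation and returns the value before the increment.
std::atomic<int> serial_atomic{1000};

int take_serial_atomic() {
    return serial_atomic.fetch_add(1);
}

// Calls `take` per_thread times on each of thread_count threads and returns
// how many distinct serial numbers were handed out in total.
template <typename TakeFn>
std::size_t distinct_serials(TakeFn take, unsigned thread_count,
                             int per_thread) {
    assert(thread_count > 0);
    std::vector<std::vector<int>> results(thread_count);
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < thread_count; ++t) {
        threads.emplace_back([&results, take, t, per_thread] {
            for (int i = 0; i < per_thread; ++i)
                results[t].push_back(take());
        });
    }
    for (auto& th : threads) th.join();

    std::set<int> unique;
    for (const auto& v : results) unique.insert(v.begin(), v.end());
    return unique.size();
}
```

With either function, `distinct_serials(take_serial_mutex, 4, 10000)` should come out as exactly 40000, meaning no serial was issued twice.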
You should consider looking more broadly into C++ concurrency support in the standard library. Pay particular attention to futures and the associated std::async function which 'creates' them. This is considered the "modern" approach to fairly simple threading of tasks, using robust and hardware-"aware" methods (that is, the implementation can take the platform's resources into account when scheduling tasks). You will need to search for tutorials.
Most people think to look for how to launch a thread and start running things in several threads. What they don't usually consider is how many cores the user has (which takes inquiry), and how busy they are (don't launch 30 threads on a 2 core computer if you're not sure why you'd do that). std::async (and futures) alleviates many of the mistakes students make when threading.
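To give an idea of what that looks like, here is a small sketch that sums a vector with `std::async`. It asks `std::thread::hardware_concurrency()` for a hint about the core count and creates that many tasks, not an arbitrary number; the function name `parallel_sum` and the chunking scheme are just choices made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <numeric>
#include <thread>
#include <vector>

// Sums `data` using roughly one task per hardware thread.
long long parallel_sum(const std::vector<int>& data) {
    // hardware_concurrency() may return 0 when the value is unknown.
    unsigned tasks = std::max(1u, std::thread::hardware_concurrency());
    std::size_t chunk = (data.size() + tasks - 1) / tasks;  // round up

    std::vector<std::future<long long>> futures;
    for (std::size_t begin = 0; begin < data.size(); begin += chunk) {
        std::size_t end = std::min(begin + chunk, data.size());
        // std::launch::async forces the task onto its own thread.
        futures.push_back(std::async(std::launch::async, [&data, begin, end] {
            auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
            auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
            return std::accumulate(first, last, 0LL);
        }));
    }

    long long total = 0;
    for (auto& f : futures) total += f.get();  // get() waits for the task
    return total;
}
```

Note there is no explicit thread object, join or mutex here: each future hands back its result (or rethrows the task's exception) when `get()` is called.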
Sometimes what you need, instead, are parallel calculations. If you need to rip through 100,000+ data elements with a simple calculation (something like changing the contrast/brightness on an image, for example), you may do well to run that as 4 threads each running through 25,000.
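A sketch of that brightness example, assuming an 8-bit grayscale image stored as a flat vector. The function name `brighten` and the default of 4 threads are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Adds `delta` to every pixel, clamped to 0..255, splitting the buffer into
// `thread_count` contiguous chunks with one thread per chunk.
void brighten(std::vector<std::uint8_t>& pixels, int delta,
              unsigned thread_count = 4) {
    assert(thread_count > 0);
    std::size_t chunk = (pixels.size() + thread_count - 1) / thread_count;

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < thread_count; ++t) {
        std::size_t begin = std::min(t * chunk, pixels.size());
        std::size_t end = std::min(begin + chunk, pixels.size());
        // Chunks never overlap, so no locking is needed.
        threads.emplace_back([&pixels, begin, end, delta] {
            for (std::size_t i = begin; i < end; ++i) {
                int value = std::clamp(pixels[i] + delta, 0, 255);
                pixels[i] = static_cast<std::uint8_t>(value);
            }
        });
    }
    for (auto& th : threads) th.join();
}
```

With 100,000 pixels and the default of 4 threads, each thread handles 25,000 elements. Whether this beats a single-threaded loop depends on the size of the data and the cost of starting threads, so it is worth measuring.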
OpenMP may be of interest here. OpenMP, in an oversimplified description, can turn for loops into parallel for loops. There are also the parallel algorithms in the standard library (the C++17 execution policies for the functions in <algorithm>), and the Boost library has plenty to offer about parallel computation, processing and more.
> Just put those specific functions that are allowed to run longer than one frame in their own threads, and when they complete they can notify their result to the game loop by queuing an event.
That approach sounds good.
Another reason why I'm trying to parallelize input handling and rendering is that some libraries (frameworks) provide event handling via a blocking 'wait()'. So if rendering and input handling each run in their own thread, we don't need to poll for events every frame.
Here is a very simple example using the SFML library:
#include <SFML/Graphics.hpp>
#include <thread>
#include <chrono>

// Placeholder for the game state update (stub added so the example compiles).
void update() {}

int main()
{
    sf::RenderWindow win( sf::VideoMode(800, 600), "Test" );
    // Placeholder drawable standing in for my_stuff.
    sf::CircleShape my_stuff( 50.f );

    while( win.isOpen() )
    {
        sf::Event event;
        // What, when we here use: while( win.waitEvent( event ) ) ?
        while( win.pollEvent( event ) )
        {
            if( event.type == sf::Event::Closed )
                win.close();
        }
        // this stuff I want to run in parallel so we need not poll for events
        update();
        win.clear(sf::Color::Black);
        win.draw( my_stuff );
        win.display();
        std::this_thread::sleep_for( std::chrono::milliseconds(50) );
    }
}
Here we have a poll request for user input at each frame.
So how would we need to change the code if we want to fetch user input and render in parallel?
Thank you @Niccolo for your hints. I saw your post just after I responded to helios. So I will look for some tutorials (or a good book) covering multithreading. But as far as I have seen so far, good multithreading tutorials are rare on the web. So could you recommend a good book or another resource that covers the basic concepts of the topic?
> // What, when we here use: while( win.waitEvent( event ) ) ?
> [...]
> // this stuff I want to run in parallel so we need not poll for events
I'm pretty sure SFML will require that both pollEvent() as well as the rendering functions be called from the same thread (let's call it A) that created the sf::RenderWindow instance, so that means that the only way to achieve this is to move the update() call into its own thread (B). But that means that you'll need to synchronize thread B with thread A, because update() needs to know the user's input at least once in a while. For example, if the player presses the "forward" button, the game should update the player character's state as soon as possible. Additionally, rendering needs the current game state. Again, if the user has pushed "forward", you want to display feedback as soon as possible.
In short, the game loop would have to look somewhat like this:
while (running()){
    auto event_data = handle_events();
    auto render_data = synchronize_data_across_threads(std::move(event_data));
    render(render_data);
}
(The specifics of how any of this is implemented aren't important.)
Since update() would be running while handle_events() is running, that means it would need to be updating in response to the events from the previous frame.
At this point it should be clear that all you'd be doing would be increasing the input latency by 1 frame (normally ~16 ms), on top of whatever latency the OS has added.
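To make the extra frame of latency concrete, here is a toy simulation with no windowing library involved. `Input`, `State`, `update` and `run_pipelined` are invented for this sketch, and "rendering" just records the position that would be drawn. The update for each frame runs via `std::async` while that frame's input is being gathered, so it can only use the previous frame's input.

```cpp
#include <future>
#include <vector>

struct Input { int forward_presses; };  // what the player did in one frame
struct State { int position; };         // minimal game state

// Pure update step: advance the position by the number of presses.
State update(State s, Input in) {
    s.position += in.forward_presses;
    return s;
}

// inputs[i] is the input arriving during frame i. Returns the position that
// would be rendered in each frame.
std::vector<int> run_pipelined(const std::vector<Input>& inputs) {
    std::vector<int> rendered;
    State state{0};
    Input previous{0};
    for (const Input& current : inputs) {
        // update() starts before this frame's input is known, so it works
        // with the input gathered during the previous frame.
        auto pending = std::async(std::launch::async, update, state, previous);
        Input gathered = current;            // stands in for handle_events()
        state = pending.get();               // synchronization point
        rendered.push_back(state.position);  // stands in for render()
        previous = gathered;
    }
    return rendered;
}
```

If the player presses "forward" once in frame 0 and does nothing afterwards, the rendered positions are 0, 1, 1: the press only becomes visible in frame 1. A sequential loop would already show 1 in frame 0.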
I know it's tempting to think "oh, I can just run these two functions in parallel and save some time", but a lot of the time it's really just not possible. If you need to make a cake, you can't bake it at the same time that you apply the frosting.
Also, always remember to keep a problem-solving-oriented mindset: don't try to solve a problem that doesn't exist yet. Handling events is usually extremely quick, because the data source is a human being. If a frame has to handle ten or so events, that's an unusually high number. Nearly no frames will handle more than one or two events.
Depending on the type of game, nearly all of the frame time will go to update() and render(). If render() is taking too long, there may or may not be something that can be done (Vulkan and DirectX 12 have added parallel rendering). If update() is taking too long, there's usually one or two things that can be done, but it will be highly dependent on the type of game. Even if possible, a lot of the time rather than adding parallelization it makes more sense to spend some optimization effort, or just bite the bullet and make the level simpler so update() can finish on time.
Yeah, I made simple stuff with SFML and got 100,000 fps without a frame limit. So for me it's more of a technical challenge to make input processing run parallel to the game loop.
So I thought I could make a global game state singleton, which would be used by update(). The other thread would update this state object whenever a matching event happens. Neither thread would need to wait for the other.
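A rough sketch of that idea, without SFML so it stays self-contained. The input thread here only simulates events, since (as noted earlier in the thread) the real event functions probably have to stay on the window's thread. `GameState` and `run_demo` are hypothetical names, and the state is a local object, not a singleton, to keep the example short. It also touches both of the original questions: the shared variable is guarded by a mutex, and the thread is stopped from outside with an atomic flag.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

// Shared between the input thread (writer) and the game loop (reader).
struct GameState {
    std::mutex mutex;
    int forward_presses = 0;
};

// Simulates `simulated_events` key presses on an input thread while the
// game loop reads the shared state. Returns the press count the loop saw.
int run_demo(int simulated_events) {
    GameState state;
    std::atomic<bool> stop{false};   // set by the game loop to end the thread

    std::thread input([&] {
        int sent = 0;
        while (!stop.load()) {
            if (sent < simulated_events) {
                // Lock only for the short time needed to change the state.
                std::lock_guard<std::mutex> lock(state.mutex);
                ++state.forward_presses;
                ++sent;
            } else {
                // Nothing left to send; a real thread would block on events.
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
        }
    });

    int seen = 0;
    while (seen < simulated_events) {
        {
            std::lock_guard<std::mutex> lock(state.mutex);
            seen = state.forward_presses;   // take a snapshot for this frame
        }
        // update() and render() would use the snapshot here.
        std::this_thread::yield();
    }

    stop.store(true);   // ask the input thread to finish
    input.join();       // then wait until it actually has
    return seen;
}
```

The threads do wait on each other, but only for the few instructions inside each lock, which is usually negligible next to a frame's worth of work.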