Big performance increases (I/O) - Universe update dev dump 4 [30 OCT - 7 NOV]
Posted on November 30th, 2019 08:30 AM EST
This is the fourth universe update dump copied from our official Discord server, from the channel #universe-update-dev-news-dump. To receive universe update news as it happens, join the StarMade Discord server.
Between the 30th of October and the 7th of November, the following was done:
A write/read overhaul, with the goal of eliminating lag caused by sector changes entirely. Sector changes would be hidden internally, and we'll switch to a more straightforward region system. The work is split up into three main parts:
- Decoupling of data accumulation and the actual writing. Essentially, we no longer need to synchronise the writing thread during writing, so I/O operations no longer affect game performance.
- System to mark objects in a sector for writing, as well as a spawn/cleanup system for new sectors and for sectors no longer loaded.
- Sector change performance tweaks to make switching sectors a seamless experience.
As part of the decoupling, we've switched block data to off-heap memory (unsafe native memory). This is a lot faster in pretty much all operations. Memory for block chunks is preallocated in one big chunk of memory, which keeps access as fast as possible by reducing cache misses.
It is then segmented for maximum optimisation for memory I/O. For writing, a second chunk is allocated as a buffer and memory can be copied over and queued up to be written. That copy operation is extremely fast, and that subsequent disk I/O operation would be done completely independently in another thread without the need to sync. The only thing we need to make sure of is that the object does not get unloaded while it's writing. This wouldn't cause a bad write (the data is already copied), but a load would read old data. The current system already has the same conditions, so nothing actually changes there.
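The copy-then-write idea above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the game's actual code: `SnapshotWriter` and `queueWrite` are hypothetical names, and a direct `ByteBuffer` stands in for the raw native memory the post describes. It also allocates a fresh staging buffer per write for brevity, where a real implementation would more likely reuse one.

```java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: class and method names are illustrative only.
final class SnapshotWriter implements AutoCloseable {
    // One background thread performs all disk I/O.
    private final ExecutorService ioThread = Executors.newSingleThreadExecutor();

    /** Copies the live data into an off-heap staging buffer, then queues the disk write. */
    Future<Integer> queueWrite(ByteBuffer live, Path target) {
        // duplicate() shares the memory but has its own position/limit,
        // so the live buffer's cursor is left untouched.
        ByteBuffer source = live.duplicate();
        source.clear();
        ByteBuffer staging = ByteBuffer.allocateDirect(source.capacity());
        staging.put(source); // the fast memory-to-memory copy
        staging.flip();
        // From here on the caller may keep mutating 'live' without any lock:
        // the I/O thread only ever touches 'staging'.
        return ioThread.submit(() -> {
            try (FileChannel channel = FileChannel.open(target,
                    StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE,
                    StandardOpenOption.TRUNCATE_EXISTING)) {
                int written = 0;
                while (staging.hasRemaining()) {
                    written += channel.write(staging);
                }
                return written;
            }
        });
    }

    @Override
    public void close() {
        ioThread.shutdown(); // already queued writes still complete
    }
}
```

The caller only pays for the in-memory copy. What the sketch does not cover is the condition mentioned above: the object must stay loaded until the returned `Future` completes, otherwise a reload could read the old file.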
The drawback is that if something does go wrong, it goes spectacularly wrong. So far, there do not seem to be any issues since switching to unsafe native memory. Excitingly, the same procedure can be used to speed up other aspects of the game, such as lighting calculations. And since the new planet generation is written in C++, we can potentially eliminate the overhead of passing arrays back and forth, since we can just tell the C++ library the memory address to store a chunk in.
What's coming up:
- New universe creation details, introducing an ultimate goal for the game: conquer the galaxy! This would be a separate gamemode from our standard sandbox experience, compacting resources and gameplay into a central area.
- Population system plans. A new "resource" to fuel and grow an empire, represented by physical NPCs!
- New planet discussion and screenshots of some of the planet work we've currently got in development! 🪐
Likely starting on the write/read overhaul now. The goal is to eliminate lag etc. from sector changes, making sector changes in general something that can be hidden internally, and instead use a simpler region system for the game (as was planned).
This update would incorporate different things
Step one would be the decoupling of data accumulation and the actual writing. Doing that will remove any need to synchronize the writing thread during the actual writing, so I/O operations don't affect the game at all.
The second one is the system that marks objects of a sector for writing as well as the spawn/cleanup system for new sectors and for sectors no longer loaded
The third one would be the actual sector change, making that as smooth as possible for the player.
This is likely one of the biggest parts of the universe update, because once that is in, I'll be able to restructure the universe.
After that I'll likely work on the new planets. I'm aiming to have both done so I can give a small snapshot this year. This snapshot version would be completely dysfunctional of course, but hopefully people could test out some of the new stuff under the hood.
As part of the decoupling, I'm switching block data to off-heap memory. This is a ton faster in pretty much all operations. It's pretty unsafe in case of a mistake, but it is worth it. So the plan is that memory for block chunks is preallocated in one big chunk of memory, which is then segmented for maximum optimization of memory I/O. For writing, a second chunk is allocated as a buffer, and memory can be copied over and queued up to be written. That copy operation is extremely fast, and the subsequent disk I/O operation would be done completely independently in another thread without the need to sync. The only thing is to make sure the object doesn't get unloaded while it's writing: not because that would cause a bad write, since the data is already copied, but because a load would read old data. However, the current system already has the same conditions anyway, so nothing really changes there.
1st of November
Alright, chunk data is now running on unsafe native memory. So far there seem to be no issues. I added a manual range check just for debugging, which can be deactivated later for another little performance boost. It's now also possible to speed up some other aspects using the same tech (e.g. the lighting calculations).
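A switchable debug range check like the one mentioned here might look roughly like this. It is only a sketch under assumptions: the class `CheckedChunkMemory` and the system property `chunk.debugRangeCheck` are made up for illustration, and a direct `ByteBuffer` stands in for raw native memory. Note that the `ByteBuffer` still checks bounds against the whole allocation, whereas truly raw access would not check anything; the manual check is what catches an index that strays into a neighbouring chunk's segment.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: the class name and the property name are invented.
final class CheckedChunkMemory {
    // Read once at class load. As a static final constant, the JIT can
    // typically remove the check entirely when it is switched off.
    private static final boolean RANGE_CHECK =
            Boolean.parseBoolean(System.getProperty("chunk.debugRangeCheck", "true"));

    private final ByteBuffer memory; // the shared off-heap allocation
    private final int base;          // first byte owned by this chunk
    private final int length;        // number of bytes owned by this chunk

    CheckedChunkMemory(ByteBuffer memory, int base, int length) {
        this.memory = memory;
        this.base = base;
        this.length = length;
    }

    byte get(int index) {
        if (RANGE_CHECK) check(index);
        return memory.get(base + index);
    }

    void put(int index, byte value) {
        if (RANGE_CHECK) check(index);
        memory.put(base + index, value);
    }

    private void check(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException(
                    "index " + index + " outside chunk of " + length + " bytes");
        }
    }
}
```

Running with `-Dchunk.debugRangeCheck=false` would turn the check off in this sketch.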
4th of November
Still in the middle of memory stuff. But this is the kind of stuff I love doing most in programming.
7th of November
Ok. Got a nice manager going for the chunks stored in native memory. Memory will be reused as chunks get unloaded. There is also protection against leaks by making sure that every chunk unregisters itself from that page.
This is also one huge chunk of memory, which means that access speed is as fast as it can be, by reducing cache misses.
You wouldn't get the same result with a heap array, since it is not even guaranteed to be one contiguous chunk of memory. There is a flag for Java to use big memory pages, which helps a little. This flag of course is only relevant for the heap. However, the chunks are now outside of the heap in spooky scary manually managed memory.
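For illustration, a manager of this kind could be sketched as below. This is an assumption-laden sketch rather than the actual implementation: `ChunkPool` and its methods are hypothetical names, the slab is a direct `ByteBuffer` instead of raw native memory, and leak protection is reduced to tracking which slots are still registered.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.BitSet;

// Hypothetical sketch of a slot manager over one big off-heap allocation.
final class ChunkPool {
    private final ByteBuffer slab;   // one contiguous off-heap block
    private final int chunkBytes;    // size of each segment
    private final ArrayDeque<Integer> freeSlots = new ArrayDeque<>();
    private final BitSet inUse;      // slots currently registered to a chunk

    ChunkPool(int chunkCount, int chunkBytes) {
        this.chunkBytes = chunkBytes;
        this.slab = ByteBuffer.allocateDirect(chunkCount * chunkBytes);
        this.inUse = new BitSet(chunkCount);
        // Push in reverse so that slot 0 is handed out first.
        for (int i = chunkCount - 1; i >= 0; i--) {
            freeSlots.push(i);
        }
    }

    /** Registers a chunk and returns its slot, or -1 if the pool is exhausted. */
    int acquire() {
        Integer slot = freeSlots.poll();
        if (slot == null) return -1;
        inUse.set(slot);
        return slot;
    }

    /** Called when a chunk unloads; its memory becomes reusable. */
    void release(int slot) {
        if (!inUse.get(slot)) {
            throw new IllegalStateException("slot " + slot + " is not registered");
        }
        inUse.clear(slot);
        freeSlots.push(slot);
    }

    /** Returns a view restricted to exactly one slot of the slab. */
    ByteBuffer view(int slot) {
        ByteBuffer window = slab.duplicate();
        window.limit((slot + 1) * chunkBytes);
        window.position(slot * chunkBytes);
        return window.slice();
    }

    /** Slots still registered; a non-zero value after a full unload hints at a leak. */
    int registeredSlots() {
        return inUse.cardinality();
    }
}
```

In this sketch a released slot goes back on top of the free list, so the most recently freed memory is reused first.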
Using this kind of memory completely bypasses all Java heap functionality, including garbage collection. The advantage is of course the fastest possible access speed; the disadvantage is that IF something goes wrong, it goes wrong spectacularly. With raw data such as blocks, avoiding a complete meltdown is relatively simple, as long as you make sure you only address the memory you allocated.
You could also store whole objects etc. on there, in which case any misstep would lead to catastrophe, from random fields changing to complete object corruption.
Another nice side effect is that, since the new planet generation is written in C++, we can potentially eliminate the overhead of passing arrays back and forth, since we can just tell the C++ library the memory address to store the chunk in.