RTS Devlog #1 – Back to native & Editor work

In 2018 I started experimenting with Unitys ‘DOTS’ (data orientated tech stack). I was failing to get decent performance in standard MonoBehaviour Unity for a experimental RTS project. Fast forward to 2021 with a few years experience of DOTs I’ve decided to switch to native C++. Previous posts on the blog were using Unity DOTs. The following post details my experience going back to a simple & not overly complex subset of C++ (ie. I not using all ‘modern’ C++ features if there’s no good reason too).

Where ‘native’ is used in this post it is referring to writing in a native language, in this case C/C++, as opposed to Unity C#.

Why switch?

Some of the reasons for switching are not really Unity specific as such, but more just general 3rd party engine/middleware things. As previous blog posts showed, I found myself rewriting the navigation (wanted a deterministic, higher performing dynamic 2d planar navmesh), animation (to cope with thousands of characters), map editor (needs to work in standalone and be usable by players) and eventually even the maths and physics (determinism and other reasons).

The breaking point came writing a delaunay triangulation implementation and having to hack quite a bit of C# code with unsafe regions & just being limited in general by the DOTS container types. I did it, and got it working (see previous blog post), but it was a total square peg in a round hole. I didn’t really feel comfortable with the state of the code and it’s maintanability into the future. This really made me re-evaluate what I was doing. Why go to all this effort if I can’t run it in 1 year without even more work, let alone 3 or 6 years later. My gut feeling is that code written in C/C++ is going to require less catch-up work in 2, 5 or 20 years into the future compared with a 3rd party middleware specific language, like C# DOTs (DOTs is effectively like using another language compared to normal C#). I like not having to worry about the future for DOTs or Unity anymore, I like being able to implement features myself more, and most of all I like not worring about middleware going the way of Microsofts XNA (dead).

Experience going back to C/C++

Despite all this, my initial plan was to convert only the simulation code to C++. I would create a plugin and then continue to leverage a 3rd party engine for graphics & other generalist things. However, I started enjoying programming in a native project again so much that I decided to convert all the way, graphics and all. I’m not a C/C++ super-fan, but it’s still a very viable language for game dev. I like straight-forward code that compiles fast and runs quickly. This can be achieved with C/C++ by keeping code simple, well organised and not over-using certain features of the language.
My iteration time with Unity from a single line change in the average gameplay implementation file to the app running was 13 seconds. In the C++ project it’s 0.5-2 seconds. This alone makes working on the project so much more enjoyable and dramatically cuts down on distractions. Obviously the iteration time may vary as the project continues but I don’t see it changing that drastically at this point. Having said all that, I can still see the pros of working with a shared general engine in a larger team or if the engine is a better fit to the project, but it’s not the use-case for this project at the moment.

Native app task list

There was suddenly quite a few things to code. Fortunately I’ve done most of these things before at somepoint so it wasn’t long before I had a equivalent app in C++. And eventually I passed the features of the Unity version (not Unity the engine, obviously! But the things I need in this project). Here is a limited list of the things I implemented in approximate chronological order:

  • App input, window setup (using GLFW)
  • Basic primitive debug drawing API (OpenGL for now), Camera
  • Custom memory allocation setup, persistent or frame/scratch allocators
  • Custom container classes (Stretchy array, hashmap, multihashmap) that can take custom allocators & other things to avoid std lib
  • Initial heightmap terrain work
  • Hierarchical transform system
  • Import box2d & glm (I’d like to replace glm later with a leaner, smaller maths lib but for now it’ll do)
  • Fix point maths & convert box2d to fixedpoint math
  • Delaunay navmesh
  • All gameplay systems (pathfinding, boids, spatial partitioning, attacking and lots lots more)
  • Tried entt (ECS) then wrote my own simpler & basic entity manager with pools
  • Import ImGui, Notifications and setup debug UI
  • fbx loading, custom binary model format, instanced mesh rendering, skeletal animation
  • Various shaders for terrain, screenspace, skinned mesh & static mesh
  • Basic depth-buffer shadows to add some depth to objects in the scene
  • Minimalist in-app profiler
  • Custom level file format
  • Editor work, Translation & rotation gizmos, map save/load, Terrain brushes, prefab palette
  • Generic config variable file
  • Hot-reloading for the config var file & shaders
  • GameDatabase/ResourceManager & very basic prefab workflow stuff

I had done variations of most of these things before, so it was quite quick to get it all done. It was nice to have a firmer target this time round. A mistake I’d made in the past was over-engineering by trying to write a generalist engine, when I didn’t really need one. I focussed on actual requirements and semantically compressed code or tidied up as appropriate afterwards. With all the above in the project I am still able to have an iterative build time of ~1 second. Full rebuild or changing something in a precompiled header can take lots longer, but it’s very rare I need to do that.

Procedural obstacle problems

In my previous post: RTS Pathfinding 2, I spoke about generating polygon bounds from chasms in the terrain by comparing height deltas of adjacent heightmap cells. Initially, I planned to expand this to other terrain features such as mountains, but when implementing in practice it was quite hard to solve robustly. The images below show problems that come up when I tried to scan height deltas for hills that had ramps leading up to them. In the end I opted for a simpler solution of simply manually placing collider nodes.

Cliff painting

As mentioned above, procedural object generation of terrain features didn’t seem like a good choice anymore. It’s hard to precisely configure the obstacle shapes. There was a lot of noice generated from just using height deltas to draw polygons around mountains/chasms which ultimately meant awkward navmesh/rigid body obstacles were produced. I wanted to be able to control obstacle shapes very well so that I could focus on setting up gameplay/map design just how I wanted it. I didn’t want to worry if a polygon wrapped around a terrain feature correctly.

Lots of RTS games have ramps and cliffs. Cliffs allow for easily setting up mazes and directing the flow of units, & thus tactics across the map. Blizzard RTS games tend to have well defined cliffs. These are essential in a way for the Terran race in Starcraft2 which depends on ‘turtling’ up with strong base defences. Total Annihilation or Supreme Commander style RTS games tend to have lots more open maps but with quite interesting features. I wanted a bit of both but I thought the way Blizzard RTS games like Warcraft3/Starcraft2 maps were setup might provide for more interesting tactics & gameplay, which is why I added a cliff painting tool.

The cliffs are baked into the terrain with a preset height gradient brush. Each cliff covers 5×5 height map cells with strengths applied between a lower and upper cliff height. Ramps can also be painted to allow paths between cliff levels. Ramps are done in the same way but with a smoother elevation gradient. Rigid bodies are automatically generated at cliff boundaries. These will act as physical barriers to units at or below the cliff level, and will also be inserted into the navmesh as obstacles.

A more complex cliff base ramp with multiple entry ramps & visualised in the navmesh.

The cliff obstacles are 100% predictable allowing for the designer to focus on crafting the map. However, I didn’t want the game world to look too uniform and grid-like. I still wanted more varied terrain features, if only for background visuals, so I added a manual obstacle painting tool to the editor.

Terrain brushes

I added a terrain brush to the editor that allowed basic height altering operations such as stamp, average & invert and then sourced a few textures for it. This allowed me to very quickly paint down complex and interesting terrain around the map with a few clicks.

I painted down some mountainous terrain near the map boundaries and next to base cliffs. This made the cliffs appear to be merged into more natural terrain formations. I was quite happy with the end result which allows endless room for visual variation in the terrain with simple, predictable & 100% controllable obstacles for gameplay.

The map after painting mountains and chasms into the terrain.

Gameplay ready map

The focus now was to get a map ready that could be used for unit gameplay experimentation. This really just required obstacles for cliffs, mountains & chasms (navmesh only obstacle with no physical barrier). But it was nice to add some basic graphics in the form of terrain brushes to see these things better. I went ahead and threw together a very basic splatmap terrain shader too, so that I could visualise things a bit better.

I wanted one more map feature before starting with gameplay prototyping, which was prefab obstacles, such as rocks. In a Jonathan Blow stream I had previously seen a very neat looking prefab palette tool. I added a similar tool to my editor which lets me visually select objects to place on the map. I added a few rock meshes to the prefab database and scattered a few around a part of the map.

The map now feels ready to use as a base for some gameplay experimentation. The embedded video for this post shows more of the general C++ tech progress and map editing.

RTS Pathfinding 1 – Flowfields

Since Unity’s data-orientated tech stack (DOTS) appeared there have been quite a few tech demos testing it out. Writing flowfield implementations seems to have recently become popular in the niche crowd of early adopter Unity game tech enthusiasts. Flowfields were first used in games such as Supreme Command 2 and then later Planetary Annihilation. They make sense when you have very large numbers of agents path-finding. The initial cost for a single agent is going to be higher than using something like a NavMesh but as you get into the hundreds or thousands of agents they may scale better.

For an RTS pathfinding prototype I initially used Unity’s built in NavMesh with the hybrid-DOTs navmesh query API. It was very fast for path queries but struggled performance wise when updating the NavMesh with dynamic obstacles. When weighing up the pros/cons of various methods I thought flowfields could be well suited since my problem-case involves large numbers of units. The grid data-structure inherent with flowfields can also be useful for storing other map information such as additional movement costs, heights, island IDs (an island is a single isolated area of the the grid that can’t path to another island). The grid data structure is also quite quick to look-up information in. Finally, for an RTS style project, pathfinding is one of the most critical bits of tech and having full control & understanding over it with no blackboxes was desirable (which wasn’t the case with using Unitys out-of-the-box implementation) and I had some ambitious highly specific goals that I wasn’t sure would be accommodated by Unitys built-in solution.

Making a basic dijkstra flowfield is one thing but a robust RTS game can be a lot more involved, and may require a fair few additional items such as:

  • Flowfield tiles
  • Hierarchical A* or some other high level pathing between tile portals
  • Caching of paths & flowfield tiles
  • Eikonal solver for smoother cost field generation
  • Greater than 8 directions for each cell for smoother movement
  • Line of sight (LOS) flags for smoother movement
  • Parallelised generation of paths and flowfields

There’s lots of media on the internet of people doing a basic implementation but it’s hard to find many that are using it in more complex situations such as an RTS game. The basics of flowfields are already well explained in numerous blogs and youtube videos such as this great one at HowToRTS https://howtorts.github.io/2014/01/04/basic-flow-fields.html. Although I might touch on some basics below, in general the rest of this post will assume a knowledge of basic flowfields. The following is more a collection of notes & workings I collected recently whilst trying to implement flowfields in a prototype RTS, rather than a robust tutorial.

Dijkstra vs Eikonal cost spread

The Dijkstra flood filling algorithm is used to expand the cells of the flowfield. Given a starting cell, the path goal, it will spread to all valid neighbour cells.

Flowfield spread visualised with different iteration counts. Cost increases as colour changes outwards from centre.

Whilst running the Dijkstra algorithm you can calculate cost in a basic way by simply incrementally increasing as it spreads (like image 2 below), however, this can lead to diamond like patterns forming in the cost field. This may in turn lead to slightly geometric looking path movement.

Image 2. Dijkstra cost field with a very low cost colour wavelength, basically the colour changes more frequently as the cost changes. This can help to see cost-spread patterns better.

Smoother cost spread can be generated by using using an Eikonal solver as shown in image 3. This can lead to more optimal paths, especially if there are up to 16 possible directions per a cell available.

Image 3. Eikonal equation generating a smoother more spherical cost field.

A sample Eikonal equation implementation is available here: https://github.com/SCIInstitute/StructuredEikonal/blob/master/src/cuda_fim_kernel.cu

Flowfield Tiles

To create flowfields across the entire map area would be very costly so an initial optimisation is to only generate the flowfield across a high-level (or hierarchical) path. To accomplish this the map is divided up into tiles. Flowfields will be generated for the tiles that are on the hierarchical path. After some basic optimisation a single 50*50 flowfield area generates in ~0.3ms. The map itself is ten times this size. The map is represented by a 500*500 grid, which is divided into 100 50*50 tiles.

The underlying map grid. Green cells are passable, red cells are obstacles.

When a path is required, hierarchical A* is run through tile portals (explained more below). The result of this is a list of tiles that the path runs through. Flowfields are requested for these tiles and the units read the direction value on their cell to get to the destination.

Flowfield tiles along a path

Each of these map tiles has portals along the edges that act as hierarchical A* path nodes. These portal nodes are generated at initialisation or even offline entirely and loaded from disk. A better description of the portals is explained by Eiljah who did the SupCom2 pathfinding here: http://www.gameaipro.com/GameAIPro/GameAIPro_Chapter23_Crowd_Pathfinding_and_Steering_Using_Flow_Field_Tiles.pdf

A map tile with portal cells in pink and paths in green between the portals. The length of these paths is the cost of the edge in the hierarchical A* graph.
Green lines visualising the hierarchical graph edges

Other tips

Debug visualisation is very important obviously with this sort of work. It’s really worth putting some time in here to have a robust visualisation system to keep you sane & discover bugs quickly.

Using a branchless math.select in GetNeighbours to replace if branches gave a crazy 10* speed-up compared to the if else branching I was doing previously. Although this will vary on the compiler it’s probably worth giving some serious consideration to, profiling before & after.

eg. here is a snippet of the branching version of some code to find neighbour indices on a grid:

if (x > 0)
{
	int neighbourIndex = ((x - 1) + (gridWidth * z));
	neighbours [neighboursCount ++] = neighbourIndex;
	
	if (z > 0)
	{
		neighbourIndex = ((x - 1) + (z - 1) * gridWidth);
		neighbours [neighboursCount ++] = neighbourIndex;
	}

	if (z < (gridHeight - 1))
	{
		neighbourIndex = ((x - 1) + (z + 1) * gridWidth);
		neighbours [neighboursCount ++] = neighbourIndex;
	}
}

And the branchless version:

bool leftValid  = (x > 0);
bool rightValid = (x < (gridWidth - 1));
bool upValid    = (z < (gridHeight - 1));
bool downValid  = (z > 0);

neighbours [neighboursCount] = math.select (neighbours [neighboursCount], (x - 1) + (gridWidth * z), leftValid );
neighboursCount = math.select (neighboursCount, neighboursCount + 1, leftValid );

neighbours [neighboursCount] = math.select (neighbours [neighboursCount], (x - 1) + (z - 1) * gridWidth, leftValid && downValid );
neighboursCount = math.select (neighboursCount, neighboursCount + 1, leftValid && downValid );

neighbours [neighboursCount] = math.select (neighbours [neighboursCount], ((x - 1) + (z + 1) * gridWidth), leftValid && upValid );
neighboursCount = math.select (neighboursCount, neighboursCount + 1, leftValid && upValid );

Making sure to access large grid/tile based arrays in a cache aware way was very important. eg. if you are iterating over large arrays (such as flowfield tile data) make sure you are going in the right column/row major direction to avoid jumping about in memory, or just use 1 rather than 2 dimension loop statements. This was another thing I got wrong first but fixed after profiling and checking the code.

Writing the flowfield implementation started with the need to have more control over the pathfinding implementation but it ended up being a decent revision for writing faster data-orientated code.

Closing Notes – movement behaviours

Having got reasonably well optimised and usable flowfields working I soon found out I was still someway from having large group movement work without any issues. A single unit could get to its destination fine. A small group was ok. But when 100 or so units all had to move around a corner to a common destination they tended to cluster quickly and end up forming into a single/double file line, like a line of ants. This was not what I had in mind. My original goal was to achieve something like Starcraft2 which has with a very smooth & responsive level of control for group movement. Specifically, when going around a corner, I wanted the group to move around a centralised centre as one, more or less maintaining relative offsets to each other. A lot of this would be dramatically helped by adding some boids style localised forces. However, even after some initial work in that area I couldn’t quite get it to fix the problem entirely with large groups of units when using flowfields.

Formations forces applied in addition to the flowfield are a potential idea to solve this problem, but I’ve not been able to get this to work yet in practice. The group movement just didn’t look well suited to the situation. Although it could be more appropriate when trying to model an actual swarm of insects or liquid like things. I might revisit flowfields for swarm style units or multiple goal based paths (such as close combat fighting with groups) in the future.

I shelved the flowfield work whilst keeping it as an option to switch back to if required and started on a new approach. I went back to using a more traditional pathing solution involving waypoints at path corners which you can then apply offsets to and assign individual positions per unit. This method would also need a lot of boid forces interplay but the path destinations would hopefully be more precise at least.

Resisting the urge to go back to Unity NavMesh and its slow rebuild/update times, I started using my existing ‘NavGrid’ with plain old grid based A* searches. Although the path search cost for a grid is higher than I’d expect from a good NavMesh implementation the insertion or rebuild time for a grid based graph is trivial, and I had ambitious ideas that involved regular updates of the graph. I tweaked the heuristics and added a tie-breaker http://theory.stanford.edu/~amitp/GameProgramming/Heuristics.html to improve the efficiency. Then I calculated the path ‘corners’ or waypoints along the path.

The path corner positions could then be used to calculate position offsets of varying length depending on the group size allowing for better formation movement.

This will be continued in RTS pathfinding part 2.

Deterministic Prototyping in Unity DOTs

I wanted to do some prototyping with time-manipulating mechanics in a casual RTS project. I always found the gameplay and technical implementation details of games that have experimented in this area (such as Braid and Achron) very interesting. To begin with I’m curious to see what it would be like to allow players to see the future and also have the ability to rewind back to the past. I thought it would be fun to figure out the technical implementation and was curious if it would make for interesting gameplay too.

Past and Future

Rewinding to past states can be accomplished with recording game state. This could be similar to the method described in Braid, though I’m not sure yet if I’d want all the frames up to the rewind time or just a single snapshot of the world at the target time in the past.

To know the ‘current future’ (the future assuming no new player inputs) I plan to run a second version of the game simulation ahead of the present. For the future to be true the simulation must be deterministic. I’ve made a deterministic RTS in the past using fixed-point maths but Unity’s new DOTs (data-orientated-technology-stack) comes with a Burst compiler that is planning to support cross-platform deterministic floating point maths eventually. In addition it has multi-threading job and entity component systems that have got a lot better over the last two years. It seemed like a good fit to start using these to write my simulation code in. I did also dabble in writing a native plugin in C++ but in the end it became apparent I was going to spend a great deal of time recreating things that Unity can adequately do. The burst compiler doesn’t give cross-platform determinism yet but Unity ECS, and DOTS Physics are deterministic on the same CPU architectures. This is enough to start prototyping with for now.

Simulation & Presentation

I setup a basic casual ClashRoyale style RTS with debug rendering. The game battle code is split between the simulation, which is the battle logic, and the presentation, which is the rendering of the simulation state. The simulation can be stepped multiple times independently of a presentation update. This is useful when running multiple worlds (such as a predicted future world in this case) where you may not want to render after every update. It’s also a great debug feature. To display the future I plan to run a second simulation world 7 seconds ahead of the present time. The first task is to ensure that the simulation is deterministic so that the future is true.

A reasonably busy test battle scenario was setup and run multiple times to start testing determinism. It was quickly obvious that the sim wasn’t deterministic at all, with units in different positions & health states across multiple runs. To address this, checksum hash calculations were added for key data such as unit positions, health and locomotive forces in order to diagnose where determinism was lost.

The checksum calculations are logged to a file at the end of the sim update. Each checksum entry has the value and a description which is usually the file and line number. After a few checksum calculations were added the determinism break was found to come from some multi-threaded jobs in the locomotion & health system. Realising that Unity’s NativeMultiHashmap.Concurrent is not deterministic; I removed the use of ScheduleParallel calls involving this container (it wasn’t especially necessary anyway). This lead to identical checksum values on every run. Having the ability to step the sim up to and over 1000 updates in approximately one second (this skips rendering and updates the sim in a loop from a single FixedUpdate call), really helped increase iteration time.

Being able to step 1000 updates in the sim instantly can speed up debugging

Implementation Details

To have this level of control of the simulation and presentation systems, ICustomBootstrap was used to override Unitys ECS world creation. All the standard Unity systems are created but not added to the automatic player update loop. Game systems are not added to any of the Unity system groups or the player loop; I manually update the game systems inline as shown below.

int logicFramesToProcess = m_paused ? m_singleStepCount : 1;

if (logicFramesToProcess > 0)
{
   for (int step = 0; step < logicFramesToProcess; ++step)
   {
    StepSimLogic ();

    int simStepsProcessed = m_simStepCount - 1;
    Checksum.FlushToFile (simStepsProcessed);
  }

  StepViewLogic ();
}
void StepSimLogic ()
{
  // Unity initialisation systems
  m_world.GetExistingSystem<InitializationSystemGroup> ().Update ();

  // Update battle simulation systems here
  m_world.GetExistingSystem<Sim.ExportPhysicsEventsSystem> ().Update ();
  m_world.GetExistingSystem<Sim.AttackTargetSeekSystem> ().Update ();
  m_world.GetExistingSystem<Sim.LocomotionSystem> ().Update ();
  ...

  // Unity simulation systems
  // + Sim.ExportPhysicsEventsSystem
  m_world.GetExistingSystem<SimulationSystemGroup> ().Update ();

  SimRunningTime += m_simDeltaTime;
  m_simStepCount += 1;
}
void StepViewLogic ()
{
  // Update battle presentation systems here
  m_world.GetExistingSystem<View.UIInputProcessor> ().Update ();
  m_world.GetExistingSystem<View.WorldToScreenSystem> ().Update ();
  m_world.GetExistingSystem<View.ProjectileRenderCreator> ().Update ();
  m_world.GetExistingSystem<View.LinearMovementSystem> ().Update ();
  ...

  // Unity presentation systems
  m_world.GetExistingSystem<PresentationSystemGroup> ().Update ();
}

For now I’ve preferred to update most of my own systems manually inline rather than using Unitys [UpdateBefore(systemA),UpdateAfter (systemB)] attributes. However, it is important to note that there is a CollisionEvents system that must be updated after StepPhysicsWorld and before EndFramePhysicsSystem. CollisionEvents is therefore added to the standard SimulationSystemGroup so its update is coordinated correctly with the Unity.Physics systems.

ICustomBootstrap is overridden to store sim, view or unity systems in lists and create all worlds required. Note there is no call for ScriptBehaviourUpdateOrder.UpdatePlayerLoop(world); because I want to manually control system update order and timing.

public bool Initialize (string defaultWorldName)
{
  Debug.Assert (s_instance == null); 
  s_instance = this;

  Debug.Log ("BattleManager.ICustomBootstrap.Initialize");

  IReadOnlyList<Type> systems = DefaultWorldInitialization.GetAllSystems (WorldSystemFilterFlags.Default);
  List<Type> simGameSystems = new List<Type> ();
  List<Type> viewGameSystems = new List<Type> ();
  List<Type> unitySystems = new List<Type> ();

  for (int i = 0; i < systems.Count; i ++)
  {
    if (systems [i].Namespace == "Sim")
    {
      simGameSystems.Add (systems [i]);
      continue;
    }
    else if (systems [i].Namespace == "View")
    {
      viewGameSystems.Add (systems [i]);
      continue;
    }

    unitySystems.Add (systems [i]);
  }

  //
  // Make an exception for ExportPhysicsEventsSystem as it must be updated
  // between various Unity.Physics.Systems

  simGameSystems.Remove (typeof (Sim.ExportPhysicsEventsSystem));
  unitySystems.Add (typeof (Sim.ExportPhysicsEventsSystem));

  //
  // Initialise worlds

  m_presentWorld = new WorldController ();
  m_presentWorld.Initialise ("sim", simGameSystems, viewGameSystems, unitySystems);

  World.DefaultGameObjectInjectionWorld = m_presentWorld.m_simWorld;

  m_alternateWorld = new WorldController ();
  m_alternateWorld.Initialise ("past", simGameSystems, viewGameSystems, unitySystems);

  const bool defaultWorldInitialisationDone = true;
  return defaultWorldInitialisationDone;
}

Later in the initialisation flow a single update of m_presentWorld is done before its data is copied. m_presentWorld was set to the DefaultGameObjectInjectionWorld so it contains all the converted GameObject data. Copying the data over gives both worlds the exact same starting state.

//
// Update present world once for game object conversion systems and then
// copy all entities into the alternate world.

m_presentWorld.m_simWorld.Update ();
MemoryBinaryWriter memoryWriter = new MemoryBinaryWriter ();
SerializeUtility.SerializeWorld (m_presentWorld.m_simEntityManager, memoryWriter, out object [] unityObjects);

unsafe
{
  MemoryBinaryReader memoryReader = new MemoryBinaryReader (memoryWriter.Data);
  ExclusiveEntityTransaction transaction = m_alternateWorld.m_simEntityManager.BeginExclusiveEntityTransaction ();
  SerializeUtility.DeserializeWorld (transaction, memoryReader, unityObjects);
  m_alternateWorld.m_simEntityManager.EndExclusiveEntityTransaction ();
}