project Morrowind, part 2

Today on project Morrowind, we will start turning some horrible binaries into beautiful data structures in the memory space of a Python interpreter. They are still technically binary, but let's not dwell on it too much. Otherwise we will realise we're all made of atoms and will have an existential crisis and that wouldn't be very nice.

I…I don't even see the code. All I see is tree, rock, marshmerrow...

Exposition dump

Elder Scrolls games made by Bethesda Softworks, including Morrowind and its successors (Oblivion, Skyrim and, peculiarly, Fallout 3 and 4 too), store their core game data (that is, maps and locations of various objects, but not textures/audio/meshes) in the ESM (Elder Scrolls Master) format. The format has been evolving ever since Morrowind as the Bethesda developers have been adding more and more features to it, but its main idea remains the same: these files are a collection of records of different types.

For example, we can have an NPC_ record, defining a character in the game, which will contain entries for the character's gender, race, AI behaviour etc. It can also reference other records: for example, the entries in a character's inventory refer to ARMO and WEAP records. CELL records describe in-game cells (actual locations) and contain references to, well, everything that is located in that cell, like NPC_, ARMO, WEAP or CONT (containers, e.g. chests). The actual binary format for Morrowind is described very well here and every release of a new Bethesda game promises players lots of fun in reverse engineering the ever so slight alterations to the game data file format.

One clever idea that Bethesda had was making game save files an overlay on the game data files in this format. For example, if you were to kill someone in a certain location (pretty much what usually happens in Elder Scrolls games), your save file would have a redefinition of the CELL record that would list the NPC in question as perished. Sadly, this idea has no relevance to this project, just like lots of other clever ideas, but it's interesting nonetheless.

There are more complications though: cells can be exterior or interior. Exterior cells are square-shaped and are joined together edge-to-edge to create the actual great (dubious) outdoors of Morrowind. With interior cells, all bets are off - each of them resides in its own little reality and is joined to other cells by doors which basically function as teleports in this case. A small house in the exterior cell often is quite a bit larger from the inside, which means that you can't reliably judge where the player actually is when they're indoors.

So if we want to reconstruct a graph of how you can travel around in Morrowind, we have to take care of doors, amongst all other means of movement.

With regards to the Almsivi/Divine Intervention spells, there are special marker objects in every Temple and Imperial fort: this is how the game determines where to teleport the player when they cast a particular spell. It's again easy with the exterior cells (as all markers are located outside), but gets more complicated with interiors. Some people claim Morrowind uses the last exterior cell you've been to. This has some pathological cases: say you use a Guild Teleport that takes you from one interior straight to another, so casting an Intervention spell will warp you to the marker closest to the first Guild, not the second one. OpenMW, an open-source reimplementation of the Morrowind engine, tries to fix that by using the exterior closest to you as a reference. My copy of Morrowind behaves the correct way for some reason, so I'll emulate that.
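
Whichever reference point the engine ends up using, the lookup itself boils down to finding the nearest marker. Here is a minimal sketch of that step, assuming markers are stored as (x, y) world coordinates and that plain straight-line distance is what counts. Both are my assumptions, and the marker names and coordinates in the usage example are made up for illustration.

```python
import math


def nearest_marker(position, markers):
    """Return the name of the marker closest to an (x, y) position.

    `markers` maps a marker name to its (x, y) coordinates. Using Euclidean
    distance is an assumption, not a confirmed detail of the game engine.
    """
    return min(markers, key=lambda name: math.dist(position, markers[name]))


# Hypothetical data: two markers and a reference position in the exterior.
markers = {"temple marker": (0.0, 0.0), "fort marker": (100.0, 0.0)}
closest = nearest_marker((10.0, 5.0), markers)
```

The interesting question from the paragraph above is only which `position` gets passed in: the last exterior cell visited, or the exterior closest to the interior the player is standing in.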

In much better news, if NPCs offer travel services (be it silt strider, boat or Guild teleport), it will be encoded in their record.

All in all, it seems like we want to scrape the hell out of all CELL and NPC_ records, as they contain everything we need for now.

Scraping the hell out of all CELL and NPC_ records

Now, as much as I thought it would be feasible and fun to decode the binary data according to that excellent spec, I still decided to cheat and used Morrowind Enchanted Editor, a low-level editor for ESM files. In particular, I used the "Dump to Text File" function, which turned the unreadable binary mess into a readable ASCII mess.

Meet Todd's Super Tester Guy, presumably made by Todd Howard himself.

This is something we can work with: each entry in the record is on a separate line and is clearly keyed by the subrecord (e.g. FNAM is the full name, RNAM is the race name etc). As a good starting point, we can easily extract just the NPC_ and CELL records and tokenize the data by converting it into a stream of key-value pairs (so a line NPC_ NAME todd would get turned into a tuple ('NAME', 'todd'), since we already know it belongs to an NPC_ record).
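
A minimal sketch of what such a tokenizer might look like. This is not the actual script, and it assumes each dump line is laid out as record type, subrecord name and value separated by whitespace. The function name `tokenize` and its `wanted` parameter are made up for illustration.

```python
def tokenize(lines, wanted=("NPC_", "CELL")):
    """Turn dump lines into {record_type: [(subrecord, value), ...]}.

    Assumes the layout "<record type> <subrecord> <value>"; the real dump
    format may differ in details.
    """
    streams = {record_type: [] for record_type in wanted}
    for line in lines:
        # Split into at most three fields so values keep their inner spaces.
        parts = line.rstrip("\r\n").split(None, 2)
        if len(parts) < 2 or parts[0] not in streams:
            continue  # blank line, or a record type we don't care about
        value = parts[2] if len(parts) == 3 else ""
        streams[parts[0]].append((parts[1], value))
    return streams
```

Usage would be something like `streams = tokenize(open(path))` followed by `npcs = streams["NPC_"]`, where `path` points at the text dump.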

(I was going to put source code and explain it, block-by-block, here, but WordPress decided to not be on my side today. I'll post it on GitHub later, promise. I mean, seriously, who the hell converts > to &gt; after a save cycle and then again to &amp;gt;?)

In the end, we get something like this:

In [6]: cells[:10]
Out[6]:
[('NAME', ''),
('DATA', '\x02\x00'),
('DATA', '23'),
('DATA', '7'),
('RGNN', "Azura's Coast Region"),
('NAME', ''),
('DATA', '\x02\x00'),
('DATA', '23'),
('DATA', '6'),
('RGNN', "Azura's Coast Region")]

In [7]: npcs[:10]
Out[7]:
[('NAME', 'player'),
('FNAM', 'player'),
('RNAM', 'Dark Elf'),
('CNAM', 'Acrobat'),
('ANAM', ''),
('BNAM', 'b_n_dark elf_m_head_01'),
('KNAM', 'b_n_dark elf_m_hair_01'),
('NPDT', '1'),
('NPDT', ''),
('NPDT', '')]

Parsing the stream of NPC_ records into a list of NPCs isn't that difficult. I found the neatest way was to pass the stream to a class constructor and allow it to consume as much from it as it needs to initialize itself. But keep in mind that we need to stop parsing when we see the next NPC's NAME subrecord, and if we've already consumed it, it's too late. So we need an iterator that allows us to peek at the next item without consuming it.
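
Roughly, the idea could be sketched like this. It is not the actual code: the names `Peekable`, `NPC` and `parse_npcs` are made up, only a handful of subrecords are handled, and it assumes NAME never appears anywhere except at the start of an NPC_ record.

```python
class Peekable:
    """Wrap an iterator so the next item can be inspected without consuming it."""

    _EMPTY = object()  # sentinel meaning "nothing buffered"

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._buffer = self._EMPTY

    def __iter__(self):
        return self

    def __next__(self):
        if self._buffer is not self._EMPTY:
            item, self._buffer = self._buffer, self._EMPTY
            return item
        return next(self._it)

    def peek(self, default=None):
        """Return the next item without consuming it, or `default` at the end."""
        if self._buffer is self._EMPTY:
            self._buffer = next(self._it, self._EMPTY)
        return default if self._buffer is self._EMPTY else self._buffer


class NPC:
    def __init__(self, stream):
        key, self.name = next(stream)
        if key != "NAME":
            raise ValueError(f"expected NAME, got {key}")
        self.full_name = self.race = self.npc_class = ""
        # Consume subrecords until the next NPC's NAME shows up.
        while stream.peek() is not None and stream.peek()[0] != "NAME":
            key, value = next(stream)
            if key == "FNAM":
                self.full_name = value
            elif key == "RNAM":
                self.race = value
            elif key == "CNAM":
                self.npc_class = value

    def __repr__(self):
        return f"NPC ({self.name}, {self.full_name}, {self.race}, {self.npc_class})"


def parse_npcs(pairs):
    """Build a list of NPCs from a stream of (subrecord, value) pairs."""
    stream = Peekable(pairs)
    npcs = []
    while stream.peek() is not None:
        npcs.append(NPC(stream))
    return npcs
```

The constructor never has to "un-read" anything: it only calls `next` once `peek` has confirmed the upcoming pair still belongs to the current NPC.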

Parsing the list of destinations, one of the Holy Grails that we're looking for, is easy too - just look at this example (which is one of the places that Todd's Super Tester Guy can take us):

NPC_    DODT    1822.641
NPC_    DODT    -231.5323
NPC_    DODT    -292.9501
NPC_    DODT    0
NPC_    DODT    0
NPC_    DODT    0.5
NPC_    DNAM    ToddTest

We literally get a list of 6 numbers: the x, y, z coordinates of the destination and three rotation angles (which we don't really care about). Sometimes there's also a DNAM subrecord, naming the cell, if the destination is in an interior cell.
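
Grouping those subrecords into destinations could look something like the sketch below. It assumes the six DODT values always arrive together as position followed by rotation, and that a DNAM, when present, comes right after the DODT block it belongs to, as in the example above. The function name and dictionary keys are made up for illustration.

```python
def parse_destinations(pairs):
    """Group DODT/DNAM subrecords into a list of travel destinations."""
    destinations = []
    numbers = []
    for key, value in pairs:
        if key == "DODT":
            numbers.append(float(value))
            if len(numbers) == 6:
                # Six values make one destination: position, then rotation.
                destinations.append({
                    "position": tuple(numbers[:3]),
                    "rotation": tuple(numbers[3:]),
                    "cell": None,  # stays None unless a DNAM follows
                })
                numbers = []
        elif key == "DNAM" and destinations:
            # Attach the cell name to the destination just completed.
            destinations[-1]["cell"] = value
    return destinations
```

For the ToddTest example this yields a single destination with position (1822.641, -231.5323, -292.9501) and cell "ToddTest".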

Add a repr method and we can see a list of actual NPCs!

In [15]: npcs[:10]
Out[15]:
[NPC (player, player, Dark Elf, Acrobat),
 NPC (todd, Todd's Super Tester Guy, Dark Elf, Guard),
 NPC (Imperial Guard, Guard, Imperial, Guard),
 NPC (agronian guy, Tarhiel, Wood Elf, Enchanter),
 NPC (murberius harmevus, Murberius Harmevus, Imperial, Warrior),
 NPC (madres navur, Madres Navur, Dark Elf, Acrobat),
 NPC (farusea salas, Farusea Salas, Dark Elf, Commoner),
 NPC (erval, Erval, Wood Elf, Commoner),
 NPC (Dralas Gilu, Dralas Gilu, Dark Elf, Rogue),
 NPC (uulernil, Uulernil, High Elf, Smith)]

In [16]: npcs[1].inventory
Out[16]: 
[('steel battle axe', 1),
 ('glass war axe', 1),
 ('steel mace', 1),
 ('chitin guantlet - right', 1),
 ('chitin guantlet - left', 1),
 ('chitin boots', 1),
 ('chitin greaves', 1),
 ('chitin pauldron - right', 1),
 ('chitin pauldron - left', 1),
 ('chitin cuirass', 1)]

(Interestingly, there are three problems with the "agronian guy" named Tarhiel over there. Firstly, that race name is spelled Argonian. Secondly, he's not an Argonian, he's a Wood Elf. And finally, he has some mental issues but also talents.)

Next time on project Morrowind, we'll move on to trying to decode CELL data, which has some more peculiarities (like the fact that it contains most of what the player can perceive). But now that we've gotten through the background and the boring bits, we will start moving faster and might even get around to constructing an actual travel graph!