An Introduction to Data Array

Data Array is my unofficial name for the scripting language that Harmonix uses for GH2. While many functions of the game are hardcoded into the executable, many actually aren't—and that means modders can edit them to re-enable existing (say, debug) functions or expose formerly preset controls in the game (like the track speed, which can be set in GH2:DX's Modifiers menu to the player's liking).

Some other functions handled in Data Array (and thus ripe for the picking by intrepid modders) include:

  • Cheat codes, both keyboard and controller
  • Menu flow (menu definitions, the Milo file they use, where they lead, forwards and backwards, sounds on them) and also which menu songs the game tries to load
  • Guitars/finishes
  • Songs, finishes, videos, and character outfits in the store
  • Song definitions (names, artists, custom "performed by" captions, preview times, quickplay guitarists/guitar/venue)
  • Track parameters (colors on the strikeline/notes/sustains, track speed)
  • Any and all localization strings, including defining new ones
  • Which loading tip strings the game tries to load
  • Memory card parameters (which model it uses for the save icon, the colored gradient background when it's selected in the PS2 BIOS, what it names the save)

And I'm sure a ton more. It's through our understanding of DTA hacking that the quality-of-life upgrades in Guitar Hero II Deluxe were even possible. DTA hacking makes adding, rather than replacing, content in GH2 an absolute cinch.

Data Array is stored in one of two file types: DTA, which is the plain text of the script, or DTB, which is a tokenized (or encrypted) form of it and what actually shipped on-disc with the game. Harmonix jokingly said in their very highly recommended "Data-Driven Programming Made Easy" GDC panel from 2005 that this was so players didn't mail them their cheat codes. Of course, if you're gonna do any serious Data Array hacking, you'll want to use dtab to convert them to DTA.

The structure of Data Array

DTAs are, essentially, giant trees of data and scripting. Whereas a traditional programming language will use keywords to define, say, an array:

var fruits = new Array();
fruits[0] = "Apples";
fruits[1] = "Bananas";
fruits[2] = "Peaches";

Data Array uses symbols:

(fruits ("Apples" "Bananas" "Peaches"))

The symbols used define the function of that node of the tree. Parentheses like () mean arrays, so specifically data the game loads. Curly brackets like {} mean functions, or actual instructions for the game to execute. Square brackets like [] mean properties or macros.
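To make the bracket rules concrete, here's a minimal Python sketch (not Harmonix's parser, and not dtab's) that reads a snippet of Data Array text into a tree and tags each node as an array, command, or property by the bracket that opened it. It assumes a simplified syntax: double-quoted strings with no escapes, no comments, and no #-directives. Quoted strings and bare symbols both come back as plain Python strings.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Node kinds, named after the bracket that opens them.
KINDS = {"(": "array", "{": "command", "[": "property"}
CLOSERS = {"(": ")", "{": "}", "[": "]"}


@dataclass
class Node:
    kind: str                                   # "array", "command", "property" or "root"
    children: list = field(default_factory=list)


def tokenize(text: str) -> List[str]:
    """Split DTA text into brackets, quoted strings and bare tokens."""
    tokens, i = [], 0
    while i < len(text):
        c = text[i]
        if c.isspace():
            i += 1
        elif c in "(){}[]":
            tokens.append(c)
            i += 1
        elif c == '"':
            end = text.index('"', i + 1)        # simplified: no escaped quotes
            tokens.append(text[i:end + 1])
            i = end + 1
        else:
            start = i
            while i < len(text) and not text[i].isspace() and text[i] not in '(){}[]"':
                i += 1
            tokens.append(text[start:i])
    return tokens


def atom(tok: str) -> Union[str, int, float]:
    if tok.startswith('"'):
        return tok[1:-1]                        # string literal
    for conv in (int, float):                   # integers first, then floats
        try:
            return conv(tok)
        except ValueError:
            pass
    return tok                                  # anything else is a symbol


def parse(text: str) -> Node:
    stack, opened = [Node("root")], []
    for tok in tokenize(text):
        if tok in KINDS:
            node = Node(KINDS[tok])
            stack[-1].children.append(node)
            stack.append(node)
            opened.append(tok)
        elif tok in ")}]":
            if not opened or CLOSERS[opened.pop()] != tok:
                raise SyntaxError(f"unbalanced {tok!r}")
            stack.pop()
        else:
            stack[-1].children.append(atom(tok))
    if opened:
        raise SyntaxError("unclosed bracket")
    return stack[0]
```

Parsing `(fruits ("Apples" "Bananas" "Peaches"))` yields one array node whose first child is the symbol `fruits` and whose second child is an array holding the three strings.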

Of note (and this is why songs.dta has the structure that it does) is that Data Array allows any kind of block as an item in an array, including other arrays, functions, or properties. (This doesn't mean you can do whatever you want wherever you want in any random block of script, but technically, stuff's allowed.) These are, naturally, defined with another set of symbols, often more parentheses.

(characters
   (punk
      (punk1)
      (punk2))
   (alterna
      (alterna1)
      (alterna2))
   (glam
      (glam1)
      (glam2))
   (goth
      (goth2)
      (goth1))
   (metal
      (metal1)
      (metal2))
   (rockabill
      (rockabill1)
      (rockabill2))
   (rock
      (rock2)
      (rock1))
   (deathmetal
      (deathmetal1)
      (deathmetal2))
   (classic (classic))
   (funk1 (funk1))
   (grim (grim)))

Here, you can see the characters array in gh2.dta, which defines all the available character shortnames and which outfits they can use. deathmetal2 is an array inside the deathmetal array, which is an array inside the characters array.

If you're not quite getting the tree metaphor, here's how this looks in the old DTB editor, expanded out:

A tree view inside the DTB Editor

You can see that the first item in each array is the array's shortname (like characters), and then another array is defined immediately after, hence another set of parentheses and more shortnames.
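Because every array names itself with its first item, finding data is a matter of walking down the tree one name at a time. Here's a minimal Python sketch, assuming the characters array has already been loaded into nested lists (trimmed here to a few entries); the helpers find, find_path, and outfits are hypothetical, not anything the game exposes.

```python
# A trimmed copy of the characters array as nested Python lists.
# Each DTA array becomes a list; symbols become plain strings.
CHARACTERS = [
    "characters",
    ["punk", ["punk1"], ["punk2"]],
    ["goth", ["goth2"], ["goth1"]],
    ["deathmetal", ["deathmetal1"], ["deathmetal2"]],
    ["classic", ["classic"]],
]


def find(array, key):
    """Return the first child array whose first item (its name) is key."""
    for child in array[1:]:
        if isinstance(child, list) and child and child[0] == key:
            return child
    raise KeyError(key)


def find_path(array, *keys):
    """Walk down the tree one name at a time, e.g. deathmetal -> deathmetal2."""
    for key in keys:
        array = find(array, key)
    return array


def outfits(tree, character):
    # Every child after the character's name is a one-item outfit array.
    return [outfit[0] for outfit in find(tree, character)[1:]]


print(outfits(CHARACTERS, "deathmetal"))   # ['deathmetal1', 'deathmetal2']
```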

This mixture of data and code allows for quicker prototyping, a more reusable engine (since menus and song data can easily be rewritten and switched out without needing to edit the big, nasty C++ project), and a lower barrier to entry for non-programmers like their visual artists. (Fun fact: for their first game FreQuency, Harmonix actually used Python, but found it more than a bit unwieldy, so they came up with their own language in-house.)

Localization quirks (strings versus shortnames in arrays)

While my first example uses strings inside an array, and while that's technically allowed (songs.dta does it, after all), this isn't often how you'll see data defined. Usually, shortnames are used inside an array, and the game separately looks up plain-text names for those shortnames, showing the shortname itself if none is found. A good example of this is when you define a new guitar in guitars.dta but don't give it matching localization strings in locale.dta or locale_milo.dta:

A fully-functional guitar skin, though with no text to go along with it

While this skin works perfectly fine in-game, it's missing localization strings to give it its flavor text.

The game is hardcoded to look for [the name of the guitar]_desc when it displays the description on the "Select Finish" screen. Similarly, the game will look for [the name of the guitar]_shop_desc when you go to buy the finish in the store. Forget one, and you'll only get the shortname. This also means you can have two different descriptions for wherever the finish appears (and Harmonix themselves use this often).
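The lookup rule above boils down to a couple of dictionary reads. Here's a minimal Python sketch where the locale dict, the localize and finish_description helpers, and the my_guitar shortname are all hypothetical; it also assumes the fallback shows the symbol that was looked up.

```python
def localize(locale: dict, symbol: str) -> str:
    # Fall back to the symbol itself when no string is defined for it.
    return locale.get(symbol, symbol)


def finish_description(locale: dict, guitar: str, in_store: bool = False) -> str:
    # "Select Finish" reads <guitar>_desc; the store reads <guitar>_shop_desc.
    suffix = "_shop_desc" if in_store else "_desc"
    return localize(locale, guitar + suffix)


# Hypothetical strings: a name and a Select Finish description, but no shop text.
LOCALE = {"my_guitar": "My Guitar", "my_guitar_desc": "A finish nobody asked for."}

print(finish_description(LOCALE, "my_guitar"))                 # A finish nobody asked for.
print(finish_description(LOCALE, "my_guitar", in_store=True))  # my_guitar_shop_desc
```

Keeping the two keys separate is what lets one finish carry different text in the store and on the Select Finish screen.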

Features of Data Array

This is the more granular stuff, taking a look at the structure of various bits of script and explaining how they work. I'm gonna be picking admittedly fairly easy examples, seeing as I'm not an expert in this stuff like Scott is. Still, you'll get an idea.

Arrays

As said earlier, arrays are defined in Data Array by parentheses, or (). They'll contain one or more items, which can consist of strings, integers, floats, symbols, or functions/other arrays/etc.

Sometimes, the need for a subarray can be a bit ambiguous, say for channel mappings in config/songs.dta. The game will either accept a mono track, in which case you can simply specify the mono channel as the array's second item, or a stereo track, in which case you'll need a subarray. The vols, pans, and cores arrays all have multiple items and thus need subarrays.

(song
   (name songs/aceofspades/aceofspades)
   (tracks
      ((guitar
          (2 3))
       (bass 4)))
   (pans
      (-1.0 1.0 -1.0 1.0 0.0))
   (vols
      (-3.5 -3.5 -3.0 -3.0 0.0))
   (cores
      (-1 -1 1 1 -1))
   (midi_file songs/aceofspades/aceofspades.mid))
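The mono-versus-stereo rule can be expressed as a small normalization step. This Python sketch is a hypothetical checker, not anything the game or dtab ships. It turns each (tracks ...) entry into a list of channels and sanity-checks it against pans, vols, and cores. It assumes those three arrays carry one entry per audio channel and that channels no track claims are the backing audio; both fit the Ace of Spades example but aren't confirmed here.

```python
from typing import Dict, List, Sequence, Tuple, Union


def channel_map(tracks: Dict[str, Union[int, Sequence[int]]]) -> Dict[str, List[int]]:
    """A bare number is a mono track; a subarray lists a stereo track's channels."""
    return {name: [chans] if isinstance(chans, int) else list(chans)
            for name, chans in tracks.items()}


def check_song(tracks, pans, vols, cores) -> Tuple[Dict[str, List[int]], List[int]]:
    mapping = channel_map(tracks)
    count = len(pans)
    # Assumption: pans, vols and cores carry one entry per audio channel.
    if not len(vols) == len(cores) == count:
        raise ValueError("pans, vols and cores must be the same length")
    used = [ch for chans in mapping.values() for ch in chans]
    if any(not 0 <= ch < count for ch in used):
        raise ValueError("a track points at a channel that doesn't exist")
    # Assumption: channels that no track claims are the backing audio.
    backing = [ch for ch in range(count) if ch not in used]
    return mapping, backing


mapping, backing = check_song(
    {"guitar": (2, 3), "bass": 4},
    pans=[-1.0, 1.0, -1.0, 1.0, 0.0],
    vols=[-3.5, -3.5, -3.0, -3.0, 0.0],
    cores=[-1, -1, 1, 1, -1],
)
print(mapping, backing)   # {'guitar': [2, 3], 'bass': [4]} [0, 1]
```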

Functions

Functions are defined by curly brackets, or {}. These are actual instructions for the game to run, often seen in the various ui scripts for controlling game flow and menu flow. Here's a chunk from ui/game.dta for getting the song name, artist, and caption on screen when you start a song:

(setup_text
   {do
      ($song_text
         {game get_song_text})
      ($artist_text
         {game get_song_artist_text})
      ($song_caption
         {game get_song_caption})
      {do
         ($prefix
            "mtv_campaign_line")
         {mtv_campaign_song_id.view set_showing TRUE}
         {$this set_line $prefix 1 $song_text}
         {$this set_line $prefix 2 $song_caption}
         {$this set_line $prefix 3 $artist_text}}})

During the enter block (when the panel is first displayed), the game runs an if-else to check if the song loaded is a tutorial (which are actually defined as songs in the files):

(enter
   {if_else
      {game is_tutorial_running}
      {$this show_overlay FALSE}
      {$this setup_text}})

If it is, the overlay is hidden by calling show_overlay with FALSE. Otherwise, the game calls the setup_text block, as seen above. setup_text runs a do block, which calls a series of functions to put the song name, artist name, and caption into $song_text, $artist_text, and $song_caption, respectively. A nested do block then sets $prefix to "mtv_campaign_line", shows mtv_campaign_song_id.view, and calls set_line three times, each time passing the prefix, a line number from 1 to 3, and one of the three variables. set_line, defined shortly thereafter in the file, handles the nitty-gritty of interfacing with the milo.
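To see how do, if_else, and variables fit together, here's a toy evaluator in Python. It is not how the engine runs DTA internally: commands are modeled as Python lists, ($var value) bindings inside a do as tuples, and the game and panel objects as a hypothetical table of callables. Treating FALSE, 0, and None as false is an assumption.

```python
def truthy(value) -> bool:
    # Assumption: FALSE, 0 and None count as false; everything else is true.
    return value not in (False, None, 0, "FALSE")


def evaluate(expr, env: dict, objects: dict):
    """Toy evaluator: {commands} are lists, ($var value) bindings are tuples."""
    if isinstance(expr, str) and expr.startswith("$"):
        return env[expr]                        # variable lookup
    if not isinstance(expr, list):
        return expr                             # numbers and symbols are themselves
    head, *args = expr
    if head == "do":
        result = None
        for arg in args:
            if isinstance(arg, tuple):          # ($name value) sets a variable
                name, value = arg
                env[name] = evaluate(value, env, objects)
            else:
                result = evaluate(arg, env, objects)
        return result
    if head == "if_else":
        cond, if_true, if_false = args
        branch = if_true if truthy(evaluate(cond, env, objects)) else if_false
        return evaluate(branch, env, objects)
    # Otherwise {target method args...}: resolve the target, e.g. $this.
    target = evaluate(head, env, objects)
    method, *rest = args
    return objects[target](method, *[evaluate(a, env, objects) for a in rest])


log = []
objects = {
    "game": lambda method: {"is_tutorial_running": False,
                            "get_song_text": "Ace of Spades"}[method],
    "mtv_overlay_panel": lambda method, *args: log.append((method, *args)),
}
enter = ["if_else", ["game", "is_tutorial_running"],
         ["$this", "show_overlay", "FALSE"],
         ["do",
          ("$song_text", ["game", "get_song_text"]),
          ["$this", "set_line", "mtv_campaign_line", 1, "$song_text"]]]
evaluate(enter, {"$this": "mtv_overlay_panel"}, objects)
print(log)   # [('set_line', 'mtv_campaign_line', 1, 'Ace of Spades')]
```

With is_tutorial_running returning False, the else branch runs: $song_text is filled in first, then passed to set_line through $this.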

Properties

Properties are defined in Data Array with square brackets, or []. These usually involve very broad game states, like [won], [attract_mode], and [intro_complete], which the game can check and set the value of at will. From ui/game.dta:

(intro_start_msg
   ($fast $encore)
   {track_panel intro_start_msg}
   {mtv_overlay_panel show_overlay FALSE}
   {if
      {! $fast}
      {script_task
         (delay 1)
         (units kTaskSeconds)
         (script
            {mtv_overlay_panel show_overlay TRUE})}
      {script_task
         (delay 6)
         (units kTaskSeconds)
         (script
            {mtv_overlay_panel show_overlay FALSE})}}
   {set
      [intro_complete]
      FALSE}
   kDataUnhandled)

Here, the game gets ready to start the song. It hides the song name overlay and, unless $fast is set (say, during a quick restart, when the venue doesn't need to play its entire intro animation again), schedules two script tasks: one shows the overlay after one second, and the other hides it again after six. The game then sets [intro_complete] to FALSE to signify that the intro cinematic has yet to complete. When it does, the game sets [intro_complete] to TRUE and can then, say, check for it when it's deciding whether to do a full song restart or a quick restart.
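The timing in intro_start_msg can be modeled with a small task queue. In this Python sketch, TaskQueue is a hypothetical stand-in for script_task with (units kTaskSeconds), and a plain dict holds both the overlay flag and property values like [intro_complete]; none of these names are engine APIs.

```python
import heapq


class TaskQueue:
    """Toy stand-in for script_task with (units kTaskSeconds)."""

    def __init__(self):
        self.now = 0.0
        self._pending = []                      # heap of (due_time, order, script)
        self._order = 0

    def script_task(self, delay, script):
        heapq.heappush(self._pending, (self.now + delay, self._order, script))
        self._order += 1                        # keeps equal due times in FIFO order

    def advance(self, seconds):
        """Move the clock forward, running every script that comes due."""
        target = self.now + seconds
        while self._pending and self._pending[0][0] <= target:
            due, _, script = heapq.heappop(self._pending)
            self.now = due
            script()
        self.now = target


def intro_start_msg(tasks, state, fast):
    # state holds the overlay flag plus [property] values like [intro_complete].
    state["overlay_showing"] = False
    if not fast:
        tasks.script_task(1, lambda: state.update(overlay_showing=True))
        tasks.script_task(6, lambda: state.update(overlay_showing=False))
    state["intro_complete"] = False


tasks, state = TaskQueue(), {}
intro_start_msg(tasks, state, fast=False)
tasks.advance(3)
print(state)   # {'overlay_showing': True, 'intro_complete': False}
```

On a fast start neither task is queued, so the overlay simply stays hidden.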

Variables

Variables in DTA are prefixed with a dollar sign ($). They're usually global, and they don't need to be declared before their use.

A special variable, $this, is used pretty constantly throughout the game's scripts, and it refers to the object or panel the current block of script is running under. In the song name overlay's enter block (as seen above), $this refers to the mtv_overlay_panel panel, which has a block called show_overlay. Thus, if the if_else's condition is true (a tutorial is running), the game calls show_overlay with FALSE and the overlay stays hidden.

Floats and integers

Floats and integers are specified inside arrays as simply their values. See "Arrays" above.

#define statements

#define statements declare macros, which, in the context of GH2, are simply named representations of a value. The name is followed by an array, which can have as many items as needed. Again from ui/game.dta:

#define GAME_PANELS
(midi_loader_panel game TRACK_MASK world_panel track_panel hud mtv_overlay_panel)
			

ui/game.dta provides another great example of a #define in action in DRUMMER_EXPLODE_DELAY:

#define DRUMMER_EXPLODE_DELAY (3.00)

Later, in the game_won_msg block, if the game determines the venue to be the Battle of the Bands venue, encore effects to be on, and the song played to be "Tonight I'm Gonna Rock You Tonight", the game will wait for DRUMMER_EXPLODE_DELAY (or three seconds) before making the drummer explode.

{if_else
   {gamecfg get game_over_sequence}
   {do
      {if
         {&&
            {==
               {game get_venue}
               battle}
            {&&
               {==
                  {game want_encore_fx}
                  TRUE}
               {==
                  {gamecfg get_song}
                  tonightimgonna}}}
         {do
            {script_task
               (delay DRUMMER_EXPLODE_DELAY)
               (units kTaskSeconds)
               (script
                  {handle
                     ($this drummer_explode)}
                  {play_sfx drummer_exp} ; not the actual end of the block
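Conceptually, a #define swaps a name for the items of its array wherever the name appears. Here's a minimal Python sketch of that substitution over a flat token list. It assumes items are spliced in place, so (delay DRUMMER_EXPLODE_DELAY) becomes (delay 3.00), and that macros aren't expanded recursively, so TRACK_MASK is left untouched; expand_macros is a hypothetical helper.

```python
def expand_macros(tokens, macros):
    """Splice each #define's items in place of its name."""
    expanded = []
    for tok in tokens:
        if tok in macros:
            expanded.extend(macros[tok])        # one macro can stand for many items
        else:
            expanded.append(tok)
    return expanded


MACROS = {
    "DRUMMER_EXPLODE_DELAY": ["3.00"],
    "GAME_PANELS": ["midi_loader_panel", "game", "TRACK_MASK", "world_panel",
                    "track_panel", "hud", "mtv_overlay_panel"],
}

print(expand_macros(["(", "delay", "DRUMMER_EXPLODE_DELAY", ")"], MACROS))
# ['(', 'delay', '3.00', ')']
```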

#include and #merge statements

#include and #merge are two statements with similar but distinct functions. Both load the contents of another DTA into the current one, for cases where data split across files needs to be loaded together to work properly. What sets them apart is what happens when those DTAs conflict (say, if both try to write the same array).

In short, #merge won't overwrite what's in the host DTA; if they conflict, the values set in the main DTA will take precedence over the merged one. DTAs loaded through #include very much will overwrite values in the host DTA. At the very top of config/gh2.dta:

(system
init
   #include system_script.dta) ; not the actual end of the block

At the start of the init block of system, config/system_script.dta will be loaded in wholesale. Meanwhile, at the bottom of the same file:

#merge ../../../system/run/config/default.dta

The dotdot defaults (which are very, very low-level engine DTAs) are simply merged in. If something in gh2.dta conflicts with something in default.dta, gh2.dta wins out.
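The precedence rules above can be sketched with Python dicts standing in for DTAs, each mapping a top-level array name to its contents. This is a simplification: the keys below are hypothetical, and having #merge also recurse into arrays that both files share is an assumption, not something confirmed by the files above.

```python
def include(host: dict, other: dict) -> dict:
    """#include-style: the loaded DTA's values overwrite the host's."""
    result = dict(host)
    result.update(other)
    return result


def merge(host: dict, other: dict) -> dict:
    """#merge-style: the host wins; only missing entries are taken from other."""
    result = dict(host)
    for key, value in other.items():
        if key not in result:
            result[key] = value
        elif isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = merge(result[key], value)   # assumption: shared arrays merge too
    return result


# Hypothetical contents, standing in for gh2.dta and default.dta.
host = {"track_speed": 1.0, "menus": {"main": "gh2"}}
defaults = {"track_speed": 2.0, "menus": {"main": "default", "debug": "default"},
            "audio": "default"}

print(merge(host, defaults))
# {'track_speed': 1.0, 'menus': {'main': 'gh2', 'debug': 'default'}, 'audio': 'default'}
```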

In any case, theoretically, one could use these to, say, split out the big mess in ui/eng/locale.dta into smaller files for easier adding and editing of new strings. #include statements can also be used to get the game to load new script outright, say for new features where you'd like to keep your changes separate from the original files.

Comments

Data Array comments come in two forms: single-line comments, prefixed with a semicolon (;), and multi-line comments, sandwiched between /* and */. At the moment, dtab doesn't support the multi-line ones, but Harmonix used them internally, as seen in the Rock Band PS2 DTAs (system/run/config/objects.dta):

   ; remapping of subdirs and proxies.
   ; Use this temporarily if you need to rename files which are
   ; subdirs.  Rename the files in perforce.  Put the name changes
   ; below, excluding the .milo extension, in the format <oldname> <newname>
   ; Then load up the .milo file(s) subdiring the changed dirs, and resave.
   ; Voila, the subdir name has been changed in the milo file.
   ; You can even do entire subdir trees at a time,
   ; like if a subdirs b subdirs c subdirs d, you can rename all four files,
   ; put the renamings in the list, open them all up in Milo, and save them out again.
   ; KEEP this commented out. If it finds it, it turns all file names into symbols
/*
   (remap_objectdirs
      (theater_lighting theater_geom)
      (theater_01_geom theater_geom)
      (male_bass male_guitar)
   )
*/
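Since dtab doesn't handle /* */ blocks, one possible workaround is to strip comments before converting. Here's a minimal Python sketch of such a pre-pass; strip_comments is hypothetical and not part of dtab, and it assumes strings are double-quoted with no escapes and that /* never appears inside a bare symbol.

```python
def strip_comments(text: str) -> str:
    """Remove ; line comments and /* */ block comments, leaving strings intact."""
    out, i, n = [], 0, len(text)
    while i < n:
        c = text[i]
        if c == '"':                            # copy a string literal verbatim
            end = text.find('"', i + 1)
            end = n - 1 if end == -1 else end
            out.append(text[i:end + 1])
            i = end + 1
        elif c == ";":                          # skip to end of line, keep the newline
            end = text.find("\n", i)
            i = n if end == -1 else end
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            if end == -1:
                raise SyntaxError("unterminated /* comment")
            i = end + 2
        else:
            out.append(c)
            i += 1
    return "".join(out)


src = '(a "x;y") ; note\n/*\n(remap_objectdirs (b c))\n*/\n(d)'
print(repr(strip_comments(src)))   # '(a "x;y") \n\n(d)'
```

Note that the semicolon inside "x;y" survives because string literals are copied before comment markers are checked.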