What about using BSON, or a Base64-encoded binary string inside JSON/HOCON?
Of course this makes it harder to write songs by hand in a text editor, but the files would be smaller, and the format would not have to be human-readable.
Example:
0xABCD - A: Instrument, B: Volume (0-15), CD: Pitch (0-255)
0x00AB - Pause (AB ticks, 0-255); the leading 00 means instrument 0 at volume 0 is reserved for pauses
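To make the bit layout concrete, here is a rough Python sketch of how such 16-bit events could be packed and unpacked. The function names are made up for illustration, and it assumes that a high byte of 0x00 always means a pause, so a note with instrument 0 at volume 0 cannot be encoded.

```python
def encode_note(instrument: int, volume: int, pitch: int) -> int:
    """Pack a note as 0xABCD: A = instrument, B = volume, CD = pitch."""
    if not (0 <= instrument <= 15 and 0 <= volume <= 15 and 0 <= pitch <= 255):
        raise ValueError("instrument/volume must be 0-15, pitch 0-255")
    if instrument == 0 and volume == 0:
        # Assumption: a 0x00 high byte is reserved for pauses.
        raise ValueError("instrument 0 at volume 0 collides with the pause code")
    return (instrument << 12) | (volume << 8) | pitch


def encode_pause(ticks: int) -> int:
    """Pack a pause as 0x00AB, where AB is the number of ticks."""
    if not 0 <= ticks <= 255:
        raise ValueError("ticks must be 0-255")
    return ticks


def decode_event(word: int) -> tuple:
    """Turn a 16-bit word back into a ('pause', ...) or ('note', ...) tuple."""
    if word >> 8 == 0:
        return ("pause", word & 0xFF)
    return ("note", word >> 12, (word >> 8) & 0xF, word & 0xFF)
```

With this layout, instrument 1 at full volume playing pitch 60 becomes 0x1F3C, and decode_event(0x1F3C) gives back ("note", 1, 15, 60).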
Or if you want to do it without binary:
IVVPP (e.g. G25A2): Instrument, Volume, Pitch (e.g. Bass Guitar, Volume = 25/99, A in the 2nd octave)
PTT: Pause (TT ticks)
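A parser for the text variant could look roughly like the sketch below. It assumes a single uppercase letter for the instrument, a two-digit volume, and a pitch written as a note letter (A-G) plus a one-digit octave. Sharps and flats would need an extra character, which the format above does not define. parse_token is a hypothetical name.

```python
import re

# Assumed token shapes: "G25A2" (note) and "P04" (pause).
NOTE_RE = re.compile(r"([A-Z])(\d{2})([A-G])(\d)")
PAUSE_RE = re.compile(r"P(\d{2})")


def parse_token(token: str) -> tuple:
    """Parse one IVVPP or PTT token into a tuple describing the event."""
    pause = PAUSE_RE.fullmatch(token)
    if pause:
        return ("pause", int(pause.group(1)))
    note = NOTE_RE.fullmatch(token)
    if note:
        instrument, volume, name, octave = note.groups()
        return ("note", instrument, int(volume), name, int(octave))
    raise ValueError(f"unrecognised token: {token!r}")
```

Because pause tokens are three characters and note tokens are five, "P" could still be used as an instrument letter without being mistaken for a pause.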
One note would then be played per tick, and a file could contain multiple tracks (e.g. as a BSON array).
You could then build a (web?) application that lets people create songs through a simple interface, which would probably increase the number of songs available for your plugin considerably…
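For the Base64-in-JSON option, multiple tracks could be stored along these lines. This is only a sketch: the "tracks" key, the big-endian byte order, and the function names are assumptions, and each track is simply a list of 16-bit event words.

```python
import base64
import json
import struct


def tracks_to_json(tracks: list) -> str:
    """Serialise each track (a list of 16-bit words) as a Base64 string."""
    encoded = []
    for track in tracks:
        raw = struct.pack(f">{len(track)}H", *track)  # big-endian uint16
        encoded.append(base64.b64encode(raw).decode("ascii"))
    return json.dumps({"tracks": encoded})


def tracks_from_json(text: str) -> list:
    """Inverse of tracks_to_json: decode every Base64 track back to words."""
    doc = json.loads(text)
    tracks = []
    for b64 in doc["tracks"]:
        raw = base64.b64decode(b64)
        tracks.append(list(struct.unpack(f">{len(raw) // 2}H", raw)))
    return tracks
```

Base64 adds about a third to the binary size, so this is larger than BSON but still much more compact than spelling every note out as its own JSON object.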