Each byte spans eight scan lines (or rows). The first byte contains the first-column pixels for rows 0-7, with the least significant bit being row 0; the second byte contains the second-column pixels for rows 0-7, and so on.
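To make the mapping concrete, here is a minimal sketch in plain Java. The class and method names are made up for illustration and are not part of the leJOS API. It also assumes that once a band of eight rows has been stored for every column, the next band (rows 8-15) follows directly after it in the array.

```java
// Illustrative helper only: NxtPixelMap is a hypothetical name, not a leJOS class.
final class NxtPixelMap {
    // Index of the byte that holds pixel (x, y) in an image `width` pixels wide.
    // Assumption: each band of eight rows occupies `width` consecutive bytes,
    // one byte per column, and the bands are stored one after another.
    static int byteIndex(int x, int y, int width) {
        return (y / 8) * width + x;
    }

    // Bit mask for row y inside its byte: the least significant bit is the
    // top row of the band.
    static int bitMask(int y) {
        return 1 << (y % 8);
    }

    // Turn pixel (x, y) on.
    static void setPixel(byte[] data, int width, int x, int y) {
        data[byteIndex(x, y, width)] |= bitMask(y);
    }

    // Report whether pixel (x, y) is on.
    static boolean getPixel(byte[] data, int width, int x, int y) {
        return (data[byteIndex(x, y, width)] & bitMask(y)) != 0;
    }
}
```

With a width of 100, for example, pixel (1, 3) ends up as bit 3 of byte 1, and pixel (0, 9) as bit 1 of byte 100.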
The image format works with images that are larger (or smaller) than the NXT screen, so long as you correctly specify the width and height. leJOS even comes with a program that will convert images from existing formats into this format. Take a look at the sample code; it is not that hard, and the bitblt function takes care of most of the hard work. As Sven has already pointed out, the Graphics class is what you need. It is not tied to screen-sized items: you can easily use offsets into larger images, or display smaller images on the screen.
If you load your image data into a byte array and then wrap it in an Image object using createImage, the image will simply reference your byte array rather than copy the data, so you can even change the data after you have created the image (to load new data, for example).
If you still do not understand the format, simply create an image, use the graphics functions to draw a few lines into it, and then dump the contents of the image. That way you can see how the pixels map to the bytes.
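If you want to try this on a PC first, something like the following rough sketch does the same experiment in plain desktop Java. It is not leJOS code: NxtImageDump and its methods are hypothetical names, setPixel stands in for the real Graphics drawing calls, and the layout assumes that each band of eight rows is stored as one byte per column, with the bands following one another.

```java
// Stand-in experiment in desktop Java; none of these names come from leJOS.
final class NxtImageDump {
    // Turn on pixel (x, y); bit 0 of each byte is the top row of its band.
    static void setPixel(byte[] data, int width, int x, int y) {
        data[(y / 8) * width + x] |= (byte) (1 << (y % 8));
    }

    // Raw bytes in hex, one band of eight rows per output line.
    static String hexDump(byte[] data, int width) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            sb.append(String.format("%02x", data[i] & 0xff));
            sb.append((i + 1) % width == 0 ? '\n' : ' ');
        }
        return sb.toString();
    }

    // The same data decoded back into pixels, one output line per row.
    static String pixelDump(byte[] data, int width, int height) {
        StringBuilder sb = new StringBuilder();
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                boolean on = (data[(y / 8) * width + x] & (1 << (y % 8))) != 0;
                sb.append(on ? '#' : '.');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int width = 8, height = 16;
        byte[] data = new byte[width * ((height + 7) / 8)];
        // Draw a diagonal line from (0, 0) to (7, 7).
        for (int i = 0; i < 8; i++) {
            setPixel(data, width, i, i);
        }
        System.out.print(hexDump(data, width));
        System.out.print(pixelDump(data, width, height));
    }
}
```

For the diagonal line, the first band dumps as 01 02 04 08 10 20 40 80: each column's byte has exactly one bit set, and the set bit moves up by one for each row the line moves down.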