
Opera Mini OBML file format

OBML (Opera Binary Markup Language) files are self-contained, rendered versions of HTML documents generated by the Presto v2 engine. They are static, containing pixel-positioned regions adapted for a specific device's screen size & font metrics. (Thus OBML documents generated for one device tend to look slightly 'off' everywhere else, and a perfect rendering is impossible without knowing the original device.)

Several OBML versions have been used over time, and each Opera Mini version is compatible with only one OBML format, so an upgrade might leave old saved pages unreadable. (The OBML version in use can be seen by visiting debug:.)

Most saved pages use OBML v12, v13, v15, or v16; I haven't investigated the format used by earlier "modded" Opera Mini versions which had this feature added unofficially.

Data types

OBML uses these primitive types:

  • byte – unsigned integer (1 byte)
  • short – signed integer (2 bytes, big-endian)
  • medium – signed integer (3 bytes, big-endian)
  • blob – { length: short, data: byte[length] }
  • char – a byte containing an ASCII character
  • string – a blob containing UTF-8 encoded text
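Here is a minimal Python sketch of a reader for these primitives. The `Reader` class and its method names are hypothetical helpers for illustration. They follow the sizes and signedness listed above, where only `byte` is unsigned.

```python
class Reader:
    """Cursor over an OBML byte buffer (hypothetical helper, not part of any library)."""

    def __init__(self, data: bytes, pos: int = 0) -> None:
        self.data = data
        self.pos = pos

    def _take(self, n: int) -> bytes:
        # Return the next n bytes and advance, refusing to read past the end
        if n < 0 or self.pos + n > len(self.data):
            raise EOFError(f"cannot read {n} bytes at offset {self.pos}")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def _int(self, n: int) -> int:
        # All multi-byte integers are big-endian and signed
        return int.from_bytes(self._take(n), "big", signed=True)

    def byte(self) -> int:
        return self._take(1)[0]  # unsigned, 0-255

    def short(self) -> int:
        return self._int(2)

    def medium(self) -> int:
        return self._int(3)

    def char(self) -> str:
        return chr(self.byte())

    def blob(self) -> bytes:
        # blob := { length: short, data: byte[length] }
        return self._take(self.short())

    def string(self) -> str:
        return self.blob().decode("utf-8")
```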

It also has a few more complex types:

url

url := string

Each OBML file has a "base URL", essentially a reusable prefix. When another URL starts with a null byte, that byte is to be replaced with the base URL. For example, if the base is http://example.com/dir and a URL value is \x00/index.html, it expands to http://example.com/dir/index.html.
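A sketch of that expansion in Python. `expand_url` is a hypothetical helper; it assumes the null byte can only appear as the first character and does plain string concatenation, matching the example above.

```python
def expand_url(base: str, url: str) -> str:
    """Expand an OBML url value against the page's base URL (page_url_base)."""
    if url.startswith("\x00"):
        # Leading null byte: substitute the base URL for it
        return base + url[1:]
    # Anything else is already a complete URL
    return url
```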

color

color := { a: byte, r: byte, g: byte, b: byte }

Colors are stored as ARGB tuples, with one byte (0–255) per component.
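A small Python sketch that converts the four stored bytes to a CSS color. `color_to_css` is a hypothetical helper. It assumes `a` is an alpha channel where 255 means fully opaque, which these notes don't actually confirm.

```python
def color_to_css(argb: bytes) -> str:
    """Convert a 4-byte OBML color (a, r, g, b) to a CSS rgba() string."""
    if len(argb) != 4:
        raise ValueError("color must be exactly 4 bytes")
    a, r, g, b = argb  # iterating over bytes yields ints 0-255
    # Assumption: a is alpha, with 255 = fully opaque
    return f"rgba({r}, {g}, {b}, {a / 255:.3g})"
```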

coords

coords := { x: short, y: medium }

Coordinates are stored as a short for the X position followed by a medium for the Y position. The origin (0, 0) is in the top-left corner.

In format versions ≤ 13, all coordinates are absolute.

In format versions ≥ 15, "size/dimension" coordinates are absolute, but "position" coordinates are relative to the previous position coordinate and may be negative. These are marked as coords(relative) below.

(Note that only relative coordinates update the "last position" state. Sizes/dimensions are always stored as absolute coordinates and do not affect relative coordinates.)
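The relative-coordinate bookkeeping could look like this in Python. `PositionTracker` is a hypothetical helper that assumes the "last position" starts at (0, 0) and that callers pass it only position coordinates, never sizes. Version 14 isn't covered by these notes, so the sketch simply treats everything above 13 as relative.

```python
class PositionTracker:
    """Turns stored position coordinates into absolute ones (hypothetical helper).

    Assumes the "last position" state starts at (0, 0).
    """

    def __init__(self, version: int) -> None:
        self.version = version
        self.x = 0
        self.y = 0

    def position(self, x: int, y: int) -> tuple[int, int]:
        if self.version <= 13:
            # v<=13: every coordinate is already absolute
            return x, y
        # Newer versions: (x, y) is a possibly negative delta from the previous position
        self.x += x
        self.y += y
        return self.x, self.y

    # Sizes/dimensions are always absolute and must NOT be passed through
    # position(), since they don't update the "last position" state.
```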

Header

The file starts with:

header := {
	(if version >= 15) {
		fake_file_size: medium = 0x02d355
		fake_version: byte     = 16
	}
	file_size: medium
	version: byte
	page_size: coords
	(if version == 16) {
		unknown: byte[3]      // always S\x00\x00
	}
	unknown: short                // always -1
	page_title: string
	unknown: blob
	page_url_base: string
	page_url: url
	(if version >= 15) {
		unknown: byte[6]
	}
	(if 6 < version <= 13) {
		unknown: byte[5]
	}
	(if version == 6) {
		unknown: byte[1]
	}
	metadata: chunk[]
	content: chunk[]
}

In v≥15, the first file_size is always 0x02d355 and the first version is always 16; they're followed by a second file_size/version pair containing the real values. The reason for that is unknown.

Note that file_size only includes the bytes following it. It doesn't include the field's own size, nor the preceding fields.
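A sketch of reading the leading size/version fields in Python, assuming a v≥15 file can be recognised purely by the fixed decoy pair (0x02d355, 16). `read_size_and_version` is a hypothetical name; it returns the real values plus the offset of the byte after the real version field.

```python
FAKE_FILE_SIZE = 0x02D355
FAKE_VERSION = 16


def read_size_and_version(data: bytes) -> tuple[int, int, int]:
    """Return (file_size, version, offset just past the real version byte)."""
    size = int.from_bytes(data[0:3], "big")
    version = data[3]
    offset = 4
    if size == FAKE_FILE_SIZE and version == FAKE_VERSION:
        # v>=15: skip the decoy pair; the real values follow immediately
        size = int.from_bytes(data[4:7], "big")
        version = data[7]
        offset = 8
    return size, version, offset
```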

The unknown blob seems to always start with C\x10\x10... in v15 and to be empty otherwise.

The header is then followed by the "metadata" section and the "content" section, both composed of tagged chunks.

Metadata section

This section consists of several chunks. Each chunk starts with type: char (an ASCII letter), followed by a variable number of fields.

chunk := {
	type: char
	...
}

Metadata: 'C' chunks

Only seen in v≥15:

C_chunk := {
	type: char = 'C'
	unknown: byte[23]
}

Metadata: 'M' chunks

These appear to be extensible metadata fields, each containing a further subtype.

M_chunk := {
	type: char = 'M'
	subtype: char
	unknown: byte[1]  // always 0x00
	data: blob
}
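A minimal Python sketch of splitting one 'M' chunk into its subtype and payload, following the layout above. `read_m_chunk` is hypothetical, and the unknown 0x00 byte is skipped without being checked.

```python
def read_m_chunk(data: bytes, pos: int) -> tuple[str, bytes, int]:
    """Parse a metadata 'M' chunk at data[pos]; return (subtype, payload, next_pos)."""
    if data[pos:pos + 1] != b"M":
        raise ValueError("not an 'M' chunk")
    subtype = chr(data[pos + 1])
    # data[pos + 2] is the unknown byte, observed to be 0x00
    length = int.from_bytes(data[pos + 3:pos + 5], "big", signed=True)
    start = pos + 5
    payload = data[start:start + length]
    return subtype, payload, start + length
```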

'S' sub-type

Secure connection (TLS) information.

M_tls_chunk := {
	type: char = 'M'
	subtype: char = 'S'
	// This might be inaccurate
	unknown: byte[1]
	data: blob containing {
		unknown: byte[6]
		cert_expiry: string
		secure_status: string
		tls_details: string
		cert_common_name: string
	}
}

Metadata: 'S' chunks

Contains information about hyperlinks on the page.

S_chunk := {
	type: char = 'S'
	links_size: medium
	links: byte[links_size] // sub-section containing its own chunks
}

Links sub-section

This section consists of chunks and appears to be a sub-section of the preceding 'S' chunk.

Links: '\x00' chunks

Each of these chunks seems to store the <option> choices for an HTML <select> widget.

links_null_chunk := {
	type: char = 0x00
	unknown: byte
	count: byte
	choices: array[count] of { id: string, label: string }
}
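Decoding the choices is straightforward. Here's a Python sketch; `read_select_choices` is a hypothetical helper, and since the purpose of the byte before `count` is still unknown, it's skipped.

```python
def read_select_choices(data: bytes, pos: int) -> tuple[list[tuple[str, str]], int]:
    """Parse a links 0x00 chunk at data[pos]; return ([(id, label), ...], next_pos)."""
    if data[pos] != 0x00:
        raise ValueError("not a 0x00 chunk")
    count = data[pos + 2]  # data[pos + 1] is the unknown byte
    pos += 3

    def string() -> str:
        # string := blob of UTF-8 text with a big-endian short length
        nonlocal pos
        n = int.from_bytes(data[pos:pos + 2], "big", signed=True)
        text = data[pos + 2:pos + 2 + n].decode("utf-8")
        pos += 2 + n
        return text

    choices = []
    for _ in range(count):
        choice_id = string()
        label = string()
        choices.append((choice_id, label))
    return choices, pos
```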

Links: all region chunks

Most other chunks in this sub-section define 'regions' and share the same data format.

links_region_chunk := {
	type: char
	box_count: byte
	box_coords: array[box_count] of rectangle
	(if version >= 15) {
		link_target: blob
		unknown: byte[2] // always \x01\x74
		link_type: blob
	}
	(if version == 13) {
		link_target: blob
		unknown: byte[2]
		link_type: blob
	}
	(if version <= 12) {
		link_type: blob
		link_target: blob
	}
}
rectangle := {
	(if version >= 15) { pos: coords(relative), size: coords }
	(if version <= 13) { pos: coords, size: coords }
}
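Because the field order differs by version, a parser has to branch on it. This Python sketch follows the layouts above. `read_region_chunk` is hypothetical; v14 isn't covered by these notes and falls into the v13 branch here purely as a guess. The caller supplies the `[x, y]` last-position state, so it can decide where that state is shared.

```python
def _int(data: bytes, pos: int, n: int) -> int:
    # Big-endian signed integer of n bytes (short = 2, medium = 3)
    return int.from_bytes(data[pos:pos + n], "big", signed=True)


def read_region_chunk(data: bytes, pos: int, version: int, last: list[int]) -> tuple[dict, int]:
    """Parse one region chunk of the links sub-section starting at data[pos].

    `last` is the mutable [x, y] "last position" state for v15+ relative coords.
    Returns (parsed fields, offset of the next chunk).
    """
    kind = chr(data[pos])
    box_count = data[pos + 1]
    pos += 2
    boxes = []
    for _ in range(box_count):
        # rectangle := pos (short x, medium y) followed by size (short w, medium h)
        x, y = _int(data, pos, 2), _int(data, pos + 2, 3)
        w, h = _int(data, pos + 5, 2), _int(data, pos + 7, 3)
        pos += 10
        if version >= 15:
            # Relative position: add to the previous one and remember the result
            last[0] += x
            last[1] += y
            x, y = last
        boxes.append((x, y, w, h))

    def blob() -> bytes:
        nonlocal pos
        n = _int(data, pos, 2)
        value = data[pos + 2:pos + 2 + n]
        pos += 2 + n
        return value

    if version >= 13:
        target = blob()
        pos += 2  # unknown byte[2], always \x01\x74 in v15+
        link_type = blob()
    else:
        link_type = blob()
        target = blob()
    return {"type": kind, "boxes": boxes, "target": target, "link_type": link_type}, pos
```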

link_type, if non-empty, seems to be a string with the MIME type.

link_target can be a url, a string, or an unknown blob.

Links: 'I' chunks

Unknown-purpose region. (The first field sometimes contains a URL and is sometimes empty. The second field seems to always contain a single medium.)

Links: 'N' chunks

Link region pointing to an internal anchor within the same page (e.g. <a href="#top">).

At least in v12, link_type is empty while link_target is encoded as:

link_target: blob containing {
	target_scroll_coords: coords
	target_anchor_name: string
}
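A Python sketch of decoding that v12 target blob. `decode_anchor_target` is a hypothetical helper and assumes the blob holds exactly the two fields shown.

```python
def decode_anchor_target(blob: bytes) -> tuple[tuple[int, int], str]:
    """Decode a v12 'N' link_target blob into ((scroll_x, scroll_y), anchor_name)."""
    x = int.from_bytes(blob[0:2], "big", signed=True)   # short
    y = int.from_bytes(blob[2:5], "big", signed=True)   # medium
    n = int.from_bytes(blob[5:7], "big", signed=True)   # string length
    name = blob[7:7 + n].decode("utf-8")
    return (x, y), name
```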

Links: 'C' chunks

Unknown-purpose region usually near the top-left corner. (It seems the target contains array[3] of medium.)

Links: 'S' chunks

Unknown-purpose region. (The target field contains things like -1:2iS/5/sa.)

Links: 'i' chunks

Image region (link_target: url links to the original image). Note that this doesn't actually render an image; it only defines a link region for the original URL. The image itself is drawn by the content section.

Links: 'L' chunks

Link region (link_target: url is the link target). Note that this isn't directly associated with link text in any way; it merely defines the 'active' rectangle overlaid on top of the text.

URLs starting with b: seem to be JavaScript links.

Links: 'P' chunks

Link region similar to 'L' but containing a "platform" link (usually mailto:).

Links: 'w' chunks

Link region similar to 'L' but meant to trigger a file download dialog (for image "Save" buttons). The target URL is hosted by the Opera Mini proxy, and expires after some time.

Links: 'W' chunks

Link region similar to 'w' but meant to open the target in the platform's native web browser (for image "Open" buttons).

Content section

Content: 'B' chunks

Defines a filled rectangle (a "box"), used to draw background colors, borders and other lines (even link underlines).

B_chunk := {
	type: char = 'B'
	(if version >= 15) {
		pos: coords(relative)
		size: coords
	}
	(if version <= 13) {
		pos: coords
		size: coords
	}
	fill: color
}
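Once a 'B' chunk's position has been resolved to absolute coordinates, rendering it in a browser comes down to an absolutely positioned element. A hypothetical Python helper, assuming the fill has already been converted to a CSS color string:

```python
def box_to_html(x: int, y: int, w: int, h: int, css_color: str) -> str:
    """Render a decoded 'B' chunk as an absolutely positioned HTML div."""
    style = (
        f"position:absolute;left:{x}px;top:{y}px;"
        f"width:{w}px;height:{h}px;background:{css_color}"
    )
    return f'<div style="{style}"></div>'
```

A one-pixel-high box like `box_to_html(0, 0, 10, 1, "red")` is how a link underline would come out.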

Content: 'F' chunks

Form fields.

F_chunk := {
	type: char = 'F'
	pos: coords(relative)
	size: coords
	foreground: color
	field_type: char
	unknown: byte
	field_id: string
	field_value: string
	(if version >= 15) {
		unknown: byte[5]
	}
	(if version <= 13) {
		unknown: byte[3]
	}
}

Field types:

  • a is a multi-line input box (textarea)
  • c is a checkbox
  • r is a radio button
  • x is a single-line input box
  • s is a select drop-down
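One way to approximate these fields in HTML, as a Python sketch. The `FIELD_TAGS` mapping and `field_to_html` are hypothetical. In particular, how a checkbox or radio button stores its checked state isn't known, so `field_value` is simply copied into the `value` attribute.

```python
import html

# Hypothetical mapping from OBML field_type chars to rough HTML equivalents
FIELD_TAGS = {
    "a": '<textarea name="{id}">{value}</textarea>',
    "c": '<input type="checkbox" name="{id}" value="{value}">',
    "r": '<input type="radio" name="{id}" value="{value}">',
    "x": '<input type="text" name="{id}" value="{value}">',
    "s": '<select name="{id}"><option>{value}</option></select>',
}


def field_to_html(field_type: str, field_id: str, field_value: str) -> str:
    """Build an HTML approximation of an 'F' chunk, escaping id and value."""
    template = FIELD_TAGS.get(field_type)
    if template is None:
        raise ValueError(f"unknown field type {field_type!r}")
    return template.format(id=html.escape(field_id), value=html.escape(field_value))
```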

Content: 'I' chunks

Image.

I_chunk := {
	type: char = 'I'
	pos: coords(relative)
	size: coords
	fill: color
	(if version == 16) {
		file_addr: medium
		something_count: byte
		something: array [something_count] of { unknown: byte, unknown: blob }
	}
	(if version == 15) {
		unknown: byte[14]
	}
	(if version <= 13) {
		unknown: byte[3]
		file_addr: medium
	}
}

fill is the image's average color, for use as placeholder when images are disabled/loading.

file_addr is the byte offset within the 'S'-chunk, relative to the end of data_size.

In v6, file_addr is relative to the end of the version field in the initial header (i.e. the offset is always 3).

Content: 'L' chunks

L_chunk := {
	type: char = 'L'
	unknown1: medium
	unknown2: medium
	unknown3: medium
}

Unknown, but the 2nd field seems to refer to the start of the "Links" sub-section.

Content: 'M' chunks

M_chunk := {
	type: char = 'M'
	unknown: byte[2]
	unknown: blob
}

Content: 'S' chunks

Embedded images.

S_chunk := {
	type: char = 'S'
	data_size: medium
	file_data: blob[]
}

The blob count isn't given, so keep reading blobs until you've consumed at least data_size bytes.

Each blob contains an image (PNG or JPEG) to be drawn in all 'I'-chunks whose file_addr matches the blob's offset relative to the end of data_size.
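A Python sketch of that loop, building a lookup from offset to image bytes that 'I' chunks can index by `file_addr`. `read_image_store` is a hypothetical name. It assumes an offset points at the start of a blob (its length prefix), and it reads blob lengths as unsigned so that images larger than 32767 bytes can't send the loop backwards; that's an assumption, since the data types list calls short signed.

```python
def read_image_store(data: bytes, pos: int) -> tuple[dict[int, bytes], int]:
    """Parse a content 'S' chunk at data[pos]; return ({offset: image_bytes}, next_pos)."""
    if data[pos:pos + 1] != b"S":
        raise ValueError("not an 'S' chunk")
    data_size = int.from_bytes(data[pos + 1:pos + 4], "big")
    base = pos + 4  # offsets (file_addr) are relative to the end of data_size
    images: dict[int, bytes] = {}
    offset = 0
    # No blob count is stored: keep reading until data_size bytes are consumed
    while offset < data_size:
        n = int.from_bytes(data[base + offset:base + offset + 2], "big")
        images[offset] = data[base + offset + 2:base + offset + 2 + n]
        offset += 2 + n
    return images, base + offset
```

An 'I' chunk's image would then be `images.get(file_addr)` (except in v6, which uses a different base, as noted above).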

Content: 'T' chunks

Text.

T_chunk := {
	type: char = 'T'
	(if version >= 15) {
		pos: coords(relative)
		size: coords
	}
	(if version <= 13) {
		pos: coords
		size: coords
	}
	foreground: color
	(if version == 16) {
		unknown: byte
		font: byte
		something_count: byte
		something: array [something_count] of { unknown: byte, unknown: blob }
	}
	(if version <= 15) {
		font: byte
	}
	text: string
}

In v16, it seems that the unknown pairs define some sort of links.

In font, the least-significant bit indicates bold text. With the 'bold' bit masked out, the remaining value indicates the font size:

  • 0 – medium (approx. 11px)
  • 2 – large (approx. 12px)
  • 4 – extra large (approx. 13px)
  • 6 – small (approx. 10px)
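Decoding the font byte in Python. `decode_font` and the size table are hypothetical helpers that restate the list above; values outside it raise `KeyError`.

```python
# font value with the bold bit cleared -> (size name, approximate px)
FONT_SIZES = {0: ("medium", 11), 2: ("large", 12), 4: ("extra large", 13), 6: ("small", 10)}


def decode_font(font: int) -> tuple[bool, str, int]:
    """Split a 'T' chunk font byte into (bold, size name, approximate px)."""
    bold = bool(font & 1)                # least-significant bit = bold
    name, px = FONT_SIZES[font & ~1]     # remaining bits select the size
    return bold, name, px
```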

The following CSS results in an acceptable rendering:

font-family: sans-serif;
line-height: 1.1;
white-space: pre;

Content: 'z' chunks

Unknown and rare. The only occurrence seen contains byte[6].

Miscellaneous notes

Forms

Form buttons have no special representation; they just consist of an image + text + link region, using special b:… URLs.

Input fields and select dropdowns haven't been fully researched yet.