OBML (Opera Binary Markup Language) files are self-contained, rendered versions of HTML documents generated by the Presto v2 engine. They are static, containing pixel-positioned regions adapted for a specific device's screen size & font metrics. (Thus OBML documents generated for one device tend to look slightly 'off' everywhere else, and a perfect rendering is impossible without knowing the original device.)
Over time, various OBML versions were used, and each Opera Mini version is only compatible with one OBML format, so an upgrade might leave old saved pages unreadable. (The OBML version in use can be seen by visiting the debug: page.)
Most saved pages use OBML v12, v13, v15, or v16; I haven't investigated the format used by earlier "modded" Opera Mini versions which had this feature added unofficially.
OBML uses these primitive types:
- byte – unsigned integer (1 byte)
- short – signed integer (2 bytes, big-endian)
- medium – signed integer (3 bytes, big-endian)
- blob – { length: short, data: byte[length] }
- char – a byte containing an ASCII character
- string – a blob containing UTF-8 encoded text
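The primitive types above map directly onto a small cursor over the file's bytes. The following is a minimal sketch (the class and method names are hypothetical, not from any existing tool) that reads each type exactly as listed: big-endian, with short and medium signed, and blobs as a short length prefix followed by the payload.

```python
import io


class OBMLReader:
    """Sketch of a reader for OBML primitive types (names are hypothetical)."""

    def __init__(self, data: bytes):
        self.buf = io.BytesIO(data)

    def _take(self, n: int) -> bytes:
        if n < 0:
            raise ValueError(f"negative length {n}")
        chunk = self.buf.read(n)
        if len(chunk) != n:
            raise EOFError(f"wanted {n} bytes, got {len(chunk)}")
        return chunk

    def byte(self) -> int:
        return self._take(1)[0]  # unsigned, 0-255

    def short(self) -> int:
        return int.from_bytes(self._take(2), "big", signed=True)

    def medium(self) -> int:
        return int.from_bytes(self._take(3), "big", signed=True)

    def char(self) -> str:
        return chr(self.byte())  # single ASCII character

    def blob(self) -> bytes:
        return self._take(self.short())  # short length prefix, then payload

    def string(self) -> str:
        return self.blob().decode("utf-8")

    def tell(self) -> int:
        return self.buf.tell()
```

For example, the bytes `\xff\xff\xfe` read as a medium give -2, because the value is sign-extended from 24 bits.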
It also has a few more complex types:
url := string
Each OBML file has a "base URL", essentially a reusable prefix. When another URL value starts with a null byte, that byte is replaced with the base URL. For example, if the base is http://example.com/dir and you have a URL value \x00/index.html, it expands to http://example.com/dir/index.html.
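The expansion rule is a plain prefix substitution. A minimal sketch (the function name is hypothetical), operating on the already-decoded string value:

```python
def expand_url(value: str, base: str) -> str:
    """Replace a leading NUL byte with the file's base URL; other URLs pass through."""
    if value.startswith("\x00"):
        return base + value[1:]
    return value
```

Note that the base is concatenated as-is; no slash is inserted, so the separator must already be part of the stored value (as in /index.html above).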
color := { a: byte, r: byte, g: byte, b: byte }
Colors are stored as ARGB tuples, with one byte (0–255) per component.
coords := { x: short, y: medium }
Coordinates are stored as a short for the X position followed by a medium for the Y position. The origin (0, 0) is in the top-left corner.
In format versions ≤ 13, all coordinates are absolute.
In format versions ≥ 15, "size/dimension" coordinates are absolute, but "position" coordinates are relative to the previous position coordinate and may be negative. These are indicated as coords(relative).
(Note that only relative coordinates update the "last position" state. Sizes/dimensions are always stored as absolute coordinates and do not affect relative coordinates.)
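The relative scheme can be resolved with a single running position that only position fields touch. The sketch below assumes the running position starts at (0, 0) at the beginning of the content; that starting value is an assumption, not something established above. The class name is hypothetical.

```python
class PositionTracker:
    """Resolve OBML position coordinates to absolute values (sketch)."""

    def __init__(self, version: int):
        self.relative = version >= 15
        # Assumption: the "last position" state starts at the origin.
        self.last_x = 0
        self.last_y = 0

    def position(self, x: int, y: int) -> tuple[int, int]:
        if not self.relative:
            return x, y  # v13 and earlier: already absolute
        self.last_x += x
        self.last_y += y
        return self.last_x, self.last_y

    @staticmethod
    def size(w: int, h: int) -> tuple[int, int]:
        # Sizes are always absolute and never update the "last position" state.
        return w, h
```

Keeping sizes out of the running state matters: folding a box's width or height into the accumulator would shift every later element.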
The file starts with:
header := {
(if version >= 15) {
fake_file_size: medium = 0x02d355
fake_version: byte = 16
}
file_size: medium
version: byte
page_size: coords
(if version == 16) {
unknown: byte[3] // always S\x00\x00
}
unknown: short // always -1
page_title: string
unknown: blob
page_url_base: string
page_url: url
(if version >= 15) {
unknown: byte[6]
}
(if 6 < version <= 13) {
unknown: byte[5]
}
(if version == 6) {
unknown: byte[1]
}
metadata: chunk[]
content: chunk[]
}
In v≥15, the initial fake_file_size is always 0x02d355 and fake_version is always 16; they're followed by the same two fields holding the real values. The reason for this is unknown.
Note that file_size only includes the bytes following it. It doesn't include the field's own size, nor the preceding fields.
The unknown blob seems to always start with C\x10\x10... on v15, and to be empty otherwise.
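The start of the header, including the v15+ decoy fields and the file_size convention, can be read as in the sketch below. It stops after page_size, since the later fields vary by version. The function name and the returned dictionary are hypothetical; the size_matches check assumes the file is neither truncated nor padded.

```python
import io

FAKE_FILE_SIZE = 0x02D355


def _medium(buf: io.BytesIO) -> int:
    return int.from_bytes(buf.read(3), "big", signed=True)


def read_header_start(data: bytes) -> dict:
    buf = io.BytesIO(data)
    file_size = _medium(buf)
    size_end = buf.tell()  # file_size counts only the bytes after this point
    version = buf.read(1)[0]
    if file_size == FAKE_FILE_SIZE and version == 16:
        # v15+ decoy fields: the real file_size/version follow immediately.
        file_size = _medium(buf)
        size_end = buf.tell()
        version = buf.read(1)[0]
    width = int.from_bytes(buf.read(2), "big", signed=True)  # page_size.x
    height = _medium(buf)                                    # page_size.y
    return {
        "version": version,
        "file_size": file_size,
        "size_matches": size_end + file_size == len(data),
        "page_size": (width, height),
    }
```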
The header is then followed by the "metadata" section and the "content" section, both composed of tagged chunks.
The metadata section consists of several chunks. Each chunk starts with type: char (an ASCII letter), followed by a variable number of fields.
chunk := {
type: char
...
}
Only seen in v≥15:
C_chunk := {
type: char = 'C'
unknown: byte[23]
}
Appears to be extensible metadata fields which contain further subtypes.
M_chunk := {
type: char = 'M'
subtype: char
unknown: byte[1] // always 0x00
data: blob
}
Secure connection (TLS) information.
M_tls_chunk := {
type: char = 'M'
subtype: char = 'S'
// This might be inaccurate
unknown: byte[1]
data: blob containing {
unknown: byte[6]
cert_expiry: string
secure_status: string
tls_details: string
cert_common_name: string
}
}
Contains information about hyperlinks on the page.
S_chunk := {
type: char = 'S'
links_size: medium
links: byte[links_size] // sub-section containing its own chunks
}
The links sub-section is itself made of chunks and appears to be the contents of the preceding 'S' chunk.
Each of these chunks seems to store the <option> choices for an HTML <select> widget.
links_null_chunk := {
type: char = 0x00
unknown: byte
count: byte
choices: array[count] of { id: string, label: string }
}
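Reading a links_null_chunk is a straightforward loop over (id, label) string pairs. A minimal sketch, assuming the 0x00 type char has already been consumed by the caller; the function and helper names are hypothetical.

```python
import io


def _short(buf: io.BytesIO) -> int:
    return int.from_bytes(buf.read(2), "big", signed=True)


def _string(buf: io.BytesIO) -> str:
    return buf.read(_short(buf)).decode("utf-8")


def read_select_choices(buf: io.BytesIO) -> list[tuple[str, str]]:
    """Parse a links_null_chunk body into (id, label) pairs."""
    buf.read(1)             # unknown byte
    count = buf.read(1)[0]
    # Tuple elements are evaluated left to right, so id is read before label.
    return [(_string(buf), _string(buf)) for _ in range(count)]
```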
Most other chunks in this sub-section define 'regions' and share the same data format.
links_region_chunk := {
type: char
box_count: byte
box_coords: array[box_count] of rectangle
(if version >= 15) {
link_target: blob
unknown: byte[2] // always \x01\x74
link_type: blob
}
(if version == 13) {
link_target: blob
unknown: byte[2]
link_type: blob
}
(if version <= 12) {
link_type: blob
link_target: blob
}
}
rectangle := {
(if version >= 15) { pos: coords(relative), size: coords }
(if version <= 13) { pos: coords, size: coords }
}
link_type, if non-empty, seems to be a string with the MIME type.
link_target can be a url, a string, or an unknown blob.
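Putting the region layout and the rectangle type together, a region chunk can be parsed as sketched below. Assumptions: the type char has already been consumed, the caller supplies the running [x, y] position used for v15+ relative coordinates, and v14 (which isn't documented above) is treated like v13. All names are hypothetical.

```python
import io


def _short(buf):
    return int.from_bytes(buf.read(2), "big", signed=True)


def _medium(buf):
    return int.from_bytes(buf.read(3), "big", signed=True)


def _blob(buf):
    return buf.read(_short(buf))


def read_region(buf: io.BytesIO, version: int, last_pos: list[int]):
    """Parse a links_region_chunk body; last_pos is updated in place for v15+."""
    boxes = []
    for _ in range(buf.read(1)[0]):          # box_count
        x, y = _short(buf), _medium(buf)     # pos
        w, h = _short(buf), _medium(buf)     # size (always absolute)
        if version >= 15:
            last_pos[0] += x
            last_pos[1] += y
            x, y = last_pos
        boxes.append((x, y, w, h))
    if version >= 13:                        # v13 and v15+ share this field order
        link_target = _blob(buf)
        buf.read(2)                          # unknown (\x01\x74 in v15+)
        link_type = _blob(buf)
    else:                                    # v12 and earlier
        link_type = _blob(buf)
        link_target = _blob(buf)
    return boxes, link_target, link_type
```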
Unknown-purpose region. (Sometimes the first field contains a URL but sometimes it's empty. The second field seems to always contain a single medium.)
Link region pointing to an internal anchor within the same page (e.g. <a href="#top">).
At least in v12, link_type is empty while link_target is encoded as:
link_target: blob containing {
target_scroll_coords: coords
target_anchor_name: string
}
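Decoding that inner blob needs only the coords and string primitives. A minimal sketch, assuming the v12 layout shown above with absolute scroll coordinates; the function name is hypothetical.

```python
import io


def parse_anchor_target(blob: bytes) -> tuple[tuple[int, int], str]:
    """Decode an internal-anchor link_target blob into ((x, y), anchor_name)."""
    buf = io.BytesIO(blob)
    x = int.from_bytes(buf.read(2), "big", signed=True)   # coords.x (short)
    y = int.from_bytes(buf.read(3), "big", signed=True)   # coords.y (medium)
    name_len = int.from_bytes(buf.read(2), "big", signed=True)
    name = buf.read(name_len).decode("utf-8")
    return (x, y), name
```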
Unknown-purpose region, usually near the top-left corner. (It seems the target contains array[3] of medium.)
Unknown-purpose region. (The target field contains things like -1:2iS/5/sa.)
Image region (link_target is a url pointing to the original image). Note that this doesn't actually render an image; it only defines a link region for the original URL. The image itself is drawn by the content section.
Link region (link_target is a url holding the link target). Note that this isn't directly associated with the link text in any way; it merely defines the 'active' rectangle overlaid on top of the text.
URLs starting with b: seem to be JavaScript links.
Link region similar to 'L' but containing a "platform" link (usually mailto:).
Link region similar to 'L' but meant to trigger a file download dialog (for image "Save" buttons). The target URL is hosted by the Opera Mini proxy, and expires after some time.
Link region similar to 'w' but meant to open the target in the platform's native web browser (for image "Open" buttons).
Defines a filled rectangle (a "box"); used to draw background colors, borders, and other lines (even link underlines).
B_chunk := {
type: char = 'B'
(if version >= 15) {
pos: coords(relative)
size: coords
}
(if version <= 13) {
pos: coords
size: coords
}
fill: color
}
Form fields.
F_chunk := {
type: char = 'F'
pos: coords(relative)
size: coords
foreground: color
field_type: char
unknown: byte
field_id: string
field_value: string
(if version >= 15) {
unknown: byte[5]
}
(if version <= 13) {
unknown: byte[3]
}
}
Field types:
- a – multi-line input box (textarea)
- c – checkbox
- r – radio button
- x – single-line input box
- s – select drop-down
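When converting an 'F' chunk back to HTML, the field_type char selects the element. The sketch below is one possible mapping; how field_value affects checkboxes and radio buttons (e.g. checked state) isn't known, so it's simply copied into value, and the select is emitted empty because its choices appear to live in the links sub-section's 0x00 chunks. The function name is hypothetical.

```python
from html import escape


def field_to_html(field_type: str, field_id: str, field_value: str) -> str:
    """Map an 'F' chunk's field_type/id/value to an HTML form element (sketch)."""
    name = escape(field_id, quote=True)
    value = escape(field_value, quote=True)
    if field_type == "a":
        return f'<textarea name="{name}">{value}</textarea>'
    if field_type == "s":
        return f'<select name="{name}"></select>'  # choices stored elsewhere
    input_type = {"c": "checkbox", "r": "radio", "x": "text"}.get(field_type)
    if input_type is None:
        raise ValueError(f"unknown field type {field_type!r}")
    return f'<input type="{input_type}" name="{name}" value="{value}">'
```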
Image.
I_chunk := {
type: char = 'I'
pos: coords(relative)
size: coords
fill: color
(if version == 16) {
file_addr: medium
something_count: byte
something: array[something_count] of { unknown: byte, unknown: blob }
}
(if version == 15) {
unknown: byte[14]
}
(if version <= 13) {
unknown: byte[3]
file_addr: medium
}
}
fill is the image's average color, for use as a placeholder when images are disabled or still loading.
file_addr is the byte offset of the image's blob within the 'S' chunk (embedded images), measured from the end of that chunk's data_size field.
In v6, file_addr is relative to the end of the version field in the initial header (i.e. offset is always 3).
L_chunk := {
type: char = 'L'
unknown1: medium
unknown2: medium
unknown3: medium
}
Unknown, but the second field seems to point to the start of the links sub-section.
M_chunk := {
type: char = 'M'
unknown: byte[2]
unknown: blob
}
Embedded images.
S_chunk := {
type: char = 'S'
data_size: medium
file_data: blob[]
}
The blob count isn't given, so keep reading blobs until you've consumed at least data_size bytes.
Each blob contains an image (PNG or JPEG) to be drawn in all 'I'-chunks whose file_addr matches the blob's offset relative to the end of data_size.
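The 'S' chunk can be indexed by blob offset so that each 'I' chunk's file_addr becomes a dictionary lookup. A minimal sketch, assuming the type char has already been consumed; format sniffing uses the standard PNG signature and the JPEG SOI marker (\xff\xd8). Function names are hypothetical.

```python
import io

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def read_embedded_images(buf: io.BytesIO) -> dict[int, bytes]:
    """Return {offset: image_bytes}; offsets are measured from the end of data_size."""
    data_size = int.from_bytes(buf.read(3), "big", signed=True)
    start = buf.tell()
    images = {}
    while buf.tell() - start < data_size:      # stop once data_size bytes are consumed
        offset = buf.tell() - start            # the value an 'I' chunk stores in file_addr
        prefix = buf.read(2)
        if len(prefix) != 2:
            raise EOFError("truncated 'S' chunk")
        length = int.from_bytes(prefix, "big", signed=True)
        images[offset] = buf.read(length)
    return images


def image_kind(data: bytes) -> str:
    if data.startswith(PNG_MAGIC):
        return "png"
    if data.startswith(b"\xff\xd8"):
        return "jpeg"
    return "unknown"
```

An 'I' chunk with file_addr 10 would then be drawn with images[10], if present.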
Text.
T_chunk := {
type: char = 'T'
(if version >= 15) {
pos: coords(relative)
size: coords
}
(if version <= 13) {
pos: coords
size: coords
}
foreground: color
(if version == 16) {
unknown: byte
font: byte
something_count: byte
something: array[something_count] of { unknown: byte, unknown: blob }
}
(if version <= 15) {
font: byte
}
text: string
}
In v16, it seems that the unknown pairs define some sort of links.
In font, the least-significant bit indicates bold text. With the 'bold' bit masked out, the remaining value indicates the font size:
- 0 – medium (approx. 11px)
- 2 – large (approx. 12px)
- 4 – extra large (approx. 13px)
- 6 – small (approx. 10px)
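Decoding the font byte means masking off the low bit and looking up the remaining value. A minimal sketch using the approximate pixel sizes listed above (names are hypothetical); values outside the table raise KeyError.

```python
FONT_SIZES = {0: ("medium", 11), 2: ("large", 12), 4: ("extra large", 13), 6: ("small", 10)}


def decode_font(font: int) -> tuple[str, int, bool]:
    """Return (size_name, approx_px, bold) for a 'T' chunk font byte."""
    bold = bool(font & 1)                 # least-significant bit
    name, px = FONT_SIZES[font & ~1]      # remaining bits select the size
    return name, px, bold
```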
The following CSS results in an acceptable rendering:
font-family: sans-serif;
line-height: 1.1;
white-space: pre;
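Combining the font table, the text color and that CSS gives a simple way to emit a 'T' chunk as an absolutely positioned span. This is a sketch under two assumptions: the span sits inside a relatively positioned container of page_size, and alpha 255 means fully opaque (the alpha scale isn't confirmed above). The function name is hypothetical.

```python
from html import escape

FONT_PX = {0: 11, 2: 12, 4: 13, 6: 10}


def text_to_html(x: int, y: int, argb: bytes, font: int, text: str) -> str:
    """Render a 'T' chunk (absolute x/y, ARGB color, font byte, text) as HTML."""
    a, r, g, b = argb                     # color is stored as ARGB
    weight = "bold" if font & 1 else "normal"
    px = FONT_PX[font & ~1]
    return (
        f'<span style="position:absolute;left:{x}px;top:{y}px;'
        f'font:{weight} {px}px/1.1 sans-serif;white-space:pre;'
        f'color:rgba({r},{g},{b},{a / 255:.3f})">{escape(text)}</span>'
    )
```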
Unknown and rare. The only occurrence seen contains byte[6].
Form buttons have no special representation; they consist of an image, text, and a link region using special b:… URLs.
Input fields and select dropdowns haven't been fully researched yet.