Thanks for the tool. Question #25

otheus · 2021-07-08T13:20:49Z

I really appreciate what you have done here. I used stream_parser on an IBD table-only file, and it returned 3 index page-files. I ran c_parser on each one. Only the first had valid data. The second and third were completely garbled (the field data was garbled). I used the -6 option, and both the -U and -D options.

Can you explain why sometimes the output is garbled?

Could you update the README or wiki documentation to explain the difference between the c_parser -U and -D options? The latter results in more corrupted output.

In my runs, I see a lot of errors like:

-- #####CannotOpen_./0000000368541696.page;
-- print_field_value_with_external(): open(): No such file or directory

Here is the table structure. Perhaps this provides insights?

CREATE TABLE `xxxx` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `book_datetime` datetime DEFAULT NULL,
  `start_datetime` datetime DEFAULT NULL,
  `end_datetime` datetime DEFAULT NULL,
  `notes` mediumtext COLLATE utf8mb4_german2_ci,
  `hash` mediumtext COLLATE utf8mb4_german2_ci,
  `is_unavailable` tinyint(4) DEFAULT '0',
  `id_users_provider` int(11) DEFAULT NULL,
  `id_users_customer` int(11) DEFAULT NULL,
  `id_services` int(11) DEFAULT NULL,
  `id_google_calendar` mediumtext COLLATE utf8mb4_german2_ci,
  `appointment_type` varchar(40) COLLATE utf8mb4_german2_ci DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `id_users_customer` (`id_users_customer`),
  KEY `id_services` (`id_services`),
  KEY `id_users_provider` (`id_users_provider`)
) ENGINE=InnoDB AUTO_INCREMENT=8869 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_german2_ci;

(constraints removed)


akuzminsky · 2021-07-08T21:31:43Z

> Could you update the README or wiki documentation to explain the difference between the c_parser -U and -D options? The latter results in more corrupted output.

An InnoDB record has an is_deleted flag in its header. The tool can be instructed to look for records where the flag is set (-D option), to look for records where the flag is unset (-U option), or to ignore the flag, which is the default behavior.
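
To make the three modes concrete, here is a minimal Python sketch of the flag test. It is not c_parser's code. It assumes the COMPACT row format, where the info bits sit in the byte 5 bytes before the record origin and the delete mark is bit 0x20. The function names and the `mode` strings are hypothetical.

```python
# Assumption: COMPACT row format. The info bits occupy the high nibble of
# the byte located 5 bytes before the record origin.
REC_INFO_DELETED_FLAG = 0x20  # delete-mark bit within the info bits


def is_deleted(page: bytes, origin: int) -> bool:
    """Return True if the record at `origin` carries the delete mark."""
    return bool(page[origin - 5] & REC_INFO_DELETED_FLAG)


def wanted(page: bytes, origin: int, mode: str = "any") -> bool:
    """Hypothetical filter: 'deleted' mirrors -D, 'undeleted' mirrors -U,
    and 'any' mirrors the default of ignoring the flag."""
    if mode == "any":
        return True
    deleted = is_deleted(page, origin)
    return deleted if mode == "deleted" else not deleted
```

With a record whose header byte has 0x20 set, `wanted(page, origin, "deleted")` is true and `wanted(page, origin, "undeleted")` is false, so -U and -D select disjoint sets of records.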

If the -D option is not specified, c_parser tries to validate the page. Each record has a pointer to the next one. c_parser follows these pointers, and if it can travel from the infimum record to the supremum record, the page is considered non-corrupt.
If the -D option is specified, c_parser assumes the page is corrupt and scans it byte by byte.
When c_parser scans the page byte by byte, many false matches are possible (that's why you see many garbage records with -D). To prevent the false matches, use field value constraints (filters).
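
The infimum-to-supremum walk can be sketched roughly as follows. This is a simplified Python illustration, not the tool's actual implementation. It assumes a 16 KiB page in COMPACT format, where the infimum and supremum origins are at offsets 99 and 112 and each record stores a 2-byte relative pointer to the next record just before its origin. The function names are hypothetical.

```python
import struct

PAGE_SIZE = 16384  # assumption: default 16 KiB InnoDB page
INFIMUM = 99       # origin of the infimum record in a COMPACT page
SUPREMUM = 112     # origin of the supremum record in a COMPACT page


def next_origin(page: bytes, origin: int) -> int:
    """Read the signed, origin-relative next-record pointer."""
    (rel,) = struct.unpack_from(">h", page, origin - 2)
    return (origin + rel) % PAGE_SIZE


def chain_is_intact(page: bytes) -> bool:
    """Follow next pointers from infimum; succeed only on reaching supremum."""
    seen = set()
    origin = INFIMUM
    while origin != SUPREMUM:
        # A loop or a pointer into the page header means the chain is broken.
        if origin in seen or not (INFIMUM <= origin < PAGE_SIZE):
            return False
        seen.add(origin)
        origin = next_origin(page, origin)
    return True
```

A page that fails this kind of check has no trustworthy record chain, so the only remaining option is the byte-by-byte scan, where any offset whose bytes happen to look like a record becomes a candidate. That is where the garbage rows come from.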

akuzminsky added the question label Jul 8, 2021
