Thanks for the tool. Question #25

Open
otheus opened this issue Jul 8, 2021 · 1 comment

otheus commented Jul 8, 2021

I really appreciate what you have done here. I ran stream_parser on a single-table IBD file, and it produced three index page files. I then ran c_parser on each one. Only the first yielded valid data; in the second and third, the field data was completely garbled. I used the -6 option and tried both the -U and -D options.

Can you explain why sometimes the output is garbled?

Could you update the README or wiki documentation to explain the difference between the c_parser -U and -D options? The latter results in more corrupted output.

In my runs, I see a lot of errors like:

-- #####CannotOpen_./0000000368541696.page;
-- print_field_value_with_external(): open(): No such file or directory

Here is the table structure. Perhaps this provides insights?

CREATE TABLE `xxxx` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `book_datetime` datetime DEFAULT NULL,
  `start_datetime` datetime DEFAULT NULL,
  `end_datetime` datetime DEFAULT NULL,
  `notes` mediumtext COLLATE utf8mb4_german2_ci,
  `hash` mediumtext COLLATE utf8mb4_german2_ci,
  `is_unavailable` tinyint(4) DEFAULT '0',
  `id_users_provider` int(11) DEFAULT NULL,
  `id_users_customer` int(11) DEFAULT NULL,
  `id_services` int(11) DEFAULT NULL,
  `id_google_calendar` mediumtext COLLATE utf8mb4_german2_ci,
  `appointment_type` varchar(40) COLLATE utf8mb4_german2_ci DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `id_users_customer` (`id_users_customer`),
  KEY `id_services` (`id_services`),
  KEY `id_users_provider` (`id_users_provider`)
) ENGINE=InnoDB AUTO_INCREMENT=8869 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_german2_ci;

(constraints removed)

@akuzminsky
Member

> Could you update the README or wiki documentation to explain the difference between the c_parser -U and -D options? The latter results in more corrupted output.

An InnoDB record has an is_deleted flag in its header. The tool can be instructed to look only for records where the flag is set (-D option), only for records where the flag is unset (-U option), or to ignore the flag, which is the default behavior.
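
To make the three modes concrete, here is a minimal Python sketch of the flag test. It is not c_parser's code. It assumes the COMPACT row format, where the info bits sit in the high nibble of the byte 5 bytes before the record origin and the delete mark is the 0x20 bit (REDUNDANT uses a different offset). The `mode` parameter and the function names are made up for illustration.

```python
REC_INFO_DELETED_FLAG = 0x20   # delete-mark bit inside the info-bits nibble
COMPACT_INFO_BITS_OFFSET = 5   # assumption: COMPACT format, 5 bytes before the origin


def is_deleted(page: bytes, origin: int) -> bool:
    """Return True if the record whose data starts at `origin` is delete-marked."""
    return bool(page[origin - COMPACT_INFO_BITS_OFFSET] & REC_INFO_DELETED_FLAG)


def wanted(page: bytes, origin: int, mode: str = "any") -> bool:
    """Decide whether a record passes the flag filter.

    `mode` is a hypothetical stand-in for the command-line choice:
    "deleted" ~ -D, "undeleted" ~ -U, "any" ~ the default (flag ignored).
    """
    if mode == "deleted":
        return is_deleted(page, origin)
    if mode == "undeleted":
        return not is_deleted(page, origin)
    if mode == "any":
        return True
    raise ValueError(f"unknown mode: {mode!r}")
```

Note that this only covers the flag test itself; how the page is scanned is a separate matter.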

Unless the -D option is specified, c_parser first tries to validate the page. Each record holds a pointer to the next one; c_parser follows these pointers, and if it can travel from the infimum record to the supremum record, it considers the page non-corrupt.
If the -D option is specified, c_parser assumes the page is corrupt and scans it byte by byte.
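
The pointer walk can be sketched roughly as follows. This is an illustration in Python, not the tool's implementation. It assumes a 16 KiB page in COMPACT format, where the infimum origin is at byte 99, the supremum origin is at byte 112, and each record stores a 2-byte big-endian offset to the next record, relative to its own origin, in the two bytes just before that origin.

```python
import struct

PAGE_SIZE = 16384   # assumption: default 16 KiB pages
INFIMUM = 99        # origin of the infimum record (COMPACT format)
SUPREMUM = 112      # origin of the supremum record (COMPACT format)
TRAILER = 8         # the last 8 bytes of a page are the file trailer


def chain_is_intact(page: bytes) -> bool:
    """Follow next-record pointers from infimum; True only if supremum is reached."""
    seen = set()
    origin = INFIMUM
    while origin != SUPREMUM:
        # A pointer outside the record area, or one we already visited,
        # means the chain is broken or loops.
        if not (INFIMUM <= origin < PAGE_SIZE - TRAILER) or origin in seen:
            return False
        seen.add(origin)
        (rel,) = struct.unpack_from(">H", page, origin - 2)
        if rel == 0:                      # zero marks "no next record"
            return False
        origin = (origin + rel) % PAGE_SIZE  # relative offset, wraps within the page
    return True
```

A page that fails this check would be a candidate for the byte-by-byte scan instead.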
When c_parser scans a page byte by byte, many false matches are possible, which is why you see many garbage records with -D. To prevent false matches, use field value constraints (filters).
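
As an illustration of why constraints help, here is a small Python sketch of a plausibility check for candidate rows of the table above. It does not use the tool's own filter syntax. The bounds are assumptions about this particular data set: `id` below the table's AUTO_INCREMENT value of 8869, `is_unavailable` being 0 or 1, and datetimes falling in a hypothetical 2000 to 2021 window. Adjust them to whatever you know about your data.

```python
from datetime import datetime

MAX_ID = 8869               # AUTO_INCREMENT value from the CREATE TABLE above
YEAR_RANGE = (2000, 2021)   # assumption: plausible years for this data set


def looks_plausible(row: dict) -> bool:
    """Reject candidate rows whose values fall outside the expected ranges."""
    row_id = row.get("id")
    if not isinstance(row_id, int) or not 1 <= row_id < MAX_ID:
        return False
    if row.get("is_unavailable") not in (None, 0, 1):   # assumption: boolean-like column
        return False
    for col in ("book_datetime", "start_datetime", "end_datetime"):
        value = row.get(col)
        if value is not None and not YEAR_RANGE[0] <= value.year <= YEAR_RANGE[1]:
            return False
    return True


# A random byte sequence read as a row rarely satisfies all constraints at once.
candidates = [
    {"id": 42, "is_unavailable": 0, "start_datetime": datetime(2021, 7, 1, 9, 0)},
    {"id": 1953719668, "is_unavailable": 116, "start_datetime": None},
]
kept = [row for row in candidates if looks_plausible(row)]
```

The tighter the constraints, the fewer garbage records survive, at the cost of possibly dropping genuine rows that fall outside the assumed ranges.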
