Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add base32 decode base32 encode #1

Open
wants to merge 12 commits into
base: master
Choose a base branch
from
171 changes: 171 additions & 0 deletions ext/standard/base32.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,171 @@
/*
+----------------------------------------------------------------------+
| Copyright (c) The PHP Group |
+----------------------------------------------------------------------+
| This source file is subject to version 3.01 of the PHP license, |
| that is bundled with this package in the file LICENSE, and is |
| available through the world-wide-web at the following url: |
| https://www.php.net/license/3_01.txt |
| If you did not receive a copy of the PHP license and are unable to |
| obtain it through the world-wide-web, please send a note to |
| [email protected] so we can mail you a copy immediately. |
+----------------------------------------------------------------------+
| Authors: Ignace Nyamagana Butera <[email protected]> |
+----------------------------------------------------------------------+
*/

#include <string.h>

#include "php.h"
#include "base32.h"

#define PHP_BASE32_ASCII "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
#define PHP_BASE32_HEX "0123456789ABCDEFGHIJKLMNOPQRSTUV"

/* reserved invalid characters for padding or alphabet. */
static const char reserved[4] = "\r\n\t ";
static const char base32_pad = '=';

/* {{{ functions to allow encoding a string using RFC4648 base32 algorithm. */
PHP_FUNCTION(base32_encode)
{
zend_string *decoded, *padding = base32_pad, *alphabet = PHP_BASE32_ASCII;
int offset = 0, bitLen = 0, val = 0, len, shift;

ZEND_PARSE_PARAMETERS_START(1, 3)
Z_PARAM_STR(decoded)
Z_PARAM_OPTIONAL
Z_PARAM_STR(alphabet)
Z_PARAM_STR(padding)
ZEND_PARSE_PARAMETERS_END();

if (ZSTR_LEN(padding) != 1) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably move all the things below into a php_base32_encode() function, just like is done for base64 in the PHP_FUNCTION(base64_encode)/PHP_FUNCTION(base64_decode) functions in ext/standard/base64.c. You will also have to add an equivalent PHPAPI extern zend_string *php_base64_encode(...) declaration in base32.h (again, see how base64.h has done that).

By splitting it out, you make this function available to other PHP extensions as well.

zend_argument_value_error(3, "The padding character must be a single byte character.");
RETURN_THROWS();
}

if (strcspn(reserved, ZSTR_VAL(padding)) != 4) {
zend_argument_value_error(1, "The padding character can not be a reserved character.");
RETURN_THROWS();
}

if (ZSTR_LEN(alphabet) != 32) {
zend_argument_value_error(1, "The alphabet must be a 32 bytes long string.");
RETURN_THROWS();
}

zend_string *upper_alpha = zend_string_toupper(alphabet);
zend_string *upper_padding = zend_string_toupper(padding);
zend_string *reserved_chars = zend_string_concat2(reserved, 4, ZSTR_VAL(upper_padding), 1);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These three calls, allocated new memory. That means that this function should also be responsible for freeing them, before returning with zend_string_release(). You return in lines 63, 71, 80, and 106. It is not unheard of to use goto failure, with a failure: block and the end of the function for this.

You could also only call zend_string_* just before they are actually needed. The new upper_padding and reserved_chars aren't used in the section on lines 61-63, so they could come below.

You can find this kind of memory leak, by compiling PHP in debug mode with the --enable-debug flag. When running your tests, you should then see warnings.

Alternatively, you can run your tests with php run-tests.php TESTS="-m ext/standard/tests/url" — the -m will use valgrind (which you need to install) to run memory analyses.


if (strcspn(ZSTR_VAL(upper_alpha), reserved_chars) != 32) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a PHP variant of strcspn, called php_strcspn (defined in ext/standard/php_string.h). I think because not all platforms might have it. The API is slightly different, but it is NULL safe:

PHPAPI size_t php_strspn(const char *s1, const char *s2, const char *s1_end, const char *s2_end);

zend_argument_value_error(1, "The alphabet can not contain a reserved character or the padding character.");
RETURN_THROWS();
}

smart_str unique_chars = {0};
for (int i = 0; i < 32; i++) {
char c = upper_alpha[i];
if (strstr(unique_chars, c)) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a NULL-safe PHP variant of strstr too: zend_memnstr(const char *haystack, const char *needle, size_t needle_len, const char *end);

zend_argument_value_error(1, 'The alphabet must only contain unique characters.');
RETURN_THROWS();
}

smart_str_appends(&unique_chars, c);
}
smart_str_free(&unique_chars);

Copy link
Member Author

@nyamsprod nyamsprod Apr 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference the part that validates the padding and the alphabet should be extract in an internal function as this part is identical for both base32_encode and base32_decode functions,
So I believe one or two inline static functions should be used if I understood correctly 🤔

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you should use a static function for that, but marking it as inline is probably not needed, as compilers usually decide this on their own, for only small functions.

len = ZSTR_LEN(decoded);
if (len == 0) {
RETURN_EMPTY_STRING();
}

char chars[] = ZSTR_VAL(decoded);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's probably usually defined as: const char *chars = ZSTR_VAL(decoded);.

The [] isn't really wrong, but not common.

The const indicates that the function is not going to modify the contents, which is a good hint to the compiler (and reader of code).


smart_str encoded = {0};
while (offset < len || bitLen != 0) {
if (bitLen < 5) {
bitLen += 8;
offset++;
val = (val << 8) + chars[offset];
}

shift = bitLen - 5;
if (offset - (bitLen > 8 ? 1 : 0) > len && 0 === val) {
smart_str_appends(&encoded, padding);
} else {
smart_str_appends(&encoded, alphabet[val >> shift]);
}

val &= ((1 << shift) - 1);
bitLen -= 5;
}

zend_string *ret_val = smart_str_extract(encoded);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You haven't freed the encoded smart_str, like you did above for with smart_str_free(&unique_chars);.

In this case, that is right, as RETURN_STR(ret_val) doesn't create a duplicate.

Usually, it's folded into one line though:

RETURN_STR(smart_str_extract(&buf));


RETURN_STR(ret_val);
}
/* }}} */

/* {{{ functions to allow decoding a base32 encoded string using RFC4648 base32 algorithm. */
PHP_FUNCTION(base32_decode)
{
zend_string *encoded, *padding = base32_pad, *alphabet = PHP_BASE32_ASCII;
int offset = 0, bitLen = 0, val = 0, len, shift;
bool strict;

ZEND_PARSE_PARAMETERS_START(1, 4)
Z_PARAM_STR(encoded)
Z_PARAM_OPTIONAL
Z_PARAM_STR(alphabet)
Z_PARAM_STR(padding)
Z_PARAM_BOOL(strict)
ZEND_PARSE_PARAMETERS_END();

if (ZSTR_LEN(padding) != 1) {
zend_argument_value_error(3, "The padding character must be a single byte character.");
RETURN_THROWS();
}

if (strcspn(reserved, ZSTR_VAL(padding)) != 4) {
zend_argument_value_error(1, "The padding character can not be a reserved character.");
RETURN_THROWS();
}

if (ZSTR_LEN(alphabet) != 32) {
zend_argument_value_error(1, "The alphabet must be a 32 bytes long string.");
RETURN_THROWS();
}

zend_string *upper_alpha = zend_string_toupper(alphabet);
zend_string *upper_padding = zend_string_toupper(padding);
zend_string *reserved_chars = zend_string_concat2(reserved, 4, ZSTR_VAL(upper_padding), 1);

if (strcspn(ZSTR_VAL(upper_alpha), reserved_chars) != 32) {
zend_argument_value_error(1, "The alphabet can not contain a reserved character or the padding character.");
RETURN_THROWS();
}

smart_str unique_chars = {0};
for (int i = 0; i < 32; i++) {
char c = upper_alpha[i];
if (strstr(unique_chars, c)) {
zend_argument_value_error(1, 'The alphabet must only contain unique characters.');
RETURN_THROWS();
}

smart_str_appends(&unique_chars, c);
}
smart_str_free(&unique_chars);

len = ZSTR_LEN(encoded);
if (len == 0) {
RETURN_EMPTY_STRING();
}

//@todo adding encoded validation
//@todo adding RFC4648 decoding algorithm

RETURN_STR(encoded);
}
/* }}} */
23 changes: 23 additions & 0 deletions ext/standard/base32.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
/*
+----------------------------------------------------------------------+
| Copyright (c) The PHP Group |
+----------------------------------------------------------------------+
| This source file is subject to version 3.01 of the PHP license, |
| that is bundled with this package in the file LICENSE, and is |
| available through the world-wide-web at the following url: |
| https://www.php.net/license/3_01.txt |
| If you did not receive a copy of the PHP license and are unable to |
| obtain it through the world-wide-web, please send a note to |
| [email protected] so we can mail you a copy immediately. |
+----------------------------------------------------------------------+
| Authors: Ignace Nyamagana Butera <[email protected]> |
+----------------------------------------------------------------------+
*/

#ifndef BASE32_H
#define BASE32_H

PHP_FUNCTION(base32_decode);
PHP_FUNCTION(base32_encode);

#endif
27 changes: 27 additions & 0 deletions ext/standard/basic_functions.stub.php
Original file line number Diff line number Diff line change
Expand Up @@ -259,6 +259,19 @@
*/
const PHP_QUERY_RFC3986 = UNKNOWN;

/**
* @var string
* @cvalue PHP_BASE32_ASCII
*/
const PHP_BASE32_ASCII = UNKNOWN;

/**
* @var string
* @cvalue PHP_BASE32_HEX
*/
const PHP_BASE32_HEX = UNKNOWN;


/**
* @var float
* @cvalue M_E
Expand Down Expand Up @@ -1924,6 +1937,20 @@ function array_combine(array $keys, array $values): array {}
/** @compile-time-eval */
function array_is_list(array $array): bool {}

/* base32.c */

/**
* @compile-time-eval
* @refcount 1
*/
function base32_encode(string $decoded, string $alphabet = PHP_BASE32_ASCII, string $padding = '='): string {}

/**
* @compile-time-eval
* @refcount 1
*/
function base32_decode(string $encoded, string $alphabet = PHP_BASE32_ASCII, string $padding = '=', bool $strict = false): string|false {}

/* base64.c */

/**
Expand Down
21 changes: 20 additions & 1 deletion ext/standard/basic_functions_arginfo.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading