
Dialect option: normalize BCD "on-the-fly" when moving from ALPHANUMERIC to NUMERIC #200

Open
wants to merge 1 commit into base: gcos4gnucobol-3.x


ddeclerck (Contributor) commented Dec 2, 2024

This PR adds a new dialect option to allow on-the-fly "normalization" of BCD when moving from ALPHANUMERIC to NUMERIC. This is needed to properly mimic the GCOS COBOL behavior: for instance, moving an ALPHANUMERIC item containing "ABC456789}" to a NUMERIC item yields "-1234567890" with this implementation (it tries to interpret the source as a PIC S9). See the new test at the end of run_misc.at.
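To make the decoding concrete, here is a minimal C sketch of one way to read "ABC456789}" as a signed display number: keep only the low nibble of every byte, and treat the last byte as an EBCDIC-style overpunched sign ('{' and 'A'..'I' positive, '}' and 'J'..'R' negative). The function name decode_overpunch, the 18-digit limit and the error handling are assumptions for illustration; this is not the code from the PR or from libcob.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not libcob's API): read ALPHANUMERIC bytes as a
   signed number with a trailing, non-separate (overpunched) sign.
   Returns 0 on success, -1 if a byte does not yield a digit 0-9. */
static int decode_overpunch(const char *src, size_t len, long long *out)
{
    long long value = 0;
    int negative = 0;
    int d;

    assert(src != NULL && out != NULL);
    if (len == 0 || len > 18)        /* keep the result within long long */
        return -1;

    /* Leading bytes: keep only the low nibble, so 'A' (0x41) -> 1. */
    for (size_t i = 0; i + 1 < len; i++) {
        d = (unsigned char)src[i] & 0x0F;
        if (d > 9)
            return -1;
        value = value * 10 + d;
    }

    /* Last byte: EBCDIC-style overpunch characters carry the sign. */
    unsigned char last = (unsigned char)src[len - 1];
    if (last == '{') {                        /* +0 */
        d = 0;
    } else if (last == '}') {                 /* -0 */
        d = 0;
        negative = 1;
    } else if (last >= 'A' && last <= 'I') {  /* +1 .. +9 */
        d = last - 'A' + 1;
    } else if (last >= 'J' && last <= 'R') {  /* -1 .. -9 */
        d = last - 'J' + 1;
        negative = 1;
    } else {                                  /* no sign: low nibble */
        d = last & 0x0F;
        if (d > 9)
            return -1;
    }

    value = value * 10 + d;
    *out = negative ? -value : value;
    return 0;
}
```

With this reading, decode_overpunch("ABC456789}", 10, &v) stores -1234567890 in v, matching the example above.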

ddeclerck force-pushed the gcos_move branch 3 times, most recently from 2918afc to 675824c, December 5, 2024 15:30
ddeclerck (Contributor Author)

@GitMensch

This seems to work. Should probably be improved.

The MSYS2 CI failure is caused by a different output format for floats (e.g. where most environments print '-1.2345679E+9', MSYS2 prints '-1.2345679E+09'). Any idea where this could be adjusted?

GitMensch (Collaborator) commented Dec 5, 2024

We had a similar issue with COMP-2 printing before; the solution was to never use printf, but instead use snprintf and post-adjust the result if necessary.
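The snprintf-and-post-adjust approach could look roughly like this in C. The helper name format_float and the choice to strip all leading zeros from the exponent (to get the '-1.2345679E+9' form the tests expect on most platforms) are assumptions for illustration, not GnuCOBOL's actual code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format with snprintf (never printf directly), then
   post-adjust the exponent. C's %E prints at least two exponent digits
   and some runtimes print three, so leading zeros are stripped to get a
   platform-independent form such as "E+9". */
static void format_float(char *buf, size_t size, double value, int precision)
{
    assert(buf != NULL && size > 0);
    snprintf(buf, size, "%.*E", precision, value);

    char *e = strchr(buf, 'E');
    if (e == NULL)
        return;                       /* "INF"/"NAN": nothing to adjust */

    char *digits = e + 1;
    if (*digits == '+' || *digits == '-')
        digits++;                     /* keep the exponent sign */

    char *first = digits;
    while (first[0] == '0' && first[1] != '\0')
        first++;                      /* always keep one exponent digit */
    memmove(digits, first, strlen(first) + 1);
}
```

Because the adjustment works on the string snprintf produced, the result no longer depends on how many exponent digits the platform's C runtime emits.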

Just FYI, this is the result with MF, after dropping the COMP-N extension (and the invalid index use):

    54     MOVE SRC-ALNUM TO DST-INDEX.
*  49-S*******************************                                       **
**    Illegal use of Index-name or Index Data-item

... and with the numeric checks explicitly dropped:

MF VC

SRC-ALNUM        : 'ABC456789}'
 -> DST-NUMDISP  : '123456789=-'
 -> DST-NUMEDIT  : '-1 234 567 89='
 -> DST-PACK     : '+123456789='
 -> DST-BIN      : '+1234567801'
 -> DST-BINC     : '+121'
 -> DST-BINS     : '+00633'
 -> DST-BINL     : '+1234567801'
 -> DST-BIND     : '+00000000001234567801'
 -> DST-COMP5    : '+0001234567801'
 -> DST-COMP6    : '123456789='
 -> DST-COMPX    : '01234567801'
 -> DST-FLTSHORT : ' .12345678E 10'
 -> DST-FLTLONG  : ' .123456780100000000E 010'
SRC-ALNUM        : 'ABC456789p'
 -> DST-NUMDISP  : '1234567890-'
 -> DST-NUMEDIT  : '-1 234 567 890'
 -> DST-PACK     : '+1234567890'
 -> DST-BIN      : '+1234567890'
 -> DST-BINC     : '-046'
 -> DST-BINS     : '+00722'
 -> DST-BINL     : '+1234567890'
 -> DST-BIND     : '+00000000001234567890'
 -> DST-COMP5    : '+0001234567890'
 -> DST-COMP6    : '1234567890'
 -> DST-COMPX    : '01234567890'
 -> DST-FLTSHORT : ' .12345679E 10'
 -> DST-FLTLONG  : ' .123456789000000000E 010'

MF VC CHARSET"EBCDIC" (implies SIGN"EBCDIC")

Note: the results also differ if only SIGN"EBCDIC" is specified.

SRC-ALNUM        : 'ABC456789}'
 -> DST-NUMDISP  : '1234567890+'
 -> DST-NUMEDIT  : '-1 234 567 890'
 -> DST-PACK     : '+1234567890'
 -> DST-BIN      : '+1234567890'
 -> DST-BINC     : '-046'
 -> DST-BINS     : '+00722'
 -> DST-BINL     : '+1234567890'
 -> DST-BIND     : '+00000000001234567890'
 -> DST-COMP5    : '+0001234567890'
 -> DST-COMP6    : '1234567890'
 -> DST-COMPX    : '01234567890'
 -> DST-FLTSHORT : ' .12345679E 10'
 -> DST-FLTLONG  : ' .123456789000000000E 010'
SRC-ALNUM        : 'ABC456789p'
 -> DST-NUMDISP  : '1234567897+'
 -> DST-NUMEDIT  : '+1 234 567 897'
 -> DST-PACK     : '+1234567897'
 -> DST-BIN      : '+1234567897'
 -> DST-BINC     : '-039'
 -> DST-BINS     : '+00729'
 -> DST-BINL     : '+1234567897'
 -> DST-BIND     : '+00000000001234567897'
 -> DST-COMP5    : '+0001234567897'
 -> DST-COMP6    : '1234567897'
 -> DST-COMPX    : '01234567897'
 -> DST-FLTSHORT : ' .12345679E 10'
 -> DST-FLTLONG  : ' .123456789700000000E 010'

... which is yet another variant, but it is nearer to the new behavior in this PR than to 3.2 (whose results are mostly zero).

Enterprise COBOL

... and additional FYI the output of Enterprise COBOL (after changing the FLOAT types to COMP-1/COMP-2 and dropping the BINARY types) [no abort]:

 SRC-ALNUM        : 'ABC456789}'
  -> DST-NUMDISP  : 'ABC456789{'
  -> DST-NUMEDIT  : '+1 234 567 890'
  -> DST-PACK     : '1234567890'
  -> DST-BIN      : '1234567890'
  -> DST-FLTSHORT : ' .12345679E 10'
  -> DST-FLTLONG  : ' .12345678900000000E 10'

ACU 3.2.1

And finally, on old ACU 3.2.1 (this took a while, as neither copy+paste nor wget is possible on that VM):

SRC-ALNUM        : 'ABC456789}'
 -> DST-NUMDISP  : 'ABC456789}'
 -> DST-NUMEDIT  : '+A BC4 567 89}'
 -> DST-PACK     : ' 123456789='
 -> DST-BIN      : ' 8994567967'
 -> DST-COMP5    : ' 8994567967'
 -> DST-COMP6    : '123456789='
 -> DST-COMPX    : '  18994567967'
 -> DST-COMPN    : '  18994567967'
 -> DST-FLTSHORT : ' 1.8994567E10'
 -> DST-FLTLONG  : ' 1.8994567967000000E10'
 -> DST-INDEX    : ' 18146398783'
SRC-ALNUM        : 'ABC456789p'
 -> DST-NUMDISP  : 'ABC456789p'
 -> DST-NUMEDIT  : '+A BC4 567 89p'
 -> DST-PACK     : ' 1234567890'
 -> DST-BIN      : ' 8994567954'
 -> DST-COMP5    : '  818453536'
 -> DST-COMP6    : '1234567890'
 -> DST-COMPX    : '  18994567954'
 -> DST-COMPN    : ' 18994567954'
 -> DST-FLTSHORT : ' 1.8994567E10'
 -> DST-FLTLONG  : ' 1.8994567954000000E10'
 -> DST-INDEX    : ' 1814698770'

[Note to self: compile with ccbl -Df -o PRN.acu to use float items (alternatively -Cv for "IBM numeric format", which doesn't change the displayed values) and to actually produce output; run with TERM=vt100 A_TERMCAP=/opt/acu/etc/a_termcap runcbl ./PRN.acu. Compiling with -Ca would do a "direct" DISPLAY, but that outputs binary data instead of the "pretty-printing" we want.]

This PR's result

... and as a comparison the result of this PR so far:

SRC-ALNUM        : 'ABC456789}'
 -> DST-NUMDISP  : '-1234567890'
 -> DST-NUMEDIT  : '-1 234 567 890'
 -> DST-PACK     : '-1234567890'
 -> DST-BIN      : '-1234567890'
 -> DST-BINC     : '+046'
 -> DST-BINS     : '-00722'
 -> DST-BINL     : '-1234567890'
 -> DST-BIND     : '-00000000001234567890'
 -> DST-COMP5    : '-00000000001234567890'
 -> DST-COMP6    : '1234567890'
 -> DST-COMPX    : '1234567890'
 -> DST-COMPN    : '1234567890'
 -> DST-FLTSHORT : '-1.2345679E+9'
 -> DST-FLTLONG  : '-1234567890'
 -> DST-INDEX    : '-1234567890'

ddeclerck (Contributor Author)

I see. So it seems BCD is also "normalized" in MF and IBM, except that the sign is mostly ignored (except in MF when moving to NUMERIC-EDITED). Also, I don't get why the binary results in MF have exactly the opposite sign compared to GCOS... And then there are also differences in the way the fields are displayed, but I think this is not related to normalization.

I'm thinking maybe we should have two options: one to normalize the digits, and one for the sign (but the latter would have to be more than a YES/NO flag, to account for MF keeping the sign only when moving to NUMERIC-EDITED).

GitMensch (Collaborator)

Unless I'm missing something, the most visible thing is that this is NOT about BCD (COMP-3/COMP-6) normalization, but about normalization of USAGE DISPLAY, because all environments except ACU show the same data: normalized, with only binary truncation for the binary fields.

I'll do the "review" after a meeting, to comment on the actual changes.

(just FYI - added the ACU results which are different and took too long to get in the first place; I'd ignore that output for now)

ddeclerck (Contributor Author) commented Dec 6, 2024

> Unless I'm missing something, the most visible thing is that this is NOT about BCD (COMP-3/COMP-6) normalization, but about normalization of USAGE DISPLAY

Well, I'd still say it's BCD-related, since we're trying to interpret DISPLAY data as unpacked BCD.
But it's hard to come up with a proper description and self-explanatory flag names 😅.

BTW, it could be interesting to try with an ASCII sign ("p") instead of the EBCDIC sign ("}") in SRC-ALNUM.
Also, I'd be curious to know whether those environments would consider a sign that is leading/trailing separate (on GCOS, it seems the DISPLAY data we're moving to numeric is always interpreted as having a non-separate trailing sign, which is the default anyway for NUMERIC-DISPLAY).

GitMensch (Collaborator)

Note: I've added the ASCII p variant above as well.

Concerning the PR: this is effectively a general issue. I thought I had broken something there, but this code goes back to at least OC 1.1; my changes only produce the same "intended" result with fewer instructions.
The general issues seen when comparing the results from other COBOL vendors with the "old" existing implementation, which uses cob_move_alphanum_to_display:

  • the sign is expected to be separate and in the first position, while "redefined" numeric display data normally has a non-separate trailing sign
  • as soon as the first error is seen, the target is set to all zero
  • even with -fec=all this raises no exception (MF does raise one by default, and has an option to effectively turn that fatal exception [abort] into a non-fatal one [data is then handled similarly to this PR])

The question is how to go on. Using COB_D2I works (it is a bit more expensive, of course), so I'm inclined to use it unconditionally in this function.
This mostly leaves the sign handling. My idea is to first use the current approach (skip all spaces, then check for a sign; if none is found, check the last position of the field [effectively what this PR does, but without an intermediate field]).
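The sign lookup order described here (skip spaces, then a leading separate sign, then an overpunched last byte) could be sketched in C as follows. find_sign is a hypothetical name, and accepting both the EBCDIC-style ('}', 'J'..'R') and the ASCII-style ('p'..'y') negative overpunch characters is an assumption; the digits themselves would still be converted separately (e.g. with COB_D2I).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: locate the sign of DISPLAY data being moved to a
   numeric item. On return, [*start, *end) delimits the digit bytes.
   Returns -1 for negative, +1 for an explicit positive sign, 0 if none. */
static int find_sign(const char *src, size_t len, size_t *start, size_t *end)
{
    size_t i = 0;
    unsigned char last;

    assert(src != NULL && start != NULL && end != NULL);

    /* 1. skip leading spaces */
    while (i < len && src[i] == ' ')
        i++;
    *start = i;
    *end = len;
    if (i == len)
        return 0;                              /* empty or all spaces */

    /* 2. leading separate sign */
    if (src[i] == '+' || src[i] == '-') {
        *start = i + 1;
        return src[i] == '-' ? -1 : 1;
    }

    /* 3. overpunched last byte; it still carries a digit, so *end stays len */
    last = (unsigned char)src[len - 1];
    if (last == '}' || (last >= 'J' && last <= 'R') || (last >= 'p' && last <= 'y'))
        return -1;
    if (last == '{' || (last >= 'A' && last <= 'I'))
        return 1;
    return 0;                                  /* unsigned */
}
```

For example, "  -123" yields -1 with the digits starting at offset 3, while "ABC456789}" yields -1 with all ten bytes kept as digits.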

Another approach would be to try following the COBOL 202x MOVE statement, general rule 7 d 4, which (I'm not sure) should perhaps only be applied if the target is neither numeric-display nor numeric-edited (or their national variants if the source is national):

  • define an intermediate unsigned field with no decimal places
  • size and digits = orig_size (or 31 if that is > 31)
  • data = orig_data (or, if there are more, only the rightmost 31 positions)
  • then do a numeric move from that intermediate field to the target one

That possibly yields nearly the same result as this PR (nearly, because the data will likely always be handled as unsigned).
It would change our current implementation's handling of a leading separate sign (to an error in ASCII in general, and an error in EBCDIC with "+").
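The intermediate-field step of that rule might be sketched in C like this. The struct and function names are hypothetical, and the final numeric move from the intermediate item to the target is left to the runtime's regular numeric move, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INTERMEDIATE_DIGITS 31

/* Hypothetical descriptor for the intermediate item: unsigned,
   no decimal places, at most 31 digits. */
struct intermediate_field {
    const char *data;   /* points into the original source bytes */
    size_t digits;      /* size == digits for unsigned DISPLAY */
    int scale;          /* always 0: no decimal places */
    int is_signed;      /* always 0: unsigned */
};

static struct intermediate_field make_intermediate(const char *data, size_t size)
{
    struct intermediate_field f;

    assert(data != NULL);
    if (size > MAX_INTERMEDIATE_DIGITS) {
        /* keep only the rightmost 31 positions */
        f.data = data + (size - MAX_INTERMEDIATE_DIGITS);
        f.digits = MAX_INTERMEDIATE_DIGITS;
    } else {
        f.data = data;
        f.digits = size;
    }
    f.scale = 0;
    f.is_signed = 0;
    return f;
}
```

Since the descriptor only points into the source, no copy is needed; the unsigned attribute is also why a trailing overpunched sign would be lost with this approach.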

What do you think?

ddeclerck (Contributor Author)

> The question is how to go on. Using COB_D2I works (it is a bit more expensive, of course), so I'm inclined to use it unconditionally in this function. This mostly leaves the sign handling. My idea is to first use the current approach (skip all spaces, then check for a sign; if none is found, check the last position of the field [effectively what this PR does, but without an intermediate field]).

Would that imply doing this unconditionally instead of using a flag?

> Another approach would be to try following the COBOL 202x MOVE statement, general rule 7 d 4, which (I'm not sure) should perhaps only be applied if the target is neither numeric-display nor numeric-edited (or their national variants if the source is national):
>
>   • define an intermediate unsigned field with no decimal places
>   • size and digits = orig_size (or 31 if that is > 31)
>   • data = orig_data (or, if there are more, only the rightmost 31 positions)
>   • then do a numeric move from that intermediate field to the target one

> That possibly yields nearly the same result as this PR (nearly, because the data will likely always be handled as unsigned). It would change our current implementation's handling of a leading separate sign (to an error in ASCII in general, and an error in EBCDIC with "+").

Would this be configurable through a dialect flag? Our customer expects the sign to be preserved, as is the case on GCOS.

GitMensch (Collaborator)

Either option could be made dependent on a dialect (or optimization) flag.

If you prefer the first one and go down that route, I think no option is necessary, right?
Note: in any case we need a NEWS entry (especially for "no option" - I think that can go under "important bug fixes").

It may be good to have that test include a performance check option, as in

       01 FILLER USAGE BINARY-INT VALUE 0.
          88 DO-DISP VALUE 0.
          88 NO-DISP VALUE 1.
       REPLACE ==DISPLAY== BY ==IF DO-DISP DISPLAY==.
      *
       PROCEDURE DIVISION.
       MAIN.
      * Test with DISPLAY on error
           PERFORM DO-CHECK.
       >> IF CHECK-PERF IS DEFINED
           SET NO-DISP TO TRUE
      * some performance checks on the way...
           PERFORM DO-CHECK 20000 TIMES.
       >> END-IF
           GOBACK.
       DO-CHECK.

This makes it easy to verify both correctness and speed if we change that later on.

ddeclerck (Contributor Author)

> MF VC
>
> SRC-ALNUM        : 'ABC456789}'
>  -> DST-NUMDISP  : '123456789=-'
>  -> DST-NUMEDIT  : '-1 234 567 89='
>  -> DST-PACK     : '+123456789='
>  -> DST-BIN      : '+1234567801'
> ...
> SRC-ALNUM        : 'ABC456789p'
>  -> DST-NUMDISP  : '1234567890-'
>  -> DST-NUMEDIT  : '-1 234 567 890'
>  -> DST-PACK     : '+1234567890'
>  -> DST-BIN      : '+1234567890'
> ...

I don't get the same results with MF VC (ASCII):

SRC-ALNUM        : 'ABC456789}'
 -> DST-NUMDISP  : '1234567903-'
 -> DST-NUMEDIT  : '+A BC4 567 89}'
 -> DST-PACK     : '+1234567903'
 -> DST-BIN      : '+1234567903'
 ...
SRC-ALNUM        : 'ABC456789p'
 -> DST-NUMDISP  : '1234567890-'
 -> DST-NUMEDIT  : '+A BC4 567 89p'
 -> DST-PACK     : '+1234567890'
 -> DST-BIN      : '+1234567890'
 ...

The two differences I notice are:

  • no normalization occurs when moving to NUMERIC-EDITED
  • "789}" becomes "7903" instead of "789=" (which made more sense to me because '}' = 0x7D and '=' = 0x3D)
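Both behaviors can be reproduced with simple nibble arithmetic, which may help explain the two MF outputs. The C sketch below only illustrates that assumption (the function names are hypothetical and not taken from the MF or GnuCOBOL runtimes): forcing the high nibble to 3 turns '}' (0x7D) into '=' (0x3D), matching the '123456789=' results, while accumulating the low nibbles arithmetically adds 13 (0xD) in the last position, giving 1234567890 + 13 = 1234567903.

```c
#include <assert.h>
#include <stddef.h>

/* (a) Character-wise: force the high nibble to 3, keep the low nibble,
       so 'A' (0x41) -> '1', '}' (0x7D) -> '=' (0x3D), 'p' (0x70) -> '0'.
       dst must have room for len + 1 bytes. */
static void nibble_to_ascii_digits(const char *src, size_t len, char *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < len; i++)
        dst[i] = (char)(0x30 | ((unsigned char)src[i] & 0x0F));
    dst[len] = '\0';
}

/* (b) Arithmetic: accumulate the low nibbles without checking that each
       is a valid digit, so the 0xD from '}' adds 13 in the last position. */
static long long nibble_value(const char *src, size_t len)
{
    long long value = 0;
    assert(src != NULL && len <= 18);
    for (size_t i = 0; i < len; i++)
        value = value * 10 + ((unsigned char)src[i] & 0x0F);
    return value;
}
```

For 'ABC456789p' both variants agree (the low nibble of 'p' is 0), which is consistent with the identical '1234567890' results in both MF runs.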

Now, I barely know how MF works, so maybe I'm doing something wrong? Or maybe it behaves differently depending on the version...

ddeclerck force-pushed the gcos_move branch 2 times, most recently from a58b6f1 to 9115c8c, December 16, 2024 18:56
GitMensch (Collaborator)

@ddeclerck Given 44c96d2 - what is the current state of this PR (and is it still high-priority)? If it is high, please rebase after upstreaming #204.

GitMensch (Collaborator)

Also note that this change, together with a minimal test case, should be upstreamed as well, potentially with a comment /* skip leading zeroes + */.

ddeclerck (Contributor Author)

> @ddeclerck Given 44c96d2 - what is the current state of this PR

Well, the commit you mention (from PR #201) was about fixing things and adding a bit of normalization on moves to NUMERIC-EDITED. This PR is about normalizing when moving from ALPHANUMERIC to NUMERIC.

> (and is it still high-priority)? If it is high, please rebase after upstreaming #204.

It's still kind of important, although not as critical as the other issues reported in the past few days (and a new one that just arrived tonight 😭).

ddeclerck (Contributor Author)

> Also note that this change, together with a minimal test case, should be upstreamed as well, potentially with a comment /* skip leading zeroes + */.

You'd want this in a separate PR?

GitMensch (Collaborator)

> > Also note that this change, together with a minimal test case, should be upstreamed as well, potentially with a comment /* skip leading zeroes + */.

> You'd want this in a separate PR?

As you want, either a separate one or directly upstream.

ddeclerck (Contributor Author)

> > > Also note that this change, together with a minimal test case, should be upstreamed as well, potentially with a comment /* skip leading zeroes + */.

> > You'd want this in a separate PR?

> As you want, either a separate one or directly upstream.

I went through a PR just to make sure the CI passes on all platforms: #207
