Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix decimal constant bug occuring when several programs in a COBOL file #113

Closed

Conversation

ddeclerck
Copy link
Contributor

This patch fixes a nasty bug dealing with decimal constants when there are several programs in a COBOL file.

Consider the following COBOL file:

       IDENTIFICATION DIVISION.
       PROGRAM-ID. prog.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  X  PIC 9(2).
       PROCEDURE DIVISION.
           CALL "prog2"
           IF X + 42 = 0
               DISPLAY "OK".
           STOP RUN.
       END PROGRAM prog.

       PROGRAM-ID. prog2 INITIAL.
       PROCEDURE DIVISION.
           EXIT PROGRAM.
       END PROGRAM prog2.

Attempting to run this program will result in a segfault:

prog.cob:8: attempt to reference invalid memory address (signal)

We found out the bug was occurring because returning from a subprogram clears all decimal constants, i.e the generated code for prog2 contains:

  P_clear_decimal:
  /* Clear Decimal Constant values */
  cob_decimal_clear (dc_1);
  dc_1 = NULL;

Hence when prog tries to use the constant 42 (through dc_1), we are greeted with a segfault since dc_1 is now NULL.

The proposed fix is to move all the decimal constants to the "local" header files instead of the "global" header.
The testsuite has been run and does not complain.

@codecov-commenter
Copy link

codecov-commenter commented Oct 3, 2023

Codecov Report

Merging #113 (dbaaf16) into gcos4gnucobol-3.x (c0d64ad) will increase coverage by 0.01%.
The diff coverage is 100.00%.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

@@                  Coverage Diff                  @@
##           gcos4gnucobol-3.x     #113      +/-   ##
=====================================================
+ Coverage              65.74%   65.75%   +0.01%     
=====================================================
  Files                     32       32              
  Lines                  59092    59090       -2     
  Branches               15575    15572       -3     
=====================================================
+ Hits                   38849    38857       +8     
+ Misses                 14262    14254       -8     
+ Partials                5981     5979       -2     
Files Coverage Δ
cobc/codegen.c 76.06% <100.00%> (+0.07%) ⬆️

... and 3 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@GitMensch
Copy link
Collaborator

GitMensch commented Oct 3, 2023 via email

@ddeclerck
Copy link
Contributor Author

Confirmed on 3.1 and trunk as well.
Should I open the issue here or on SourceForge ?

@GitMensch
Copy link
Collaborator

GitMensch commented Oct 3, 2023 via email

@ddeclerck
Copy link
Contributor Author

Hi @GitMensch,
Did you have a bit of time to check this ?

@lefessan
Copy link
Member

The bug report is here: https://sourceforge.net/p/gnucobol/bugs/920/

@ddeclerck
Copy link
Contributor Author

You mean here : https://sourceforge.net/p/gnucobol/bugs/917/
(920 is a different thing)

@GitMensch
Copy link
Collaborator

Without further checks I do wonder where this has gone bad, it seems this was fixed already?
https://sourceforge.net/p/gnucobol/bugs/431/

@ddeclerck
Copy link
Contributor Author

Without further checks I do wonder where this has gone bad, it seems this was fixed already? https://sourceforge.net/p/gnucobol/bugs/431/

I think this is a different issue. The one you mention is about decimal constants not being initialized at all.
The issue we are facing right now is that when exiting a called module, every decimal constant are de-initialized, even those used by the calling module.

Copy link
Collaborator

@GitMensch GitMensch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only minor changes to do for now, mostly to the testsuite.

Please be so kind to adjust it, then upload the generated code of that program with and without the codegen change, to ease rechecking the result after switching from "storage" to local.

tests/testsuite.src/run_misc.at Outdated Show resolved Hide resolved
tests/testsuite.src/run_misc.at Outdated Show resolved Hide resolved
cobc/codegen.c Show resolved Hide resolved
tests/testsuite.src/run_misc.at Outdated Show resolved Hide resolved
tests/testsuite.src/run_misc.at Outdated Show resolved Hide resolved
tests/testsuite.src/run_misc.at Outdated Show resolved Hide resolved
@ddeclerck
Copy link
Contributor Author

Only minor changes to do for now, mostly to the testsuite.

Please be so kind to adjust it, then upload the generated code of that program with and without the codegen change, to ease rechecking the result after switching from "storage" to local.

That has been done.

Here is the generated code in both case (relevant changes in .h files) :
prog_nofix.zip
prog_fix.zip

@GitMensch
Copy link
Collaborator

Thank you for your work. I've adjusted the bug report with the findings - it isn't about INITIAL at all, but about CANCEL...

It seems reasonable to move the "INITIAL" from AT_SETUP to AT_KEYWORDS (also add CALL and CANCEL there), then drop INITIAL from prog.cpy and include that in the current places via REPLACING and finally add two other iterations of that copy using non-initial programs that are CANCELed after their CALL.

As far as I see your change will then still work fine - then please push that change to the testsuite here.

The second issue in both the old and new codegen: all constants from all runtime-elements (programs and user-defined functions) are initialized and clean in all runtime-elements, even if they aren't used at all (which is highlighted by your first test code where the constant was not used in prog2 and still got cleaned at its exit.
This is only related to the issue tackled here and may be handled separately. I've another bug report for this, which could be solved first (and drop the make_decimal check in that loop, too.

Another related issue is, that we don't need multiple definitions in the first place. Adjusting that would also fix the bug at hand, but I currently see only two options and am not sure if those are good (both would leave the constant to be static and global):

  1. move the init/cleanup to shared object loading/unloading (like output_so_load_version_check), but that would leave them active forever if COB_PHYSICAL_CANCEL is not used - and still needs a fallback if neither HAVE_ATTRIBUTE_CONSTRUCTOR nor WIN32 is set, the longer I think about that, the less I like that option myself
  2. if the compile-unit has more than one program (otherwise there is no problem in the first place), then adjust the codegen to check that variable for NULL before doing the init and add a reference counter for each cob_decimal that is incremented on each place where the init is done, decrement on cleanup and only do real cleanup + setting to NULL if the reference counter is zero.

We could use the change as-is, which would fix the SIGSEGV with the downside of a bit increased cpu time and memory usage (common programs likely have 5 to 20 decimal constants).

What is your take on this?

@ddeclerck
Copy link
Contributor Author

I updated the testsuite as requested (fix still works fine).

As for your suggestions:

  1. move the init/cleanup to shared object loading/unloading (like output_so_load_version_check), but that would leave them active forever if COB_PHYSICAL_CANCEL is not used - and still needs a fallback if neither HAVE_ATTRIBUTE_CONSTRUCTOR nor WIN32 is set, the longer I think about that, the less I like that option myself

I agree that this might not be the best option.

  1. if the compile-unit has more than one program (otherwise there is no problem in the first place), then adjust the codegen to check that variable for NULL before doing the init and add a reference counter for each cob_decimal that is incremented on each place where the init is done, decrement on cleanup and only do real cleanup + setting to NULL if the reference counter is zero.

Seems like a reasonable approach. It's probably best to tackle issue 923 first though. An optimal approach would be to add that counter only for those constants that appear in more than one program - but that might be overengineering...

@GitMensch
Copy link
Collaborator

I'd declare this PR as "ready for svn" (even when svn committing is still postponed) as soon as the Changelog entry is added.

Can you work on #923? In this case I'd suggest to postpone the change on

	if (CB_TREE_CLASS (m->x) == CB_CLASS_NUMERIC
	 && m->make_decimal) {

there.

@ddeclerck
Copy link
Contributor Author

I'll try. I'll add questions directly on the SVN issue if needed.

@ddeclerck
Copy link
Contributor Author

Note: as suggested, I put back the following, since it is handled by 923.

if (CB_TREE_CLASS (m->x) == CB_CLASS_NUMERIC
	 && m->make_decimal) {

@GitMensch
Copy link
Collaborator

@ddeclerck Please bring this and #115 to svn, as two commits in whatever order. Then reference the commits at the referenced issues on SF (not sure if you can close them there yourself, if yes, please do so).

@GitMensch
Copy link
Collaborator

now upstream - thanks!

@GitMensch GitMensch closed this Jan 22, 2024
@ddeclerck ddeclerck deleted the fix_decimal_constant branch July 15, 2024 12:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants