Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new bounds heuristic: one larger than the largest index accessed in the function? #599

Closed
mwhicks1 opened this issue May 22, 2021 · 5 comments · Fixed by #602
Closed

new bounds heuristic: one larger than the largest index accessed in the function? #599

mwhicks1 opened this issue May 22, 2021 · 5 comments · Fixed by #602
Assignees
Labels

Comments

@mwhicks1
Copy link
Member

For this code

int getX(int *x) { return x[0]; }
int getY(int *x, int i) { return x[i]; }
int getN(int *x, int i) { if (i<2) return x[i]; else return 0; }

You might imagine that we could infer the bound should be at least as much as the largest index, so the first bound would be count(1) and the second would be count(i), but that doesn't happen. If we add a conditional, as in the third function, heuristics pick up the bound:

int getX(_Array_ptr<int> x) { return x[0]; }
int getY(_Array_ptr<int> x, int i) { return x[i]; }
int getN(_Array_ptr<int> x : count(2), int i) { if (i<2) return x[i]; else return 0; }
@Machiry Machiry self-assigned this May 22, 2021
@mattmccutchen-cci mattmccutchen-cci changed the title new bounds heuristic? new bounds heuristic: one larger than the largest index accessed in the function? May 22, 2021
@mattmccutchen-cci
Copy link
Member

I'd appreciate meaningful issue titles. I've tried to improve this one.

second would be count(i)

I assume you mean count(i+1)?

Do you have a realistic example program in which this feature would be useful so that we can prioritize it appropriately? (Has this been discussed somewhere else that I'm not aware of?)

@Machiry
Copy link
Collaborator

Machiry commented Jun 15, 2021

I'd appreciate meaningful issue titles. I've tried to improve this one.

second would be count(i)

I assume you mean count(i+1)?

Do you have a realistic example program in which this feature would be useful so that we can prioritize it appropriately? (Has this been discussed somewhere else that I'm not aware of?)

For yacr2 of ptrdist:

ulong ExistPathAboveVCG(_Array_ptr<nodeVCGType> VCG : count(below + 1), ulong above, ulong below)
{
    DFSClearVCG(VCG);
    DFSAboveVCG(VCG, above);
    return(VCG[below].netsAboveReached);
}

That further added bounds to other functions:

ulong VCV(_Array_ptr<nodeVCGType> VCG : count(check + 1), ulong check, ulong track, _Array_ptr<ulong> assign)
{
    ulong	net;
    ulong	vcv;
    vcv = 0;
    for (net = 1; net <= channelNets; net++) {
	if (assign[net]) {
	    if (assign[net] < track) {
		/*
		 * Above VCV?
		 */
		if (ExistPathAboveVCG(VCG, net, check)) {
		    vcv++;
		}
	    }
	    else if (assign[net] > track) {
		/*
		 * Below VCV?
		 */
		if (ExistPathAboveVCG(VCG, check, net)) {
		    vcv++;
		}
	    }
	    else {
		/*
		 * Above or Below VCV?
		 */
		if (ExistPathAboveVCG(VCG, net, check) || ExistPathAboveVCG(VCG, check, net)) {
		    vcv++;
		}
	    }
	}
    }
    return(vcv);
}

@mattmccutchen-cci
Copy link
Member

mattmccutchen-cci commented Jun 15, 2021

Thanks for posting the example. Mike forwarded it to me earlier, and we discussed it a bit; I'll restate my thoughts here. In short, this example by itself doesn't overcome my skepticism that the count(i+1) heuristic (which required a nontrivial amount of code to implement) should be added to 3C. Obviously, Mike can overrule me, and the fact that the code is already written and reviewed may affect the decision here. But as we continue adding to the bounds inference going forward, I would urge that we think seriously in advance about whether the realistic value of each feature justifies its total maintenance cost.

Based on my brief look at the yacr2 code, it looks like the size of VCG is actually channelNets + 1 (from the malloc in AllocVCG), not check + 1. In effect, the functions you quoted take one array index and access that element of the array. I'd argue that it would better convey the design of the code if the bound were channelNets + 1; then by default, the compiler would insert a runtime check that the index is in bounds. Then if we wanted, we could use a _Where clause to document that the index must be in bounds and push any runtime check to the caller. In fact, I wonder why 3C didn't propagate the channelNets + 1 bound already. Maybe because it's a mutable global variable and thus Checked C does not allow it to be used as a bound for function parameters?

Note that this issue is independent of the weird thing yacr2 does with indexing arrays from 1 to N. If yacr2 used the usual 0 to N-1, then the size would be just channelNets, but the check + 1 issue would still exist.

Thinking about the general phenomenon illustrated by this example: As long as the actual array size is in scope and valid to use as a bound, I propose that 3C should prefer the size, so we wouldn't need this new "i + 1" heuristic. If we think cases in which the full array size isn't usable as a bound will be common enough, that could justify adding the heuristic. That may be true in this yacr2 example, but I claim that the example is unconventional: the size is stored only in the channelNets global variable, but VCG is passed as a parameter that AFAICT is always equal to the VCG global variable. A more conventional design would be to put both channelNets and VCG in a structure and pass that structure to the function; then even if the function doesn't access the size field, it is still in scope to use as a bound.

@Machiry
Copy link
Collaborator

Machiry commented Jun 15, 2021

Thanks for posting the example. Mike forwarded it to me earlier, and we discussed it a bit; I'll restate my thoughts here. In short, this example by itself doesn't overcome my skepticism that the count(i+1) heuristic (which required a nontrivial amount of code to implement) should be added to 3C. Obviously, Mike can overrule me, and the fact that the code is already written and reviewed may affect the decision here. But as we continue adding to the bounds inference going forward, I would urge that we think seriously in advance about whether the realistic value of each feature justifies its total maintenance cost.

Based on my brief look at the yacr2 code, it looks like the size of VCG is actually channelNets + 1 (from the malloc in AllocVCG), not check + 1. In effect, the functions you quoted take one array index and access that element of the array. I'd argue that it would better convey the design of the code if the bound were channelNets + 1; then by default, the compiler would insert a runtime check that the index is in bounds. Then if we wanted, we could use a _Where clause to document that the index must be in bounds and push any runtime check to the caller. In fact, I wonder why 3C didn't propagate the channelNets + 1 bound already. Maybe because it's a mutable global variable and thus Checked C does not allow it to be used as a bound for function parameters?

Note that this issue is independent of the weird thing yacr2 does with indexing arrays from 1 to N. If yacr2 used the usual 0 to N-1, then the size would be just channelNets, but the check + 1 issue would still exist.

Thinking about the general phenomenon illustrated by this example: As long as the actual array size is in scope and valid to use as a bound, I propose that 3C should prefer the size, so we wouldn't need this new "i + 1" heuristic. If we think cases in which the full array size isn't usable as a bound will be common enough, that could justify adding the heuristic. That may be true in this yacr2 example, but I claim that the example is unconventional: the size is stored only in the channelNets global variable, but VCG is passed as a parameter that AFAICT is always equal to the VCG global variable. A more conventional design would be to put both channelNets and VCG in a structure and pass that structure to the function; then even if the function doesn't access the size field, it is still in scope to use as a bound.

Agree that yacr2 is a special case. But in general, I think the check + 1 would help overall in determining bounds. Unfortunately, note sure if _Where clause is yet supported by Checked C, in which case, we could revamp the bounds inference to use more streamlined and localized bounds.

@mattmccutchen-cci
Copy link
Member

Agree that yacr2 is a special case. But in general, I think the check + 1 would help overall in determining bounds.

Can you give a more convincing example then?

Unfortunately, note sure if _Where clause is yet supported by Checked C, in which case, we could revamp the bounds inference to use more streamlined and localized bounds.

It's an area of ongoing work. For example, _Where clauses are not yet checked at call sites (microsoft#218), though Sulekha claimed the Checked C team would start working on that soon. If the uses of _Where clauses by 3C that you envision at least don't cause errors in the current compiler, then it might make sense to have 3C go ahead and start generating them and let Microsoft finish the implementation later. The compiler's validation has so many holes right now anyway that I wouldn't be worried about adding more holes by adopting _Where clauses early.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants