new bounds heuristic: one larger than the largest index accessed in the function? #599

mwhicks1 · 2021-05-22T00:33:32Z

For this code

int getX(int *x) { return x[0]; }
int getY(int *x, int i) { return x[i]; }
int getN(int *x, int i) { if (i<2) return x[i]; else return 0; }

You might imagine that we could infer the bound should be at least as much as the largest index, so the first bound would be count(1) and the second would be count(i), but that doesn't happen. If we add a conditional, as in the third function, heuristics pick up the bound:

int getX(_Array_ptr<int> x) { return x[0]; }
int getY(_Array_ptr<int> x, int i) { return x[i]; }
int getN(_Array_ptr<int> x : count(2), int i) { if (i<2) return x[i]; else return 0; }

The text was updated successfully, but these errors were encountered:

mattmccutchen-cci · 2021-05-22T13:21:40Z

I'd appreciate meaningful issue titles. I've tried to improve this one.

second would be count(i)

I assume you mean count(i+1)?

Do you have a realistic example program in which this feature would be useful so that we can prioritize it appropriately? (Has this been discussed somewhere else that I'm not aware of?)

Machiry · 2021-06-15T21:05:15Z

I'd appreciate meaningful issue titles. I've tried to improve this one.

second would be count(i)

I assume you mean count(i+1)?

Do you have a realistic example program in which this feature would be useful so that we can prioritize it appropriately? (Has this been discussed somewhere else that I'm not aware of?)

For yacr2 of ptrdist:

ulong ExistPathAboveVCG(_Array_ptr<nodeVCGType> VCG : count(below + 1), ulong above, ulong below)
{
    DFSClearVCG(VCG);
    DFSAboveVCG(VCG, above);
    return(VCG[below].netsAboveReached);
}

That further added bounds to other functions:

ulong VCV(_Array_ptr<nodeVCGType> VCG : count(check + 1), ulong check, ulong track, _Array_ptr<ulong> assign)
{
    ulong	net;
    ulong	vcv;
    vcv = 0;
    for (net = 1; net <= channelNets; net++) {
	if (assign[net]) {
	    if (assign[net] < track) {
		/*
		 * Above VCV?
		 */
		if (ExistPathAboveVCG(VCG, net, check)) {
		    vcv++;
		}
	    }
	    else if (assign[net] > track) {
		/*
		 * Below VCV?
		 */
		if (ExistPathAboveVCG(VCG, check, net)) {
		    vcv++;
		}
	    }
	    else {
		/*
		 * Above or Below VCV?
		 */
		if (ExistPathAboveVCG(VCG, net, check) || ExistPathAboveVCG(VCG, check, net)) {
		    vcv++;
		}
	    }
	}
    }
    return(vcv);
}

mattmccutchen-cci · 2021-06-15T21:18:31Z

Thanks for posting the example. Mike forwarded it to me earlier, and we discussed it a bit; I'll restate my thoughts here. In short, this example by itself doesn't overcome my skepticism that the count(i+1) heuristic (which required a nontrivial amount of code to implement) should be added to 3C. Obviously, Mike can overrule me, and the fact that the code is already written and reviewed may affect the decision here. But as we continue adding to the bounds inference going forward, I would urge that we think seriously in advance about whether the realistic value of each feature justifies its total maintenance cost.

Based on my brief look at the yacr2 code, it looks like the size of VCG is actually channelNets + 1 (from the malloc in AllocVCG), not check + 1. In effect, the functions you quoted take one array index and access that element of the array. I'd argue that it would better convey the design of the code if the bound were channelNets + 1; then by default, the compiler would insert a runtime check that the index is in bounds. Then if we wanted, we could use a _Where clause to document that the index must be in bounds and push any runtime check to the caller. In fact, I wonder why 3C didn't propagate the channelNets + 1 bound already. Maybe because it's a mutable global variable and thus Checked C does not allow it to be used as a bound for function parameters?

Note that this issue is independent of the weird thing yacr2 does with indexing arrays from 1 to N. If yacr2 used the usual 0 to N-1, then the size would be just channelNets, but the check + 1 issue would still exist.

Thinking about the general phenomenon illustrated by this example: As long as the actual array size is in scope and valid to use as a bound, I propose that 3C should prefer the size, so we wouldn't need this new "i + 1" heuristic. If we think cases in which the full array size isn't usable as a bound will be common enough, that could justify adding the heuristic. That may be true in this yacr2 example, but I claim that the example is unconventional: the size is stored only in the channelNets global variable, but VCG is passed as a parameter that AFAICT is always equal to the VCG global variable. A more conventional design would be to put both channelNets and VCG in a structure and pass that structure to the function; then even if the function doesn't access the size field, it is still in scope to use as a bound.

Machiry · 2021-06-15T22:00:21Z

Thanks for posting the example. Mike forwarded it to me earlier, and we discussed it a bit; I'll restate my thoughts here. In short, this example by itself doesn't overcome my skepticism that the count(i+1) heuristic (which required a nontrivial amount of code to implement) should be added to 3C. Obviously, Mike can overrule me, and the fact that the code is already written and reviewed may affect the decision here. But as we continue adding to the bounds inference going forward, I would urge that we think seriously in advance about whether the realistic value of each feature justifies its total maintenance cost.

Based on my brief look at the yacr2 code, it looks like the size of VCG is actually channelNets + 1 (from the malloc in AllocVCG), not check + 1. In effect, the functions you quoted take one array index and access that element of the array. I'd argue that it would better convey the design of the code if the bound were channelNets + 1; then by default, the compiler would insert a runtime check that the index is in bounds. Then if we wanted, we could use a _Where clause to document that the index must be in bounds and push any runtime check to the caller. In fact, I wonder why 3C didn't propagate the channelNets + 1 bound already. Maybe because it's a mutable global variable and thus Checked C does not allow it to be used as a bound for function parameters?

Note that this issue is independent of the weird thing yacr2 does with indexing arrays from 1 to N. If yacr2 used the usual 0 to N-1, then the size would be just channelNets, but the check + 1 issue would still exist.

Thinking about the general phenomenon illustrated by this example: As long as the actual array size is in scope and valid to use as a bound, I propose that 3C should prefer the size, so we wouldn't need this new "i + 1" heuristic. If we think cases in which the full array size isn't usable as a bound will be common enough, that could justify adding the heuristic. That may be true in this yacr2 example, but I claim that the example is unconventional: the size is stored only in the channelNets global variable, but VCG is passed as a parameter that AFAICT is always equal to the VCG global variable. A more conventional design would be to put both channelNets and VCG in a structure and pass that structure to the function; then even if the function doesn't access the size field, it is still in scope to use as a bound.

Agree that yacr2 is a special case. But in general, I think the check + 1 would help overall in determining bounds. Unfortunately, note sure if _Where clause is yet supported by Checked C, in which case, we could revamp the bounds inference to use more streamlined and localized bounds.

mattmccutchen-cci · 2021-06-16T01:11:06Z

Agree that yacr2 is a special case. But in general, I think the check + 1 would help overall in determining bounds.

Can you give a more convincing example then?

Unfortunately, note sure if _Where clause is yet supported by Checked C, in which case, we could revamp the bounds inference to use more streamlined and localized bounds.

It's an area of ongoing work. For example, _Where clauses are not yet checked at call sites (microsoft#218), though Sulekha claimed the Checked C team would start working on that soon. If the uses of _Where clauses by 3C that you envision at least don't cause errors in the current compiler, then it might make sense to have 3C go ahead and start generating them and let Microsoft finish the implementation later. The compiler's validation has so many holes right now anyway that I wouldn't be worried about adding more holes by adopting _Where clauses early.

mwhicks1 added enhancement New feature or request array bounds code pattern labels May 22, 2021

Machiry self-assigned this May 22, 2021

mattmccutchen-cci changed the title ~~new bounds heuristic?~~ new bounds heuristic: one larger than the largest index accessed in the function? May 22, 2021

Machiry mentioned this issue May 31, 2021

Adding bounds inference of the form count(i+1) #602

Merged

Machiry linked a pull request May 31, 2021 that will close this issue

Adding bounds inference of the form count(i+1) #602

Merged

mattmccutchen-cci mentioned this issue Jun 16, 2021

Handling function pointer bounds #587

Merged

Machiry closed this as completed in #602 Jun 18, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

new bounds heuristic: one larger than the largest index accessed in the function? #599

new bounds heuristic: one larger than the largest index accessed in the function? #599

mwhicks1 commented May 22, 2021

mattmccutchen-cci commented May 22, 2021

Machiry commented Jun 15, 2021

mattmccutchen-cci commented Jun 15, 2021 •

edited

Loading

Machiry commented Jun 15, 2021

mattmccutchen-cci commented Jun 16, 2021

new bounds heuristic: one larger than the largest index accessed in the function? #599

new bounds heuristic: one larger than the largest index accessed in the function? #599

Comments

mwhicks1 commented May 22, 2021

mattmccutchen-cci commented May 22, 2021

Machiry commented Jun 15, 2021

mattmccutchen-cci commented Jun 15, 2021 • edited Loading

Machiry commented Jun 15, 2021

mattmccutchen-cci commented Jun 16, 2021

mattmccutchen-cci commented Jun 15, 2021 •

edited

Loading