Frama-C anagram函数行为validation

我写了一个C函数,检查两个给定的字符串(C风格)是否是字谜。 我尝试用Frama-Cvalidation它,但它不能validation函数的最终行为(其他规范是有效的)。 第一个进入超时(即使WP中的超时值非常高),第二个未知。

这是代码:

#include  //@ ghost char alphabet[26] = {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'}; /*@ // Takes a character and return it to lowercase if it's uppercase axiomatic ToLower { logic char to_lower(char c); axiom lowercase: \forall char c; 97 <= c  to_lower(c) == c; axiom uppercase: \forall char c; 65 <= c  to_lower(c) == to_lower((char) (c+32)); } */ /*@ // Count the occurences of character 'c' into 'string' that is long 'n' characters axiomatic CountChar { logic integer count_char(char* string, integer n, char c); axiom count_zero: \forall char* string, integer n, char c; n  count_char(string, n, c) == 0; axiom count_hit: \forall char* string, integer n, char c; n >= 0 && to_lower(string[n]) == c ==> count_char(string, n+1, c) == count_char(string, n, c) + 1; axiom count_miss: \forall char* string, integer n, char c; n >= 0 && to_lower(string[n]) != c ==> count_char(string, n+1, c) == count_char(string, n, c); } */ /*@ predicate are_anagrams{L}(char* s1, char* s2) = ( \forall integer i; 0 <= i  count_char(s1, strlen(s1), alphabet[i]) == count_char(s2, strlen(s2), alphabet[i]) ); */ /*@ requires valid_string(a); requires valid_string(b); // Requires that strings 'a' and 'b' are composed only by alphabet's letters and that are long equally. requires \forall integer k; 0 <= k  65 <= a[k] <= 90 || 97 <= a[k] <= 122; requires \forall integer k; 0 <= k  65 <= b[k] <= 90 || 97 <= b[k] <= 122; requires strlen(a) == strlen(b); ensures 0 <= \result <= 1; assigns \nothing; behavior anagrams: assumes are_anagrams(a, b); ensures \result == 1; behavior not_anagrams: assumes !are_anagrams(a, b); ensures \result == 0; complete behaviors anagrams, not_anagrams; disjoint behaviors anagrams, not_anagrams; */ int check_anagram(const char a[], const char b[]) { // Create two arrays and initialize them to zero int first[26]; int second[26]; int c; /*@ loop assigns first[0..(c-1)]; loop assigns second[0..(c-1)]; loop assigns c; loop invariant 0 <= c <= 26; loop invariant \forall integer k; 0 <= k  second[k] == first[k]; loop invariant \forall integer k; 0 <= k  first[k] == 0 && second[k] == 0; loop invariant \valid(first+(0..25)) && \valid(second+(0..25)); loop variant 26-c; */ for(c = 0; c < 26; c++) { first[c] = 0; second[c] = 0; } char tmp = 'a'; c = 0; // Now increment the array position related to position of character occured in the alphabet, subtracting ASCII decimal value of character from the character. /*@ loop assigns first[0..25]; loop assigns tmp; loop assigns c; loop invariant 97 <= tmp <= 122; loop invariant \valid(first+(0..25)); loop invariant strlen(\at(a, Pre)) == strlen(\at(a, Here)); loop invariant 0 <= c  64 && a[c] < 91) ? a[c]+32 : a[c]; first[tmp-97]++; c++; } c = 0; // Doing the same thing on second string. /*@ loop assigns second[0..25]; loop assigns tmp; loop assigns c; loop invariant 97 <= tmp <= 122; loop invariant \valid(second+(0..25)); loop invariant strlen(\at(b, Pre)) == strlen(\at(b, Here)); loop invariant 0 <= c  64 && b[c] < 91) ? b[c]+32 : b[c]; second[tmp-'a']++; c++; } // And now compare the arrays containing the number of occurences to determine if strings are anagrams or not. /*@ loop invariant strlen(\at(a, Pre)) == strlen(\at(a, Here)); loop invariant strlen(\at(b, Pre)) == strlen(\at(b, Here)); loop invariant 0 <= c <= 26; loop assigns c; loop variant 26-c; */ for (c = 0; c < 26; c++) { if (first[c] != second[c]) return 0; } return 1; } 

你的规格一见钟情似乎是正确的(但那时它是一个非常复杂的规范。我从未写过任何复杂的ACSL,我可能会遗漏一些东西)。

但是,函数check_anagram的注释显然不足以解释为什么这个函数应该尊重契约。 特别是,考虑while循环。 为了真实地了解函数的工作原理,每个循环的不变量应该表明,在任何迭代中,数组的firstsecond包含到目前为止访问的第一个和第二个字符串的字符数。

这就是为什么在每个循环结束时,这些数组包含整个字符串的字符数。

表达这些不变量将真正显示该函数的工作原理。 没有它们,就没有希望得出合同得以实施的结论。

我不是静态分析方面的专家,但我怀疑一些静态分析引擎可能会扼杀诸如(a[c] > 64 && a[c] < 91)a[c]+32first[tmp-97]和你在这里使用的其他特定于ASCII的代码。

请记住,C不需要ASCII字符集; 就我们所知,你可能试图在EBCDIC是字符集的情况下运行它,在这种情况下,我希望可能存在缓冲区溢出,具体取决于输入。

您应该使用查找表(或某种类型的字典)将每个字符转换为整数索引,并使用touppertolower等函数来转换unsigned char值(请注意此处unsigned char的重要性)。