Recommendations for a C front end that preserves preprocessor directives

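I want to start a project that involves transforming C code, but I want the preprocessor directives to be included. I don't want to reinvent the wheel by writing my own C parser, so does anyone know of a front end that can parse both C preprocessor directives and C code, and produce an AST from which the original source can be regenerated (or pretty-printed)?

For example: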

    #define FILENAME "filename"
    #include <stdio.h>
    FILE *f=0;
    ...
    if (file_is_open) {
    #ifdef CAN_OPEN_IT
        f = fopen(FILENAME, "r");
    #else
        printf("Unable to open file.\n");
    #endif
    }

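The code above should be parsed into some in-memory representation from which the source can be regenerated. In other words, it should not be processed as ordinary C in two phases, first handling the PP directives and then parsing the pure C code. Instead, it should represent the entire compile-time logic, including preprocessor variables.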
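
To make the requirement concrete, here is a minimal sketch of what such an in-memory representation could look like. Everything in it (the `node` struct, the `NODE_STMT` and `NODE_IFDEF` kinds, the `regenerate` function) is hypothetical and invented for illustration; it is not the API of any existing front end. The point is only that a conditional directive becomes a tree node holding both branches, so the source can be regenerated without losing either one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node kinds: ordinary statements plus one kind for #ifdef. */
enum node_kind { NODE_STMT, NODE_IFDEF };

struct node {
    enum node_kind kind;
    const char *text;                /* NODE_STMT: statement text; NODE_IFDEF: macro name */
    const struct node *then_branch;  /* NODE_IFDEF only */
    const struct node *else_branch;  /* NODE_IFDEF only, may be NULL */
    const struct node *next;         /* next sibling in the sequence, or NULL */
};

/* Append s to buf (current length len, capacity cap), truncating if needed.
   Returns the new length. */
static size_t append(char *buf, size_t len, size_t cap, const char *s) {
    assert(cap > 0 && len < cap);
    size_t n = strlen(s);
    if (n > cap - 1 - len)
        n = cap - 1 - len;
    memcpy(buf + len, s, n);
    buf[len + n] = '\0';
    return len + n;
}

/* Regenerate source text for a node list, keeping BOTH branches of each #ifdef. */
size_t regenerate(const struct node *n, char *buf, size_t len, size_t cap) {
    for (; n != NULL; n = n->next) {
        if (n->kind == NODE_STMT) {
            len = append(buf, len, cap, n->text);
            len = append(buf, len, cap, "\n");
        } else {
            len = append(buf, len, cap, "#ifdef ");
            len = append(buf, len, cap, n->text);
            len = append(buf, len, cap, "\n");
            len = regenerate(n->then_branch, buf, len, cap);
            if (n->else_branch != NULL) {
                len = append(buf, len, cap, "#else\n");
                len = regenerate(n->else_branch, buf, len, cap);
            }
            len = append(buf, len, cap, "#endif\n");
        }
    }
    return len;
}
```

Building a `NODE_IFDEF` node for `CAN_OPEN_IT` with the `fopen` statement as its then-branch and the `printf` statement as its else-branch, and passing it to `regenerate` with an empty buffer, gives back the conditional block from the example with both alternatives intact. A real front end would of course need a full grammar rather than statement strings.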

Take a look at Clang. (See http://clang.llvm.org/features.html#applications.)

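Our DMS Software Reengineering Toolkit has a C front end (and a C++ front end) that:

  • parses (compilable) C source code in a variety of dialects into ASTs,
  • in most cases preserves preprocessor directives as AST nodes,
  • can regenerate compilable C code (with comments and preprocessor directives) from the ASTs,
  • can collect thousands of files into a single image to allow cross-file analysis and transformation,
  • provides full symbol table construction and access,
  • provides procedural access to the ASTs through a large library of AST operations, including navigating, inspecting, inserting, deleting, replacing, matching, …
  • provides source-to-source transformations using patterns, written in C notation, that match against the ASTs.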

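For C (not yet for C++), DMS also provides:

  • control and data flow analysis
  • local and global points-to analysis
  • global call graph construction

DMS has been used to process extremely large C applications, both to extract facts from the original source base and to generate new code derived from it.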

(EDIT: February 2016)

It can handle the OP's example (slightly fixed to make it valid). Here is the slightly modified source:

    #define FILENAME "filename"
    #include <stdio.h>

    FILE *f;
    main() {
      f=0;
    if (file_is_open) {
    #ifdef CAN_OPEN_IT
        f = fopen(FILENAME, "r");
    #else
        printf("Unable to open file.\n");
    #endif
    }
    }

This is the AST it produces:

    C~GCC4 Domain Parser Version 3.0.1(28449)
    Copyright (C) 1996-2013 Semantic Designs, Inc; All Rights Reserved; SD Confidential
    Powered by DMS (R) Software Reengineering Toolkit
    AST Optimizations: remove constant tokens, remove unary productions, compact sequences
    Using encoding Unicode-UTF-8?ANSI +CRLF +1 /^I
    (translation_unit@C~GCC4=2#4a7e0e0^0 Line 1 Column 1 File C:/temp/test.c
     (declaration_seq@C~GCC4=605#4a77580^1#4a7e0e0:1 {4} Line 1 Column 1 File C:/temp/test.c
      (control_line@C~GCC4=1094#4a775c0^1#4a77580:1 Line 1 Column 1 File C:/temp/test.c
       ('#'@C~GCC4=1548#4a771c0^1#4a775c0:1[Keyword:0] Line 1 Column 1 File C:/temp/test.c)'#'
       (IDENTIFIER@C~GCC4=1531#4a77200^1#4a775c0:2[`FILENAME'] Line 1 Column 9 File C:/temp/test.c)IDENTIFIER
       (@C~GCC4=1603#4a77180^2#4a775c0:3#4a7f300:1[`FILENAME'] Line 1 Column 18 File C:/temp/test.c
        $VOID$ [Child 1]
       |(STRING_LITERAL@C~GCC4=1525#4a77160^2#4a77180:2#4a7f300:2[`filename'] Line 1 Column 18 File C:/temp/test.c)STRING_LITERAL
        $VOID$ [Child 3]
       )#4a77180
       (new_line@C~GCC4=1578#4a77260^1#4a775c0:4[Keyword:0] Line 1 Column 28 File C:/temp/test.c)new_line
      )control_line#4a775c0
      (control_line@C~GCC4=1104#4a77460^1#4a77580:2 Line 2 Column 1 File C:/temp/test.c
       ('#'@C~GCC4=1548#4a77340^1#4a77460:1[Keyword:0] Line 2 Column 1 File C:/temp/test.c)'#'
       (ANGLED_HEADER_NAME@C~GCC4=1589#4a77380^1#4a77460:2[`stdio.h'] Line 2 Column 10 File C:/temp/test.c)ANGLED_HEADER_NAME
       (new_line@C~GCC4=1578#4a773c0^1#4a77460:3[Keyword:0] Line 2 Column 19 File C:/temp/test.c)new_line
      )control_line#4a77460
      (simple_declaration@C~GCC4=631#4a774c0^1#4a77580:3 Line 4 Column 1 File C:/temp/test.c
       (IDENTIFIER@C~GCC4=1531#4a77360^1#4a774c0:1[`FILE'] Line 4 Column 1 File C:/temp/test.c)IDENTIFIER
       (declarator@C~GCC4=850#4a77520^1#4a774c0:2 Line 4 Column 6 File C:/temp/test.c
       |(ptr_operator@C~GCC4=866#4a77560^1#4a77520:1 Line 4 Column 6 File C:/temp/test.c)ptr_operator
       |(IDENTIFIER@C~GCC4=1531#4a77480^1#4a77520:2[`f'] Line 4 Column 7 File C:/temp/test.c)IDENTIFIER
       )declarator#4a77520
      )simple_declaration#4a774c0
      (function_definition@C~GCC4=966#4a77be0^1#4a77580:4 Line 5 Column 1 File C:/temp/test.c
       (direct_declarator@C~GCC4=852#4a77440^1#4a77be0:1 Line 5 Column 1 File C:/temp/test.c
       |(IDENTIFIER@C~GCC4=1531#4a774e0^1#4a77440:1[`main'] Line 5 Column 1 File C:/temp/test.c)IDENTIFIER
       |(parameter_declaration_clause@C~GCC4=900#4a77220^1#4a77440:2 Line 5 Column 6 File C:/temp/test.c)parameter_declaration_clause
       )direct_declarator#4a77440
       (compound_statement@C~GCC4=507#4a77b20^1#4a77be0:2 Line 5 Column 8 File C:/temp/test.c
       |(statement_seq@C~GCC4=511#4a77d20^1#4a77b20:1 {2} Line 6 Column 3 File C:/temp/test.c
       | (AMBIGUITY@C~GCC4=1602#4a77680^1#4a77d20:1{2} Line 6 Column 3 File C:/temp/test.c
       |  (expression_statement@C~GCC4=503#4a7e040^1#4a77680:1 Line 6 Column 3 File C:/temp/test.c
       |   (assignment_expression@C~GCC4=457#4a77f00^1#4a7e040:1 Line 6 Column 3 File C:/temp/test.c
       |   |(assignment_target@C~GCC4=470#4a77a00^1#4a77f00:1 Line 6 Column 3 File C:/temp/test.c
       |   | (IDENTIFIER@C~GCC4=1531#4a77400^2#4a77a00:1#4a77fc0:1[`f'] Line 6 Column 3 File C:/temp/test.c)IDENTIFIER
       |   |)assignment_target#4a77a00
       |   |(INT_LITERAL@C~GCC4=1471#4a77a60^2#4a77f00:2#4a77f60:1[0] Line 6 Column 5 File C:/temp/test.c)INT_LITERAL
       |   )assignment_expression#4a77f00
       |  )expression_statement#4a7e040
       |  (simple_declaration@C~GCC4=630#4a7e060^1#4a77680:2 Line 6 Column 3 File C:/temp/test.c
       |   (init_declarator@C~GCC4=835#4a77fc0^1#4a7e060:1 Line 6 Column 3 File C:/temp/test.c
       |   |(IDENTIFIER@C~GCC4=1531#4a77400^2... [ALREADY PRINTED] ...)
       |   |(initializer@C~GCC4=983#4a77f60^1#4a77fc0:2 Line 6 Column 4 File C:/temp/test.c
       |   | (INT_LITERAL@C~GCC4=1471#4a77a60^2... [ALREADY PRINTED] ...)
       |   |)initializer#4a77f60
       |   )init_declarator#4a77fc0
       |  )simple_declaration#4a7e060
       | )AMBIGUITY#4a77680
       | (selection_statement@C~GCC4=527#4a77b40^1#4a77d20:2 Line 7 Column 1 File C:/temp/test.c
       |  (IDENTIFIER@C~GCC4=1531#4a7e0c0^1#4a77b40:1[`file_is_open'] Line 7 Column 5 File C:/temp/test.c)IDENTIFIER
       |  (compound_statement@C~GCC4=507#4a77ae0^1#4a77b40:2 Line 7 Column 19 File C:/temp/test.c
       |   (statement@C~GCC4=490#4a7f840^1#4a77ae0:1 Line 8 Column 1 File C:/temp/test.c
       |   |(if_directive@C~GCC4=1088#4a7f1c0^1#4a7f840:1 Line 8 Column 1 File C:/temp/test.c
       |   | ('#'@C~GCC4=1548#4a7f240^1#4a7f1c0:1[Keyword:0] Line 8 Column 1 File C:/temp/test.c)'#'
       |   | (IDENTIFIER@C~GCC4=1531#4a7ee60^1#4a7f1c0:2[`CAN_OPEN_IT'] Line 8 Column 8 File C:/temp/test.c)IDENTIFIER
       |   | (new_line@C~GCC4=1578#4a7f1e0^1#4a7f1c0:3[Keyword:0] Line 8 Column 19 File C:/temp/test.c)new_line
       |   |)if_directive#4a7f1c0
       |   |(AMBIGUITY@C~GCC4=1602#4a77d40^1#4a7f840:2{2} Line 9 Column 5 File C:/temp/test.c
       |   | (expression_statement@C~GCC4=503#4a7f4a0^1#4a77d40:1 Line 9 Column 5 File C:/temp/test.c
       |   |  (assignment_expression@C~GCC4=457#4a7f3c0^1#4a7f4a0:1 Line 9 Column 5 File C:/temp/test.c
       |   |   (assignment_target@C~GCC4=470#4a7eec0^1#4a7f3c0:1 Line 9 Column 5 File C:/temp/test.c
       |   |   |(IDENTIFIER@C~GCC4=1531#4a7eee0^2#4a7eec0:1#4a7f400:1[`f'] Line 9 Column 5 File C:/temp/test.c)IDENTIFIER
       |   |   )assignment_target#4a7eec0
       |   |   (postfix_expression@C~GCC4=201#4a7f2e0^1#4a7f3c0:2 Line 9 Column 9 File C:/temp/test.c
       |   |   |(IDENTIFIER@C~GCC4=1531#4a7f120^2#4a7f2e0:1#4a7f160:1[`fopen'] Line 9 Column 9 File C:/temp/test.c)IDENTIFIER
       |   |   |(expression_list@C~GCC4=228#4a7f260^2#4a7f2e0:2#4a7f160:2 Line 9 Column 15 File C:/temp/test.c
       |   |   | (@C~GCC4=1607#4a7f300^1#4a7f260:1[`FILENAME'] Line 9 Column 15 File C:/temp/test.c
       |   |   |  (@C~GCC4=1603#4a77180^2... [ALREADY PRINTED] ...)
       |   |   |  (STRING_LITERAL@C~GCC4=1525#4a77160^2... [ALREADY PRINTED] ...)
       |   |   |  $VOID$ [Child 3]
       |   |   |  (STRING_LITERAL@C~GCC4=1525#4a7f2c0^1#4a7f300:4[`filename'] Line 1 Column 18 File C:/temp/test.c)STRING_LITERAL
       |   |   |  $VOID$ [Child 5]
       |   |   | )#4a7f300
       |   |   | (STRING_LITERAL@C~GCC4=1525#4a7f140^1#4a7f260:2[`r'] Line 9 Column 25 File C:/temp/test.c)STRING_LITERAL
       |   |   |)expression_list#4a7f260
       |   |   )postfix_expression#4a7f2e0
       |   |  )assignment_expression#4a7f3c0
       |   | )expression_statement#4a7f4a0
       |   | (simple_declaration@C~GCC4=630#4a7f480^1#4a77d40:2 Line 9 Column 5 File C:/temp/test.c
       |   |  (init_declarator@C~GCC4=835#4a7f400^1#4a7f480:1 Line 9 Column 5 File C:/temp/test.c
       |   |   (IDENTIFIER@C~GCC4=1531#4a7eee0^2... [ALREADY PRINTED] ...)
       |   |   (initializer@C~GCC4=983#4a7f3e0^1#4a7f400:2 Line 9 Column 7 File C:/temp/test.c
       |   |   |(postfix_expression@C~GCC4=201#4a7f160^1#4a7f3e0:1 Line 9 Column 9 File C:/temp/test.c
       |   |   | (IDENTIFIER@C~GCC4=1531#4a7f120^2... [ALREADY PRINTED] ...)
       |   |   | (expression_list@C~GCC4=228#4a7f260^2... [ALREADY PRINTED] ...)
       |   |   |)postfix_expression#4a7f160
       |   |   )initializer#4a7f3e0
       |   |  )init_declarator#4a7f400
       |   | )simple_declaration#4a7f480
       |   |)AMBIGUITY#4a77d40
       |   |(else_directive@C~GCC4=1091#4a7f4c0^1#4a7f840:3 Line 10 Column 1 File C:/temp/test.c
       |   | ('#'@C~GCC4=1548#4a7f500^1#4a7f4c0:1[Keyword:0] Line 10 Column 1 File C:/temp/test.c)'#'
       |   | (new_line@C~GCC4=1578#4a7f4e0^1#4a7f4c0:2[Keyword:0] Line 10 Column 6 File C:/temp/test.c)new_line
       |   |)else_directive#4a7f4c0
       |   |(expression_statement@C~GCC4=503#4a7f7c0^1#4a7f840:4 Line 11 Column 5 File C:/temp/test.c
       |   | (postfix_expression@C~GCC4=201#4a77ba0^1#4a7f7c0:1 Line 11 Column 5 File C:/temp/test.c
       |   |  (IDENTIFIER@C~GCC4=1531#4a7f640^1#4a77ba0:1[`printf'] Line 11 Column 5 File C:/temp/test.c)IDENTIFIER
       |   |  (STRING_LITERAL@C~GCC4=1525#4a77c20^1#4a77ba0:2[`Unable to open file.
    '] Line 11 Column 12 File C:/temp/test.c)STRING_LITERAL
       |   | )postfix_expression#4a77ba0
       |   |)expression_statement#4a7f7c0
       |   |(endif_directive@C~GCC4=1092#4a7f7e0^1#4a7f840:5 Line 12 Column 1 File C:/temp/test.c
       |   | ('#'@C~GCC4=1548#4a7f720^1#4a7f7e0:1[Keyword:0] Line 12 Column 1 File C:/temp/test.c)'#'
       |   | (new_line@C~GCC4=1578#4a7f700^1#4a7f7e0:2[Keyword:0] Line 12 Column 7 File C:/temp/test.c)new_line
       |   |)endif_directive#4a7f7e0
       |   )statement#4a7f840
       |  )compound_statement#4a77ae0
       | )selection_statement#4a77b40
       |)statement_seq#4a77d20
       )compound_statement#4a77b20
      )function_definition#4a77be0
     )declaration_seq#4a77580
    )translation_unit#4a7e0e0

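You can see the preprocessor directive at source line 8 represented as an "if_directive" node.

Yes, DMS can also pretty-print this tree. The following command runs the parser to produce an AST, and then runs the DMS prettyprinter to regenerate the source from the tree alone. The round trip is exact; you can recompile and get the same result. Comments are preserved too.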

    C:\DMS\Domains\C\GCC4\Tools\PrettyPrinter>run domainprettyprinter \temp\test.c
    C~GCC4 PrettyPrinter Version 1.2.13
    Copyright (C) 2004-2013 Semantic Designs, Inc; All Rights Reserved; SD Confidential
    Powered by DMS (R) Software Reengineering Toolkit
    #define FILENAME "filename"
    #include <stdio.h>
    FILE *f;
    main() {
      f = 0;
      if (file_is_open) {
    #ifdef CAN_OPEN_IT
        f = fopen(FILENAME, "r");
    #else
        printf("Unable to open file.\n");
    #endif
      }
    }

You can also see how DMS handles C++. At this point it handles all of C++14 in the GCC and MS dialects.

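With the GNU gcc compiler, the flag needed to preprocess a source file is -E, as in gcc -E mysource.c. As for pretty-printing the result, there is indent, which is a bit old but still worth mentioning. There is also cflow, which can generate a map of the source.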
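
One caveat worth illustrating: preprocessing resolves every conditional before the C parser sees the code, so only one branch of each #ifdef survives. The sketch below is a made-up function (`surviving_statement` is not part of any tool mentioned here) that reuses the two statements from the question as plain strings. It assumes `CAN_OPEN_IT` is not defined on the command line; under that assumption only the #else branch is compiled in.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* CAN_OPEN_IT is intentionally left undefined in this file (an assumption of
   the sketch), so the preprocessor keeps only the #else branch below. */

/* Copies the text of the statement that survived preprocessing into out. */
void surviving_statement(char *out, size_t cap) {
    assert(out != NULL && cap > 0);
#ifdef CAN_OPEN_IT
    const char *kept = "f = fopen(FILENAME, \"r\");";
#else
    const char *kept = "printf(\"Unable to open file.\\n\");";
#endif
    strncpy(out, kept, cap - 1);
    out[cap - 1] = '\0';   /* strncpy does not terminate on truncation */
}
```

Running this file through gcc -E shows the same effect at the source level: the branch guarded by CAN_OPEN_IT is no longer present in the output, which is why a two-phase approach cannot regenerate the original conditional.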

Sorry if I have misunderstood what you are looking for…

Hope this helps, best regards, Tom.

You could take a look at http://www.antlr.org/wiki/display/ANTLR3/ANTLR3+Code+Generation+-+C