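A better way to split a string into an array of strings in C/C++ using whitespace as the delimiter

Sorry, my C/C++ is not that good, but the following existing code looks like garbage even to me. It also has a bug: it fails when str = "07/02/2010" is terminated by '\0'. I think that instead of fixing the bug, it could just be rewritten. In Python it is just 'kas\nhjkfh kjsdjkasf'.split(). I know this is C-ish code, but splitting a string cannot be that complicated! Sticking to the same signature, and without using extra libraries, how can I improve it and make it short and sweet? I can tell that this code smells, for example because of the else clause all the way at the end.

The line that fails: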

 _tcsncpy_s( s.GetBuffer((int) (nIndex-nLast)), nIndex-nLast, psz+nLast, (size_t) (nIndex-nLast) );

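When the string "07/02/2010" is terminated by '\0', this call tries to write 11 characters (the 10 characters of the date plus the terminating null) into a buffer that is only 10 characters long.

The full function: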

 #define
 // This will return the text string as a string array
 // This function is called from SetControlText to parse the
 // text string into an array of CStrings that the control
 // Gadgets will attempt to interpret
 BOOL CLVGridDateTimeCtrl::ParseTextWithCurrentFormat(const CString& str, const CGXStyle* pOldStyle, CStringArray& strArray )
 {
     // Unused:
     pOldStyle;

     // we assume that the significant segments are separated by space
     // Please change m_strDelim to add other delimiters
     CString s;
     LPCTSTR psz = (LPCTSTR) str;
     BOOL bLastCharSpace = FALSE;
     DWORD size = str.GetLength()+1;

     // (newline will start a new row, tab delimiter will
     // move to the next column).
     // parse buffer (DBCS aware)
     for (DWORD nIndex = 0, nLast = 0; nIndex < size; nIndex += _tclen(psz+nIndex))
     {
         // check for a delimiter
         if (psz[nIndex] == _T('\0') || _tcschr(_T("\r\n"), psz[nIndex]) || _tcschr(_T(" "), psz[nIndex])
             ||!_tcscspn(&psz[nIndex], (LPCTSTR)m_strDelim))
         {
             s.ReleaseBuffer();
             s.Empty();
             // abort parsing the string if next char
             // is an end-of-string
             if (psz[nIndex] == _T('\0'))
             {
                 if (psz[nIndex] == _T('\r') && psz[nIndex+1] == _T('\n'))
                     nIndex++;
                 _tcsncpy_s(s.GetBuffer((int) (nIndex-nLast)), nIndex-nLast, psz+nLast, (size_t) (nIndex-nLast));
                 CString temStr = s;
                 strArray.Add(temStr);
                 temStr.Empty();
                 break;
             }
             else if (_tcscspn(&psz[nIndex], (LPCTSTR)m_strDelim) == 0 && !bLastCharSpace)
             {
                 if (psz[nIndex] == _T('\r') && psz[nIndex+1] == _T('\n'))
                     nIndex++;
                 _tcsncpy_s(s.GetBuffer((int) (nIndex-nLast)), nIndex-nLast, psz+nLast, (size_t) (nIndex-nLast));
                 CString temStr = s;
                 strArray.Add(temStr);
                 temStr.Empty();
                 bLastCharSpace = TRUE;
                 // abort parsing the string if next char
                 // is an end-of-string
                 if (psz[nIndex+1] == _T('\0'))
                     break;
             }
             // Now, that the value has been copied to the cell,
             // let's check if we should jump to a new row.
             else if (_tcschr(_T(" "), psz[nIndex]) && !bLastCharSpace)
             {
                 if (psz[nIndex] == _T('\r') && psz[nIndex+1] == _T('\n'))
                     nIndex++;
                 _tcsncpy_s(s.GetBuffer((int) (nIndex-nLast)), nIndex-nLast, psz+nLast, (size_t) (nIndex-nLast));
                 CString temStr = s;
                 strArray.Add(temStr);
                 temStr.Empty();
                 bLastCharSpace = TRUE;
                 // abort parsing the string if next char
                 // is an end-of-string
                 if (psz[nIndex+1] == _T('\0'))
                     break;
             }
             nLast = nIndex + _tclen(psz+nIndex);
         }
         else
         {
             // nLast = nIndex + _tclen(psz+nIndex);
             bLastCharSpace = FALSE;
         }
     }

     if (strArray.GetSize())
         return TRUE;
     else
         return FALSE;
 }

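Edit: m_strDelim = _T(","); this member variable is used only in this function. I think I now see the point of the tokenization: it tries to parse a date and a time... wait, there's more! Below is the code that calls this function. Please help me improve it as well. Some of my colleagues claim that C# does not make them any more productive than C++. I used to feel like an idiot because I could not say the same for myself.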

 // SetControlText will attempt to convert the text to a valid date first with
 // the help of COleDateTime and then with the help of the Date control and the
 // current format
 BOOL CLVGridDateTimeCtrl::ConvertControlTextToValue(CString& str, ROWCOL nRow, ROWCOL nCol, const CGXStyle* pOldStyle)
 {
     CGXStyle* pStyle = NULL;
     BOOL bSuccess = FALSE;

     if (pOldStyle == NULL)
     {
         pStyle = Grid()->CreateStyle();
         Grid()->ComposeStyleRowCol(nRow, nCol, pStyle);
         pOldStyle = pStyle;
     }

     // allow only valid input
     {
         // First do this
         CLVDateTime dt;
         if (str.IsEmpty())
         {
             ;
             // if (Grid()->IsCurrentCell(nRow, nCol))
             //     Reset();
             bSuccess = TRUE;
         }
         else if (dt.ParseDateTime(str,CLVGlobals::IsUSDateFormat()) && (DATE) dt != 0)
         {
             SetDateTime(dt);
             if (m_bDateValueAsNumber)
                 str.Format(_T("%g"), (DATE) dt);
             else
                 str = dt.Format();
             bSuccess = TRUE;
         }
         else
         {
             // parse the string using the current format
             CStringArray strArray;
             if (!ParseTextWithCurrentFormat(str, pOldStyle, strArray))
                 return FALSE;

             UpdateNullStatus(m_TextCtrlWnd);
             SetFormat(m_TextCtrlWnd, *pOldStyle);
             int nArrIndex = 0;
             for(int i=0; iGetValue();
                 // s.Empty();
                 if(m_TextCtrlWnd.m_gadgets[i]->IsKindOf(RUNTIME_CLASS(SECDTNumericGadget)))
                 {
                     // TRACE(_T("The value %s\n"), strArray[nArrIndex]);
                     ((CLVDTNumericGadget*)m_TextCtrlWnd.m_gadgets[i])->m_nNewValue = _ttoi(strArray[nArrIndex]);
                     nArrIndex++;
                     if (nArrIndex>strArray.GetUpperBound())
                         break;
                 }
                 else if(m_TextCtrlWnd.m_gadgets[i]->IsKindOf(RUNTIME_CLASS(SECDTListGadget)) && val!=-1)
                 {
                     int nIndex = ((CLVDTListGadget*)m_TextCtrlWnd.m_gadgets[i])->FindMatch(strArray[nArrIndex], ((CLVDTListGadget*)m_TextCtrlWnd.m_gadgets[i])->GetValue()+1);
                     if (nIndex!=-1)
                     {
                         // TRACE(_T("The value %s\n"), strArray[nArrIndex]);
                         ((CLVDTListGadget*)m_TextCtrlWnd.m_gadgets[i])->SetValue(nIndex);
                         nArrIndex++;
                         if (nArrIndex>strArray.GetUpperBound())
                             break;
                     }
                 }
                 CLVDBValue dbDate = m_TextCtrlWnd.GetDateTime();
                 if (dbDate.IsNull())
                     str = _T("");
                 else
                 {
                     CLVDateTime dt = (CLVDateTime)dbDate;
                     if (m_bDateValueAsNumber)
                         str.Format(_T("%g"), (DATE) dt);
                     else
                         str = dt.Format();
                 }
             }
             bSuccess = TRUE;
         }
     }

     if (pStyle)
         Grid()->RecycleStyle(pStyle);
     return bSuccess;
 }

The String Toolkit Library (Strtk) offers the following solution to your problem:

 #include <string>
 #include <deque>
 #include "strtk.hpp"

 int main()
 {
     std::string data("kas\nhjkfh kjsdjkasf");
     std::deque<std::string> str_list;
     // split on comma, space, CR and LF
     strtk::parse(data, ", \r\n", str_list);
     return 0;
 }

More examples can be found here.

In C++, it is probably easiest to use a stringstream:

 std::istringstream buffer("kas\nhjkfh kjsdjkasf");
 std::vector<std::string> strings;
 std::copy(std::istream_iterator<std::string>(buffer),
           std::istream_iterator<std::string>(),
           std::back_inserter(strings));

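I have not tried to stick to exactly the same signature, mostly because most of it is non-standard, so it does not apply to C++ in general.

Another possibility is Boost::tokenizer, but that obviously does involve another library, so I won't try to cover it in more detail.

I'm not sure whether this qualifies as "bizarro syntax" or not. I may have to work on that a bit...

Edit: I've got it. Initialize the vector instead: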

 std::istringstream buffer("kas\nhjkfh kjsdjkasf");
 std::vector<std::string> strings(
     (std::istream_iterator<std::string>(buffer)),
     std::istream_iterator<std::string>());

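The "bizarro" part is that without the extra parentheses around the first argument, this would invoke the "most vexing parse", so it would declare a function instead of defining a vector. :-)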
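For anyone who wants to try this as a complete unit, here is a minimal sketch that wraps the stream approach in a helper. The name split_whitespace is made up for illustration and is not part of any library; it works on std::string rather than CString, so it does not keep the original signature.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (the name is illustrative only): split on runs of
// whitespace, much like Python's str.split() with no arguments.
std::vector<std::string> split_whitespace(const std::string& text)
{
    std::istringstream buffer(text);
    // The extra parentheses around the first argument keep this line from
    // being parsed as a function declaration (the "most vexing parse").
    std::vector<std::string> tokens(
        (std::istream_iterator<std::string>(buffer)),
        std::istream_iterator<std::string>());
    return tokens;
}
```

Calling split_whitespace("kas\nhjkfh kjsdjkasf") should give the three tokens kas, hjkfh and kjsdjkasf. With a C++11 compiler, writing the iterators with braces, as in std::istream_iterator<std::string>{buffer}, avoids the vexing parse without the extra parentheses.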

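Edit 2: As far as the edit to the question goes, it seems nearly impossible to answer directly, because it depends on too many types (e.g., CGXStyle, CLVDateTime) that are neither standard nor explained. For one, I can't really follow it in detail. Apart from that, this looks like a rather poor design: letting the user enter something more or less ambiguous and then trying to sort out the mess. It would be much better to use a control that only allows unambiguous input to begin with, so you can read the fields containing the date and time directly.

Edit 3: Code to do the splitting that also treats commas as delimiters could be written like this: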

 #include <algorithm>
 #include <iostream>
 #include <iterator>
 #include <locale>
 #include <sstream>
 #include <string>
 #include <vector>

 class my_ctype : public std::ctype<char> {
 public:
     // static, so it can safely be called before the base class is constructed
     static mask const *get_table() {
         // this copies the "classic" table used by the C locale:
         static std::vector<std::ctype<char>::mask>
             table(classic_table(), classic_table()+table_size);

         // Anything we want to separate tokens, we mark its spot in the table as 'space'.
         table[','] = (mask)space;

         // and return a pointer to the table:
         return &table[0];
     }

     my_ctype(size_t refs=0) : std::ctype<char>(get_table(), false, refs) { }
 };

 int main() {
     // put our data in a stream:
     std::istringstream buffer("first kas\nhjkfh kjsdjk,asf\tlast");

     // Create a ctype object and tell the stream to use it for parsing tokens.
     // With refs == 0 the locale owns the facet and deletes it, so it must be
     // allocated with new rather than placed on the stack:
     buffer.imbue(std::locale(std::locale(), new my_ctype));

     // separate the stream into tokens:
     std::vector<std::string> strings(
         (std::istream_iterator<std::string>(buffer)),
         std::istream_iterator<std::string>());

     // copy the tokens to cout so we can see what we got:
     std::copy(strings.begin(), strings.end(),
               std::ostream_iterator<std::string>(std::cout, "\n"));
     return 0;
 }

The best way is to use strtok. The link should be self-explanatory as to how to use it, and you can use multiple delimiters as well. A very handy C function.
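Since the link is not reproduced here, the following is a rough sketch of how strtok might be used for this. The helper name split_strtok is an assumption made for illustration, and the result goes into a std::vector rather than a CStringArray.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (the name is illustrative only): tokenize with
// std::strtok, treating every character in delims as a separator.
std::vector<std::string> split_strtok(const std::string& text, const char* delims)
{
    // strtok overwrites delimiters with '\0', so it must run on a private,
    // writable, null-terminated copy of the input.
    std::vector<char> buf(text.begin(), text.end());
    buf.push_back('\0');

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(&buf[0], delims);
         tok != NULL;
         tok = std::strtok(NULL, delims))
    {
        tokens.push_back(tok);
    }
    return tokens;
}
```

For example, split_strtok("07/02/2010 12:30,PM", " ,\r\n") should produce 07/02/2010, 12:30 and PM. Two caveats worth keeping in mind: strtok keeps hidden static state between calls, so it is not reentrant, and it treats a run of adjacent delimiters as a single separator, so empty fields are never returned.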

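A very straightforward way to sort this problem out is to use the Qt libraries. If you are using KDE, then they are already installed. The QString class has a member function split that works just like the Python version. For example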

 QString("This is a string").split(" ", QString::SkipEmptyParts)

returns a QStringList of QStrings:

 ["This", "is", "a", "string"]

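(in Pythonic syntax). Note that the second argument is needed; otherwise, when words are separated by several spaces, an empty string is returned for each extra space. (In Qt 5.14 and later the flag is spelled Qt::SkipEmptyParts.)

In general, I find that with the help of the Qt libraries I get most of the simplicity of Python, e.g. simple string parsing and list iteration, handled with ease and with the power of C++.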

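Parsing strings in C/C++ is rarely a simple matter. The method you posted looks like it has quite a bit of "history" behind it. For example, you state that you want to split the string on whitespace, but the method itself appears to use the member variable m_strDelim as part of the splitting decision. Simply replacing the method could lead to other unexpected problems.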

Using an existing tokenizer class (such as this Boost library) could simplify things quite a bit.
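To make the point about m_strDelim concrete, here is a minimal standard-library sketch in which the delimiter set is an explicit parameter. It assumes the delimiters are space, CR, LF plus whatever m_strDelim holds (a comma, according to the question); the name split_any is hypothetical, and because it uses std::string rather than CString it is not a drop-in replacement for the posted method.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (the name is illustrative only): split text on any
// character found in delims, skipping runs of consecutive delimiters.
std::vector<std::string> split_any(const std::string& text, const std::string& delims)
{
    std::vector<std::string> tokens;
    std::string::size_type start = text.find_first_not_of(delims);
    while (start != std::string::npos) {
        // end is npos when the last token runs to the end of the string
        std::string::size_type end = text.find_first_of(delims, start);
        std::string::size_type count =
            (end == std::string::npos) ? std::string::npos : end - start;
        tokens.push_back(text.substr(start, count));
        start = text.find_first_not_of(delims, end);
    }
    return tokens;
}
```

With this sketch, split_any("07/02/2010", " \r\n,") should return the single token 07/02/2010. Because the lengths are computed by std::string, no terminator has to be counted by hand, which is where the posted code goes wrong.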

You could use boost::algorithm::split, i.e.:

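 #include <string>
 #include <vector>
 #include <boost/algorithm/string.hpp>

 std::string myString;
 std::vector<std::string> splitStrings;
 // split on space, CR and LF
 boost::algorithm::split(splitStrings, myString, boost::is_any_of(" \r\n"));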

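A better approach than my other answer: the regular expression facilities of TR1. Here is a small tutorial to get you started. This answer is C++ and uses regular expressions (which is perhaps the best/easiest way to split a string); I have recently used them myself, so I know they are a good tool.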
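As a rough sketch of what that could look like, the following uses std::regex, the C++11 successor to the TR1 regex classes, together with std::sregex_token_iterator. The helper name split_regex and the separator pattern are assumptions chosen for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (the name is illustrative only): split text wherever
// the regular expression in pattern matches.
std::vector<std::string> split_regex(const std::string& text, const std::string& pattern)
{
    const std::regex separator(pattern);
    std::vector<std::string> tokens;

    // Submatch index -1 selects the pieces *between* matches of the separator.
    std::sregex_token_iterator it(text.begin(), text.end(), separator, -1);
    const std::sregex_token_iterator end;
    for (; it != end; ++it) {
        // A separator at the very start yields an empty first piece; skip it.
        if (it->length() != 0)
            tokens.push_back(it->str());
    }
    return tokens;
}
```

For example, split_regex("kas\nhjkfh kjsdjk,asf", "[ \t\r\n,]+") should return kas, hjkfh, kjsdjk and asf. Constructing a std::regex is relatively expensive, so for repeated calls it may be worth building the separator once and reusing it.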
