How do I alter this tokenization process to work on a text file with multiple lines?

andandandand Source

I'm working with this source code:

#include <string>
#include <vector>
#include <iostream>
#include <istream>
#include <ostream>
#include <iterator>
#include <sstream>
#include <algorithm>

int main()
{
  std::string str = "The quick brown fox";

  // construct a stream from the string
  std::stringstream strstr(str);

  // use stream iterators to copy the stream to the vector as whitespace separated strings
  std::istream_iterator<std::string> it(strstr);
  std::istream_iterator<std::string> end;
  std::vector<std::string> results(it, end);

  // send the vector to stdout.
  std::ostream_iterator<std::string> oit(std::cout);
  std::copy(results.begin(), results.end(), oit);
}

I want to change it so that, instead of tokenizing a single line and putting it into the vector results, it tokenizes a group of lines taken from this text file and puts the resulting words into a single vector.

Text File:
Munroe states there is no particular meaning to the name and it is simply a four-letter word without a phonetic pronunciation, something he describes as "a treasured and carefully-guarded point in the space of four-character strings." The subjects of the comics themselves vary. Some are statements on life and love (some love strips are simply art with poetry), and some are mathematical or scientific in-jokes.

So far, I'm only clear that I need to use a

while (getline(streamOfText, readTextLine)){} 

to get the loop running.

But I don't think this would work:

while (getline(streamOfText, readTextLine)) {
  cout << readTextLine << endl;

  // construct a stream from the string
  std::stringstream strstr(readTextLine);

  // use stream iterators to copy the stream to the vector as whitespace separated strings
  std::istream_iterator<std::string> it(strstr);
  std::istream_iterator<std::string> end;
  std::vector<std::string> results(it, end);

  /* HOW CAN I MAKE THIS INSIDE THE LOOP WITHOUT RE-DECLARING AND USING THE CONSTRUCTORS FOR THE ITERATORS AND VECTOR? */

  // send the vector to stdout.
  std::ostream_iterator<std::string> oit(std::cout);
  std::copy(results.begin(), results.end(), oit);
}
c++ iterator token

Answers

answered 9 years ago Johannes Schaub - litb #1

Yes, then you have one whole line in readTextLine. Is that what you wanted in that loop? If so, instead of constructing the vector from the istream iterators, define the vector outside the loop and copy into it:

std::vector<std::string> results;
while (getline(streamOfText, readTextLine)){
    std::istringstream strstr(readTextLine);
    std::istream_iterator<std::string> it(strstr), end;
    std::copy(it, end, std::back_inserter(results));
}
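
For reference, here is a minimal self-contained sketch of that loop, wrapped in a function so it can be tried on any input stream. The function name tokenizeLines is made up for illustration; streamOfText and readTextLine are the names used in the question.

```cpp
#include <algorithm>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): collects every
// whitespace-separated word of every line into one vector.
std::vector<std::string> tokenizeLines(std::istream& streamOfText)
{
    std::vector<std::string> results;   // declared once, outside the loop
    std::string readTextLine;
    while (std::getline(streamOfText, readTextLine)) {
        // a fresh stream per line; any per-line processing could go here
        std::istringstream strstr(readTextLine);
        std::istream_iterator<std::string> it(strstr), end;
        std::copy(it, end, std::back_inserter(results));
    }
    return results;
}
```

Passing an opened std::ifstream (or std::cin) as the argument should work the same way, since both are std::istream objects. Empty lines simply contribute no words.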

You actually don't need to read a line into the string first if all you need is all the words from a stream and no per-line processing. Just read from the file stream directly, the same way your code reads from the stringstream. Because operator>> treats newlines as ordinary whitespace, this reads words not just from one line but from the whole stream, until end-of-file:

std::istream_iterator<std::string> it(streamOfText), end;
std::vector<std::string> results(it, end);
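
Applied to an actual file, this could look roughly like the sketch below. The function name readWords and its path parameter are assumptions for illustration; the question does not say how streamOfText is opened.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: reads all whitespace-separated words of a file.
// If the file cannot be opened, the first read fails, the iterator
// compares equal to the end iterator, and the result is an empty vector.
std::vector<std::string> readWords(const std::string& path)
{
    std::ifstream streamOfText(path.c_str());
    std::istream_iterator<std::string> it(streamOfText), end;
    return std::vector<std::string>(it, end);
}
```

Note that an empty result does not distinguish a missing file from an empty one; checking streamOfText.is_open() before reading would be one way to tell them apart.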

To do all that manually, like you ask for in the comments, do

std::vector<std::string> results;
std::istream_iterator<std::string> it(streamOfText), end;
while(it != end) results.push_back(*it++);

I recommend reading a good book on this. I think it will show you many more useful techniques. The C++ Standard Library by Josuttis is a good one.
