
C++ Learning Community > C++ Tips > Scanning a file


Title: Scanning a file
Description: Using algorithms from STL


myork - March 2, 2006 02:33 PM (GMT)
AquaFox just had a great example where he encrypted a file.

But most of the work in the program went into writing a loop that correctly iterates through the input file one character at a time. Although it was done correctly (after a couple of tries :-), it shows how the actual algorithm gets obscured by boilerplate looping code.

The C++ committee (or should I say the STL sub-committee) recognised this, and as part of the STL we have std::istream_iterator<> (and its sibling std::istreambuf_iterator<>) for reading a file and std::ostream_iterator<> for writing one. Even if you do not want to use the std:: algorithms, you can use these iterators in your own for loop.
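To see how the two input iterators differ, here is a small sketch that runs the same hand-written for loop over each (the helper names readRaw and readFormatted are made up for illustration). istreambuf_iterator<char> hands back every character from the stream buffer, while istream_iterator<char> goes through operator>>, which skips whitespace by default.

```cpp
#include <istream>
#include <iterator>
#include <sstream>   // std::istringstream, handy for trying this without a file
#include <string>

// Loops over istreambuf_iterator<char>: every character from the stream
// buffer is returned, whitespace included.
std::string readRaw(std::istream& in)
{
    std::string result;
    std::istreambuf_iterator<char> loop(in);
    std::istreambuf_iterator<char> end;   // default-constructed = end of stream
    for (; loop != end; ++loop)
        result += *loop;
    return result;
}

// Same loop over istream_iterator<char>: each character goes through
// operator>>, which skips whitespace by default.
std::string readFormatted(std::istream& in)
{
    std::string result;
    std::istream_iterator<char> loop(in);
    std::istream_iterator<char> end;
    for (; loop != end; ++loop)
        result += *loop;
    return result;
}
```

For the input "a b\nc", readRaw returns "a b\nc" while readFormatted returns "abc", which is why the copy program uses istreambuf_iterator.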

CODE


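#include <fstream>
#include <iterator>

int main(int argc,char* argv[])
{
   // Usage: program <input file> <output file>
   if (argc != 3)
   {
       return(1);
   }

   std::ifstream       in(argv[1]);
   std::ofstream       out(argv[2]);

   if (in && out)
   {
       // istreambuf_iterator reads raw characters (whitespace included);
       // a default-constructed one marks the end of the stream.
       std::istreambuf_iterator<char>  loop(in);
       std::istreambuf_iterator<char>  end;

       std::ostream_iterator<char>     outIter(out);

       for(;loop != end;++loop)
       {
           (*outIter)  = (*loop);
           ++outIter;
       }
   }
   return(0);
}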


Now start throwing std algorithms into the mix and we are rocking.

But here is the real tip: don't read a file with a hand-written C-style loop. There are so many little things that can go wrong. The stream iterators give you a clean interface that loops through a file very simply.
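As a first taste of mixing in the algorithms, here is a sketch that counts the lines in a stream by passing a pair of istreambuf_iterators straight to std::count, with no loop at all (countLines is a hypothetical helper, not a library function).

```cpp
#include <algorithm>
#include <istream>
#include <iterator>
#include <sstream>   // std::istringstream, for trying it on an in-memory string

// Counts '\n' characters: std::count walks the stream through a pair of
// istreambuf_iterators, so there is no hand-written loop.
long countLines(std::istream& in)
{
    return static_cast<long>(std::count(std::istreambuf_iterator<char>(in),
                                        std::istreambuf_iterator<char>(),
                                        '\n'));
}
```

It counts newline characters, so a last line with no trailing '\n' is not counted.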

Advanced Tip:
Note that above I use:
CODE
std::istreambuf_iterator<char>
That is because I wanted to read every character one at a time. But if I know that the file is full of space-separated integers, then I can simply change the iterator to:
CODE
std::istream_iterator<int>
Now it will read one integer at a time (via operator>>, which skips the whitespace between the values).
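The integer version plugs straight into std::accumulate. A minimal sketch, assuming the input is whitespace-separated integers (sumIntegers is a hypothetical name): reading stops at the first token that is not an integer, because the iterator then compares equal to the end iterator.

```cpp
#include <istream>
#include <iterator>
#include <numeric>
#include <sstream>   // std::istringstream, for trying it without a file

// Sums whitespace-separated integers; istream_iterator<int> calls operator>>
// for each value and becomes equal to the end iterator once extraction fails.
long sumIntegers(std::istream& in)
{
    return std::accumulate(std::istream_iterator<int>(in),
                           std::istream_iterator<int>(),
                           0L);
}
```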

AquaFox - March 2, 2006 02:43 PM (GMT)
Sticky please.

myork - March 2, 2006 06:23 PM (GMT)
Let's extend the concept to more customised data.
There was a recent question about reading User Data from a file:

We can use the same technique for this:

Here we read the UserData file.
And print the username to the standard output.
CODE

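#include <fstream>
#include <iterator>
#include <iostream>

struct UserData
{
   int    id;
   char username[256];
   char password[256];
};

// Declared here so istream_iterator<UserData> can use it;
// the definition is given further down in this post.
std::istream& operator>>(std::istream& s,UserData& x);

int main(int argc,char* argv[])
{
  if (argc != 2)
  {
      return(1);
  }

  // The records are raw binary, so open the file in binary mode.
  std::ifstream       in(argv[1],std::ios::binary);

  if (in)
  {
      std::istream_iterator<UserData>  loop(in);
      std::istream_iterator<UserData>  end;

      std::ostream_iterator<const char*>     outIter(std::cout,"\n");

      for(;loop != end;++loop)
      {
          (*outIter)  = loop->username;
          ++outIter;
      }
  }
  return(0);
}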


Notice that the main thing that changed was the template parameter of the stream iterators. To use this method, though, we need to define 'operator>>()' for the objects we read and 'operator<<()' for the objects we write.

In this example we are reading UserData, so we need to define 'std::istream& operator>>(std::istream& s,UserData& obj)'. We are writing a 'const char*' (the username), and there is already an operator<< defined for that, so no work is needed there.

So, borrowing the code from the original problem, we can define the operator as:
CODE
std::istream& operator>>(std::istream& s,UserData& x)
{
   /*
    * Just for the record this is not how I would do it.
    * But I wanted to retain backward compatibility with the original code.
    *
    * But note, because of endianness (and because sizeof(int) can vary)
    * the integer value is not guaranteed portable between different systems.
    */
   s.read(reinterpret_cast<char*>(&x.id),sizeof(x.id));
   s.read(x.username,256);
   s.read(x.password,256);
   return(s);
}
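Putting the pieces together, here is a self-contained sketch of the same read that collects the usernames into a std::vector instead of printing them (readUserNames is a hypothetical helper). It keeps the post's fixed binary layout and assumes each username is NUL-terminated within its 256 bytes; a trailing partial record makes the read fail, so it is silently dropped.

```cpp
#include <istream>
#include <iterator>
#include <sstream>   // std::istringstream, for trying it on an in-memory buffer
#include <string>
#include <vector>

struct UserData
{
    int  id;
    char username[256];
    char password[256];
};

// Same fixed-size binary layout as the post: raw int, then two 256-byte fields.
std::istream& operator>>(std::istream& s, UserData& x)
{
    s.read(reinterpret_cast<char*>(&x.id), sizeof(x.id));
    s.read(x.username, sizeof(x.username));
    s.read(x.password, sizeof(x.password));
    return s;
}

// Hypothetical helper: collects every username found in the stream.
std::vector<std::string> readUserNames(std::istream& in)
{
    std::vector<std::string> names;
    std::istream_iterator<UserData> loop(in);
    std::istream_iterator<UserData> end;
    for (; loop != end; ++loop)
        names.push_back(loop->username);   // assumes username is NUL-terminated
    return names;
}
```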

Viper - March 2, 2006 06:35 PM (GMT)
My preferred way to read a whole file pretty fast..
CODE
std::stringstream sstream;
sstream << file.rdbuf();
std::string filecontents = sstream.str();
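For comparison, another common idiom builds the string directly from a pair of istreambuf_iterators, skipping the intermediate stringstream. This is a sketch (slurp is a hypothetical name); it still holds the whole file in memory, so the same trade-offs apply.

```cpp
#include <istream>
#include <iterator>
#include <sstream>   // std::istringstream / std::stringstream, for trying it in memory
#include <string>

// Builds the string straight from the stream buffer; like the stringstream
// version, the whole content ends up in memory.
std::string slurp(std::istream& in)
{
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```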

myork - March 2, 2006 06:52 PM (GMT)
QUOTE (Viper @ Mar 2 2006, 01:35 PM)
My preferred way to read a whole file pretty fast..
CODE
std::stringstream sstream;
sstream << file.rdbuf();
std::string filecontents = sstream.str();


But all you are doing is moving the file from one buffer into another.
The first buffer is on the file system; the new buffer is in memory.

You have done nothing to process the file; you now just have the data in a buffer in which it is easier to move iterators back and forth. If you still have to process the file, you have not done yourself any favours, because of all the extra memory-handling costs. So unless reading the objects requires a bidirectional iterator, this is not worth the cost.

NB. If the file is small, the cost will be small.
But as the file becomes large, the cost grows with it: the whole file has to be held in memory at once (and sstream.str() returns yet another copy), whereas the stream iterators only ever hold the current element. The file system is designed to hand out a large file in blocks, while memory is better suited to small, fast pieces.


Viper - March 2, 2006 07:32 PM (GMT)
I know, I just said reading it in pretty fast :)

xdracox - March 2, 2006 09:25 PM (GMT)
QUOTE (myork @ Mar 2 2006, 12:23 PM)
CODE
std::istream& operator>>(std::istream& s,UserData& x)
{
   /*
    * Just for the record this is not how I would do it.
    * But I wanted to retain backward compatibility with the original code.
    *
    * But note, because of endianness (and because sizeof(int) can vary)
    * the integer value is not guaranteed portable between different systems.
    */
   s.read(reinterpret_cast<char*>(&x.id),sizeof(x.id));
   s.read(x.username,256);
   s.read(x.password,256);
   return(s);
}

How would you do it then?

myork - March 2, 2006 10:48 PM (GMT)
QUOTE (xdracox @ Mar 2 2006, 04:25 PM)
How would you do it then?

Not relevant to this thread. Send me an e-mail and I will show you offline.

Back to the thread.

You may have noticed that in the center of my loop I kept doing this:
CODE
         (*outIter)  = loop->username;
         ++outIter;


When it would have been easier to do:
CODE
stream /*like std::cout*/ << loop->username << "\n";


Well, the reason is that it makes the transition to the standard algorithms easier to illustrate. Let's now translate my two examples:

Example 1:
CODE
      for(;loop != end;++loop)
      {
          (*outIter)  = (*loop);
          ++outIter;
      }


Here I am basically copying from one iterator to another, so we can replace this loop with std::copy() (from <algorithm>):
CODE
   std::copy(loop,end,outIter);
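Here is Example 1 with the loop replaced by std::copy, as a sketch that works on any pair of streams (copyStream is a hypothetical helper; passing a std::istringstream and a std::ostringstream makes it easy to try without files).

```cpp
#include <algorithm>
#include <istream>
#include <iterator>
#include <ostream>
#include <sstream>   // string streams, for trying it without files

// Example 1 without the loop: every character read through istreambuf_iterator
// is written out through ostream_iterator<char>.
void copyStream(std::istream& in, std::ostream& out)
{
    std::copy(std::istreambuf_iterator<char>(in),
              std::istreambuf_iterator<char>(),
              std::ostream_iterator<char>(out));
}
```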


Example 2:
CODE
     for(;loop != end;++loop)
     {
         (*outIter)  = loop->username;
         ++outIter;
     }


Here we are copying part of the first iterator to the second iterator. You could say I am transforming the input from the first iterator and assigning it to the second iterator. The STL has just such an algorithm:
CODE
   std::transform(loop,end,outIter,TransFormMethod);   // TransFormMethod: anything callable as const char* f(const UserData&)

Now we have to provide the STL with a callback to do the transform work. This is relatively trivial. TransFormMethod can be many things: a function, a functor, or anything else that can be called like a function.

Transform Example: C Like Example
CODE
const char* TransFormCallBackFunction(const UserData& obj)
{
   return(obj.username);
}
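Here is the C-like callback in action. To keep the sketch short it uses a std::vector<UserData> as the input range rather than a file (writeUserNames is a hypothetical helper); std::transform accepts the plain function name because it decays to a function pointer.

```cpp
#include <algorithm>
#include <iterator>
#include <ostream>
#include <sstream>   // std::ostringstream, for checking the output
#include <vector>

struct UserData
{
    int  id;
    char username[256];
    char password[256];
};

const char* TransFormCallBackFunction(const UserData& obj)
{
    return(obj.username);
}

// Writes one username per line by transforming each UserData into its name.
void writeUserNames(const std::vector<UserData>& users, std::ostream& out)
{
    std::transform(users.begin(), users.end(),
                   std::ostream_iterator<const char*>(out, "\n"),
                   TransFormCallBackFunction);
}
```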


OK. That will work, but it's very C-like.
What we need is a thing that will act like a function (it does not need to be a function, it just has to act like one). Here we come across the concept of functors: an object whose class defines 'operator()'. So, just like the example above, it needs to accept a single reference to an object of type 'UserData' and return 'const char*'.

Transform Example: C++ Example
CODE
struct TransFormFunctor
{
   const char* operator()(const UserData& obj) const
   {
       return(obj.username);
   }
};
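And here is the functor plugged into the stream pipeline from earlier, as a self-contained sketch (printUserNames is hypothetical; the record reader is the operator>> from earlier in the thread, with sizeof(x.id) in place of sizeof(int)). Passing TransFormFunctor() constructs a temporary functor object that std::transform calls once per record.

```cpp
#include <algorithm>
#include <istream>
#include <iterator>
#include <ostream>
#include <sstream>   // string streams, for trying it without files

struct UserData
{
    int  id;
    char username[256];
    char password[256];
};

// Binary record reader, same layout as earlier in the thread.
std::istream& operator>>(std::istream& s, UserData& x)
{
    s.read(reinterpret_cast<char*>(&x.id), sizeof(x.id));
    s.read(x.username, sizeof(x.username));
    s.read(x.password, sizeof(x.password));
    return s;
}

struct TransFormFunctor
{
    const char* operator()(const UserData& obj) const
    {
        return(obj.username);
    }
};

// Example 2 without the loop: each record read is turned into its username.
void printUserNames(std::istream& in, std::ostream& out)
{
    std::istream_iterator<UserData>    loop(in);
    std::istream_iterator<UserData>    end;
    std::ostream_iterator<const char*> outIter(out, "\n");
    std::transform(loop, end, outIter, TransFormFunctor());
}
```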

myork - March 2, 2006 11:21 PM (GMT)

"Darn," you are thinking.
"To do anything useful with std::transform() (or other advanced algorithms) I need to start defining functor classes! That sucks, as the code that does the work is no longer near the loop code, which makes it harder to understand."

CODE
   std::transform(loop,end,outIter,TransFormFunctor());

// Not as informative as:
   for(;loop != end;++loop)
   {
       (*outIter)  = loop->username;
       ++outIter;
   }


I agree: using transform like this is not as informative as the loop. But again, the STL has you covered. Let's go back to the UserData class.

CODE
struct UserData
{
  int    id;
  char username[256];
  char password[256];
};


If we had written it more like a C++ class, with controlled access to the members:
CODE
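class UserData
{
  public:
      UserData();     // istream_iterator<UserData> needs a default constructor
      UserData(const char* username,const char* password);
      const char* getUserName() const {return(username);}

       /*
        * Other Methods as required
        */

      // operator>> still has to fill in the private members.
      friend std::istream& operator>>(std::istream& s,UserData& x);
  private:
  int    id;
  char username[256];
  char password[256];
};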


Now we do not need to write our own functor; we can just get the STL to call 'getUserName()' through std::mem_fun_ref (from <functional>).

CODE
   std::transform(loop,end,outIter,std::mem_fun_ref(&UserData::getUserName));


Tada: nearly self-documenting code.
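Note that std::mem_fun_ref was deprecated in C++11 and removed in C++17. A sketch of the same pipeline for newer compilers swaps in std::mem_fn from <functional> (printUserNames is a hypothetical helper). It also gives the class the default constructor that istream_iterator needs and a friend operator>> so the reader can reach the private members.

```cpp
#include <algorithm>
#include <functional>   // std::mem_fn (C++11)
#include <istream>
#include <iterator>
#include <ostream>
#include <sstream>      // string streams, for trying it without files

class UserData
{
  public:
    UserData() : id(0), username(), password() {}   // zero-filled record
    const char* getUserName() const { return username; }
    friend std::istream& operator>>(std::istream& s, UserData& x);
  private:
    int  id;
    char username[256];
    char password[256];
};

std::istream& operator>>(std::istream& s, UserData& x)
{
    s.read(reinterpret_cast<char*>(&x.id), sizeof(x.id));
    s.read(x.username, sizeof(x.username));
    s.read(x.password, sizeof(x.password));
    return s;
}

// std::mem_fn wraps the const member function, so std::transform can call
// getUserName() on each record read from the stream.
void printUserNames(std::istream& in, std::ostream& out)
{
    std::transform(std::istream_iterator<UserData>(in),
                   std::istream_iterator<UserData>(),
                   std::ostream_iterator<const char*>(out, "\n"),
                   std::mem_fn(&UserData::getUserName));
}
```

A lambda such as [](const UserData& u) { return u.getUserName(); } would work equally well here.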



