Watch, Follow, &
Connect with Us

For forums, blogs and more please visit our
Developer Tools Community.


Welcome, Guest
Guest Settings
Help

Thread: Indy10 Bug (with fix): HashStream on very large file



Permlink Replies: 1 - Last Post: Dec 14, 2016 5:27 PM Last Post By: Remy Lebeau (Te...
Raymond Obin

Posts: 1
Registered: 3/16/08
Indy10 Bug (with fix): HashStream on very large file
Click to report abuse...   Click to reply to this thread Reply
  Posted: Dec 14, 2016 3:14 PM
btw, really impressed with Indy10. I've just replaced IWinHTTPRequest with IdHTTP because IWinHTTPRequest ran out of memory on my 1.2Gig file.
Below is a consequent problem (and solution) that arose in the conversion.

Note: Problem 2 below has been fixed in the Berlin 10.1 Delphi distro (Revision 5341 - although the problem is still in TIdHashMessageDigest2) but isn't fixed in the d.com:444/svn/Indy10/branches/DelphiNextGen/Lib - Revision 5381.

I was using TIdHashMessageDigest5.HashStreamAsHex(aStream) on a TFileStream for a file > 1Gig and was getting out-of-memory errors. On further investigation I found:

Problem 1:
System\IdStreamVCL.pas, in class function TIdStreamHelperVCL.ReadBytes() is the code:
if Length(VBytes) < (AOffset+LActual) then begin
    SetLength(VBytes, AOffset+LActual);
end;

this keeps reallocating the buffer until it is as big as the file.
I think the code should be:
if Length(VBytes) < (LActual) then begin
    SetLength(VBytes, LActual);
end;

to keep the buffer only as big as it needs to be.

Problem 2:
Protocols/IdHashMessageDigest.pas TIdHashMessageDigest4.NativeGetHashBytes() reads the last chunk of the stream with:
// Read the last set of bytes.
  LStartPos := ReadTIdBytesFromStream(AStream, FCBuffer, ASize);

ASize is the size of the whole file, which means it will allocate a buffer the size of the whole file.
LSize is already the number of bytes required, so I think the code should be:
// Read the last set of bytes.
  LStartPos := ReadTIdBytesFromStream(AStream, FCBuffer, LSize);


I've tried this with a 3Meg and a 1.2Gig file, and it give the expected results.

Note: I think these issues occur in parallel code (eg TIdHashMessageDigest2.

Edited by: Raymond Obin on Dec 14, 2016 3:25 PM
Remy Lebeau (Te...


Posts: 9,447
Registered: 12/23/01
Re: Indy10 Bug (with fix): HashStream on very large file
Click to report abuse...   Click to reply to this thread Reply
  Posted: Dec 14, 2016 5:27 PM   in response to: Raymond Obin in response to: Raymond Obin
Raymond wrote:

Problem 1:

System\IdStreamVCL.pas, in class function
TIdStreamHelperVCL.ReadBytes() is the code:

if Length(VBytes) < (AOffset+LActual) then begin
SetLength(VBytes, AOffset+LActual);
end;


this keeps reallocating the buffer until it is as big as the file.

Only if the caller is not managing the AOffset parameter correctly. That
parameter is an offset within the byte array, not the stream. AOffset allows
the caller the option of passing in a pre-allocated array and then reading
byte chunks to particular offsets within the array. ReadBytes() will reallocate
the array only if it is not large enough to receive the requested number
of bytes at the requested offset.

I think the code should be:

if Length(VBytes) < (LActual) then begin
SetLength(VBytes, LActual);
end;


to keep the buffer only as big as it needs to be.

No, the code should not be doing that. LActual is the number of bytes to
be read from the stream and then copied to the array at the requested offset.
Thus, AOffset has to be taken into account when making sure the array is
large enough to receive the bytes.

The code is working as designed. If you are having a problem with memory
usage, look at the code that is calling ReadBytes(). There are some places
where Indy processes streams/files as a single buffer instead of in small
chunks. But the hashing classes don't do that, they hash in chunks.

Problem 2:

Protocols/IdHashMessageDigest.pas
TIdHashMessageDigest4.NativeGetHashBytes() reads the last chunk of the
stream with:

// Read the last set of bytes.
LStartPos := ReadTIdBytesFromStream(AStream, FCBuffer, ASize);


ASize is the size of the whole file

Only if the file size is less than 16 bytes. If you look right above that
call, there is a loop that processes the stream in 16-byte chunks first.
After the loop is finished, ASize contains the number of bytes remaining,
if any.

LSize is already the number of bytes required, so I think the code
should be:

// Read the last set of bytes.
LStartPos := ReadTIdBytesFromStream(AStream, FCBuffer, LSize);

No, because LSize is only used inside of the loop, where it contains the
number of bytes read on each iteration (which should always be 16 unless
an I/O error occurs).

Note: I think these issues occur in parallel code (eg
TIdHashMessageDigest2.

Same thing as above.

Note: Problem 2 below has been fixed in the Berlin 10.1 Delphi distro
(Revision 5341 - although the problem is still in TIdHashMessageDigest2)

No, it is not present, because it does not do what you claim it is doing
to begin with.

but isn't fixed in the d.com:444/svn/Indy10/branches/DelphiNextGen/Lib
- Revision 5381.

The last revision in that branch is 4987, not 5381. That is a difference
of almost 400 revisions checked in after that branch was last worked on.

That branch was created for Delphi XE3/XE4 when support for mobile platforms
was first being added (that is what DelphiNextGen refers to - the NextGen
mobile compilers). That branch code was merged into the main code several
years ago, and there have been many fixes since.

The current active code is in the SVN Trunk only. Ignore any branches, they
are no longer relavant. You are looking at outdated code.

--
Remy Lebeau (TeamB)
Legend
Helpful Answer (5 pts)
Correct Answer (10 pts)

Server Response from: ETNAJIVE02