Mailing List Archive

os.scandir bug in Windows?
TLDR: In os.scandir directory entries, atime is always a copy of mtime
rather than the actual access time.

Demo program: Windows 10, Python 3.8.3:

# osscandirtest.py
import time, os
with open('Test', 'w') as f: f.write('Anything\n') # Write to a file
time.sleep(10)
with open('Test', 'r') as f: f.readline() # Read the file
print(os.stat('Test'))
for DirEntry in os.scandir('.'):
    if DirEntry.name == 'Test':
        stat = DirEntry.stat()
        print(f'scandir DirEntry {stat.st_ctime=} {stat.st_mtime=} {stat.st_atime=}')

Sample output:

os.stat_result(st_mode=33206, st_ino=8162774324687317,
st_dev=2230120362, st_nlink=1, st_uid=0,
st_gid=0, st_size=10, st_atime=1600631381, st_mtime=1600631371,
st_ctime=1600631262)
scandir DirEntry stat.st_ctime=1600631262.951019 stat.st_mtime=1600631371.7062848 stat.st_atime=1600631371.7062848

For os.stat, atime is 10 seconds more than mtime, as would be expected.
But for os.scandir, atime is a copy of mtime.
ISTM that this is a bug, and in fact recently it stopped me from using
os.scandir in a program where I needed the access timestamp. No big
deal, but ...
If it is a feature for some reason, presumably it should be documented.
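A small sketch for checking on your own system whether the cached DirEntry.stat() result disagrees with a fresh os.stat() call. The function name compare_scandir_with_stat is hypothetical. On Windows, any differences would reflect the directory-entry behaviour discussed in this thread. On POSIX, DirEntry.stat() normally makes its own stat() call, so the two results should usually agree.

```python
import os


def compare_scandir_with_stat(path="."):
    """Return (name, field, scandir_value, os_stat_value) for each timestamp that differs.

    Hypothetical helper for this comparison; not part of the os module.
    """
    mismatches = []
    with os.scandir(path) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue  # only compare regular files
            # On Windows this is normally filled in from the directory listing.
            cached = entry.stat(follow_symlinks=False)
            # os.stat() queries the file itself.
            fresh = os.stat(entry.path, follow_symlinks=False)
            for field in ("st_atime", "st_mtime", "st_ctime"):
                scandir_value = getattr(cached, field)
                stat_value = getattr(fresh, field)
                if scandir_value != stat_value:
                    mismatches.append((entry.name, field, scandir_value, stat_value))
    return mismatches


if __name__ == "__main__":
    for name, field, scandir_value, stat_value in compare_scandir_with_stat("."):
        print(f"{name}: {field} scandir={scandir_value} os.stat={stat_value}")
```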

Best wishes
Rob Cliffe
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/RIKQAXZVUAQBLECFMNN2PUOH322B2BYD/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: os.scandir bug in Windows?
Could you please file this as an issue on bugs.python.org?

Thanks!
-Greg


On Sat, Oct 17, 2020 at 7:25 PM Rob Cliffe via Python-Dev <
python-dev@python.org> wrote:

>
> TLDR: In os.scandir directory entries, atime is always a copy of mtime
> rather than the actual access time.
Re: os.scandir bug in Windows?
Interesting! Indeed, please create an issue and post a link here.

From a quick look at the code, I can't see any obvious bugs here; the info
seems to be coming directly from FindNextFileW. This will likely require
some more digging.


On Sun, Oct 18, 2020 at 7:37 AM Gregory P. Smith <greg@krypto.org> wrote:

> Could you please file this as an issue on bugs.python.org?
>
> Thanks!
> -Greg
>
>
> On Sat, Oct 17, 2020 at 7:25 PM Rob Cliffe via Python-Dev <
> python-dev@python.org> wrote:
>
>>
>> TLDR: In os.scandir directory entries, atime is always a copy of mtime
>> rather than the actual access time.
Re: os.scandir bug in Windows?
On 10/15/20, Rob Cliffe via Python-Dev <python-dev@python.org> wrote:
>
> TLDR: In os.scandir directory entries, atime is always a copy of mtime
> rather than the actual access time.

There are inconsistencies in various scenarios between the
stat info from the directory entry and the stat info from the File
Control Block (FCB) -- the filesystem's in-memory record that's common
to all opens for a file/directory.

The worst case is for an NTFS file with multiple hardlinks, for which
the directory entry information is from the last time the file was
opened using a particular hardlink. The accurate NTFS file information
is in the file's Master File Table (MFT) record, which gets accessed
to populate the FCB and update the particular link when a file is
opened.

If you're looking for file times and file size, the only reliable
information comes from directly opening the file and querying the info
via GetFileInformationByHandle (called by os.stat),
GetFileInformationByHandleEx (FileBasicInfo, FileStandardInfo),
GetFileTime, and GetFileSizeEx.
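In Python terms, the "open the file and query the handle" approach can be sketched with os.fstat() on an open file descriptor. This is an illustration under assumptions. As far as I know, CPython's os.fstat() on Windows is built on GetFileInformationByHandle. Opening the file without reading from it is also not expected to change its access time. The helper name is hypothetical.

```python
import os


def stat_via_handle(path):
    """Open path and return os.fstat() for the open descriptor (hypothetical helper).

    Nothing is read from the file, so only its metadata is queried.
    """
    with open(path, "rb") as f:
        return os.fstat(f.fileno())
```

This simple form only handles regular files, because open() raises an error for a directory. os.stat() handles both.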
Re: os.scandir bug in Windows?
How do I do that, please?  I can't see an obvious create option on that
web page.  Do I need to log in?
Thanks
Rob Cliffe

On 18/10/2020 05:31, Gregory P. Smith wrote:
> Could you please file this as an issue on bugs.python.org?
>
> Thanks!
> -Greg
>
>
> On Sat, Oct 17, 2020 at 7:25 PM Rob Cliffe via Python-Dev
> <python-dev@python.org> wrote:
>
>> TLDR: In os.scandir directory entries, atime is always a copy of mtime
>> rather than the actual access time.
Re: os.scandir bug in Windows?
On 10/18/2020 12:25 PM, Rob Cliffe via Python-Dev wrote:
> How do I do that, please?  I can't see an obvious create option on
> that web page.  Do I need to log in?

Yes, you need to log in before you can open an issue. You might need to
create an account first if you don't have one: it's called "Register" on
bpo. After you've logged in, there's a Create New button.

Eric


> Thanks
> Rob Cliffe
Re: os.scandir bug in Windows?
On 15Oct2020 2239, Rob Cliffe via Python-Dev wrote:
> TLDR: In os.scandir directory entries, atime is always a copy of mtime
> rather than the actual access time.

Correction - os.stat() updates the access time to _now_, while
os.scandir() returns the last access time without updating it.

Eryk replied with a deeper explanation of the cause, but fundamentally
this is what you are seeing.

Feel free to file a bug, but we'll likely only add a vague note to the
docs about how Windows works here rather than changing anything. If
anything, we should probably fix os.stat() to avoid updating the access
time so that both functions behave the same, but that might be too
complicated.

Cheers,
Steve
Re: os.scandir bug in Windows?
On 19Oct2020 1242, Steve Dower wrote:
> On 15Oct2020 2239, Rob Cliffe via Python-Dev wrote:
>> TLDR: In os.scandir directory entries, atime is always a copy of mtime
>> rather than the actual access time.
>
> Correction - os.stat() updates the access time to _now_, while
> os.scandir() returns the last access time without updating it.

Let me correct myself first :)

*Windows* has decided not to update file access time metadata *in
directory entries* on reads. os.stat() always[1] looks at the file entry
metadata, while os.scandir() always looks at the directory entry metadata.

My suggested approach still applies, other than the bit where we might
fix os.stat(). The best we can do is regress os.scandir() to have
similarly poor performance, but the best *you* can do is use os.stat()
for accurate timings when files might be being modified while your
program is running, and don't do it when you just need names/kinds (and
I'm okay adding that note to the docs).
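Following that advice, here is a minimal sketch that takes names and file kinds from os.scandir() and calls os.stat() only when timestamps are actually needed. The function name list_files and its need_times parameter are hypothetical.

```python
import os


def list_files(path=".", need_times=False):
    """Yield (name, is_dir, stat_result_or_None) for each entry in path.

    Names and kinds come from os.scandir(), which is cheap. Timestamps are
    fetched with os.stat() only when need_times is True, so they describe
    the file itself rather than the cached directory entry.
    """
    with os.scandir(path) as entries:
        for entry in entries:
            is_dir = entry.is_dir(follow_symlinks=False)
            # follow_symlinks=False keeps broken symlinks from raising here.
            info = os.stat(entry.path, follow_symlinks=False) if need_times else None
            yield entry.name, is_dir, info


if __name__ == "__main__":
    for name, is_dir, info in list_files(".", need_times=True):
        print(name, "dir" if is_dir else "file", info.st_atime)
```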

Cheers,
Steve

[1]: With some fallback to directory entries in exceptional cases that
don't apply here.
Re: os.scandir bug in Windows?
On 19.10.2020 14:47, Steve Dower wrote:
> On 19Oct2020 1242, Steve Dower wrote:
>> On 15Oct2020 2239, Rob Cliffe via Python-Dev wrote:
>>> TLDR: In os.scandir directory entries, atime is always a copy of mtime rather than the actual access time.
>>
>> Correction - os.stat() updates the access time to _now_, while os.scandir() returns the last access time without updating it.
>
> Let me correct myself first :)
>
> *Windows* has decided not to update file access time metadata *in directory entries* on reads. os.stat() always[1] looks at the file entry
> metadata, while os.scandir() always looks at the directory entry metadata.

Is this behavior documented somewhere?

Such weirdness is certainly something that needs to be documented, but I really don't like describing such quirks that are out of our control
and may be subject to change in Python documentation. So we should only consider doing so if there are no other options.


--
Regards,
Ivan
Re: os.scandir bug in Windows?
On Mon, Oct 19, 2020, at 07:42, Steve Dower wrote:
> On 15Oct2020 2239, Rob Cliffe via Python-Dev wrote:
> > TLDR: In os.scandir directory entries, atime is always a copy of mtime
> > rather than the actual access time.
>
> Correction - os.stat() updates the access time to _now_, while
> os.scandir() returns the last access time without updating it.

This is surprising - do we know why this happens?

Also, it doesn't seem true on my system with Python 3.8.5 [and, yes, I checked that last access update is enabled for my test and updates normally when reading the file's contents].
Re: os.scandir bug in Windows?
On Mon, Oct 19, 2020 at 6:28 AM Ivan Pozdeev via Python-Dev <
python-dev@python.org> wrote:

>
> On 19.10.2020 14:47, Steve Dower wrote:
> > On 19Oct2020 1242, Steve Dower wrote:
> >> On 15Oct2020 2239, Rob Cliffe via Python-Dev wrote:
> >>> TLDR: In os.scandir directory entries, atime is always a copy of mtime
> rather than the actual access time.
> >>
> >> Correction - os.stat() updates the access time to _now_, while
> os.scandir() returns the last access time without updating it.
> >
> > Let me correct myself first :)
> >
> > *Windows* has decided not to update file access time metadata *in
> directory entries* on reads. os.stat() always[1] looks at the file entry
> > metadata, while os.scandir() always looks at the directory entry
> metadata.
>
> Is this behavior documented somewhere?
>
> Such weirdness is certainly something that needs to be documented, but I
> really don't like describing such quirks that are out of our control
> and may be subject to change in Python documentation. So we should only
> consider doing so if there are no other options.
>

I'm sure this is covered in MSDN. Linking to that if it has it in a
concise explanation would make sense from a note in our docs.

If I'm understanding Steve correctly this is due to Windows/NTFS storing
the access time potentially redundantly in two different places. One within
the directory entry itself and one with the file's own metadata. Those of
us with a traditional posix filesystem background may raise eyeballs at
this duplication, seeing a directory as a place that merely maps names to
inodes with the inode structure (equiv: file entry metadata) being the sole
source of truth. Which ones get updated when and by what actions is up to
the OS.

So yes, just document the "quirk" as an intended OS behavior. This is one
reason scandir() can return additional information on windows vs what it
can return on posix. The entire point of scandir() is to return as much as
possible from the directory without triggering reads of the
inodes/file-entry-metadata. :)

-gps


Re: os.scandir bug in Windows?
On 10/19/20 9:52 AM, Gregory P. Smith wrote:
> I'm sure this is covered in MSDN.  Linking to that if it has it in a
> concise explanation would make sense from a note in our docs.
>
> If I'm understanding Steve correctly this is due to Windows/NTFS storing
> the access time potentially redundantly in two different places. One
> within the directory entry itself and one with the file's own metadata. 
> Those of us with a traditional posix filesystem background may raise
> eyeballs at this duplication, seeing a directory as a place that merely
> maps names to inodes with the inode structure (equiv: file entry
> metadata) being the sole source of truth.  Which ones get updated when
> and by what actions is up to the OS.
>
> So yes, just document the "quirk" as an intended OS behavior.  This is
> one reason scandir() can return additional information on windows vs
> what it can return on posix.  The entire point of scandir() is to return
> as much as possible from the directory without triggering reads of the
> inodes/file-entry-metadata. :)
>
> -gps

Depending on atimes isn't a consistently reliable mechanism anyway,
since filesystems on Linux et al. can be mounted so that access times
are not updated, or are only updated lazily (e.g. with the noatime or
relatime mount options).

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/QXNHYK6NDECISIOZVO4BCW2O6UXRZJGO/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: os.scandir bug in Windows? [ In reply to ]
On 19Oct2020 1652, Gregory P. Smith wrote:
> I'm sure this is covered in MSDN.  Linking to that if it has it in a
> concise explanation would make sense from a note in our docs.

Probably unlikely :) I'm pretty sure this started "perfect" and was then
wound back to improve performance. But it's almost certainly an option
somewhere, which means you can't rely on it being either true or false.
You just have to be explicit for certain pieces of information.

> If I'm understanding Steve correctly this is due to Windows/NTFS storing
> the access time potentially redundantly in two different places. One
> within the directory entry itself and one with the file's own metadata.
> Those of us with a traditional posix filesystem background may raise
> eyeballs at this duplication, seeing a directory as a place that merely
> maps names to inodes with the inode structure (equiv: file entry
> metadata) being the sole source of truth.  Which ones get updated when
> and by what actions is up to the OS.
>
> So yes, just document the "quirk" as an intended OS behavior.  This is
> one reason scandir() can return additional information on windows vs
> what it can return on posix.  The entire point of scandir() is to return
> as much as possible from the directory without triggering reads of the
> inodes/file-entry-metadata. :)

Yeah, I'd document it as a quirk of scandir. There's also a race where
if you scandir(), then someone touches the file, then you look at the
cached stat, you get the wrong information too (on any platform). Making
it clearer that it's for non-time-sensitive queries is most accurate,
though we could also give an example of "access times may not be up to
date depending on OS-level caching" without committing us to being
responsible for OS decisions.
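The caching side of that race is easy to see on any platform. os.DirEntry caches the result of its stat() call, and the os documentation recommends os.stat(entry.path) for up-to-date information. Here is a sketch that simulates "someone touches the file" by changing its modification time with os.utime() between the two calls. The function name, file name, and one-hour offset are arbitrary choices for illustration.

```python
import os
import tempfile


def demo_stale_direntry_stat():
    """Return (before, cached, fresh) st_mtime values for one file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "Test")
        with open(path, "w") as f:
            f.write("Anything\n")

        with os.scandir(tmp) as entries:
            entry = next(e for e in entries if e.name == "Test")
            before = entry.stat().st_mtime  # first call fills the cache

            # Simulate another process touching the file.
            new_mtime = before + 3600
            os.utime(path, (new_mtime, new_mtime))

            cached = entry.stat().st_mtime  # still the cached value
            fresh = os.stat(path).st_mtime  # queried again from the file
        return before, cached, fresh


if __name__ == "__main__":
    before, cached, fresh = demo_stale_direntry_stat()
    print(f"cached == before: {cached == before}; fresh == cached: {fresh == cached}")
```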

Cheers,
Steve
Re: os.scandir bug in Windows?
On 10/19/20, Steve Dower <steve.dower@python.org> wrote:
> On 15Oct2020 2239, Rob Cliffe via Python-Dev wrote:
>> TLDR: In os.scandir directory entries, atime is always a copy of mtime
>> rather than the actual access time.
>
> Correction - os.stat() updates the access time to _now_, while
> os.scandir() returns the last access time without updating it.

os.stat() shouldn't affect st_atime because it doesn't access the file
data. That has me curious if it can be reproduced.

With NTFS in Windows 10, I'd expect the os.stat() st_atime to change
immediately when the file data is read or modified. With other
filesystems, it may not be updated until the kernel file object that
was used to access the file's data is closed.

Note that updating the access time in NTFS can be disabled by the
"NtfsDisableLastAccessUpdate" value in
"HKLM\System\CurrentControlSet\Control\FileSystem". The default value
in Windows 10 should be 0x80000002, which means the value is system
managed and updating the access time is enabled.
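Here is a sketch for inspecting that setting from Python with the standard winreg module. The decoding is an assumption. It combines the description above (0x80000002 = system managed, updates enabled) with the classic meaning of the value (0 = enabled, 1 = disabled). The 0x80000000 bit is taken to mark a system-managed value, and bit 0 is taken to mean that updates are disabled. The helper names are hypothetical.

```python
import sys
from typing import Optional, Tuple

SYSTEM_MANAGED_FLAG = 0x80000000  # assumption: marks a system-managed value
KEY_PATH = r"System\CurrentControlSet\Control\FileSystem"
VALUE_NAME = "NtfsDisableLastAccessUpdate"


def decode_last_access_setting(value: int) -> Tuple[bool, bool]:
    """Return (system_managed, updates_enabled) under the assumptions above."""
    system_managed = bool(value & SYSTEM_MANAGED_FLAG)
    updates_enabled = not (value & 1)  # assumption: bit 0 set means disabled
    return system_managed, updates_enabled


def read_last_access_setting() -> Optional[int]:
    """Read the raw registry value, or return None if it is unavailable."""
    if sys.platform != "win32":
        return None  # the registry only exists on Windows
    import winreg  # Windows-only standard library module

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
    except FileNotFoundError:
        return None  # the value may be absent on some systems
    return value


if __name__ == "__main__":
    raw = read_last_access_setting()
    if raw is not None:
        managed, enabled = decode_last_access_setting(raw)
        print(f"{VALUE_NAME}=0x{raw:08X} system_managed={managed} updates_enabled={enabled}")
```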

If it's only the access time that changes, the directory entry may be
updated with a significant granularity such as hourly or daily. For
NTFS, it's hourly. To confirm this, wait an hour from the current
access time in the directory entry; open the file; read some data; and
close the file. The access time in the directory entry should be
updated.
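Applied to the numbers in the original report, the hourly granularity explains the result. Below is a deliberately simplified model: the directory entry's access time is refreshed only once a new access is at least one hour after the stored value. This is an approximation of the description above, not the [MS-FSA] algorithm itself, and the names are hypothetical.

```python
NTFS_DIR_ENTRY_ATIME_GRANULARITY = 60 * 60  # seconds; "hourly" per the description above


def dir_entry_atime_would_update(dir_entry_atime, access_time,
                                 granularity=NTFS_DIR_ENTRY_ATIME_GRANULARITY):
    """Simplified model: refresh the directory entry's access time only when
    the new access is at least `granularity` seconds after the stored one."""
    return access_time - dir_entry_atime >= granularity


# Timestamps from the sample output in the original report.
scandir_atime = 1600631371.7062848  # directory entry (equal to st_mtime)
stat_atime = 1600631381             # os.stat() after reading, roughly 10 s later

print(dir_entry_atime_would_update(scandir_atime, stat_atime))  # False: the gap is only about 10 s
```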

For details, download the [MS-FSA] PDF [1] and look for all references
to the following sections:

* 2.1.4.17 Algorithm for Noting That a File Has Been Modified
* 2.1.4.19 Algorithm for Noting That a File Has Been Accessed
* 2.1.4.18 Algorithm for Updating Duplicated Information

Also check the tables in Appendix A, which provide the update
granularity of file time stamps (presumably for directory entries) for
common Windows filesystems.

[1] https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-fsa/860b1516-c452-47b4-bdbc-625d344e2041

Going back to my initial message, I can't stress enough that this
problem is at its worst when a file has multiple hardlinks. If a
particular link in a directory wasn't the last link used to access the
file, its duplicated metadata may have the wrong file size, access
time, modify time, and change time (the latter is not reported by
Python). As is, for the current implementation, I'd only rely on the
basic attributes such as whether it's a directory or reparse point
(symlink, mountpoint, etc) when using scandir() to quickly process a
directory. For reliable stat information, call os.stat().

I do think, however, that os.scandir() can be improved in Windows
without significant performance loss if it calls GetFileAttributesExW
to get st_file_attributes, st_size, st_ctime (create time), st_mtime,
and st_atime. This API call is relatively fast because it doesn't
require opening the file via CreateFileW, which is one of the more
expensive operations in os.stat(). But I haven't tried modifying
scandir() to benchmark it.
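To experiment with that idea, here is a rough ctypes sketch that calls GetFileAttributesExW directly and converts its FILETIME values to POSIX timestamps. It is not what os.scandir() does today. The helper names are hypothetical, and the structure layout follows the documented WIN32_FILE_ATTRIBUTE_DATA. The API call only works on Windows, but the FILETIME conversion can be checked anywhere.

```python
import ctypes
import sys

# FILETIME counts 100-nanosecond ticks since 1601-01-01 (UTC).
EPOCH_AS_FILETIME = 116444736000000000  # 1970-01-01 expressed as a FILETIME
TICKS_PER_SECOND = 10_000_000
GET_FILE_EX_INFO_STANDARD = 0  # GetFileExInfoStandard


class FILETIME(ctypes.Structure):
    _fields_ = [("dwLowDateTime", ctypes.c_uint32),
                ("dwHighDateTime", ctypes.c_uint32)]


class WIN32_FILE_ATTRIBUTE_DATA(ctypes.Structure):
    _fields_ = [("dwFileAttributes", ctypes.c_uint32),
                ("ftCreationTime", FILETIME),
                ("ftLastAccessTime", FILETIME),
                ("ftLastWriteTime", FILETIME),
                ("nFileSizeHigh", ctypes.c_uint32),
                ("nFileSizeLow", ctypes.c_uint32)]


def filetime_to_posix(ft):
    """Convert a FILETIME structure to seconds since the Unix epoch."""
    ticks = (ft.dwHighDateTime << 32) | ft.dwLowDateTime
    return (ticks - EPOCH_AS_FILETIME) / TICKS_PER_SECOND


def get_file_attributes_ex(path):
    """Hypothetical helper: query basic metadata by name, without opening the file."""
    if sys.platform != "win32":
        raise OSError("GetFileAttributesExW is only available on Windows")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    func = kernel32.GetFileAttributesExW
    func.argtypes = [ctypes.c_wchar_p, ctypes.c_int,
                     ctypes.POINTER(WIN32_FILE_ATTRIBUTE_DATA)]
    func.restype = ctypes.c_int  # BOOL
    data = WIN32_FILE_ATTRIBUTE_DATA()
    if not func(path, GET_FILE_EX_INFO_STANDARD, ctypes.byref(data)):
        raise ctypes.WinError(ctypes.get_last_error())
    return {
        "st_file_attributes": data.dwFileAttributes,
        "st_size": (data.nFileSizeHigh << 32) | data.nFileSizeLow,
        "st_ctime": filetime_to_posix(data.ftCreationTime),  # creation time on Windows
        "st_atime": filetime_to_posix(data.ftLastAccessTime),
        "st_mtime": filetime_to_posix(data.ftLastWriteTime),
    }
```

On Windows, get_file_attributes_ex("Test") could be compared against os.stat("Test") and the DirEntry.stat() values from the original demo.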

Ultimately, I'm waiting for Windows 10 to provide a WinAPI function
that calls the relatively new NTAPI function NtQueryInformationByName
[2] (by name, not by handle!) to get the FileStatInformation, as well
as for this information to be made available by handle via
GetFileInformationByHandleEx. Compared to GetFileAttributesExW, the
FileStatInformation additionally provides the file ID (if implemented
by the filesystem), change time, reparse tag, number of links, and the
effective access of the security context of the caller (i.e. process
or thread access token). The latter is something that we've never
implemented with os.stat(). It's not the same as POSIX
owner-group-other permissions. It would need a new attribute such as
st_effective_access. It could be used to provide a real implementation
of os.access() in Windows.

[2] https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntqueryinformationbyname
Re: os.scandir bug in Windows?
On 19Oct2020 1846, Eryk Sun wrote:
> On 10/19/20, Steve Dower <steve.dower@python.org> wrote:
>> On 15Oct2020 2239, Rob Cliffe via Python-Dev wrote:
>>> TLDR: In os.scandir directory entries, atime is always a copy of mtime
>>> rather than the actual access time.
>>
>> Correction - os.stat() updates the access time to _now_, while
>> os.scandir() returns the last access time without updating it.
>
> os.stat() shouldn't affect st_atime because it doesn't access the file
> data. That has me curious if it can be reproduced.
>
> With NTFS in Windows 10, I'd expect the os.stat() st_atime to change
> immediately when the file data is read or modified. With other
> filesystems, it may not be updated until the kernel file object that
> was used to access the file's data is closed.

I thought I got my self-correction fired off quickly enough to save you
from writing this :)

> For details, download the [MS-FSA] PDF [1] and look for all references
> to the following sections:

> [1] https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-fsa/860b1516-c452-47b4-bdbc-625d344e2041

Thanks for the detailed reference.

> Going back to my initial message, I can't stress enough that this
> problem is at its worst when a file has multiple hardlinks. If a
> particular link in a directory wasn't the last link used to access the
> file, its duplicated metadata may have the wrong file size, access
> time, modify time, and change time (the latter is not reported by
> Python). As is, for the current implementation, I'd only rely on the
> basic attributes such as whether it's a directory or reparse point
> (symlink, mountpoint, etc) when using scandir() to quickly process a
> directory. For reliable stat information, call os.stat().
>
> I do think, however, that os.scandir() can be improved in Windows
> without significant performance loss if it calls GetFileAttributesExW
> to get st_file_attributes, st_size, st_ctime (create time), st_mtime,
> and st_atime. This API call is relatively fast because it doesn't
> require opening the file via CreateFileW, which is one of the more
> expensive operations in os.stat(). But I haven't tried modifying
> scandir() to benchmark it.

Resolving the path is the most expensive part, even if the file is not
opened (I've been working with the NTFS team on this area, and we've
been benchmarking/analysing all of it). There are a few improvements
coming across the board, but I'd much rather just emphasise that
os.scandir() is as fast as we can manage using cached information
(including as cached by the OS). Otherwise we prevent people from using
the fastest available option when they can, if they don't need the
additional information/accuracy.

Cheers,
Steve
_______________________________________________
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/MMRMLWGEV2ZGIACXQTSEQC6TPWGL3UZ3/
Re: os.scandir bug in Windows?
On 10/19/20, Steve Dower <steve.dower@python.org> wrote:
> On 19Oct2020 1242, Steve Dower wrote:
>> On 15Oct2020 2239, Rob Cliffe via Python-Dev wrote:
>>> TLDR: In os.scandir directory entries, atime is always a copy of mtime
>>> rather than the actual access time.
>>
>> Correction - os.stat() updates the access time to _now_, while
>> os.scandir() returns the last access time without updating it.
>
> Let me correct myself first :)
>
> *Windows* has decided not to update file access time metadata *in
> directory entries* on reads. os.stat() always[1] looks at the file entry
> metadata, while os.scandir() always looks at the directory entry metadata.
>
> My suggested approach still applies, other than the bit where we might
> fix os.stat(). The best we can do is regress os.scandir() to have
> similarly poor performance, but the best *you* can do is use os.stat()
> for accurate timings when files might be being modified while your
> program is running, and don't do it when you just need names/kinds (and
> I'm okay adding that note to the docs).

If this is the correction to which you're referring in the previous
message, I assumed you stood by the claim that os.stat() may update
st_atime. That shouldn't be the case, so there shouldn't be anything
that needs to be fixed there, unless I'm missing what you think needs
to be fixed. If it's actually a problem, then I'd really, really like
a test case that reproduces it. If it was just a misinterpreted test
case or mis-remembered fact, then that's good news for me. ;-)

Regarding updating the access time in the directory entry, in my
previous reply I explained that NTFS should update it with a one-hour
granularity. With FAT, it's daily.

Regarding the view that this is only about "accurate timings when
files might be being modified while your program is running", in my
previous messages I stressed that the directory entry for a hard link
may have the wrong size, change time, write time, and access time if
it wasn't the last link used to update the file. That has nothing to
do with the file being modified while the program is running. It's a
stale directory entry. If you call os.stat() on the stale link, NTFS
will update it with the correct metadata.
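Since the directory entry carries no link count, finding files whose duplicated entries could be stale requires `os.stat()` and its `st_nlink` field. A minimal sketch (the function name is hypothetical); note that it pays the full `os.stat()` cost for every file, which is exactly the cost scandir() exists to avoid:

```python
import os
from typing import List


def names_with_multiple_links(path: str) -> List[str]:
    """Return the sorted names of regular files in `path` with more than one hard link.

    DirEntry data cannot answer this on Windows (the directory entry has no
    link count), so each file is queried with os.stat(). On NTFS, per the
    message above, that call also refreshes the queried link's entry.
    """
    result = []
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_file(follow_symlinks=False):
                if os.stat(entry.path, follow_symlinks=False).st_nlink > 1:
                    result.append(entry.name)
    return sorted(result)
```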
_______________________________________________
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/SUGIZ6OAXOD37USVBWAW7JRSUDBSMG7Q/
Re: os.scandir bug in Windows?
On 10/19/20, Steve Dower <steve.dower@python.org> wrote:
>
> Resolving the path is the most expensive part, even if the file is not
> opened (I've been working with the NTFS team on this area, and we've
> been benchmarking/analysing all of it).

If you say it's been extensively benchmarked and there's no direct way
around the speed bottleneck, then I take your word for it. To clarify
what I had in mind, I was hoping that because NTFS implements the fast
I/O function FastIoQueryOpen [1] (via NtfsNetworkOpenCreate, as given
by its FastIoDispatch table) that IRP_MJ_CREATE would be bypassed and
that the filesystem would not incur a significant cost to parse the
remaining path. I figured that most of the work would be in the
ObOpenObjectByName and IopParseDevice executive calls that lead up
to querying the filesystem.

Anyway, it's unfortunate that the Windows API doesn't support NT
handle-relative names, except in the registry API. If we could call
NTAPI NtQueryAttributesFile [2] directly, then the ObjectAttributes
argument could be relative to a directory handle set in the
RootDirectory field. That would eliminate the vast majority of the
path-resolution cost. A handle-relative open or query goes straight to
the filesystem device, which goes straight to the directory that
contains the file.
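Python already exposes the POSIX counterpart of handle-relative names through the `dir_fd` parameter, but on Windows `os.supports_dir_fd` does not include `os.stat`, which is the gap described here. The sketch below (the helper name is hypothetical) resolves names relative to an open directory descriptor where the platform allows it and falls back to full paths otherwise:

```python
import os
from typing import Dict


def stat_names_relative(path: str) -> Dict[str, os.stat_result]:
    """Stat every entry of `path`, resolving names relative to a directory fd if possible."""
    names = os.listdir(path)
    if os.stat in os.supports_dir_fd:
        # POSIX: open the directory once; each stat() then resolves only the
        # final name relative to that descriptor (like an NT RootDirectory handle).
        dir_fd = os.open(path, os.O_RDONLY)
        try:
            return {n: os.stat(n, dir_fd=dir_fd, follow_symlinks=False) for n in names}
        finally:
            os.close(dir_fd)
    # Windows: dir_fd is not supported for os.stat, so every call parses the full path.
    return {n: os.stat(os.path.join(path, n), follow_symlinks=False) for n in names}
```

On Windows this takes the fallback branch, so each call still pays the full path-resolution cost discussed above.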

To eliminate the cost of opening the directory handle, scandir() could
be rewritten to use CreateFileW and GetFileInformationByHandleEx:
FileIdBothDirectoryInfo [3] instead of FindFirstFileW / FindNextFileW.
Just cache the directory handle in place of caching the find handle.
scandir() would gain fd support in Windows. Opening a directory via
os.open requires the flag _O_OBTAIN_DIR (0x2000), defined in fcntl.h.

FileIdBothDirectoryInfo provides the file ID, so the implementation
would support the inode() method without calling stat(). It would
still directly support is_dir() and is_file() based on the file
attributes, and is_symlink() based on the file attributes and the
EaSize field. The Windows Protocols document that the latter contains
the reparse tag for a reparse point. The field is reused because a
reparse point can't have extended attributes.
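The classification described above amounts to a few bit tests. The sketch below is a model only: it takes `FileAttributes` and `EaSize` from a `FILE_ID_BOTH_DIR_INFO` record as plain integers (how they are obtained is out of scope), and it defines the Windows constants locally so it runs on any platform. Treating only `IO_REPARSE_TAG_SYMLINK` as a symlink, so that a junction (mount point) still counts as a directory, is my understanding of CPython's current `is_symlink()` on Windows, but treat that as an assumption.

```python
# Windows constants, defined locally so this model runs on any platform.
FILE_ATTRIBUTE_DIRECTORY = 0x10
FILE_ATTRIBUTE_REPARSE_POINT = 0x400
IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003
IO_REPARSE_TAG_SYMLINK = 0xA000000C


def classify_dir_info(file_attributes: int, ea_size: int) -> dict:
    """Derive scandir-style flags from one FILE_ID_BOTH_DIR_INFO record (model only).

    For a reparse point, EaSize holds the reparse tag instead of an
    extended-attribute size, because a reparse point cannot have EAs.
    """
    is_reparse = bool(file_attributes & FILE_ATTRIBUTE_REPARSE_POINT)
    reparse_tag = ea_size if is_reparse else 0
    is_symlink = reparse_tag == IO_REPARSE_TAG_SYMLINK
    is_dir = bool(file_attributes & FILE_ATTRIBUTE_DIRECTORY)
    return {
        "reparse_tag": reparse_tag,
        "is_symlink": is_symlink,
        # Not following links: a symlink is reported as neither dir nor file.
        "is_dir": is_dir and not is_symlink,
        "is_file": not is_dir and not is_symlink,
    }
```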

All that said, I'd prefer not to call NtQueryAttributesFile or any
other NTAPI function in Windows Python. I'd rather do the best
possible with just the Windows API. I wish there were a new
GetFileAttributesExExW function that supported handle-relative names.
Even better would be a new function that calls
NtQueryInformationByName -- something like GetFileInformationByName --
for FileStatInfo (and FileCaseSensitiveInfo as well, which is becoming
more of an issue), also with support for handle-relative names.

[1] https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/ns-wdm-_fast_io_dispatch
[2] https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-zwqueryfullattributesfile
[3] https://docs.microsoft.com/en-us/windows/win32/api/winbase/ns-winbase-file_id_both_dir_info
_______________________________________________
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/GODUIB5WKVZLX4BVPEM2NS37JFHUXIID/
Re: os.scandir bug in Windows?
On 20/10/20 4:52 am, Gregory P. Smith wrote:
> Those of us with a traditional posix filesystem background may raise
> eyeballs at this duplication, seeing a directory as a place that merely
> maps names to inodes

This is probably a holdover from MS-DOS, where there was no separate
inode-like structure -- it was all in the directory entry.

--
Greg
_______________________________________________
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/QJVZ2EXFKCMZ4YHERFI2FXJTWWPFCFSA/
Re: os.scandir bug in Windows?
On 19/10/2020 12:42, Steve Dower wrote:
> On 15Oct2020 2239, Rob Cliffe via Python-Dev wrote:
>> TLDR: In os.scandir directory entries, atime is always a copy of
>> mtime rather than the actual access time.
>
> Correction - os.stat() updates the access time to _now_, while
> os.scandir() returns the last access time without updating it.
>
> Eryk replied with a deeper explanation of the cause, but fundamentally
> this is what you are seeing.
>
> Feel free to file a bug, but we'll likely only add a vague note to the
> docs about how Windows works here rather than changing anything. If
> anything, we should probably fix os.stat() to avoid updating the
> access time so that both functions behave the same, but that might be
> too complicated.
>
> Cheers,
> Steve
Sorry - what you say does not match the behaviour I observe, which is that
    (1) Neither os.stat, nor reading os.scandir directory entries,
        updates any of the times on disk.
    (2) os.stat's st_atime returns the "correct" time the file was last
        accessed.
    (3) os.scandir always returns st_atime equal to st_mtime.

Modified demo program:

# osscandirtest.py
import time, os

print(f'[1] {time.time()=}')
with open('Test', 'w') as f: f.write('Anything\n')

time.sleep(20)

print(f'[2] {time.time()=}')
with open('Test', 'r') as f: f.readline() # Read the file

time.sleep(10)

print(f'[3] {time.time()=}')
print(os.stat('Test'))
for DirEntry in os.scandir('.'):
    if DirEntry.name == 'Test':
        stat = DirEntry.stat()
        print(f'scandir DirEntry {stat.st_ctime=} {stat.st_mtime=} {stat.st_atime=}')
print(os.stat('Test'))
for DirEntry in os.scandir('.'):
    if DirEntry.name == 'Test':
        stat = DirEntry.stat()
        print(f'scandir DirEntry {stat.st_ctime=} {stat.st_mtime=} {stat.st_atime=}')
print(f'[4] {time.time()=}')

Sample output:

[1] time.time()=1603166161.12121
[2] time.time()=1603166181.1306772
[3] time.time()=1603166191.1426473
os.stat_result(st_mode=33206, st_ino=9851624184951253,
st_dev=2230120362, st_nlink=1, st_uid=0, st_gid=0, st_size=10,
st_atime=1603166181, st_mtime=1603166161, st_ctime=1603166161)
scandir DirEntry stat.st_ctime=1603166161.12121 stat.st_mtime=1603166161.12121 stat.st_atime=1603166161.12121
os.stat_result(st_mode=33206, st_ino=9851624184951253,
st_dev=2230120362, st_nlink=1, st_uid=0, st_gid=0, st_size=10,
st_atime=1603166181, st_mtime=1603166161, st_ctime=1603166161)
scandir DirEntry stat.st_ctime=1603166161.12121 stat.st_mtime=1603166161.12121 stat.st_atime=1603166161.12121
[4] time.time()=1603166191.1426473

You will observe that
    (1) The results from the two os.stat calls are the same, as are the
        results from the two scandir calls.
    (2) The os.stat st_atime (1603166181) *IS* the time that the file
        was read by the line of code
            with open('Test', 'r') as f: f.readline() # Read the file
        as it matches the line of output
            [2] time.time()=1603166181.1306772
        (apart from discarded fractions of a second), and it is 20 seconds
        (*not* 30 seconds) after the file creation time, as expected.
    (3) The os.scandir atime is a copy of mtime (and in this case, of
        ctime as well).

So it really does seem that the only thing "wrong" is that os.scandir
returns atime as a copy of mtime, rather than the correct value.
And since os.stat returns the "right" answer and os.scandir doesn't, it
really seems that this is a bug, or at least a deficiency, in os.scandir.

Demo run on Windows 10 Home version 1903 OS build 18362.1139
Python version 3.8.3 (32-bit).
Best wishes
Rob Cliffe
_______________________________________________
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/MGICSKCSTSKS36XUP6IZTXZOSGBPMQYY/
Re: os.scandir bug in Windows?
On 20Oct2020 0520, Rob Cliffe wrote:
> On 19/10/2020 12:42, Steve Dower wrote:
>> On 15Oct2020 2239, Rob Cliffe via Python-Dev wrote:
>>> TLDR: In os.scandir directory entries, atime is always a copy of
>>> mtime rather than the actual access time.
>>
>> Correction - os.stat() updates the access time to _now_, while
>> os.scandir() returns the last access time without updating it.
>>
>> Eryk replied with a deeper explanation of the cause, but fundamentally
>> this is what you are seeing.
>>
>> Feel free to file a bug, but we'll likely only add a vague note to the
>> docs about how Windows works here rather than changing anything. If
>> anything, we should probably fix os.stat() to avoid updating the
>> access time so that both functions behave the same, but that might be
>> too complicated.
>>
>> Cheers,
>> Steve
> Sorry - what you say does not match the behaviour I observe, which is that

Yes, I posted a correction already (immediately after sending the first
email).

What you are seeing is what Windows decided was the best approach. If
you want to avoid that, os.stat() will get the latest available
information. But I don't want to penalise people who don't need it by
slowing down their scandir calls unnecessarily.

A documentation patch to make this difference between os.stat() and
DirEntry even clearer would be fine.

Cheers,
Steve
_______________________________________________
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/NAR7LTW2XMBKAPKLVBQQFVK6EA4ZWQZP/
Re: os.scandir bug in Windows?
On 10/19/20, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
> On 20/10/20 4:52 am, Gregory P. Smith wrote:
>> Those of us with a traditional posix filesystem background may raise
>> eyeballs at this duplication, seeing a directory as a place that merely
>> maps names to inodes
>
> This is probably a holdover from MS-DOS, where there was no separate
> inode-like structure -- it was all in the directory entry.

DOS implemented a find-first/find-next API (int 21h 4E/4F) that
provided a file's name, attributes, size, and last write time/date. I
think it's clear that the design was influenced by the
readily-available contents of a FAT dirent. The Win32 API extended
this to FindFirstFile/FindNextFile, with added support for the long
filename, create and access times, and, in NT 5+, the reparse tag for
a reparse point.

NTFS had to support this metadata in the directory index, else
FindFirstFile/FindNextFile would be too expensive if the filesystem
had to fetch the metadata from the MFT for every matching file in a
listing. It tries to keep the duplicated metadata in sync -- such as
when a file is opened, closed, manually extended in size, when the cache
is flushed, or when metadata is explicitly set (e.g.
SetFileInformationByHandle: FileBasicInfo). But for performance it
doesn't update the duplicated data every time a file is read from or
written to. And, in particular, if it's just the access time that
changed, it updates the duplicated access time with a one-hour
granularity. (There's also a registry value, as I mentioned
previously, that disables updating access times completely -- in both
the MFT record and the directory index.)
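A toy model of the access-time rule above, assuming the duplicated value is rewritten only when the new access is at least an hour newer than the stored one (the real NTFS bookkeeping involves more state, and the function name is hypothetical):

```python
ONE_HOUR = 60 * 60  # seconds


def updated_dir_atime(stored_atime: float, access_time: float,
                      granularity: float = ONE_HOUR,
                      updates_disabled: bool = False) -> float:
    """Return the access time the directory index would hold after one access.

    Model only: if access-time updates are disabled system-wide, nothing
    changes; otherwise the duplicated value is rewritten only when the
    access is at least `granularity` seconds newer than the stored value.
    """
    if updates_disabled:
        return stored_atime
    if access_time - stored_atime >= granularity:
        return access_time
    return stored_atime
```

With the timings in Rob's demo (a read 20 seconds after the file was written), the model leaves the stored value untouched, which is consistent with scandir() still reporting an atime equal to the mtime.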

That said, if a file has multiple hardlinks, the current NTFS
implementation for updating duplicated data is totally unreliable. It
only updates the accessed link. All other links go stale. We don't
have any reasonable way to special case this situation because the
directory entry doesn't include the number of links a file has. It has
to be opened and queried directly, but then one might as well do a
full stat() for every file.
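The stale-link behaviour can be pictured with a small simulation. This is a deliberately simplified model, not NTFS code: the `FileRecord` and `Volume` classes are hypothetical, the record plays the role of the MFT record, and each link name keeps its own duplicated (size, mtime) pair that changes only when that link is used.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class FileRecord:
    """Authoritative metadata (the role of the MFT record in this model)."""
    size: int = 0
    mtime: float = 0.0


@dataclass
class Volume:
    records: Dict[int, FileRecord] = field(default_factory=dict)
    # link name -> (file id, duplicated (size, mtime) kept in the directory entry)
    links: Dict[str, Tuple[int, Tuple[int, float]]] = field(default_factory=dict)

    def create(self, name: str, file_id: int) -> None:
        self.records[file_id] = FileRecord()
        self.links[name] = (file_id, (0, 0.0))

    def hardlink(self, existing: str, new: str) -> None:
        file_id, _ = self.links[existing]
        rec = self.records[file_id]
        self.links[new] = (file_id, (rec.size, rec.mtime))

    def write(self, name: str, size: int, now: float) -> None:
        """Modify the file through one link; only that link's entry is refreshed."""
        file_id, _ = self.links[name]
        rec = self.records[file_id]
        rec.size, rec.mtime = size, now
        self.links[name] = (file_id, (rec.size, rec.mtime))

    def scandir_view(self, name: str) -> Tuple[int, float]:
        """What a directory listing reports: the duplicated copy."""
        return self.links[name][1]

    def stat(self, name: str) -> Tuple[int, float]:
        """What os.stat() reports: the record itself; it also refreshes that link."""
        file_id, _ = self.links[name]
        rec = self.records[file_id]
        self.links[name] = (file_id, (rec.size, rec.mtime))
        return rec.size, rec.mtime
```

After writing through `a`, a listing still shows the old data for `b` until `stat('b')` is called, mirroring the refresh-on-stat behaviour described earlier in the thread.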

I recommend relying on only the high-level is_dir(), is_file(), and
is_symlink() methods of os.scandir() items, to quickly process a
directory. inode() is reliable -- as much as is possible in Windows --
because the implementation gets the full stat info, but check to
ensure it's not 0. It's based on the file ID, which Windows
filesystems aren't required to support (or reliably support; it's not
stable in FAT). NTFS and ReFS support reliable 64-bit file IDs, and
opening by file ID.
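Following that recommendation, a sketch (the function name is hypothetical) that classifies entries with only `is_dir()`, `is_file()` and `is_symlink()`, and treats an `inode()` value of 0 as "no file ID available" rather than as a real identifier:

```python
import os
from typing import Dict, List, Optional, Tuple


def classify_entries(path: str) -> Dict[str, List[Tuple[str, Optional[int]]]]:
    """Group the entries of `path` by kind, pairing each name with its file ID (or None)."""
    groups: Dict[str, List[Tuple[str, Optional[int]]]] = {
        "dirs": [], "files": [], "links": [], "other": [],
    }
    with os.scandir(path) as it:
        for entry in it:
            # A file ID of 0 means the filesystem did not supply one.
            file_id = entry.inode() or None
            if entry.is_symlink():
                kind = "links"
            elif entry.is_dir(follow_symlinks=False):
                kind = "dirs"
            elif entry.is_file(follow_symlinks=False):
                kind = "files"
            else:
                kind = "other"
            groups[kind].append((entry.name, file_id))
    for items in groups.values():
        items.sort()  # names are unique, so sorting never compares IDs
    return groups
```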
_______________________________________________
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/JKK47AWKUOWPPBEAIRGIFRMW6FCPZILG/
Re: os.scandir bug in Windows?
On Tue, Oct 20, 2020, at 07:42, Steve Dower wrote:
> On 20Oct2020 0520, Rob Cliffe wrote:
> > On 19/10/2020 12:42, Steve Dower wrote:
> >> On 15Oct2020 2239, Rob Cliffe via Python-Dev wrote:
> >>> TLDR: In os.scandir directory entries, atime is always a copy of
> >>> mtime rather than the actual access time.
> >>
> >> Correction - os.stat() updates the access time to _now_, while
> >> os.scandir() returns the last access time without updating it.
> >>
> >> Eryk replied with a deeper explanation of the cause, but fundamentally
> >> this is what you are seeing.
> >>
> >> Feel free to file a bug, but we'll likely only add a vague note to the
> >> docs about how Windows works here rather than changing anything. If
> >> anything, we should probably fix os.stat() to avoid updating the
> >> access time so that both functions behave the same, but that might be
> >> too complicated.
> >>
> >> Cheers,
> >> Steve
> > Sorry - what you say does not match the behaviour I observe, which is that
>
> Yes, I posted a correction already (immediately after sending the first
> email).

ok, see, the correction you posted doesn't address the part of your claim that people are taking issue with, which is that *calling os.stat() causes the atime to be set to the time of the call to os.stat()*. This is not the same thing as [correctly] saying that "calling os.stat() may return a more up-to-date atime, the time of the last read, write, or other operation", and the phrasing "updates the access time to _now_" certainly *seemed* unambiguous.

And at this point it's not clear to me whether you understand that people are reading your claim this way.

What correction, exactly, do you mean? The post I saw with the word "Correction" on it is the one that *makes* the claim people are taking issue with.
_______________________________________________
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/O63FMQYOHASHZ33CWBYQMD3H3XYGT5QC/
Re: os.scandir bug in Windows?
On Fri, Oct 23, 2020, at 02:14, Random832 wrote:
> What correction, exactly, do you mean? The post I saw with the word
> "Correction" on it is the one that *makes* the claim people are taking
> issue with.

okay, sorry, I see the other correction post now...

My issue, I guess, was the same as Eryk Sun's: it wasn't clear which parts of the previous post you were correcting and which (if any) you stood by, since they were about the behavior of different parts of the system, so it didn't register as a correction to that part when I originally read it.
_______________________________________________
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/U4MZFDDMM4L52DKA6NBB7MKRJJ7QWEOB/
Re: os.scandir bug in Windows?
On Mon, Oct 19, 2020 at 13:50, Steve Dower <steve.dower@python.org> wrote:
> Feel free to file a bug, but we'll likely only add a vague note to the
> docs about how Windows works here rather than changing anything.

I agree that this surprising behavior can be documented. Attempting to
provide accurate access time in os.scandir() is likely to slow down
the function, which would defeat its whole purpose.

--

By the way, who relies on the access time? I don't understand why the
creation and modification times are not enough for all usages. I would
rather want to kill the whole concept of "access" time in operating
systems (or just configure the OS to not update it anymore). I guess
that it's really hard to make it efficient and accurate at the same
time...

Linux has a "relatime" mount option (Fedora enables it by default):
"With this option enabled, atime data is written to the disk only if
the file has been modified since the atime data was last updated
(mtime), or if the file was last accessed more than a certain amount
of time ago (by default, one day)." It's a minor enhancement over always
updating atime.
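The quoted rule is simple enough to write as a predicate. This sketch follows only the quoted description (mtime at or after the recorded atime, or a recorded atime more than a day old); the kernel's actual check may consider additional conditions, and the function name is hypothetical:

```python
ONE_DAY = 24 * 60 * 60  # seconds; the default threshold in the quotation


def relatime_should_update(atime: float, mtime: float, now: float,
                           max_age: float = ONE_DAY) -> bool:
    """Return True if a read at `now` would write a new atime under relatime (as quoted)."""
    # Modified at or after the time atime was last written?
    if mtime >= atime:
        return True
    # Last recorded access more than `max_age` seconds ago?
    return now - atime > max_age
```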

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
_______________________________________________
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/VKL5VXI6R4BNN36RX2FJ5G4YEHS372UV/
Re: os.scandir bug in Windows?
On 10/26/20, Victor Stinner <vstinner@python.org> wrote:
> On Mon, Oct 19, 2020 at 13:50, Steve Dower <steve.dower@python.org> wrote:
>> Feel free to file a bug, but we'll likely only add a vague note to the
>> docs about how Windows works here rather than changing anything.
>
> I agree that this surprising behavior can be documented. Attempting to
> provide accurate access time in os.scandir() is likely to slow down
> the function, which would defeat its whole purpose.

I don't think the access time (st_atime) is a significant concern. I'm
concerned with the reliability of the file size (st_size) and
last-write time (st_mtime) in stat() results. Developers are used to
various filesystem policies on various platforms that limit when the
access time gets updated, if at all. FAT32 filesystems only have an
access date, and the driver in Windows fixes the access time at
midnight. Updating the access time in NTFS and ReFS can be completely
disabled at the system level; otherwise it's updated with a
granularity of one hour if it's only the access time that would be
updated.
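To make the FAT case concrete: because only the access date is kept, the reported access time is midnight of that day. A sketch of that truncation (hypothetical helper), working on naive local datetimes since FAT stores local time:

```python
from datetime import datetime


def fat_reported_atime(actual_access: datetime) -> datetime:
    """Return the access time a FAT volume would report for an access at `actual_access`.

    FAT keeps only the date of the last access, so the time of day is
    dropped and reported as midnight.
    """
    return actual_access.replace(hour=0, minute=0, second=0, microsecond=0)
```

Any two accesses on the same day are therefore indistinguishable, so a FAT st_atime is only meaningful to the day.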

The biggest concern for me is NTFS hardlinks, for which the st_size
and st_mtime in the directory entry are unreliable. When a file with
multiple hardlinks is modified, the filesystem only updates the
duplicated information in the directory entry of the opened link.
Because the entry in the directory doesn't include the link count or
even a boolean value to indicate that a file has multiple hardlinks,
if you don't know whether or not there's a possibility of hardlinks,
then os.stat() is required in order to reliably determine st_size and
st_mtime, to the extent that reliably knowing st_mtime is possible.

A general problem that affects even os.stat() is that a modified file
may only be noted by setting a flag (FO_FILE_MODIFIED) in the kernel
file object of the particular open. Whether it's immediately noted in
the last-write time of the shared FCB (file control block) is up to
filesystem policy.

Starting with Windows 10 1809 (as noted in [MS-FSA]), NTFS immediately
notes the modification time, so the st_mtime value from os.stat() is
current. In prior versions of NTFS, and with other Microsoft
filesystems such as FAT32, the last-write time is only noted when the
file is flushed to disk via FlushFileBuffers (i.e. os.fsync) or when
the open is closed.

This means that st_size may change without also changing st_mtime. I'm
using Windows 10 2004 currently, so I can't show an NTFS example, but
the following shows the behavior with FAT32:

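import os
import time

f = open('spam.txt', 'w')      # keep this open for the whole test
st1 = os.stat('spam.txt')
time.sleep(10)
f.write('spam')
f.flush()                      # hand the data to the OS, but no fsync yet
st2 = os.stat('spam.txt')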

The above write was noted only by setting the FO_FILE_MODIFIED flag on
the kernel file object. (The file object can be inspected with a local
kernel debugger.) The write time wasn't noted in the FCB, i.e.
st_mtime hasn't changed in st2:

>>> st2.st_size - st1.st_size
4
>>> st2.st_mtime - st1.st_mtime
0.0

The last-write time is noted when FlushFileBuffers (os.fsync) is
called on the open:

>>> os.fsync(f.fileno())
>>> st3 = os.stat('spam.txt')
>>> st3.st_mtime - st1.st_mtime
10.0
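If a program needs `st_mtime` to reflect its own writes on a filesystem that only notes the write time on flush or close, one option, following the `os.fsync()` step shown above, is to flush Python's buffer, sync, and then stat. A minimal sketch (the helper name is hypothetical); keep in mind that `os.fsync()` forces the data to disk and is not cheap:

```python
import os
from typing import IO


def stat_after_sync(f: IO) -> os.stat_result:
    """Flush Python's buffer and the OS cache for the open file `f`, then stat it by name."""
    f.flush()             # move data from Python's buffer to the OS
    os.fsync(f.fileno())  # FlushFileBuffers on Windows; per the above, FAT32 notes the write time here
    return os.stat(f.name)
```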

Note also that, with NTFS, to the extent that the FCB metadata is
current, calling os.stat() on a link updates the duplicated
information in the directory entry. So calling os.stat() on an NTFS
file may update the entry that's returned by a subsequent os.scandir()
call.
_______________________________________________
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/LEBCSKGSL7PMAFH6AQR5LFL7UJ4T5774/