Mailing List Archive: xor operator

Re: xor operator (DEPRECATED) [ In reply to ]

Nov 13, 2023, 7:11 PM

Post #26 of 34 (246 views)

Fair point. However, I gave it a shot for the following reason:

I couldn’t find a way to make such performant function. Using python builtin components still ends up several times slower than builtin `all`. Cython or numba or similar is not an option as they do not support `truth` values. Or if they do, it ends up slower than pure python variant.

So all what is left is writing a proper extension. Which I would prefer not to do for 1 function. I thought maybe `xor`, as in logical XOR functionality in its vanilla case could be compelling. And after doing a bit of search I see that very very few languages have that and it seems for a good reason.

Some that do: R, thats all I could find. Although some (if not many) went through the proposal phase. And yes, none of them have a function that I am proposing.

So yes, you are right, not a good proposal.

But there still seems to be the need for short-circuiting performant implementations in python space. The issue is that there are many variants of what might be needed while there is no efficient solution to sourcing predicates from python to lower level implementations. Someone mentioned that numpy experimented with such implementations in C, but they did not get anywhere with it.

The best I could come up with is cached numba for numpy problems, which does perform very well and more than worth it if function is re-used. It even ends up faster than cython or cffi extensions, however can’t have many of those due to JIT and AOT is currently being deprecated (which wouldn’t solve anything anyway). However, as I mentioned earlier it does not apply to this case.

So it’s either:
a) Something very clever and flexible implemented that covers most of such needs and doesn’t require predicates.
b) I welcome any thoughts on this.

DG

> On 14 Nov 2023, at 04:27, AVI GROSS via Python-list <python-list@python.org> wrote:
>
> I was going to ask a dumb question. Has any other language you know of made
> something available that does what is being asked for and included it in the
> main program environment rather than an add-on?
>
> A secondary mention here has been whether short-circuiting functions like
> "any" and "all" have been augmented with something like "has_n" that
> evaluates arguments till it has found n or perhaps n+1 of what it wants then
> skips the rest. Does any language supply something like that? What would
> such a function return and does it have an "any" or an "all" side?
>
> It sounds like if I asked if a list of integers has at least n prime numbers
> in "any" mode, it should ignore any that are not primes till it finds n
> primes or fails and returns true or false. If in "all" mode, I assume it
> would have to be the first n items without a failure.
>
> Fine, but then someone may want to know WHERE you stopped or for you to
> return the sublist of the ones that made the match, or even return
> everything that was skipped so you can later process that. Consider a long
> list of jurors you process to place a dozen that qualify on a jury and then
> later you want to choose from among the rest for another jury.
>
> Human minds can come up with an amazing number of ideas including for
> "useful" functions or features but I find the vast majority would rarely be
> used as nobody remembers it is available and some fairly simple method using
> other functions can easily be cobbled together.
>
> -----Original Message-----
> From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org> On
> Behalf Of Grant Edwards via Python-list
> Sent: Monday, November 13, 2023 8:19 PM
> To: python-list@python.org
> Subject: Re: xor operator
>
> On 2023-11-14, Dom Grigonis via Python-list <python-list@python.org> wrote:
>>
>>> Except the 'any' and 'all' builtins are _exactly_ the same as bitwise
>>> or and and applided to many bits. To do something "in line" with that
>>> using the 'xor' operator would return True for an odd number of True
>>> values and False for an even Number of True values.
>>
>> Fair point.
>>
>> Have you ever encountered the need for xor for many bits (the one
>> that I am NOT referring to)? Would be interested in what sort of
>> case it could be useful.
>
> Yes, it's used all the time in low-level communications protocols,
> where it's often implemented in hardware. But, it is also not at all
> unusual to implement it in software.
>
> It's also not that unusual for the "count-ones" part of the function
> you're asking for to be implemented in hardware by a CPU having an
> instruction that counts the number of 1 bits in a register.
>
> GCC has a low-level builtins called __builtin_popcount() and
> __builtin-popcountl() that counts the number of 1's in an unsigned
> (long) int.
>
>
> --
> https://mail.python.org/mailman/listinfo/python-list
>
> --
> https://mail.python.org/mailman/listinfo/python-list

--
https://mail.python.org/mailman/listinfo/python-list

RE: xor operator (DEPRECATED) [ In reply to ]

python-list at python

Nov 13, 2023, 8:44 PM

Post #27 of 34 (245 views)

Permalink

Dom,

I hear you.

As you say, writing your own extension in something like C++ may not appeal to you even if it is faster.

I was wondering if using a generator or something similar in R might make sense.

I mean what happens if you write a function that includes a "yield" or two and does a part of what you want. It maintains some internal state between invocations. So you can call it once to setup things then call it repeatedly to keep processing the next item. You stop calling it when you get a result you want, such as that it has seen what you want N times.

Since the code stays in memory, it may effectively run faster than some other kinds of functions calls. It can keep things in internal storage such as not just how many N you want but how many it has seen.

Your outer function can maintain a list of the items you want to XOR or generate a new one dynamically as needed. It can use functional programming techniques to create a new customized version of the iterator, such as with a value of N built in. You would then call the outer function and let it use the inner function till the result is available or until the data in the iterator runs out or perhaps other tweaks involving two way communication of sorts between the functions.

I am NOT suggesting this approach is optimal or fast but merely wondering if something along these lines is worth trying that might speed things up even if not very fast. Such approaches can be even more effective if what you are working on need not all be instantiated up front but can be dynamically calculated or incrementally read from files. With care, you can make multiple instantiations that each iterate over their own sets of data without interference.

Just a thought. In a sense, this can be a slightly decent substitute for the non-standard evaluation in R where you can arrange for lots of your data to not be interpreted till absolutely needed.

-----Original Message-----
From: Dom Grigonis <dom.grigonis@gmail.com>
Sent: Monday, November 13, 2023 10:12 PM
To: avi.e.gross@gmail.com
Cc: Grant Edwards <grant.b.edwards@gmail.com>; Python <python-list@python.org>
Subject: Re: xor operator (DEPRECATED)

Fair point. However, I gave it a shot for the following reason:

I couldn’t find a way to make such performant function. Using python builtin components still ends up several times slower than builtin `all`. Cython or numba or similar is not an option as they do not support `truth` values. Or if they do, it ends up slower than pure python variant.

So all what is left is writing a proper extension. Which I would prefer not to do for 1 function. I thought maybe `xor`, as in logical XOR functionality in its vanilla case could be compelling. And after doing a bit of search I see that very very few languages have that and it seems for a good reason.

Some that do: R, thats all I could find. Although some (if not many) went through the proposal phase. And yes, none of them have a function that I am proposing.

So yes, you are right, not a good proposal.

But there still seems to be the need for short-circuiting performant implementations in python space. The issue is that there are many variants of what might be needed while there is no efficient solution to sourcing predicates from python to lower level implementations. Someone mentioned that numpy experimented with such implementations in C, but they did not get anywhere with it.

The best I could come up with is cached numba for numpy problems, which does perform very well and more than worth it if function is re-used. It even ends up faster than cython or cffi extensions, however can’t have many of those due to JIT and AOT is currently being deprecated (which wouldn’t solve anything anyway). However, as I mentioned earlier it does not apply to this case.

So it’s either:
a) Something very clever and flexible implemented that covers most of such needs and doesn’t require predicates.
b) I welcome any thoughts on this.

DG

> On 14 Nov 2023, at 04:27, AVI GROSS via Python-list <python-list@python.org> wrote:
>
> I was going to ask a dumb question. Has any other language you know of made
> something available that does what is being asked for and included it in the
> main program environment rather than an add-on?
>
> A secondary mention here has been whether short-circuiting functions like
> "any" and "all" have been augmented with something like "has_n" that
> evaluates arguments till it has found n or perhaps n+1 of what it wants then
> skips the rest. Does any language supply something like that? What would
> such a function return and does it have an "any" or an "all" side?
>
> It sounds like if I asked if a list of integers has at least n prime numbers
> in "any" mode, it should ignore any that are not primes till it finds n
> primes or fails and returns true or false. If in "all" mode, I assume it
> would have to be the first n items without a failure.
>
> Fine, but then someone may want to know WHERE you stopped or for you to
> return the sublist of the ones that made the match, or even return
> everything that was skipped so you can later process that. Consider a long
> list of jurors you process to place a dozen that qualify on a jury and then
> later you want to choose from among the rest for another jury.
>
> Human minds can come up with an amazing number of ideas including for
> "useful" functions or features but I find the vast majority would rarely be
> used as nobody remembers it is available and some fairly simple method using
> other functions can easily be cobbled together.
>
> -----Original Message-----
> From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org> On
> Behalf Of Grant Edwards via Python-list
> Sent: Monday, November 13, 2023 8:19 PM
> To: python-list@python.org
> Subject: Re: xor operator
>
> On 2023-11-14, Dom Grigonis via Python-list <python-list@python.org> wrote:
>>
>>> Except the 'any' and 'all' builtins are _exactly_ the same as bitwise
>>> or and and applided to many bits. To do something "in line" with that
>>> using the 'xor' operator would return True for an odd number of True
>>> values and False for an even Number of True values.
>>
>> Fair point.
>>
>> Have you ever encountered the need for xor for many bits (the one
>> that I am NOT referring to)? Would be interested in what sort of
>> case it could be useful.
>
> Yes, it's used all the time in low-level communications protocols,
> where it's often implemented in hardware. But, it is also not at all
> unusual to implement it in software.
>
> It's also not that unusual for the "count-ones" part of the function
> you're asking for to be implemented in hardware by a CPU having an
> instruction that counts the number of 1 bits in a register.
>
> GCC has a low-level builtins called __builtin_popcount() and
> __builtin-popcountl() that counts the number of 1's in an unsigned
> (long) int.
>
>
> --
> https://mail.python.org/mailman/listinfo/python-list
>
> --
> https://mail.python.org/mailman/listinfo/python-list

--
https://mail.python.org/mailman/listinfo/python-list

Re: xor operator (DEPRECATED) [ In reply to ]

python-list at python

Nov 13, 2023, 9:23 PM

Post #28 of 34 (245 views)

Permalink

On 11/13/2023 11:44 PM, AVI GROSS via Python-list wrote:
> Dom,
>
> I hear you.
>
> As you say, writing your own extension in something like C++ may not appeal to you even if it is faster.
>
> I was wondering if using a generator or something similar in R might make sense.
>
> I mean what happens if you write a function that includes a "yield" or two and does a part of what you want. It maintains some internal state between invocations. So you can call it once to setup things then call it repeatedly to keep processing the next item. You stop calling it when you get a result you want, such as that it has seen what you want N times.
>
> Since the code stays in memory, it may effectively run faster than some other kinds of functions calls. It can keep things in internal storage such as not just how many N you want but how many it has seen.

I'm inclined to just turn the iterable into a set to get the values,
then iterate through those values calling count() on a listified version
of the iterable. If the count >= target, return.

It may not be the fastest one could do but it's simple and probably
pretty fast for many uses. I suppose that for some iterables it would
be better not to turn them into lists, but one could figure out about
that after working out more carefully what cases need to be covered.

> Your outer function can maintain a list of the items you want to XOR or generate a new one dynamically as needed. It can use functional programming techniques to create a new customized version of the iterator, such as with a value of N built in. You would then call the outer function and let it use the inner function till the result is available or until the data in the iterator runs out or perhaps other tweaks involving two way communication of sorts between the functions.
>
> I am NOT suggesting this approach is optimal or fast but merely wondering if something along these lines is worth trying that might speed things up even if not very fast. Such approaches can be even more effective if what you are working on need not all be instantiated up front but can be dynamically calculated or incrementally read from files. With care, you can make multiple instantiations that each iterate over their own sets of data without interference.
>
> Just a thought. In a sense, this can be a slightly decent substitute for the non-standard evaluation in R where you can arrange for lots of your data to not be interpreted till absolutely needed.
>
>
>
> -----Original Message-----
> From: Dom Grigonis <dom.grigonis@gmail.com>
> Sent: Monday, November 13, 2023 10:12 PM
> To: avi.e.gross@gmail.com
> Cc: Grant Edwards <grant.b.edwards@gmail.com>; Python <python-list@python.org>
> Subject: Re: xor operator (DEPRECATED)
>
> Fair point. However, I gave it a shot for the following reason:
>
> I couldn’t find a way to make such performant function. Using python builtin components still ends up several times slower than builtin `all`. Cython or numba or similar is not an option as they do not support `truth` values. Or if they do, it ends up slower than pure python variant.
>
> So all what is left is writing a proper extension. Which I would prefer not to do for 1 function. I thought maybe `xor`, as in logical XOR functionality in its vanilla case could be compelling. And after doing a bit of search I see that very very few languages have that and it seems for a good reason.
>
> Some that do: R, thats all I could find. Although some (if not many) went through the proposal phase. And yes, none of them have a function that I am proposing.
>
> So yes, you are right, not a good proposal.
>
> But there still seems to be the need for short-circuiting performant implementations in python space. The issue is that there are many variants of what might be needed while there is no efficient solution to sourcing predicates from python to lower level implementations. Someone mentioned that numpy experimented with such implementations in C, but they did not get anywhere with it.
>
> The best I could come up with is cached numba for numpy problems, which does perform very well and more than worth it if function is re-used. It even ends up faster than cython or cffi extensions, however can’t have many of those due to JIT and AOT is currently being deprecated (which wouldn’t solve anything anyway). However, as I mentioned earlier it does not apply to this case.
>
> So it’s either:
> a) Something very clever and flexible implemented that covers most of such needs and doesn’t require predicates.
> b) I welcome any thoughts on this.
>
> DG
>
>> On 14 Nov 2023, at 04:27, AVI GROSS via Python-list <python-list@python.org> wrote:
>>
>> I was going to ask a dumb question. Has any other language you know of made
>> something available that does what is being asked for and included it in the
>> main program environment rather than an add-on?
>>
>> A secondary mention here has been whether short-circuiting functions like
>> "any" and "all" have been augmented with something like "has_n" that
>> evaluates arguments till it has found n or perhaps n+1 of what it wants then
>> skips the rest. Does any language supply something like that? What would
>> such a function return and does it have an "any" or an "all" side?
>>
>> It sounds like if I asked if a list of integers has at least n prime numbers
>> in "any" mode, it should ignore any that are not primes till it finds n
>> primes or fails and returns true or false. If in "all" mode, I assume it
>> would have to be the first n items without a failure.
>>
>> Fine, but then someone may want to know WHERE you stopped or for you to
>> return the sublist of the ones that made the match, or even return
>> everything that was skipped so you can later process that. Consider a long
>> list of jurors you process to place a dozen that qualify on a jury and then
>> later you want to choose from among the rest for another jury.
>>
>> Human minds can come up with an amazing number of ideas including for
>> "useful" functions or features but I find the vast majority would rarely be
>> used as nobody remembers it is available and some fairly simple method using
>> other functions can easily be cobbled together.
>>
>> -----Original Message-----
>> From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org> On
>> Behalf Of Grant Edwards via Python-list
>> Sent: Monday, November 13, 2023 8:19 PM
>> To: python-list@python.org
>> Subject: Re: xor operator
>>
>> On 2023-11-14, Dom Grigonis via Python-list <python-list@python.org> wrote:
>>>
>>>> Except the 'any' and 'all' builtins are _exactly_ the same as bitwise
>>>> or and and applided to many bits. To do something "in line" with that
>>>> using the 'xor' operator would return True for an odd number of True
>>>> values and False for an even Number of True values.
>>>
>>> Fair point.
>>>
>>> Have you ever encountered the need for xor for many bits (the one
>>> that I am NOT referring to)? Would be interested in what sort of
>>> case it could be useful.
>>
>> Yes, it's used all the time in low-level communications protocols,
>> where it's often implemented in hardware. But, it is also not at all
>> unusual to implement it in software.
>>
>> It's also not that unusual for the "count-ones" part of the function
>> you're asking for to be implemented in hardware by a CPU having an
>> instruction that counts the number of 1 bits in a register.
>>
>> GCC has a low-level builtins called __builtin_popcount() and
>> __builtin-popcountl() that counts the number of 1's in an unsigned
>> (long) int.
>>
>>
>> --
>> https://mail.python.org/mailman/listinfo/python-list
>>
>> --
>> https://mail.python.org/mailman/listinfo/python-list
>
>

--
https://mail.python.org/mailman/listinfo/python-list

Re: xor operator (DEPRECATED) [ In reply to ]

python-list at python

Nov 13, 2023, 10:19 PM

Post #29 of 34 (244 views)

Permalink

Thank you, I have spent a fair bit of time on these and found optimal solutions (at least I think so), but they are still multiple times slower than python builtin function.

I am currently out of ideas, maybe will dig something out in time.

> On 14 Nov 2023, at 07:23, Thomas Passin via Python-list <python-list@python.org> wrote:
>
> On 11/13/2023 11:44 PM, AVI GROSS via Python-list wrote:
>> Dom,
>> I hear you.
>> As you say, writing your own extension in something like C++ may not appeal to you even if it is faster.
>> I was wondering if using a generator or something similar in R might make sense.
>> I mean what happens if you write a function that includes a "yield" or two and does a part of what you want. It maintains some internal state between invocations. So you can call it once to setup things then call it repeatedly to keep processing the next item. You stop calling it when you get a result you want, such as that it has seen what you want N times.
>> Since the code stays in memory, it may effectively run faster than some other kinds of functions calls. It can keep things in internal storage such as not just how many N you want but how many it has seen.
>
> I'm inclined to just turn the iterable into a set to get the values, then iterate through those values calling count() on a listified version of the iterable. If the count >= target, return.
>
> It may not be the fastest one could do but it's simple and probably pretty fast for many uses. I suppose that for some iterables it would be better not to turn them into lists, but one could figure out about that after working out more carefully what cases need to be covered.
>
>> Your outer function can maintain a list of the items you want to XOR or generate a new one dynamically as needed. It can use functional programming techniques to create a new customized version of the iterator, such as with a value of N built in. You would then call the outer function and let it use the inner function till the result is available or until the data in the iterator runs out or perhaps other tweaks involving two way communication of sorts between the functions.
>> I am NOT suggesting this approach is optimal or fast but merely wondering if something along these lines is worth trying that might speed things up even if not very fast. Such approaches can be even more effective if what you are working on need not all be instantiated up front but can be dynamically calculated or incrementally read from files. With care, you can make multiple instantiations that each iterate over their own sets of data without interference.
>> Just a thought. In a sense, this can be a slightly decent substitute for the non-standard evaluation in R where you can arrange for lots of your data to not be interpreted till absolutely needed.
>> -----Original Message-----
>> From: Dom Grigonis <dom.grigonis@gmail.com>
>> Sent: Monday, November 13, 2023 10:12 PM
>> To: avi.e.gross@gmail.com
>> Cc: Grant Edwards <grant.b.edwards@gmail.com>; Python <python-list@python.org>
>> Subject: Re: xor operator (DEPRECATED)
>> Fair point. However, I gave it a shot for the following reason:
>> I couldn’t find a way to make such performant function. Using python builtin components still ends up several times slower than builtin `all`. Cython or numba or similar is not an option as they do not support `truth` values. Or if they do, it ends up slower than pure python variant.
>> So all what is left is writing a proper extension. Which I would prefer not to do for 1 function. I thought maybe `xor`, as in logical XOR functionality in its vanilla case could be compelling. And after doing a bit of search I see that very very few languages have that and it seems for a good reason.
>> Some that do: R, thats all I could find. Although some (if not many) went through the proposal phase. And yes, none of them have a function that I am proposing.
>> So yes, you are right, not a good proposal.
>> But there still seems to be the need for short-circuiting performant implementations in python space. The issue is that there are many variants of what might be needed while there is no efficient solution to sourcing predicates from python to lower level implementations. Someone mentioned that numpy experimented with such implementations in C, but they did not get anywhere with it.
>> The best I could come up with is cached numba for numpy problems, which does perform very well and more than worth it if function is re-used. It even ends up faster than cython or cffi extensions, however can’t have many of those due to JIT and AOT is currently being deprecated (which wouldn’t solve anything anyway). However, as I mentioned earlier it does not apply to this case.
>> So it’s either:
>> a) Something very clever and flexible implemented that covers most of such needs and doesn’t require predicates.
>> b) I welcome any thoughts on this.
>> DG
>>> On 14 Nov 2023, at 04:27, AVI GROSS via Python-list <python-list@python.org> wrote:
>>>
>>> I was going to ask a dumb question. Has any other language you know of made
>>> something available that does what is being asked for and included it in the
>>> main program environment rather than an add-on?
>>>
>>> A secondary mention here has been whether short-circuiting functions like
>>> "any" and "all" have been augmented with something like "has_n" that
>>> evaluates arguments till it has found n or perhaps n+1 of what it wants then
>>> skips the rest. Does any language supply something like that? What would
>>> such a function return and does it have an "any" or an "all" side?
>>>
>>> It sounds like if I asked if a list of integers has at least n prime numbers
>>> in "any" mode, it should ignore any that are not primes till it finds n
>>> primes or fails and returns true or false. If in "all" mode, I assume it
>>> would have to be the first n items without a failure.
>>>
>>> Fine, but then someone may want to know WHERE you stopped or for you to
>>> return the sublist of the ones that made the match, or even return
>>> everything that was skipped so you can later process that. Consider a long
>>> list of jurors you process to place a dozen that qualify on a jury and then
>>> later you want to choose from among the rest for another jury.
>>>
>>> Human minds can come up with an amazing number of ideas including for
>>> "useful" functions or features but I find the vast majority would rarely be
>>> used as nobody remembers it is available and some fairly simple method using
>>> other functions can easily be cobbled together.
>>>
>>> -----Original Message-----
>>> From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org> On
>>> Behalf Of Grant Edwards via Python-list
>>> Sent: Monday, November 13, 2023 8:19 PM
>>> To: python-list@python.org
>>> Subject: Re: xor operator
>>>
>>> On 2023-11-14, Dom Grigonis via Python-list <python-list@python.org> wrote:
>>>>
>>>>> Except the 'any' and 'all' builtins are _exactly_ the same as bitwise
>>>>> or and and applided to many bits. To do something "in line" with that
>>>>> using the 'xor' operator would return True for an odd number of True
>>>>> values and False for an even Number of True values.
>>>>
>>>> Fair point.
>>>>
>>>> Have you ever encountered the need for xor for many bits (the one
>>>> that I am NOT referring to)? Would be interested in what sort of
>>>> case it could be useful.
>>>
>>> Yes, it's used all the time in low-level communications protocols,
>>> where it's often implemented in hardware. But, it is also not at all
>>> unusual to implement it in software.
>>>
>>> It's also not that unusual for the "count-ones" part of the function
>>> you're asking for to be implemented in hardware by a CPU having an
>>> instruction that counts the number of 1 bits in a register.
>>>
>>> GCC has a low-level builtins called __builtin_popcount() and
>>> __builtin-popcountl() that counts the number of 1's in an unsigned
>>> (long) int.
>>>
>>>
>>> --
>>> https://mail.python.org/mailman/listinfo/python-list
>>>
>>> --
>>> https://mail.python.org/mailman/listinfo/python-list
>
> --
> https://mail.python.org/mailman/listinfo/python-list <https://mail.python.org/mailman/listinfo/python-list>
--
https://mail.python.org/mailman/listinfo/python-list

Re: xor operator [ In reply to ]

python-list at python

Nov 14, 2023, 4:34 PM

Post #30 of 34 (233 views)

Permalink

On 2023-11-14 00:11:30 +0200, Dom Grigonis via Python-list wrote:
> Benchmarks:
> test1 = [False] * 100 + [True] * 2
> test2 = [True] * 100 + [False] * 2
>
> TIMER.repeat([.
> lambda: xor(test1), # 0.0168
> lambda: xor(test2), # 0.0172
> lambda: xor_ss(test1), # 0.1392
> lambda: xor_ss(test2), # 0.0084
> lambda: xor_new(test1), # 0.0116
> lambda: xor_new(test2), # 0.0074
> lambda: all(test1), # 0.0016
> lambda: all(test2) # 0.0046
> ])
> Your first function is fairly slow.
> Second one deals with short-circuiting, but is super slow on full search.
>
> `xor_new` is the best what I could achieve using python builtins.
>
> But builtin `all` has the best performance.

One question worth asking is if a list of bool is the best data
structure for the job. This is essentially a bitmap, and a bitmap is
equivalent to a set of integers. len(s) == 1 is also a fairly quick
operation if s is small. On my system, len(test1s) == 1 (where test1s is
{100, 101}) is about as fast as all(test1) and len(test2s) == 1 (where
test2s is set(range(100))) is about twice as fast as all(test2).

If you are willing to stray from the standard library, you could e.g.
use pyroaring instead of sets: This is about as fast as all(test1)
whether there are two bits set or a hundred.

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

Re: xor operator [ In reply to ]

python-list at python

Nov 15, 2023, 2:26 AM

Post #31 of 34 (232 views)

Permalink

Thank you,

test2 = [True] * 100 + [False] * 2
test2i = list(range(100))

%timeit len(set(test2i)) == 1 # 1.6 µs ± 63.6 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
%timeit all(test2) # 386 ns ± 9.58 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

test2s = set(test2i)
%timeit len(test2s) == 1 # 46.1 ns ± 1.65 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
If you pre-convert to set it is obviously faster. However, set operation is most likely going to be part of the procedure. In which case it ends up to be significantly slower.

Regards,
DG

> On 15 Nov 2023, at 02:34, Peter J. Holzer via Python-list <python-list@python.org> wrote:
>
> On 2023-11-14 00:11:30 +0200, Dom Grigonis via Python-list wrote:
>> Benchmarks:
>> test1 = [False] * 100 + [True] * 2
>> test2 = [True] * 100 + [False] * 2
>>
>> TIMER.repeat([.
>> lambda: xor(test1), # 0.0168
>> lambda: xor(test2), # 0.0172
>> lambda: xor_ss(test1), # 0.1392
>> lambda: xor_ss(test2), # 0.0084
>> lambda: xor_new(test1), # 0.0116
>> lambda: xor_new(test2), # 0.0074
>> lambda: all(test1), # 0.0016
>> lambda: all(test2) # 0.0046
>> ])
>> Your first function is fairly slow.
>> Second one deals with short-circuiting, but is super slow on full search.
>>
>> `xor_new` is the best what I could achieve using python builtins.
>>
>> But builtin `all` has the best performance.
>
> One question worth asking is if a list of bool is the best data
> structure for the job. This is essentially a bitmap, and a bitmap is
> equivalent to a set of integers. len(s) == 1 is also a fairly quick
> operation if s is small. On my system, len(test1s) == 1 (where test1s is
> {100, 101}) is about as fast as all(test1) and len(test2s) == 1 (where
> test2s is set(range(100))) is about twice as fast as all(test2).
>
> If you are willing to stray from the standard library, you could e.g.
> use pyroaring instead of sets: This is about as fast as all(test1)
> whether there are two bits set or a hundred.
>
> hp
>
> --
> _ | Peter J. Holzer | Story must make more sense than reality.
> |_|_) | |
> | | | hjp@hjp.at <mailto:hjp@hjp.at> | -- Charles Stross, "Creative writing
> __/ | http://www.hjp.at/ <http://www.hjp.at/> | challenge!"
> --
> https://mail.python.org/mailman/listinfo/python-list <https://mail.python.org/mailman/listinfo/python-list>
--
https://mail.python.org/mailman/listinfo/python-list

Re: xor operator [ In reply to ]

python-list at python

Nov 15, 2023, 9:16 AM

Post #32 of 34 (231 views)

Permalink

On 2023-11-15 12:26:32 +0200, Dom Grigonis wrote:
>
> Thank you,
>
>
> test2 = [True] * 100 + [False] * 2
> test2i = list(range(100))
>
> %timeit len(set(test2i)) == 1 # 1.6 ?s ? 63.6 ns per loop (mean ? std. dev. of 7 runs, 1,000,000 loops each)
> %timeit all(test2) # 386 ns ? 9.58 ns per loop (mean ? std. dev. of 7 runs, 1,000,000 loops each)
>
> test2s = set(test2i)
> %timeit len(test2s) == 1 # 46.1 ns ? 1.65 ns per loop (mean ? std. dev. of 7 runs, 10,000,000 loops each)
>
> If you pre-convert to set it is obviously faster. However, set
> operation is most likely going to be part of the procedure. In which
> case it ends up to be significantly slower.

Obviously, if you convert a list to a set just to count the elements
it's going to be slow. My suggestion was to use the set *instead* of the
list. I don't know whether that's possible in your situation, because
you haven't told us anything about it. All I'm suggesting is taking a
step back and reconsider your choice of data structure.

hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

Re: xor operator [ In reply to ]

python-list at python

Nov 15, 2023, 10:13 AM

Post #33 of 34 (230 views)

Permalink

The specific situation was related to truth values and following out of that my considerations regarding equivalent of all and any for counting `truths`.

So one of the specific examples:
class Publisher:
def __init__(self):
self.subscribers = dict()

def subscribe(self, sub, criteria, agg=all):
self.subscribers[sub] = (criteria, agg)

def publish(self, msg):
for sub, criteria in self.subscribers.items():
if criteria(msg):
sub.handle(msg)

p = Publisher()
p.subscribe(sub, lambda x: all(r > 3 for r in x.ratings))

# So what I needed is:
p.subscribe(sub, lambda x: set(r > 3 for r in x.ratings) > 50)
# Note, that elements might not necessarily be bool.
# E.g. at least 51 non-empty pages
p.subscribe(sub, lambda x: set(bool(p) for p in x.pages) > 50)
# So the function:
p.subscribe(sub, lambda x: xor_more_than(x.pages, 50)
# would ideally deal with truth values too.
The question is: can you construct a function? which:
a) performs: lambda x: set(bool(el) for el in iterable) > n
b) is in line with performance of python’s `all`

Then, as I said the case where one would need `set() == n` is hard to think of, but it seems fairly probable to need `set() > n` and `set() < n`.

Regards,
DG

> On 15 Nov 2023, at 19:16, Peter J. Holzer via Python-list <python-list@python.org> wrote:
>
> On 2023-11-15 12:26:32 +0200, Dom Grigonis wrote:
>>
>> Thank you,
>>
>>
>> test2 = [True] * 100 + [False] * 2
>> test2i = list(range(100))
>>
>> %timeit len(set(test2i)) == 1 # 1.6 µs ± 63.6 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
>> %timeit all(test2) # 386 ns ± 9.58 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
>>
>> test2s = set(test2i)
>> %timeit len(test2s) == 1 # 46.1 ns ± 1.65 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
>>
>> If you pre-convert to set it is obviously faster. However, set
>> operation is most likely going to be part of the procedure. In which
>> case it ends up to be significantly slower.
>
> Obviously, if you convert a list to a set just to count the elements
> it's going to be slow. My suggestion was to use the set *instead* of the
> list. I don't know whether that's possible in your situation, because
> you haven't told us anything about it. All I'm suggesting is taking a
> step back and reconsider your choice of data structure.
>
> hp
> --
> _ | Peter J. Holzer | Story must make more sense than reality.
> |_|_) | |
> | | | hjp@hjp.at | -- Charles Stross, "Creative writing
> __/ | http://www.hjp.at/ | challenge!"
> --
> https://mail.python.org/mailman/listinfo/python-list

--
https://mail.python.org/mailman/listinfo/python-list

Re: xor operator [ In reply to ]

python-list at python

Nov 15, 2023, 11:00 AM

Post #34 of 34 (230 views)

Permalink

In case someone is actually going to execute the code, there is a bug:

`set` need to be wrapped in `len` for criteria args.

> On 15 Nov 2023, at 20:13, Dom Grigonis <dom.grigonis@gmail.com> wrote:
>
>
> The specific situation was related to truth values and following out of that my considerations regarding equivalent of all and any for counting `truths`.
>
> So one of the specific examples:
> class Publisher:
> def __init__(self):
> self.subscribers = dict()
>
> def subscribe(self, sub, criteria, agg=all):
> self.subscribers[sub] = (criteria, agg)
>
> def publish(self, msg):
> for sub, criteria in self.subscribers.items():
> if criteria(msg):
> sub.handle(msg)
>
> p = Publisher()
> p.subscribe(sub, lambda x: all(r > 3 for r in x.ratings))
>
> # So what I needed is:
> p.subscribe(sub, lambda x: set(r > 3 for r in x.ratings) > 50)
> # Note, that elements might not necessarily be bool.
> # E.g. at least 51 non-empty pages
> p.subscribe(sub, lambda x: set(bool(p) for p in x.pages) > 50)
> # So the function:
> p.subscribe(sub, lambda x: xor_more_than(x.pages, 50)
> # would ideally deal with truth values too.
> The question is: can you construct a function? which:
> a) performs: lambda x: set(bool(el) for el in iterable) > n
> b) is in line with performance of python’s `all`
>
> Then, as I said the case where one would need `set() == n` is hard to think of, but it seems fairly probable to need `set() > n` and `set() < n`.
>
> Regards,
> DG
>
>> On 15 Nov 2023, at 19:16, Peter J. Holzer via Python-list <python-list@python.org <mailto:python-list@python.org>> wrote:
>>
>> On 2023-11-15 12:26:32 +0200, Dom Grigonis wrote:
>>>
>>> Thank you,
>>>
>>>
>>> test2 = [True] * 100 + [False] * 2
>>> test2i = list(range(100))
>>>
>>> %timeit len(set(test2i)) == 1 # 1.6 µs ± 63.6 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
>>> %timeit all(test2) # 386 ns ± 9.58 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
>>>
>>> test2s = set(test2i)
>>> %timeit len(test2s) == 1 # 46.1 ns ± 1.65 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
>>>
>>> If you pre-convert to set it is obviously faster. However, set
>>> operation is most likely going to be part of the procedure. In which
>>> case it ends up to be significantly slower.
>>
>> Obviously, if you convert a list to a set just to count the elements
>> it's going to be slow. My suggestion was to use the set *instead* of the
>> list. I don't know whether that's possible in your situation, because
>> you haven't told us anything about it. All I'm suggesting is taking a
>> step back and reconsider your choice of data structure.
>>
>> hp
>> --
>> _ | Peter J. Holzer | Story must make more sense than reality.
>> |_|_) | |
>> | | | hjp@hjp.at <mailto:hjp@hjp.at> | -- Charles Stross, "Creative writing
>> __/ | http://www.hjp.at/ <http://www.hjp.at/> | challenge!"
>> --
>> https://mail.python.org/mailman/listinfo/python-list <https://mail.python.org/mailman/listinfo/python-list>
>

--
https://mail.python.org/mailman/listinfo/python-list

Mailing List Archive

Attached Files:

Attached Files: