Mailing List Archive: Extracting dataframe column with multiple conditions on row values

Extracting dataframe column with multiple conditions on row values

Jan 7, 2022, 3:07 PM

Post #1 of 5 (237 views)

Hi

I have a csv file like this

V0,V1,V2,V3

4,1,1,1

6,4,5,2

2,3,6,7

And I want to search two rows for a match and find the column. For
example, I want to search row[0] for 1 and row[1] for 5. The corresponding
column is V2 (which is the third column). Then I want to return the value
at row[2] and the found column. The result should be 6 then.

I can manually extract the specified rows (with index 0 and 1 which are
fixed) and manually iterate over them like arrays to find a match. Then I

key1 = 1

key2 = 5

row1 = df.iloc[0] # row=[4,1,1,1]

row2 = df.iloc[1] # row=[6,4,5,2]

for i in range(len(row1)):

if row1[i] == key1:

for j in range(len(row2)):

if row2[j] == key2:

res = df.iloc[:,j]

print(res) # 6

Is there any way to use built-in function for a more efficient code?

Regards,

Mahmood

--
https://mail.python.org/mailman/listinfo/python-list

Re: Extracting dataframe column with multiple conditions on row values [ In reply to ]

python-list at python

Jan 7, 2022, 5:21 PM

Post #2 of 5 (237 views)

Permalink

Salaam Mahmood,

On 08/01/2022 12.07, Mahmood Naderan via Python-list wrote:
> I have a csv file like this
> V0,V1,V2,V3
> 4,1,1,1
> 6,4,5,2
> 2,3,6,7
>
> And I want to search two rows for a match and find the column. For
> example, I want to search row[0] for 1 and row[1] for 5. The corresponding
> column is V2 (which is the third column). Then I want to return the value
> at row[2] and the found column. The result should be 6 then.

Not quite: isn't the "found column" also required?

> I can manually extract the specified rows (with index 0 and 1 which are
> fixed) and manually iterate over them like arrays to find a match. Then I

Perhaps this idea has been influenced by a similar solution in another
programming language. May I suggest that the better-answer you seek lies
in using Python idioms (as well as Python's tools)...

> key1 = 1
> key2 = 5

Fine, so far - excepting that this 'problem' is likely to be a small
part of some larger system. Accordingly, consider writing it as a
function. In which case, these two "keys" will become
function-parameters (and the two 'results' become return-values).

> row1 = df.iloc[0] # row=[4,1,1,1]
> row2 = df.iloc[1] # row=[6,4,5,2]

This is likely not native-Python. Let's create lists for 'everything',
just-because:

>>> headings = [ "V0","V1","V2","V3" ]
>>> row1 = [4,1,1,1]
>>> row2 = [6,4,5,2]
>>> results = [ 2,3,6,7 ]

Note how I'm using the Python REPL (in a "terminal", type "python" (as
appropriate to your OpSys) at the command-line). IMHO the REPL is a
grossly under-rated tool, and is a very good means towards
trial-and-error, and learning by example. Highly recommended!

> for i in range(len(row1)):

This construction is very much a "code smell" for thinking that it is
not "pythonic". (and perhaps the motivation for this post)

In Python (compared with many other languages) the "for" loop should
actually be pronounced "for-each". In other words when we pair the
code-construct with a list (for example):

for each item in the list the computer should perform some suite of
commands.

(the "suite" is everything 'inside' the for-each-loop - NB my
'Python-betters' will quickly point-out that this feature is not limited
to Python-lists, but will work with any :iterable" - ref:
https://docs.python.org/3/tutorial/controlflow.html#for-statements)

Thus:

> for item in headings: print( item )
...
V0
V1
V2
V3

The problem is that when working with matrices/matrixes, a math
background equips one with the idea of indices/indexes, eg the
ubiquitous subscript-i. Accordingly, when reading 'math' where a formula
uses the upper-case Greek "sigma" character, remember that it means "for
all" or "for each"!

So, if Python doesn't use indexing or "pointers", how do we deal with
the problem?

Unfortunately, at first glance, the pythonic approach may seem
more-complicated or even somewhat convoluted, but once the concepts
(and/or the Python idioms) are learned, it is quite manageable (and
applicable to many more applications than matrices/matrixes!)...

> if row1[i] == key1:
> for j in range(len(row2)):
> if row2[j] == key2:
> res = df.iloc[:,j]
> print(res) # 6
>
> Is there any way to use built-in function for a more efficient code?

This is where your idea bears fruit!

There is a Python "built-in function": zip(), which will 'join' lists.
NB do not become confused between zip() and zip archive/compressed files!

Most of the time reference book and web-page examples show zip() being
used to zip-together two lists into a single data-construct (which is an
iterable(!)). However, zip() will actually zip-together multiple (more
than two) "iterables". As the manual says:

«zip() returns an iterator of tuples, where the i-th tuple contains the
i-th element from each of the argument iterables.»

Ah, so that's where the math-idea of subscript-i went! It has become
'hidden' in Python's workings - or putting that another way: Python
looks after the subscripting for us (and given that 'out by one' errors
in pointers is a major source of coding-error in other languages,
thank-you very much Python!)

First re-state the source-data as Python lists, (per above) - except
that I recommend the names be better-chosen to be more meaningful (to
your application)!

Now, (in the REPL) try using zip():

>>> zip( headings, row1, row2, results )
<zip object at 0x7f655cca6bc0>

Does that seem a very good illustration? Not really, but re-read the
quotation from the manual (above) where it says that zip returns an
iterator. If we want to see the values an iterator will produce, then
turn it into an iterable data-structure, eg:

>>> list( zip( headings, row1, row2, results ) )
[.('V0', 4, 6, 2), ('V1', 1, 4, 3), ('V2', 1, 5, 6), ('V3', 1, 2, 7)]

or, to see things more clearly, let me re-type it as:

[.
('V0', 4, 6, 2),
('V1', 1, 4, 3),
('V2', 1, 5, 6),
('V3', 1, 2, 7)
]

What we now see is actually a "transpose" of the original 'matrix'
presented in the post/question!

(NB Python will perform this layout for us - read about the pprint library)

Another method which can also be employed (and which will illustrate the
loop required to code the eventual-solution(!)) is that Python's next()
will extract the first row of the transpose:

>>> row = next( zip( headings, row1, row2, results ) )
>>> row
('V0', 4, 6, 2)

This is all-well-and-good, but that result is a tuple of four items
(corresponding to one column in the way the source-data was explained).

If we need to consider the four individual data-items, that can be
improved using a Python feature called "tuple unpacking". Instead of the
above delivering a tuple which is then assigned to "row", the tuple can
be assigned to four "identifiers", eg

>>> heading, row1_item, row2_item, result= next( zip( headings, row1,
row2, results ) )

(apologies about email word-wrapping - this is a single line of Python-code)

Which, to prove the case, could be printed:

>>> heading, row1_item, row2_item, result
('V0', 4, 6, 2)

(ref:
https://docs.python.org/3/tutorial/datastructures.html?highlight=tuple%20unpacking#tuples-and-sequences)

Thus, if we repeatedly ask for the next() row from the zip-ped
transpose, eventually it will respond with the row starting 'V2' - which
is the desired-result, ie the row containing the 1, the 5, and the 6 -
and if you follow-through using the REPL, will be clearly visible.

Finally, 'all' that is required, is a for-each-loop which will iterate
across/down the zip object, one tuple (row of the transpose) at a time,
AND perform the "tuple-unpacking" all in one command, with an
if-statement to detect the correct row/column:

>>> for *tuple-unpacking* in *zip() etc*:
... if row1_item == *what?* and row2_item == *what?*
... print( *which* and *which identifier* )
...
V2 6

Yes, three lines. It's as easy as that!
(when you know how)

Worse: when you become more expert, you'll be able to compress all of
that down into a single-line solution - but it won't be as "readable" as
is this!

NB this question has a 'question-smell' of 'homework', so I'll not
complete the code for you - this is something *you* asked to learn and
the best way to learn is by 'doing' (not by 'reading').

However, please respond with your solution, or any further question
(with the next version of the code so-far, per this first-post - which
we appreciate!)

Regardless, you asked 'the right question' (curiosity is the key to
learning) and in the right way/manner. Well done!

NBB the above code-outline does not consider the situation where the
search fails/the keys are not found!

For further information, please review:
https://docs.python.org/3/library/functions.html?highlight=zip#zip

Also, further to the above discussion of combining lists and loops:
https://docs.python.org/3/tutorial/datastructures.html?highlight=zip#looping-techniques

and with a similar application (to this post):
https://docs.python.org/3/faq/programming.html?highlight=zip#how-can-i-sort-one-list-by-values-from-another-list

--
Regards,
=dn
--
https://mail.python.org/mailman/listinfo/python-list

Re: Extracting dataframe column with multiple conditions on row values [ In reply to ]

edmondo.giovannozzi at gmail

Jan 8, 2022, 5:00 AM

Post #3 of 5 (232 views)

Permalink

Il giorno sabato 8 gennaio 2022 alle 02:21:40 UTC+1 dn ha scritto:
> Salaam Mahmood,
> On 08/01/2022 12.07, Mahmood Naderan via Python-list wrote:
> > I have a csv file like this
> > V0,V1,V2,V3
> > 4,1,1,1
> > 6,4,5,2
> > 2,3,6,7
> >
> > And I want to search two rows for a match and find the column. For
> > example, I want to search row[0] for 1 and row[1] for 5. The corresponding
> > column is V2 (which is the third column). Then I want to return the value
> > at row[2] and the found column. The result should be 6 then.
> Not quite: isn't the "found column" also required?
> > I can manually extract the specified rows (with index 0 and 1 which are
> > fixed) and manually iterate over them like arrays to find a match. Then I
> Perhaps this idea has been influenced by a similar solution in another
> programming language. May I suggest that the better-answer you seek lies
> in using Python idioms (as well as Python's tools)...
> > key1 = 1
> > key2 = 5
> Fine, so far - excepting that this 'problem' is likely to be a small
> part of some larger system. Accordingly, consider writing it as a
> function. In which case, these two "keys" will become
> function-parameters (and the two 'results' become return-values).
> > row1 = df.iloc[0] # row=[4,1,1,1]
> > row2 = df.iloc[1] # row=[6,4,5,2]
> This is likely not native-Python. Let's create lists for 'everything',
> just-because:
>
> >>> headings = [ "V0","V1","V2","V3" ]
> >>> row1 = [4,1,1,1]
> >>> row2 = [6,4,5,2]
> >>> results = [ 2,3,6,7 ]
>
>
> Note how I'm using the Python REPL (in a "terminal", type "python" (as
> appropriate to your OpSys) at the command-line). IMHO the REPL is a
> grossly under-rated tool, and is a very good means towards
> trial-and-error, and learning by example. Highly recommended!
>
>
> > for i in range(len(row1)):
>
> This construction is very much a "code smell" for thinking that it is
> not "pythonic". (and perhaps the motivation for this post)
>
> In Python (compared with many other languages) the "for" loop should
> actually be pronounced "for-each". In other words when we pair the
> code-construct with a list (for example):
>
> for each item in the list the computer should perform some suite of
> commands.
>
> (the "suite" is everything 'inside' the for-each-loop - NB my
> 'Python-betters' will quickly point-out that this feature is not limited
> to Python-lists, but will work with any :iterable" - ref:
> https://docs.python.org/3/tutorial/controlflow.html#for-statements)
>
>
> Thus:
>
> > for item in headings: print( item )
> ...
> V0
> V1
> V2
> V3
>
>
> The problem is that when working with matrices/matrixes, a math
> background equips one with the idea of indices/indexes, eg the
> ubiquitous subscript-i. Accordingly, when reading 'math' where a formula
> uses the upper-case Greek "sigma" character, remember that it means "for
> all" or "for each"!
>
> So, if Python doesn't use indexing or "pointers", how do we deal with
> the problem?
>
> Unfortunately, at first glance, the pythonic approach may seem
> more-complicated or even somewhat convoluted, but once the concepts
> (and/or the Python idioms) are learned, it is quite manageable (and
> applicable to many more applications than matrices/matrixes!)...
> > if row1[i] == key1:
> > for j in range(len(row2)):
> > if row2[j] == key2:
> > res = df.iloc[:,j]
> > print(res) # 6
> >
> > Is there any way to use built-in function for a more efficient code?
> This is where your idea bears fruit!
>
> There is a Python "built-in function": zip(), which will 'join' lists.
> NB do not become confused between zip() and zip archive/compressed files!
>
> Most of the time reference book and web-page examples show zip() being
> used to zip-together two lists into a single data-construct (which is an
> iterable(!)). However, zip() will actually zip-together multiple (more
> than two) "iterables". As the manual says:
>
> «zip() returns an iterator of tuples, where the i-th tuple contains the
> i-th element from each of the argument iterables.»
>
> Ah, so that's where the math-idea of subscript-i went! It has become
> 'hidden' in Python's workings - or putting that another way: Python
> looks after the subscripting for us (and given that 'out by one' errors
> in pointers is a major source of coding-error in other languages,
> thank-you very much Python!)
>
> First re-state the source-data as Python lists, (per above) - except
> that I recommend the names be better-chosen to be more meaningful (to
> your application)!
>
>
> Now, (in the REPL) try using zip():
>
> >>> zip( headings, row1, row2, results )
> <zip object at 0x7f655cca6bc0>
>
> Does that seem a very good illustration? Not really, but re-read the
> quotation from the manual (above) where it says that zip returns an
> iterator. If we want to see the values an iterator will produce, then
> turn it into an iterable data-structure, eg:
>
> >>> list( zip( headings, row1, row2, results ) )
> [.('V0', 4, 6, 2), ('V1', 1, 4, 3), ('V2', 1, 5, 6), ('V3', 1, 2, 7)]
>
> or, to see things more clearly, let me re-type it as:
>
> [.
> ('V0', 4, 6, 2),
> ('V1', 1, 4, 3),
> ('V2', 1, 5, 6),
> ('V3', 1, 2, 7)
> ]
>
>
> What we now see is actually a "transpose" of the original 'matrix'
> presented in the post/question!
>
> (NB Python will perform this layout for us - read about the pprint library)
>
>
> Another method which can also be employed (and which will illustrate the
> loop required to code the eventual-solution(!)) is that Python's next()
> will extract the first row of the transpose:
>
> >>> row = next( zip( headings, row1, row2, results ) )
> >>> row
> ('V0', 4, 6, 2)
>
>
> This is all-well-and-good, but that result is a tuple of four items
> (corresponding to one column in the way the source-data was explained).
>
> If we need to consider the four individual data-items, that can be
> improved using a Python feature called "tuple unpacking". Instead of the
> above delivering a tuple which is then assigned to "row", the tuple can
> be assigned to four "identifiers", eg
>
> >>> heading, row1_item, row2_item, result= next( zip( headings, row1,
> row2, results ) )
>
> (apologies about email word-wrapping - this is a single line of Python-code)
>
>
> Which, to prove the case, could be printed:
>
> >>> heading, row1_item, row2_item, result
> ('V0', 4, 6, 2)
>
>
> (ref:
> https://docs.python.org/3/tutorial/datastructures.html?highlight=tuple%20unpacking#tuples-and-sequences)
>
>
> Thus, if we repeatedly ask for the next() row from the zip-ped
> transpose, eventually it will respond with the row starting 'V2' - which
> is the desired-result, ie the row containing the 1, the 5, and the 6 -
> and if you follow-through using the REPL, will be clearly visible.
>
>
> Finally, 'all' that is required, is a for-each-loop which will iterate
> across/down the zip object, one tuple (row of the transpose) at a time,
> AND perform the "tuple-unpacking" all in one command, with an
> if-statement to detect the correct row/column:
>
> >>> for *tuple-unpacking* in *zip() etc*:
> ... if row1_item == *what?* and row2_item == *what?*
> ... print( *which* and *which identifier* )
> ...
> V2 6
>
> Yes, three lines. It's as easy as that!
> (when you know how)
>
> Worse: when you become more expert, you'll be able to compress all of
> that down into a single-line solution - but it won't be as "readable" as
> is this!
>
>
> NB this question has a 'question-smell' of 'homework', so I'll not
> complete the code for you - this is something *you* asked to learn and
> the best way to learn is by 'doing' (not by 'reading').
>
> However, please respond with your solution, or any further question
> (with the next version of the code so-far, per this first-post - which
> we appreciate!)
>
> Regardless, you asked 'the right question' (curiosity is the key to
> learning) and in the right way/manner. Well done!
>
>
> NBB the above code-outline does not consider the situation where the
> search fails/the keys are not found!
>
>
> For further information, please review:
> https://docs.python.org/3/library/functions.html?highlight=zip#zip
>
> Also, further to the above discussion of combining lists and loops:
> https://docs.python.org/3/tutorial/datastructures.html?highlight=zip#looping-techniques
>
> and with a similar application (to this post):
> https://docs.python.org/3/faq/programming.html?highlight=zip#how-can-i-sort-one-list-by-values-from-another-list
>
> --
> Regards,

You may also transpose your dataset. Then the index will become your column name and the column name become your index:
To read your dataset:

import pandas as pd
import io

DN = """
V0,V1,V2,V3
4,1,1,1
6,4,5,2
2,3,6,7
"""
df = pd.read_csv(io.StringIO(DN))

Transpose it:

dft = df.T

Find all the index with your condition:

idt = (dft[0] == 1) & (dft[1] == 5)

Print the columns that satisfy your condition:

print(dft[idt])

As you see, without explicit loop.
--
https://mail.python.org/mailman/listinfo/python-list

Re: Extracting dataframe column with multiple conditions on row values [ In reply to ]

python-list at python

Jan 8, 2022, 2:00 PM

Post #4 of 5 (232 views)

Permalink

I have to wonder if when something looks like HOMEWORK, if it should be answered in detail, let alone using methods beyond what is expected in class.
The goal of this particular project seems to be to find one (or perhaps more) columns in some data structure like a dataframe that match two conditions (containing a copy of two numbers in one or more places) and then KNOW what column it was in. The reason I say that is because the next fairly nonsensical request is to then explicitly return what that column has in the row called 2, meaning the third row.
Perhaps stated another way: "what it the item in row/address 2 of the column that somewhere contains two additional specified contents called key1 and key2"
My guess is that if the instructor wanted this to be solved using methods being taught, then loops may well be a way to go. Python and numpy/pandas make it often easier to do things with columns rather than in rows across them, albeit many things allow you to specify an axis. So, yes, transposing is a way to go that transforms the problem in a way easier to solve without thinking deeply. Some other languages allow relatively easy access in both directions of horizontally versus vertically. And this may be an example where solving it as a list of lists may also be easier.
Is the solution at the bottom a solution? Before I check, I want to see if I understand the required functionality and ask if it is completely and unambiguously specified.
For completeness, the question being asked may need to deal with a uniqueness issue. Is it possible multiple columns match the request and thus more than one answer is required to be returned? Is the row called 2 allowed to participate in the match or must it be excluded and the question becomes to find one (or more) columns that contain key1 somewhere else than row 2 and key2 (which may have to be different than key1 or not) somewhere else and THEN provide the corresponding entry from row 2 and that (or those) column(s)?
So in looking at the solution offered, what exactly was this supposed to do when dft is the transpose?
idt = (dft[0] == 1) & (dft[1] == 5)
Was the code (way below in this message) tried out or just written for us to ponder? I tried it. I got an answer of: 0 1 2
V2 1 5 6
That is not my understanding of what was requested. Row 2 (shown transposed as a column) is being shown as a whole. The request was for item "2" which would be just 6. Something more like this:
print(dft[idt][2])

But the code makes no sense to me. seems to explicitly test the first column (0) to see if it contains a 1 and then the second column (1) to see if it contains a 5. Not sure who cares about this hard-wired query as this is not my understanding of the question. You want any of the original three rows (now transposed) tested to see if it contains BOTH.
I may have read the requirements wrong or it may not be explained well. Until I am sure what is being asked and whether there is a good reason someone wants a different solution, I see no reason to provide yet another solution.But just for fund, assuming dft contains the transpose of the original data, will this work?
first = dft[dft.values == key1 ]second = first[first.values == key2 ]print(second[2])
I get a 6 as an answer and suppose it could be done in one more complex expression if needed! LOL!
-----Original Message-----
From: Edmondo Giovannozzi <edmondo.giovannozzi@gmail.com>
To: python-list@python.org
Sent: Sat, Jan 8, 2022 8:00 am
Subject: Re: Extracting dataframe column with multiple conditions on row values

Il giorno sabato 8 gennaio 2022 alle 02:21:40 UTC+1 dn ha scritto:
> Salaam Mahmood,
> On 08/01/2022 12.07, Mahmood Naderan via Python-list wrote:
> > I have a csv file like this
> > V0,V1,V2,V3
> > 4,1,1,1
> > 6,4,5,2
> > 2,3,6,7
> >
> > And I want to search two rows for a match and find the column. For
> > example, I want to search row[0] for 1 and row[1] for 5. The corresponding
> > column is V2 (which is the third column). Then I want to return the value
> > at row[2] and the found column. The result should be 6 then.
> Not quite: isn't the "found column" also required?
> > I can manually extract the specified rows (with index 0 and 1 which are
> > fixed) and manually iterate over them like arrays to find a match. Then I
> Perhaps this idea has been influenced by a similar solution in another
> programming language. May I suggest that the better-answer you seek lies
> in using Python idioms (as well as Python's tools)...
> > key1 = 1
> > key2 = 5
> Fine, so far - excepting that this 'problem' is likely to be a small
> part of some larger system. Accordingly, consider writing it as a
> function. In which case, these two "keys" will become
> function-parameters (and the two 'results' become return-values).
> > row1 = df.iloc[0] # row=[4,1,1,1]
> > row2 = df.iloc[1] # row=[6,4,5,2]
> This is likely not native-Python. Let's create lists for 'everything',
> just-because:
>
> >>> headings = [ "V0","V1","V2","V3" ]
> >>> row1 = [4,1,1,1]
> >>> row2 = [6,4,5,2]
> >>> results = [ 2,3,6,7 ]
>
>
> Note how I'm using the Python REPL (in a "terminal", type "python" (as
> appropriate to your OpSys) at the command-line). IMHO the REPL is a
> grossly under-rated tool, and is a very good means towards
> trial-and-error, and learning by example. Highly recommended!
>
>
> > for i in range(len(row1)):
>
> This construction is very much a "code smell" for thinking that it is
> not "pythonic". (and perhaps the motivation for this post)
>
> In Python (compared with many other languages) the "for" loop should
> actually be pronounced "for-each". In other words when we pair the
> code-construct with a list (for example):
>
> for each item in the list the computer should perform some suite of
> commands.
>
> (the "suite" is everything 'inside' the for-each-loop - NB my
> 'Python-betters' will quickly point-out that this feature is not limited
> to Python-lists, but will work with any :iterable" - ref:
> https://docs.python.org/3/tutorial/controlflow.html#for-statements)
>
>
> Thus:
>
> > for item in headings: print( item )
> ...
> V0
> V1
> V2
> V3
>
>
> The problem is that when working with matrices/matrixes, a math
> background equips one with the idea of indices/indexes, eg the
> ubiquitous subscript-i. Accordingly, when reading 'math' where a formula
> uses the upper-case Greek "sigma" character, remember that it means "for
> all" or "for each"!
>
> So, if Python doesn't use indexing or "pointers", how do we deal with
> the problem?
>
> Unfortunately, at first glance, the pythonic approach may seem
> more-complicated or even somewhat convoluted, but once the concepts
> (and/or the Python idioms) are learned, it is quite manageable (and
> applicable to many more applications than matrices/matrixes!)...
> > if row1[i] == key1:
> > for j in range(len(row2)):
> > if row2[j] == key2:
> > res = df.iloc[:,j]
> > print(res) # 6
> >
> > Is there any way to use built-in function for a more efficient code?
> This is where your idea bears fruit!
>
> There is a Python "built-in function": zip(), which will 'join' lists.
> NB do not become confused between zip() and zip archive/compressed files!
>
> Most of the time reference book and web-page examples show zip() being
> used to zip-together two lists into a single data-construct (which is an
> iterable(!)). However, zip() will actually zip-together multiple (more
> than two) "iterables". As the manual says:
>
> «zip() returns an iterator of tuples, where the i-th tuple contains the
> i-th element from each of the argument iterables.»
>
> Ah, so that's where the math-idea of subscript-i went! It has become
> 'hidden' in Python's workings - or putting that another way: Python
> looks after the subscripting for us (and given that 'out by one' errors
> in pointers is a major source of coding-error in other languages,
> thank-you very much Python!)
>
> First re-state the source-data as Python lists, (per above) - except
> that I recommend the names be better-chosen to be more meaningful (to
> your application)!
>
>
> Now, (in the REPL) try using zip():
>
> >>> zip( headings, row1, row2, results )
> <zip object at 0x7f655cca6bc0>
>
> Does that seem a very good illustration? Not really, but re-read the
> quotation from the manual (above) where it says that zip returns an
> iterator. If we want to see the values an iterator will produce, then
> turn it into an iterable data-structure, eg:
>
> >>> list( zip( headings, row1, row2, results ) )
> [.('V0', 4, 6, 2), ('V1', 1, 4, 3), ('V2', 1, 5, 6), ('V3', 1, 2, 7)]
>
> or, to see things more clearly, let me re-type it as:
>
> [.
> ('V0', 4, 6, 2),
> ('V1', 1, 4, 3),
> ('V2', 1, 5, 6),
> ('V3', 1, 2, 7)
> ]
>
>
> What we now see is actually a "transpose" of the original 'matrix'
> presented in the post/question!
>
> (NB Python will perform this layout for us - read about the pprint library)
>
>
> Another method which can also be employed (and which will illustrate the
> loop required to code the eventual-solution(!)) is that Python's next()
> will extract the first row of the transpose:
>
> >>> row = next( zip( headings, row1, row2, results ) )
> >>> row
> ('V0', 4, 6, 2)
>
>
> This is all-well-and-good, but that result is a tuple of four items
> (corresponding to one column in the way the source-data was explained).
>
> If we need to consider the four individual data-items, that can be
> improved using a Python feature called "tuple unpacking". Instead of the
> above delivering a tuple which is then assigned to "row", the tuple can
> be assigned to four "identifiers", eg
>
> >>> heading, row1_item, row2_item, result= next( zip( headings, row1,
> row2, results ) )
>
> (apologies about email word-wrapping - this is a single line of Python-code)
>
>
> Which, to prove the case, could be printed:
>
> >>> heading, row1_item, row2_item, result
> ('V0', 4, 6, 2)
>
>
> (ref:
> https://docs.python.org/3/tutorial/datastructures.html?highlight=tuple%20unpacking#tuples-and-sequences)
>
>
> Thus, if we repeatedly ask for the next() row from the zip-ped
> transpose, eventually it will respond with the row starting 'V2' - which
> is the desired-result, ie the row containing the 1, the 5, and the 6 -
> and if you follow-through using the REPL, will be clearly visible.
>
>
> Finally, 'all' that is required, is a for-each-loop which will iterate
> across/down the zip object, one tuple (row of the transpose) at a time,
> AND perform the "tuple-unpacking" all in one command, with an
> if-statement to detect the correct row/column:
>
> >>> for *tuple-unpacking* in *zip() etc*:
> ... if row1_item == *what?* and row2_item == *what?*
> ... print( *which* and *which identifier* )
> ...
> V2 6
>
> Yes, three lines. It's as easy as that!
> (when you know how)
>
> Worse: when you become more expert, you'll be able to compress all of
> that down into a single-line solution - but it won't be as "readable" as
> is this!
>
>
> NB this question has a 'question-smell' of 'homework', so I'll not
> complete the code for you - this is something *you* asked to learn and
> the best way to learn is by 'doing' (not by 'reading').
>
> However, please respond with your solution, or any further question
> (with the next version of the code so-far, per this first-post - which
> we appreciate!)
>
> Regardless, you asked 'the right question' (curiosity is the key to
> learning) and in the right way/manner. Well done!
>
>
> NBB the above code-outline does not consider the situation where the
> search fails/the keys are not found!
>
>
> For further information, please review:
> https://docs.python.org/3/library/functions.html?highlight=zip#zip
>
> Also, further to the above discussion of combining lists and loops:
> https://docs.python.org/3/tutorial/datastructures.html?highlight=zip#looping-techniques
>
> and with a similar application (to this post):
> https://docs.python.org/3/faq/programming.html?highlight=zip#how-can-i-sort-one-list-by-values-from-another-list
>
> --
> Regards,

You may also transpose your dataset. Then the index will become your column name and the column name become your index:
To read your dataset:

import pandas as pd
import io

DN = """
V0,V1,V2,V3
4,1,1,1
6,4,5,2
2,3,6,7
"""
df = pd.read_csv(io.StringIO(DN))

Transpose it:

dft = df.T

Find all the index with your condition:

idt = (dft[0] == 1) & (dft[1] == 5)

Print the columns that satisfy your condition:

print(dft[idt])

As you see, without explicit loop.
--
https://mail.python.org/mailman/listinfo/python-list
--
https://mail.python.org/mailman/listinfo/python-list

Re: Extracting dataframe column with multiple conditions on row values [ In reply to ]

edmondo.giovannozzi at gmail

Jan 9, 2022, 10:29 AM

Post #5 of 5 (232 views)

Permalink

Il giorno sabato 8 gennaio 2022 alle 23:01:13 UTC+1 Avi Gross ha scritto:
> I have to wonder if when something looks like HOMEWORK, if it should be answered in detail, let alone using methods beyond what is expected in class.
> The goal of this particular project seems to be to find one (or perhaps more) columns in some data structure like a dataframe that match two conditions (containing a copy of two numbers in one or more places) and then KNOW what column it was in. The reason I say that is because the next fairly nonsensical request is to then explicitly return what that column has in the row called 2, meaning the third row.
> Perhaps stated another way: "what it the item in row/address 2 of the column that somewhere contains two additional specified contents called key1 and key2"
> My guess is that if the instructor wanted this to be solved using methods being taught, then loops may well be a way to go. Python and numpy/pandas make it often easier to do things with columns rather than in rows across them, albeit many things allow you to specify an axis. So, yes, transposing is a way to go that transforms the problem in a way easier to solve without thinking deeply. Some other languages allow relatively easy access in both directions of horizontally versus vertically. And this may be an example where solving it as a list of lists may also be easier.
> Is the solution at the bottom a solution? Before I check, I want to see if I understand the required functionality and ask if it is completely and unambiguously specified.
> For completeness, the question being asked may need to deal with a uniqueness issue. Is it possible multiple columns match the request and thus more than one answer is required to be returned? Is the row called 2 allowed to participate in the match or must it be excluded and the question becomes to find one (or more) columns that contain key1 somewhere else than row 2 and key2 (which may have to be different than key1 or not) somewhere else and THEN provide the corresponding entry from row 2 and that (or those) column(s)?
> So in looking at the solution offered, what exactly was this supposed to do when dft is the transpose?
> idt = (dft[0] == 1) & (dft[1] == 5)
> Was the code (way below in this message) tried out or just written for us to ponder? I tried it. I got an answer of: 0 1 2
> V2 1 5 6
> That is not my understanding of what was requested. Row 2 (shown transposed as a column) is being shown as a whole. The request was for item "2" which would be just 6. Something more like this:
> print(dft[idt][2])
>
> But the code makes no sense to me. seems to explicitly test the first column (0) to see if it contains a 1 and then the second column (1) to see if it contains a 5. Not sure who cares about this hard-wired query as this is not my understanding of the question. You want any of the original three rows (now transposed) tested to see if it contains BOTH.
> I may have read the requirements wrong or it may not be explained well. Until I am sure what is being asked and whether there is a good reason someone wants a different solution, I see no reason to provide yet another solution.But just for fund, assuming dft contains the transpose of the original data, will this work?
> first = dft[dft.values == key1 ]second = first[first.values == key2 ]print(second[2])
> I get a 6 as an answer and suppose it could be done in one more complex expression if needed! LOL!
> -----Original Message-----
> From: Edmondo Giovannozzi <edmondo.g...@gmail.com>
> To: pytho...@python.org
> Sent: Sat, Jan 8, 2022 8:00 am
> Subject: Re: Extracting dataframe column with multiple conditions on row values
>
> Il giorno sabato 8 gennaio 2022 alle 02:21:40 UTC+1 dn ha scritto:
> > Salaam Mahmood,
> > On 08/01/2022 12.07, Mahmood Naderan via Python-list wrote:
> > > I have a csv file like this
> > > V0,V1,V2,V3
> > > 4,1,1,1
> > > 6,4,5,2
> > > 2,3,6,7
> > >
> > > And I want to search two rows for a match and find the column. For
> > > example, I want to search row[0] for 1 and row[1] for 5. The corresponding
> > > column is V2 (which is the third column). Then I want to return the value
> > > at row[2] and the found column. The result should be 6 then.
> > Not quite: isn't the "found column" also required?
> > > I can manually extract the specified rows (with index 0 and 1 which are
> > > fixed) and manually iterate over them like arrays to find a match. Then I
> > Perhaps this idea has been influenced by a similar solution in another
> > programming language. May I suggest that the better-answer you seek lies
> > in using Python idioms (as well as Python's tools)...
> > > key1 = 1
> > > key2 = 5
> > Fine, so far - excepting that this 'problem' is likely to be a small
> > part of some larger system. Accordingly, consider writing it as a
> > function. In which case, these two "keys" will become
> > function-parameters (and the two 'results' become return-values).
> > > row1 = df.iloc[0] # row=[4,1,1,1]
> > > row2 = df.iloc[1] # row=[6,4,5,2]
> > This is likely not native-Python. Let's create lists for 'everything',
> > just-because:
> >
> > >>> headings = [ "V0","V1","V2","V3" ]
> > >>> row1 = [4,1,1,1]
> > >>> row2 = [6,4,5,2]
> > >>> results = [ 2,3,6,7 ]
> >
> >
> > Note how I'm using the Python REPL (in a "terminal", type "python" (as
> > appropriate to your OpSys) at the command-line). IMHO the REPL is a
> > grossly under-rated tool, and is a very good means towards
> > trial-and-error, and learning by example. Highly recommended!
> >
> >
> > > for i in range(len(row1)):
> >
> > This construction is very much a "code smell" for thinking that it is
> > not "pythonic". (and perhaps the motivation for this post)
> >
> > In Python (compared with many other languages) the "for" loop should
> > actually be pronounced "for-each". In other words when we pair the
> > code-construct with a list (for example):
> >
> > for each item in the list the computer should perform some suite of
> > commands.
> >
> > (the "suite" is everything 'inside' the for-each-loop - NB my
> > 'Python-betters' will quickly point-out that this feature is not limited
> > to Python-lists, but will work with any :iterable" - ref:
> > https://docs.python.org/3/tutorial/controlflow.html#for-statements)
> >
> >
> > Thus:
> >
> > > for item in headings: print( item )
> > ...
> > V0
> > V1
> > V2
> > V3
> >
> >
> > The problem is that when working with matrices/matrixes, a math
> > background equips one with the idea of indices/indexes, eg the
> > ubiquitous subscript-i. Accordingly, when reading 'math' where a formula
> > uses the upper-case Greek "sigma" character, remember that it means "for
> > all" or "for each"!
> >
> > So, if Python doesn't use indexing or "pointers", how do we deal with
> > the problem?
> >
> > Unfortunately, at first glance, the pythonic approach may seem
> > more-complicated or even somewhat convoluted, but once the concepts
> > (and/or the Python idioms) are learned, it is quite manageable (and
> > applicable to many more applications than matrices/matrixes!)...
> > > if row1[i] == key1:
> > > for j in range(len(row2)):
> > > if row2[j] == key2:
> > > res = df.iloc[:,j]
> > > print(res) # 6
> > >
> > > Is there any way to use built-in function for a more efficient code?
> > This is where your idea bears fruit!
> >
> > There is a Python "built-in function": zip(), which will 'join' lists.
> > NB do not become confused between zip() and zip archive/compressed files!
> >
> > Most of the time reference book and web-page examples show zip() being
> > used to zip-together two lists into a single data-construct (which is an
> > iterable(!)). However, zip() will actually zip-together multiple (more
> > than two) "iterables". As the manual says:
> >
> > «zip() returns an iterator of tuples, where the i-th tuple contains the
> > i-th element from each of the argument iterables.»
> >
> > Ah, so that's where the math-idea of subscript-i went! It has become
> > 'hidden' in Python's workings - or putting that another way: Python
> > looks after the subscripting for us (and given that 'out by one' errors
> > in pointers is a major source of coding-error in other languages,
> > thank-you very much Python!)
> >
> > First re-state the source-data as Python lists, (per above) - except
> > that I recommend the names be better-chosen to be more meaningful (to
> > your application)!
> >
> >
> > Now, (in the REPL) try using zip():
> >
> > >>> zip( headings, row1, row2, results )
> > <zip object at 0x7f655cca6bc0>
> >
> > Does that seem a very good illustration? Not really, but re-read the
> > quotation from the manual (above) where it says that zip returns an
> > iterator. If we want to see the values an iterator will produce, then
> > turn it into an iterable data-structure, eg:
> >
> > >>> list( zip( headings, row1, row2, results ) )
> > [.('V0', 4, 6, 2), ('V1', 1, 4, 3), ('V2', 1, 5, 6), ('V3', 1, 2, 7)]
> >
> > or, to see things more clearly, let me re-type it as:
> >
> > [.
> > ('V0', 4, 6, 2),
> > ('V1', 1, 4, 3),
> > ('V2', 1, 5, 6),
> > ('V3', 1, 2, 7)
> > ]
> >
> >
> > What we now see is actually a "transpose" of the original 'matrix'
> > presented in the post/question!
> >
> > (NB Python will perform this layout for us - read about the pprint library)
> >
> >
> > Another method which can also be employed (and which will illustrate the
> > loop required to code the eventual-solution(!)) is that Python's next()
> > will extract the first row of the transpose:
> >
> > >>> row = next( zip( headings, row1, row2, results ) )
> > >>> row
> > ('V0', 4, 6, 2)
> >
> >
> > This is all-well-and-good, but that result is a tuple of four items
> > (corresponding to one column in the way the source-data was explained).
> >
> > If we need to consider the four individual data-items, that can be
> > improved using a Python feature called "tuple unpacking". Instead of the
> > above delivering a tuple which is then assigned to "row", the tuple can
> > be assigned to four "identifiers", eg
> >
> > >>> heading, row1_item, row2_item, result= next( zip( headings, row1,
> > row2, results ) )
> >
> > (apologies about email word-wrapping - this is a single line of Python-code)
> >
> >
> > Which, to prove the case, could be printed:
> >
> > >>> heading, row1_item, row2_item, result
> > ('V0', 4, 6, 2)
> >
> >
> > (ref:
> > https://docs.python.org/3/tutorial/datastructures.html?highlight=tuple%20unpacking#tuples-and-sequences)
> >
> >
> > Thus, if we repeatedly ask for the next() row from the zip-ped
> > transpose, eventually it will respond with the row starting 'V2' - which
> > is the desired-result, ie the row containing the 1, the 5, and the 6 -
> > and if you follow-through using the REPL, will be clearly visible.
> >
> >
> > Finally, 'all' that is required, is a for-each-loop which will iterate
> > across/down the zip object, one tuple (row of the transpose) at a time,
> > AND perform the "tuple-unpacking" all in one command, with an
> > if-statement to detect the correct row/column:
> >
> > >>> for *tuple-unpacking* in *zip() etc*:
> > ... if row1_item == *what?* and row2_item == *what?*
> > ... print( *which* and *which identifier* )
> > ...
> > V2 6
> >
> > Yes, three lines. It's as easy as that!
> > (when you know how)
> >
> > Worse: when you become more expert, you'll be able to compress all of
> > that down into a single-line solution - but it won't be as "readable" as
> > is this!
> >
> >
> > NB this question has a 'question-smell' of 'homework', so I'll not
> > complete the code for you - this is something *you* asked to learn and
> > the best way to learn is by 'doing' (not by 'reading').
> >
> > However, please respond with your solution, or any further question
> > (with the next version of the code so-far, per this first-post - which
> > we appreciate!)
> >
> > Regardless, you asked 'the right question' (curiosity is the key to
> > learning) and in the right way/manner. Well done!
> >
> >
> > NBB the above code-outline does not consider the situation where the
> > search fails/the keys are not found!
> >
> >
> > For further information, please review:
> > https://docs.python.org/3/library/functions.html?highlight=zip#zip
> >
> > Also, further to the above discussion of combining lists and loops:
> > https://docs.python.org/3/tutorial/datastructures.html?highlight=zip#looping-techniques
> >
> > and with a similar application (to this post):
> > https://docs.python.org/3/faq/programming.html?highlight=zip#how-can-i-sort-one-list-by-values-from-another-list
> >
> > --
> > Regards,
>
> You may also transpose your dataset. Then the index will become your column name and the column name become your index:
> To read your dataset:
>
> import pandas as pd
> import io
>
> DN = """
> V0,V1,V2,V3
> 4,1,1,1
> 6,4,5,2
> 2,3,6,7
> """
> df = pd.read_csv(io.StringIO(DN))
>
> Transpose it:
>
> dft = df.T
>
> Find all the index with your condition:
>
> idt = (dft[0] == 1) & (dft[1] == 5)
>
> Print the columns that satisfy your condition:
>
> print(dft[idt])
>
> As you see, without explicit loop.
> --
> https://mail.python.org/mailman/listinfo/python-list

I was just showing that one can transpose the dataframe and use logical indexing, as the OP was asking for a fast and efficient solution. It is just a hint and not a complete solution. And, of course, one have to put everything inside a function.

And yes, I tested what I have written.

There could be other problems, of course, if some columns are not numeric, but all this details depend on the problem at hands.
But, I may not have understood completely the problem presented by the OP.

Cheers, :-)

--
https://mail.python.org/mailman/listinfo/python-list