
double bracket integer index in pandas; Is this a legal syntax
Hello,

I hope that pandas questions are OK here.

In a pandas lecture, I did not get the expected result.

I tried this on two different platforms
(old macOS distro and up-to-date Ubuntu Linux distro, 22.04)

The Linux distro has:
python 3.10.11
pandas 1.5.2
conda 23.3.1

Is this double bracket form, df[[1]], deprecated... maybe?

There is data in a dataframe, df.

>>> subset = df[[1]]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dks/anaconda3/lib/python3.10/site-packages/pandas/core/frame.py", line 3811, in __getitem__
    indexer = self.columns._get_indexer_strict(key, "columns")[1]
  File "/home/dks/anaconda3/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6113, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "/home/dks/anaconda3/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6173, in _raise_if_missing
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Int64Index([1], dtype='int64')] are in the [columns]"

What could be making this fail?

Thank you!
az
--
https://mail.python.org/mailman/listinfo/python-list
Re: double bracket integer index in pandas; Is this a legal syntax
On 03May2023 03:41, Artie Ziff <artie.ziff@gmail.com> wrote:
>I hope that pandas questions are OK here.

There are some pandas users here.

>In a pandas lecture, I did not get the expected result.
>I tried this on two different platforms
>(old macOS distro and up-to-date Ubuntu Linux distro, 22.04)
>
>The Linux distro has:
>python 3.10.11
>pandas 1.5.2
>conda 23.3.1
>
>Is this double bracket form, df[[1]], deprecated... maybe?
>
>There is data in a dataframe, df.
>
>>>> subset = df[[1]]

Whether this works depends on the contents of the dataframe. You're
supplying this index:

[1]

which is a list of ints (with just one int).

Have a look at this page:
https://pandas.pydata.org/docs/user_guide/indexing.html

If you supply a list, it expects a list of labels. Is 1 a valid label
for your particular dataframe?
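
To illustrate the point about labels, here is a minimal sketch. The two
small dataframes are made up for the example (they are not the course
data), and the variable names are only illustrative. It assumes that
your dataframe has string column labels, which would explain the
KeyError:

```python
import pandas as pd

# Hypothetical stand-in data with string column labels only.
df = pd.DataFrame({"country": ["Afghanistan", "Albania"], "year": [1952, 1952]})

# A list inside the brackets is treated as a list of column labels,
# so this returns a one-column DataFrame.
by_label = df[["country"]]

# 1 is not one of the labels ("country", "year"), so the lookup fails.
try:
    df[[1]]
    raised = False
except KeyError:
    raised = True

# When the column labels really are integers, the same syntax works.
df_int = pd.DataFrame([[10, 20], [30, 40]])  # default column labels: 0 and 1
col_one = df_int[[1]]
```

So the double-bracket form itself is fine; what matters is whether each
item in the list matches a column label.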

Cheers,
Cameron Simpson <cs@cskk.id.au>
Re: double bracket integer index in pandas; Is this a legal syntax
I agree with your analysis, Cameron.

The code came from a video course, "Pandas Data Analysis with Python
Fundamentals" by Daniel Chen.

I am curious why the author may have said this. To avoid attaching
screenshots, I'll describe this section of the content. Perhaps someone can
say, "oh that's how it used to work"... haha

D.CHEN:
"You can also subset the columns by number. If we wanted to get the first
column from our data set, we would use zero":

>>> df = pandas.read_csv('./data/gapminder.tsv', sep='\t')
>>> subset = df[[0]]
>>> print(subset.head())
       country
0  Afghanistan
1  Afghanistan
2  Afghanistan
3  Afghanistan
4  Afghanistan

Data for the course:
https://github.com/chendaniely/pandas_for_everyone.git

"df[[0]]" is being described to the course student as selecting the first
column of data. :-)

I'll study that link.
Thank you.
Re: double bracket integer index in pandas; Is this a legal syntax
On 03May2023 17:52, Artie Ziff <artie.ziff@gmail.com> wrote:
>The code came from a video course, "Pandas Data Analysis with Python
>Fundamentals" by Daniel Chen.
>
>I am curious why the author may have said this. To avoid attaching
>screenshots, I'll describe this section of the content. Perhaps someone can
>say, "oh that's how it used to work"... haha

Unlikely; Python indices (and by implication Pandas indices) have
counted from 0 since forever. I suspect just a typo/braino.

>D.CHEN:
>"You can also subset the columns by number. If we wanted to get the first
>column from our data set, we would use zero":
>
>df = pandas.read_csv('./data/gapminder.tsv', sep='\t')
>>>> subset = df[[0]]
>>>> print(subset.head())
>       country
>0  Afghanistan
>1  Afghanistan
>2  Afghanistan
>3  Afghanistan
>4  Afghanistan
>
>Data for the course:
>https://github.com/chendaniely/pandas_for_everyone.git
>
>"df[[0]]" is being described to the course student as selecting the first
>column of data. :-)

Well, I would say it makes a new dataframe with just the first column.

So:

df[ 0 ] # spaces for clarity

would (probably; I'd need to check) return the Series for the first
column, whereas:

df[ [0] ] # spaces for clarity

makes a new dataframe with only the first column.

A dataframe can be thought of as an array of Series (one per column).
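
A minimal sketch of the Series versus DataFrame distinction, using
made-up data with string column labels (an assumption; the variable
names are illustrative too). Since plain brackets look up labels, this
sketch uses .iloc for selection by position:

```python
import pandas as pd

# Hypothetical stand-in data with string column labels only.
df = pd.DataFrame({"country": ["Afghanistan", "Albania"], "year": [1952, 1952]})

# Positional selection goes through .iloc.
first_as_series = df.iloc[:, 0]    # scalar position -> Series
first_as_frame = df.iloc[:, [0]]   # list of positions -> one-column DataFrame

# The same one-column DataFrame via the label of the first column.
same_by_label = df[[df.columns[0]]]

# With these string labels, plain df[0] is a label lookup and fails.
try:
    df[0]
    plain_zero_failed = False
except KeyError:
    plain_zero_failed = True
```

With integer column labels, df[0] and df[[0]] would instead select the
column whose label is 0, wherever it happens to sit.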

Cheers,
Cameron Simpson <cs@cskk.id.au>