Mailing List Archive

Problem with concatenating two dataframes
In the following code, I am trying to create key-value pairs in a dictionary where the key is a name and the value is a dataframe.

import pandas as pd

# Creating a dictionary
data = {'Value':[0,0,0]}
kernel_df = pd.DataFrame(data, index=['M1','M2','M3'])
dict = {'dummy':kernel_df}
# dummy  ->          Value
#               M1      0
#               M2      0
#               M3      0


Then I read a file, split it into batches of three rows, and compare the name in each batch with the names stored in the dictionary. If the name doesn't exist, a new key-value pair (name and dataframe) is created. Otherwise, the Value column is appended to the existing dataframe.


df = pd.read_csv('test.batch.csv')
print(df)
for i in range(0, len(df), 3):
    print("\n------BATCH BEGIN")
    batch_df = df.iloc[i:i+3]
    name = batch_df.loc[i].at["Name"]
    values = batch_df.loc[:,["Value"]]
    print(name)
    print(values)
    print("------BATCH END")
    if name in dict:
        # Append values to the existing key
        dict[name] = pd.concat( dict[name],values )   #### ERROR
    else:
        # Create a new pair in dictionary
        dict[name] = values



The concat statement raises an error. Here is the output up to that point:



   ID Name Metric  Value
0   0   K1     M1     10
1   0   K1     M2      5
2   0   K1     M3     10
3   1   K2     M1     20
4   1   K2     M2     10
5   1   K2     M3     15
6   2   K1     M1      2
7   2   K1     M2      2
8   2   K1     M3      2

------BATCH BEGIN
K1
   Value
0     10
1      5
2     10
------BATCH END

------BATCH BEGIN
K2
   Value
3     20
4     10
5     15
------BATCH END

------BATCH BEGIN
K1
   Value
6      2
7      2
8      2
------BATCH END




When it reaches the concat() statement, I get this error:

TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"


Based on the definition at the beginning of the code, "dict[name]" should be a dataframe, shouldn't it?

How can I fix that?



Regards,
Mahmood
--
https://mail.python.org/mailman/listinfo/python-list
Re: Problem with concatenating two dataframes
On 2021-11-06 16:16, Mahmood Naderan via Python-list wrote:
>     if name in dict:
>         # Append values to the existing key
>         dict[name] = pd.concat( dict[name],values )   #### ERROR
> ...
> TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"
> ...
> How can I fix that?
>
You're trying to concatenate by passing the 2 items as the first 2
arguments to pd.concat, but I think that you're supposed to pass them as
an _iterable_, e.g. a list, as the first argument to pd.concat.

Try this instead:

dict[name] = pd.concat([dict[name], values])
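A minimal sketch of the difference, using two small frames shaped like the batches in the question (the names `left` and `right` are illustrative only). Passing the frames as separate positional arguments raises a TypeError; wrapping them in a list works. The exact error message may differ between pandas versions, since newer releases accept only `objs` positionally.

```python
import pandas as pd

# Two small frames shaped like the batches in the question.
left = pd.DataFrame({"Value": [10, 5, 10]})
right = pd.DataFrame({"Value": [2, 2, 2]})

# Passing the frames as separate positional arguments fails:
# pd.concat expects its first argument to be an iterable of pandas objects.
try:
    pd.concat(left, right)
    failed = False
except TypeError as exc:
    failed = True
    print("TypeError:", exc)

# Wrapping the frames in a list is the supported form.
combined = pd.concat([left, right])
print(combined)
```

With the default axis, the result is stacked vertically (six rows, one "Value" column), which matches the follow-up below.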
Re: Problem with concatenating two dataframes
>Try this instead:
>
>
>    dict[name] = pd.concat([dict[name], values])


OK. That fixed the problem; however, I see that they are concatenated vertically. How can I change that to horizontal? The dictionary printed at the end looks like this:


{'dummy':     Value
M1      0
M2      0
M3      0, 'K1':    Value
0     10
1      5
2     10
6      2
7      2
8      2, 'K2':    Value
3     20
4     10
5     15}



For K1, there should be three rows and two columns labeled as Value.




Regards,
Mahmood





Re: Problem with concatenating two dataframes
On 2021-11-06 20:12, Mahmood Naderan wrote:
> ... I see that they are concatenated vertically. How can I change that to horizontal?
> ...
> For K1, there should be three rows and two columns labeled as Value.
>
The second argument of pd.concat is 'axis', which defaults to 0. Try
using 1 instead of 0.
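A small sketch of what `axis` changes, assuming two frames that share the same row labels (the M1/M2/M3 labels are borrowed from the `dummy` frame in the question; `first` and `second` are illustrative names). With `axis=0` the rows are stacked; with `axis=1` the frames are placed side by side and rows are matched by index label.

```python
import pandas as pd

# Two frames that share the same row labels (M1, M2, M3).
first = pd.DataFrame({"Value": [10, 5, 10]}, index=["M1", "M2", "M3"])
second = pd.DataFrame({"Value": [2, 2, 2]}, index=["M1", "M2", "M3"])

# axis=0 (the default) stacks the rows: 6 rows x 1 column.
stacked = pd.concat([first, second], axis=0)

# axis=1 places the frames side by side, aligning rows by index label:
# 3 rows x 2 columns, both named "Value".
side_by_side = pd.concat([first, second], axis=1)

print(stacked.shape)       # (6, 1)
print(side_by_side.shape)  # (3, 2)
print(side_by_side)
```

Because `axis=1` aligns on index labels rather than on row position, frames whose labels differ will not line up row by row.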
Re: Problem with concatenating two dataframes
>The second argument of pd.concat is 'axis', which defaults to 0. Try
>using 1 instead of 0.


Unfortunately, that doesn't help...


dict[name] = pd.concat( [dict[name],values], axis=1 )



{'dummy':     Value
M1      0
M2      0
M3      0, 'K1':    Value  Value
0   10.0    NaN
1    5.0    NaN
2   10.0    NaN
6    NaN    2.0
7    NaN    2.0
8    NaN    2.0, 'K2':    Value
3     20
4     10
5     15}



Regards,
Mahmood
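The NaN values above appear because `axis=1` aligns rows by index label: the first K1 batch has labels 0, 1, 2 and the second has 6, 7, 8, so no rows match. One way around this, sketched below, is to index each batch by its `Metric` column before concatenating, assuming every batch contains the same metrics (M1, M2, M3), as in the sample data; the `dummy` frame in the original code already uses that index. The inline CSV copies the table printed in the question and stands in for `test.batch.csv`, and `results` is a hypothetical name used instead of `dict` so the built-in is not shadowed.

```python
import io

import pandas as pd

# Sample data copied from the output in the question
# (stands in for test.batch.csv).
CSV = """ID,Name,Metric,Value
0,K1,M1,10
0,K1,M2,5
0,K1,M3,10
1,K2,M1,20
1,K2,M2,10
1,K2,M3,15
2,K1,M1,2
2,K1,M2,2
2,K1,M3,2
"""

df = pd.read_csv(io.StringIO(CSV))

# 'results' replaces 'dict' so the built-in name is not shadowed.
results = {}
batch_size = 3
for start in range(0, len(df), batch_size):
    batch_df = df.iloc[start:start + batch_size]
    name = batch_df["Name"].iloc[0]
    # Index by Metric so batches for the same name line up row by row.
    values = batch_df.set_index("Metric")[["Value"]]
    if name in results:
        # axis=1 adds the new batch as another column.
        results[name] = pd.concat([results[name], values], axis=1)
    else:
        results[name] = values

print(results["K1"])
```

If only row position matters, calling `reset_index(drop=True)` on both frames before `pd.concat(..., axis=1)` is another option.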



RE: Problem with concatenating two dataframes
The problem I see here is the use of Pandas. I know mine is the losing opinion, but people who use Python only to load Pandas and then do everything in Pandas are missing out on the functionality of Python.

I'll bet you could combine this data into one dictionary in pure Python. In fact, I'd be shocked if you couldn't.

---- Joseph S.
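The pure-Python approach suggested above could look something like the following sketch, which uses only the standard library `csv` module. It assumes rows should be grouped by `Name` and `Metric` (rather than by position in fixed batches of three), which holds for the sample data; the inline CSV copies the table printed in the question and stands in for `test.batch.csv`.

```python
import csv
import io
from collections import defaultdict

# Sample data copied from the output in the question.
CSV = """ID,Name,Metric,Value
0,K1,M1,10
0,K1,M2,5
0,K1,M3,10
1,K2,M1,20
1,K2,M2,10
1,K2,M3,15
2,K1,M1,2
2,K1,M2,2
2,K1,M3,2
"""

# combined[name][metric] -> list of values, one entry per batch.
combined = defaultdict(lambda: defaultdict(list))
for row in csv.DictReader(io.StringIO(CSV)):
    combined[row["Name"]][row["Metric"]].append(int(row["Value"]))

# Convert the nested defaultdicts to plain dicts for printing.
result = {name: dict(metrics) for name, metrics in combined.items()}
print(result)
```

Each inner list plays the role of one "row" with a column per batch, so K1 ends up with two values per metric and K2 with one.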



-----Original Message-----
From: Mahmood Naderan <nt_mahmood@yahoo.com>
Sent: Saturday, November 6, 2021 6:01 PM
To: python-list@python.org; MRAB <python@mrabarnett.plus.com>
Subject: Re: Problem with concatenating two dataframes
