python pandas Problem repeating the previous value

Question

This is my code:

import pandas as pd
import numpy as np

columns1 = ['Student ID', 'Course ID', 'Marks']
data1 = [(1, 10, 100), (2, 400, 200), (3, 30, 300), (3, 30, 300), (3, 30, 300), (3, 30, 300), (3, 30, 300), (3, 30, 300)]
df1 = pd.DataFrame(data1, columns=columns1)

Student ID	Course ID	Marks
1	10	100
2	400	200
3	30	300
3	30	300
3	30	300
3	30	300
3	30	300
3	30	300

df1['s']  = np.where((df1['Course ID']  > df1['Marks'])  == True, df1['Student ID'],  df1['s'].shift(1)) 
df1

Student ID	Course ID	Marks	s
1	10	100	NaN
2	400	200	2
3	30	300	2
3	30	300	NaN
3	30	300	NaN
3	30	300	NaN
3	30	300	NaN
3	30	300	NaN

As you can see, only two rows were filled in and the rest are null. Below is the result I expect: once the condition "df1['Course ID'] > df1['Marks']" is true (in row 2), that value should be repeated in all following rows.

Student ID	Course ID	Marks	s
1	10	100	NaN
2	400	200	2
3	30	300	2
3	30	300	2
3	30	300	2
3	30	300	2
3	30	300	2
3	30	300	2

Thank you for your help

Hi, welcome to SO. With df1['s'].shift(1) you are shifting a column ('s') that does not yet exist, so there is no way that the second image represents the result of running this exact code. You would receive an error. Please update your post with the expected output (as text, not as an image), and explain what logic you are trying to implement.
– ouroboros1, Commented May 19, 2024 at 10:21
It's hard to know what you want, and your code doesn't reproduce your output.
– Panda Kim, Commented May 19, 2024 at 10:24
Looks like you want df1['s'] = df1['Student ID'].where(df1['Course ID'] > df1['Marks']).ffill(). Chain .astype('Int64') if you want nullable integers.
– ouroboros1, Commented May 19, 2024 at 10:43

mozway · Accepted Answer · 2024-05-19 16:32:31Z

If you want to assign the Student ID for rows matching the df1['Course ID'] > df1['Marks'] condition, and for other rows take the previous value, use ffill:

df1['s'] = (df1['Student ID']
            .where(df1['Course ID'] > df1['Marks'])
            .ffill()
            .convert_dtypes() # optional
           )

Output (with a slightly different input):

   Student ID  Course ID  Marks     s
0           1         10    100  <NA>
1           2        400    200     2
2           3         30    300     2
3           3         30    300     2
4           3        400    300     3
5           3         30    300     3
6           3         30    300     3
7           3         30    300     3
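
For reference, the output above can be reproduced by rebuilding the input from the table shown (the fifth row has Course ID 400 instead of 30). This is only a minimal sketch, assuming a reasonably recent pandas version; the name `mask` is introduced here for illustration and is not part of the original code:

```python
import pandas as pd

columns1 = ['Student ID', 'Course ID', 'Marks']
# Same data as in the question, except row 4 uses Course ID 400
data1 = [(1, 10, 100), (2, 400, 200), (3, 30, 300), (3, 30, 300),
         (3, 400, 300), (3, 30, 300), (3, 30, 300), (3, 30, 300)]
df1 = pd.DataFrame(data1, columns=columns1)

# Illustrative helper name: True where the condition holds
mask = df1['Course ID'] > df1['Marks']

df1['s'] = (df1['Student ID']
            .where(mask)       # keep Student ID where the condition holds, NaN elsewhere
            .ffill()           # carry the last non-null value forward
            .convert_dtypes()  # optional: switch to a nullable dtype
           )
print(df1)
```

The first row stays null because there is no earlier matching row to fill from.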

If you only want to apply this logic per Student ID, which might make more sense to avoid "leaking" values from one student to another, use groupby.ffill instead:

df1['s'] = (df1['Student ID']
            .where(df1['Course ID'] > df1['Marks'])
            .groupby(df1['Student ID']).ffill()
            .convert_dtypes() # optional
           )

Or:

df1['s'] = (df1['Student ID']
            .where(df1['Course ID'].gt(df1['Marks'])
                   .groupby(df1['Student ID']).cummax())
            .convert_dtypes() # optional
           )

Output:

   Student ID  Course ID  Marks     s
0           1         10    100  <NA>
1           2        400    200     2
2           3         30    300  <NA>
3           3         30    300  <NA>
4           3        400    300     3
5           3         30    300     3
6           3         30    300     3
7           3         30    300     3
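
The two per-student variants can be compared side by side. The sketch below assumes the same modified input as in the output above; the names `mask`, `s_ffill` and `s_cummax` are made up for this illustration:

```python
import pandas as pd

columns1 = ['Student ID', 'Course ID', 'Marks']
data1 = [(1, 10, 100), (2, 400, 200), (3, 30, 300), (3, 30, 300),
         (3, 400, 300), (3, 30, 300), (3, 30, 300), (3, 30, 300)]
df1 = pd.DataFrame(data1, columns=columns1)

mask = df1['Course ID'].gt(df1['Marks'])

# Variant 1: mask first, then forward-fill within each student
s_ffill = (df1['Student ID']
           .where(mask)
           .groupby(df1['Student ID']).ffill()
           .convert_dtypes())

# Variant 2: extend the mask within each student (once True, stays True),
# then keep Student ID wherever the extended mask is True
s_cummax = (df1['Student ID']
            .where(mask.groupby(df1['Student ID']).cummax())
            .convert_dtypes())

# Check whether both variants produce the same column
print(s_ffill.equals(s_cummax))
```

The second variant works here only because the value being filled is the grouping key itself, so every row of a group would receive the same value anyway; for an arbitrary value column, groupby.ffill is the more general choice.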
