Questions tagged [pandas]
Pandas is a Python library for Panel Data manipulation and analysis, e.g. multidimensional time series and cross-sectional data sets commonly found in statistics, experimental science results, econometrics, or finance. IMPORTANT: When asking a question with this tag, please tag your questions: [...
79,035 questions
0
votes
0 answers
4 views
How to get the index value in a dataframe by comparing date with a datetime object in that dataframe?
I have a dataframe like the following. I would like to get the index value by checking the date. For example if the date is 2018-04-05, I need to get the index value as 3. Can someone let me know how ...
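One possible approach, sketched under assumptions: the dates are stored in a datetime64 column, and the matching index labels are read back through a boolean comparison. The column name `date` and the sample rows are hypothetical, since the original frame is not shown in the excerpt.

```python
import pandas as pd

# Hypothetical frame: the question's data is not shown, so these rows are made up.
df = pd.DataFrame({
    "date": pd.to_datetime(["2018-04-02", "2018-04-03", "2018-04-04", "2018-04-05"]),
})

# Compare the datetime column against a Timestamp, then read the index labels
# of the rows where the comparison is True.
target = pd.Timestamp("2018-04-05")
matches = df.index[df["date"] == target]

# Take the first match, or None when the date does not occur.
first_index = matches[0] if len(matches) else None
print(first_index)
```

If the column holds strings rather than datetimes, converting it with `pd.to_datetime` first would probably be needed for the comparison to behave as expected.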
1
vote
0 answers
15 views
ValueError when trying to create a DF out of two lists with the same length [duplicate]
I have 2 different lists of lists A and B.
print(len(A))
288
print(len(B))
288
Making them flat:
flat_list = [item for sublist in A for item in sublist]
flat_list2 = [item for sublist in B for item ...
0
votes
1 answer
35 views
What datetime format is this and how do I parse it?
I have some data that I'm pulling from an API and the date is formatted like this: '1522454400000'
I'm not sure how to parse it, but this is what I have tried (unsuccessfully):
df = DataFrame(test)
df....
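A 13-digit value like this often turns out to be milliseconds since the Unix epoch, although that is an assumption about the API rather than something the excerpt confirms. A minimal sketch of parsing it that way, with a hypothetical column name `date`:

```python
import pandas as pd

raw = "1522454400000"  # value quoted in the question

# Assumption: the number counts milliseconds since 1970-01-01 UTC.
# The string has to become an integer before a unit can be applied.
ts = pd.to_datetime(int(raw), unit="ms")
print(ts)

# The same idea applied to a whole column ("date" is a hypothetical name).
df = pd.DataFrame({"date": ["1522454400000"]})
df["date"] = pd.to_datetime(df["date"].astype("int64"), unit="ms")
print(df)
```

If the parsed dates land in 1970 or far in the future, the unit guess is probably wrong and `unit="s"` or `unit="us"` is worth trying instead.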
0
votes
2 answers
21 views
Excel Rows to sentences Python
Let's say I have an Excel file with 5 rows and 2 columns.
apples color
honeycrsp red
gala red
goldendel orange
fuji red
grannys green
I want to place each of the rows ...
0
votes
1 answer
20 views
How to return different type of objects overloading sum function in python?
I have a class Data and I want to overload the __add__ function and get different type of objects based on the type of the objects I sum.
Toy example code
The Data class is as follows:
class Data(...
0
votes
0 answers
16 views
Referring to a Column with an index in Pandas Jupyter Notebook
I am using Python 2.7.13 and pandas in Jupyter notebook.
I have the following data
https://drive.google.com/file/d/1pko9oRmCllAxipZoa3aoztGZfPAD2iwj/view?usp=sharing
which is available on the ...
1
vote
1 answer
29 views
How to get the minimum increase in a table in Pandas?
I'm trying to get the minimum increase between rows in a column in my table.
My attempt so far:
import pandas as pd
df = pd.DataFrame({'A': [0, 100, 50, 100],
'B': [5, 2, 2, 0],
...
0
votes
1 answer
24 views
Pandas Counting Character Occurrences
Let's say I have a dataframe that looks like this:
df2 = pd.DataFrame(['2018/10/02, 10/2', '02/20/18', '10-31/2018', '1111-0-1000000', '2018/10/11/2019/9999', '10-2, 11/2018/01', '10/2'], columns=['A'...
0
votes
0 answers
18 views
df.replace produces “Passing list-likes to .loc or []” warning
I see lots of people getting this error but none for the same reason I am (or at least, doesn't seem to be the same reason!)
I'm trying to replace the string '--' in an imported file with a 0. Simple ...
0
votes
0 answers
27 views
Pandas - Faster way to split by last \ and use part of string in new column
I created a while loop which separates the file path from the file/exe column of a pandas dataframe and puts the file path into a new column.
#Count rows
rows = len(DF1)
#While loop to grab file path ...
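A loop-free alternative, sketched with made-up data: the vectorized `str.rsplit` accessor can split each value once, at the last backslash. The column names `file`, `path` and `name` and the sample paths are hypothetical, because the original frame is not shown.

```python
import pandas as pd

# Hypothetical data: the real column name and paths are not given in the excerpt.
DF1 = pd.DataFrame({
    "file": [r"C:\Program Files\App\app.exe", r"D:\tools\run.exe"],
})

# Split once from the right on a literal backslash; expand=True returns
# the two pieces as separate columns (0 = directory, 1 = file name).
parts = DF1["file"].str.rsplit("\\", n=1, expand=True)
DF1["path"] = parts[0]
DF1["name"] = parts[1]
print(DF1)
```

Because the split runs over the whole column at once, it usually scales better than a row-by-row while loop, though the actual gain depends on the data.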
0
votes
2 answers
26 views
Value_counts on multiple columns with groupby
I need some help with Pandas.
I have following dataframe:
df = pd.DataFrame({'1Country': ['FR', 'FR', 'GER','GER','IT','IT', 'FR','GER','IT'],
'2City': ['Paris', 'Paris', 'Berlin', '...
0
votes
1 answer
26 views
Arguments of Functions within Functions [on hold]
This is a broader question rather than a specific problem, but I'm finding that I'm writing functions that use other functions that I've previously written, and I keep having to pass the previous ...
-1
votes
1 answer
32 views
Keep order after melt in pandas
input Data:
╔════╦══════╦══════╦══════╦══════╦══════╗
║ ID ║ q104 ║ q204 ║ q304 ║ q404 ║ q105 ║
╠════╬══════╬══════╬══════╬══════╬══════╣
║ 1 ║ 12 ║ 43 ║ 23 ║ 22 ║ 42 ║
║ 2 ║ 23 ║ 56 ║...
1
vote
1 answer
38 views
Split multiple times?
So I'm currently transferring a txt file into a csv. It's mostly cleaned up, but even after splitting there are still empty columns between some of my data.
Below is my messy CSV file
And here is ...
2
votes
3 answers
20 views
How to set column names with DataFrame.T
I learned that I can "flip" my data frame with df.T, but I am wondering how to add the new column names at the same time that I transpose the data frame.
My data is like this:
dict = {"a":[...
-1
votes
0 answers
16 views
Pandas Colored Dataframe not Appearing [duplicate]
I'm fairly new to Python, and I'm running into a few issues with creating a color-coded pandas dataframe.
I have my dataframe created and a color coding definition and I run the following code:
...
1
vote
2 answers
58 views
How to get average of increasing values using Pandas?
I'm trying to figure out the average of increasing values in my table per column.
my table
A | B | C
----------------
0 | 5 | 10
100 | 2 | 20
50 | 2 | 30
100 | 0 | 40
function I'...
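One reading of "average of increasing values" is the mean of the positive row-to-row differences in each column. That interpretation is an assumption, since the asker's function is cut off. A minimal sketch using the table above:

```python
import pandas as pd

# The table from the question.
df = pd.DataFrame({
    "A": [0, 100, 50, 100],
    "B": [5, 2, 2, 0],
    "C": [10, 20, 30, 40],
})

# Row-to-row change in every column; the first row has no predecessor (NaN).
diffs = df.diff()

# Keep only the strictly positive changes, mask the rest as NaN,
# then average per column (NaN values are skipped by mean()).
avg_increase = diffs.where(diffs > 0).mean()
print(avg_increase)
```

A column that never increases, such as `B` here, comes out as NaN rather than 0. Swapping `.mean()` for `.min()` on the same masked frame would give the smallest increase per column instead.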
0
votes
1 answer
19 views
Generate object with data from csv efficiently in python
I have a .csv file with node information (including node_id, x, y), and I am trying to generate an object for each record in the .csv file. Right now I'm using the apply method, but it takes almost the same running time compared ...
0
votes
2 answers
40 views
Python Pandas compare values in multiple columns for partial duplicates and drop record
I need to create a function/expression that compares multiple columns ('Cust ID Count', 'Revenue' and possibly 'Family Name') for a record match and then keeps only the first record based on ascending ...
0
votes
2 answers
25 views
Pandas Dataframe Yahoo Finance Checking if Volume Meets Criteria
The program below imports thousands of stock tickers from a .CSV file to a list and passes the tickers as a parameter to a function which pulls the 'Adjusted Close' column of that particular stock and ...
0
votes
1 answer
20 views
Python - running into x_test y_test fit errors
I have built a neural network and it worked fine with a small dataset of around 300,000 rows with 2 categorical variables and 1 independent variable, but I was running into memory errors when I ...
-1
votes
4 answers
43 views
How to check if elements in one array exist in another array if so print the count using Python
I have two arrays
A=[1,2,3,4,6,5,5,5,8,9,7,7,7]
B=[1,5,7]
If elements of B are in A, then print the number of occurrences.
output
1:1
5:3
7:3
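A minimal sketch of one way to produce that output, using `collections.Counter` from the standard library (the variable names `counts` and `result` are illustrative):

```python
from collections import Counter

A = [1, 2, 3, 4, 6, 5, 5, 5, 8, 9, 7, 7, 7]
B = [1, 5, 7]

# Count every element of A once, then look up only the values listed in B.
counts = Counter(A)
result = {b: counts[b] for b in B if b in counts}

for value, n in result.items():
    print(f"{value}:{n}")
```

Counting once and looking up afterwards avoids rescanning `A` for every element of `B`, which matters mainly when both lists are long.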
1
vote
1 answer
46 views
Dropping rows in Python using != operator is not working
I want to drop rows in my dataset using:
totes = df3.loc[(df3['Reporting Date'] != '18/08/2017') & (df3['Business Line'] != 'Bondy')]
However it is not what I expect; I know that the number of ...
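The excerpt is cut off, so the intended result is not known. One common cause of this symptom is that `(a != x) & (b != y)` keeps only rows where neither condition matches, which removes more rows than "drop the rows where both match". A sketch of the difference, with made-up data and assuming the dates are stored as strings:

```python
import pandas as pd

# Hypothetical data: the real df3 is not shown in the question.
df3 = pd.DataFrame({
    "Reporting Date": ["18/08/2017", "18/08/2017", "19/08/2017", "19/08/2017"],
    "Business Line": ["Bondy", "Other", "Bondy", "Other"],
})

# The filter from the question: keeps a row only if BOTH inequalities hold,
# so any row with that date OR that business line is dropped.
neither = df3.loc[(df3["Reporting Date"] != "18/08/2017") & (df3["Business Line"] != "Bondy")]

# Dropping only the rows that match both values: build the "match" mask
# with ==, then negate the whole mask with ~.
both = (df3["Reporting Date"] == "18/08/2017") & (df3["Business Line"] == "Bondy")
totes = df3.loc[~both]

print(len(neither), len(totes))
```

By De Morgan's law, `~(p & q)` is the same as `~p | ~q`, so the second filter could also be written with `!=` and `|`.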
-1
votes
0 answers
31 views
over 170G RAM usage - Pandas crosstab - purchases data
I have transaction data in a csv file with the column headers:
userId, timestamp, event_data, itemId
with 171530 rows.
I want to transform it into the following form (just an illustration):
In the image, ...
0
votes
2 answers
17 views
How to replace the values in a dataframe column based on another dataframe condition
I have two dataframes, XXX and override.
XXX = pd.DataFrame({'A':['One', 'Two', 'Three'], 'B': [6,4,3], 'C': ['red','green','blue']})
override = pd.DataFrame({'A':['One','Two'], 'C': ['apple','pie']})...
1
vote
2 answers
30 views
How do I calculate moving average with customized weight in pandas?
I have a dataframe that contains two columns, a: [1,2,3,4,5]; b: [1,0.4,0.3,0.5,0.2]. How can I make a column c such that:
c[0] = 1
c[i] = c[i-1]*b[i]+a[i]*(1-b[i])
so that c:[1,1.6,2.58,3.29,4....
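Because each value of c depends on the previous one, a plain loop is the most direct way to express the recurrence. A minimal sketch using the values from the question:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [1, 0.4, 0.3, 0.5, 0.2]})

# c[0] is fixed at 1; every later value blends the previous c with the
# current a, weighted by the current b:
#   c[i] = c[i-1] * b[i] + a[i] * (1 - b[i])
c = [1.0]
for a_i, b_i in zip(df["a"].iloc[1:], df["b"].iloc[1:]):
    c.append(c[-1] * b_i + a_i * (1 - b_i))

df["c"] = c
print(df)
```

The first four values reproduce the 1, 1.6, 2.58, 3.29 given in the question. For long frames the loop could be moved into NumPy or numba, but the sequential dependency means it cannot be replaced by a simple column-wise expression.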
0
votes
2 answers
14 views
Python Pandas: Create New Column With Calculations Based on Categorical Values in A Different Column
I have the following sample data frame:
id category time
43 S 8
22 I 10
15 T 350
18 L 46
I want to apply the following logic:
1)...
0
votes
0 answers
10 views
Pandas with RegExp Producing Leading and Trailing NAN columns
I have some simple data in a file that I'm reading in with pandas:
2018:08:23:07:35:22:INFO:__main__:Info logger message
There are no beginning or trailing tabs, spaces, etc. in the file.
I read ...
1
vote
1 answer
12 views
For loop for dropping a string pattern from a column name
I am attempting to drop '_Adj' from a column name in a 'df_merged' data frame if (1) a column name contains 'eTIV' or 'eTIV1'.
for col in df_merged.columns:
if 'eTIV1' in col or 'eTIV' in col:
...
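The loop body is cut off in the excerpt, so the following is only a sketch of one way to get the described result without an explicit loop, using `DataFrame.rename` with a function. The column names are hypothetical.

```python
import pandas as pd

# Hypothetical columns: the real df_merged is not shown in the question.
df_merged = pd.DataFrame(columns=["eTIV_Adj", "eTIV1_Adj", "Age_Adj", "ID"])

# rename() calls the function once per column label. Any label containing
# 'eTIV' (which also covers 'eTIV1') loses its '_Adj' part; others are kept.
df_merged = df_merged.rename(
    columns=lambda col: col.replace("_Adj", "") if "eTIV" in col else col
)
print(list(df_merged.columns))
```

Note that `rename` returns a new frame, so the result has to be assigned back. Reassigning `col` inside a `for` loop would not change the frame's columns.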
0
votes
1 answer
28 views
Reading all excel files from a directory instead of listing them individually
I made a program to merge excel files based on listing their specific file names [4], but if I want to merge all files listed in a particular directory (say a folder called test on my desktop), how would I ...
0
votes
3 answers
23 views
Reading a file with pandas and use correlation coefficients on two columns
I have a file like the following, with no header:
0.000000 0.330001 0.280120
1.000000 0.355590 0.298581
2.000000 0.305945 0.280231
I want to read this file using pandas dataframe and want to perform ...
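A minimal sketch of reading whitespace-separated data without a header and computing a Pearson correlation between two columns. The excerpt is truncated, so which columns should be correlated is an assumption (the second and third here), and the names `t`, `x`, `y` are hypothetical. The three sample rows are read from a string to keep the example self-contained; a file path would work the same way.

```python
import io

import pandas as pd

# The three rows shown in the question, inlined instead of read from disk.
text = """0.000000 0.330001 0.280120
1.000000 0.355590 0.298581
2.000000 0.305945 0.280231
"""

# header=None tells pandas the first line is data; sep=r"\s+" splits on
# any run of whitespace. The column names are illustrative only.
df = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None, names=["t", "x", "y"])

# Pearson correlation between two columns (the default method of Series.corr).
r = df["x"].corr(df["y"])
print(r)

# Full pairwise correlation matrix, if all columns are of interest.
print(df.corr())
```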
0
votes
0 answers
29 views
Iterate over dataframe to optimize project management
I have a pandas dataframe that contains the existing relationships between three ids: manager_id, employee_id, project_id. By changing which managers manage which employees, I'd need to find the ...
0
votes
0 answers
12 views
Pivoting / Unstacking Text fields in Pandas [duplicate]
Struggling with an unstacking problem. What I am trying to do is similar to the unstack function in jmp, while keeping all of my columns, but I want to do it in Python instead of jsl. I have a table ...
0
votes
1 answer
21 views
Math on rows in column of pandas dataframe
I am trying to find code that will allow me to subtract the value in the last row of a column from the value in the second-to-last row of the same column. Here's what I have tried.
df_stock2['...
0
votes
1 answer
27 views
Legend Overlapping in graph plotting area
I have a dataframe as below (obtained after a lot of preprocessing)
Please find dataframe
d = {'token': {361: '180816_031', 119: '180816_031', 101: '180816_031', 135: '180816_031', 292: '180816_031',
...
0
votes
1 answer
32 views
Pandas find columns with unique values
I have two databases (each with 1000's of tables) which are supposed to reflect the same data but they come from two different sources. I compared two tables to see what the differences were, but to ...
1
vote
3 answers
34 views
Pandas - Extract Text from Rows
Let's say I have a dataframe that looks like this:
df2 = pd.DataFrame(['Apple, 10/01/2016, 31/10/18, david/kate', 'orange', 'pear', 'Apple', '10/01/2016', '02/20/2017'], columns=['A'])
>>> ...
0
votes
0 answers
4 views
What data profiling libraries are there other than pandas_profiling?
I am working on data profiling for tables in databases like Redshift or Snowflake. Are there any other data profiling Python libraries besides pandas_profiling? Thanks!
0
votes
3 answers
30 views
Normalizing rows of pandas dataframe
I need to normalize the rows of a dataframe that contains some rows populated entirely with zeros. For example:
df= pd.DataFrame({"ID": ['1', '2', '3', '4'], "A": [1, 0, 10, 0], "B": [4, 0, 30, 0]})
ID A B
1 ...
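The excerpt does not say which normalization is wanted. Assuming each row should be divided by its own sum, with all-zero rows left as zeros, one possible sketch is:

```python
import pandas as pd

df = pd.DataFrame({"ID": ["1", "2", "3", "4"], "A": [1, 0, 10, 0], "B": [4, 0, 30, 0]})

cols = ["A", "B"]  # numeric columns to normalize; ID is left alone

# Row sums; a zero sum is replaced by NaN so the division yields NaN
# for all-zero rows, and those NaNs are then filled back with 0.
row_sums = df[cols].sum(axis=1).replace(0, float("nan"))
normalized = df[cols].div(row_sums, axis=0).fillna(0)

result = pd.concat([df[["ID"]], normalized], axis=1)
print(result)
```

If a different norm is intended (for example the Euclidean length of each row), only the `row_sums` line would need to change.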
0
votes
1 answer
19 views
Pandas Datetime AVERAGE
DataFrame where Date is datetime:
Column | Date
:-----------|----------------------:
A | 2018-08-05 17:06:01
A | 2018-08-05 17:06:02
A | ...
0
votes
0 answers
17 views
Set an attribute as default for every object of a class
In my script, each time a certain type of object is called I have to repeat a certain attribute:
If my object df is a dataframe, each time I'm calling df in my script I have to write df.style....
0
votes
0 answers
12 views
Complicated Groupby average
I am having trouble doing a groupby mean on this data. I am using this groupby.
What I am trying to do is find a vectorized way to average o1 and o2 or the last two columns of the data by the ...
0
votes
0 answers
20 views
Pythonic way to write functions for pandas dataframe manipulation
I am doing some data analysis in python using pandas. In the analysis, I am writing a lot of functions that look something like this
import pandas as pd
def my_func(data):
"""
A function to ...
0
votes
2 answers
19 views
Read pandas column with number values and missing data as string
I have an Id column in my data frame like this:
a = pandas.DataFrame([12673, 44, 847])
This data has some missing values. If I set keep_default_na=True, then the missing value is filled by NaN, and ...
2
votes
1 answer
37 views
Pandas query multiindex dataframe based on another single index dataframe
I have two dataframes: Data & Positions.
Data has multiindex: 'Date' and 'Symbol'.
Positions has a single index: 'Date' (and has a column with the label 'Symbol').
Both 'Date'-s are ...
0
votes
1 answer
20 views
pandas - fill in empty row values with other row values conditionally
I have a table that looks like this (the ratio column was merged from another table based on the codename and date):
date codename ratio
2018-01-01 A .5
2018-02-01 A
...
0
votes
4 answers
29 views
pandas aggregate by functions
I have data like below:
id movie details value
5 cane11 good 6
5 wind2 ok 30.3
5 wind1 ok 18
5 cane2 good 2
5 cane12 ok 4
5 cane14 good 7
5 wind2 ok 2
I want ...
1
vote
2 answers
20 views
Filter dataframe by two columns in Pandas
I have a dataframe A that contains hourly weather data for each city.
City Hour Temperature
A 1 30
A 2 32
...
B 1 39
B 2 40
I have another dataframe B, which ...
0
votes
0 answers
25 views
put values in single columns after pd.DataFrame.from_dict [duplicate]
I have a dataframe created with pd.DataFrame.from_dict; the result looks something like this:
a b
0 [0.042167, 2.913] [0.042168, 0.245]
1 [0.042164, 1.739]...
-2
votes
0 answers
41 views
Python for loop writing to dataframe
I'm pretty new to coding and I have made a for loop that iterates through the rows of a dataframe with 40k rows. In the for loop it connects to an API and gets data which it then puts in a dataframe. ...