So I am dealing with a large data file which has 1.3 million rows. What I'm trying to do is simple,...
I am working with several big square matrices of 1.3e6 rows, and I want the diagonal of all of...
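The question is truncated, but if the matrices live in NumPy, the diagonal can be taken without copying the matrix. A minimal sketch, assuming in-memory NumPy arrays (the actual storage format is unknown here):

```python
import numpy as np

# Small stand-ins for the 1.3e6 x 1.3e6 matrices from the question.
matrices = [np.arange(9.0).reshape(3, 3) for _ in range(4)]

# np.diagonal returns a read-only view, so no copy of the matrix is made.
diagonals = [np.diagonal(m) for m in matrices]
print(diagonals[0])  # [0. 4. 8.]
```

For matrices too large for RAM, np.memmap can expose the on-disk file as an array so that only the diagonal elements are actually read.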
I have been wondering about this since cloud technologies exploded, and this question...
I am a beginner in Big Data. So I have installed VirtualBox on my system with 8GB RAM, but after...
I am writing ORC files using the MultipleOutputs format to create custom file names. I am setting in...
As per the documentation of accumulators in Spark: Note that tasks on worker nodes cannot...
As per the documentation of accumulators: Note that tasks on worker nodes cannot access the...
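Both accumulator questions above quote the same documentation note. A minimal PySpark sketch of the documented behavior, namely that tasks may only add to an accumulator while only the driver can read its value:

```python
from pyspark import SparkContext

sc = SparkContext(appName="accumulator-demo")
acc = sc.accumulator(0)  # created on the driver

def count_evens(x):
    if x % 2 == 0:
        acc.add(1)  # tasks can only add; reading the value here is undefined

sc.parallelize(range(10)).foreach(count_evens)
print(acc.value)  # 5 -- only the driver can read the accumulated value
```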
I have a huge dataset of 292 million rows (6 GB) in CSV format. Pandas' read_csv function is not...
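A common workaround when pandas cannot load a CSV in one go is chunked reading. A minimal sketch, with "big.csv" and the chunk size as hypothetical stand-ins:

```python
import pandas as pd

total = 0
for chunk in pd.read_csv("big.csv", chunksize=1_000_000):
    total += len(chunk)  # replace with the real per-chunk processing

print(total)  # all 292 million rows processed without holding them at once
```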
This question is an exact duplicate of: Spark Fixed Width File Import...
DocumentAccess access = DocumentAccess.createDefault(); DocumentType type =...
import pyspark.sql.functions as f df_categories4 = df_categories3.select("alias", "title",...
This question already has an answer here: Identifying country by IP...
Hello, I have written Python code which I need to convert to PySpark, but I am new to PySpark. I am...
I am using Spark Streaming and I read streams from Kafka. After reading this stream, I am adding it...
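A minimal Structured Streaming sketch for reading from Kafka; the broker address and topic name are assumptions, and the spark-sql-kafka package must be on the classpath:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-demo").getOrCreate()

stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")  # assumed broker
          .option("subscribe", "events")                        # assumed topic
          .load())

# Kafka delivers binary key/value columns; cast before further processing.
values = stream.selectExpr("CAST(value AS STRING) AS value")

query = values.writeStream.format("console").start()  # stand-in sink
query.awaitTermination()
```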
I have created a Searcher class in my application which has multiple documents. In the Searcher class...
I'm working on a deep learning project with about 700GB of table-like time series data in thousands...
I am currently working on a project which requires generating predictions every month based on a...
I want to create an application in Vespa which fetches the data from some applications and feeds it into...
This is the first time I am using indexes. I am confused about why we need them and how to implement them...
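In case it helps frame the question: an index trades extra storage and slower writes for much faster lookups. A minimal sketch, assuming a relational database (SQLite is used here only because it ships with Python):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [(f"user{i}@example.com",) for i in range(1000)])

# Without an index this lookup scans every row; with one it is a tree search.
conn.execute("CREATE INDEX idx_users_email ON users(email)")

plan = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?",
                    ("user42@example.com",)).fetchall()
print(plan)  # reports a SEARCH ... USING INDEX idx_users_email
```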
I have a very large pandas dataframe. The dataframe looks like this: >> df "a_1" "a_2"...
I am trying to understand whether there is a real difference between a data lake and Big Data if you...
I am trying to sum up moving data within a range of 4. I need to consider only the max-of-date device row...
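If "a range of 4" means a rolling window of four rows, a Spark window function fits. A minimal sketch with hypothetical columns "device", "date", and "value", since the real schema is truncated away:

```python
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.appName("rolling-sum").getOrCreate()

df = spark.createDataFrame(
    [("a", 1, 10), ("a", 2, 20), ("a", 3, 30), ("a", 4, 40), ("a", 5, 50)],
    ["device", "date", "value"])

# Sum over the current row plus the 3 preceding rows: a moving range of 4.
w = Window.partitionBy("device").orderBy("date").rowsBetween(-3, 0)
df.withColumn("moving_sum", F.sum("value").over(w)).show()
```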
I am trying to import my Excel file into PySpark on an Azure Databricks machine, which I have to move to...
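One route on Databricks, assuming the sheet fits in driver memory: read it with pandas (openpyxl installed) and hand the result to Spark. The path below is hypothetical; for larger files the spark-excel package is the usual alternative.

```python
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("excel-import").getOrCreate()

pdf = pd.read_excel("/dbfs/data/input.xlsx", sheet_name=0)  # assumed path
df = spark.createDataFrame(pdf)  # convert the pandas frame to a Spark one
df.show(5)
```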
How does Spark handle files that are bigger than the available cluster memory? I believe during...
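The short answer is that Spark never needs the whole file in memory: reads are lazy and work partition by partition, with spills to disk when executor memory runs short. A tiny sketch (the HDFS path is hypothetical):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bigfile").getOrCreate()

# Lazy: nothing is read here, only the plan is recorded.
df = spark.read.text("hdfs:///data/huge.log")

# The count runs partition by partition; no executor ever holds the file.
print(df.count())
```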
Here is the scenario that we have to solve: pull all PDF files from the server, convert PDF to...
Please note my special requirement of passing a value from a child Spark job to a parent standard job. I...
We installed some Big Data components like Apache Hadoop, Spark, and Kafka on different virtual...
I want to convert rows into columns using a Spark DataFrame. My table is like...
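Spark's pivot() does exactly this row-to-column reshaping. A minimal sketch with hypothetical columns "id", "key", and "value", since the question's table is truncated:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("pivot-demo").getOrCreate()

df = spark.createDataFrame(
    [(1, "name", "alice"), (1, "city", "paris"),
     (2, "name", "bob"),   (2, "city", "rome")],
    ["id", "key", "value"])

# pivot() turns each distinct value of "key" into its own column.
df.groupBy("id").pivot("key").agg(F.first("value")).show()
# resulting columns: id, city, name -- one per distinct key
```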
I have tried conf.set("mapreduce.output.textoutputformat.separator", ""); but it didn't work. O/P...
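The snippet in the question is Java MapReduce, where a frequent cause of the setting "not working" is applying it after Job.getInstance(conf), which copies the configuration. As a hedged Python counterpart, PySpark can pass the same Hadoop property with the write itself, so it reaches the configuration the output format actually uses (paths below are illustrative):

```python
from pyspark import SparkContext

sc = SparkContext(appName="separator-demo")
pairs = sc.parallelize([("k1", "v1"), ("k2", "v2")])

# The conf dict is handed to the Hadoop OutputFormat for this write, so the
# empty separator cannot be lost to a job built from an older configuration.
pairs.saveAsNewAPIHadoopFile(
    "file:///tmp/out-no-sep",  # hypothetical output path
    "org.apache.hadoop.mapreduce.lib.output.TextOutputFormat",
    keyClass="org.apache.hadoop.io.Text",
    valueClass="org.apache.hadoop.io.Text",
    conf={"mapreduce.output.textoutputformat.separator": ""},
)
```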