I am trying to work with a file as a stream with a window. Here is the code: object Prog { def...
I'm trying to better understand the immutability of SSTables in Cassandra. It's very clear what...
I want to change this code to specifically read from line 1400001 to 1450000. What is...
I am new to Apache Spark, Scala and Hadoop tools. I have set up a new local single-node Hadoop...
I'm searching a very large text database file (5+ GB) line by line for a specific pattern and...
I'm trying to move from Microsoft SQL to Big Data/Big Data Analytics/BI. I live in Orlando, FL and...
How do I convert HBase table data to a .csv file? I'm trying to convert table data to CSV format, but...
I have a Spark dataset, Dataset<T>, loaded from a Cassandra table, and I want to apply a list of...
My HiveQL file has a CREATE TABLE statement with a huge struct<>, array> I am able to create the...
I have a crunch job where a cell can contain hundreds of thousands of cells (the data is split into...
I have made a web app in Python using Dash, and now I am in the deployment stage. My overall...
I'm very new to Hadoop; I'm using Spark with Java. I have dynamic JSON, for example: { ...

Live data analysis

There's an application running and I need to pull data from the application for analysis. I have a...
I have installed Oracle VirtualBox version 5.2.12 and I have imported appliance...
I'm completely new to Hadoop setup. So far I've configured and installed Hortonworks HDP sandbox...
While installing Pig version 0.17.0 on my Ubuntu system, I am facing an error after I run a command...
I have learnt Hadoop recently. I have an idea in mind. As a recruiter, if I want to analyse all the...
I have written a query to find records which exist in one table but not in the other. Here is the...
I want to try Spark using Amazon EC2; unfortunately, companies like Amazon and Microsoft don't...
For each row that I read from Hive through Spark (Scala), I need to create the values A1,A2...An as a...
I want to initialise an empty DataFrame in Spark (Scala). The number of columns in the DataFrame...
I couldn't find any plain English explanations regarding Apache Parquet files. Such as: What are...
I have tested Gobblin with Hadoop and Apache Kafka using Kafka-HDFS-Ingestion Job. The example is...
I'm having difficulties mining a big (100K entries) dataset of mine concerning logistics...
My understanding is that Hive stores all the metadata information in the hive-metastore and if...
I use the code below to read a review JSON file. Recently I have had a problem reading 1450000 and...
I have a lot of data (JSON string) per day (around 150-200B). I want to insert the JSON into Hadoop,...
Can we have different block sizes for different types of datasets? Suppose I have two tables, one...
I have a Hive table in ORC format with 30 million records. I wanted to improve performance and hence...
I want to implement an SFTP client using the JSch Java library. I have the queries below. Please suggest your...