JDBC River missing documents??

8 messages

JDBC River missing documents??

GWired
Hi All,

I've just been informed that I'm off by up to 100k records or so in my JDBC river-fed index.

I am using the column strategy with a createddate and a lastmodifieddate column.

Kibana is reporting an entirely different number than what I see in the DB:

Table A has 978634 in SQL, 934646 shown in Kibana.
Table B has 957327 in SQL, 876725 shown in Kibana.
Table C has 312826 in SQL, 238534 shown in Kibana.

In the ES logs I see:

Table A metrics: 979044 rows
Table B metrics: 957591 rows
Table C metrics: 312827 rows

These are the right numbers... well, at least closer to right.

But if I do this in Sense:

GET jdbc/mytable/_count?q=*

it returns the same number as Kibana is returning.

The erring setup is running ES 1.5.1 with Kibana 3.0.

On another server with ES 1.5.0 and Kibana 3.0 it is working just fine; the numbers match up.

Any ideas?
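
For reference, the index-level document totals can also be pulled straight from Sense; a quick check, assuming the _cat API is available on 1.x:

GET _cat/indices/jdbc?v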






Re: JDBC River missing documents??

GWired
Found this in the logs:

[2015-04-22 22:01:25,063][ERROR][river.jdbc.BulkNodeClient] bulk [15] failed with 945 failed items, failure message = failure in bulk execution:




Re: JDBC River missing documents??

joergprante@gmail.com
There are log messages on the ES cluster side; you should look there to see why bulk indexing failed.
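
If the bulk thread pool is rejecting requests, it shows up in the thread pool stats; a quick way to check from Sense, assuming the _cat API on 1.x:

GET _cat/thread_pool?v

GET _nodes/stats/thread_pool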

Jörg


Re: JDBC River missing documents??

GWired
My theory is that I was overloading my ES VMs on initial loads or when doing large loads.

CPU would jump to 99%, the pulls would fail silently, and the river thought it got all the documents when it didn't.

I have since rewritten my rivers to use stored procedures instead; they pull much faster and CPU doesn't go as high.

The procedure keeps track of the created_at and updated_at for me instead of using the column strategy. I also chunked my initial calls into logical batches; mine happen to be date-based, by month. I couldn't figure out a chunking scheme that made sense using row counts, and a month seemed as good as anything, and it works.

This also solved my problem of what happens when ES is turned off: it now just starts back up where it last left off, because SQL Server is storing that checkpoint instead of ES. I also made it so I can rebuild at will.

Thanks, Garrett




Re: JDBC River missing documents??

joergprante@gmail.com
Nice work, can you share the recipe with the community?

I could post it on the JDBC plugin wiki.

Jörg


Re: JDBC River missing documents??

GWired
The below will build a table in SQL to store refresh times. The first time it runs, the procedure puts in a bookkeeping entry and then works backwards in time, month by month, until all records are retrieved. Once complete, it retrieves incrementally based on the lastmodifieddate and createddate fields. The time-based chunks give ES enough time to process each batch within your schedule window. Replace your table's time fields with lastmodifieddate and createddate; this could be genericized further to pass in the names of the last-modified and created-date fields.

In your SQL DB, create a table to hold your refresh-time data:

Metadata_ElasticSearchRefreshTime

CREATE TABLE [dbo].[Metadata_ElasticSearchRefreshTime](
[id] [int] IDENTITY(1,1) NOT NULL,
[DataObjectID] [nvarchar](100) NOT NULL,
[LastRefreshDateTime] [datetime2](7) NULL,
[RebuildSearchIndex] [bit] NOT NULL CONSTRAINT [DF_Metadata_ElasticSearchRefreshTime_RebuildSearchIndex]  DEFAULT ((1)),
[PreviousRunTime] [datetime2](7) NULL,
[NumberofExecutionsSinceRebuild] [bigint] NOT NULL CONSTRAINT [DF_Metadata_ElasticSearchRefreshTime_NumberofExecutionsSinceRebuild]  DEFAULT ((0)),
[RebuildMonth] [int] NOT NULL CONSTRAINT [DF_Metadata_ElasticSearchRefreshTime_RebuildMonth]  DEFAULT ((0)),
[RebuildYear] [int] NOT NULL CONSTRAINT [DF_Metadata_ElasticSearchRefreshTime_RebuildYear]  DEFAULT ((0)),
[RebuildTotalCompleted] [bigint] NOT NULL CONSTRAINT [DF_Metadata_ElasticSearchRefreshTime_RebuildTotalCompleted]  DEFAULT ((0)),
[InProcess] [bit] NOT NULL CONSTRAINT [DF_Metadata_ElasticSearchRefreshTime_InProcess]  DEFAULT ((0)),
 CONSTRAINT [PK_Metadata_ElasticSearchRefreshTime_1] PRIMARY KEY CLUSTERED 
(
[DataObjectID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

Then create your procedure:

Create PROCEDURE [dbo].[ESTransport]
@EntityName as nvarchar(50)
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
declare @lastrefreshdatetime datetime2
declare @RebuildSearchIndex bit
declare @SprocEntryTime Datetime2 = SYSDATETIME()
declare @RebuildMonth int = 0
declare @RebuildYear int = 0
declare @InProcess bit = 0
declare @TotalRows bigint = 0
declare @RebuildTotalCompleted bigint = 0
declare @MaxDate Datetime2
-- select count(1),max(createddate) from [Case]
declare @TotalSQL nvarchar(200)
declare @ParmDefinition nvarchar(255)
SET @TotalSQL = 'select @TotalRowsOut = count(1), @MaxDateOut = Max(CreatedDate) from [' +  @EntityName +'] where isDeleted=''false'''
SET @ParmDefinition = N'@TotalRowsOut bigint OUTPUT,
                            @MaxDateOut datetime2 OUTPUT'
EXECUTE sp_executesql
@TotalSQL,
@ParmDefinition,
@TotalRowsOut = @TotalRows   OutPut,
@MaxDateOut = @MaxDate OUTPUT
PRINT @TotalRows
PRINT @MaxDate

select @lastrefreshdatetime=[LastRefreshDateTime],@RebuildMonth=RebuildMonth, @RebuildYear=RebuildYear, @InProcess = InProcess,@RebuildSearchIndex=[RebuildSearchIndex],@RebuildTotalCompleted=RebuildTotalCompleted from [Metadata_ElasticSearchRefreshTime] where DataObjectID=@EntityName

IF @lastrefreshdatetime is null
BEGIN
SET @RebuildSearchIndex=1
SET @InProcess = 0
INSERT INTO [dbo].[Metadata_ElasticSearchRefreshTime]
([DataObjectID]
,[LastRefreshDateTime]
,[RebuildSearchIndex]
,[NumberofExecutionsSinceRebuild]
,[RebuildMonth]
,[RebuildYear]
,[RebuildTotalCompleted]
,[InProcess])
VALUES
(@EntityName
,@SprocEntryTime
,@RebuildSearchIndex
,0
,MONTH(@MaxDate)
,YEAR(@MaxDate)
,0
,@InProcess)
END
-- Begin statements

--Rebuild pass: pull one month at a time, newest month first
IF @EntityName = 'Entity' and @RebuildSearchIndex = 1
BEGIN
IF @InProcess=0 
BEGIN
--First Pass here Direction is descending
SET @InProcess=1
SET @RebuildMonth = Month(@MaxDate)
SET @RebuildYear = Year(@MaxDate)
END
ELSE IF @InProcess=1
BEGIN
--If you hit January drop to December of the prior year
IF @RebuildMonth=1 
BEGIN
SET @RebuildMonth = 12
SET @RebuildYear = @RebuildYear-1
END
ELSE
BEGIN
SET @RebuildMonth =@RebuildMonth-1
END
END

--Pull the month currently being rebuilt
SELECT Entity.id as _id, Entity.*
FROM [MyDB].[dbo].[Entity] WITH (NOLOCK)
WHERE Month(Entity.CreatedDate) = @RebuildMonth and Year(Entity.CreatedDate) = @RebuildYear

UPDATE [dbo].[Metadata_ElasticSearchRefreshTime]
  SET [DataObjectID] = @EntityName
 ,[LastRefreshDateTime] = @SprocEntryTime
 ,[RebuildSearchIndex] = CASE WHEN ((@RebuildTotalCompleted + @@ROWCOUNT)<@TotalRows) THEN 1 ELSE 0 END
 ,[RebuildMonth] = @RebuildMonth 
 ,[RebuildYear] = @RebuildYear
 ,[PreviousRunTime] = @lastrefreshdatetime
 ,[NumberofExecutionsSinceRebuild]  = 0
 ,[InProcess] = CASE WHEN ((@RebuildTotalCompleted + @@ROWCOUNT)<@TotalRows) THEN 1 ELSE 0 END
 ,[RebuildTotalCompleted] = CASE WHEN (@RebuildTotalCompleted=0) THEN @@ROWCOUNT ELSE @RebuildTotalCompleted+@@ROWCOUNT END
WHERE DataObjectID=@EntityName 
END
--Incremental pass: pull anything created or modified since the last refresh
ELSE IF @EntityName = 'Entity'
BEGIN 
UPDATE [dbo].[Metadata_ElasticSearchRefreshTime]
  SET [DataObjectID] = @EntityName
 ,[LastRefreshDateTime] = @SprocEntryTime
 ,[RebuildSearchIndex] = 0
 ,PreviousRunTime = @lastrefreshdatetime
 ,NumberofExecutionsSinceRebuild = NumberofExecutionsSinceRebuild+1
WHERE DataObjectID=@EntityName

SELECT Entity.id as _id, Entity.*
FROM [MyDB].[dbo].[Entity] WITH (NOLOCK)
WHERE ([Entity].LastModifiedDate >= @lastrefreshdatetime or [Entity].CreatedDate >= @lastrefreshdatetime)
END
END

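A quick way to sanity-check the procedure from SQL Server Management Studio before wiring it into the river (using the generic 'Entity' placeholder from the listing above):

-- Run one pass by hand and look at the result set it returns
EXEC [dbo].[ESTransport] @EntityName = 'Entity';

-- Then inspect the bookkeeping row the first run inserts
SELECT * FROM [dbo].[Metadata_ElasticSearchRefreshTime] WHERE DataObjectID = 'Entity';
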
Finally, in Sense I create the river (I used integrated security):

PUT _river/Entity/_meta
{
    "type": "jdbc",
    "jdbc": {
        "url": "jdbc:sqlserver://dbserver.mydomain.com:1433;databaseName=MyDB;integratedSecurity=true;",
        "sql": {
            "callable": true,
            "statement": "{call ESTransport(?)}",
            "parameter": "Entity"
        },
        "schedule": "0/30 0-59 0-23 ? * *",
        "autocommit": true,
        "index": "jdbc",
        "type": "Entity"
    }
}
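
Once the river has had a chance to run, the counts can be compared against SQL the same way as earlier in the thread (index "jdbc", type "Entity" per the config above):

GET jdbc/Entity/_count?q=*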




Re: JDBC River missing documents??

joergprante@gmail.com
wow, thanks for sharing!

Best,

Jörg


Re: JDBC River missing documents??

GWired
Found a defect in the prior stored procedure.

If new data is being added while the index is being rebuilt, then once the procedure has already done the current month it will keep going back in time forever, because the growing total row count means the "rebuild complete" comparison never succeeds. It never gets to the point where it starts indexing the newly modified or created records; it just continues to go back in time.

I modified the stored procedure below to also calculate @MinDate, and to stop the rebuild once it goes back past the min date.

 Create PROCEDURE [dbo].[ESTransport]
@EntityName as nvarchar(50)
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
declare @lastrefreshdatetime datetime2
declare @RebuildSearchIndex bit
declare @SprocEntryTime Datetime2 = SYSDATETIME()
declare @RebuildMonth int = 0
declare @RebuildYear int = 0
declare @InProcess bit = 0
declare @TotalRows bigint = 0
declare @RebuildTotalCompleted bigint = 0
declare @MaxDate Datetime2
declare @MinDate Datetime2
-- select count(1),max(createddate) from [Case]
declare @TotalSQL nvarchar(200)
declare @ParmDefinition nvarchar(255)
SET @TotalSQL = 'select @TotalRowsOut = count(1), @MaxDateOut = Max(CreatedDate), @MinDateOut=Min(CreatedDate) from [' +  @EntityName +'] where isDeleted=''false'''
SET @ParmDefinition = N'@TotalRowsOut bigint OUTPUT,
                            @MaxDateOut datetime2 OUTPUT,
@MinDateOut datetime2 OUTPUT'
EXECUTE sp_executesql
@TotalSQL,
@ParmDefinition,
@TotalRowsOut = @TotalRows   OutPut,
@MaxDateOut = @MaxDate OUTPUT,
@MinDateOut = @MinDate OUTPUT
PRINT @TotalRows
PRINT @MaxDate
PRINT @MinDate

select @lastrefreshdatetime=[LastRefreshDateTime],@RebuildMonth=RebuildMonth, @RebuildYear=RebuildYear, @InProcess = InProcess,@RebuildSearchIndex=[RebuildSearchIndex],@RebuildTotalCompleted=RebuildTotalCompleted from [Metadata_ElasticSearchRefreshTime] where DataObjectID=@EntityName

IF @lastrefreshdatetime is null
BEGIN
SET @RebuildSearchIndex=1
SET @InProcess = 0
INSERT INTO [dbo].[Metadata_ElasticSearchRefreshTime]
([DataObjectID]
,[LastRefreshDateTime]
,[RebuildSearchIndex]
,[NumberofExecutionsSinceRebuild]
,[RebuildMonth]
,[RebuildYear]
,[RebuildTotalCompleted]
,[InProcess])
VALUES
(@EntityName
,@SprocEntryTime
,@RebuildSearchIndex
,0
,MONTH(@MaxDate)
,YEAR(@MaxDate)
,0
,@InProcess)
END
-- Begin statements

--Rebuild pass: pull one month at a time, newest month first
IF @EntityName = 'Entity' and @RebuildSearchIndex = 1
BEGIN
IF @InProcess=0 
BEGIN
--First Pass here Direction is descending
SET @InProcess=1
SET @RebuildMonth = Month(@MaxDate)
SET @RebuildYear = Year(@MaxDate)
END
ELSE IF @InProcess=1
BEGIN
--If you hit January drop to December of the prior year
IF @RebuildMonth=1 
BEGIN
SET @RebuildMonth = 12
SET @RebuildYear = @RebuildYear-1
END
ELSE
BEGIN
SET @RebuildMonth =@RebuildMonth-1
END
--Stop once the rebuild has gone back past the earliest record in the table
IF (@RebuildYear < Year(@MinDate)) OR (@RebuildYear = Year(@MinDate) AND @RebuildMonth < Month(@MinDate))
BEGIN
SET @InProcess = 0
END
END

--Pull the month currently being rebuilt
SELECT Entity.id as _id, Entity.*
FROM [MyDB].[dbo].[Entity] WITH (NOLOCK)
WHERE Month(Entity.CreatedDate) = @RebuildMonth and Year(Entity.CreatedDate) = @RebuildYear

UPDATE [dbo].[Metadata_ElasticSearchRefreshTime]
   SET [DataObjectID] = @EntityName
  ,[LastRefreshDateTime] = @SprocEntryTime
  ,[RebuildSearchIndex] = CASE WHEN ((@RebuildTotalCompleted + @@ROWCOUNT)<@TotalRows) THEN 1 ELSE 0 END
  ,[RebuildMonth] = @RebuildMonth 
  ,[RebuildYear] = @RebuildYear
  ,[PreviousRunTime] = @lastrefreshdatetime
  ,[NumberofExecutionsSinceRebuild]  = 0
  ,[InProcess] = CASE WHEN ((@RebuildTotalCompleted + @@ROWCOUNT)<@TotalRows) THEN 1 ELSE 0 END
  ,[RebuildTotalCompleted] = CASE WHEN (@RebuildTotalCompleted=0) THEN @@ROWCOUNT ELSE @RebuildTotalCompleted+@@ROWCOUNT END
 WHERE DataObjectID=@EntityName 
END
--Incremental pass: pull anything created or modified since the last refresh
ELSE IF @EntityName = 'Entity'
BEGIN 
UPDATE [dbo].[Metadata_ElasticSearchRefreshTime]
   SET [DataObjectID] = @EntityName
  ,[LastRefreshDateTime] = @SprocEntryTime
  ,[RebuildSearchIndex] = 0
  ,PreviousRunTime = @lastrefreshdatetime
  ,NumberofExecutionsSinceRebuild = NumberofExecutionsSinceRebuild+1
 WHERE DataObjectID=@EntityName

SELECT Entity.id as _id, Entity.*
FROM [MyDB].[dbo].[Entity] WITH (NOLOCK)
WHERE ([Entity].LastModifiedDate >= @lastrefreshdatetime or [Entity].CreatedDate >= @lastrefreshdatetime)
END
END
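
For the "rebuild at will" part mentioned earlier, a minimal sketch of the reset that makes the next scheduled run start a fresh month-by-month pass; this is just what the flag handling above implies, so adjust as needed:

-- Force a full rebuild on the next run of ESTransport for 'Entity'
UPDATE [dbo].[Metadata_ElasticSearchRefreshTime]
   SET [RebuildSearchIndex] = 1
      ,[InProcess] = 0
      ,[RebuildTotalCompleted] = 0
      ,[NumberofExecutionsSinceRebuild] = 0
 WHERE DataObjectID = 'Entity';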

Finally, in Sense I create the river (I used integrated security):

PUT _river/Entity/_meta
{
    "type": "jdbc",
    "jdbc": {
        "url": "jdbc:sqlserver://dbserver.mydomain.com:1433;databaseName=MyDB;integratedSecurity=true;",
        "sql": {
            "callable": true,
            "statement": "{call ESTransport(?)}",
            "parameter": "Entity"
        },
        "schedule": "0/30 0-59 0-23 ? * *",
        "autocommit": true,
        "index": "jdbc",
        "type": "Entity"
    }
}
